A wee thought on machine learning and data provenance.


This morning on Twitter I read a rather amusing story.

This would go some way, but not all the way, to explaining this (BBC voice recognition comedy).


In all seriousness, though: if you are looking at using machine learning or AI from an HR tech vendor, you need to ask a lot more questions about the provenance of the data. How exactly did they get the data? How have they cleaned it? What assumptions have they made about data quality? How will the data be augmented over time? Oh, and do mention our friend GDPR.