I was so ready to dunk on yet another data quality definition, but I think this is the first one I have seen that talks about fitness for purpose — and doesn’t drone on about the number of nulls.
Fun pet peeves with data quality:
- masked missing values, where instead of a null you have some plausible default value, are so much worse than nulls
- imputation of missing values aimed at one use case can make the data unusable for other purposes
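A minimal sketch of both peeves, using made-up sensor readings (the values, the `0.0` default, and the helper name `observed` are all assumptions for illustration, not from any real dataset). It compares honest nulls, a masked default, and mean imputation:

```python
from statistics import mean, pstdev

# Hypothetical temperature readings in degrees Celsius; None marks a missing value.
readings = [21.0, 22.0, None, 23.0, None, 22.0]

def observed(values):
    """Return only the values that were actually recorded."""
    return [v for v in values if v is not None]

# Honest nulls: the gaps are visible and can be counted and excluded.
missing_count = sum(v is None for v in readings)
true_mean = mean(observed(readings))

# Masked missing values: a plausible-looking default (0.0 here) replaces the null.
# The gaps can no longer be told apart from real zero readings, and the mean is dragged down.
masked = [0.0 if v is None else v for v in readings]
masked_missing_count = sum(v is None for v in masked)
masked_mean = mean(masked)

# Mean imputation: fine if all you want is the average, but it shrinks the spread,
# so the same data misleads anyone studying variability or looking for outliers.
imputed = [true_mean if v is None else v for v in readings]
spread_observed = pstdev(observed(readings))
spread_imputed = pstdev(imputed)

print(missing_count, masked_missing_count)
print(true_mean, masked_mean)
print(spread_observed, spread_imputed)
```

With nulls you can at least see what is missing. After masking, the missing count reads as zero, and after imputation the mean survives but the spread does not.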
I think it is weird that discussions of data quality rarely mention the original purpose of the data or the process that generates it.