Artificial intelligence — Data quality for analytics and machine learning (ML) — Part 2: Data quality measures
Last updated: 18 Jul 2024
Scope
This document provides a data quality model, data quality measures, and guidance on reporting data quality in the context of analytics and machine learning (ML). It builds on the ISO 8000 series, ISO/IEC 25012, and ISO/IEC 25024.
This document aims to enable organizations to achieve their data quality objectives and is applicable to all types of organizations. ©ISO/IEC 2022. All rights reserved.
Purpose
Data quality is necessary for ML and Big Data systems to be safe, reliable, and interoperable. ISO/IEC 20547-3 states that “Data quality management is essential to big data systems, as poor data quality such as incomplete, false or outdated data can disable effective data mining processes, prevent useful findings or lead to wrong output”. Moreover, ISO/IEC TR 24028 addresses challenges of data quality in AI systems based on machine learning, including bias in the data used to train the AI system, data poisoning, and adversarial attacks. Organizations can use this document to select and implement the data quality measures that meet their requirements in data analytics and ML.
This standard defines data quality characteristics for data used in analytics and ML, especially big data; quality measures related to those characteristics; and guidelines for evaluating and reporting data quality in the data workflow of analytics and ML.