An international alliance of the world’s leading cybersecurity agencies has jointly released a set of guidelines that they say will help counter the growing data integrity risks facing artificial intelligence (AI) systems.
The advisory, collectively issued by agencies including India’s CERT-In, the US Cybersecurity and Infrastructure Security Agency (CISA), the UK’s National Cyber Security Centre (NCSC), and the Australian Cyber Security Centre (ACSC), highlights the need to protect the data underpinning AI systems so that AI outcomes remain reliable and trustworthy.
The recommendations emphasize that the correctness and robustness of AI-driven decisions depend on the quality and security of the underlying data: as input data quality improves, so do the safety and effectiveness of the systems built on it.
The potential impact of compromised data integrity has grown as AI becomes increasingly pervasive across domains such as critical infrastructure and governance.
The advisory identifies three distinct risk areas:
Data Supply Chain Risks: The advisory strongly recommends that organizations thoroughly vet their data vendors and put sound provenance tracking in place. Provenance tracking logs the origin and flow of data throughout the AI system lifecycle, helping to identify tampering and maintain accountability. Append-only provenance databases are recommended to limit unauthorized data tampering (see the provenance sketch after this list).
Maliciously Modified (Poisoned) Data: The agencies caution that bad actors might poison training data with corrupted or biased examples to induce similarly flawed behavior in AI models. To address this, the advisory recommends deploying digital signatures to verify trusted data versions, along with ongoing quality checks on the data throughout the AI lifecycle (a signing sketch also follows this list).
Data Drift: The advisory goes on to consider the subtler danger of data drift: when the statistical properties of data change over time, models lose accuracy and become prone to bias. Continuously updating models with fresh, representative data and monitoring input statistics to detect drift are the key preventive measures (a drift-detection sketch follows as well).
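As a loose illustration of the append-only provenance idea, the Python sketch below (all names hypothetical, not taken from the advisory) chains each record to the hash of its predecessor, so any retroactive edit to the log becomes detectable:

```python
import hashlib
import json
import time

class ProvenanceLog:
    """Append-only log: each record carries the hash of its
    predecessor, so rewriting history breaks the chain."""

    def __init__(self):
        self._records = []

    def append(self, dataset_id: str, source: str, action: str) -> dict:
        prev_hash = self._records[-1]["hash"] if self._records else "0" * 64
        record = {
            "dataset_id": dataset_id,
            "source": source,
            "action": action,  # e.g. "ingested", "cleaned", "labeled"
            "timestamp": time.time(),
            "prev_hash": prev_hash,
        }
        record["hash"] = hashlib.sha256(
            json.dumps(record, sort_keys=True).encode()
        ).hexdigest()
        self._records.append(record)
        return record

    def verify(self) -> bool:
        """Recompute every hash; False means the log was tampered with."""
        prev = "0" * 64
        for rec in self._records:
            body = {k: v for k, v in rec.items() if k != "hash"}
            digest = hashlib.sha256(
                json.dumps(body, sort_keys=True).encode()
            ).hexdigest()
            if rec["prev_hash"] != prev or rec["hash"] != digest:
                return False
            prev = rec["hash"]
        return True

log = ProvenanceLog()
log.append("train-v1", "vendor-A", "ingested")
log.append("train-v1", "internal-etl", "cleaned")
assert log.verify()
```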
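For the signing recommendation, a minimal sketch using the `cryptography` package is shown below. Ed25519 is used here only for brevity; as the guidelines note later, quantum-resistant schemes are the stronger choice, and the dataset bytes are a stand-in:

```python
import hashlib
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

dataset = b"...contents of an approved training-data snapshot..."  # stand-in

# Publisher side: hash the dataset and sign the digest.
digest = hashlib.sha256(dataset).digest()
private_key = Ed25519PrivateKey.generate()
public_key = private_key.public_key()
signature = private_key.sign(digest)

# Consumer side: recompute the digest and refuse data that fails to verify.
try:
    public_key.verify(signature, hashlib.sha256(dataset).digest())
    print("dataset verified; safe to train on")
except InvalidSignature:
    print("dataset rejected: signature mismatch")
```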
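Drift monitoring is commonly operationalized with a statistical test over incoming features. The sketch below uses SciPy's two-sample Kolmogorov-Smirnov test on synthetic data (the advisory does not prescribe a specific test; this is one conventional choice):

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)

# Hypothetical feature values: training-time baseline vs. live traffic.
baseline = rng.normal(loc=0.0, scale=1.0, size=5_000)
live = rng.normal(loc=0.4, scale=1.2, size=5_000)  # distribution has shifted

# A small p-value says the live data no longer resembles the training data.
stat, p_value = ks_2samp(baseline, live)
if p_value < 0.01:
    print(f"drift detected (KS={stat:.3f}, p={p_value:.2e}); consider retraining")
else:
    print("no significant drift")
```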
The newly issued guidelines offer non-binding guidance for organizations designing and deploying AI systems, including:
Data Encryption: Strong encryption of data at rest, in transit, and in use helps maintain both confidentiality and integrity (an encryption sketch follows this list).
Digital Signatures: Using quantum-resistant digital signatures to authenticate and verify the data sets used to train and update AI models.
Data Lineage: Maintaining transparent, auditable records of where data comes from and how it is transformed.
Secure Storage: Using certified storage systems that comply with strong cryptographic standards.
Secure Infrastructure: Using hardened computing environments, built on Zero Trust architecture, to protect data during processing.
Data Classification and Access: Classifying data by sensitivity and enforcing stringent access controls to prevent exposure (an access-control sketch also follows).
Dynamic Risk Assessments: Reassessing and updating controls as new threats emerge.
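As referenced above, here is a minimal sketch of authenticated encryption at rest using the `cryptography` package's Fernet recipe (the data is a stand-in, and a real deployment would obtain keys from a key-management service rather than generating them inline):

```python
from cryptography.fernet import Fernet, InvalidToken

key = Fernet.generate_key()  # in practice, issued and held by a KMS
fernet = Fernet(key)

rows = b"id,age,income\n1,34,52000\n2,29,47000\n"  # stand-in feature table
ciphertext = fernet.encrypt(rows)                  # what actually gets stored

# Fernet is authenticated encryption: decryption fails loudly if the stored
# blob was modified, so it covers integrity as well as confidentiality.
try:
    recovered = fernet.decrypt(ciphertext)
    assert recovered == rows
except InvalidToken:
    print("stored data was tampered with or the key is wrong")
```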
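And for classification and access, a bare-bones sketch of sensitivity tiers with a clearance check (the labels, roles, and datasets are invented for illustration):

```python
from enum import IntEnum

class Sensitivity(IntEnum):
    PUBLIC = 0
    INTERNAL = 1
    RESTRICTED = 2

# Hypothetical mappings of datasets and roles to sensitivity tiers.
DATASET_LABELS = {
    "web_crawl": Sensitivity.PUBLIC,
    "customer_records": Sensitivity.RESTRICTED,
}
ROLE_CLEARANCE = {
    "analyst": Sensitivity.INTERNAL,
    "data_protection_officer": Sensitivity.RESTRICTED,
}

def can_read(role: str, dataset: str) -> bool:
    """Grant access only when clearance meets or exceeds the data label."""
    return ROLE_CLEARANCE[role] >= DATASET_LABELS[dataset]

assert can_read("data_protection_officer", "customer_records")
assert not can_read("analyst", "customer_records")
```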
Cybersecurity experts stress that these are less optional recommendations than a baseline checklist for building a resilient, trustworthy AI ecosystem. Neglecting data integrity risks can result in faulty AI outputs, security breaches, operational shutdowns, and loss of public confidence in AI systems.
Organizations are encouraged to prioritize operationalizing these guidelines so that they can protect their AI investments and advance AI responsibly.