ISO 27001 Annex A 8.33 is a security control that governs the protection of information used for testing. Its primary implementation requirement is the secure selection and management of test data; its business benefit is preventing production data from leaking during the development lifecycle.
Artificial intelligence companies operate at a scale that most auditors don’t understand. You are dealing with massive, sensitive datasets required for training and testing complex models. This data, which ranges from proprietary code to personal customer information, is your most valuable asset and your biggest liability. In a fast-paced AI environment, the line between development and production often blurs, creating massive security holes.
ISO 27001 Annex A 8.33, “Test Information,” is the framework for managing the data in your non-production environments. It establishes the rules for how data used to test systems, applications, and algorithms is selected, secured, and managed. The goal is simple: prevent leaks, breaches, and the misuse of data before it ever reaches the live environment.
This guide breaks down the requirements of Annex A 8.33 for AI companies. We provide a practical path to compliance that turns security into a competitive advantage, proving to your enterprise customers that you can be trusted with their data.
The “No-BS” Translation: Decoding the Requirement
The Official ISO Text: “Test information should be appropriately selected, protected and managed.”
The Auditor’s View
I want to see a documented process for how you choose test data. I need proof that you aren’t just cloning your production database because it’s easier. I will look for evidence of authorisation, access controls, and a clear deletion record once the testing or training phase is over.
The AI Company View
Stop using real customer PII to fine-tune models on a local MacBook. If you are using AWS S3 buckets for training data, they need the same level of lockdown as your production environment. If you are using Slack or Jira to move data snippets around for troubleshooting, you are doing it wrong. Use synthetic data where possible, and if you must use the real stuff, it needs to be masked, encrypted, and deleted the second the sprint is over.
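To make "masked before it leaves production" concrete, here is a minimal sketch of deterministic PII masking. The salted-hash approach is one common technique (not a prescribed Annex A 8.33 method); the field names and salt are illustrative assumptions.

```python
import hashlib

def mask_email(email: str, salt: str = "per-project-salt") -> str:
    """Replace a real email with a stable, non-identifying surrogate.

    Hashing with a project salt keeps referential integrity (the same
    input always maps to the same surrogate) without exposing the original.
    """
    digest = hashlib.sha256((salt + email).encode()).hexdigest()[:12]
    return f"user_{digest}@example.test"

def mask_record(record: dict) -> dict:
    """Return a copy of the record with PII fields masked.

    The field names here ("name", "email") are hypothetical; map them
    to whatever your schema actually exposes.
    """
    masked = dict(record)
    if "email" in masked:
        masked["email"] = mask_email(masked["email"])
    if "name" in masked:
        masked["name"] = "REDACTED"
    return masked

row = {"id": 42, "name": "Steve", "email": "steve@customer.com"}
print(mask_record(row))
```

In practice this runs inside your ETL or MLOps pipeline, so unmasked records never land in the test VPC in the first place.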
The Business Case: Why This Matters for AI Scaling
The Sales Angle
When you try to close a deal with a Tier-1 bank or a healthcare provider, their Security Questionnaire will grill you on data isolation. If you admit that developers use live customer data for testing machine learning models, the deal is dead. Demonstrating strict Annex A 8.33 compliance allows you to answer “Yes” to data protection questions, shortening your sales cycle by months.
The Risk Angle
The “Nightmare Scenario” for an AI company isn’t just a data leak: it is a model inversion attack or data poisoning. If your test environment is weak, an attacker can exfiltrate the training sets that cost you millions to curate. If you lose your intellectual property or leak PII from a “dev” server, your reputation and your bank account will take a hit you might not recover from.
Relevance to DORA, NIS2, and the EU AI Act
Compliance isn’t just about ISO 27001 anymore. For AI companies operating in or with the UK and EU, the regulatory landscape is tightening:
- EU AI Act: This requires high-quality datasets and strict data governance. Annex A 8.33 provides the foundation for the “data governance” requirements, ensuring training and testing data is handled with integrity.
- DORA (Digital Operational Resilience Act): If you provide AI services to financial institutions, DORA mandates the strict separation of ICT environments. Annex A 8.33 is the primary control used to prove this separation.
- NIS2: This focuses on supply chain security. Your customers will use NIS2 as a reason to audit how you handle the data they give you for “testing” or “integration.”
Why the ISO 27001 Toolkit Beats SaaS GRC Platforms
GRC platforms want to turn security into a “click-the-box” exercise. For a high-growth AI company, that is a trap. Here is why a document-led approach using the ISO 27001 Toolkit is superior:
| Feature | The ISO 27001 Toolkit | Online SaaS GRC Platform |
|---|---|---|
| Ownership | You keep your files forever. You own your IP. | You rent your compliance. If you stop paying, your data is gone. |
| Simplicity | Uses Word and Excel. Everyone already knows how to use them. | Requires hours of training for a complex, proprietary UI. |
| Cost | One-off fee. No hidden costs. | Expensive monthly subscriptions that never end. |
| Freedom | No vendor lock-in. Move your docs anywhere. | Trapped in their ecosystem. Impossible to export meaningfully. |
Top 3 Non-Conformities When Using SaaS GRC
I often see AI companies fail audits because they relied on a SaaS platform to “automate” Annex A 8.33. Here are the common traps:
- The “Ghost Control” Error: The SaaS platform marks the control as “Compliant” because a policy exists, but the auditor finds developers are using live API keys in their local test scripts. The platform doesn’t see what’s actually happening in your code.
- Automated Evidence Gaps: SaaS tools often pull “screenshots” of settings, but they miss the context. An auditor doesn’t care that an S3 bucket is private: they care about what is in the bucket and who authorised it being there.
- The “Set and Forget” Trap: Companies trust the dashboard’s green lights. When I ask for a record of the last three times test data was securely deleted, the team scrambles because the SaaS platform didn’t prompt them to actually do the work.
The Evidence Locker: What the Auditor Needs to See
Do not wait for audit week to find these. You need to have these ready in a simple folder structure:
- Configurations: Screenshots of data masking configurations in your ETL pipelines or MLOps environment.
- Logs: CSV exports from AWS CloudTrail or GCP Audit Logs showing who accessed the “Test” data buckets over the last 6 months.
- Tickets: At least three examples of Jira or Linear tickets where a developer requested a data subset and the CTO/CISO approved it.
- Population Lists: A list of every developer who has access to the testing environment, mapped against your HR “Active Employee” list.
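The population-list check above is simple set arithmetic, so it is worth automating. A minimal sketch, assuming you can export the test-environment access list (e.g. from IAM) and the HR active-employee list; all names below are hypothetical.

```python
# Hypothetical exports; in practice pull these from your IAM/identity
# provider and your HR system respectively.
test_env_access = {"alice", "bob", "carol", "dave"}
hr_active_employees = {"alice", "bob", "carol"}

# Anyone with test-environment access who is no longer an active
# employee is an audit finding: revoke immediately.
orphaned = test_env_access - hr_active_employees

# Active employees without access are merely informational.
no_access = hr_active_employees - test_env_access

if orphaned:
    print(f"NON-CONFORMITY: revoke access for {sorted(orphaned)}")
```

Running this on a schedule and archiving the output gives the auditor exactly the reconciliation evidence this control asks for.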
Handling Exceptions: The “Break Glass” Protocol
In AI, production breaks. Sometimes you need to debug using real data right now. You must have an emergency path:
- The Emergency Path: If you must use production data, it requires a “Break Glass” ticket. This must be approved by the CTO.
- The Paper Trail: The ticket must state exactly why synthetic data wasn’t enough, what data was used, and for how long.
- Time Limits: Access is granted for a fixed window, such as 4 hours. Once the time is up, access is revoked, and a “Post-Incident Review” ticket is created to prove the data was wiped from the dev environment.
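The fixed-window rule above can be enforced in code rather than trusted to memory. A minimal sketch of a time-boxed grant record; the ticket ID, approver address, and 4-hour window are illustrative assumptions, and the actual credential revocation would be done by your cloud tooling.

```python
from datetime import datetime, timedelta, timezone

BREAK_GLASS_WINDOW = timedelta(hours=4)  # fixed window from the protocol

def grant_break_glass(ticket_id: str, approved_by: str) -> dict:
    """Record a time-boxed emergency grant.

    Assumes ticket_id references an approved "Break Glass" ticket in
    your tracker and approved_by is the CTO (or delegate).
    """
    now = datetime.now(timezone.utc)
    return {
        "ticket": ticket_id,
        "approved_by": approved_by,
        "granted_at": now,
        "expires_at": now + BREAK_GLASS_WINDOW,
    }

def is_active(grant: dict) -> bool:
    """Valid only inside the window; once this returns False, automation
    should revoke credentials and open the Post-Incident Review ticket."""
    return datetime.now(timezone.utc) < grant["expires_at"]

g = grant_break_glass("OPS-1234", "cto@example.com")
print(is_active(g))  # True immediately after granting
```

The grant record itself doubles as part of the paper trail the auditor will ask for.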
The Process Layer: SOP for AI Teams
Your Standard Operating Procedure (SOP) should follow this lifecycle:
- Request: Developer submits a ticket in Linear/Jira for test data.
- Selection: Synthetic data is the default. If real PII is genuinely required, a risk assessment is attached to the ticket.
- Provisioning: Data is masked or tokenised via an automated script before being moved to the test VPC.
- Maintenance: Access is restricted via Role-Based Access Control (RBAC). No permanent access.
- Revocation & Deletion: Once the sprint or training run ends, the environment is torn down. A log entry is created confirming the deletion.
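To show what "synthetic by default" can look like in the Selection step, here is a minimal stdlib-only sketch that generates wholly artificial customer records. Field names and the seed are illustrative assumptions; real teams often use richer generators, but the principle is the same: no real PII ever enters the pipeline.

```python
import random
import uuid

FIRST_NAMES = ["Ana", "Ben", "Chen", "Dara", "Eli"]  # illustrative pool

def synthetic_customer(rng: random.Random) -> dict:
    """Generate an artificial customer record with no real-world source."""
    uid = uuid.UUID(int=rng.getrandbits(128), version=4)
    name = rng.choice(FIRST_NAMES)
    return {
        "customer_id": str(uid),
        "name": name,
        "email": f"{name.lower()}.{uid.hex[:8]}@example.test",
        "balance": round(rng.uniform(0, 10_000), 2),
    }

rng = random.Random(42)  # seeded so test fixtures are reproducible
dataset = [synthetic_customer(rng) for _ in range(3)]
for row in dataset:
    print(row)
```

Seeding the generator means the same fixture can be recreated on demand, so there is nothing sensitive to retain, and teardown at the Revocation & Deletion step is trivial.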
Annex A 8.33 FAQs for AI Companies
Can I use live customer data to train my models?
No. Live customer data should only ever be a last resort, used when synthetic data alternatives have been proven technically insufficient. Even then, to stay compliant, any live data must be fully anonymised so it no longer constitutes "personal data" under UK GDPR. Unauthorised PII in training sets is one of the most common causes of major audit non-conformities.
Does this control apply to my offshore data labelling team?
Yes. Annex A 8.33 applies to offshore data labelling teams because they are third-party handlers of your test information. You must hold objective evidence that these teams follow your internal security rules, and your contracts must include a "Right to Audit" clause. Unmanaged third-party data transfers are a significant source of supply chain vulnerabilities in the AI sector.
What is the difference between data masking and tokenisation?
The core difference is reversibility. Data masking irreversibly alters the original values (e.g. changing "Steve" to "REDACTED" or a hashed surrogate), so the source data cannot be recovered from the test set. Tokenisation replaces the data with a random surrogate token that has no mathematical relationship to the original, but keeps a mapping in a secured token vault, so authorised systems can reverse it. For test and training data, where nobody should ever need to recover the original, irreversible masking is usually the safer default; tokenisation suits cases where a controlled path back to the real value must exist.
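The difference is easiest to see side by side. A minimal sketch, using a hash for masking and an in-memory dict standing in for a real token vault (both simplifying assumptions):

```python
import hashlib
import secrets

# --- Masking: one-way transformation of the value ------------------------
def mask(value: str) -> str:
    """Irreversible: nobody can recover 'Steve' from the output."""
    return "user_" + hashlib.sha256(value.encode()).hexdigest()[:8]

# --- Tokenisation: random surrogate, mapping held in a vault -------------
_vault: dict = {}  # stand-in for a secured token vault

def tokenise(value: str) -> str:
    """The token has no mathematical link to the input."""
    token = "tok_" + secrets.token_hex(8)
    _vault[token] = value
    return token

def detokenise(token: str) -> str:
    """Reversible, but only for callers with vault access."""
    return _vault[token]

print(mask("Steve"))             # deterministic, cannot be reversed
t = tokenise("Steve")
print(t, "->", detokenise(t))    # reversible via the vault
```

For Annex A 8.33, the practical takeaway is that masked test data carries no path back to production values, while tokenised data is only as safe as the vault protecting the mapping.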
What is ISO 27001 Annex A 8.33 for AI companies?
ISO 27001 Annex A 8.33 requires AI companies to protect information used for testing, with particular focus on the integrity of training datasets and model parameters. For AI firms, this means ensuring that all data used during the machine learning lifecycle is deliberately selected and controlled, preventing accidental disclosure or unauthorised modification of intellectual property.
How does Annex A 8.33 align with the EU AI Act?
Annex A 8.33 provides the technical governance framework needed to satisfy the data quality and protection mandates in Article 10 of the EU AI Act. Documenting your test data selection process gives you the traceability Article 10 expects for high-risk AI systems, and a unified set of data handling procedures means you are not maintaining separate documentation for each regime.
