Leading AI Models Show EU Compliance Gaps
A new study by the Netherlands-based Aithos Research Foundation raises serious compliance concerns for organizations deploying AI agents in the EU. Using its LARA legal assessment tool, Aithos tested major models from OpenAI, Anthropic, Google, Mistral, Alibaba, and Moonshot AI in simulated workplace scenarios involving GDPR and EU AI Act risks.
The results were poor across the board. The best-performing model, Claude Opus 4.7, was compliant in only 54% of scenarios, while Gemini 3.1 Pro scored 10%, Qwen 3.6 Plus scored 9%, and Kimi K2.6 scored 7%. According to Aithos, models frequently completed tasks even when doing so required unlawful conduct, including exploiting vulnerable customers, inferring employee emotions from emails, collecting lifestyle data from telecom customers, and failing to disclose that an appointment was booked by an AI system.
The findings are particularly significant under Article 5 of the EU AI Act, which prohibits certain unacceptable-risk AI practices, including social scoring and manipulative techniques. Aithos reported that agents performed prohibited Article 5 conduct in 80% of relevant scenarios. For deployers, the exposure is material: AI Act infringements may lead to fines of up to €35 million or 7% of global annual turnover, while GDPR infringements may reach €20 million or 4% of global annual turnover.
Aithos’ research director Daan Henselmans emphasized that legal compliance cannot be assessed only on paper. Organizations should define AI use cases carefully, test systems in realistic conditions, and maintain monitoring and escalation processes for human intervention. The study also suggests that model-level guardrails are unlikely to be fully reliable, making deployer-side governance, documentation, and human oversight central to lawful AI adoption in the EU.