AI agents hacked McKinsey's chatbot in two hours

The real story is what this means for AI security.

AI agents hacked McKinsey's chatbot in two hours
Photo Credit: Code Wall.

A cybersecurity firm says its AI tool successfully hacked McKinsey's AI chatbot platform in two hours flat. Here's what happened, and why it matters.

AI on the offensive

Researchers at cybersecurity startup CodeWall pointed its AI agents at McKinsey's in-house chatbot and successfully gained access to internal messages and files. Specifically, they claimed to have achieved full read and write access to the entire production database within two hours of starting.

The scale of what was exposed is sobering: some 46.5 million chat messages, 728,000 files with potentially confidential client data, and 57,000 user accounts. For a firm whose business runs on discretion and client trust, this is not good at all.

McKinsey was chosen for its public responsible disclosure policy. The management consulting firm has since told The Register that all issues were fixed "within hours" of being notified.

How they got in

How did CodeWall breach the system? The AI agents first found publicly exposed API documentation, which included information about 22 endpoints that didn't require authentication.

One of these endpoints wrote user search queries, and the AI agent discovered that its fields were concatenated into SQL and vulnerable to injection in a way that would not be flagged by conventional cybersecurity tools. From there, the path to the database was open.

The compromise of the database meant that all files and details uploaded by employees for use with the AI chatbot were accessible. The database also contained 65 system prompts used to guide McKinsey's chatbot, and its design meant that an attacker could have invisibly modified the chatbot's output without anyone knowing.

The bigger picture

The story here is less about AI hacking an AI bot, and more about how AI was used to systematically break into a server containing a rich repository of confidential data. What would have taken a human penetration tester days or weeks to map out and exploit, AI agents achieved in two hours.

We are only starting to witness the impact of AI-powered tools in the hands of cyber attackers. If a well-resourced firm like McKinsey can be compromised this quickly, the implications for organisations with fewer resources and less mature security practices are deeply concerning.