Google’s AI security strategy


While AI represents an unprecedented moment for science and innovation, bad actors see it as an unprecedented attack tool. Cybercriminals, scammers, and state-backed attackers are already exploring ways to use AI to harm people and compromise systems around the world. From faster attacks to sophisticated social engineering, AI provides cybercriminals with potent new tools.

We believe not only that these threats can be countered, but that AI can be a game-changing tool for cyber defense, one that creates a new, decisive advantage for defenders. That’s why today we’re sharing some of the new ways we’re tipping the scales in favor of AI for good. This includes the announcement of CodeMender, a new AI-powered agent that automatically improves code security. We’re also announcing our new AI Vulnerability Reward Program, along with the Secure AI Framework 2.0 and its risk map, which bring two proven security approaches to the cutting edge of the AI era. Our focus is on secure-by-design AI agents, furthering the work of CoSAI principles, and leveraging AI to find and fix vulnerabilities before attackers can.

Autonomous defense: CodeMender

At Google, we build our systems to be secure by design from the start. Our AI-based efforts like BigSleep and OSS-Fuzz have demonstrated AI’s ability to find new zero-day vulnerabilities in well-tested, widely used software. As we achieve more breakthroughs in AI-powered vulnerability discovery, it will become increasingly difficult for humans alone to keep up. We developed CodeMender to help tackle this challenge. CodeMender is an AI-powered agent that uses the advanced reasoning capabilities of our Gemini models to automatically fix critical code vulnerabilities, scaling security and accelerating time-to-patch across the open-source landscape. It represents a major leap in proactive, AI-powered defense, with features including:

  • Root cause analysis: Uses Gemini together with sophisticated methods, including fuzzing and theorem provers, to precisely identify the fundamental cause of a vulnerability, not just its surface symptoms.
  • Self-validated patching: Autonomously generates and applies effective code patches. These patches are then routed to specialized “critique” agents, which act as automated peer reviewers, rigorously validating the patch for correctness, security implications and adherence to code standards before it’s proposed for final human sign-off.
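CodeMender’s internals aren’t published, but the generate-then-critique flow described above can be pictured with a short, hypothetical Python sketch. Every name here (CandidatePatch, review_pipeline, the individual critique checks) is an illustrative assumption rather than CodeMender’s actual interface, and the toy checks merely stand in for real test, fuzzing, and style validation.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical illustration of a "generate, critique, then human sign-off" patch
# pipeline. None of these names come from CodeMender itself; they only sketch
# the workflow described above.


@dataclass
class CandidatePatch:
    vulnerability_id: str
    diff: str
    notes: list[str] = field(default_factory=list)


def generate_patch(vulnerability_id: str) -> CandidatePatch:
    """Placeholder for the model-driven patch generation step."""
    return CandidatePatch(
        vulnerability_id=vulnerability_id,
        diff="--- a/parser.c\n+++ b/parser.c\n+  if (len > buf_size) return -1;\n",
    )


def critique_correctness(patch: CandidatePatch) -> bool:
    """Stand-in for a critique agent that re-runs tests and fuzz targets on the patch."""
    return "return -1" in patch.diff  # toy check: the fix rejects oversized input


def critique_security(patch: CandidatePatch) -> bool:
    """Stand-in for a critique agent that looks for newly introduced weaknesses."""
    return "system(" not in patch.diff  # toy check: no suspicious calls added


def critique_style(patch: CandidatePatch) -> bool:
    """Stand-in for a critique agent that checks project code standards."""
    return patch.diff.endswith("\n")


def review_pipeline(vulnerability_id: str) -> CandidatePatch | None:
    """Generate a patch, run every critique, and only then queue it for human sign-off."""
    patch = generate_patch(vulnerability_id)
    critiques = {
        "correctness": critique_correctness,
        "security": critique_security,
        "style": critique_style,
    }
    for name, check in critiques.items():
        if not check(patch):
            patch.notes.append(f"rejected by {name} critique")
            return None  # sent back for regeneration instead of being proposed
    patch.notes.append("all critiques passed; awaiting human sign-off")
    return patch


if __name__ == "__main__":
    result = review_pipeline("OSS-2024-001")
    print(result.notes if result else "patch rejected before human review")
```

The point of the sketch is the ordering: a patch only reaches a human reviewer after every automated critique passes, and any failed check sends it back for another generation attempt.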

Doubling down on research: AI Vulnerability Reward Program (AI VRP)

The global security research community is an indispensable partner, and our VRPs have already paid out over $430,000 for AI-related issues. To further expand this collaboration, we are launching a dedicated AI VRP that clarifies which AI-related issues are in scope through a single, comprehensive set of rules and reward tables. This simplifies the reporting process and maximizes the incentive for researchers to find and report high-impact flaws. Here’s what’s new about the AI VRP:

  • Unified abuse and security reward tables: AI-related issues previously covered by Google’s Abuse VRP have been moved to the new AI VRP, providing additional clarity as to which abuse-related issues are in-scope for the program.
  • The right reporting mechanism: Content-based safety concerns should be reported via the in-product feedback mechanism, because it captures the detailed metadata, such as user context and model version, that our AI Safety teams need to diagnose the model’s behavior and implement the necessary long-term, model-wide safety training.

Securing AI agents

We’re expanding our Secure AI Framework to SAIF 2.0 to address the rapidly emerging risks posed by autonomous AI agents. SAIF 2.0 extends our proven AI security framework with new guidance on agent security risks and controls to mitigate them. It is supported by three new elements:

  • Agent risk map to help practitioners map agentic threats across the full-stack view of AI risks.
  • Security capabilities rolling out across Google agents to ensure they are secure by design and apply our three core principles: agents must have well-defined human controllers, their powers must be carefully limited, and their actions and planning must be observable (illustrated in the sketch after this list).
  • Donation of SAIF’s risk map data to the Coalition for Secure AI Risk Map initiative to advance AI security across the industry.
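To make those three principles concrete, here is a minimal, hypothetical sketch of an agent action gate. The AgentPolicy and Agent names, the action allowlist, and the confirmation set are assumptions invented for illustration, not part of SAIF 2.0 itself; they simply show a named human controller, narrowly granted powers, and an audit log that makes every decision observable.

```python
from __future__ import annotations

import logging
from dataclasses import dataclass, field

# Hypothetical sketch of the three agent principles described above: a named human
# controller, an explicit allowlist of powers, and observable actions. These class
# and field names are illustrative assumptions, not part of SAIF 2.0.

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(message)s")
audit_log = logging.getLogger("agent-audit")


@dataclass
class AgentPolicy:
    controller: str                  # the human accountable for this agent
    allowed_actions: frozenset[str]  # powers are explicitly and narrowly granted
    require_confirmation: frozenset[str] = frozenset({"delete_data", "send_email"})


@dataclass
class Agent:
    name: str
    policy: AgentPolicy
    pending_confirmations: list[str] = field(default_factory=list)

    def act(self, action: str, target: str) -> bool:
        """Perform an action only if policy allows it, logging every decision."""
        if action not in self.policy.allowed_actions:
            audit_log.info("%s DENIED %s on %s (not in allowlist)", self.name, action, target)
            return False
        if action in self.policy.require_confirmation:
            audit_log.info("%s HELD %s on %s (awaiting %s)", self.name, action, target,
                           self.policy.controller)
            self.pending_confirmations.append(f"{action}:{target}")
            return False
        audit_log.info("%s EXECUTED %s on %s", self.name, action, target)
        return True


if __name__ == "__main__":
    policy = AgentPolicy(controller="alice@example.com",
                         allowed_actions=frozenset({"read_file", "send_email"}))
    agent = Agent(name="triage-bot", policy=policy)
    agent.act("read_file", "/tmp/report.txt")    # allowed and logged
    agent.act("send_email", "team@example.com")  # held for the human controller
    agent.act("delete_data", "prod-db")          # denied: not in the allowlist
```

Even in this toy form, every action produces an audit record, sensitive actions wait for the named controller, and anything outside the allowlist is refused by default.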

Going forward: putting proactive AI tools to work with public and private partners

Our AI security work extends beyond mitigating new AI-related threats; our ambition is to use AI to make the world safer. As governments and civil society leaders look to AI to counter the growing threat from cybercriminals, scammers, and state-backed attackers, we’re committed to leading the way. That’s why we’ve shared our methods for building secure AI agents, partnered with agencies like DARPA, and played a leading role in industry alliances like the Coalition for Secure AI (CoSAI).

Our commitment to using AI to fundamentally tip the balance of cybersecurity in favor of defenders is a long-term effort to do what it takes to secure the cutting edge of technology. We are upholding this commitment by launching CodeMender for autonomous defense, partnering with the global research community through the AI VRP, and expanding our industry framework with SAIF 2.0 to secure AI agents. With these and more initiatives to come, we’re making sure the power of AI remains a decisive advantage for security and safety.


