
Showing top 132 results for "AI safety defenses"



Anthropic just wrote itself a safety loophole

“Safety first” was the mantra that made Anthropic unique among its big AI competitors. …

Feb 25, 2026 · Ben Patterson

Paper page - One Turn Too Late: Response-Aware Defense Against Hidden Malicious Intent in Multi-Turn Dialogue

… The following papers were recommended by the Semantic Scholar API: SaFeR-Steer: Evolving Multi-Turn MLLMs via Synthetic Bootstrapping and Feedback Dynamics (2026); ContextualJailbreak: Evolutionary Red-Teaming via Simulated Conversational Priming (2026); Transient Turn Injection: Exposing Stateless Multi-… …

May 13, 2026

Google expands Gemini DoD partnership with Gem-like agents for unclassified projects

… This comes as the DoD/DoW recently came into partnership with OpenAI, ousting Anthropic due to concerns over red-line safety measures for citizens. …

Mar 10, 2026 · Andrew Romero

Paper page - MASCing: Configurable Mixture-of-Experts Behavior via Activation Steering Masks

… To demonstrate its reconfigurability, we apply MASCing to two different safety objectives and observe consistent gains with negligible overhead across seven open-source MoE models. …

May 4, 2026

Discussions and forums

r/netsec · u/unknownhad · May 10, 2026

The compression of the exploit timeline: Why n-day gaps and 90-day embargoes are failing in practice.

The traditional vulnerability disclosure timeline relies on a fundamental assumption: exploit development and vulnerability discovery take time. Over the last 12 months the integration of LLMs into offensive tooling has …

r/Android · u/MishaalRahman · 4w ago

New features, emojis, & security improvements: Here’s everything new coming to Android!

Hi Reddit, We just wrapped up The Android Show | I/O Edition, and a core theme of the show was how we’re making your phone more helpful so that you can spend less time looking at it and more time living your life. To mak…

Gemini is stopping harmful ads before people ever see them

… Read the 2025 Ads Safety Report to learn how we're stopping threats and supporting businesses. …

Apr 16, 2026 · Keerat Sharma

Google Workspace’s continuous approach to mitigating indirect prompt injections

… Deterministic defenses, including user confirmation, URL sanitization, and tool chaining policies, are designed for rapid response against new or emerging prompt injection attacks by relying on simple configuration updates. …

Apr 2, 2026 · Adam Gavish
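The Workspace snippet names three deterministic defenses but does not say how they are implemented. The following is a minimal Python sketch under stated assumptions, not Google's code: the domain allowlist, the tool names (`read_email`, `fetch_web_page`, `send_email`, `share_file`), and the `ToolChainPolicy` class are hypothetical. It shows how URL sanitization, a tool chaining policy, and a user-confirmation override can each be expressed as a fixed rule driven by configuration data, which is what lets such defenses be updated quickly.

```python
import re
from dataclasses import dataclass, field
from urllib.parse import urlparse

# Hypothetical configuration. Editing these sets changes behavior without retraining anything.
ALLOWED_DOMAINS = {"docs.google.com", "mail.google.com"}
UNTRUSTED_SOURCES = {"read_email", "fetch_web_page"}  # tools that ingest attacker-controllable text
SENSITIVE_TOOLS = {"send_email", "share_file"}        # tools with external side effects

# Matches http(s) URLs up to whitespace, angle brackets, or a double quote.
URL_PATTERN = re.compile(r'https?://[^\s<>"]+')


def sanitize_urls(text: str) -> str:
    """URL sanitization: replace any URL whose host is not allowlisted."""
    def replace(match: re.Match) -> str:
        host = urlparse(match.group(0)).hostname or ""
        return match.group(0) if host in ALLOWED_DOMAINS else "[link removed]"
    return URL_PATTERN.sub(replace, text)


@dataclass
class ToolChainPolicy:
    """Tool chaining policy: block a sensitive tool once untrusted content is in context."""
    history: list = field(default_factory=list)

    def check(self, tool: str, user_confirmed: bool = False) -> bool:
        # The session is "tainted" if any earlier tool read untrusted content.
        tainted = any(t in UNTRUSTED_SOURCES for t in self.history)
        # Explicit user confirmation overrides the block.
        allowed = not (tool in SENSITIVE_TOOLS and tainted) or user_confirmed
        if allowed:
            self.history.append(tool)
        return allowed
```

In this sketch, reading an email and then calling `send_email` is refused unless the user confirms, and a link to an unlisted domain in model output becomes `[link removed]`. The rules are deliberately simple and deterministic, so their behavior is predictable, but they complement rather than replace model-based detection.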

Apple Provides Update on App Store, Highlights Key 2025 Safety Stats

… "Apple's Trust and Safety teams integrate AI throughout the entire moderation process to detect spam, offensive content, and inauthentic reviews at scale," the company explained. …

May 20, 2026 · Joe Rossignol

In the Wake of Anthropic’s Mythos, OpenAI Has a New Cybersecurity Model—and Strategy

… Over the long term, to ensure the ongoing sufficiency of AI safety in cybersecurity, we also expect the need for more expansive defenses for future models, whose capabilities will rapidly exceed even the best purpose-built models of today.” The company says that it has homed in on three pillars for… …

Apr 14, 2026 · Lily Hay Newman

White House reportedly considers mandatory government vetting of AI models before release — executive order under discussion

… Just this Monday, Dean Ball, a former Trump administration AI adviser, and Ben Buchanan, a former Biden White House AI adviser, co-authored a New York Times op-ed calling on Congress to mandate third-party audits of AI developers' safety claims. …

May 7, 2026 · Luke James

Google and Pentagon reportedly agree on deal for ‘any lawful’ use of AI

… The deal also requires Google to assist with making adjustments to its AI safety settings and filters at the government’s request. “We are proud to be part of a broad consortium of leading AI labs and technology and cloud companies providing AI services and infrastructure in support of national sec…

Apr 28, 2026 · Jess Weatherbed
