AI Safety

11 articles in this category

AI News, AI Safety, Engineering Security

Nine Seconds to Zero: Why AI Agents Need a Destructive-Action Proxy

An AI coding agent deleted a company's entire production database and backups in nine seconds via a single Railway API call, revealing critical agent safety flaws.

Apr 28, 2026

AI News, AI Safety, Cybersecurity

Addressing the Risks of AI Agent Non-Compliance and Human-Centric RLHF Sycophancy

Developer Achin Bansal identifies AI agents circumventing task constraints, highlighting safety risks linked to Anthropic's RLHF sycophancy research.

Apr 24, 2026

AI News, Grants, AI Safety

OpenAI Launches €500,000 EMEA Youth & Wellbeing Grant

OpenAI announces a €500,000 grant program to fund initiatives improving youth safety and wellbeing in the age of AI across Europe, the Middle East, and Africa.

Jan 28, 2026

AI News, Application Security, AI Safety

How CyberArk Protects AI Agents with Instruction Detectors and History-Aware Validation

CyberArk’s approach to AI agent security utilizes instruction detection and history-aware validation, blocking 99% of malicious inputs.

Jan 20, 2026

AI News, AI Safety, Machine Learning

Gemma Scope 2: New Tools for LLM Interpretability

Google DeepMind releases Gemma Scope 2, an open suite of interpretability tools for the Gemma 3 family, built on 110 petabytes of data.

Dec 19, 2025

AI News, AI Safety, Government

Deepening AI Safety Research with UK AI Security Institute (AISI)

Google DeepMind and the UK AISI formalized a research partnership to address AI safety, focusing on monitoring reasoning and ethical implications.

Dec 11, 2025

AI News, Government, AI Safety

DeepMind Deepens UK Government Partnership to Accelerate AI Innovation

DeepMind and the UK government are expanding their collaboration to accelerate progress in science, education, and national security with AI, citing a 5.5 percentage point increase in student problem-solving.

Dec 10, 2025

AI News, AI Safety, NVIDIA

Custom Policy Enforcement with Reasoning: Faster, Safer AI Applications

NVIDIA’s Nemotron Content Safety Reasoning achieves 40% faster policy enforcement with dynamic, context-aware AI safety.

Dec 2, 2025

AI News, AI Safety, Corporate Accountability

AI's Deadly Silence: How Corporate Negligence Enabled Tragedies

Two AI-related suicides exposed 377 flagged crisis messages and corporate inaction. A solo developer built a crisis detection system with 90.9% accuracy in three weeks.

Dec 1, 2025

AI News, AI Safety, Digital Literacy

The Gen Z Privilege And The Blind Spot in AI Era

A shift in perspective reveals the critical need to bridge the 'Cognitive Gap' as AI scams increasingly target those unfamiliar with the technology.

Nov 25, 2025

AI News, AI Safety, Agent AI

AI Agents Fail Manipulation Tests in Microsoft's Magentic Marketplace Simulation

Microsoft's Magentic Marketplace reveals that LLM-based agents are significantly vulnerable to manipulation, with GPT-4o fully redirected by prompt injection attacks.

Nov 20, 2025