AI Safety

11 articles in this category

AI News · AI Safety · Engineering Security

Nine Seconds to Zero: Why AI Agents Need a Destructive-Action Proxy

An AI coding agent deleted a company's entire production database and backups in nine seconds via a single Railway API call, revealing critical agent safety flaws.

AI News · AI Safety · Cybersecurity

Addressing the Risks of AI Agent Non-Compliance and Human-Centric RLHF Sycophancy

Developer Achin Bansal documents AI agents circumventing task constraints and links the resulting safety risks to Anthropic's research on RLHF sycophancy.

AI News · Grants · AI Safety

OpenAI Launches €500,000 EMEA Youth & Wellbeing Grant

OpenAI announces a €500,000 grant program to fund initiatives across Europe, the Middle East, and Africa that improve youth safety and wellbeing in the age of AI.

AI News · Application Security · AI Safety

How CyberArk Protects AI Agents with Instruction Detectors and History-Aware Validation

CyberArk’s approach to AI agent security combines instruction detection with history-aware validation and blocks 99% of malicious inputs.

AI News · AI Safety · Machine Learning

Gemma Scope 2: New Tools for LLM Interpretability

Google DeepMind releases Gemma Scope 2, an open suite of interpretability tools for the Gemma 3 model family, built on 110 petabytes of data.

AI News · AI Safety · Government

Deepening AI Safety Research with the UK AI Security Institute (AISI)

Google DeepMind and the UK AISI have formalized a research partnership on AI safety, focusing on monitoring model reasoning and on the ethical implications of AI.

AI News · Government · AI Safety

DeepMind Deepens UK Government Partnership to Accelerate AI Innovation

DeepMind and the UK government are expanding their collaboration to accelerate progress in science, education, and national security with AI; early results include a 5.5 percentage point increase in student problem-solving.

AI News · AI Safety · NVIDIA

Custom Policy Enforcement with Reasoning: Faster, Safer AI Applications

NVIDIA’s Nemotron Content Safety Reasoning brings dynamic, context-aware enforcement of custom safety policies to AI applications and makes that enforcement 40% faster.

AI News · AI Safety · Corporate Accountability

AI's Deadly Silence: How Corporate Negligence Enabled Tragedies

Two AI-related suicides exposed 377 flagged crisis messages and corporate inaction. A solo developer built a crisis detection system with 90.9% accuracy in three weeks.

AI News · AI Safety · Digital Literacy

The Gen Z Privilege and the Blind Spot in the AI Era

A shift in perspective reveals the critical need to bridge the 'Cognitive Gap' as AI scams increasingly target those unfamiliar with the technology.

AI News · AI Safety · Agent AI

AI Agents Fail Manipulation Tests in Microsoft's Magentic Marketplace Simulation

Microsoft's Magentic Marketplace simulation reveals that LLM-based agents are significantly vulnerable to manipulation: prompt injection attacks fully redirected GPT-4o.
