Agents

Autonomous agents, tool use, and agentic workflows

99 articles

AI & ML

Sources: Microsoft is working on an app that will include GitHub Copilot, Copilot chat, Copilot Cowork, and a new agentic workflow tool called Autopilot

LCO: LLM-based Constraint Optimization for Safer Agentic LLMs in Real-world Tasks

StoryMI: Steerable Multi-Agent Therapeutic Dialogue Generation

TRACES: Proactive Safety Auditing for Multi-Turn LLM Agents via Trajectory-State Modeling

UserHarness: Harnessing User Minds for Stronger Agent Theory-of-Mind

anthropics/claude-code — Claude Code is an agentic coding tool that lives in your terminal, understands your codebase, and he…

Adobe’s conversational AI agent is a mediocre design intern

Blaming the model won't fix your workflow — a white paper on structural enforcement for AI agents

Orchestrating AI code review at scale

Frontier LLM-based agents can overcome the ontology curation bottleneck for natural phenotypes

VFEAgent: A Multimodal Agent Framework for End-to-End Automated Finite Element Analysis

Differentiable Belief-based Opponent Shaping

Hallucination Mitigation with Agentic AI, Nested Learning, and AI Sustainability via Semantic Caching

The Importance of Out-of-Band Metadata for Safe Autonomous Agents: The Redpanda Agentic Data Plane

Beyond Consensus: Trace-Level Synthesis in Mixture of Agents

PRO-CUA: Process-Reward Optimization for Computer Use Agents

Governing Technical Debt in Agentic AI Systems

Paper Agents, Paper Gains: An Empirical Analysis of DeFi Investment Agents

GTA: Generating Long-Horizon Tasks for Web Agents at Scale

BenchTrace: A Benchmark for Testing Reflection Ability and Controlled Evolution in LLM Agents

SQLite Does Not Accept Agentic Code

Okta reports Q1 revenue up 11% YoY to $765M, vs. $752M est., says the agentic AI build-out is spiking demand for its identity tools; OKTA jumps 7%+ after hours

Asana acquires no-code agent-builder Stack AI

[R] What 1000+ Harness Experiments Taught Me About Self-Improving Agents

The OpenClaw crisis is the most complete case study of agentic AI security failure. Here's the full timeline and technical breakdown.

I gave my AI agents email instead of better reasoning. They started fixing each other's bugs.

Adding agentic AI to an existing search app without replacing anything

Protestware for coding agents

Agentic Search for Context Engineering

Show HN: Ktx – Open-source executable context layer for data agents

Show HN: Continue? Y/N: A 60-second game about AI agent permission fatigue

Saris, which builds AI agents to automate back-office work for banks and credit unions, raised a $28.8M Series A led by 8VC

Visa invests in Replit to power agentic payments for developers

Personalized Observation Normalization for Federated Reinforcement Learning in Simulation Environments with Heterogeneity

E³-Agent: An Executable and Evolving Agent for Resource Management of Edge Generative Inference

Detect by Yourself: Self-Designing Agentic Workflows for Few-Shot Graph Anomaly Detection

The Energy Blind Spot: NVIDIA's Flagship Edge AI Hardware Cannot Support Process-Level Energy Attribution

Transferable Reinforcement Learning via Probabilistic Latent Embeddings and Dynamic Policy Adaptation for Sim-to-Real Deployment

London-based Geordie AI, which builds a security and governance platform for AI agents, raised a $30M Series A led by Balderton at an estimated $180M valuation

revfactory/harness — A meta-skill that designs domain-specific agent teams, defines specialized agents, and generates the…

anthropics/skills — Public repository for Agent Skills

DynaSchedBench: Calibrated Dynamic Scheduling Benchmarks and Observability Paradox in LLM-based Scheduling Agents

Why LLMs Fail at Causal Discovery and How Interventional Agents Escape

Discovery Agents for Real-Time Analytics: Toward Proactive Insight Systems

Agyn: An Open-Source Platform for AI Agents with Scalable On-Demand Execution, Agent Definition as a Code, and Zero-Trust Access

Voluntary Collusion with Secret Tools in Competing LLM Agents

Laguna M.1/XS.2 Technical Report

Reasoning and Planning with Dynamically Changing Norms

Intelligence as Managed Autonomy: Failure, Escalation, and Governance for Agentic AI Systems

Hierarchical Prompt-Domain Control and Learning for Resource-Constrained Agentic Language Models

A Policy-Driven Runtime Layer for Agentic LLM Serving

SkillGrad: Optimizing Agent Skills Like Gradient Descent

Got a Secret? LLM Agents Can't Keep It: Evaluating Privacy in Multi-Agent Systems

Diagnosing Live Within-Policy Instruction Conflicts in LLM Agents with Witnessed Resolution Profiles

A Query Engine for the Agents

I used autoresearch to improve my AGENTS.md, measured against real tasks

Salesforce reports Q1 revenue up 13% YoY to $11.13B, vs. $11.05B est., Agentforce annual recurring revenue up 205% to $1.2B, and forecasts Q2 revenue below est.

Multi-Agent LLM System for Automated Vulnerability Discovery and Reproduction

Evolving Webflow for the Agentic Web

NYC-based Pace, whose AI agents automate back-office operations for insurance companies, raised a $46M Series B led by Thrive and Sequoia at a $375M valuation

Why AI Agents Cannot Change Software Systems

Stateful Inference for Low-Latency Multi-Agent Tool Calling

Robinhood now lets your AI agents trade stocks

Sidekick: keep using neovim while a dozen agents rewrite your code

Demis Hassabis says he still broadly expects AGI around 2030, though he now sees 2029 as a possibility, and 2026's “agentic era” is a “bit like a practice run”

Yeachan-Heo/oh-my-claudecode — Teams-first Multi-agent orchestration for Claude Code

How AI is starting to dismantle the hegemony of the Big Four consultancies and other large firms, as AI agents help smaller consultancies handle big workloads

Claude Code as a Daily Driver: Claude.md, Skills, Subagents, Plugins, and MCPs

A deep dive into how Claude Code and OpenClaw unleashed the AI agent revolution that is rapidly transforming the modern computing landscape

Personalizing Embodied Multimodal Large Language Model Agents over Long-term User Interactions

Your Agents Are Aging Too: Agent Lifespan Engineering for Deployed Systems

Experiments in Agentic AI for Science

Anchor: Mitigating Artifact Drift in Agent Benchmark Generation

JobBench: Aligning Agent Work With Human Will

ScientistOne: Towards Human-Level Autonomous Research via Chain-of-Evidence

From Static Context to Calibrated Interactive RL: Mitigating Distribution Shift in Multi-turn Dialogue with Aligned Simulator

MobileExplorer: Accelerating On-Device Inference for Mobile GUI Agents via Online Exploration

AGORA: Adapter-Grounded Observation-Action Retention for Inference-Free Prompt Compression in LLM Agents

UnityMAS-O: A General RL Optimization Framework for LLM-Based Multi-Agent Systems

MemFail: Stress-Testing Failure Modes of LLM Memory Systems

Mind the Tool Failures: Achieving Synergistic Tool Gains for Medical Agents

Towards Feedback-to-Plan Decisions for Self-Evolving LLM Agents in CUDA Kernel Generation

Agent Memory: An Anatomy

DeepSWE: A contamination-free benchmark for long-horizon coding agents

Agent Trace RFC

Millions of AI agents imperiled by critical vulnerability in open source package

FBI agent explains how easy it is to ID people posting AI porn without consent

Sources: Qualcomm reached a deal with ByteDance to supply millions of ASICs for AI data centers to support AI agents in the Doubao chatbot; QCOM jumps 5%+

Rethinking organizational design in the age of agentic AI

Mixture of Complementary Agents for Robust LLM Ensemble

Not All Transitions Matter: Evidence from PPO

Agent-ToM: Learning to Monitor Autonomous LLM Agents via Theory-of-Mind Reasoning

PrivFusion: A Privacy-preserving Multi-Agent Framework for Harmonizing Distributed Datasets

CAPTCHAs can still detect AI agents

Even (very) noisy LLM evaluators are useful for improving AI agents

Representation Signatures and Risk-Feedback Alignment in LLM Trading Agents

Molecular Lead Optimization via Agentic Tool Planning

Self-Play Reinforcement Learning under Imperfect Information in Big 2

FedQHD: Closed-Form Function-Space Federated Reinforcement Learning
