Agents

Autonomous agents, tool use, and agentic workflows

99 articles

Sources: Microsoft is working on an app that will include GitHub Copilot, Copilot chat, Copilot Cowork, and a new agentic workflow tool called Autopilot
AI & ML
2 min read ★★★☆☆

CAPTCHAs can still detect AI agents
AI & ML
2 min read ★★★☆☆

Even (very) noisy LLM evaluators are useful for improving AI agents
AI & ML
2 min read ★★★☆☆

Representation Signatures and Risk-Feedback Alignment in LLM Trading Agents
AI & ML
2 min read ★★★☆☆

Molecular Lead Optimization via Agentic Tool Planning
AI & ML
2 min read ★★★☆☆

Self-Play Reinforcement Learning under Imperfect Information in Big 2
AI & ML
2 min read ★★★☆☆

FedQHD: Closed-Form Function-Space Federated Reinforcement Learning
AI & ML
2 min read ★★★☆☆

LCO: LLM-based Constraint Optimization for Safer Agentic LLMs in Real-world Tasks
AI & ML
2 min read ★★★☆☆

StoryMI: Steerable Multi-Agent Therapeutic Dialogue Generation
AI & ML
2 min read ★★★☆☆

TRACES: Proactive Safety Auditing for Multi-Turn LLM Agents via Trajectory-State Modeling
AI & ML
2 min read ★★★☆☆

UserHarness: Harnessing User Minds for Stronger Agent Theory-of-Mind
AI & ML
2 min read ★★★☆☆

anthropics/claude-code — Claude Code is an agentic coding tool that lives in your terminal, understands your codebase, and he
OPEN SOURCE
2 min read ★★★☆☆

Adobe’s conversational AI agent is a mediocre design intern
AI & ML
2 min read ★★★☆☆

Blaming the model won't fix your workflow — a white paper on structural enforcement for AI agents
AI & ML
2 min read ★★★☆☆

Orchestrating AI code review at scale
AI & ML
2 min read ★★★☆☆

Frontier LLM-based agents can overcome the ontology curation bottleneck for natural phenotypes
AI & ML
2 min read ★★★☆☆

VFEAgent: A Multimodal Agent Framework for End-to-End Automated Finite Element Analysis
AI & ML
2 min read ★★★☆☆

Differentiable Belief-based Opponent Shaping
AI & ML
2 min read ★★★☆☆

Hallucination Mitigation with Agentic AI, Nested Learning, and AI Sustainability via Semantic Caching
AI & ML
2 min read ★★★☆☆

The Importance of Out-of-Band Metadata for Safe Autonomous Agents: The Redpanda Agentic Data Plane
ENGINEERING
2 min read ★★★☆☆

Beyond Consensus: Trace-Level Synthesis in Mixture of Agents
AI & ML
2 min read ★★★☆☆

PRO-CUA: Process-Reward Optimization for Computer Use Agents
AI & ML
2 min read ★★★☆☆

Governing Technical Debt in Agentic AI Systems
AI & ML
2 min read ★★★☆☆

Paper Agents, Paper Gains: An Empirical Analysis of DeFi Investment Agents
AI & ML
2 min read ★★★☆☆

GTA: Generating Long-Horizon Tasks for Web Agents at Scale
AI & ML
2 min read ★★★☆☆

BenchTrace: A Benchmark for Testing Reflection Ability and Controlled Evolution in LLM Agents
AI & ML
2 min read ★★★☆☆

SQLite Does Not Accept Agentic Code
AI & ML
2 min read ★★★☆☆

Okta reports Q1 revenue up 11% YoY to $765M, vs. $752M est., says the agentic AI build-out is spiking demand for its identity tools; OKTA jumps 7%+ after hours
AI & ML
2 min read ★★★☆☆

Asana acquires no-code agent-builder Stack AI
AI & ML
2 min read ★★★☆☆

[R] What 1000+ Harness Experiments Taught Me About Self-Improving Agents [R]
AI & ML
2 min read ★★★☆☆

The OpenClaw crisis is the most complete case study of agentic AI security failure. Here's the full timeline and technical breakdown.
AI & ML
2 min read ★★★☆☆

I gave my AI agents email instead of better reasoning. They started fixing each other's bugs.
AI & ML
2 min read ★★★☆☆

Adding agentic AI to an existing search app without replacing anything
AI & ML
2 min read ★★★☆☆

Protestware for coding agents
AI & ML
2 min read ★★★☆☆

Agentic Search for Context Engineering
AI & ML
2 min read ★★★☆☆

Show HN: Ktx – Open-source executable context layer for data agents
AI & ML
2 min read ★★★☆☆

Show HN: Continue? Y/N: A 60-second game about AI agent permission fatigue
AI & ML
2 min read ★★★☆☆

Saris, which builds AI agents to automate back-office work for banks and credit unions, raised a $28.8M Series A led by 8VC
TECH BUSINESS
2 min read ★★★☆☆

Visa invests in Replit to power agentic payments for developers
TECH BUSINESS
2 min read ★★★☆☆

Personalized Observation Normalization for Federated Reinforcement Learning in Simulation Environments with Heterogeneity
AI & ML
2 min read ★★★☆☆

$E^3$-Agent: An Executable and Evolving Agent for Resource Management of Edge Generative Inference
AI & ML
2 min read ★★★☆☆

Detect by Yourself: Self-Designing Agentic Workflows for Few-Shot Graph Anomaly Detection
AI & ML
2 min read ★★★☆☆

The Energy Blind Spot: NVIDIA's Flagship Edge AI Hardware Cannot Support Process-Level Energy Attribution
AI & ML
2 min read ★★★☆☆

Transferable Reinforcement Learning via Probabilistic Latent Embeddings and Dynamic Policy Adaptation for Sim-to-Real Deployment
AI & ML
2 min read ★★★☆☆

London-based Geordie AI, which builds a security and governance platform for AI agents, raised a $30M Series A led by Balderton at an estimated $180M valuation
TECH BUSINESS
2 min read ★★★☆☆

revfactory/harness — A meta-skill that designs domain-specific agent teams, defines specialized agents, and generates the
OPEN SOURCE
2 min read ★★★☆☆

anthropics/skills — Public repository for Agent Skills
OPEN SOURCE
2 min read ★★★☆☆

DynaSchedBench: Calibrated Dynamic Scheduling Benchmarks and Observability Paradox in LLM-based Scheduling Agents
AI & ML
2 min read ★★★☆☆

Why LLMs Fail at Causal Discovery and How Interventional Agents Escape
AI & ML
2 min read ★★★☆☆

Discovery Agents for Real-Time Analytics: Toward Proactive Insight Systems
AI & ML
2 min read ★★★☆☆

Agyn: An Open-Source Platform for AI Agents with Scalable On-Demand Execution, Agent Definition as a Code, and Zero-Trust Access
AI & ML
2 min read ★★★☆☆

Voluntary Collusion with Secret Tools in Competing LLM Agents
AI & ML
2 min read ★★★☆☆

Laguna M.1/XS.2 Technical Report
AI & ML
2 min read ★★★☆☆

Reasoning and Planning with Dynamically Changing Norms
AI & ML
2 min read ★★★☆☆

Intelligence as Managed Autonomy: Failure, Escalation, and Governance for Agentic AI Systems
ENGINEERING
2 min read ★★★☆☆

Hierarchical Prompt-Domain Control and Learning for Resource-Constrained Agentic Language Models
AI & ML
2 min read ★★★☆☆

A Policy-Driven Runtime Layer for Agentic LLM Serving
AI & ML
2 min read ★★★☆☆

SkillGrad: Optimizing Agent Skills Like Gradient Descent
AI & ML
2 min read ★★★☆☆

Got a Secret? LLM Agents Can't Keep It: Evaluating Privacy in Multi-Agent Systems
AI & ML
2 min read ★★★☆☆

Diagnosing Live Within-Policy Instruction Conflicts in LLM Agents with Witnessed Resolution Profiles
AI & ML
2 min read ★★★☆☆

A Query Engine for the Agents
AI & ML
2 min read ★★★☆☆

I used autoresearch to improve my AGENTS.md, measured against real tasks
AI & ML
2 min read ★★★☆☆

Salesforce reports Q1 revenue up 13% YoY to $11.13B, vs. $11.05B est., Agentforce annual recurring revenue up 205% to $1.2B, and forecasts Q2 revenue below est.
TECH BUSINESS
2 min read ★★★☆☆

Multi-Agent LLM System for Automated Vulnerability Discovery and Reproduction
AI & ML
2 min read ★★★☆☆

Evolving Webflow for the Agentic Web
AI & ML
2 min read ★★★☆☆

NYC-based Pace, whose AI agents automate back-office operations for insurance companies, raised a $46M series B led by Thrive and Sequoia at a $375M valuation
TECH BUSINESS
2 min read ★★★☆☆

Why AI Agents Cannot Change Software Systems
AI & ML
2 min read ★★★☆☆

Stateful Inference for Low-Latency Multi-Agent Tool Calling
AI & ML
2 min read ★★★☆☆

Robinhood now lets your AI agents trade stocks
AI & ML
2 min read ★★★☆☆

Sidekick: keep using neovim while a dozen agents rewrite your code
AI & ML
2 min read ★★★☆☆

Demis Hassabis says he still broadly expects AGI around 2030, though he now sees 2029 as a possibility, and 2026's “agentic era” is a “bit like a practice run”
AI & ML
2 min read ★★★☆☆

Yeachan-Heo/oh-my-claudecode — Teams-first Multi-agent orchestration for Claude Code
OPEN SOURCE
2 min read ★★★☆☆

How AI is starting to dismantle the hegemony of the Big Four consultancies and other large firms, as AI agents help smaller consultancies handle big workloads
AI & ML
2 min read ★★★☆☆

Claude Code as a Daily Driver: Claude.md, Skills, Subagents, Plugins, and MCPs
AI & ML
2 min read ★★★☆☆

A deep dive into how Claude Code and OpenClaw unleashed the AI agent revolution that is rapidly transforming the modern computing landscape
AI & ML
2 min read ★★★☆☆

Personalizing Embodied Multimodal Large Language Model Agents over Long-term User Interactions
ENGINEERING
2 min read ★★★☆☆

Your Agents Are Aging Too: Agent Lifespan Engineering for Deployed Systems
AI & ML
2 min read ★★★☆☆

Experiments in Agentic AI for Science
AI & ML
2 min read ★★★☆☆

Anchor: Mitigating Artifact Drift in Agent Benchmark Generation
AI & ML
2 min read ★★★☆☆

JobBench: Aligning Agent Work With Human Will
AI & ML
2 min read ★★★☆☆

ScientistOne: Towards Human-Level Autonomous Research via Chain-of-Evidence
ENGINEERING
2 min read ★★★☆☆

From Static Context to Calibrated Interactive RL: Mitigating Distribution Shift in Multi-turn Dialogue with Aligned Simulator
AI & ML
2 min read ★★★☆☆

MobileExplorer: Accelerating On-Device Inference for Mobile GUI Agents via Online Exploration
AI & ML
2 min read ★★★☆☆

AGORA: Adapter-Grounded Observation-Action Retention for Inference-Free Prompt Compression in LLM Agents
AI & ML
2 min read ★★★☆☆

UnityMAS-O: A General RL Optimization Framework for LLM-Based Multi-Agent Systems
AI & ML
2 min read ★★★☆☆

MemFail: Stress-Testing Failure Modes of LLM Memory Systems
AI & ML
2 min read ★★★☆☆

Mind the Tool Failures: Achieving Synergistic Tool Gains for Medical Agents
AI & ML
2 min read ★★★☆☆

Towards Feedback-to-Plan Decisions for Self-Evolving LLM Agents in CUDA Kernel Generation
AI & ML
2 min read ★★★☆☆

Agent Memory: An Anatomy
AI & ML
2 min read ★★★☆☆

DeepSWE: A contamination-free benchmark for long-horizon coding agents
AI & ML
2 min read ★★★☆☆

Agent Trace RFC
AI & ML
2 min read ★★★☆☆

Millions of AI agents imperiled by critical vulnerability in open source package
AI & ML
2 min read ★★★☆☆

FBI agent explains how easy it is to ID people posting AI porn without consent
AI & ML
2 min read ★★★☆☆

Sources: Qualcomm reached a deal with ByteDance to supply millions of ASICs for AI data centers to support AI agents in the Doubao chatbot; QCOM jumps 5%+
AI & ML
2 min read ★★★☆☆

Rethinking organizational design in the age of agentic AI
AI & ML
2 min read ★★★☆☆

Mixture of Complementary Agents for Robust LLM Ensemble
AI & ML
2 min read ★★★☆☆

Not All Transitions Matter: Evidence from PPO
AI & ML
2 min read ★★★☆☆

Agent-ToM: Learning to Monitor Autonomous LLM Agents via Theory-of-Mind Reasoning
AI & ML
2 min read ★★★☆☆

PrivFusion: A Privacy-preserving Multi-Agent Framework for Harmonizing Distributed Datasets
AI & ML
2 min read ★★★☆☆