AI & Machine Learning

Sources: President Trump met with Coinbase CEO Brian Armstrong on March 3 before publicly admonishing banks over the GENIUS Act, echoing Coinbase's position

Agentic Engineering Patterns

A CPU that runs entirely on GPU

Log messages are mostly for the people operating your software

Security researchers successfully prompted the AI behind a Utah prescription renewal pilot to reclassify meth as an “unrestricted therapeutic”, and more

Federated Inference: Toward Privacy-Preserving Collaborative and Incentivized Model Serving

Engineering Reasoning and Instruction (ERI) Benchmark: A Large Taxonomy-driven Dataset for Foundation Models and Agents

SuperLocalMemory: Privacy-Preserving Multi-Agent Memory with Bayesian Trust Defense Against Memory Poisoning

Estimating Visual Attribute Effects in Advertising from Observational Data: A Deepfake-Informed Double Machine Learning Approach

Can machines be uncertain?

COOL-MC: Verifying and Explaining RL Policies for Platelet Inventory Management

VL-KGE: Vision-Language Models Meet Knowledge Graph Embeddings

Diagnosing Retrieval vs. Utilization Bottlenecks in LLM Agent Memory

PRISM: Pushing the Frontier of Deep Think via Process Reward Model-Guided Inference

Revealing Positive and Negative Role Models to Help People Make Good Decisions

NeuroProlog: Multi-Task Fine-Tuning for Neurosymbolic Mathematical Reasoning via the Cocktail Effect

LLM-MLFFN: Multi-Level Autonomous Driving Behavior Feature Fusion via Large Language Model

A Neuropsychologically Grounded Evaluation of LLM Cognitive Abilities

AnchorDrive: LLM Scenario Rollout with Anchor-Guided Diffusion Regeneration for Safety-Critical Scenario Generation

LiveAgentBench: Comprehensive Benchmarking of Agentic Systems Across 104 Real-World Challenges

SUN: Shared Use of Next-token Prediction for Efficient Multi-LLM Disaggregated Serving

AgentAssay: Token-Efficient Regression Testing for Non-Deterministic AI Agent Workflows

See and Remember: A Multimodal Agent for Web Traversal

SorryDB: Can AI Provers Complete Real-World Lean Theorems?

LLMs for High-Frequency Decision-Making: Normalized Action Reward-Guided Consistency Policy Optimization

Retrieval-Augmented Robots via Retrieve-Reason-Act

A Natural Language Agentic Approach to Study Affective Polarization

EvoSkill: Automated Skill Discovery for Multi-Agent Systems

Rethinking Code Similarity for Automated Algorithm Design with LLMs

Agentified Assessment of Logical Reasoning Agents

Guideline-Grounded Evidence Accumulation for High-Stakes Agent Verification

LLM-based Argument Mining meets Argumentation and Description Logics: a Unified Framework for Reasoning about Debates

Retrievit: In-context Retrieval Capabilities of Transformers, State Space Models, and Hybrid Architectures

SAE as a Crystal Ball: Interpretable Features Predict Cross-domain Transferability of LLMs without Training

RxnNano: Training Compact LLMs for Chemical Reaction and Retrosynthesis Prediction via Hierarchical Curriculum Learning

ATPO: Adaptive Tree Policy Optimization for Multi-Turn Medical Dialogue

Is Retraining-Free Enough? The Necessity of Router Calibration for Efficient MoE Compression

Self-Play Only Evolves When Self-Synthetic Pipeline Ensures Learnable Information Gain

NExT-Guard: Training-Free Streaming Safeguard without Token-Level Labels

Forecasting as Rendering: A 2D Gaussian Splatting Framework for Time Series Forecasting

MedFeat: Model-Aware and Explainability-Driven Feature Engineering with LLMs for Clinical Tabular Prediction

Characterizing and Predicting Wildfire Evacuation Behavior: A Dual-Stage ML Approach

Subspace Geometry Governs Catastrophic Forgetting in Low-Rank Adaptation

Scaling Reward Modeling without Human Supervision

Efficient Sparse Selective-Update RNNs for Long-Range Sequence Modeling

Routing Absorption in Sparse Attention: Why Random Gates Are Hard to Beat

Neural Paging: Learning Context Management Policies for Turing-Complete Agents

Safety Training Persists Through Helpfulness Optimization in LLM Agents

Generalized Discrete Diffusion with Self-Correction

Physics-Informed Neural Networks with Architectural Physics Embedding for Large-Scale Wave Field Reconstruction

Beyond Binary Preferences: A Principled Framework for Reward Modeling with Ordinal Feedback

Adaptive Personalized Federated Learning via Multi-task Averaging of Kernel Mean Embeddings

Structured vs. Unstructured Pruning: An Exponential Gap

Talking with Verifiers: Automatic Specification Generation for Neural Network Verification

CUDABench: Benchmarking LLMs for Text-to-CUDA Generation

Concept Heterogeneity-aware Representation Steering

Length Generalization Bounds for Transformers

High-order Knowledge Based Network Controllability Robustness Prediction: A Hypergraph Neural Network Approach

Boosting Meta-Learning for Few-Shot Text Classification via Label-guided Distance Scaling

PRISM: Exploring Heterogeneous Pretrained EEG Foundation Model Transfer to Clinical Differential Diagnosis

Graph Attention Based Prioritization of Disease Responsible Genes from Multimodal Alzheimer's Network

A Comparative Study of UMAP and Other Dimensionality Reduction Methods

Temporal Imbalance of Positive and Negative Supervision in Class-Incremental Learning

Number Research Inc

Coruna: The Mysterious Journey of a Powerful iOS Exploit Kit

A pretty looking web for a quantum mechanics tool

Speculative Speculative Decoding (SSD)

The largest acidic geyser has been putting on quite a show

Weave – A language aware merge algorithm based on entities

Mount Mayhem at Netflix: Scaling Containers on Modern CPUs

TikTok will not introduce end-to-end encryption, saying it makes users less safe

California's Digital Age Assurance Act, and FOSS

Claude's Cycles [pdf]

Launch HN: Cekura (YC F24) – Testing and monitoring for voice and chat AI agents

Don't make me talk to your chatbot

Google launches Gemini 3.1 Flash-Lite, which it says delivers “enhanced performance” at a fraction of the cost of larger models and outperforms 2.5 Flash

OpenAI releases GPT-5.3 Instant, which it says delivers more accurate answers and better-contextualized results when searching the web, for all ChatGPT users

OpenAI says GPT-5.3 Instant's tone should feel less “cringe” than GPT-5.2 Instant and the model has a smoother, more to-the-point conversational style

Ziff Davis agrees to sell its Connectivity division, including Ookla and Downdetector, to Accenture for $1.2B in cash, to focus on enthusiast websites like IGN

Claude is an Electron App because we’ve lost native

Don Knuth's "Claude-like" directed Hamiltonian cycles decompositions

New MacBook Airs come with M5, double the storage, and higher starting prices

LLMs can unmask pseudonymous users at scale with surprising accuracy

Research roundup: Six cool science stories we almost missed

Apple's new iPhone 17e has an A19 chip, MagSafe, and 256GB of storage for $599

Claude Code rolls out a voice mode capability

ChatGPT’s new GPT-5.3 Instant model will stop telling you to calm down

Anthropic’s Claude reports widespread outage

[D] Quantified analysis of 2,218 Gary Marcus claims - two independent LLM pipelines, scored against evidence

[P] I trained Qwen2.5-1.5b with RLVR (GRPO) vs SFT and compared benchmark performance

[D] How much time do you actually lose trying to reproduce ML papers?

[R] Boundary-Metric Evaluation for Thin-Structure Segmentation under 2% Foreground Sparsity

[P] Bridging the gap between arXiv PDFs and runnable implementations: Announcing ResearchClaw (Open Source)

[R] How often do you implement research papers?

[P] Free Code Real-time voice-to-voice with your LLM & full reasoning LLM interface (Telegram + 25 tools, vision, docs, memory) on a Mac Studio running Qwen 3.5 35B — 100% local, zero API cost. Full build open-sourced. cloudfare + n8n + Pipecat + MLX unlock insane possibilities on consumer hardwar

[D] How to get credits to run experiments on closed source models as a student researcher.

[R] Toward Guarantees for Clinical Reasoning in Vision Language Models via Formal Verification

[P] On-device Qwen3-TTS (1.7B/0.6B) inference on iOS and macOS via MLX-Swift — voice cloning, voice design, and streaming TTS with no cloud

[R] Benchmarked 94 LLM endpoints for jan 2026. open source is now within 5 quality points of proprietary

[R] CVPR 2026 Camera Ready Paper

[R] Tiny transformers (<100 params) can add two 10-digit numbers to 100% accuracy

ChatGPT Uninstalls Surge 295% After OpenAI’s DoD Deal Sparks Backlash

Warning: Trae IDE's New Token Pricing Destroyed My Workflow Overnight – Don't Get Caught Off Guard

Scientists made AI agents ruder — and they performed better at complex reasoning tasks

Learning how to steer agentic AI in the right direction is a useless skill #changemymind

Claude hits No. 1 on App Store as ChatGPT users defect in show of support for Anthropic's Pentagon stance

Multi-Sourced, Multi-Agent Evidence Retrieval for Fact-Checking

DIG to Heal: Scaling General-purpose Agent Collaboration via Explainable Dynamic Decision Paths

EmCoop: A Framework and Benchmark for Embodied Cooperation Among LLM Agents

Conservative Equilibrium Discovery in Offline Game-Theoretic Multiagent Reinforcement Learning

Confusion-Aware Rubric Optimization for LLM-based Automated Grading

MED-COPILOT: A Medical Assistant Powered by GraphRAG and Similar Patient Case Retrieval

Optimizing In-Context Demonstrations for LLM-based Automated Grading

From Goals to Aspects, Revisited: An NFR Pattern Language for Agentic AI Systems

LifeEval: A Multimodal Benchmark for Assistive AI in Egocentric Daily Life Tasks

DenoiseFlow: Uncertainty-Aware Denoising for Reliable LLM Agentic Workflows

LOGIGEN: Logic-Driven Generation of Verifiable Agentic Tasks

Advancing Multimodal Judge Models through a Capability-Oriented Benchmark and MCTS-Driven Data Generation

Draft-Thinking: Learning Efficient Reasoning in Long Chain-of-Thought LLMs

Fair in Mind, Fair in Action? A Synchronous Benchmark for Understanding and Generation in UMLLMs

LiTS: A Modular Framework for LLM Tree Search

InfoPO: Information-Driven Policy Optimization for User-Centric Agents

K^2-Agent: Co-Evolving Know-What and Know-How for Hierarchical Mobile Device Control

MemPO: Self-Memory Policy Optimization for Long-Horizon Agents

Google’s latest Pixel drop allows Gemini to order groceries for you and more

How the experts figure out what’s real in the age of deepfakes

Anthropic upgrades Claude’s memory to attract AI switchers

Show HN: Rust compiler in PHP emitting x86-64 executables

Giving LLMs a personality is just good engineering

Better JIT for Postgres
