Models

LLMs, foundation models, and model releases

125 articles

Sources: Microsoft is working on an app that will include GitHub Copilot, Copilot chat, Copilot Cowork, and a new agentic workflow tool called Autopilot
AI & ML
2 min read ★★★☆☆
Rsync maintainer starts using Claude, regressions mount
AI & ML
2 min read ★★★☆☆
Notes from the Mistral AI Now Summit in Paris
AI & ML
2 min read ★★★☆☆
Flathub disallows LLM-based submissions
AI & ML
2 min read ★★★☆☆
Even (very) noisy LLM evaluators are useful for improving AI agents
AI & ML
2 min read ★★★☆☆
Representation Signatures and Risk-Feedback Alignment in LLM Trading Agents
AI & ML
2 min read ★★★☆☆
Feature Geometry of LoRA Adapters: A Sparse Autoencoder Analysis of Representational Divergence in Fine-Tuned Language Models
AI & ML
2 min read ★★★☆☆
When LLM Reward Design Fails: Diagnostic-Driven Refinement for Sparse Structured RL
AI & ML
2 min read ★★★☆☆
FormInv: A Measurement Protocol for Semantic Invariance in Mathematical Reasoning Benchmarks
AI & ML
2 min read ★★★☆☆
Label-Free Reinforcement Learning via Cross-Model Entropy
AI & ML
2 min read ★★★☆☆
ICG: Improving Cover Image Generation via MLLM-based Prompting and Personalized Preference Alignment
AI & ML
2 min read ★★★☆☆
LCO: LLM-based Constraint Optimization for Safer Agentic LLMs in Real-world Tasks
AI & ML
2 min read ★★★☆☆
Enhancing LLM Medical Coding with Structured External Knowledge
AI & ML
2 min read ★★★☆☆
BioELX: Cross-lingual Biomedical Entity Linking via Alias-based Retrieval and LLM Ranking
AI & ML
2 min read ★★★☆☆
Modeling Community Attitude through Reaction Tone: A Human-AI Collaborative Framework for Evaluating LLM Alignment with Linguistic Behaviors in Online Communities
AI & ML
2 min read ★★★☆☆
Learning to Translate from Soft to Hard LLM Prompts
AI & ML
2 min read ★★★☆☆
Disentangling Language Roles in Multilingual LLM Task Execution
AI & ML
2 min read ★★★☆☆
TRACES: Proactive Safety Auditing for Multi-Turn LLM Agents via Trajectory-State Modeling
AI & ML
2 min read ★★★☆☆
We should be more tired than the model
AI & ML
2 min read ★★★☆☆
Show HN: Compile-time model-id validation with declared capability
AI & ML
2 min read ★★★☆☆
anthropics/claude-code — Claude Code is an agentic coding tool that lives in your terminal, understands your codebase, and he
OPEN SOURCE
2 min read ★★★☆☆
cursor/plugins — Cursor plugin specification and official plugins
OPEN SOURCE
2 min read ★★★☆☆
run-llama/liteparse — A fast, helpful, and open-source document parser
OPEN SOURCE
2 min read ★★★☆☆
galilai-group/stable-worldmodel — A platform for reproducible world model research and evaluation
OPEN SOURCE
2 min read ★★★☆☆
Real-time LLM Inference on Standard GPUs: 3k tokens/s per request
AI & ML
2 min read ★★★☆☆
OpenAI says it has briefed the White House on its new biodefense program, which uses GPT-Rosalind to help develop biodefense and pandemic preparedness tools
AI & ML
2 min read ★★★☆☆
Making LLMs tell you how confident they really are through probe-targeted fine tuning.[R]
AI & ML
2 min read ★★★☆☆
Building a monokernel for LLM inference on AMD MI300X - up to 3,300 output tokens/s per request [P]
AI & ML
2 min read ★★★☆☆
Social Simulation with LLMs - Fidelity in Applications (CFP @ COLM'26) [R]
AI & ML
2 min read ★★★☆☆
Researchers let AI models run a simulated society. Claude was the safest—and Grok committed 180 crimes and went extinct within 4 days
AI & ML
2 min read ★★★☆☆
Blaming the model won't fix your workflow — a white paper on structural enforcement for AI agents
AI & ML
2 min read ★★★☆☆
I integrated a local Llama 3.2 model to act as a dynamic Dungeon Master in my indie RPG.
AI & ML
2 min read ★★★☆☆
Claude Code – Everything You Can Configure That the Docs Don't Tell You
AI & ML
2 min read ★★★☆☆
Python utility package for building Claude Code hooks
AI & ML
2 min read ★★★☆☆
Review Arcade: On the Human Alignment and Gameability of LLM Reviews
AI & ML
2 min read ★★★☆☆
Frontier LLM-based agents can overcome the ontology curation bottleneck for natural phenotypes
AI & ML
2 min read ★★★☆☆
Adopt $\neq$ Adapt: Longitudinal Analyses of LLM Conversations in the Wild
AI & ML
2 min read ★★★☆☆
When Models Disagree: Rethinking LLM Evaluation for Public Comment Analysis
AI & ML
2 min read ★★★☆☆
Mind Your Tone: Does Tone Alter LLM Performance?
AI & ML
2 min read ★★★☆☆
Trends in AI and Human-AI Interaction in Clinical Trials -- A Hybrid Human-AI Exploration
AI & ML
2 min read ★★★☆☆
ReasonOps: Operator Segmentation for LLM Reasoning Traces
AI & ML
2 min read ★★★☆☆
BenchTrace: A Benchmark for Testing Reflection Ability and Controlled Evolution in LLM Agents
AI & ML
2 min read ★★★☆☆
Tailoring the Curriculum: Student-Centered Reasoning Distillation via Dynamic Data-Model Compatibility
AI & ML
2 min read ★★★☆☆
Microsoft overhauled Copilot's design in Microsoft 365 with a minimalist black-and-white, text-focused interface aimed at creating a more consistent experience
AI & ML
2 min read ★★★☆☆
AI researchers ran 15-day simulations of worlds governed by different AI models: Claude Sonnet 4.6 recorded zero crime, while Gemini 3 Flash had the most at 683
AI & ML
2 min read ★★★☆☆
The mysterious Hy3 LLM is topping OpenRouter Model Rankings by a large margin
AI & ML
2 min read ★★★☆☆
LLMs believe false statements even after explicit warnings that they're false
AI & ML
2 min read ★★★☆☆
Various LLM Smells
AI & ML
2 min read ★★★☆☆
Apple working to cram massive Gemini model into iPhone to power new Siri
AI & ML
2 min read ★★★☆☆
Microsoft 365 Copilot gets a speed boost and cleaner design
AI & ML
2 min read ★★★☆☆
The CFTC moves to vacate a $5M settlement with Gemini, reversing a Biden-era enforcement action, following a lobbying campaign by the Winklevoss twins
AI & ML
2 min read ★★★☆☆
Training GPT-like model on non-language series [R]
AI & ML
2 min read ★★★☆☆
GNN Model For Fraud Detection Isn't Performing Well [R]
TECH BUSINESS
2 min read ★★★☆☆
Best Text to Text Translation Model? [D]
AI & ML
2 min read ★★★☆☆
Tuning LLVM's SLP Vectorizer Cost Model
AI & ML
2 min read ★★★☆☆
About LLMs at Zig Days
AI & ML
2 min read ★★★☆☆
Dynamic Workflows in Claude Code
AI & ML
2 min read ★★★☆☆
Claude’s new model is more ‘honest’ when it messes up
AI & ML
2 min read ★★★☆☆
Claude Opus 4.8
AI & ML
2 min read ★★★☆☆
Sneak peek at new Siri app reveals Apple’s plans to take on ChatGPT and more
AI & ML
2 min read ★★★☆☆
These new iOS 27 renders hint at Siri’s big redesign
AI & ML
2 min read ★★★☆☆
A Simple State Space Model Excels at Multivariate Time Series Classification
TECH BUSINESS
2 min read ★★★☆☆
Gradient Transformer: Learning to Generate Updates for LLMs
AI & ML
2 min read ★★★☆☆
Faster Thermal Profiling of a Lunar Rover with Machine Learning Adapted Finite Difference Model
AI & ML
2 min read ★★★☆☆
Aligning LLMs with Human Uncertainty: A Beta-Bernoulli Calibrator for LLM Forecasting
AI & ML
2 min read ★★★☆☆
Sources: at WWDC, Apple is likely to showcase how 15 years of designing custom silicon chips gives it an advantage in local AI, using a distilled Gemini model
AI & ML
2 min read ★★★☆☆
Five frontier LLMs disagree on 67% of 1k real-world fact-check claims
AI & ML
2 min read ★★★☆☆
IBM and Red Hat commit $5B to establish a new model for open-source software, dubbed Project Lightwell, and will deploy 20,000 engineers, supported by AI
ENGINEERING
2 min read ★★★☆☆
unclecode/crawl4ai — 🚀🤖 Crawl4AI: Open-source LLM Friendly Web Crawler & Scraper. Don't be shy, join here: https://disc
OPEN SOURCE
2 min read ★★★☆☆
OpenMOSS/MOSS-TTS — MOSS‑TTS Family is an open‑source speech and sound generation model family from MOSI.AI and the Open
OPEN SOURCE
2 min read ★★★☆☆
EveryInc/compound-engineering-plugin — Official Compound Engineering plugin for Claude Code, Codex, Cursor, and more
OPEN SOURCE
2 min read ★★★☆☆
London- and SF-based Orbital Industries, which uses its Orb model to design advanced materials and then sell them directly, raised a $50M Series B led by Plural
TECH BUSINESS
2 min read ★★★☆☆
Mistral says it is accelerating superintelligence development to ensure Europe's independence from US tech giants, and signs deals to supply Airbus and BMW
AI & ML
2 min read ★★★☆☆
Qwen3.7-Max Ran for 35 Hours on Unknown Hardware and Achieved a 10× Speedup
AI & ML
2 min read ★★★☆☆
Identifying and Understanding Human Values in Text: A Tailorable LLM-based Architecture
AI & ML
2 min read ★★★☆☆
Soro: A Lightweight Foundation Model and Chatbot for Tajik
AI & ML
2 min read ★★★☆☆
DynaSchedBench: Calibrated Dynamic Scheduling Benchmarks and Observability Paradox in LLM-based Scheduling Agents
AI & ML
2 min read ★★★☆☆
Why LLMs Fail at Causal Discovery and How Interventional Agents Escape
AI & ML
2 min read ★★★☆☆
Voluntary Collusion with Secret Tools in Competing LLM Agents
AI & ML
2 min read ★★★☆☆
DeepSciVerify: Verifying Scientific Claim--Citation Alignment via LLM-Driven Evidence Escalation
AI & ML
2 min read ★★★☆☆
Prefix-Safe Bayesian Belief Tracking for LLM Reasoning Reliability: Separating Calibration from Ranking
AI & ML
2 min read ★★★☆☆
A Policy-Driven Runtime Layer for Agentic LLM Serving
AI & ML
2 min read ★★★☆☆
Asking Is Not Enough: Protocol Sensitivity in LLM Confidence Calibration
AI & ML
2 min read ★★★☆☆
Got a Secret? LLM Agents Can't Keep It: Evaluating Privacy in Multi-Agent Systems
AI & ML
2 min read ★★★☆☆
Diagnosing Live Within-Policy Instruction Conflicts in LLM Agents with Witnessed Resolution Profiles
AI & ML
2 min read ★★★☆☆
A Query Engine for the Agents
AI & ML
2 min read ★★★☆☆
A Fixed-Budget, Cluster-Aware Standard for LLM-as-a-Judge Evaluation: A Multi-Hop RAG Stress Test
AI & ML
2 min read ★★★☆☆
Why Ctrl+V won't paste images in Claude Code on WSL, with a fix
AI & ML
2 min read ★★★☆☆
The CFTC files alongside Gemini to nullify Gemini's $5M settlement in January 2025, arguing that the agency's current management wouldn't have pursued the case
AI & ML
2 min read ★★★☆☆
Getting Claude to extract data from a 1997 football manager game
AI & ML
2 min read ★★★☆☆
Chachamaru127/claude-code-harness — Claude Code Dedicated Development Harness - Achieving High-Quality Development Through an Autonomous
OPEN SOURCE
2 min read ★★★☆☆
Multi-Agent LLM System for Automated Vulnerability Discovery and Reproduction
AI & ML
2 min read ★★★☆☆
Gemini, Gophers, and Fingers. Oh My Alternative Internets Beyond HTTPS
AI & ML
2 min read ★★★☆☆
harry0703/MoneyPrinterTurbo — Generate high-definition short videos with one click using large AI models.
OPEN SOURCE
2 min read ★★★☆☆
ElevenLabs’s new music generation model can switch genres mid-track
AI & ML
2 min read ★★★☆☆
GEM: Geometric Entropy Mixing for Optimal LLM Data Curation
AI & ML
2 min read ★★★☆☆
AirCast-SR: A Foundation Model for Kilometer-Scale Atmospheric Super-Resolution via Latent Consistency Diffusion
AI & ML
2 min read ★★★☆☆
InfoQuant: Shaping Activation Distributions for Low-Bit LLM Quantization
AI & ML
2 min read ★★★☆☆
Co-folding model guided by structural proteomics
AI & ML
2 min read ★★★☆☆
The Bridge-Garden Dilemma in LLM Distillation: Why Mixing Hard and Soft Labels Works
AI & ML
2 min read ★★★☆☆
Scaling World-Model Reinforcement Learning Through Diffusion Policy Optimization
AI & ML
2 min read ★★★☆☆
MULTISEISMO: A Multimodal Seismic Dataset and Model for Cross-Modal Seismic Understanding
AI & ML
2 min read ★★★☆☆
Biohub, the Mark Zuckerberg and Priscilla Chan-funded institute, releases a protein-structure prediction model and more, calling it “a world model” of proteins
AI & ML
2 min read ★★★☆☆
Yeachan-Heo/oh-my-claudecode — Teams-first Multi-agent orchestration for Claude Code
OPEN SOURCE
2 min read ★★★☆☆
Datacurve releases the DeepSWE coding benchmark, a 113-task test across 91 open-source repositories and five languages, and says GPT-5.5 is the leader at 70%
AI & ML
2 min read ★★★☆☆
Prompt Politeness Affects LLM Accuracy
AI & ML
2 min read ★★★☆☆
Claude Code as a Daily Driver: Claude.md, Skills, Subagents, Plugins, and MCPs
AI & ML
2 min read ★★★☆☆
A deep dive into how Claude Code and OpenClaw unleashed the AI agent revolution that is rapidly transforming the modern computing landscape
AI & ML
2 min read ★★★☆☆
Q&A with Claude Code creator and head Boris Cherny on how the title “software engineer” is disappearing, why AI may create more jobs than it destroys, and more
AI & ML
2 min read ★★★☆☆
Can LLMs Introspect? A Reality Check
AI & ML
2 min read ★★★☆☆
Personalizing Embodied Multimodal Large Language Model Agents over Long-term User Interactions
ENGINEERING
2 min read ★★★☆☆
OmniToM: Benchmarking Theory of Mind in LLMs via Explicit Belief Modeling
AI & ML
2 min read ★★★☆☆
Managing Uncertainty in LLM-Generated Procedural Knowledge for Virtual Laboratory Planning
AI & ML
2 min read ★★★☆☆
PolyFusionAgent: A Multimodal Foundation Model and Autonomous AI Assistant for Polymer Property Prediction and Inverse Design
ENGINEERING
2 min read ★★★☆☆
AGORA: Adapter-Grounded Observation-Action Retention for Inference-Free Prompt Compression in LLM Agents
AI & ML
2 min read ★★★☆☆
UnityMAS-O: A General RL Optimization Framework for LLM-Based Multi-Agent Systems
AI & ML
2 min read ★★★☆☆
MemFail: Stress-Testing Failure Modes of LLM Memory Systems
AI & ML
2 min read ★★★☆☆
Towards Feedback-to-Plan Decisions for Self-Evolving LLM Agents in CUDA Kernel Generation
AI & ML
2 min read ★★★☆☆
LLM-AutoSciLab: Closed-Loop Scientific Discovery via Active Experimentation with LLMs
AI & ML
2 min read ★★★☆☆
Mixture of Complementary Agents for Robust LLM Ensemble
AI & ML
2 min read ★★★☆☆
Truthful Online Preference Aggregation for LLM Fine-Tuning in Mobile Crowdsourcing
AI & ML
2 min read ★★★☆☆
PromptAudit: Auditing Prompt Sensitivity in LLM-Based Vulnerability Detection
AI & ML
2 min read ★★★☆☆
Agent-ToM: Learning to Monitor Autonomous LLM Agents via Theory-of-Mind Reasoning
AI & ML
2 min read ★★★☆☆
Eagle 3.1: Collaboration Between the EAGLE Team, vLLM Team, and TorchSpec Team
AI & ML
2 min read ★★★☆☆
Amazon fulfillment competitor Stord raises $250M at $3B valuation
AI & ML
2 min read ★★★☆☆