Research

Papers, benchmarks, and academic breakthroughs

265 articles

AI & ML

Microsoft under fire for threatening security researcher with criminal investigation

Mechanistic origins of catastrophic forgetting: why RL preserves circuits better than SFT?

Molecular Lead Optimization via Agentic Tool Planning

Self-Play Reinforcement Learning under Imperfect Information in Big 2

Emergent Semantic Representations in World Models through Physical Interaction without Linguistic Supervision

Continuity and Ordinality Matter: Constraining Time Series Tokens for Effective Time Series Analysis with Large Language Models

PrismFlow: Residual Dynamics for Flow Matching in Time-Series Generation

TaxDistill: Improving Metagenomic Taxonomic Annotation via Distilled Genomic Foundation Models

Balancing Multimodal Learning through Label Space Reshaping

Representation Alignment Rests on Linear Structure

Pre-Registering the Detectable Effect: A Paired-MDE Budget for 4-bit Quantization Benchmarks, with a Pilot Audit

Towards Continuous-time Causal Foundation Models

Context Distillation as Latent Memory Management

Feature Geometry of LoRA Adapters: A Sparse Autoencoder Analysis of Representational Divergence in Fine-Tuned Language Models

Spectral Guidance for Flexible and Efficient Control of Diffusion Models

Sequential Physics-Constrained Neural Operator Forward Modeling for the $\textit{Norne}$ Reservoir System

Cycle-Space Informed Detection of Autoencoded Blind False Data Injection Attacks on Power Systems

When LLM Reward Design Fails: Diagnostic-Driven Refinement for Sparse Structured RL

CosmicFish-HRM: Adaptive Reasoning via Hierarchical Recurrent Mechanisms in Compact Language Models

Conf-Gen: Conformal Uncertainty Quantification for Generative Models

A Training-Time Diagnostic for Generalization via the Log-Alignment Ratio

Comparing Post-Hoc Explainable AI Methods for Interpreting Black-Box EEG Models in Depression Detection

The Hamilton-Jacobi Theory of Deep Learning

Learning Robust and Task-Invariant Functional Representation from fMRI through Siamese Self-Supervised Learning

FormInv: A Measurement Protocol for Semantic Invariance in Mathematical Reasoning Benchmarks

FedQHD: Closed-Form Function-Space Federated Reinforcement Learning

LoRe: Adaptive Interaction-Evaluation Routing with Per-Step Interaction Budgets for Iterative Graph Solvers

Causal Intelligence for Constraint-Aware Intervention Design to Induce State Transitions

Label-Free Reinforcement Learning via Cross-Model Entropy

ICG: Improving Cover Image Generation via MLLM-based Prompting and Personalized Preference Alignment

LCO: LLM-based Constraint Optimization for Safer Agentic LLMs in Real-world Tasks

Unlocking Fine-Grained and Within-Utterance Speaking Style Control in Prompt-Based Text-to-Speech Models

Enhancing LLM Medical Coding with Structured External Knowledge

OralAgent: Integrating Reasoning, Tools, and Knowledge for Interactive Dental Image Analysis

BioELX: Cross-lingual Biomedical Entity Linking via Alias-based Retrieval and LLM Ranking

Bridging the Stability-Expressivity Gap: Synthetic Data Scaling and Preference Alignment for Low-Resource Spoken Language Models

From AR to Diffusion: Efficiently Adapting Large Language Models with Strictly Causal and Elastic Horizons

Modeling Community Attitude through Reaction Tone: A Human-AI Collaborative Framework for Evaluating LLM Alignment with Linguistic Behaviors in Online Communities

EvoSpec: Evolving Speculative Decoding via Real-Time Vocabulary and Parameter Adaptation

StoryMI: Steerable Multi-Agent Therapeutic Dialogue Generation

Debate Helps Weak Judges Reward Stronger Models

PAST2HARM: A Simple Adaptive Past Tense Attack for Jailbreaking Multimodal AI

Keyphrase Generative Representation of Youth Crisis Conversations Beyond Static Taxonomies

The Future of Facts: Tracing the Factual Generation-Verification Gap

Can Hallucinations Be Useful? Solving Multi-Hop Questions With SLMs By Chaining System-I/II Reasoning

Simorgh at SemEval-2026 task 7: Region-Aware Hybrid Retrieval for Low-Resource Cultural Reasoning in Multilingual Question Answering

Learning to Translate from Soft to Hard LLM Prompts

Disentangling Language Roles in Multilingual LLM Task Execution

Cultural Fidelity in English-to-Hindi Translation: A Preservation-Fluency Frontier for Gender Recoverability

TRACES: Proactive Safety Auditing for Multi-Turn LLM Agents via Trajectory-State Modeling

Chain-based Adaptive Reconfiguration Over Lattices for Hallucination Reduction

ReverseMath: Answer Inversion for Scalable and Verifiable Mathematical Problem Generation

Beyond Input Understanding: Diagnosing Multilingual Mathematical Reasoning with Directed Acyclic Trace Graphs

UserHarness: Harnessing User Minds for Stronger Agent Theory-of-Mind

UNIQUE: Universal Top-k Sparse Attention for Training-free Inference and Sparsity-aware Training

Escape the Language Prior: Mitigating Late-Stage Modality Collapse in Audio Reasoning via Modality-Aware Policy Optimization

Reading or Guessing? Visual Grounding Failures of Vision-Language Models for OCR in Ancient Greek Editions

UniMaia: Steering Chess Policies with Language for Human-like Play

Do Models Know Why They Changed Their Mind? Interpretability and Faithfulness of Chain-of-Thought Under Knowledge Conflict

These researchers would be in Africa fighting Ebola—but Trump cut their funding

galilai-group/stable-worldmodel — A platform for reproducible world model research and evaluation

How the Pope’s Magnifica Humanitas offers a template for individuals to meet the AI moment

London-based Inherent, which aims to combine human scientific research with AI to produce innovations, emerges from stealth with $50M led by Index Ventures

Researchers let AI models run a simulated society. Claude was the safest—and Grok committed 180 crimes and went extinct within 4 days

Blaming the model won't fix your workflow — a white paper on structural enforcement for AI agents

What are important data systems problems, ignored by research? (2024)

Behavior-Induced Mirror-Prox Temporal-Difference Learning for Faster Off-Policy Prediction

Behavior-Aware Auxiliary Corrections for Off-Policy Temporal-Difference Prediction

The Cognitive Categorical Transformer: Category-Theoretic Inductive Biases for Language Modeling

Ultra-Reduced-Impact-Encased-Logging (URIEL): propose a new method for selective sustainable logging and post-harvest silvicultural treatment in tropical forest using airborne robotics systems

Review Arcade: On the Human Alignment and Gameability of LLM Reviews

Orthogonal Concept Erasure for Diffusion Models

Frontier LLM-based agents can overcome the ontology curation bottleneck for natural phenotypes

VFEAgent: A Multimodal Agent Framework for End-to-End Automated Finite Element Analysis

BEAMS: Benchmarking and Evaluating AI for Modeling and Simulation

Adopt $\neq$ Adapt: Longitudinal Analyses of LLM Conversations in the Wild

When Models Disagree: Rethinking LLM Evaluation for Public Comment Analysis

Mind Your Tone: Does Tone Alter LLM Performance?

Practitioner Beliefs and Behaviors in AI-Enhanced Education: DOT Framework Survey Evidence

Differentiable Belief-based Opponent Shaping

Hallucination Mitigation with Agentic AI, Nested Learning, and AI Sustainability via Semantic Caching

Robust and Efficient Guardrails with Latent Reasoning

Bridging the Sim-to-Real Gap in Reinforcement Learning-Based Industrial Dispatching through Execution Semantics

The Importance of Out-of-Band Metadata for Safe Autonomous Agents: The Redpanda Agentic Data Plane

The Chain Holds, the Answer Folds: Trace-Answer Dissociation in Reasoning Models Under Adversarial Pressure

Trends in AI and Human-AI Interaction in Clinical Trials -- A Hybrid Human-AI Exploration

Beyond Consensus: Trace-Level Synthesis in Mixture of Agents

PRO-CUA: Process-Reward Optimization for Computer Use Agents

The Confidence Shortcut: A Reasoning Failure Mode of Masked Diffusion Models

Governing Technical Debt in Agentic AI Systems

Better Later Than Sooner: Neuro-Symbolic Knowledge Graph Construction via Ontology-grounded Post-extraction Correction

Paper Agents, Paper Gains: An Empirical Analysis of DeFi Investment Agents

ReasonOps: Operator Segmentation for LLM Reasoning Traces

GTA: Generating Long-Horizon Tasks for Web Agents at Scale

BenchTrace: A Benchmark for Testing Reflection Ability and Controlled Evolution in LLM Agents

Tailoring the Curriculum: Student-Centered Reasoning Distillation via Dynamic Data-Model Compatibility

AI researchers ran 15-day simulations of worlds governed by different AI models: Claude Sonnet 4.6 recorded zero crime, while Gemini 3 Flash had the most at 683

GitHub bans security researcher who posted zero-day Windows exploits

Where do you go for serious AI research discussion online? [D]

US healthcare still stupidly expensive, with pathetic outcomes, study finds

Researchers develop a new process to get lithium out of rocks

Kept context-switching between arxiv, OpenReview, GitHub, and HuggingFace for every paper, so I built this. Chrome extension + website with everything inline, plus citation graph + SPECTER2 neighbors. 3M papers, free, feedback welcome [P]

BEAM 100K memory benchmark: CSM vs Hindsight local artifact comparison [R]

EMA-Gated Temporal Sequence Compression in Vision Transformers [P]

noisekit - CLI for generating realistic degraded speech datasets for ASR benchmarking [P]

The OpenClaw crisis is the most complete case study of agentic AI security failure. Here's the full timeline and technical breakdown.

How a new extraction process could unlock the world’s lithium

Personalized Observation Normalization for Federated Reinforcement Learning in Simulation Environments with Heterogeneity

IGADA-IoT: IoT Sensor Energy Optimization in Wireless Sensor Networks Driven by Automatic Data Augmentation

A Simple State Space Model Excels at Multivariate Time Series Classification

$E^3$-Agent: An Executable and Evolving Agent for Resource Management of Edge Generative Inference

Tackling Multimodal Learning Challenges with Mixture-of-Expert: A Survey

Metric-Aware PCA as a Linear Instance of Geometric Deep Learning

Comparative Analysis of Liquid Neural Networks and LSTM for Sequential Pattern Recognition: Robustness, Efficiency, and Clinical Utility

Architecture-driven Shift: towards a lightweight selector for capturing the trends of logit shift

Detect by Yourself: Self-Designing Agentic Workflows for Few-Shot Graph Anomaly Detection

HEAL: Resilient and Self-* Hub-based Learning

Balancing Fidelity and Diversity in Diffusion Models via Symmetric Attention Decomposition: Hopfield Perspective

Resource-Constrained Affect Modelling via Variance Regularisation Pruning

Energy-Structured Low-Rank Adaptation for Continual Learning

Federated Learning for Multivariate Time Series Anomaly Detection in Industrial Automation

GenSBI: Generative Methods for Simulation-Based Inference in JAX

SparseOpt: Addressing Normalization-induced Gradient Skew in Sparse Training

The Fundamental Limits of Fraud Detection in Card Payment Networks

Information-theoretic Multimodal Representation Learning for Electrocardiogram Signals

Gradient Transformer: Learning to Generate Updates for LLMs

The Energy Blind Spot: NVIDIA's Flagship Edge AI Hardware Cannot Support Process-Level Energy Attribution

Evaluating Local Explainability Metrics for Machine Learning Models on Tabular Data

Supervised Distributional Reduction via Optimal Transport and Dependence Maximization

Hurwitz Quaternion Multiplicative Quantization for KV Cache Compression

Faster Thermal Profiling of a Lunar Rover with Machine Learning Adapted Finite Difference Model

Transferable Reinforcement Learning via Probabilistic Latent Embeddings and Dynamic Policy Adaptation for Sim-to-Real Deployment

How the Optimizer Shapes Learned Solutions in Equivariant Neural Networks

Aligning LLMs with Human Uncertainty: A Beta-Bernoulli Calibrator for LLM Forecasting

When do complex-valued neural networks help? A study of representation, geometry, and optimization

Test-Time Collective Action: Proxy-Based Perturbations for Correcting Algorithmic Harms

AWS is rolling out Resilient Network Graphs, a “quasi-random” networking architecture that uses a flat mesh design, and says it accelerates information flows

UK researchers win access to Google's Willow quantum chip, which it says completes a calculation in five minutes that takes supercomputers 10 septillion years

Finding miscompiles for fun, not profit

Identifying and Understanding Human Values in Text: A Tailorable LLM-based Architecture

Soro: A Lightweight Foundation Model and Chatbot for Tajik

On the Origin of Synthetic Information by Means of Steganographic Inheritance

DynaSchedBench: Calibrated Dynamic Scheduling Benchmarks and Observability Paradox in LLM-based Scheduling Agents

Why LLMs Fail at Causal Discovery and How Interventional Agents Escape

RULER: Representation-Level Verification of Machine Unlearning

LaneRoPE: Positional Encoding for Collaborative Parallel Reasoning and Generation

Discovery Agents for Real-Time Analytics: Toward Proactive Insight Systems

Agyn: An Open-Source Platform for AI Agents with Scalable On-Demand Execution, Agent Definition as a Code, and Zero-Trust Access

You Are in Control of Your State: Why Human Outcomes Are Controllable Through Causal State Intervention

Cyberbullying Governance on Social Media: A Unified Framework from Content Identification to Intervention

Voluntary Collusion with Secret Tools in Competing LLM Agents

Laguna M.1/XS.2 Technical Report

Reasoning and Planning with Dynamically Changing Norms

Intelligence as Managed Autonomy: Failure, Escalation, and Governance for Agentic AI Systems

Behavioural Analysis of Alignment Faking

Cross-Entropy Games and Frost Training

Hierarchical Prompt-Domain Control and Learning for Resource-Constrained Agentic Language Models

DeepSciVerify: Verifying Scientific Claim--Citation Alignment via LLM-Driven Evidence Escalation

Prefix-Safe Bayesian Belief Tracking for LLM Reasoning Reliability: Separating Calibration from Ranking

A Policy-Driven Runtime Layer for Agentic LLM Serving

Asking Is Not Enough: Protocol Sensitivity in LLM Confidence Calibration

SkillGrad: Optimizing Agent Skills Like Gradient Descent

Got a Secret? LLM Agents Can't Keep It: Evaluating Privacy in Multi-Agent Systems

Auditable Decision Models with Learned Abstention and Real-Time Steering

Diagnosing Live Within-Policy Instruction Conflicts in LLM Agents with Witnessed Resolution Profiles

A Query Engine for the Agents

A Fixed-Budget, Cluster-Aware Standard for LLM-as-a-Judge Evaluation: A Multi-Hop RAG Stress Test

GraD-IBD: Graph Representation Learning from Diagnosis Trajectories for Early Detection of Inflammatory Bowel Disease

I used autoresearch to improve my AGENTS.md, measured against real tasks

Amazon says it is making the “architecture, starter code, and learnings” from Alexa for Shopping available to third-party retailers, starting with Kate Spade

GEM: Geometric Entropy Mixing for Optimal LLM Data Curation

The Constraint Tax: Measuring Validity-Correctness Tradeoffs in Structured Outputs for Small Language Models

AirCast-SR: A Foundation Model for Kilometer-Scale Atmospheric Super-Resolution via Latent Consistency Diffusion

SilIF: Silhouette-Augmented Isolation Forest for Unsupervised Transaction Fraud Detection

Neural Bayesian Sequential Routing

TSFMAudit: Data Contamination Auditing in Forecasting Time Series Foundation Models

On the Push-Based Asynchronous Federated Learning: A Bias-Correction Aggregation Approach

Planning Neural Dynamics with Lie Group Embedding through Supervised Projective Manifold Learning

When Rule Violations Are Rare: Chimera Training for Logical Anomaly Detection

ARBITER: Reasoning Trajectory Basins and Majority Vote Failures in Test-Time Sampling

InfoQuant: Shaping Activation Distributions for Low-Bit LLM Quantization

GAC: Noise-Aware Adaptive Mixing for Hybrid SFT-RL Post-Training

Max-Window Scale Estimation for Near-Lossless HiF8 W8A8 Quantization-Aware Training

HRVConformer: Neonatal Hypoxic-Ischemic Encephalopathy Classification from the Heart Rate signals

Modeling Dynamic Mixtures of Time-Delay Systems from Streaming Time Series

Co-folding model guided by structural proteomics

Bridging Classification and Reconstruction: Cooperative Time Series Anomaly Detection

On the Role of Inductive Bias in Time-Series Pretraining: A Case Study in Learning Generalizable Representations for Clinical Time Series

From Privacy to Generalization: Linear Max-Information Bounds for DP-SGD

Provably Communication-Efficient and Privacy-Preserving Federated Graph Neural Networks

The Bridge-Garden Dilemma in LLM Distillation: Why Mixing Hard and Soft Labels Works

Unified Neural Scaling Laws

Quantized Keys Steal Attention: Bias Correction for KV-Cache Compression in Video Diffusion

Scaling World-Model Reinforcement Learning Through Diffusion Policy Optimization

Two-Parameter Flows for Learning Population Dynamics of Physical Systems

Stateful Inference for Low-Latency Multi-Agent Tool Calling

Dynamic Link Prediction with Temporally Enhanced Signed Graph Neural Networks

Classification and detection of multiple UAVs using rational Gaussian wavelet neural networks

Curriculum Learning for Safety Alignment

MULTISEISMO: A Multimodal Seismic Dataset and Model for Cross-Modal Seismic Understanding

Datacurve releases the DeepSWE coding benchmark, a 113-task test across 91 open-source repositories and five languages, and says GPT-5.5 is the leader at 70%

BrickAnything: Geometry-Conditioned Buildable Brick Generation with Structure-Aware Tokenization

Can LLMs Introspect? A Reality Check

Personalizing Embodied Multimodal Large Language Model Agents over Long-term User Interactions

Constraint acquisition needs better benchmarks

Your Agents Are Aging Too: Agent Lifespan Engineering for Deployed Systems

Experiments in Agentic AI for Science

Anchor: Mitigating Artifact Drift in Agent Benchmark Generation

OmniToM: Benchmarking Theory of Mind in LLMs via Explicit Belief Modeling

JobBench: Aligning Agent Work With Human Will

Managing Uncertainty in LLM-Generated Procedural Knowledge for Virtual Laboratory Planning

ScientistOne: Towards Human-Level Autonomous Research via Chain-of-Evidence

Automatic Layer Selection for Hallucination Detection

Exploiting Local Dynamics Regularity for Reusable Skills in Offline Hierarchical RL

Advancing Creative Physical Intelligence in Large Multimodal Models

From Static Context to Calibrated Interactive RL: Mitigating Distribution Shift in Multi-turn Dialogue with Aligned Simulator

Reasoning, Code, or Both? How Large Language Models Handle Variations in Math Questions

The MiniMax-M2 Series: Mini Activations Unleashing Max Real-World Intelligence

Which Changes Matter? Towards Trustworthy Legal AI via Relevance-Sensitive Evaluation and Solver-Grounded Reasoning

PolyFusionAgent: A Multimodal Foundation Model and Autonomous AI Assistant for Polymer Property Prediction and Inverse Design

MobileExplorer: Accelerating On-Device Inference for Mobile GUI Agents via Online Exploration

MedGuideX: Internalizing Decision Logic from Executable Guidelines into Large Language Models for Clinical Reasoning

AGORA: Adapter-Grounded Observation-Action Retention for Inference-Free Prompt Compression in LLM Agents

FAST-GOAL: Fast and Efficient Global-local Object Alignment Learning

Tail-Aware HiFloat4: W4A4 Post-Training Quantization for Wan2.2

UnityMAS-O: A General RL Optimization Framework for LLM-Based Multi-Agent Systems

Completion vs Optimality: Policy Gradient in Long-Horizon Cumulative-Damage Problems

MemFail: Stress-Testing Failure Modes of LLM Memory Systems

Mind the Tool Failures: Achieving Synergistic Tool Gains for Medical Agents

Towards Feedback-to-Plan Decisions for Self-Evolving LLM Agents in CUDA Kernel Generation

Initial benchmarks show Nvidia's Vera CPU, which features 88 in-house-designed Olympus cores, packs a heavy-hitting punch, beating Intel's and AMD's x86_64 CPUs

DeepSWE: A contamination-free benchmark for long-horizon coding agents

Algometrics: Forecasting Under Algorithmic Feedback

Parameter Efficient Multi-Class Intelligent Scheduling for Multimodal Online Distributed Industrial Anomaly Detection

CAFD: Concept-Aware DNN Fault Detection using VLMs

Towards Verifiable Transformers: Solver-Checkable Circuit Explanations

Iterative Refinement Neural Operators are Learned Fixed-Point Solvers: A Principled Approach to Spectral Bias Mitigation

Hidden-State Privacy Has an Empty Middle

LLM-AutoSciLab: Closed-Loop Scientific Discovery via Active Experimentation with LLMs

A Large-Scale Dataset and Benchmark: Do Protein-Ligand Models Learn Binding Sites or Just Binding Likelihood?

Mixture of Complementary Agents for Robust LLM Ensemble

Truthful Online Preference Aggregation for LLM Fine-Tuning in Mobile Crowdsourcing

Cascade-KDE: Robust Time-Series Restoration under Out-of-Distribution Impulse Corruptions

Feature Lottery? A Bifurcation Theory of Concept Emergence

Signs Beat Floats: Low-Rank Double-Binary Adaptation for On-Device Fine-Tuning

Spectral Probe-Circuits: A Three-Step Recipe for Identifying Attention-Head Circuits in Pretrained Transformers

Federated Learning over Human-Body Communication for On-Body Edge Intelligence: A Survey, Taxonomy, and BODYFED-HBC Scheduling Vignette

Generative Representation Learning on Hyper-relational Knowledge Graphs via Masked Discrete Diffusion

Not All Transitions Matter: Evidence from PPO

Verified SHAP: Provable Bounds for Exact Shapley Values of Neural Networks

Overcoming "Physics Shock" in Earth Observation: A Heteroscedastic Uncertainty Framework for PINN-based Flood Inference

Riemannian Archetypal Analysis: Interpretable non-linear data analysis on deformed star distributions

Knowledge Graph Modulated Deep Learning for Limited-Sample Clinical Data Analysis

PromptAudit: Auditing Prompt Sensitivity in LLM-Based Vulnerability Detection

Filtered Posterior Mean Collections: A Unified Framework for Analytical Models of Diffusion Generalization

Characterizing the Representational Capacity of Neural Processes

Agent-ToM: Learning to Monitor Autonomous LLM Agents via Theory-of-Mind Reasoning

PrivFusion: A Privacy-preserving Multi-Agent Framework for Harmonizing Distributed Datasets

Rethinking Continual Anomaly Detection on the Edge: Benchmarking Under Realistic Industrial Conditions

Optimizing Digital Therapeutic Interventions: Online Learning under Endogenous Adherence

A lift for input-convex neural network training

Fourier Feature Pyramids for Physics-Informed Neural Networks

The Secret Garden of Rock-Paper-Scissors

One Mask to Rule Them All: On Hidden Facts after Editing and How to Find Them

Representation Signatures and Risk-Feedback Alignment in LLM Trading Agents
