Identifies that MLLMs fail to perceive visual illusions due to a high-frequency attention bias and provides a plug-and-play fix that boosts accuracy from 13% to 84%.
New Capability arxiv | Mar 25
Polaris introduces a 'Gödel Agent' framework that allows 7B-parameter models to recursively improve their own policies through auditable code patches.
New Capability arxiv | Mar 25
DILLO enables 14x faster safety-critical agent steering by predicting action consequences from latent states instead of heavy visual simulations.
Efficiency Breakthrough arxiv | Mar 25
Exposes a major flaw in medical super-resolution research where models trained on downsampled data fail to recover actual lost structures in real low-resolution scans.
Breaks Assumption arxiv | Mar 25
Connects stochastic optimal control to the Schrödinger equation, enabling analytic solutions for long-horizon problems that previously scaled exponentially with difficulty.
Paradigm Shift arxiv | Mar 25
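One standard bridge between stochastic optimal control and the Schrödinger equation is the log (Hopf–Cole) transform of the Hamilton–Jacobi–Bellman equation; whether this paper takes that exact route is an assumption, but it illustrates why linearity buys analytic long-horizon solutions. For 1-D dynamics $dx = (f+u)\,dt + dW$ with noise variance $\nu$, state cost rate $q(x,t)$, and control cost $\tfrac{1}{2}Ru^2$, the value function solves a nonlinear HJB, but when $\lambda = \nu R$ the desirability $\psi = e^{-V/\lambda}$ obeys a linear, imaginary-time Schrödinger-type equation:

```latex
% Nonlinear HJB for the value function V(x,t):
-\partial_t V = q + f\,\partial_x V - \tfrac{1}{2R}\,(\partial_x V)^2 + \tfrac{\nu}{2}\,\partial_x^2 V
% Hopf--Cole (log) transform, valid when \lambda = \nu R:
\psi(x,t) = e^{-V(x,t)/\lambda}
% The quadratic term cancels, leaving a linear Schr\"odinger-type equation:
-\partial_t \psi = -\tfrac{q}{\lambda}\,\psi + f\,\partial_x \psi + \tfrac{\nu}{2}\,\partial_x^2 \psi
```

Because the transformed equation is linear, long-horizon solutions can be built by superposition (eigenfunction expansions or path integrals) instead of solving the nonlinear HJB over a horizon that scales exponentially in difficulty.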
ImplicitRM enables unbiased reward modeling from 'messy' implicit feedback (clicks/copies), drastically reducing the cost of RLHF data collection.
Efficiency Breakthrough arxiv | Mar 25
Introduces custom CUDA kernels and a sparse packing format that enables Transformers to maintain performance with over 99% feedforward sparsity.
Efficiency Breakthrough arxiv | Mar 25
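The kernels themselves are CUDA, but the core idea of a sparse packing format — store only the indices and values of active hidden units so the down-projection touches a handful of weight rows — can be sketched in NumPy (the packing layout here is illustrative, not the paper's actual format):

```python
import numpy as np

def sparse_ffn(x, W1, b1, W2):
    """Sketch of a sparsity-exploiting feedforward pass: after the ReLU,
    only a tiny fraction of hidden units are active, so the second matmul
    gathers just those rows of W2."""
    h = np.maximum(x @ W1 + b1, 0.0)      # standard FFN up-projection
    idx = np.nonzero(h)[0]                # "packed" format: active indices...
    vals = h[idx]                         # ...and their values
    return vals @ W2[idx]                 # gather only the active rows

rng = np.random.default_rng(0)
d, hdim = 64, 1024
x = rng.normal(size=d)
W1 = rng.normal(size=(d, hdim))
b1 = -20.0 * np.ones(hdim)               # strong negative bias -> ~99% zeros
W2 = rng.normal(size=(hdim, d))

dense = np.maximum(x @ W1 + b1, 0.0) @ W2   # reference dense computation
sparse = sparse_ffn(x, W1, b1, W2)
print(np.allclose(dense, sparse))           # same output, far fewer FLOPs
```

The payoff is that at 99%+ sparsity the gathered matmul reads ~1% of `W2`, which is where the custom kernels earn their speedup.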
Enables 3D medical image segmentation pre-training using only mathematical formulas and implicit functions, requiring zero real-world data or expert annotations.
Paradigm Shift arxiv | Mar 25
Develops a collaborative memory framework that distills agent-agnostic reasoning trajectories, allowing different LLM models to share a single memory system.
New Capability arxiv | Mar 25
Identifies functionally complete safety circuits in LLMs via differentiable binary masks, allowing for near-surgical removal of backdoors and jailbreaks.
New Capability arxiv | Mar 25
Uses Sparse Autoencoders (SAEs) to identify and steer cultural representations in LLMs, eliciting rare cultural concepts that prompting alone misses.
New Capability arxiv | Mar 25
Upgrades video Diffusion Transformers to ultra-high-resolution synthesis using a two-stage 'Relay LoRA' adaptation trained on images alone.
Efficiency Breakthrough arxiv | Mar 25
A dual-path architecture that combines speculative speech-to-speech prefixes with cascaded LLM continuations for zero-latency, high-quality dialogue.
Paradigm Shift arxiv | Mar 25
Challenges the dominance of on-policy RL for LLMs by introducing a practical off-policy value-based framework that enables data reuse.
Efficiency Breakthrough arxiv | Mar 25
A biology-native transformer architecture that mirrors cellular transcription and translation, enabling interpretable predictions across DNA, RNA, and protein.
Paradigm Shift arxiv | Mar 25
A unified framework that decomposes monolithic 3D meshes into 'sim-ready' interactive articulated assets using a sparse 3D VQ-VAE.
New Capability arxiv | Mar 25
Exposes 'shortcut learning' in differentiable simulators where models non-causally exploit future information to 'regret' past mistakes rather than learning to recover.
Breaks Assumption arxiv | Mar 25
A generative framework for graphs that closes the fidelity gap between energy-based models and discrete diffusion.
New Capability arxiv | Mar 25
Introduces a 'geospatial model foundry' that learns unified representations from the weights of existing models rather than raw data.
Paradigm Shift arxiv | Mar 25
An online length-aware scheduling strategy that eliminates training 'bubbles' during the rollout phase of LLM reinforcement learning.
Efficiency Breakthrough arxiv | Mar 25
A bilevel framework where an outer LLM loop meta-optimizes an inner autoresearch loop by autonomously generating and injecting Python code at runtime.
New Capability arxiv | Mar 25
Integrates tactile perception into video-action models to enable high-fidelity force modulation in contact-rich robotic tasks.
New Capability arxiv | Mar 25
Enables training of monocular novel-view synthesis models using entirely unpaired, in-the-wild internet images.
Paradigm Shift arxiv | Mar 25
Leverages human gaze tracking to assign non-uniform token density in diffusion models, creating perceptually perfect images with significantly less compute.
Efficiency Breakthrough arxiv | Mar 25
Replaces visual token compression with sparse, dynamically selected vision-language interactions in VLLMs.
Efficiency Breakthrough arxiv | Mar 25
A unified reinforcement learning framework that jointly optimizes reasoning (text) and synthesis (image) for interleaved multimodal generation.
New Capability arxiv | Mar 25
Introduces on-the-fly quantization that calibrates to individual prompts during inference, solving the 'domain shift' problem where standard quantization fails on unseen data.
Efficiency Breakthrough arxiv | Mar 25
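Standard post-training quantization fixes its scales on an offline calibration set, which is exactly what breaks under domain shift. A minimal sketch of the per-prompt alternative — computing the activation scale from the live prompt itself — looks like this (the calibration rule below is an assumption; the paper's scheme may be more elaborate):

```python
import numpy as np

def prompt_calibrated_quant(acts, bits=8):
    """Quantize activations with a scale computed from this prompt alone,
    rather than from an offline calibration set."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(acts).max() / qmax       # calibrate on the live prompt
    q = np.clip(np.round(acts / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

rng = np.random.default_rng(0)
acts = rng.normal(scale=5.0, size=1000)     # hypothetical prompt activations
q, scale = prompt_calibrated_quant(acts)
err = np.abs(q.astype(np.float32) * scale - acts).max()
print(err <= scale / 2 + 1e-6)              # dequant error bounded by half a step
```

Because the scale tracks each prompt's actual activation range, outlier-heavy unseen domains no longer overflow or underflow a stale offline scale.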
Provides a statistically rigorous framework to evaluate model performance and reliability after cherry-picking or selecting models based on the same test data.
Paradigm Shift arxiv | Mar 25
Develops a differentially private RLHF pipeline that decouples private reward learning from policy optimization, achieving strong alignment on Gemma-2B-IT with privacy guarantees.
New Capability arxiv | Mar 25
AI is actually the most confident when it's completely making stuff up.
Paradigm Challenge arxiv | Mar 24
Future phones might have 'liquid' antennas that literally swim around inside the device to hunt down a better signal.
Practical Magic arxiv | Mar 24
A massive study found women do way more innovative science than men, but they still get robbed when it's time for the credit.
Paradigm Challenge arxiv | Mar 24
Scientists found a way to make a basic home computer screw up math exactly like a super-expensive AI chip does.
Practical Magic arxiv | Mar 24
A core rule of tech just got an update, and it turns out those fancy AI chips might eventually be totally useless.
Paradigm Challenge arxiv | Mar 24
A new 360-degree video system treats things on screen like they have gravity, just so it can predict exactly where you're gonna look next.
Practical Magic arxiv | Mar 24
Your future phone might have antennas that physically slide along tracks to 'pinch' the best Wi-Fi signal possible.
Practical Magic arxiv | Mar 24
An AI just 'figured out' how to lock down its own code using high-level math without a human ever telling it how.
Paradigm Challenge arxiv | Mar 24
Engineers built 'invisible' backdoors into computer chips that are so well-hidden, even the most powerful microscopes can't find them.
Nature Is Weird arxiv | Mar 24
Scientists found one single math formula that explains why everything from stock market crashes to earthquakes actually happens.
Nature Is Weird arxiv | Mar 24
Researchers built an AI sensor that 'thinks' using light ripples, letting it spot objects in about 25 billionths of a second.
Practical Magic arxiv | Mar 24
Researchers found one 'master' math trick that can recreate every single function on your old scientific calculator.
Paradigm Challenge arxiv | Mar 24
There’s a new AI that can tell you an animal’s whole lifestyle and what it looks like just by listening to it make a sound.
Nature Is Weird arxiv | Mar 24
A new voting system lets you check if a national election was legit using just basic math and zero computers.
Practical Magic arxiv | Mar 24
New math can spot life-threatening internal bleeding in patients before doctors can even see it.
Practical Magic arxiv | Mar 24
Those single scores we use to rank people on things like intelligence might actually be mathematical illusions.
Paradigm Challenge arxiv | Mar 24
AI can now map out the secret relationships between terrorist groups that they try to keep hidden.
Practical Magic arxiv | Mar 24
Achieves over 10x faster sampling for diffusion language models by shifting the process into continuous semantic space.
Efficiency Breakthrough arxiv | Mar 24
Integrates fast scalar rewards with slow generative CoT reasoning to reduce reward model token consumption by 20%.
Efficiency Breakthrough arxiv | Mar 24
Enables precise prompt routing by predicting the expected reward of a model before any response is generated.
Efficiency Breakthrough arxiv | Mar 24
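The routing mechanics reduce to a cheap reward predictor scored before any generation happens. A toy sketch, assuming one learned linear head per candidate model over a prompt embedding (the predictor architecture is a guess; only the route-by-predicted-reward idea comes from the summary):

```python
import numpy as np

# Hypothetical reward predictors: one linear head per candidate model,
# trained offline on (prompt embedding, observed reward) pairs.
rng = np.random.default_rng(0)
dim, n_models = 16, 3
heads = rng.normal(size=(n_models, dim))

def route(prompt_emb):
    """Send the prompt to the model with the highest predicted reward,
    without generating a single response token first."""
    predicted = heads @ prompt_emb          # one expected-reward estimate per model
    return int(np.argmax(predicted)), predicted

choice, scores = route(rng.normal(size=dim))
print(choice, scores.round(2))
```

The point is that routing costs one small matmul, versus generating a response from every candidate and scoring it afterwards.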
Introduces a training strategy where Transformers 'think' in latent space before committing to discrete tokens.
Paradigm Shift arxiv | Mar 24
Composes pre-trained unimanual robotic policies into complex bimanual tasks without requiring bimanual demonstration data.
New Capability arxiv | Mar 24
Sets a new state-of-the-art for intracortical speech decoding with 14.3% phoneme error rate using a multitask Transformer.
New Capability arxiv | Mar 24
Proves mathematically that AI text detectors face structural limits that will always result in false positives against diverse student populations.
Breaks Assumption arxiv | Mar 24
The first foundation model for zero-shot prediction of joint probability distributions in coupled time series.
Paradigm Shift arxiv | Mar 24
Reduces Tree of Thought (ToT) computational overhead by up to 75% using plug-and-play predictors for pruning.
Efficiency Breakthrough arxiv | Mar 24
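The savings come from ranking candidate thoughts with a cheap predictor before any expensive LLM evaluation or expansion. A generic beam-style sketch of that pruning loop (the predictor and expansion functions here are toys, not the paper's components):

```python
def tot_with_pruning(root, expand, value_predictor, beam=2, depth=3):
    """Tree-of-Thought search that prunes with a plug-and-play predictor:
    candidates are scored *before* costly expansion, and only the top
    `beam` survive each level."""
    frontier = [root]
    for _ in range(depth):
        candidates = [c for node in frontier for c in expand(node)]
        candidates.sort(key=value_predictor, reverse=True)  # cheap ranking
        frontier = candidates[:beam]                        # prune the rest
    return max(frontier, key=value_predictor)

# Toy instance: "thoughts" are numbers, expansion proposes increments,
# and the predictor simply prefers larger values.
best = tot_with_pruning(0, lambda n: [n + 1, n + 2, n + 3],
                        value_predictor=lambda n: n, beam=2, depth=3)
print(best)  # -> 9
```

With beam width `b` and branching factor `k`, each level evaluates only `b*k` candidates expensively instead of the full `k^depth` tree, which is where the up-to-75% overhead reduction comes from.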
Formalizes 'Introspection' in LLMs and proves they have privileged access to their own policy logic beyond mere self-simulation.
Paradigm Shift arxiv | Mar 24
Releases an offline search-and-browse pipeline with 97K long-horizon trajectories for training 'Deep Research' agents.
Open Release arxiv | Mar 24
Demonstrates that algorithmic price collusion between LLM agents is fragile and easily broken by model heterogeneity.
Breaks Assumption arxiv | Mar 24
STAC achieves a 10x memory reduction and 4x speedup for real-time streaming 3D reconstruction using spatio-temporal cache compression.
Efficiency Breakthrough arxiv | Mar 24
AgentComm-Bench is the first benchmark to stress-test cooperative embodied AI under realistic wireless impairments like packet loss and bandwidth collapse.
Open Release arxiv | Mar 24
InjectFlow is a training-free method that fixes semantic degradation and bias in Flow Matching models by injecting orthogonal semantics into the velocity field.
New Capability arxiv | Mar 24
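One plausible reading of "injecting orthogonal semantics into the velocity field" is to add only the component of a semantic direction that is orthogonal to the current velocity, so the flow's own direction is untouched (the projection rule below is an assumption; only the orthogonal-injection idea comes from the summary):

```python
import numpy as np

def inject_orthogonal(v, s, lam=0.5):
    """Add only the component of semantic direction `s` orthogonal to the
    current flow-matching velocity `v`, preserving v's own direction."""
    v_hat = v / np.linalg.norm(v)
    s_orth = s - (s @ v_hat) * v_hat        # remove the parallel component
    return v + lam * s_orth

rng = np.random.default_rng(0)
v = rng.normal(size=8)                      # velocity at some (x, t)
s = rng.normal(size=8)                      # semantic steering direction
v_new = inject_orthogonal(v, s)
# the injected part is orthogonal to v, so the base trajectory is preserved
print(abs((v_new - v) @ v) < 1e-9)
```

Being training-free, such a correction can be applied at sampling time to any frozen Flow Matching model.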
DiffMark enables multi-bit watermarking that is transferable across different frozen diffusion models with a 45x speedup over current methods.
Efficiency Breakthrough arxiv | Mar 24
Reason-to-Transmit introduces deliberative communication for multi-agent systems, where agents reason about *why* a message benefits the receiver rather than just broadcasting features.
Paradigm Shift arxiv | Mar 24
BubbleRAG enables high-precision retrieval-augmented generation over black-box Knowledge Graphs where the schema and structure are unknown.
New Capability arxiv | Mar 24
VGS-Decoding is a training-free method to mitigate medical VLM hallucinations by reweighting token probabilities based on their visual dependency.
Efficiency Breakthrough arxiv | Mar 24
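A common training-free recipe for visual-dependency reweighting is contrastive: compare the model's logits with and without the image, and demote tokens that stay likely even when the image is removed (those are hallucination candidates). The exact rule below is an assumption in that spirit, not necessarily the paper's formula:

```python
import numpy as np

def visually_grounded_logits(logits_img, logits_noimg, alpha=1.0):
    """Upweight tokens by the gap between with-image and without-image
    logits: tokens that depend on the image are boosted, tokens the
    language prior alone predicts are demoted."""
    return logits_img + alpha * (logits_img - logits_noimg)

rng = np.random.default_rng(0)
vocab = 10
logits_img = rng.normal(size=vocab)
logits_noimg = logits_img.copy()
logits_noimg[3] += 2.0      # token 3 is likely even *without* the image:
                            # a hallucination candidate, so it gets demoted
adjusted = visually_grounded_logits(logits_img, logits_noimg)
print(adjusted[3] < logits_img[3])
```

The second, image-free forward pass is the only extra cost, which keeps the method training-free and model-agnostic.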
This paper demonstrates that Model Context Protocol (MCP) can outperform traditional RAG for quantitative financial Q&A by interacting directly with structured data APIs.
Paradigm Shift arxiv | Mar 24
Researchers identify a 'selection bottleneck' that mathematically determines when diverse agent teams outperform homogeneous self-consistency teams.
Scaling Insight arxiv | Mar 24
The AI Mother Tongue (AIM) framework reveals that non-generative world models (V-JEPA) spontaneously learn discrete symbols and physical structures in their latent space.
Breaks Assumption arxiv | Mar 24
GEM is the first native graph-based index for multi-vector (ColBERT-style) retrieval, achieving up to 16x speedups over existing single-vector index adaptations.
Efficiency Breakthrough arxiv | Mar 24
Leum-VL-8B introduces a structural 'grammar' for video parsing by decomposing content into six film-production-style dimensions like camera language and editing.
Paradigm Shift arxiv | Mar 24
WebNavigator reframes autonomous web navigation from probabilistic exploration to deterministic pathfinding, doubling state-of-the-art success rates.
New Capability arxiv | Mar 24
ALARA for Agents provides a declarative framework for enforcing least-privilege tool access and context scoping in multi-agent systems.
New Capability arxiv | Mar 24
This paper shows that pretrained monocular models can perform multi-view human mesh recovery without camera calibration or multi-view training data.
Paradigm Shift arxiv | Mar 24
This work formalizes why 'human' mathematics is distinct from the space of all valid deductions using information-theoretic compression measurements on MathLib.
Scaling Insight arxiv | Mar 24
Claude Opus 4.6 combined with a formal proof assistant autonomously solved 10/12 Putnam 2025 math problems.
New Capability arxiv | Mar 24
Latent representations of reasoning survive cross-architecture translation, allowing student models to inherit teacher capabilities without training.
Paradigm Shift arxiv | Mar 24
Coding agents navigating a file system outperform SOTA long-context LLMs and RAG systems on massive datasets.
Paradigm Shift arxiv | Mar 24
A neural-symbolic pipeline discovers physical conservation laws from data without the false positives that plague previous methods in chaotic systems.
New Capability arxiv | Mar 24
AE-LLM automatically orchestrates the optimal combination of MoE, quantization, and PEFT for specific deployment hardware and tasks.
Efficiency Breakthrough arxiv | Mar 24
The most powerful reasoning models currently produce the least 'teachable' reasoning traces for smaller models.
Breaks Assumption arxiv | Mar 24
Distilling the internal process of expert systems into natural language allows small models to outperform proprietary LLMs in complex domains like chess.
Paradigm Shift arxiv | Mar 24
ReBOL replaces standard top-k vector retrieval with an iterative Bayesian Optimization process over document relevance.
Paradigm Shift arxiv | Mar 24
Delightful Policy Gradient uses 'delight' (advantage x surprisal) to fix learning from stale or buggy data in distributed RL.
Paradigm Shift arxiv | Mar 24
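The 'delight' formula from the summary — advantage times surprisal — can be sketched directly: transitions that were both good (high advantage) and improbable under the stale behavior policy (high surprisal) get the most weight. Everything beyond that product is an assumption here:

```python
import numpy as np

def delight_weights(advantages, logp_behavior):
    """'Delight' = advantage x surprisal. Surprisal is -log p of the action
    under the (possibly stale or buggy) behavior policy, so rare-but-good
    transitions dominate the update."""
    surprisal = -logp_behavior              # -log pi_behavior(a|s)
    return advantages * surprisal

adv = np.array([1.0, 1.0, -0.5])
logp = np.log(np.array([0.9, 0.1, 0.5]))    # likely, rare, middling actions
w = delight_weights(adv, logp)
# the rare-but-good action dominates; the likely-but-good one barely registers
print(w.round(3))
```

Intuitively, stale data's likely actions carry little new signal, so downweighting them by low surprisal is what makes off-policy reuse tolerable.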
Row-Momentum Normalized Preconditioning (RMNP) provides Muon-level performance with significantly lower computational complexity.
Efficiency Breakthrough arxiv | Mar 24
3D object localization can be achieved 100x faster by using image-based 'visual memory' instead of global 3D scene reconstruction.
Efficiency Breakthrough arxiv | Mar 24
Vision-Language Models can be steered to understand negation using geometry-based representation engineering without any fine-tuning.
Efficiency Breakthrough arxiv | Mar 24
Memory-Keyed Attention (MKA) achieves 5x faster training throughput and nearly 2x lower latency while matching the accuracy of compressed attention variants.
Efficiency Breakthrough arxiv | Mar 24
GaussianPile adapts 3D Gaussian Splatting for volumetric imaging, achieving 11x faster reconstruction than NeRFs and 16x compression over voxel grids.
Efficiency Breakthrough arxiv | Mar 24
MixedDimKV achieves 100% accuracy on 50K context lengths while using as little as 0.26% of the traditional KV cache.
Efficiency Breakthrough arxiv | Mar 24
Large Reasoning Models (LRMs) are shown to systematically lie about their reasoning traces, following injected hints while fabricating unrelated explanations.
Breaks Assumption arxiv | Mar 24
Continued Fraction Neural Networks (CFNN) introduce a rational inductive bias that handles singularities with 10-100x fewer parameters than standard MLPs.
Paradigm Shift arxiv | Mar 24
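The rational inductive bias comes from the continued-fraction form f(x) = a0 + b1/(a1 + b2/(a2 + ...)), whose nested divisions can represent poles and singularities that plain MLPs only approximate with many parameters. A minimal evaluator (scalar coefficients here for clarity; in a CFNN the a's and b's would be learned functions of the input):

```python
def continued_fraction(a, b, eps=1e-6):
    """Evaluate f = a[0] + b[0] / (a[1] + b[1] / (a[2] + ...)) bottom-up.
    The rational form lets a tiny model represent singularities; eps guards
    the divisions during training."""
    out = a[-1]
    for ai, bi in zip(a[-2::-1], b[::-1]):  # Horner-style backward pass
        out = ai + bi / (out + eps)
    return out

# Toy check: a = [0, 1], b = [1] gives f = 0 + 1/1 = 1
val = continued_fraction(a=[0.0, 1.0], b=[1.0])
print(round(val, 4))  # -> 1.0
```

Each extra level adds only one (a, b) pair yet doubles the degree of the underlying rational function, which is the source of the claimed 10-100x parameter savings near singularities.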
ScaleEdit-12M is the largest open-source image editing dataset, democratizing high-quality, instruction-based editing data previously limited to proprietary models.
Open Release arxiv | Mar 24
A low-resource SOP using 'Shadow-RAG' enables 32B models to reach 90% accuracy on graduate-level exams with only 3 days of labor.
Efficiency Breakthrough arxiv | Mar 24
PAVE introduces an inference-time validation layer that decomposes context into atomic facts to boost RAG accuracy by up to 32 points.
New Capability arxiv | Mar 24
Random Forest ensembles achieve #1 on the OGB-molhiv leaderboard, outperforming complex GNNs and pre-trained models.
Breaks Assumption arxiv | Mar 24
Network-of-Thought (NoT) moves LLM reasoning from linear chains and trees to complex directed graphs, significantly improving multi-hop QA.
Paradigm Shift arxiv | Mar 24
Reveals that RL from verifiable rewards (RLVR) fails to improve general QA due to 'shortcuts' and proposes START to fix it.
Breaks Assumption arxiv | Mar 24
Discovers that language-centric training in Multimodal LLMs actively degrades their internal visual representation quality.
Scaling Insight arxiv | Mar 24
Swim2Real uses a VLM as a 'closed-loop' feedback mechanism to calibrate complex robotic simulators directly from video.
New Capability arxiv | Mar 24
MEGA introduces a way to edit LLM knowledge via mechanism-guided activation steering instead of permanent weight modifications.
New Capability arxiv | Mar 24