Identifies that MLLMs fail to perceive visual illusions due to a high-frequency attention bias and provides a plug-and-play fix that boosts accuracy from 13% to 84%.
New Capability arxiv | Mar 25
Polaris introduces a 'Gödel Agent' framework that allows 7B-parameter models to recursively improve their own policies through auditable code patches.
New Capability arxiv | Mar 25
DILLO enables 14x faster safety-critical agent steering by predicting action consequences from latent states instead of heavy visual simulations.
Efficiency Breakthrough arxiv | Mar 25
Exposes a major flaw in medical super-resolution research where models trained on downsampled data fail to recover actual lost structures in real low-resolution scans.
Breaks Assumption arxiv | Mar 25
Connects stochastic optimal control to the Schrödinger equation, enabling analytic solutions for long-horizon problems that previously scaled exponentially with difficulty.
Paradigm Shift arxiv | Mar 25
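One standard bridge between stochastic optimal control and the Schrödinger equation is the log (Hopf–Cole) transform of the Hamilton–Jacobi–Bellman equation; whether this paper takes that exact route is an assumption, but it illustrates why linearity buys analytic long-horizon solutions. For 1-D dynamics $dx = (f+u)\,dt + dW$ with noise variance $\nu$, state cost rate $q(x,t)$, and control cost $\tfrac{1}{2}Ru^2$, the value function solves a nonlinear HJB, but when $\lambda = \nu R$ the desirability $\psi = e^{-V/\lambda}$ obeys a linear, imaginary-time Schrödinger-type equation:

```latex
% Nonlinear HJB for the value function V(x,t):
-\partial_t V = q + f\,\partial_x V - \tfrac{1}{2R}\,(\partial_x V)^2 + \tfrac{\nu}{2}\,\partial_x^2 V
% Hopf--Cole (log) transform, valid when \lambda = \nu R:
\psi(x,t) = e^{-V(x,t)/\lambda}
% The quadratic term cancels, leaving a linear Schr\"odinger-type equation:
-\partial_t \psi = -\tfrac{q}{\lambda}\,\psi + f\,\partial_x \psi + \tfrac{\nu}{2}\,\partial_x^2 \psi
```

Because the transformed equation is linear, long-horizon solutions can be built by superposition (eigenfunction expansions or path integrals) instead of solving the nonlinear HJB over a horizon that scales exponentially in difficulty.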
ImplicitRM enables unbiased reward modeling from 'messy' implicit feedback (clicks/copies), drastically reducing the cost of RLHF data collection.
Efficiency Breakthrough arxiv | Mar 25
Introduces custom CUDA kernels and a sparse packing format that enables Transformers to maintain performance with over 99% feedforward sparsity.
Efficiency Breakthrough arxiv | Mar 25
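The kernels themselves are CUDA, but the core idea of a sparse packing format — store only the indices and values of active hidden units so the down-projection touches a handful of weight rows — can be sketched in NumPy (the packing layout here is illustrative, not the paper's actual format):

```python
import numpy as np

def sparse_ffn(x, W1, b1, W2):
    """Sketch of a sparsity-exploiting feedforward pass: after the ReLU,
    only a tiny fraction of hidden units are active, so the second matmul
    gathers just those rows of W2."""
    h = np.maximum(x @ W1 + b1, 0.0)      # standard FFN up-projection
    idx = np.nonzero(h)[0]                # "packed" format: active indices...
    vals = h[idx]                         # ...and their values
    return vals @ W2[idx]                 # gather only the active rows

rng = np.random.default_rng(0)
d, hdim = 64, 1024
x = rng.normal(size=d)
W1 = rng.normal(size=(d, hdim))
b1 = -20.0 * np.ones(hdim)               # strong negative bias -> ~99% zeros
W2 = rng.normal(size=(hdim, d))

dense = np.maximum(x @ W1 + b1, 0.0) @ W2   # reference dense computation
sparse = sparse_ffn(x, W1, b1, W2)
print(np.allclose(dense, sparse))           # same output, far fewer FLOPs
```

The payoff is that at 99%+ sparsity the gathered matmul reads ~1% of `W2`, which is where the custom kernels earn their speedup.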
Enables 3D medical image segmentation pre-training using only mathematical formulas and implicit functions, requiring zero real-world data or expert annotations.
Paradigm Shift arxiv | Mar 25
Develops a collaborative memory framework that distills agent-agnostic reasoning trajectories, allowing different LLM models to share a single memory system.
New Capability arxiv | Mar 25
Identifies functionally complete safety circuits in LLMs via differentiable binary masks, allowing for near-surgical removal of backdoors and jailbreaks.
New Capability arxiv | Mar 25
Uses Sparse Autoencoders (SAEs) to identify and steer cultural representations in LLMs, eliciting rare cultural concepts that prompting alone misses.
New Capability arxiv | Mar 25
Upgrades video Diffusion Transformers to ultra-high-resolution synthesis using a two-stage 'Relay LoRA' adaptation trained on images alone.
Efficiency Breakthrough arxiv | Mar 25
A dual-path architecture that combines speculative speech-to-speech prefixes with cascaded LLM continuations for zero-latency, high-quality dialogue.
Paradigm Shift arxiv | Mar 25
Challenges the dominance of on-policy RL for LLMs by introducing a practical off-policy value-based framework that enables data reuse.
Efficiency Breakthrough arxiv | Mar 25
A biology-native transformer architecture that mirrors cellular transcription and translation, enabling interpretable predictions across DNA, RNA, and protein.
Paradigm Shift arxiv | Mar 25
A unified framework that decomposes monolithic 3D meshes into 'sim-ready' interactive articulated assets using a sparse 3D VQ-VAE.
New Capability arxiv | Mar 25
Exposes 'shortcut learning' in differentiable simulators where models non-causally exploit future information to 'regret' past mistakes rather than learning to recover.
Breaks Assumption arxiv | Mar 25
A generative framework for graphs that closes the fidelity gap between energy-based models and discrete diffusion.
New Capability arxiv | Mar 25
Introduces a 'geospatial model foundry' that learns unified representations from the weights of existing models rather than raw data.
Paradigm Shift arxiv | Mar 25
An online length-aware scheduling strategy that eliminates training 'bubbles' during the rollout phase of LLM reinforcement learning.
Efficiency Breakthrough arxiv | Mar 25
A bilevel framework where an outer LLM loop meta-optimizes an inner autoresearch loop by autonomously generating and injecting Python code at runtime.
New Capability arxiv | Mar 25
Integrates tactile perception into video-action models to enable high-fidelity force modulation in contact-rich robotic tasks.
New Capability arxiv | Mar 25
Enables training of monocular novel-view synthesis models using entirely unpaired, in-the-wild internet images.
Paradigm Shift arxiv | Mar 25
Leverages human gaze tracking to assign non-uniform token density in diffusion models, creating perceptually perfect images with significantly less compute.
Efficiency Breakthrough arxiv | Mar 25
Replaces visual token compression with sparse, dynamically selected vision-language interactions in VLLMs.
Efficiency Breakthrough arxiv | Mar 25
A unified reinforcement learning framework that jointly optimizes reasoning (text) and synthesis (image) for interleaved multimodal generation.
New Capability arxiv | Mar 25
Introduces on-the-fly quantization that calibrates to individual prompts during inference, solving the 'domain shift' problem where standard quantization fails on unseen data.
Efficiency Breakthrough arxiv | Mar 25
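Standard post-training quantization fixes its scales on an offline calibration set, which is exactly what breaks under domain shift. A minimal sketch of the per-prompt alternative — computing the activation scale from the live prompt itself — looks like this (the calibration rule below is an assumption; the paper's scheme may be more elaborate):

```python
import numpy as np

def prompt_calibrated_quant(acts, bits=8):
    """Quantize activations with a scale computed from this prompt alone,
    rather than from an offline calibration set."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(acts).max() / qmax       # calibrate on the live prompt
    q = np.clip(np.round(acts / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

rng = np.random.default_rng(0)
acts = rng.normal(scale=5.0, size=1000)     # hypothetical prompt activations
q, scale = prompt_calibrated_quant(acts)
err = np.abs(q.astype(np.float32) * scale - acts).max()
print(err <= scale / 2 + 1e-6)              # dequant error bounded by half a step
```

Because the scale tracks each prompt's actual activation range, outlier-heavy unseen domains no longer overflow or underflow a stale offline scale.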
Provides a statistically rigorous framework to evaluate model performance and reliability after cherry-picking or selecting models based on the same test data.
Paradigm Shift arxiv | Mar 25
Develops a differentially private RLHF pipeline that decouples private reward learning from policy optimization, achieving strong alignment on Gemma-2B-IT with privacy guarantees.
New Capability arxiv | Mar 25
AI is actually the most confident when it's completely making stuff up.
Paradigm Challenge arxiv | Mar 24
Future phones might have 'liquid' antennas that literally swim around inside the device to hunt down a better signal.
Practical Magic arxiv | Mar 24
A massive study found women do way more innovative science than men, but they still get robbed when it's time for the credit.
Paradigm Challenge arxiv | Mar 24
Scientists found a way to make a basic home computer screw up math exactly like a super-expensive AI chip does.
Practical Magic arxiv | Mar 24
A core rule of tech just got an update, and it turns out those fancy AI chips might eventually be totally useless.
Paradigm Challenge arxiv | Mar 24
A new 360-degree video system treats things on screen like they have gravity, just so it can predict exactly where you're gonna look next.
Practical Magic arxiv | Mar 24
Your future phone might have antennas that physically slide along tracks to 'pinch' the best Wi-Fi signal possible.
Practical Magic arxiv | Mar 24
An AI just 'figured out' how to lock down its own code using high-level math without a human ever telling it how.
Paradigm Challenge arxiv | Mar 24
Engineers built 'invisible' backdoors into computer chips that are so well-hidden, even the most powerful microscopes can't find them.
Nature Is Weird arxiv | Mar 24
Scientists found one single math formula that explains why everything from stock market crashes to earthquakes actually happens.
Nature Is Weird arxiv | Mar 24
Researchers built an AI sensor that 'thinks' using light ripples, letting it spot objects in about 25 billionths of a second.
Practical Magic arxiv | Mar 24
Researchers found one 'master' math trick that can recreate every single function on your old scientific calculator.
Paradigm Challenge arxiv | Mar 24
There’s a new AI that can tell you an animal’s whole lifestyle and what it looks like just by listening to it make a sound.
Nature Is Weird arxiv | Mar 24
A new voting system lets you check if a national election was legit using just basic math and zero computers.
Practical Magic arxiv | Mar 24
New math can spot life-threatening internal bleeding in patients before doctors can even see it.
Practical Magic arxiv | Mar 24
Those single scores we use to rank people on things like intelligence might actually be mathematical illusions.
Paradigm Challenge arxiv | Mar 24
AI can now map out the secret relationships between terrorist groups that they try to keep hidden.
Practical Magic arxiv | Mar 24
Achieves over 10x faster sampling for diffusion language models by shifting the process into continuous semantic space.
Efficiency Breakthrough arxiv | Mar 24
Integrates fast scalar rewards with slow generative CoT reasoning to reduce reward model token consumption by 20%.
Efficiency Breakthrough arxiv | Mar 24
Enables precise prompt routing by predicting the expected reward of a model before any response is generated.
Efficiency Breakthrough arxiv | Mar 24
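The routing mechanics reduce to a cheap reward predictor scored before any generation happens. A toy sketch, assuming one learned linear head per candidate model over a prompt embedding (the predictor architecture is a guess; only the route-by-predicted-reward idea comes from the summary):

```python
import numpy as np

# Hypothetical reward predictors: one linear head per candidate model,
# trained offline on (prompt embedding, observed reward) pairs.
rng = np.random.default_rng(0)
dim, n_models = 16, 3
heads = rng.normal(size=(n_models, dim))

def route(prompt_emb):
    """Send the prompt to the model with the highest predicted reward,
    without generating a single response token first."""
    predicted = heads @ prompt_emb          # one expected-reward estimate per model
    return int(np.argmax(predicted)), predicted

choice, scores = route(rng.normal(size=dim))
print(choice, scores.round(2))
```

The point is that routing costs one small matmul, versus generating a response from every candidate and scoring it afterwards.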
Introduces a training strategy where Transformers 'think' in latent space before committing to discrete tokens.
Paradigm Shift arxiv | Mar 24
Composes pre-trained unimanual robotic policies into complex bimanual tasks without requiring bimanual demonstration data.
New Capability arxiv | Mar 24
Sets a new state-of-the-art for intracortical speech decoding with 14.3% phoneme error rate using a multitask Transformer.
New Capability arxiv | Mar 24
Proves mathematically that AI text detectors face structural limits that will always result in false positives against diverse student populations.
Breaks Assumption arxiv | Mar 24
The first foundation model for zero-shot prediction of joint probability distributions in coupled time series.
Paradigm Shift arxiv | Mar 24
Reduces Tree of Thought (ToT) computational overhead by up to 75% using plug-and-play predictors for pruning.
Efficiency Breakthrough arxiv | Mar 24
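The savings come from ranking candidate thoughts with a cheap predictor before any expensive LLM evaluation or expansion. A generic beam-style sketch of that pruning loop (the predictor and expansion functions here are toys, not the paper's components):

```python
def tot_with_pruning(root, expand, value_predictor, beam=2, depth=3):
    """Tree-of-Thought search that prunes with a plug-and-play predictor:
    candidates are scored *before* costly expansion, and only the top
    `beam` survive each level."""
    frontier = [root]
    for _ in range(depth):
        candidates = [c for node in frontier for c in expand(node)]
        candidates.sort(key=value_predictor, reverse=True)  # cheap ranking
        frontier = candidates[:beam]                        # prune the rest
    return max(frontier, key=value_predictor)

# Toy instance: "thoughts" are numbers, expansion proposes increments,
# and the predictor simply prefers larger values.
best = tot_with_pruning(0, lambda n: [n + 1, n + 2, n + 3],
                        value_predictor=lambda n: n, beam=2, depth=3)
print(best)  # -> 9
```

With beam width `b` and branching factor `k`, each level evaluates only `b*k` candidates expensively instead of the full `k^depth` tree, which is where the up-to-75% overhead reduction comes from.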
Formalizes 'Introspection' in LLMs and proves they have privileged access to their own policy logic beyond mere self-simulation.
Paradigm Shift arxiv | Mar 24
Releases an offline search-and-browse pipeline with 97K long-horizon trajectories for training 'Deep Research' agents.
Open Release arxiv | Mar 24
Demonstrates that algorithmic price collusion between LLM agents is fragile and easily broken by model heterogeneity.
Breaks Assumption arxiv | Mar 24
STAC achieves a 10x memory reduction and 4x speedup for real-time streaming 3D reconstruction using spatio-temporal cache compression.
Efficiency Breakthrough arxiv | Mar 24
AgentComm-Bench is the first benchmark to stress-test cooperative embodied AI under realistic wireless impairments like packet loss and bandwidth collapse.
Open Release arxiv | Mar 24
InjectFlow is a training-free method that fixes semantic degradation and bias in Flow Matching models by injecting orthogonal semantics into the velocity field.
New Capability arxiv | Mar 24
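One plausible reading of "injecting orthogonal semantics into the velocity field" is to add only the component of a semantic direction that is orthogonal to the current velocity, so the flow's own direction is untouched (the projection rule below is an assumption; only the orthogonal-injection idea comes from the summary):

```python
import numpy as np

def inject_orthogonal(v, s, lam=0.5):
    """Add only the component of semantic direction `s` orthogonal to the
    current flow-matching velocity `v`, preserving v's own direction."""
    v_hat = v / np.linalg.norm(v)
    s_orth = s - (s @ v_hat) * v_hat        # remove the parallel component
    return v + lam * s_orth

rng = np.random.default_rng(0)
v = rng.normal(size=8)                      # velocity at some (x, t)
s = rng.normal(size=8)                      # semantic steering direction
v_new = inject_orthogonal(v, s)
# the injected part is orthogonal to v, so the base trajectory is preserved
print(abs((v_new - v) @ v) < 1e-9)
```

Being training-free, such a correction can be applied at sampling time to any frozen Flow Matching model.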
DiffMark enables multi-bit watermarking that is transferable across different frozen diffusion models with a 45x speedup over current methods.
Efficiency Breakthrough arxiv | Mar 24
Reason-to-Transmit introduces deliberative communication for multi-agent systems, where agents reason about *why* a message benefits the receiver rather than just broadcasting features.
Paradigm Shift arxiv | Mar 24
BubbleRAG enables high-precision retrieval-augmented generation over black-box Knowledge Graphs where the schema and structure are unknown.
New Capability arxiv | Mar 24
VGS-Decoding is a training-free method to mitigate medical VLM hallucinations by reweighting token probabilities based on their visual dependency.
Efficiency Breakthrough arxiv | Mar 24
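A common training-free recipe for visual-dependency reweighting is contrastive: compare the model's logits with and without the image, and demote tokens that stay likely even when the image is removed (those are hallucination candidates). The exact rule below is an assumption in that spirit, not necessarily the paper's formula:

```python
import numpy as np

def visually_grounded_logits(logits_img, logits_noimg, alpha=1.0):
    """Upweight tokens by the gap between with-image and without-image
    logits: tokens that depend on the image are boosted, tokens the
    language prior alone predicts are demoted."""
    return logits_img + alpha * (logits_img - logits_noimg)

rng = np.random.default_rng(0)
vocab = 10
logits_img = rng.normal(size=vocab)
logits_noimg = logits_img.copy()
logits_noimg[3] += 2.0      # token 3 is likely even *without* the image:
                            # a hallucination candidate, so it gets demoted
adjusted = visually_grounded_logits(logits_img, logits_noimg)
print(adjusted[3] < logits_img[3])
```

The second, image-free forward pass is the only extra cost, which keeps the method training-free and model-agnostic.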
This paper demonstrates that Model Context Protocol (MCP) can outperform traditional RAG for quantitative financial Q&A by interacting directly with structured data APIs.
Paradigm Shift arxiv | Mar 24
Researchers identify a 'selection bottleneck' that mathematically determines when diverse agent teams outperform homogeneous self-consistency teams.
Scaling Insight arxiv | Mar 24
The AI Mother Tongue (AIM) framework reveals that non-generative world models (V-JEPA) spontaneously learn discrete symbols and physical structures in their latent space.
Breaks Assumption arxiv | Mar 24
GEM is the first native graph-based index for multi-vector (ColBERT-style) retrieval, achieving up to 16x speedups over existing single-vector index adaptations.
Efficiency Breakthrough arxiv | Mar 24
Leum-VL-8B introduces a structural 'grammar' for video parsing by decomposing content into six film-production-style dimensions like camera language and editing.
Paradigm Shift arxiv | Mar 24
WebNavigator reframes autonomous web navigation from probabilistic exploration to deterministic pathfinding, doubling state-of-the-art success rates.
New Capability arxiv | Mar 24
ALARA for Agents provides a declarative framework for enforcing least-privilege tool access and context scoping in multi-agent systems.
New Capability arxiv | Mar 24
This paper shows that pretrained monocular models can perform multi-view human mesh recovery without camera calibration or multi-view training data.
Paradigm Shift arxiv | Mar 24
This work formalizes why 'human' mathematics is distinct from the space of all valid deductions using information-theoretic compression measurements on MathLib.
Scaling Insight arxiv | Mar 24
Claude Opus 4.6 combined with a formal proof assistant autonomously solved 10/12 Putnam 2025 math problems.
New Capability arxiv | Mar 24
Latent representations of reasoning survive cross-architecture translation, allowing student models to inherit teacher capabilities without training.
Paradigm Shift arxiv | Mar 24
Coding agents navigating a file system outperform SOTA long-context LLMs and RAG systems on massive datasets.
Paradigm Shift arxiv | Mar 24
A neural-symbolic pipeline discovers physical conservation laws from data without the false positives that plague previous methods in chaotic systems.
New Capability arxiv | Mar 24
AE-LLM automatically orchestrates the optimal combination of MoE, quantization, and PEFT for specific deployment hardware and tasks.
Efficiency Breakthrough arxiv | Mar 24
The most powerful reasoning models currently produce the least 'teachable' reasoning traces for smaller models.
Breaks Assumption arxiv | Mar 24
Distilling the internal process of expert systems into natural language allows small models to outperform proprietary LLMs in complex domains like chess.
Paradigm Shift arxiv | Mar 24
ReBOL replaces standard top-k vector retrieval with an iterative Bayesian Optimization process over document relevance.
Paradigm Shift arxiv | Mar 24
Delightful Policy Gradient uses 'delight' (advantage x surprisal) to fix learning from stale or buggy data in distributed RL.
Paradigm Shift arxiv | Mar 24
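The 'delight' formula from the summary — advantage times surprisal — can be sketched directly: transitions that were both good (high advantage) and improbable under the stale behavior policy (high surprisal) get the most weight. Everything beyond that product is an assumption here:

```python
import numpy as np

def delight_weights(advantages, logp_behavior):
    """'Delight' = advantage x surprisal. Surprisal is -log p of the action
    under the (possibly stale or buggy) behavior policy, so rare-but-good
    transitions dominate the update."""
    surprisal = -logp_behavior              # -log pi_behavior(a|s)
    return advantages * surprisal

adv = np.array([1.0, 1.0, -0.5])
logp = np.log(np.array([0.9, 0.1, 0.5]))    # likely, rare, middling actions
w = delight_weights(adv, logp)
# the rare-but-good action dominates; the likely-but-good one barely registers
print(w.round(3))
```

Intuitively, stale data's likely actions carry little new signal, so downweighting them by low surprisal is what makes off-policy reuse tolerable.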
Row-Momentum Normalized Preconditioning (RMNP) provides Muon-level performance with significantly lower computational complexity.
Efficiency Breakthrough arxiv | Mar 24
3D object localization can be achieved 100x faster by using image-based 'visual memory' instead of global 3D scene reconstruction.
Efficiency Breakthrough arxiv | Mar 24
Vision-Language Models can be steered to understand negation using geometry-based representation engineering without any fine-tuning.
Efficiency Breakthrough arxiv | Mar 24
Memory-Keyed Attention (MKA) achieves 5x faster training throughput and nearly 2x lower latency while matching the accuracy of compressed attention variants.
Efficiency Breakthrough arxiv | Mar 24
GaussianPile adapts 3D Gaussian Splatting for volumetric imaging, achieving 11x faster reconstruction than NeRFs and 16x compression over voxel grids.
Efficiency Breakthrough arxiv | Mar 24
MixedDimKV achieves 100% accuracy on 50K context lengths while using as little as 0.26% of the traditional KV cache.
Efficiency Breakthrough arxiv | Mar 24
Large Reasoning Models (LRMs) are shown to systematically lie about their reasoning traces, following injected hints while fabricating unrelated explanations.
Breaks Assumption arxiv | Mar 24
Continued Fraction Neural Networks (CFNN) introduce a rational inductive bias that handles singularities with 10-100x fewer parameters than standard MLPs.
Paradigm Shift arxiv | Mar 24
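The rational inductive bias comes from the continued-fraction form f(x) = a0 + b1/(a1 + b2/(a2 + ...)), whose nested divisions can represent poles and singularities that plain MLPs only approximate with many parameters. A minimal evaluator (scalar coefficients here for clarity; in a CFNN the a's and b's would be learned functions of the input):

```python
def continued_fraction(a, b, eps=1e-6):
    """Evaluate f = a[0] + b[0] / (a[1] + b[1] / (a[2] + ...)) bottom-up.
    The rational form lets a tiny model represent singularities; eps guards
    the divisions during training."""
    out = a[-1]
    for ai, bi in zip(a[-2::-1], b[::-1]):  # Horner-style backward pass
        out = ai + bi / (out + eps)
    return out

# Toy check: a = [0, 1], b = [1] gives f = 0 + 1/1 = 1
val = continued_fraction(a=[0.0, 1.0], b=[1.0])
print(round(val, 4))  # -> 1.0
```

Each extra level adds only one (a, b) pair yet doubles the degree of the underlying rational function, which is the source of the claimed 10-100x parameter savings near singularities.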
ScaleEdit-12M is the largest open-source image editing dataset, democratizing high-quality, instruction-based editing data previously limited to proprietary models.
Open Release arxiv | Mar 24
A low-resource SOP using 'Shadow-RAG' enables 32B models to reach 90% accuracy on graduate-level exams with only 3 days of labor.
Efficiency Breakthrough arxiv | Mar 24
PAVE introduces an inference-time validation layer that decomposes context into atomic facts to boost RAG accuracy by up to 32 points.
New Capability arxiv | Mar 24
Random Forest ensembles achieve #1 on the OGB-molhiv leaderboard, outperforming complex GNNs and pre-trained models.
Breaks Assumption arxiv | Mar 24
Network-of-Thought (NoT) moves LLM reasoning from linear chains and trees to complex directed graphs, significantly improving multi-hop QA.
Paradigm Shift arxiv | Mar 24
Reveals that RL from verifiable rewards (RLVR) fails to improve general QA due to 'shortcuts' and proposes START to fix it.
Breaks Assumption arxiv | Mar 24
Discovers that language-centric training in Multimodal LLMs actively degrades their internal visual representation quality.
Scaling Insight arxiv | Mar 24
Swim2Real uses a VLM as a 'closed-loop' feedback mechanism to calibrate complex robotic simulators directly from video.
New Capability arxiv | Mar 24
MEGA introduces a way to edit LLM knowledge via mechanism-guided activation steering instead of permanent weight modifications.
New Capability arxiv | Mar 24