AI & ML

2079 papers · Page 20 of 21

FlashU is the first framework to accelerate unified multimodal models by exploiting the distinct neuron sets used for generation vs. understanding.

Efficiency Breakthrough arxiv | Mar 17

GVC1D achieves over 60% bitrate reduction in video compression by replacing standard 2D latent grids with compact 1D latent tokens.

Paradigm Shift arxiv | Mar 17

Tagarela releases 8,972 hours of high-quality Portuguese podcast audio, rivaling the scale of GigaSpeech for English.

Open Release arxiv | Mar 17

MeMix is a training-free, plug-and-play module that reduces 3D reconstruction error by up to 40% in long sequences by mitigating state drift.

Efficiency Breakthrough arxiv | Mar 17

FuXiWeather2 is a unified end-to-end neural framework for weather assimilation and forecasting that outperforms global operational systems.

New Capability arxiv | Mar 17

This paper proves that increasing test-time compute via beam search can actually hurt LLM reasoning performance due to overestimation bias.

Scaling Insight arxiv | Mar 17

Sparsity (MoE and GQA) is found to act as a critical regulator for variance propagation, mitigating the 'curse of depth' in LLMs.

Scaling Insight arxiv | Mar 17

Test-time reinforcement learning (TTRL) is found to amplify model harmfulness and jailbreak vulnerability when exposed to malicious prompt injections.

Breaks Assumption arxiv | Mar 17

A large-scale study reveals that 78% of AI failures are 'invisible': the system fails without the user ever noticing or reporting an error.

Paradigm Shift arxiv | Mar 17

Incorporating PDE residuals into fine-tuning allows pre-trained physics foundation models to adapt to new tasks without requiring ground-truth solutions.

New Capability arxiv | Mar 17

PrismMirror is the first monocular human frontal view synthesis model to achieve real-time inference (24 FPS) without external geometric models.

Efficiency Breakthrough arxiv | Mar 17

Challenges the 'Flat Minima' hypothesis by showing that grokking is driven by anisotropic noise rectification rather than finding flat regions.

Breaks Assumption arxiv | Mar 17

A 4B parameter model matches a 120B parameter model in program verification through a rigorous data curation pipeline.

Efficiency Breakthrough arxiv | Mar 17

Bridges the gap between generative (MAE) and predictive (I-JEPA) self-supervised learning, achieving a 10% performance boost.

Efficiency Breakthrough arxiv | Mar 17

Mamba-3 introduces MIMO formulations and complex-valued updates to solve the state-tracking failures of previous linear models.

New Capability arxiv | Mar 17

Democratizes the development of 'Deep Search' agents by open-sourcing the specialized training data and trajectory synthesis methods.

Open Release arxiv | Mar 17

Proves that simple deterministic ranking beats expensive LLM-based structuring for conversational memory retrieval.

Breaks Assumption arxiv | Mar 17

Accelerates state-of-the-art 3D human mesh recovery by over 10x, enabling real-time vision-only humanoid teleoperation.

Efficiency Breakthrough arxiv | Mar 17

Introduces an adversarial co-evolution framework where Code and Test LLMs optimize against each other to improve code generation.

Paradigm Shift arxiv | Mar 17

Uses Sparse Autoencoders (SAEs) to mechanistically repair 'moral indifference' in LLM latent representations.

New Capability arxiv | Mar 17

A benchmark for unsolved math problems with automated verification, enabling the measurement of true mathematical discovery.

New Capability arxiv | Mar 17

Introduces Mixture-of-Depths Attention (MoDA) to solve signal degradation in deep LLMs with hardware-efficient implementation.

Efficiency Breakthrough arxiv | Mar 17

Proves that standard acquisition functions like UCB are sufficient for asynchronous Bayesian Optimization, debunking the need for complex diversity-enforcing strategies.

Breaks Assumption arxiv | Mar 17

Settles the long-standing practitioner debate over whether to use training or holdout data for interpreting black-box models with PD/ALE plots.

Breaks Assumption arxiv | Mar 17

Enables Bayesian model selection and joint posterior inference over combinatorial spaces of up to billions of simulator model instantiations.

New Capability arxiv | Mar 17

Achieves 1,000x speedups in Bayesian inverse problems by replacing repeated MCMC sampling with one-step preconditioned generative transport.

Efficiency Breakthrough arxiv | Mar 17

Imagine a paper-thin sticker you can slap on a wall to listen to the room next door. And get this: it doesn't even need a battery.

Practical Magic arxiv | Mar 16

Future 6G antennas are going to literally slide around on your phone to grab a signal so sharp it shouldn't even be possible.

Paradigm Challenge arxiv | Mar 16

ActTail achieves 80% activation sparsity in LLMs with significantly lower perplexity degradation than uniform methods by using Heavy-Tailed Self-Regularization theory.

Efficiency Breakthrough arxiv | Mar 16

This paper proposes a method to align and personalize LLMs directly from raw user interactions using self-distillation, bypassing the need for explicit human labels or RLHF.

Paradigm Shift arxiv | Mar 16

The researchers demonstrate that prompt injection is caused by 'role confusion' in the latent space, where models assign authority based on the style of writing rather than the source of the text.

Breaks Assumption arxiv | Mar 16

This theoretical work refutes the 'Garbage In, Garbage Out' mantra for modern ML, proving that high-dimensional model capacity can asymptotically overcome predictor error and structural uncertainty.

Breaks Assumption arxiv | Mar 16

Introduces the Budget-Sensitive Discovery Score (BSDS), a formally verified metric machine-checked in Lean 4 for evaluating AI-guided scientific candidate selection.

Paradigm Shift arxiv | Mar 16

ReBalance is a training-free framework that dynamically modulates 'thinking' length in reasoning models to prune redundancy during overthinking and promote exploration during underthinking.

Efficiency Breakthrough arxiv | Mar 16

This study proves that reasoning traces (Chain-of-Thought) causally shape model behavior and generalization, even when the final answer is held constant.

Breaks Assumption arxiv | Mar 16

SpectralGuard identifies a 'memory collapse' vulnerability in State Space Models (like Mamba) where adversarial inputs can drive the transition operator's spectral radius to zero.

Breaks Assumption arxiv | Mar 16

Surg-R1 is a specialized surgical reasoning model released alongside the largest surgical Chain-of-Thought dataset (320,000 pairs).

Open Release arxiv | Mar 16

This paper establishes a systematic protocol for 'stitching' heterogeneous Vision Foundation Models (e.g., CLIP and DINOv2) to share early layers while retaining specialized capabilities.

Paradigm Shift arxiv | Mar 16

Achieves 100x speedup in robotic action generation by distilling iterative flow/diffusion models into a one-step policy without a pre-trained teacher.

Efficiency Breakthrough arxiv | Mar 16

Introduces Modal Logical Neural Networks (MLNNs) as a differentiable logic layer that bridges deep learning with symbolic Kripke semantics for regulated AI.

Paradigm Shift arxiv | Mar 16

Demonstrates a robot that improves its own locomotion by identifying and physically 'self-destructing' redundant or inhibiting limbs during its lifetime.

Paradigm Shift arxiv | Mar 16

Enables training-free infinite video generation (hour-scale) by using evolving memory tokens to solve identity drift and motion stagnation.

New Capability arxiv | Mar 16

Reveals that standard global correlation metrics for LLM judges fail to predict success in 'best-of-n' selection tasks due to within-prompt signal loss.

Breaks Assumption arxiv | Mar 16

Reduces Chain-of-Thought (CoT) compute costs by 14-55% by learning the optimal 'early-exit' points for Large Reasoning Models.

Efficiency Breakthrough arxiv | Mar 16

Discovers that as LLMs scale, their complex non-linear depth dynamics converge into accurate, low-order linear surrogates.

Scaling Insight arxiv | Mar 16

Derives an exact, unbiased policy gradient for Reinforcement Learning on Diffusion LLMs, bypassing the need for sequence-level likelihood approximations.

Paradigm Shift arxiv | Mar 16

Shows that tool-augmented agents suffer from 'recommendation drift' where they provide unsafe advice under tool corruption while maintaining high ranking scores.

Breaks Assumption arxiv | Mar 16

Accelerates Diffusion Transformers (DiTs) by 2x using a training-free framework that selectively reduces computation in non-aesthetic image regions.

Efficiency Breakthrough arxiv | Mar 16

Challenges the standard practice of deep PPO training by proving that consensus aggregation of 'wider' parallel runs is 8x more sample-efficient than running multiple epochs.

Breaks Assumption arxiv | Mar 16

Releases Feynman, an agentic pipeline and 100k-sample dataset for generating high-quality, knowledge-rich diagrams with grounded captions.

Open Release arxiv | Mar 16

Introduces the largest-ever multi-modal CAD dataset with 10 million annotations for 1 million models to enable geometric deep learning on BRep data.

Open Release arxiv | Mar 16

Unlocks Maximum Entropy RL for high-dimensional humanoid control, matching or doubling the performance of dominant deterministic baselines.

New Capability arxiv | Mar 16

Introduces a training-free framework that allows LLM agents to dynamically scale their reasoning depth based on a pre-defined token/tool budget.

Efficiency Breakthrough arxiv | Mar 16

Achieves a 98x speedup in LLM routing on AMD hardware using Flash Attention and prompt compression, enabling high-context classification without a dedicated GPU.

Efficiency Breakthrough arxiv | Mar 16

Proposes modeling the world in the feature space of frozen geometry foundation models instead of pixels, achieving 5x faster depth forecasting.

Paradigm Shift arxiv | Mar 16

A retrosynthesis model that explicitly learns strategic bond-disconnection reasoning via reinforcement learning with a round-trip accuracy reward.

New Capability arxiv | Mar 16

Longitudinal evidence reveals that successive ChatGPT versions are converging in output diversity, suggesting potential model collapse from synthetic data saturation.

Scaling Insight arxiv | Mar 16

A new system enables humanoid robots to play competitive tennis rallies with humans by learning from imperfect, fragmented motion data.

New Capability arxiv | Mar 16

Adversarial test case evolution improves code reinforcement learning by creating harder, more discriminative verification signals that drive better model performance.

Scaling Insight arxiv | Mar 16

Modality-level disaggregation enables cost-optimal MLLM serving across heterogeneous GPUs over commodity PCIe, bypassing the need for expensive NVLink interconnects.

Efficiency Breakthrough arxiv | Mar 16

Probing of Vision-Language-Action (VLA) models reveals that the action decoder largely ignores the reasoning logic in Chain-of-Thought, relying almost exclusively on object names.

Breaks Assumption arxiv | Mar 16

SciDesignBench provides a massive simulator-grounded environment for scientific inverse design, revealing that current LLMs struggle significantly with iterative refinement.

New Capability arxiv | Mar 16

A hardware-algorithm co-design for Spiking Neural Networks achieves up to 69x energy efficiency gains using an SRAM-based Compute-in-Memory accelerator.

Efficiency Breakthrough arxiv | Mar 16

The TaoBench benchmark proves that state-of-the-art math LLMs fail on equivalent logic problems when presented outside the standard 'MathLib' framework.

Breaks Assumption arxiv | Mar 16

A self-supervised robotic system detects novel objects by training bespoke detectors on-the-fly from human video demonstrations, bypassing language-based prompts.

New Capability arxiv | Mar 16

AIM enables post-training modulation of large models to change utility levels or focus features without any retraining or additional data.

New Capability arxiv | Mar 16

Achieves 4x visual token compression and 80% lower training cost while unifying multimodal comprehension and generation.

Efficiency Breakthrough arxiv | Mar 16

First training-free method for debiasing reward models using Sparse Autoencoder (SAE) interventions.

New Capability arxiv | Mar 16

Breaks the long-standing accuracy-robustness trade-off in VLMs by localizing adversarial robustness to shallow layers.

Breaks Assumption arxiv | Mar 16

A flow-based navigation policy that achieves zero-shot sim-to-real transfer across wheeled, quadrupedal, and humanoid platforms.

New Capability arxiv | Mar 16

A small-scale molecular reasoning model that outperforms ultra-large foundation models via structured chain-of-thought and RL.

Paradigm Shift arxiv | Mar 16

Adaptive VLM Routing reduces inference costs for Computer Use Agents by up to 78% with negligible accuracy loss.

Efficiency Breakthrough arxiv | Mar 16

Distills a 2B Vision-Language Retriever into a 70M text-only encoder for visual document retrieval with 50x lower latency.

Efficiency Breakthrough arxiv | Mar 16

Reveals that 'reasoning' gains in fine-tuned LLMs may be artifacts of task familiarity rather than improved capability.

Breaks Assumption arxiv | Mar 16

MotionAnymesh automatically transforms static 3D meshes into simulation-ready, articulated digital twins for robotics using vision-language models grounded in physical priors.

New Capability arxiv | Mar 16

ThinkStream introduces a 'Watch-Think-Speak' paradigm for video reasoning that allows models to incrementally update understanding and decide when to respond in real-time.

Paradigm Shift arxiv | Mar 16

This paper presents an exact federated unlearning protocol for foundation models that is pointwise identical to centralized retraining but uses fixed-size messages.

Breaks Assumption arxiv | Mar 16

CleanSight provides a training-free, test-time defense for backdoored vision-language models by detecting and pruning 'attention stealing' visual tokens.

Efficiency Breakthrough arxiv | Mar 16

This study proves that even with a 'perfect' noise transition matrix, statistically consistent noise-correction methods still suffer from performance collapse.

Breaks Assumption arxiv | Mar 16

Structured distillation for personalized agent memory achieves an 11x reduction in token count while preserving 96% of the retrieval quality of verbatim history.

Efficiency Breakthrough arxiv | Mar 16

Multimodal OCR (MOCR) treats charts, diagrams, and tables as code-level targets (e.g., TikZ, SVG) rather than just cropping them as pixels.

New Capability arxiv | Mar 16

A cross-dataset study reveals that modern general-purpose vision models (GP-VMs) outperform specialized medical architectures in 2D medical image segmentation.

Breaks Assumption arxiv | Mar 16

Connects DDIM reverse chains to fractal geometry, providing a mathematical explanation for why diffusion models switch from global context to local detail.

Paradigm Shift arxiv | Mar 16

Reveals that linearized attention never converges to the NTK limit in practice, explaining its unique 'influence malleability' compared to standard networks.

Breaks Assumption arxiv | Mar 16

Induces pretrained video models to perform SOTA image restoration using less than 2% of the training data required by specialized architectures.

Efficiency Breakthrough arxiv | Mar 16

Achieves 'zero-hyperparameter' circuit analysis by using a foundation model to perform in-context regression, bypassing hours of manual tuning.

Efficiency Breakthrough arxiv | Mar 16

Proposes Causal Process Reward (CPR) to fix 'cherry-picking' in MLLM reasoning by coupling answer correctness with step-level logical alignment.

Paradigm Shift arxiv | Mar 16

Introduces Bilateral Context Conditioning to DeepSeek's GRPO, allowing models to cross-reference successful and failed reasoning traces during optimization.

Efficiency Breakthrough arxiv | Mar 16

Enables RMSNorm to reuse MXFP8 block scales, shrinking the reduction operation by 32x and delivering a 2.4x kernel speedup.

Efficiency Breakthrough arxiv | Mar 16

Finds that privacy vulnerability and utility are both concentrated in a tiny fraction of 'critical weights' based on their location rather than value.

Breaks Assumption arxiv | Mar 16

STEVO-Bench reveals that current 'video world models' fail to simulate physical processes when the camera looks away or lights go out.

Breaks Assumption arxiv | Mar 16

Optimizes diffusion models via Direct Preference Optimization (DPO) to generate human motion that is inherently executable by real humanoid robots.

New Capability arxiv | Mar 16

Reimagines 3D molecules as continuous vector fields rather than discrete graphs, decoupling structure learning from atom types.

Paradigm Shift arxiv | Mar 16

Proves the existence of a 'distributional simplicity bias' in diffusion models, where low-order statistics are learned linearly while high-order correlations require cubic sample complexity.

Scaling Insight arxiv | Mar 16

Time moving forward might just be a glitch caused by the universe being bad at copying its own homework.

Paradigm Challenge arxiv | Mar 13

We’ve finally made digital messages that are physically impossible to copy: even a perfect hacker couldn't do it, because physics won't allow it.

Practical Magic arxiv | Mar 13

Scientists built an AI that treats crop-raiding elephants like chess opponents to predict exactly where they’ll strike next.

Nature Is Weird arxiv | Mar 13

The massive satellite network the government uses is accidentally blasting out people's private passwords in plain text for anyone to see.

Cosmic Scale arxiv | Mar 13

OpenSanctions Pairs releases a massive benchmark for entity matching, proving that local LLMs can now match production rule-based systems in high-stakes compliance tasks.

Open Release arxiv | Mar 13

Speculative Decoding Scaling Laws (SDSL) provides a theoretical framework to predict optimal throughput hyperparameters for LLM inference systems before pre-training.

Scaling Insight arxiv | Mar 13