AI & ML

1848 papers · Page 18 of 19

This theoretical work refutes the 'Garbage In, Garbage Out' mantra for modern ML, proving that high-dimensional model capacity can asymptotically overcome predictor error and structural uncertainty.

Breaks Assumption arxiv | Mar 16

Introduces the Budget-Sensitive Discovery Score (BSDS), a formally verified metric machine-checked in Lean 4 for evaluating AI-guided scientific candidate selection.

Paradigm Shift arxiv | Mar 16

ReBalance is a training-free framework that dynamically modulates 'thinking' length in reasoning models to prune redundancy during overthinking and promote exploration during underthinking.

Efficiency Breakthrough arxiv | Mar 16

This study proves that reasoning traces (Chain-of-Thought) causally shape model behavior and generalization, even when the final answer is held constant.

Breaks Assumption arxiv | Mar 16

SpectralGuard identifies a 'memory collapse' vulnerability in State Space Models (like Mamba) where adversarial inputs can drive the transition operator's spectral radius to zero.

Breaks Assumption arxiv | Mar 16

Surg-R1 is a specialized surgical reasoning model released alongside the largest surgical Chain-of-Thought dataset (320,000 pairs).

Open Release arxiv | Mar 16

This paper establishes a systematic protocol for 'stitching' heterogeneous Vision Foundation Models (e.g., CLIP and DINOv2) to share early layers while retaining specialized capabilities.

Paradigm Shift arxiv | Mar 16

Achieves 100x speedup in robotic action generation by distilling iterative flow/diffusion models into a one-step policy without a pre-trained teacher.

Efficiency Breakthrough arxiv | Mar 16

Introduces Modal Logical Neural Networks (MLNNs) as a differentiable logic layer that bridges deep learning with symbolic Kripke semantics for regulated AI.

Paradigm Shift arxiv | Mar 16

Demonstrates a robot that improves its own locomotion by identifying and physically 'self-destructing' redundant or inhibiting limbs during its lifetime.

Paradigm Shift arxiv | Mar 16

Enables training-free infinite video generation (hour-scale) by using evolving memory tokens to solve identity drift and motion stagnation.

New Capability arxiv | Mar 16

Reveals that standard global correlation metrics for LLM judges fail to predict success in 'best-of-n' selection tasks due to within-prompt signal loss.

Breaks Assumption arxiv | Mar 16
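
The failure mode is easy to reproduce in a toy simulation (my construction, not the paper's setup): a judge whose scores track prompt-level difficulty but carry no within-prompt signal shows a healthy global correlation while its best-of-n picks sit near chance.

```python
import numpy as np

rng = np.random.default_rng(0)
n_prompts, n_cands = 200, 8

# Shared prompt-level difficulty dominates both signals...
difficulty = rng.normal(scale=2.0, size=(n_prompts, 1))
true_quality = difficulty + rng.normal(size=(n_prompts, n_cands))
# ...but the judge's within-prompt component is pure noise.
judge_score = difficulty + rng.normal(size=(n_prompts, n_cands))

global_corr = np.corrcoef(true_quality.ravel(), judge_score.ravel())[0, 1]
# Fraction of prompts where the judge's top pick is the truly best candidate.
hit_rate = np.mean(judge_score.argmax(1) == true_quality.argmax(1))

print(f"global correlation: {global_corr:.2f}")       # high (~0.8)
print(f"best-of-{n_cands} hit rate: {hit_rate:.2f}")  # near chance (~1/8)
```

Averaged across prompts, the shared difficulty term inflates the correlation; within a prompt it cancels, and that within-prompt residual is exactly the signal best-of-n selection needs.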

Reduces Chain-of-Thought (CoT) compute costs by 14-55% by learning the optimal 'early-exit' points for Large Reasoning Models.

Efficiency Breakthrough arxiv | Mar 16
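
The early-exit idea fits in a few lines (the function, trace, and threshold below are illustrative assumptions, not the paper's learned policy): stop chain-of-thought decoding at the first checkpoint whose answer confidence clears a threshold, and skip the remaining steps.

```python
def early_exit_step(confidence_trace, threshold=0.9):
    """Return the first reasoning step whose answer confidence clears the
    threshold, falling back to the final step if none does."""
    for step, conf in enumerate(confidence_trace):
        if conf >= threshold:
            return step
    return len(confidence_trace) - 1

# Toy confidence trace that rises as reasoning progresses.
trace = [0.35, 0.55, 0.78, 0.92, 0.95, 0.96]
exit_step = early_exit_step(trace)
saved = 1 - (exit_step + 1) / len(trace)
print(exit_step, f"{saved:.0%} of steps skipped")  # 3, 33% of steps skipped
```

In a real system the threshold itself is what gets learned, trading saved compute against the risk of exiting before the answer has stabilized.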

Discovers that as LLMs scale, their complex non-linear depth dynamics converge into accurate, low-order linear surrogates.

Scaling Insight arxiv | Mar 16

Derives an exact, unbiased policy gradient for Reinforcement Learning on Diffusion LLMs, bypassing the need for sequence-level likelihood approximations.

Paradigm Shift arxiv | Mar 16

Shows that tool-augmented agents suffer from 'recommendation drift' where they provide unsafe advice under tool corruption while maintaining high ranking scores.

Breaks Assumption arxiv | Mar 16

Accelerates Diffusion Transformers (DiTs) by 2x using a training-free framework that selectively reduces computation in non-aesthetic image regions.

Efficiency Breakthrough arxiv | Mar 16

Challenges the standard practice of 'deep' multi-epoch PPO training by proving that consensus aggregation of 'wider' parallel runs is 8x more sample-efficient.

Breaks Assumption arxiv | Mar 16

Releases Feynman, an agentic pipeline and 100k-sample dataset for generating high-quality, knowledge-rich diagrams with grounded captions.

Open Release arxiv | Mar 16

Introduces the largest-ever multi-modal CAD dataset with 10 million annotations for 1 million models to enable geometric deep learning on BRep data.

Open Release arxiv | Mar 16

Unlocks Maximum Entropy RL for high-dimensional humanoid control, matching or doubling the performance of dominant deterministic baselines.

New Capability arxiv | Mar 16

Introduces a training-free framework that allows LLM agents to dynamically scale their reasoning depth based on a pre-defined token/tool budget.

Efficiency Breakthrough arxiv | Mar 16

Achieves a 98x speedup in LLM routing on AMD hardware using FlashAttention and prompt compression, enabling long-context classification without a dedicated GPU.

Efficiency Breakthrough arxiv | Mar 16

Proposes modeling the world in the feature space of frozen geometry foundation models instead of pixels, achieving 5x faster depth forecasting.

Paradigm Shift arxiv | Mar 16

A retrosynthesis model that explicitly learns strategic bond-disconnection reasoning via reinforcement learning with a round-trip accuracy reward.

New Capability arxiv | Mar 16

Longitudinal evidence reveals that successive ChatGPT versions produce increasingly similar outputs, suggesting potential model collapse from synthetic-data saturation.

Scaling Insight arxiv | Mar 16

A new system enables humanoid robots to play competitive tennis rallies with humans by learning from imperfect, fragmented motion data.

New Capability arxiv | Mar 16

Adversarial test case evolution improves code reinforcement learning by creating harder, more discriminative verification signals that drive better model performance.

Scaling Insight arxiv | Mar 16

Modality-level disaggregation enables cost-optimal MLLM serving across heterogeneous GPUs over commodity PCIe, bypassing the need for expensive NVLink interconnects.

Efficiency Breakthrough arxiv | Mar 16

Probing of Vision-Language-Action (VLA) models reveals that the action decoder largely ignores the reasoning logic in Chain-of-Thought, relying almost exclusively on object names.

Breaks Assumption arxiv | Mar 16

SciDesignBench provides a massive simulator-grounded environment for scientific inverse design, revealing that current LLMs struggle significantly with iterative refinement.

New Capability arxiv | Mar 16

A hardware-algorithm co-design for Spiking Neural Networks achieves up to 69x energy efficiency gains using an SRAM-based Compute-in-Memory accelerator.

Efficiency Breakthrough arxiv | Mar 16

The TaoBench benchmark proves that state-of-the-art math LLMs fail on equivalent logic problems when presented outside the standard 'MathLib' framework.

Breaks Assumption arxiv | Mar 16

A self-supervised robotic system detects novel objects by training bespoke detectors on-the-fly from human video demonstrations, bypassing language-based prompts.

New Capability arxiv | Mar 16

AIM enables post-training modulation of large models to change utility levels or focus features without any retraining or additional data.

New Capability arxiv | Mar 16

Achieves 4x visual token compression and 80% lower training cost while unifying multimodal comprehension and generation.

Efficiency Breakthrough arxiv | Mar 16

First training-free method for debiasing reward models using Sparse Autoencoder (SAE) interventions.

New Capability arxiv | Mar 16

Breaks the long-standing accuracy-robustness trade-off in VLMs by localizing adversarial robustness to shallow layers.

Breaks Assumption arxiv | Mar 16

A flow-based navigation policy that achieves zero-shot sim-to-real transfer across wheeled, quadrupedal, and humanoid platforms.

New Capability arxiv | Mar 16

A small-scale molecular reasoning model that outperforms ultra-large foundation models via structured chain-of-thought and RL.

Paradigm Shift arxiv | Mar 16

Adaptive VLM Routing reduces inference costs for Computer Use Agents by up to 78% with negligible accuracy loss.

Efficiency Breakthrough arxiv | Mar 16

Distills a 2B Vision-Language Retriever into a 70M text-only encoder for visual document retrieval with 50x lower latency.

Efficiency Breakthrough arxiv | Mar 16

Reveals that 'reasoning' gains in fine-tuned LLMs may be artifacts of task familiarity rather than improved capability.

Breaks Assumption arxiv | Mar 16

MotionAnymesh automatically transforms static 3D meshes into simulation-ready, articulated digital twins for robotics using vision-language models grounded in physical priors.

New Capability arxiv | Mar 16

ThinkStream introduces a 'Watch-Think-Speak' paradigm for video reasoning that allows models to incrementally update understanding and decide when to respond in real-time.

Paradigm Shift arxiv | Mar 16

This paper presents an exact federated unlearning protocol for foundation models that is pointwise identical to centralized retraining but uses fixed-size messages.

Breaks Assumption arxiv | Mar 16

CleanSight provides a training-free, test-time defense for backdoored vision-language models by detecting and pruning 'attention stealing' visual tokens.

Efficiency Breakthrough arxiv | Mar 16

This study proves that even with a 'perfect' noise transition matrix, statistically consistent noise-correction methods still suffer from performance collapse.

Breaks Assumption arxiv | Mar 16

Structured distillation for personalized agent memory achieves an 11x reduction in token count while preserving 96% of the retrieval quality of verbatim history.

Efficiency Breakthrough arxiv | Mar 16

Multimodal OCR (MOCR) treats charts, diagrams, and tables as code-level targets (e.g., TikZ, SVG) rather than just cropping them as pixels.

New Capability arxiv | Mar 16

A cross-dataset study reveals that modern general-purpose vision models (GP-VMs) outperform specialized medical architectures in 2D medical image segmentation.

Breaks Assumption arxiv | Mar 16

Connects DDIM reverse chains to fractal geometry, providing a mathematical explanation for why diffusion models switch from global context to local detail.

Paradigm Shift arxiv | Mar 16

Reveals that linearized attention never converges to the NTK limit in practice, explaining its unique 'influence malleability' compared to standard networks.

Breaks Assumption arxiv | Mar 16

Adapts pretrained video models to perform state-of-the-art image restoration using less than 2% of the training data required by specialized architectures.

Efficiency Breakthrough arxiv | Mar 16

Achieves 'zero-hyperparameter' circuit analysis by using a foundation model to perform in-context regression, bypassing hours of manual tuning.

Efficiency Breakthrough arxiv | Mar 16

Proposes Causal Process Reward (CPR) to fix 'cherry-picking' in MLLM reasoning by coupling answer correctness with step-level logical alignment.

Paradigm Shift arxiv | Mar 16

Introduces Bilateral Context Conditioning to DeepSeek's GRPO, allowing models to cross-reference successful and failed reasoning traces during optimization.

Efficiency Breakthrough arxiv | Mar 16

Enables RMSNorm to reuse MXFP8 block scales, shrinking its reduction operation by 32x and delivering a 2.4x kernel speedup.

Efficiency Breakthrough arxiv | Mar 16
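
The arithmetic behind the 32x figure can be sketched in NumPy (the block size of 32 and power-of-two scales follow MX conventions; everything else, including the toy quantizer, is my assumption rather than the paper's kernel): since each 32-element block already carries a shared scale, the sum of squares that RMSNorm needs can be assembled from per-block partial sums, leaving a final reduction over n/32 values instead of n.

```python
import numpy as np

BLOCK = 32

def mx_quantize(x):
    """Toy MX-style quantization: int8-range codes plus one power-of-two
    scale per 32-element block."""
    blocks = x.reshape(-1, BLOCK)
    scales = 2.0 ** np.ceil(np.log2(np.abs(blocks).max(axis=1, keepdims=True)))
    q = np.round(blocks / scales * 127)
    return q, scales

def rmsnorm_from_block_scales(q, scales, eps=1e-6):
    """RMSNorm whose sum of squares is reduced over n/32 block partials."""
    partial = (scales.squeeze(1) ** 2) * (q ** 2).sum(axis=1) / 127.0 ** 2
    rms = np.sqrt(partial.sum() / q.size + eps)
    return (q * scales / 127.0).ravel() / rms

x = np.random.default_rng(0).normal(size=256)
q, s = mx_quantize(x)
y = rmsnorm_from_block_scales(q, s)

# Agrees with a full-precision RMSNorm up to quantization error.
ref = x / np.sqrt((x ** 2).mean() + 1e-6)
print(np.abs(y - ref).max())
```

This only shows the partial-sum refactoring; the reported 2.4x speedup would come from kernel-level fusion, which the sketch does not model.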

Finds that privacy vulnerability and utility are both concentrated in a tiny fraction of 'critical weights' based on their location rather than value.

Breaks Assumption arxiv | Mar 16

STEVO-Bench reveals that current 'video world models' fail to simulate physical processes when the camera looks away or lights go out.

Breaks Assumption arxiv | Mar 16

Optimizes diffusion models via Direct Preference Optimization (DPO) to generate human motion that is inherently executable by real humanoid robots.

New Capability arxiv | Mar 16

Reimagines 3D molecules as continuous vector fields rather than discrete graphs, decoupling structure learning from atom types.

Paradigm Shift arxiv | Mar 16

Proves the existence of a 'distributional simplicity bias' in diffusion models, where low-order statistics are learned linearly while high-order correlations require cubic sample complexity.

Scaling Insight arxiv | Mar 16

Time moving forward might just be a glitch caused by the universe being bad at copying its own homework.

Paradigm Challenge arxiv | Mar 13

We’ve finally made digital messages that are physically impossible to copy—even a perfect hacker couldn't do it because physics won't allow it.

Practical Magic arxiv | Mar 13

Scientists built an AI that treats crop-raiding elephants like chess opponents to predict exactly where they’ll strike next.

Nature Is Weird arxiv | Mar 13

The massive satellite network the government uses is accidentally blasting out people's private passwords in plain text for anyone to see.

Cosmic Scale arxiv | Mar 13

OpenSanctions Pairs releases a massive benchmark for entity matching, proving that local LLMs can now match production rule-based systems in high-stakes compliance tasks.

Open Release arxiv | Mar 13

Speculative Decoding Scaling Laws (SDSL) provides a theoretical framework to predict optimal throughput hyperparameters for LLM inference systems before pre-training.

Scaling Insight arxiv | Mar 13
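
The flavor of such a framework can be illustrated with the standard speculative-decoding expectation (a textbook back-of-envelope, not necessarily this paper's formulation; the acceptance rate `a` and cost ratio `c` below are made-up numbers): a draft length `k` yields `(1 - a**(k+1)) / (1 - a)` expected tokens per target verification, so the throughput-optimal `k` can be found by a scan before any system is built.

```python
def tokens_per_unit_cost(k, a=0.8, c=0.05):
    """Expected accepted tokens per unit compute for draft length k,
    per-token acceptance rate a, and draft/target cost ratio c."""
    expected_tokens = (1 - a ** (k + 1)) / (1 - a)
    cost = 1 + k * c  # one target verification pass plus k draft passes
    return expected_tokens / cost

best_k = max(range(1, 33), key=tokens_per_unit_cost)
print(best_k, round(tokens_per_unit_cost(best_k), 2))  # 8 3.09
```

Pushing `k` higher keeps raising expected tokens, but the draft cost grows linearly while acceptance decays geometrically, which is why an interior optimum exists.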

This paper introduces a graph tokenization framework that allows standard Transformers like BERT to beat specialized Graph Neural Networks without any architectural changes.

Paradigm Shift arxiv | Mar 13

The first open recipe for training embodied intelligence at the 1,000-GPU scale, achieving a 40x speedup in training cycles for GR00T models.

Efficiency Breakthrough arxiv | Mar 13

Routing signatures reveal that MoE experts are highly task-specific, allowing a simple linear classifier to identify task categories with 92.5% accuracy based only on routing patterns.

Breaks Assumption arxiv | Mar 13
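
The routing-signature effect is easy to reproduce on synthetic data (everything below, from the expert preferences to the classifier, is a toy construction rather than the paper's setup): when two tasks mildly prefer different expert subsets, the per-prompt histogram of expert choices becomes separable by a plain linear rule.

```python
import numpy as np

rng = np.random.default_rng(0)
n_experts, tokens_per_prompt = 16, 128

def routing_histogram(task_pref):
    """Expert-usage histogram for one prompt under a task-specific routing bias."""
    choices = rng.choice(n_experts, size=tokens_per_prompt, p=task_pref)
    return np.bincount(choices, minlength=n_experts) / tokens_per_prompt

# Two tasks that mildly prefer disjoint halves of the expert pool.
pref_a = np.where(np.arange(n_experts) < 8, 1.5, 0.5)
pref_a /= pref_a.sum()
pref_b = pref_a[::-1]
train_a = np.stack([routing_histogram(pref_a) for _ in range(50)])
train_b = np.stack([routing_histogram(pref_b) for _ in range(50)])

# Nearest-centroid rule, i.e. a linear classifier in histogram space.
w = train_a.mean(0) - train_b.mean(0)
b = -0.5 * (train_a.mean(0) + train_b.mean(0)) @ w
predict = lambda h: h @ w + b > 0

test_a = [predict(routing_histogram(pref_a)) for _ in range(100)]
test_b = [predict(routing_histogram(pref_b)) for _ in range(100)]
acc = (np.mean(test_a) + 1 - np.mean(test_b)) / 2
print(f"task-ID accuracy from routing alone: {acc:.2f}")
```

Even this mild per-expert bias concentrates enough mass in each half of the histogram that the two tasks separate almost perfectly, which is the privacy-relevant point of the finding.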

A new method for training axis-aligned decision trees using gradient descent and backpropagation, allowing trees to be integrated into end-to-end neural networks.

New Capability arxiv | Mar 13
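
A depth-one version of the idea fits in a few lines (my construction, not the paper's algorithm): replace the hard axis-aligned test `x > t` with a sigmoid gate so the threshold and leaf values become differentiable, then recover a planted split by plain gradient descent.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-2, 2, size=256)
y = (x > 0.7).astype(float)          # ground-truth split at 0.7

t, leaf_lo, leaf_hi = 0.0, 0.0, 1.0  # threshold and leaf outputs
lr, temp = 0.5, 5.0                  # step size and gate sharpness

for _ in range(300):
    gate = 1.0 / (1.0 + np.exp(-temp * (x - t)))  # soft version of "x > t"
    pred = (1 - gate) * leaf_lo + gate * leaf_hi
    err = pred - y
    # Gradient descent on mean squared error; note d gate / d t carries a
    # factor of -temp * gate * (1 - gate), so the update on t becomes a +=.
    dgate = err * (leaf_hi - leaf_lo) * gate * (1 - gate) * temp
    t += lr * dgate.mean()
    leaf_lo -= lr * (err * (1 - gate)).mean()
    leaf_hi -= lr * (err * gate).mean()

print(f"learned threshold: {t:.2f}")  # close to 0.7
```

A full tree stacks these gates along root-to-leaf paths, and the temperature can be annealed so training ends at a hard, axis-aligned split that plugs into an end-to-end network.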

REOPOLD achieves 10x better sample efficiency in reasoning distillation, enabling 7B models to match 32B teachers with significantly less training data.

Efficiency Breakthrough arxiv | Mar 13

PACED introduces a weight kernel that focuses distillation on the 'Zone of Proximal Development,' where the student's gradient signal-to-noise ratio is highest.

Efficiency Breakthrough arxiv | Mar 13

Continual Representation Learning (CoRe) moves PEFT from weight-level updates to representation-space interventions, solving catastrophic forgetting in dynamic environments.

Paradigm Shift arxiv | Mar 13

Cyber-attack capabilities of AI models scale log-linearly with inference-time compute, with no plateau in sight.

Scaling Insight arxiv | Mar 13

SoLA introduces the first reversible model editing framework that allows precise revocation of specific knowledge updates.

New Capability arxiv | Mar 13

LLM-based user simulators create an 'easy mode' for agents that fails to capture real human frustration, ambiguity, and feedback nuances.

Breaks Assumption arxiv | Mar 13

Machine unlearning in LLMs is often a 'mirage' that can be bypassed using simple multi-hop reasoning or entity aliasing.

Breaks Assumption arxiv | Mar 13

InstantHDR achieves high-quality 3D HDR reconstruction 700x faster than current optimization-based methods.

Efficiency Breakthrough arxiv | Mar 13

Theoretical analysis proves that Langevin dynamics is fundamentally non-robust to score function errors, justifying the shift to Diffusion Models.

Paradigm Shift arxiv | Mar 13

HAPO resolves the advantage collapse problem in sparse-reward RL for reasoning models using a Thompson-sampled hindsight mechanism.

Paradigm Shift arxiv | Mar 13

Adversarial prompt injection causes jailbreak success rates to transition from polynomial to exponential scaling with inference-time samples.

Scaling Insight arxiv | Mar 13

RewardHackingAgents establishes a benchmark for evaluating whether ML-engineering agents are actually solving tasks or just tampering with the evaluation code.

New Capability arxiv | Mar 13

TimeSqueeze achieves 20x faster convergence and 8x higher data efficiency for time-series foundation models by using dynamic, content-aware patching.

Efficiency Breakthrough arxiv | Mar 13
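
One way to picture content-aware patching (a greedy heuristic of my own, not TimeSqueeze's actual mechanism): start from fixed 32-sample patches and keep merging neighbors while the covered span stays flat, so quiet stretches of the series collapse into a handful of tokens while busy stretches keep fine resolution.

```python
import numpy as np

def adaptive_patches(x, base=32, thresh=0.1):
    """Greedily extend each patch by `base` samples while the covered
    span's standard deviation stays below `thresh`."""
    patches, start, n = [], 0, len(x)
    while start < n:
        end = min(start + base, n)
        while end < n and np.std(x[start:min(end + base, n)]) < thresh:
            end = min(end + base, n)
        patches.append((start, end))
        start = end
    return patches

t = np.linspace(0, 10, 1024)
x = np.where(t < 7, 0.0, 2 * np.sin(20 * t))  # flat, then oscillating
patches = adaptive_patches(x)
print(len(patches), "adaptive patches vs", 1024 // 32, "fixed patches")
```

The token count now tracks information content rather than raw length, which is the intuition behind trading fixed patching for a dynamic scheme.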

MirrorDrift demonstrates a successful SLAM-targeted attack on production-grade 'secure' LiDARs using simple actuated mirrors rather than complex signal injection.

Breaks Assumption arxiv | Mar 13

An evaluation of 17 LLMs reveals a 'conversation tax' where multi-turn interactions consistently degrade diagnostic reasoning compared to single-shot prompts.

Breaks Assumption arxiv | Mar 13

This paper introduces Finsler geometry to manifold learning, allowing for the capture of asymmetric data relationships like density hierarchies that Riemannian methods ignore.

Paradigm Shift arxiv | Mar 13

Re-evaluating high-profile medical AI safety claims reveals that reported triage failures were artifacts of the 'exam-style' evaluation format rather than model incapacity.

Breaks Assumption arxiv | Mar 13

DART enables real-time multi-class detection for open-vocabulary models like SAM3, achieving up to 25x speedup without any weight modifications.

Efficiency Breakthrough arxiv | Mar 13

Softmax normalization mathematically mandates the creation of attention sinks to serve as 'null states' when models need to ignore input.

Breaks Assumption arxiv | Mar 13
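
The claim follows directly from softmax's sum-to-one constraint, which a short numerical check makes concrete (the scores below are arbitrary toy values): a head can never assign zero total attention to its input, so the only way to "ignore everything" is to park the mass on a sink position.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

content_scores = np.array([-4.0, -5.0, -4.5])  # head wants to ignore all tokens

no_sink = softmax(content_scores)
print(no_sink.sum())          # exactly 1.0: the mass has nowhere else to go

# Prepend a sink position with a high score; content weights collapse.
with_sink = softmax(np.concatenate(([6.0], content_scores)))
print(with_sink[1:].sum())    # near 0: the sink absorbs the attention
```

This is one reading of why sink tokens show up empirically at early positions in trained Transformers: the head needs somewhere harmless to dump probability mass it cannot destroy.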

LongFlow provides an 11x throughput boost for reasoning models by specifically optimizing KV cache for long-output (vs long-input) scenarios.

Efficiency Breakthrough arxiv | Mar 13

Manifold-Optimal Guidance reformulates Classifier-Free Guidance (CFG) as a Riemannian control problem, eliminating the artifacts and saturation typical of high guidance scales.

Paradigm Shift arxiv | Mar 13

Tiny Aya is a 3.35B parameter multilingual model that achieves state-of-the-art results across 70 languages, challenging the need for massive scale in global AI.

Open Release arxiv | Mar 13

An empirical study reveals that models under 7B parameters have a fundamental utilization bottleneck that prevents them from using retrieved context effectively.

Breaks Assumption arxiv | Mar 13

Mobile-GS achieves real-time Gaussian Splatting on mobile devices by replacing the sorting-based alpha-blending bottleneck with depth-aware order-independent rendering.

Efficiency Breakthrough arxiv | Mar 13

Expert Threshold Routing (ET) replaces standard top-k token-choice with an independent thresholding mechanism, achieving 1.6x faster training convergence.

Paradigm Shift arxiv | Mar 13

RoboClaw introduces 'Entangled Action Pairs' to allow robots to autonomously collect data by learning to reset their own environment.

New Capability arxiv | Mar 13

The discovery of 'Helicoid Dynamics' identifies a critical safety failure where frontier LLMs accurately name their reasoning errors but are structurally unable to stop repeating them.

Breaks Assumption arxiv | Mar 13