LLM-generated summaries can produce patient embeddings that are more 'portable' and robust to hospital distribution shifts than specialized clinical models.
Paradigm Shift arxiv | Mar 26
A systematic critique explaining why 'self-improving' generative optimization loops fail in production and how to fix them.
Breaks Assumption arxiv | Mar 26
SDZE enables the training of 10-million-dimensional Physics-Informed Neural Networks (PINNs) on a single GPU.
New Capability arxiv | Mar 26
Reduces Text-to-SQL input tokens by 99% by internalizing the database schema into the model weights through a two-phase fine-tuning approach.
Efficiency Breakthrough arxiv | Mar 26
Solves the 'vanishing gradient' problem in 3D Gaussian Splatting (3DGS) tracking by optimizing in the frequency domain using spectral moments.
New Capability arxiv | Mar 26
Restores editable, semantically layered structures from flattened vector graphics (SVGs/icons) by using generative completion to recover occluded geometries.
New Capability arxiv | Mar 26
MoE-Sieve reduces Mixture-of-Experts LoRA fine-tuning parameters and training time by ~70% by only adapting the most-frequently activated 'hot' experts.
Efficiency Breakthrough arxiv | Mar 26
Identifies that 'attention imbalance' across modalities and tokens drives object hallucinations and proposes a decoding-time rectification (AIR) to fix it.
New Capability arxiv | Mar 26
SOMA provides a plug-and-play memory and orchestration system that increases Vision-Language-Action (VLA) robot success rates by over 50% without fine-tuning.
New Capability arxiv | Mar 26
LLMpedia exposes a massive gap in LLM factuality by generating 1M articles from parametric memory, revealing that open-ended knowledge recall is more than 15% lower than multiple-choice benchmarks suggest.
Breaks Assumption arxiv | Mar 26
Proves that RLHF and DPO alignment cause 'response homogenization,' which effectively breaks standard sampling-based uncertainty estimation methods.
Breaks Assumption arxiv | Mar 26
Formalizes 'likelihood hacking,' a failure mode where RL-trained models learn to generate unnormalized probabilistic programs to artificially inflate rewards.
Paradigm Shift arxiv | Mar 26
Achieves up to 400x speedup and 64x memory reduction for open-vocabulary 3D scene understanding compared to current Gaussian Splatting methods.
Efficiency Breakthrough arxiv | Mar 26
Enables 1000x faster on-chip training for Weightless Neural Networks (WNNs) on FPGAs with drastically lower power consumption.
Efficiency Breakthrough arxiv | Mar 26
Provides a systematic blueprint for scaling Reinforcement Learning (RL) in LLMs using multi-turn synthetic data generation and difficulty-based curricula.
Scaling Insight arxiv | Mar 26
A model-agnostic framework to boost time-series forecasting by aligning internal representations with those of pretrained foundation models.
Paradigm Shift arxiv | Mar 26
Breaks the resolution and aspect ratio barriers of image diffusion models, enabling the generation of consistent 32K resolution images.
New Capability arxiv | Mar 26
Unifies input and predicted meshes under a shared topological framework to enable high-fidelity 3D reconstruction with sharp features.
Paradigm Shift arxiv | Mar 26
Releases a high-quality, 92K-sentence parallel dataset for Hindi-Sanskrit translation focusing on contemporary and spoken language.
Open Release arxiv | Mar 26
Quantifies an emergent 'self' in robots as an invariant subnetwork that persists across continual learning of variable tasks.
Paradigm Shift arxiv | Mar 26
Applies reinforcement learning with a cycle-consistency reward to drastically improve natural language to Lean4 autoformalization.
New Capability arxiv | Mar 26
A 5M-parameter OCR model that rivals billion-parameter vision-language models, proving data-centric curation can beat raw parameter scale.
Efficiency Breakthrough arxiv | Mar 26
Reformulates molecular discovery as an autonomous MCTS planning problem over executable chemical operations rather than just similarity-based prediction.
New Capability arxiv | Mar 26
Identifies a 'critical threshold' in human-AI symbiosis beyond which human capability collapses abruptly and irreversibly due to over-delegation.
Scaling Insight arxiv | Mar 26
Moves automated research from stateless linear pipelines to a persistent Research World Model that maintains a self-correcting knowledge graph of gaps and methods.
Paradigm Shift arxiv | Mar 26
Achieves high-fidelity sub-seasonal weather forecasting with a 276M parameter model that matches 1.6B parameter baselines in accuracy and speed.
Efficiency Breakthrough arxiv | Mar 26
Releases 55 hours of continuous 30fps expert human computer-use videos to address the 'missing ingredient' for desktop automation agents.
Open Release arxiv | Mar 26
Introduces a 'sorry-driven' formal decomposition that allows LLM agents to solve complex proofs by isolating and independently verifying subgoals.
Paradigm Shift arxiv | Mar 26
Reveals that self-distillation degrades out-of-distribution reasoning by suppressing 'epistemic verbalization' (the model's expression of uncertainty).
Breaks Assumption arxiv | Mar 26
Enforces hard incompressibility constraints in neural operators using spectral Leray projection, ensuring physically admissible fluid simulations.
Paradigm Shift arxiv | Mar 26
An autonomous agentic pipeline discovers novel white-box adversarial attacks that outperform existing methods by up to 300%.
New Capability arxiv | Mar 26
Agentic Variation Operators (AVO) replace fixed evolutionary heuristics with coding agents to discover GPU kernels that outperform FlashAttention-4 by 10.5%.
Efficiency Breakthrough arxiv | Mar 26
UI-Voyager achieves an 81.0% success rate on AndroidWorld, exceeding human-level performance in mobile GUI automation.
New Capability arxiv | Mar 26
LensWalk introduces a 'reason-plan-observe' loop that allows agents to dynamically control the temporal sampling and density of the videos they analyze.
Paradigm Shift arxiv | Mar 26
The Free-Market Algorithm (FMA) is a zero-parameter metaheuristic that discovers complex pathways in chemistry and economics through emergent supply-and-demand dynamics.
Paradigm Shift arxiv | Mar 26
VFIG enables high-fidelity conversion of rasterized technical figures into editable, scalable SVGs using a new 66K-pair dataset.
Open Release arxiv | Mar 26
MARCH eliminates 'LLM-as-a-judge' confirmation bias by using information asymmetry to force verification agents to check claims without seeing the original response.
Paradigm Shift arxiv | Mar 26
DreamerAD accelerates imagination-based training for autonomous driving by 80x, compressing 100-step diffusion sampling down to a single step.
Efficiency Breakthrough arxiv | Mar 26
The Multilevel Euler-Maruyama (ML-EM) method allows diffusion models to perform sampling at the computational cost of a single model evaluation.
Efficiency Breakthrough arxiv | Mar 26
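The diffusion-model specifics aren't in this summary, but the multilevel Euler-Maruyama idea itself can be sketched on a toy SDE (geometric Brownian motion): a telescoping sum of coupled coarse/fine simulations concentrates most samples on the cheapest level. Function name and parameters here are illustrative, not the paper's:

```python
import numpy as np

def mlmc_em_mean(x0=1.0, mu=0.05, sigma=0.2, T=1.0, L=4, n_samples=20000, seed=0):
    """Multilevel Monte Carlo with Euler-Maruyama for E[X_T] of GBM:
    E[P_L] = E[P_0] + sum_l E[P_l - P_{l-1}], with coupled Brownian paths."""
    rng = np.random.default_rng(seed)
    est = 0.0
    for l in range(L + 1):
        nf = 2**l                          # fine time steps at this level
        dtf = T / nf
        dW = rng.normal(0.0, np.sqrt(dtf), size=(n_samples, nf))
        xf = np.full(n_samples, x0)
        for i in range(nf):                # fine Euler-Maruyama path
            xf = xf + mu * xf * dtf + sigma * xf * dW[:, i]
        if l == 0:
            est += xf.mean()
        else:
            nc = nf // 2
            dtc = T / nc
            dWc = dW[:, 0::2] + dW[:, 1::2]  # coarse increments from fine ones
            xc = np.full(n_samples, x0)
            for i in range(nc):            # coupled coarse path
                xc = xc + mu * xc * dtc + sigma * xc * dWc[:, i]
            est += (xf - xc).mean()        # level-l correction
    return est
```

For GBM the true mean is x0 * exp(mu * T), so the estimate can be checked against a closed form; the coupling makes each correction term cheap to estimate accurately.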
Wasserstein Parallel Transport provides a formal framework for counterfactual prediction in evolving probability distributions.
New Capability arxiv | Mar 26
A new AI researcher agent fact-checks real math papers and points out exactly where the professors slipped up.
Paradigm Challenge arxiv | Mar 25
Only about 10% of the code behind high-profile Nature papers actually runs if you try it yourself.
Paradigm Challenge arxiv | Mar 25
Researchers tricked a robot into handing someone a knife instead of an apple using nothing but a printed drink coaster.
Practical Magic arxiv | Mar 25
Your AI assistant's memory can be silently manipulated by emails in your inbox, changing how it treats you without you ever knowing.
Nature Is Weird arxiv | Mar 25
Wireless internet that performs like a physical cable: no added lag, no matter how many devices the signal hops through.
Practical Magic arxiv | Mar 25
Effective semantic alignment for low-resource languages can be achieved with only 10,000 noisy synthetic pairs, matching the performance of models trained on 1 million samples.
Breaks Assumption arxiv | Mar 25
Mechanistic interpretability reveals that LLMs possess 'affect reception' circuits that detect emotional content even when explicit keywords are removed.
Paradigm Shift arxiv | Mar 25
Sparse Feature Attention (SFA) reduces attention costs from quadratic in sequence length and linear in dimension to a fraction based on feature sparsity, enabling 2.5x speedups.
Efficiency Breakthrough arxiv | Mar 25
Hidden states in LLMs occupy a Riemannian submanifold where tokens correspond to Voronoi regions, revealing a universal 'hourglass' intrinsic dimension profile across all tested models.
Scaling Insight arxiv | Mar 25
Forcing AI agents to use human-comprehensible language causes a 50% efficiency drop compared to their own 'inscrutable' communication protocols.
Breaks Assumption arxiv | Mar 25
Standard quantization destroys the small parameter 'deltas' that encode post-training knowledge; Delta-Aware Quantization (DAQ) fixes this by optimizing for sign preservation.
Efficiency Breakthrough arxiv | Mar 25
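DAQ's exact objective isn't spelled out in this summary; the following is a hypothetical sketch of sign-preserving delta quantization (the function name, grid, and rounding rule are assumptions, not the paper's method):

```python
import numpy as np

def quantize_delta_sign_preserving(base, finetuned, bits=4):
    """Hypothetical sketch: quantize the fine-tuning delta (finetuned - base)
    to a symmetric integer grid with a rounding rule that never flips the sign
    of a nonzero delta (tiny deltas round away from zero, not to zero)."""
    delta = finetuned - base
    qmax = 2 ** (bits - 1) - 1                                # e.g. 7 at 4 bits
    scale = max(float(np.max(np.abs(delta))), 1e-12) / qmax
    q = np.round(delta / scale)
    q = np.where((q == 0) & (delta != 0), np.sign(delta), q)  # preserve sign
    q = np.clip(q, -qmax, qmax)
    return base + q * scale                                   # dequantized weights
```

The point of the headline is that naive rounding sends many small post-training deltas to zero (or flips their sign), erasing exactly the parameters that encode the fine-tuned behavior.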
Hybrid Associative Memory (HAM) layers allow the KV cache to grow dynamically based only on information that an internal RNN cannot predict.
Efficiency Breakthrough arxiv | Mar 25
Small adapters can provide frozen decoder-only LLMs with persistent latent-space memory that survives across separate sessions.
New Capability arxiv | Mar 25
The standard 'Chinchilla Approach 2' for fitting scaling laws is systematically biased, potentially leading to millions of dollars in wasted compute at frontier scales.
Scaling Insight arxiv | Mar 25
Gradient boosting exhibits a 'first-mover bias' where correlated features selected early in the tree sequence gain an artificial, self-reinforcing importance in SHAP rankings.
Paradigm Shift arxiv | Mar 25
Introduces a framework for LLMs to self-improve reasoning in specific domains by autonomously mining and constructing training environments directly from the open web.
New Capability arxiv | Mar 25
Establishes a formal mathematical equivalence between Classifier-Free Guidance (CFG) and alignment-based objectives, allowing for CFG-like quality without inference-time overhead.
Paradigm Shift arxiv | Mar 25
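For reference, the inference-time overhead this equivalence would remove comes from standard CFG, which combines two noise predictions, and therefore two model forward passes, at every denoising step:

```python
import numpy as np

def cfg_predict(eps_cond, eps_uncond, w):
    """Standard classifier-free guidance: extrapolate the conditional noise
    prediction away from the unconditional one. Each input is a separate
    model forward pass, hence the ~2x per-step inference cost."""
    return eps_uncond + w * (eps_cond - eps_uncond)
```

At w = 0 this reduces to the unconditional prediction and at w = 1 to the conditional one; the claimed equivalence bakes the extrapolation into training so only one pass is needed at inference.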
Proposes an agentic architecture that achieves O(1) token complexity relative to dataset size by strictly separating intent parsing from deterministic data execution.
Efficiency Breakthrough arxiv | Mar 25
Achieves high-fidelity diffusion generation in just 3 steps by distilling layer-wise time embeddings from reference trajectories.
Efficiency Breakthrough arxiv | Mar 25
Finds that nominal instruction-tuning with LoRA often fails to improve (and can even degrade) verifiable instruction-following despite improvements on broader benchmarks.
Breaks Assumption arxiv | Mar 25
Shifts symbolic regression from discrete genetic search to a continuous, embedding-driven optimization paradigm.
Paradigm Shift arxiv | Mar 25
Reveals that RLVR-driven reasoning improvements in LLMs are the result of highly sparse changes to a tiny fraction of 'critical' token distributions.
Scaling Insight arxiv | Mar 25
Identifies that the full source code (skill body) of a tool is the primary signal for LLM tool selection, far outweighing the importance of descriptions or metadata.
Breaks Assumption arxiv | Mar 25
Replaces standard autoregressive document OCR with a parallel diffusion-based denoising framework.
Paradigm Shift arxiv | Mar 25
Introduces a verifier that operates directly on the latent hidden states of Diffusion Transformers, avoiding the need for costly pixel-space decoding during inference-time scaling.
Efficiency Breakthrough arxiv | Mar 25
Demonstrates that Hebbian plasticity can induce emergent attractor dynamics in robot controllers, enabling rapid adaptation without backpropagation.
Paradigm Shift arxiv | Mar 25
Uncovers that neural operator digital twins are acutely vulnerable to sparse adversarial perturbations on boundary conditions that bypass standard anomaly detection.
Breaks Assumption arxiv | Mar 25
Leverages unstructured clinical notes during training to boost the performance of models that are deployed using only structured EHR data.
New Capability arxiv | Mar 25
Shows that bipedal robot mass scales with the square of leg length, rather than the cubic scaling found in biological systems.
Scaling Insight arxiv | Mar 25
CanViT is the first task-agnostic active-vision foundation model that reconstructs scenes using low-resolution 'glimpses' with 19.5x fewer FLOPs than existing models.
New Capability arxiv | Mar 25
A large-scale study of 12 reasoning models reveals that internal 'thinking' processes frequently recognize deceptive hints while the final output remains sycophantic.
Breaks Assumption arxiv | Mar 25
Instead of using top-activating examples, this method steers Sparse Autoencoder (SAE) features in Vision-Language Models to let the model describe its own internal visual features.
Paradigm Shift arxiv | Mar 25
DeIllusionLLM introduces task-level autoregressive reasoning to prevent LLMs from hallucinating answers to ill-posed or faulty scientific questions.
Paradigm Shift arxiv | Mar 25
CAM3R is a camera-agnostic 3D reconstruction model that handles fisheye, panoramic, and pinhole imagery without requiring prior calibration.
New Capability arxiv | Mar 25
Inter-Layer Structural Encoders (ILSE) use Cayley graphs to aggregate features from all internal LLM layers, improving accuracy by up to 44% over final-layer-only predictions.
Paradigm Shift arxiv | Mar 25
Introduces the first high-performing open-source metric for per-sample AI music quality evaluation.
Open Release arxiv | Mar 25
Provides a massive 2.5M image-to-TikZ dataset and the first instruction-augmented dataset for geometric visual reasoning.
Open Release arxiv | Mar 25
A new statistical test that reliably detects whether a dataset was NOT used in an LLM's training corpus.
New Capability arxiv | Mar 25
Introduces Dual Q-DM, the first non-adversarial imitation learning method theoretically guaranteed to eliminate compounding errors.
Paradigm Shift arxiv | Mar 25
A quantitative model that predicts the performance gain of merging independent LLM specialists before committing compute.
Scaling Insight arxiv | Mar 25
Proves that logic and lookup-table (LUT) based neural networks are structurally more resilient to hardware bit-flips than standard architectures.
Breaks Assumption arxiv | Mar 25
Identifies the 'Caterpillar Tree' as the theoretically optimal structure for test-time computation and backtracking in LLMs.
Scaling Insight arxiv | Mar 25
ABSTRAL automates the design of multi-agent systems by treating architectures as evolving, inspectable natural-language documents.
New Capability arxiv | Mar 25
Frontier models' reasoning steps are largely 'decorative' and do not causally determine the final answer in most tasks.
Breaks Assumption arxiv | Mar 25
Moving beyond coarse reward signals, this paper introduces token-level policy optimization for multimodal reasoning.
Paradigm Shift arxiv | Mar 25
UniQueR reconstructs full 3D scenes (including occluded areas) from unposed images in a single forward pass.
New Capability arxiv | Mar 25
Persistent structural memory in neural networks is fundamentally limited by the instability of jointly-learned coordinate systems.
Scaling Insight arxiv | Mar 25
Deep semi-parametric models allow for the instant deletion of training data from a model without retraining or parameter updates.
New Capability arxiv | Mar 25
A 0.26M parameter model using continuous dynamics outperforms 27M parameter recursive models on complex logic tasks like Sudoku-Extreme.
Efficiency Breakthrough arxiv | Mar 25
Standard confidence calibration is structurally biased when ground truth labels are ambiguous or annotators disagree.
Breaks Assumption arxiv | Mar 25
Agile-VLA enables high-frequency robot control on edge devices by decoupling perception from action through implicit affordance anchoring.
Efficiency Breakthrough arxiv | Mar 25
EchoKV introduces a reversible KV cache compression scheme that allows LLMs to switch back to full-precision inference on-demand.
Efficiency Breakthrough arxiv | Mar 25
ForestPrune achieves up to 90% token reduction in video MLLMs with minimal accuracy loss using a training-free spatial-temporal forest modeling approach.
Efficiency Breakthrough arxiv | Mar 25
Theoretical analysis reveals that the efficiency benefits of low-dimensional data structures for diffusion models diminish significantly when the data manifold is non-linear.
Scaling Insight arxiv | Mar 25
This paper moves LLMs from point predictions to set-valued predictions with rigorous statistical coverage guarantees.
Paradigm Shift arxiv | Mar 25
WorldMesh generates consistent, large-scale 3D worlds by populating a geometric mesh scaffold with image diffusion-derived content.
New Capability arxiv | Mar 25
Graph Foundation Models (GFMs) are shown to fail when using fixed architectural backbones, requiring a new approach of inference-time architecture adaptivity.
Breaks Assumption arxiv | Mar 25
Access to conversational memory allows an 8B model to outperform a 235B model on user-specific queries while reducing inference costs by 96%.
Scaling Insight arxiv | Mar 25
A rigorous evaluation shows that simple Probabilistic Circuits often outperform complex diffusion-based models for tabular data generation at a fraction of the cost.
Breaks Assumption arxiv | Mar 25
Optimizing autoregressive image models with Group Relative Policy Optimization (GRPO) achieves competitive quality without the 2x inference cost of Classifier-Free Guidance.
Efficiency Breakthrough arxiv | Mar 25