Papers that puncture a smaller working assumption inside a field: not a wholesale paradigm shift, but a load-bearing belief that turns out to be wrong.
AI
Challenges the entire foundation of Spectral Graph Neural Networks, proving their success is due to implementation quirks rather than spectral theory.
Shows that State Space Models (SSMs) like Mamba can match or beat Vision Transformers as vision encoders in VLMs while being more stable.
A mechanistic study reveals that Vision-Language-Action (VLA) models are dominated by visual pathways and often ignore language when visual context is sufficient.
A rigorous re-evaluation shows that a simple linear PCA baseline matches or outperforms SOTA Deep Learning models for multivariate time series anomaly detection.
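A minimal sketch of such a baseline (my reconstruction with scikit-learn, not the paper's code): fit PCA on normal multivariate data, then score test points by how badly the linear subspace reconstructs them.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
train = rng.normal(size=(1000, 8))                     # stand-in for normal sensor data
test = np.vstack([rng.normal(size=(95, 8)),
                  rng.normal(loc=6.0, size=(5, 8))])   # 5 injected anomalies

pca = PCA(n_components=3).fit(train)
recon = pca.inverse_transform(pca.transform(test))
scores = np.square(test - recon).sum(axis=1)           # per-point reconstruction error
print(np.where(scores > np.quantile(scores, 0.95))[0]) # indices flagged as anomalous
```

Points far from the learned subspace reconstruct poorly, so the injected anomalies dominate the top error scores with no deep model involved.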
Uses SMT solvers to formally verify the physical consistency of tree-based ML models across their entire input domain.
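A hedged sketch of the idea with z3 (the toy tree and the "power is never negative" property are invented for illustration): the tree becomes nested If-expressions, and the solver searches the entire input domain for a counterexample rather than testing sampled inputs.

```python
from z3 import Real, Solver, If, And, sat

temp, load = Real("temp"), Real("load")
# Tiny regression tree; leaf values are predicted power.
tree = If(load < 0.5,
          If(temp < 20, 0.1, 0.4),
          If(temp < 20, 0.3, 0.9))

s = Solver()
s.add(And(temp >= -40, temp <= 60, load >= 0, load <= 1))  # full input domain
s.add(tree < 0)                                            # look for a violation
if s.check() == sat:
    print("violation found:", s.model())
else:
    print("property holds over the entire input domain")
```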
Provides a formal proof and empirical evidence that Transformers can learn symbolic rules entirely absent from training, debunking the 'stochastic parrot' interpolation-only hypothesis.
Identifies a fundamental conflict in Direct Preference Optimization (DPO) for unified models, where image generation quality resists alignment while understanding improves.
Reveals that cross-lingual knowledge failure in large reasoning models is primarily a script-translation barrier rather than a linguistic or reasoning deficit.
Exposes 'hidden clones' in VLM ensembles, where models from the same family share correlated errors that naive voting mechanisms fail to detect.
Internal activation probing detects LLM 'rationalization' more reliably than monitoring the model's own Chain-of-Thought (CoT).
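A minimal sketch of the probing setup, assuming access to per-response hidden states and rationalization labels (the shapes and random data below are placeholders, and the paper's layer choice will differ):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
acts = rng.normal(size=(2000, 4096))     # hidden states, one row per response
labels = rng.integers(0, 2, size=2000)   # 1 = labeled as rationalization

# A linear probe reads the internal state directly, bypassing the CoT text.
probe = LogisticRegression(max_iter=1000).fit(acts[:1500], labels[:1500])
print("held-out accuracy:", probe.score(acts[1500:], labels[1500:]))
```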
Alignment processes induce a 'normative bias' that makes LLMs worse at predicting real human behavior in strategic scenarios.
Identifies that reasoning-induced safety failures occur *during* Chain-of-Thought and proposes a shift to 'decide-then-reason' architectures.
Develops a zero-watermarking framework that survives AI editing by leveraging invariant relations between image patches.
Dense retrieval architectures are fundamentally flawed at detecting negation and contradictions due to 'Semantic Collapse' in vector space.
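The effect is easy to probe with any off-the-shelf dense bi-encoder (the model choice here is mine, not the paper's):

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")
a, b = "The drug is effective.", "The drug is not effective."
emb = model.encode([a, b], normalize_embeddings=True)
print(util.cos_sim(emb[0], emb[1]))  # typically > 0.8 despite opposite meanings
```

Because the two sentences land almost on top of each other in vector space, a similarity-ranked retriever has no reliable way to separate a claim from its negation.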
ARES demonstrates high-fidelity data reconstruction from large Federated Learning batches without requiring any architectural modifications to the model.
FINER discovers that MLLMs are highly prone to hallucination when images contain fine-grained mismatches co-occurring with real elements.
Massive activation outliers in Transformers are an adaptive response to 'gradient sinks' during training, rather than just an inference-time quirk.
In-context memory for LLMs is fundamentally unreliable due to compaction loss and goal drift, but structured 'Knowledge Objects' provide a 252x cheaper and 100% accurate alternative.
Concept erasure in text-to-image models is largely a facade that can be bypassed using text-free inversion attacks.
Large Language Models can maintain performance with only 16-64 unique weight values per matrix, as only the relative rank of weights matters.
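A sketch of that rank-only view (the quantile codebook below is my choice of rank-preserving map, not necessarily the paper's): replace each weight with its bucket mean, leaving k unique values per matrix while keeping the relative order of weights intact.

```python
import numpy as np

def rank_quantize(w: np.ndarray, k: int = 16) -> np.ndarray:
    flat = w.ravel()
    edges = np.quantile(flat, np.linspace(0, 1, k + 1))
    bucket = np.clip(np.searchsorted(edges, flat, side="right") - 1, 0, k - 1)
    codebook = np.array([flat[bucket == i].mean() for i in range(k)])
    return codebook[bucket].reshape(w.shape)

w = np.random.default_rng(0).normal(size=(64, 64))
wq = rank_quantize(w, k=16)
print(len(np.unique(wq)))                             # 16 unique values
order = np.argsort(w.ravel())
print(bool(np.all(np.diff(wq.ravel()[order]) >= 0)))  # True: relative rank preserved
```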
Self-reflective program search matches or outperforms recursive language models for long-context tasks, suggesting recursion itself is not the primary driver of performance.
Theoretical and empirical evidence suggests that the 'Key' mechanism in Attention may be redundant, proposing a 'QV' paradigm that simplifies Transformer architectures.
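A minimal sketch of what a 'QV' layer could look like (my reading of the summary, not the paper's code): drop the Key projection and reuse the Query projection on both sides of the attention score.

```python
import torch

class QVAttention(torch.nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.q = torch.nn.Linear(dim, dim, bias=False)  # shared for Q and "K"
        self.v = torch.nn.Linear(dim, dim, bias=False)
        self.scale = dim ** -0.5

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (batch, seq, dim)
        q, v = self.q(x), self.v(x)
        attn = torch.softmax(q @ q.transpose(-2, -1) * self.scale, dim=-1)
        return attn @ v

print(QVAttention(64)(torch.randn(2, 10, 64)).shape)  # torch.Size([2, 10, 64])
```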
Robot policy performance can be improved by up to 60% by identifying a single 'golden ticket' constant noise vector instead of sampling from a Gaussian.
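A sketch of the search as I read it (the rollout evaluation below is a placeholder for a real environment): score a pool of candidate constant noise vectors once, then reuse the single best one at deployment instead of resampling from a Gaussian.

```python
import numpy as np

rng = np.random.default_rng(0)

def rollout_return(noise: np.ndarray) -> float:
    # Placeholder: in practice, run the policy with this fixed noise vector
    # in a simulator and return the episode reward.
    return -float(np.square(noise - 0.3).sum())

candidates = rng.normal(size=(256, 32))        # 256 fixed noise vectors
scores = [rollout_return(z) for z in candidates]
golden = candidates[int(np.argmax(scores))]    # the 'golden ticket'
print("best candidate score:", max(scores))
```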
Reveals that models with identical predictive performance produce fundamentally different feature attributions based solely on their hypothesis class.
Provides empirical evidence that structural sparsity in Vision Transformers does not lead to improved semantic interpretability.
Releases 70B parameter models that operate entirely on bytes, effectively 'liberating' LLMs from static tokenizers.
Provides the first formal proof that safety is non-compositional, meaning two individually safe AI agents can become hazardous when combined.
Challenges the standard use of bilinear/bicubic interpolation for upsampling saliency maps, proving it creates spurious importance regions and proposing a mass-redistribution alternative.
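A sketch of the contrast (the block-constant rule below is my simplification of mass redistribution): spreading each coarse cell's mass evenly over its upsampled block conserves total importance and cannot create importance where there was none, whereas bilinear interpolation leaks nonzero values into neighboring regions whose coarse attribution was zero.

```python
import numpy as np

def mass_redistribute(sal: np.ndarray, s: int) -> np.ndarray:
    # Block-constant upsampling, divided by block area to conserve mass.
    return np.kron(sal, np.ones((s, s))) / (s * s)

coarse = np.zeros((4, 4))
coarse[1, 1] = 1.0                  # one important coarse cell

fine = mass_redistribute(coarse, 4)
print(fine.sum())                   # 1.0: total importance unchanged
print(int((fine > 0).sum()))        # 16: confined to the original cell's block
```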
Debunks the widely held 'intra-modal misalignment hypothesis' which claimed CLIP embeddings are inherently poor for image-only tasks.
Discovers that skipping learning rate decay during pre-training, while appearing worse for pre-train loss, significantly improves the model's adaptability during supervised fine-tuning (SFT).
Proves that noisy/incorrect labels are destructive to Reinforcement Learning with Verifiable Rewards (RLVR), contradicting recent high-profile claims that noise doesn't matter.
Challenges the standard 'pretrain-then-finetune' pipeline by showing that repeating domain-specific data during pretraining is significantly more effective than reserving it for fine-tuning.
A rigorous multi-method audit revealing that frontier LLM performance on MMLU is significantly inflated by data contamination and memorization.
A causal analysis reveals that LLMs often ignore their own intermediate reasoning (Chain-of-Thought) when making final decisions.
Achieves high-bandwidth, precise Cartesian control of a fully soft continuum robot, breaking the assumption that softness and precision are incompatible.
Fast-WAM proves that World Action Models do not actually need to generate future 'imagination' frames at test-time to achieve state-of-the-art performance in embodied control.
Chain-of-thought (CoT) reasoning in Vision-Language Models systematically degrades the reliability of uncertainty estimates, making models dangerously overconfident.
The SOMP attack demonstrates that private training text can be reconstructed from shared gradients even at high batch sizes (up to B=128).
Zero-shot sim-to-real transfer for complex robotic manipulation is achievable using only synthetic simulated data at scale.
Using the best-performing models as anchors in 'LLM-as-a-judge' evaluations significantly reduces the resulting rankings' correlation with human rankings.
Neural PDE solvers are not learning general operators, but rather a family of solutions specifically indexed to the boundary conditions seen during training.
Researchers identified just three specific attention heads that govern persona and style, enabling precise steering without degrading model coherence.
Robustness certificates based on real arithmetic often fail when executed on actual floating-point hardware.
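A tiny, contrived illustration of the gap: a margin that is strictly positive in float64 (standing in for real arithmetic) collapses to zero once the same quantities are rounded to float32.

```python
import numpy as np

print(np.float64(1e8) + np.float64(-99999999.0))  # 1.0  -> "certified" positive
print(np.float32(1e8) + np.float32(-99999999.0))  # 0.0  -> certificate voided
```

The second operand rounds to -1e8 in float32, so the positive margin the certificate relied on vanishes on the actual hardware.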
Prompt complexity in production environments can completely neutralize structured reasoning frameworks like STAR, dropping accuracy from 100% to 0%.
A systematic study reveals that SOTA representation learning methods for microscopy perform no better than untrained models or simple structural baselines.
Replacing the linear Query projection in Transformers with a nonlinear residual MLP significantly improves performance with minimal parameter growth.
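A minimal sketch of the modification (sizes are illustrative; Key and Value stay linear while Query gains a nonlinear residual branch):

```python
import torch

class ResidualMLPQuery(torch.nn.Module):
    def __init__(self, dim: int, hidden: int = 128):
        super().__init__()
        self.linear = torch.nn.Linear(dim, dim, bias=False)
        self.mlp = torch.nn.Sequential(
            torch.nn.Linear(dim, hidden),
            torch.nn.GELU(),
            torch.nn.Linear(hidden, dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.linear(x) + self.mlp(x)  # nonlinear residual on top of W_q x

print(ResidualMLPQuery(64)(torch.randn(2, 10, 64)).shape)  # torch.Size([2, 10, 64])
```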
Reveals that diffusion models overfit at intermediate noise levels that standard evaluation metrics typically ignore.
Identifies 'ghosts of softmax': complex singularities that cap the Taylor convergence radius of the cross-entropy loss, explaining why models collapse at specific step sizes.
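A worked instance of the mechanism in the two-class case (my choice of illustration, not necessarily the paper's setting): cross-entropy is built from log-sum-exp terms, and continuing such a term to complex arguments exposes the singularities that bound its Taylor radius.

```latex
\ell(z) = \log\left(1 + e^{-z}\right), \qquad
1 + e^{-z} = 0 \;\Longleftrightarrow\; z = -i\pi(2k+1), \quad k \in \mathbb{Z}
```

The nearest singularity sits at distance π from the origin, so the Taylor series of the loss around z = 0 converges only for |z| < π; a step that carries logits beyond such a radius exits the region where low-order Taylor models of the loss are trustworthy.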
Researchers discovered that just three specific attention heads in frozen Vision-Language-Action (VLA) models can detect trajectory deviations with 44.6% accuracy, providing a training-free signal for catching navigation hallucinations.
Groups with bounded rationality and stochasticity can outperform perfectly rational agents because randomness encodes signals lost in deterministic behavior.