SeriesFusion
Science, curated & edited by AI

Scaling Insight

101 papers

What changes when you scale a system up or down. Laws, regimes, and surprises that only appear at larger or smaller orders of magnitude.

AI
Restores monotonic scaling in LLM tree search by replacing standard MCTS selection with Gumbel sampling and Sequential Halving.
Mar 24
AI
Introduces the Neural Zeroth-order Kernel (NZK) to provide a theoretical foundation for training models without backpropagation.
Mar 24
AI
Proves that structured retrieval is exponentially more efficient than sequential context scanning for agentic reasoning.
Mar 24
AI
Discovers 'silent commitment failure,' where some model architectures produce confident, incorrect outputs with zero detectable warning signals before execution.
Mar 24
AI
Provides a causal explanation for 'embedding collapse' in Transformers, linking it to the concept of semantic shift rather than just text length.
Mar 24
AI
Depth-Recurrent Transformers decouple computational depth from parameter count, revealing a 'computational frontier' where performance on reasoning tasks snaps from zero to perfect based on iteration steps.
Mar 24
AI
Identifies structured table data as a primary driver for scaling long-context reasoning in LLMs.
Mar 24
AI
Introduces a robust framework for optimal Mixture-of-Experts (MoE) architecture design across six orders of magnitude in compute.
Mar 24
AI
Provides a strictly controlled comparison of autoregressive vs. masked diffusion language models on identical compute budgets.
Mar 24
AI
Discovers a multiplicative scaling law governing how LLMs revise their beliefs during iterative reasoning (CoT, reflection).
Mar 23
AI
A massive controlled study reveals that post-training algorithm rankings (DPO, SimPO, etc.) completely invert as models scale.
Mar 23
AI
Extreme neural network sparsification causes a catastrophic interpretability collapse even when global accuracy remains stable.
Mar 20
AI
Provides a theoretical proof that autocurriculum, where a model selects its own training problems, requires exponentially fewer reasoning demonstrations.
Mar 20
AI
The 'Progressive Intensity Hypothesis' establishes that weaker perturbations (pruning) should precede stronger ones (quantization) for optimal joint model compression.
Mar 20
AI
Mechanistic analysis of 'counting circuits' in VLMs allows for lightweight interventions that improve general visual reasoning performance.
Mar 20
AI
Scales synthetic data beyond simple rephrasing by building 'megadocs' through rationale insertion and stitching.
Mar 20
AI
Discovers how uncertainty estimation signals like self-consistency and verbalized confidence scale and complement each other in reasoning models.
Mar 20
AI
Establishes scaling laws to determine the optimal compute split between general pretraining and domain-specific specialization.
Mar 20
AI
Shows that 'Mid-Training' on high-quality reasoning data is the primary driver of model capability, whereas RL only succeeds as a sparse refinement step.
Mar 19
AI
Video fine-tuning consistently degrades static image understanding in multimodal LLMs, revealing a zero-sum trade-off between spatial and temporal capabilities.
Mar 19
AI
Mechanistic probing reveals a directional asymmetry in how LLMs encode hierarchy: hypernymy is redundant and resilient, while hyponymy is fragile and compact.
Mar 19
AI
Provides the first theoretical proof that Graph Transformers structurally prevent the 'oversmoothing' failure mode inherent to deep GCNs.
Mar 19
AI
A factorial study on EHR foundation models reveals that joint encoding of code-attribute pairs (local binding) is the primary driver of performance and efficiency.
Mar 18
AI
Spectral Edge Dynamics (SED) provides an early-warning signal for grokking, predicting generalization up to 1,700 steps before it occurs.
Mar 18
AI
Demonstrates that massive scaling of diverse simulator resets can replace manual curriculum engineering for complex dexterous manipulation.
Mar 18
AI
Derives closed-form power-law scaling for hyperparameters like learning rate and batch size using modern optimization theory rather than expensive empirical sweeps.
Mar 18
AI
Provides a geometric 'manifold envelopment' framework to explain why unsupervised RL for mathematical reasoning often collapses and how to stabilize it.
Mar 18
AI
Provides a formal link showing that internal 'world model' representations in transformers are a direct byproduct of the predictive geometry of the training data.
Mar 18
AI
Factual selection in LLMs is driven by rotational dynamics on a hypersphere rather than scalar magnitude shifts, with the behavior emerging suddenly at the 1.6B parameter mark.
Mar 17
AI
Grokking is driven by a norm-driven representational phase transition with a predictable scaling law.
Mar 17
AI
Challenges the monotonic 'bigger is better' scaling paradigm by proving that institutional fitness peaks at an environment-dependent scale.
Mar 17
AI
Proposes spectral clipping to stabilize LLM training by addressing 'spectral spikes' in stochastic gradient noise that adaptive optimizers like AdamW fail to handle.
Mar 17
AI
Introduces Matrix-to-Matrix RNNs (M²RNN) with matrix-valued hidden states that outperform hybrid Transformers while using 3x smaller state sizes.
Mar 17
AI
The Infinite Problem Generator (IPG) uses executable code to synthesize and verify 100% accurate physics reasoning data, overcoming LLM hallucination in data scaling.
Mar 17
AI
Determines the optimal compute distribution for retrieval agents, showing that re-ranking depth is far more critical than query expansion strength.
Mar 17
AI
Provides the first theoretical proof that dataset distillation efficiently encodes the low-dimensional structure of non-linear tasks.
Mar 17
AI
Attention Residuals replace fixed-weight residual connections with softmax attention over preceding layers to prevent hidden-state dilution in deep LLMs.
Mar 17
AI
Proves that increasing test-time compute via beam search can hurt LLM reasoning performance due to overestimation bias.
Mar 17
AI
Finds that sparsity (MoE and GQA) acts as a critical regulator of variance propagation, mitigating the 'curse of depth' in LLMs.
Mar 17
AI
Discovers that as LLMs scale, their complex non-linear depth dynamics converge into accurate, low-order linear surrogates.
Mar 16
AI
Longitudinal evidence reveals that successive ChatGPT versions are converging in output diversity, suggesting potential model collapse from synthetic data saturation.
Mar 16
AI
Adversarial test case evolution improves code reinforcement learning by creating harder, more discriminative verification signals that drive better model performance.
Mar 16
AI
Proves the existence of a 'distributional simplicity bias' in diffusion models, where low-order statistics are learned linearly while high-order correlations require cubic sample complexity.
Mar 16
AI
Speculative Decoding Scaling Laws (SDSL) provides a theoretical framework to predict optimal throughput hyperparameters for LLM inference systems before pre-training.
Mar 13
AI
Cyber-attack capabilities of AI models scale log-linearly with inference-time compute, with no plateau in sight.
Mar 13
AI
Adversarial prompt injection causes jailbreak success rates to transition from polynomial to exponential scaling with inference-time samples.
Mar 13
AI
Applying Rotary Positional Embeddings (RoPE) to only 10% of hidden dimensions is sufficient for full model convergence, enabling 10x memory savings in positional caches.
Mar 13
AI
Provides a learning-theoretic characterization of model collapse, proving exactly when replaying past outputs destroys model diversity.
Mar 13
AI
Exhaustive circuit mapping of a biological foundation model reveals massive redundancy and annotation bias.
Mar 13
AI
Establishes scaling laws for sampling compute in LLM Reinforcement Learning, providing a playbook for optimal parallel rollout and batch allocation.
Mar 13