SERIESFUSION.AI
Science Discovery for Humans | Curated by AI & Humans
Scaling Insight
28 papers
Speculative Decoding Scaling Laws (SDSL) provides a theoretical framework to predict optimal throughput hyperparameters for LLM inference systems before pre-training.
AI & ML
arxiv | Mar 13
Cyber-attack capabilities of AI models scale log-linearly with inference-time compute, with no plateau in sight.
AI & ML
arxiv | Mar 13
Adversarial prompt injection causes jailbreak success rates to transition from polynomial to exponential scaling with inference-time samples.
AI & ML
arxiv | Mar 13
Applying Rotary Positional Embeddings (RoPE) to only 10% of hidden dimensions is sufficient for full model convergence, enabling 10x memory savings in positional caches.
AI & ML
arxiv | Mar 13
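If rotating only a small slice of dimensions suffices, the position-dependent part of the cache only needs to cover that slice. A minimal pure-Python sketch of partial RoPE (the function name, 10% fraction, and base are illustrative assumptions, not the paper's implementation):

```python
import math

def apply_partial_rope(x, position, rope_frac=0.1, base=10000.0):
    """Rotate only the first `rope_frac` of dimensions; pass the rest through.

    x: one token's hidden vector as a flat list of floats.
    Only the rotated slice carries positional phase, so a cache can store
    the remaining ~90% of dimensions position-independently.
    """
    d = len(x)
    d_rope = int(d * rope_frac)
    d_rope -= d_rope % 2  # rotary pairs need an even count
    out = list(x)
    for i in range(0, d_rope, 2):
        theta = position / (base ** (i / d_rope))
        c, s = math.cos(theta), math.sin(theta)
        out[i] = x[i] * c - x[i + 1] * s
        out[i + 1] = x[i] * s + x[i + 1] * c
    return out
```

At position 0 the rotation is the identity, and dimensions beyond the rotated slice are never touched.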
Provides a learning-theoretic characterization of model collapse, proving exactly when replaying past outputs destroys model diversity.
AI & ML
arxiv | Mar 13
Exhaustive circuit mapping of a biological foundation model reveals massive redundancy and annotation bias.
AI & ML
arxiv | Mar 13
Establishes scaling laws for sampling compute in LLM Reinforcement Learning, providing a playbook for optimal parallel rollout and batch allocation.
AI & ML
arxiv | Mar 13
Discovers that as LLMs scale, their complex non-linear depth dynamics converge into accurate, low-order linear surrogates.
AI & ML
arxiv | Mar 16
Longitudinal evidence shows that output diversity is shrinking across successive ChatGPT versions, suggesting potential model collapse from synthetic data saturation.
AI & ML
arxiv | Mar 16
Adversarial test case evolution improves code reinforcement learning by creating harder, more discriminative verification signals that drive better model performance.
AI & ML
arxiv | Mar 16
Proves the existence of a 'distributional simplicity bias' in diffusion models, where low-order statistics are learned linearly while high-order correlations require cubic sample complexity.
AI & ML
arxiv | Mar 16
Factual selection in LLMs is driven by rotational dynamics on a hypersphere rather than scalar magnitude shifts, with the behavior emerging suddenly at the 1.6B parameter mark.
AI & ML
arxiv | Mar 17
Grokking is driven by a norm-driven representational phase transition with a predictable scaling law.
AI & ML
arxiv | Mar 17
Challenges the monotonic 'bigger is better' scaling paradigm by proving that institutional fitness peaks at an environment-dependent scale.
AI & ML
arxiv | Mar 17
Proposes spectral clipping to stabilize LLM training by addressing 'spectral spikes' in stochastic gradient noise that adaptive optimizers like AdamW fail to handle.
AI & ML
arxiv | Mar 17
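The idea can be sketched as bounding the top singular value of a gradient matrix. This pure-Python version estimates it by power iteration and rescales the matrix when it spikes above a threshold; the function and the global-rescaling rule are hypothetical stand-ins for the paper's exact operator:

```python
import math
import random

def spectral_clip(grad, max_sigma, iters=50, seed=0):
    """Rescale a gradient matrix so its largest singular value is <= max_sigma.

    grad: m x n matrix as a list of lists. The top singular value is
    estimated by power iteration on G^T G, then the matrix is scaled down
    if a 'spectral spike' exceeds the threshold.
    """
    m, n = len(grad), len(grad[0])
    rng = random.Random(seed)
    v = [rng.gauss(0, 1) for _ in range(n)]
    for _ in range(iters):
        # one power-iteration step: v <- normalize(G^T (G v))
        w = [sum(grad[i][j] * v[j] for j in range(n)) for i in range(m)]
        v = [sum(grad[i][j] * w[i] for i in range(m)) for j in range(n)]
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        v = [x / norm for x in v]
    w = [sum(grad[i][j] * v[j] for j in range(n)) for i in range(m)]
    sigma = math.sqrt(sum(x * x for x in w))  # top singular value estimate
    if sigma <= max_sigma:
        return grad
    scale = max_sigma / sigma
    return [[x * scale for x in row] for row in grad]
```

Unlike AdamW's per-coordinate normalization, this acts on the matrix spectrum as a whole, which is the failure mode the summary attributes to adaptive optimizers.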
Introduces Matrix-to-Matrix RNNs (M²RNN) with matrix-valued hidden states that outperform hybrid Transformers while using 3x smaller state sizes.
AI & ML
arxiv | Mar 17
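A matrix-valued hidden state packs n² scalars into each recurrence step. A generic sketch of one such update (the tanh cell and parameter names are assumptions for illustration; the summary does not specify M²RNN's actual gating or normalization):

```python
import math

def m2rnn_step(H, X, A, B):
    """One step of a matrix-to-matrix recurrence: H' = tanh(A @ H + X @ B).

    H (hidden state), X (input), A, B (parameters) are n x n matrices
    given as lists of lists.
    """
    n = len(H)

    def matmul(P, Q):
        return [[sum(P[i][k] * Q[k][j] for k in range(n)) for j in range(n)]
                for i in range(n)]

    AH = matmul(A, H)
    XB = matmul(X, B)
    return [[math.tanh(AH[i][j] + XB[i][j]) for j in range(n)]
            for i in range(n)]
```

Because the state is a matrix rather than a vector, an n x n state carries n² values while the recurrence weights stay n x n, which is the capacity-per-parameter argument behind the smaller state sizes.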
The Infinite Problem Generator (IPG) uses executable code to synthesize and verify 100% accurate physics reasoning data, overcoming LLM hallucination in data scaling.
AI & ML
arxiv | Mar 17
Determines the optimal compute distribution for retrieval agents, showing that re-ranking depth is far more critical than query expansion strength.
AI & ML
arxiv | Mar 17
Provides the first theoretical proof that dataset distillation efficiently encodes the low-dimensional structure of non-linear tasks.
AI & ML
arxiv | Mar 17
Attention Residuals replace fixed-weight residual connections with softmax attention over preceding layers to prevent hidden-state dilution in deep LLMs.
AI & ML
arxiv | Mar 17
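The mechanism can be sketched as a softmax-weighted sum over all preceding layer outputs in place of the usual fixed identity skip. The per-layer scalar logits here are an illustrative simplification; the paper's gating may be computed per token:

```python
import math

def attention_residual(layer_outputs, scores):
    """Mix preceding layer outputs with softmax weights.

    layer_outputs: list of hidden vectors (lists of floats), one per
    earlier layer. scores: learned logits, one per layer. A fixed-weight
    residual corresponds to weights frozen at one-hot / uniform values.
    """
    mx = max(scores)
    exps = [math.exp(s - mx) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    dim = len(layer_outputs[0])
    return [sum(w * h[i] for w, h in zip(weights, layer_outputs))
            for i in range(dim)]
```

Letting deeper layers re-weight earlier states is what prevents the hidden stream from being diluted by many near-identity additions.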
This paper proves that increasing test-time compute via beam search can actually hurt LLM reasoning performance due to overestimation bias.
AI & ML
arxiv | Mar 17
Sparsity (MoE and GQA) is found to act as a critical regulator for variance propagation, mitigating the 'curse of depth' in LLMs.
AI & ML
arxiv | Mar 17
A factorial study on EHR foundation models reveals that joint encoding of code-attribute pairs (local binding) is the primary driver of performance and efficiency.
AI & ML
arxiv | Mar 18
Spectral Edge Dynamics (SED) provides an early-warning signal for grokking, predicting generalization up to 1,700 steps before it occurs.
AI & ML
arxiv | Mar 18
Demonstrates that massive scaling of diverse simulator resets can replace manual curriculum engineering for complex dexterous manipulation.
AI & ML
arxiv | Mar 18
Derives closed-form power-law scaling for hyperparameters like learning rate and batch size using modern optimization theory rather than expensive empirical sweeps.
AI & ML
arxiv | Mar 18
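In spirit, such closed-form rules let hyperparameters be computed from model size directly instead of swept. A toy sketch with placeholder constants (the reference values and exponents below are invented for illustration, not the paper's fitted law):

```python
def scaled_hparams(n_params, lr_ref=3e-4, n_ref=1e8, lr_exp=-1/3,
                   bs_ref=256, bs_exp=1/3):
    """Power-law extrapolation of learning rate and batch size.

    lr scales as N^lr_exp and batch size as N^bs_exp relative to a
    reference model of n_ref parameters. All constants are hypothetical.
    """
    ratio = n_params / n_ref
    lr = lr_ref * ratio ** lr_exp
    bs = round(bs_ref * ratio ** bs_exp)
    return lr, bs
```

With a rule like this, moving to an 8x larger model would halve the learning rate and double the batch size under the assumed cube-root exponents.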
Provides a geometric 'manifold envelopment' framework to explain why unsupervised RL for mathematical reasoning often collapses and how to stabilize it.
AI & ML
arxiv | Mar 18
The study provides a formal link showing that internal 'world model' representations in transformers are a direct byproduct of the predictive geometry of the training data.
AI & ML
arxiv | Mar 18