Scaling Insight

28 papers

Speculative Decoding Scaling Laws (SDSL) provides a theoretical framework to predict optimal throughput hyperparameters for LLM inference systems before pre-training.

AI & ML arxiv | Mar 13

Cyber-attack capabilities of AI models scale log-linearly with inference-time compute, with no plateau in sight.

AI & ML arxiv | Mar 13
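A log-linear law means the capability score grows by a fixed amount for every multiplicative increase in compute. A minimal sketch of fitting such a relationship with a least-squares line in log space; the compute values and scores below are illustrative, not the paper's data:

```python
import numpy as np

# Hypothetical measurements: inference-time compute (FLOPs) vs. an
# attack-success score. Values are made up for illustration.
compute = np.array([1e12, 1e13, 1e14, 1e15, 1e16])
score = np.array([0.10, 0.22, 0.34, 0.46, 0.58])

# Log-linear scaling: score ~ a + b * log10(compute), fit by least squares.
b, a = np.polyfit(np.log10(compute), score, deg=1)

# With no plateau, each 10x of compute adds a constant b to the score.
print(f"slope per decade of compute: {b:.3f}")
```

"No plateau in sight" corresponds to the fitted slope `b` staying constant even at the largest compute budgets measured.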

Adversarial prompt injection causes jailbreak success rates to transition from polynomial to exponential scaling with inference-time samples.

AI & ML arxiv | Mar 13

Applying Rotary Positional Embeddings (RoPE) to only 10% of hidden dimensions is sufficient for full model convergence, enabling 10x memory savings in positional caches.

AI & ML arxiv | Mar 13
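Applying RoPE to a fraction of dimensions means rotating only a leading slice of each feature vector and passing the rest through unchanged, so the cos/sin position cache covers only that slice. A minimal numpy sketch of the idea (the slicing convention and frequency schedule are assumptions, not the paper's code):

```python
import numpy as np

def partial_rope(x, rope_frac=0.1, base=10000.0):
    """Apply rotary position embeddings to only the first rope_frac of the
    feature dimensions; the remaining dimensions pass through unrotated.
    x: (seq_len, dim) with dim even. Illustrative sketch only."""
    seq, dim = x.shape
    r = int(dim * rope_frac)
    r -= r % 2                          # rotary pairs need an even count
    rot, rest = x[:, :r], x[:, r:]
    # Standard RoPE frequency schedule, restricted to the rotated slice,
    # so the positional cache stores cos/sin for r dims instead of dim.
    inv_freq = base ** (-np.arange(0, r, 2) / r)
    theta = np.outer(np.arange(seq), inv_freq)       # (seq, r/2)
    cos, sin = np.cos(theta), np.sin(theta)
    x1, x2 = rot[:, 0::2], rot[:, 1::2]
    rotated = np.empty_like(rot)
    rotated[:, 0::2] = x1 * cos - x2 * sin
    rotated[:, 1::2] = x1 * sin + x2 * cos
    return np.concatenate([rotated, rest], axis=-1)

q = np.random.randn(8, 64)
out = partial_rope(q, rope_frac=0.1)    # only 6 of 64 dims carry position info
```

The 10x cache saving follows directly: cos/sin tables are stored for roughly 10% of the dimensions rather than all of them.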

Provides a learning-theoretic characterization of model collapse, proving exactly when replaying past outputs destroys model diversity.

AI & ML arxiv | Mar 13

Exhaustive circuit mapping of a biological foundation model reveals massive redundancy and annotation bias.

AI & ML arxiv | Mar 13

Establishes scaling laws for sampling compute in LLM Reinforcement Learning, providing a playbook for optimal parallel rollout and batch allocation.

AI & ML arxiv | Mar 13

Discovers that as LLMs scale, their complex, non-linear depth dynamics converge to accurate low-order linear surrogates.

AI & ML arxiv | Mar 16

Longitudinal evidence reveals that successive ChatGPT versions produce increasingly similar outputs, suggesting potential model collapse from synthetic-data saturation.

AI & ML arxiv | Mar 16

Adversarial test case evolution improves code reinforcement learning by creating harder, more discriminative verification signals that drive better model performance.

AI & ML arxiv | Mar 16

Proves the existence of a 'distributional simplicity bias' in diffusion models, where low-order statistics are learned linearly while high-order correlations require cubic sample complexity.

AI & ML arxiv | Mar 16

Factual selection in LLMs is driven by rotational dynamics on a hypersphere rather than scalar magnitude shifts, with the behavior emerging suddenly at the 1.6B parameter mark.

AI & ML arxiv | Mar 17

Grokking is driven by a norm-governed representational phase transition with a predictable scaling law.

AI & ML arxiv | Mar 17

Challenges the monotonic 'bigger is better' scaling paradigm by proving that institutional fitness peaks at an environment-dependent scale.

AI & ML arxiv | Mar 17

Proposes spectral clipping to stabilize LLM training by addressing 'spectral spikes' in stochastic gradient noise that adaptive optimizers like AdamW fail to handle.

AI & ML arxiv | Mar 17
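Spectral clipping caps the singular values of a gradient matrix at a threshold, suppressing rare spikes while leaving the bulk of the spectrum untouched. A minimal sketch of that operation (the threshold and where in the optimizer it is applied are assumptions):

```python
import numpy as np

def spectral_clip(grad, max_sv=1.0):
    """Clip the singular values of a gradient matrix at max_sv.
    Directions (singular vectors) are preserved; only the spike
    magnitudes are capped. Illustrative sketch, not the paper's code."""
    u, s, vt = np.linalg.svd(grad, full_matrices=False)
    return u @ np.diag(np.minimum(s, max_sv)) @ vt

rng = np.random.default_rng(0)
g = rng.normal(scale=0.01, size=(64, 64))
g[0, 0] += 50.0                          # inject a spike into the spectrum
g_clipped = spectral_clip(g, max_sv=1.0)
top_sv = np.linalg.svd(g_clipped, compute_uv=False)[0]
```

Unlike per-coordinate clipping in AdamW-style optimizers, this operates on the whole matrix spectrum, which is why it can catch a spike concentrated in a single singular direction.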

Introduces Matrix-to-Matrix RNNs (M$^2$RNN) with matrix-valued hidden states that outperform hybrid Transformers while using 3x smaller state sizes.

AI & ML arxiv | Mar 17
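A matrix-valued hidden state stores n x n state scalars but can be updated with left/right matrix multiplications costing only O(n^2) parameters per map, versus O(n^4) for a dense vector RNN over the flattened state. A loose toy sketch of that idea; the actual M$^2$RNN update rule is not given in the summary, so the maps and nonlinearity below are assumptions:

```python
import numpy as np

class MatrixRNNCell:
    """Toy RNN cell with a matrix-valued hidden state H (n x n) updated by
    left/right multiplications. A sketch of the matrix-to-matrix idea only."""
    def __init__(self, n, seed=0):
        rng = np.random.default_rng(seed)
        scale = 1.0 / np.sqrt(n)
        self.A = rng.normal(scale=scale, size=(n, n))  # left recurrent map
        self.B = rng.normal(scale=scale, size=(n, n))  # right recurrent map
        self.U = rng.normal(scale=scale, size=(n, n))  # left input map
        self.V = rng.normal(scale=scale, size=(n, n))  # right input map

    def step(self, H, X):
        # Both state and input are n x n matrices: n^2 state scalars
        # maintained with only four n x n parameter matrices.
        return np.tanh(self.A @ H @ self.B + self.U @ X @ self.V)

n = 16
cell = MatrixRNNCell(n)
H = np.zeros((n, n))
for X in np.random.default_rng(1).normal(size=(5, n, n)):  # 5 input steps
    H = cell.step(H, X)
```

The "3x smaller state sizes" claim would then refer to shrinking n while keeping n^2 state capacity competitive with a hybrid Transformer's cache.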

The Infinite Problem Generator (IPG) uses executable code to synthesize and verify 100% accurate physics reasoning data, overcoming LLM hallucination in data scaling.

AI & ML arxiv | Mar 17

Determines the optimal compute distribution for retrieval agents, showing that re-ranking depth is far more critical than query expansion strength.

AI & ML arxiv | Mar 17

Provides the first theoretical proof that dataset distillation efficiently encodes the low-dimensional structure of non-linear tasks.

AI & ML arxiv | Mar 17

Attention Residuals replace fixed-weight residual connections with softmax attention over preceding layers to prevent hidden-state dilution in deep LLMs.

AI & ML arxiv | Mar 17
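Replacing a fixed skip connection with softmax attention means each layer's residual is a learned convex combination of all preceding layers' hidden states, rather than a copy of only the immediately previous one. A minimal sketch, assuming a simple dot-product score (the paper's exact scoring function is not given in the summary):

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def attention_residual(layer_outputs, query):
    """Mix all preceding layers' hidden states with softmax weights
    instead of a fixed-weight skip connection. Illustrative sketch."""
    H = np.stack(layer_outputs)                  # (num_layers, dim)
    scores = H @ query / np.sqrt(H.shape[1])     # (num_layers,)
    weights = softmax(scores)
    return weights @ H                           # convex combination

dim = 32
rng = np.random.default_rng(0)
states = [rng.normal(size=dim) for _ in range(6)]   # six earlier layers
current = rng.normal(size=dim)
# The residual added to the current block attends over all six preceding
# states, so early-layer information is not diluted by repeated averaging.
residual = attention_residual(states, query=current)
```

Because the weights are renormalized at every depth, early-layer signal can be recovered at full strength deep in the network, which is the mechanism offered against hidden-state dilution.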