SeriesFusion
Science, curated & edited by AI

Open Release

57 papers

Open-weight models, open datasets, open code, open hardware. Papers whose primary contribution is making something useful available to everyone.

Open Release  /  Category lead

The first dedicated foundation model for electrodermal activity (EDA) data, released alongside the largest public dataset for physiological signal modeling.

By releasing 25,000 hours of curated physiological data and the UME foundation model, this work democratizes research in wearable health sensing. It demonstrates that a domain-specific model can outperform generalist time-series models while using one-twentieth of the computational resources.

AI
Independently reproduces OpenAI's gpt-oss-20b scores by reverse-engineering undisclosed tool-calling formats and agent harnesses.
Apr 2
AI
OmniVoice is an open-source TTS model scaling to over 600 languages using a novel diffusion language model architecture.
Apr 2
AI
Releases the GPT-NL Public Corpus, the largest permissively licensed (CC-BY) Dutch-first dataset for LLM pre-training.
Apr 2
AI
Delivers a state-of-the-art universal phone recognition model across 100+ languages with full open-source release.
Apr 2
AI
A unified, open-source framework that converts complex post-training quantization workflows into a single-line, hardware-aware pipeline.
Apr 1
AI
A multimodal release for 10 low-resource African languages that reduces state-of-the-art word error rate (WER) by up to 61% relative.
Apr 1
AI
A 270K-sample multi-view video corpus built for embodied AI agents in complex retail environments.
Apr 1
AI
Releases a 117K-instruction dataset and a language-conditioned world model framework for visual navigation.
Mar 31
AI
Releases ROSClaw, a model-agnostic executive layer that allows any foundation model to control any ROS 2 robot through standardized capability discovery and safety envelopes.
Mar 31
AI
Releases ChartNet, a high-quality multimodal dataset for chart understanding with 1.5 million samples spanning 24 chart types.
Mar 31
AI
Introduces MeteoCap-3B, a billion-scale meteorological dataset with expert captions, along with a spectral-aware diffusion model for weather time-series generation.
Mar 31
AI
A fully open industrial-scale pretraining project releasing 8T tokens of processed data, a 3B model, and 200+ controlled pretraining ablations.
Mar 31
AI
The first self-supervised, domain-agnostic model for LiDAR ground segmentation, eliminating the need for per-sensor manual labeling.
Mar 31
AI
A modular, JAX-based framework and taxonomy for Reinforcement Learning with Diffusion and Flow policies.
Mar 31
AI
Kuaishou releases KAT-Coder-V2, an agentic coding model achieving state-of-the-art results on SWE-bench Verified through a 'Specialize-then-Unify' paradigm.
Mar 31
AI
Releases weights for LEMON, a foundation model for single-cell nuclear morphology trained on millions of pathology images.
Mar 30
AI
The first large-scale benchmark for LLM agents based on years of authentic, cross-domain user behavioral data rather than synthetic personas.
Mar 30
AI
Releases DataFlex, a unified open-source framework for data-centric dynamic training (selection, mixture, and reweighting) for LLMs.
Mar 30
AI
Releases Ruka-v2, a fully open-source, 13-DoF tendon-driven humanoid hand with wrist and finger abduction that can be built for under $1,300.
Mar 30
AI
Berta is an open-source, production-proven AI clinical scribe that reduces operating costs by up to 95% compared to commercial alternatives.
Mar 26
AI
BioVITA releases a massive multimodal biological dataset of 3.6M image-audio-text samples covering 14,000 species.
Mar 26
AI
Releases a high-quality, 92K-sentence parallel dataset for Hindi-Sanskrit translation focusing on contemporary and spoken language.
Mar 26
AI
Releases 55 hours of continuous 30 fps video of expert human computer use to address the 'missing ingredient' for desktop automation agents.
Mar 26
AI
VFIG enables high-fidelity conversion of rasterized technical figures into editable, scalable SVGs using a new 66K-pair dataset.
Mar 26
AI
Introduces the first high-performing open-source metric for per-sample AI music quality evaluation.
Mar 25
AI
Provides a 2.5M-sample image-to-TikZ dataset and the first instruction-augmented dataset for geometric visual reasoning.
Mar 25
AI
Releases an offline search-and-browse pipeline with 97K long-horizon trajectories for training 'Deep Research' agents.
Mar 24
AI
AgentComm-Bench is the first benchmark to stress-test cooperative embodied AI under realistic wireless impairments like packet loss and bandwidth collapse.
Mar 24
AI
ScaleEdit-12M is the largest open-source image editing dataset, democratizing high-quality, instruction-based editing data previously limited to proprietary models.
Mar 24
AI
An open-source family of language models for Kazakh that outperforms much larger multilingual models by using a language-specific tokenizer.
Mar 24
AI
CLT-Forge democratizes mechanistic interpretability by providing an end-to-end library for training Cross-Layer Transcoders and generating feature attribution graphs.
Mar 24
AI
LongCat-Flash-Prover is a 560B MoE model that sets a new SOTA for open-weights formal reasoning, achieving a 97.1% pass rate on MiniF2F-Test.
Mar 24
AI
Open-sources a high-fidelity foundation model that jointly generates synchronized video and audio using a unified single-stream Transformer.
Mar 24
AI
Releases the first large-scale family of learned sparse retrieval (LSR) models specialized for code (up to 8B parameters).
Mar 24
AI
Releases the hardware design and training environment for MEVIUS2, an open-source, Spot-scale quadruped robot.
Mar 24
AI
An open foundation suite for universal dexterous robot control trained on over 50k trajectories across eight different robotic hand architectures.
Mar 24
AI
SpecForge provides an open-source framework and high-quality draft models (SpecBundle) to make speculative decoding production-ready.
Mar 20
AI
OpenT2M is an open-source motion dataset (2,800+ hours) that addresses data starvation in text-to-motion generation.
Mar 20
AI
An open release of a multilingual embedding family (80M to 14B) covering 200+ languages and ranking first on 11 MTEB benchmarks.
Mar 20
AI
Democratizes dexterous robot data collection by enabling high-fidelity 21-DoF teleoperation using only a standard smartphone.
Mar 19
AI
Introduces FineViT and a 450M local caption dataset to solve the 'coarse perception' bottleneck in current CLIP-based encoders.
Mar 19
AI
Kamino is a massively parallel GPU physics solver that natively supports complex kinematic loops and multi-body systems.
Mar 18
AI
IQuest-Coder-V1 introduces a series of high-performance code models including a unique 'Loop' variant with a recurrent mechanism for efficiency.
Mar 18
AI
SurgΣ is a massive open-source release of 5.98M multimodal conversations and foundation models for surgical intelligence.
Mar 18
AI
Introduces a unified evaluation harness for Vision-Language-Action (VLA) models that standardizes disparate protocols and exposes hidden flaws in published SOTA models.
Mar 17
AI
Releases an 11-billion-example dataset and a model (RealVLG-R1) for unified real-world visual-language grounding and robotic manipulation.
Mar 17
AI
Releases a human preference dataset of 29M pairs for text-to-image editing tasks.
Mar 17
AI
Tagarela releases 8,972 hours of high-quality Portuguese podcast audio, rivaling the scale of GigaSpeech for English.
Mar 17
AI
Democratizes the development of 'Deep Search' agents by open-sourcing the specialized training data and trajectory synthesis methods.
Mar 17
AI
Surg-R1 is a specialized surgical reasoning model released alongside the largest surgical Chain-of-Thought dataset (320,000 pairs).
Mar 16