AI & ML Breaks Assumption

Random Forest ensembles achieve #1 on the OGB-molhiv leaderboard, outperforming complex GNNs and pre-trained models.

March 24, 2026

Original Paper

Multi-RF Fusion with Multi-GNN Blending for Molecular Property Prediction

Zacharie Bugaud

arXiv · 2603.20724

The Takeaway

The paper challenges the dominance of Graph Neural Networks in molecular property prediction by showing that a carefully tuned ensemble of Random Forests on molecular fingerprints can take the top spot on a major leaderboard. This suggests that for certain scaffold-split tasks, classical ML with well-chosen feature engineering remains the baseline to beat.

From the abstract

Multi-RF Fusion achieves a test ROC-AUC of 0.8476 +/- 0.0002 on ogbg-molhiv (10 seeds), placing #1 on the OGB leaderboard ahead of HyperFusion (0.8475 +/- 0.0003). The core of the method is a rank-averaged ensemble of 12 Random Forest models trained on concatenated molecular fingerprints (FCFP, ECFP, MACCS, atom pairs -- 4,263 dimensions total), blended with deep-ensembled GNN predictions at 12% weight. Two findings drive the result: (1) setting max_features to 0.20 instead of the default sqrt(d
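The blending step described above can be sketched in a few lines: convert each model's scores to normalized ranks, average the ranks across models, then mix in the GNN predictions at 12% weight. This is a minimal illustration of rank averaging, not the paper's implementation; the scores and molecule count below are made-up placeholders.

```python
# Sketch of a rank-averaged ensemble blended with GNN predictions at the
# 12% weight quoted in the abstract. All numbers are illustrative.

def rank_average(model_scores):
    """Average normalized ranks across models, so models with different
    score scales contribute equally to the ensemble."""
    n = len(model_scores[0])
    totals = [0.0] * n
    for scores in model_scores:
        # Indices of samples sorted ascending by this model's score.
        order = sorted(range(n), key=lambda i: scores[i])
        for rank, idx in enumerate(order):
            totals[idx] += rank / (n - 1)  # normalize ranks to [0, 1]
    return [t / len(model_scores) for t in totals]

# Hypothetical probabilities from 12 RF models on 5 molecules.
rf_scores = [[(m * 7 + s * 13) % 10 / 10 for s in range(5)] for m in range(12)]
gnn_scores = [0.9, 0.2, 0.7, 0.4, 0.6]  # hypothetical deep-ensembled GNN output

rf_ensemble = rank_average(rf_scores)
# Blend the GNN predictions in at 12% weight.
blended = [0.88 * r + 0.12 * g for r, g in zip(rf_ensemble, gnn_scores)]
print(len(blended))  # 5
```

In a full pipeline, each `rf_scores` row would come from a `RandomForestClassifier` trained on the 4,263-dimensional concatenated fingerprint vectors, with `max_features=0.20` per the paper's first finding.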