The framework uses unstructured clinical notes during training to improve models that are deployed on structured EHR data alone.
March 25, 2026
Original Paper
Multimodal Training to Unimodal Deployment: Leveraging Unstructured Data During Training to Optimize Structured Data Only Deployment
arXiv · 2603.22530
The Takeaway
The approach lets practitioners 'distill' the contextual information in private or messy clinical text into a deployable model that does not need those notes at inference time. It addresses a common problem: the most informative features (free text) are often unavailable in production environments.
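One common way to get this "train with text, deploy without it" behaviour is teacher-student knowledge distillation. The sketch below illustrates that general idea only. It is not the paper's framework, which the abstract excerpt here does not describe. The synthetic data, the single note-derived feature, the logistic models, and the blending weight `alpha` are all assumptions made for the example.

```python
import math
import random

def sigmoid(z):
    # Clamp the logit so math.exp cannot overflow.
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))

def predict(w, x):
    """Probability from a logistic model; the last weight is the bias."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + w[-1])

def fit_logistic(X, targets, epochs=200, lr=0.5):
    """Batch gradient descent on cross-entropy; targets may be soft (in [0, 1])."""
    w = [0.0] * (len(X[0]) + 1)
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, t in zip(X, targets):
            err = predict(w, x) - t
            for j, xj in enumerate(x):
                grad[j] += err * xj
            grad[-1] += err
        w = [wi - lr * g / n for wi, g in zip(w, grad)]
    return w

# Synthetic stand-in data (assumption): two structured fields and one
# feature derived from clinical notes; the label depends on both sources.
random.seed(0)
structured, notes, labels = [], [], []
for _ in range(200):
    s = [random.gauss(0, 1), random.gauss(0, 1)]
    t = random.gauss(0, 1)
    y = 1.0 if s[0] + t + random.gauss(0, 0.5) > 0 else 0.0
    structured.append(s)
    notes.append(t)
    labels.append(y)

# Teacher: trained on structured fields plus the note-derived feature.
teacher_X = [s + [t] for s, t in zip(structured, notes)]
teacher_w = fit_logistic(teacher_X, labels)

# Soft targets: blend the hard labels with the teacher's probabilities.
alpha = 0.5  # hypothetical weight on the hard label
soft = [alpha * y + (1 - alpha) * predict(teacher_w, x)
        for x, y in zip(teacher_X, labels)]

# Student: trained on structured fields only, against the soft targets.
student_w = fit_logistic(structured, soft)

# Deployment: the student scores a patient without any notes.
risk = predict(student_w, [0.3, -1.2])
```

Only `student_w` would ship to production in this setup. The teacher and the note-derived feature exist only at training time, so the text data never has to leave the training environment.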
From the abstract
Unstructured Electronic Health Record (EHR) data, such as clinical notes, contain clinical contextual observations that are not directly reflected in structured data fields. This additional information can substantially improve model learning. However, due to their unstructured nature, these data are often unavailable or impractical to use when deploying a model. We introduce a multimodal learning framework that leverages unstructured EHR data during training while producing a model that can be