
    Why SNF Documentation Is a Multimodal AI Problem

    Nurses spend 40% of their time on documentation. MDS coordinators abstract charts for 3–4 hours per assessment. PDPM revenue goes uncaptured. The root cause is that clinical documentation is inherently multimodal — and most tools only handle text.


    Skilled nursing facilities generate some of the most complex clinical documentation in healthcare. A single resident's chart can include EHR-structured entries, handwritten progress notes, scanned intake assessments, wound photographs, therapy logs, medication administration records, and incident reports—all feeding into the MDS 3.0 assessment that drives reimbursement, care planning, and regulatory compliance.

    Yet the tools most SNFs use to manage this documentation were built for a single-modality world: text in, text out. That fundamental mismatch is why nurses spend 40% of their time on documentation, why MDS coordinators abstract charts for 3–4 hours per assessment, and why an estimated $30–40 per patient per day in PDPM revenue goes uncaptured.

    The Documentation Burden Is Real—and Growing

    CMS requires SNFs to complete MDS 3.0 assessments at admission, quarterly, annually, and upon significant change in condition. Each assessment spans 20+ sections covering functional status, health conditions, skin integrity, medications, special treatments, and more. The data that populates these sections lives across every clinical system in the building.

    MDS coordinators are the bridge between bedside documentation and coded assessments. They open the EHR, cross-reference scanned documents, review wound photos in a separate system, check therapy minutes in another, and manually transcribe findings into MDS item codes. It is meticulous, high-stakes work—and it does not scale.

    With 15,243 Medicare/Medicaid-certified SNFs in the US and an average of 2.5 MDS assessments per resident per year, the aggregate documentation burden is staggering. Every hour spent on chart abstraction is an hour not spent on clinical judgment, care coordination, or staff development.
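    A back-of-envelope calculation makes the scale concrete. Using figures cited in this article (roughly 1.3 million residents and 2.5 assessments per resident per year) and, as an illustrative assumption, the midpoint of the 3 to 4 hour abstraction range:

```python
residents = 1_300_000          # Medicare/Medicaid-certified SNF residents (from this article)
assessments_per_year = 2.5     # average MDS assessments per resident per year
hours_per_assessment = 3.5     # assumed midpoint of the 3-4 hour abstraction range

assessments = residents * assessments_per_year          # 3,250,000 assessments/year
coordinator_hours = assessments * hours_per_assessment  # 11,375,000 hours/year

print(f"{coordinator_hours:,.0f} coordinator-hours per year")
```

    Roughly 11.4 million coordinator-hours per year across the industry, before counting any bedside nursing documentation time.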

    Why EHR-First Approaches Fall Short

    EHR vendors have added MDS modules and auto-population features, but they share a fundamental limitation: they only work with structured data that already exists in the EHR. The clinical nuance that determines accurate MDS coding often lives elsewhere:

    • Wound photographs contain staging and measurement information that may not be transcribed into structured fields.
    • Scanned intake assessments from referring hospitals arrive as PDFs that sit in a document management system, unlinked to MDS workflows.
    • Handwritten therapy notes capture functional observations that therapists do not always enter into the EHR in coded form.
    • Incident reports document falls, behavioral episodes, and condition changes in narrative form.

    When MDS coding depends solely on what is already structured in the EHR, clinical severity is systematically under-documented. The downstream effects are cascading: lower PDPM classifications, reduced reimbursement, inaccurate care plans, and compliance risk during CMS surveys.

    Clinical Documentation Is Inherently Multimodal

    The core insight is that clinical documentation in an SNF is not a text problem with occasional images—it is a multimodal data problem from the start. A complete picture of a resident's clinical status requires understanding:

    • Text: progress notes, physician orders, medication records, care plan narratives
    • Images: wound photographs, skin assessments, imaging reports
    • Scanned documents: intake packets, transfer summaries, consent forms, legacy paper records
    • Structured data: EHR fields, lab results, vital signs, ADL scores

    Each modality requires different extraction techniques. OCR handles scanned documents. Image analysis processes wound photographs. Named entity recognition (NER) identifies clinical terms in free-text notes. Text embeddings capture semantic meaning for retrieval. No single extraction method covers all of these—which is why a multimodal pipeline is not a nice-to-have but a structural requirement.
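    As a sketch of what modality-appropriate routing can look like, the following Python dispatches each document to an extractor based on its modality. The extractor functions and the `ClinicalDocument` type are hypothetical placeholders standing in for real OCR, vision, and clinical NER components, not an actual API:

```python
from dataclasses import dataclass

# Hypothetical stand-ins for real extractors (an OCR engine, a vision
# model, a clinical NER model); each returns structured output.
def run_ocr(doc):        return {"text": f"<ocr text of {doc.name}>"}
def analyze_image(doc):  return {"findings": f"<image findings for {doc.name}>"}
def clinical_ner(doc):   return {"entities": f"<entities in {doc.name}>"}

@dataclass
class ClinicalDocument:
    name: str
    modality: str  # "scanned_pdf" | "photo" | "note" | "structured"
    payload: bytes = b""

# Route each document to the extractor its modality requires.
EXTRACTORS = {
    "scanned_pdf": run_ocr,        # intake packets, transfer summaries
    "photo": analyze_image,        # wound photographs, skin assessments
    "note": clinical_ner,          # free-text progress and therapy notes
}

def extract(doc: ClinicalDocument) -> dict:
    extractor = EXTRACTORS.get(doc.modality)
    if extractor is None:
        # Structured EHR data needs no extraction, only normalization.
        return {"structured": True}
    return extractor(doc)

result = extract(ClinicalDocument("wound_0412.jpg", "photo"))
```

    The dispatch table is the point: adding a new modality means registering a new extractor, not rebuilding the pipeline.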

    MDS Mapping as an Extraction + Classification Problem

    Populating an MDS assessment is fundamentally a two-step process: (1) extract relevant clinical data from source documents, and (2) classify that data into the correct MDS section and item code.

    Consider Section G (Functional Status). The data might come from a therapy note describing the resident's mobility during a session, a nursing progress note about ADL performance, a fall incident report, and a physician order modifying the mobility plan. Each source uses different terminology, formats, and levels of detail. An extraction pipeline must pull the relevant information from each source, normalize it, and map it to the specific G-section item codes (G0110 through G0120).

    This is where taxonomy classification becomes essential. A well-structured clinical taxonomy maps extracted data elements to MDS sections with the specificity required for accurate coding. The taxonomy acts as the bridge between unstructured clinical language and the coded assessment structure that CMS requires.
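    A minimal illustration of that bridge, assuming a tiny hand-built keyword taxonomy (real clinical taxonomies are far larger and typically backed by trained classifiers rather than exact-match lookups):

```python
# Illustrative taxonomy mapping clinical phrases to MDS 3.0 sections.
# The entries here are examples, not an authoritative coding reference.
MDS_TAXONOMY = {
    "pressure ulcer": ("Section M", "Skin Conditions"),
    "wound": ("Section M", "Skin Conditions"),
    "transfer assistance": ("Section G", "Functional Status"),
    "ambulation": ("Section G", "Functional Status"),
    "swallowing difficulty": ("Section K", "Swallowing/Nutritional Status"),
}

def classify(extracted_terms):
    """Map extracted clinical terms to candidate MDS sections with evidence."""
    hits = {}
    for term in extracted_terms:
        key = term.lower()
        if key in MDS_TAXONOMY:
            section, label = MDS_TAXONOMY[key]
            hits.setdefault(section, {"label": label, "evidence": []})
            hits[section]["evidence"].append(term)
    return hits

draft = classify(["Pressure ulcer", "ambulation", "fever"])
# "fever" has no mapping in this toy taxonomy and is left for human review.
```

    Keeping the source term as evidence alongside the mapped section is what lets a coordinator verify each suggested code instead of trusting it blindly.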

    The PDPM Revenue Connection

    PDPM (Patient-Driven Payment Model) determines SNF reimbursement based on five case-mix components: PT, OT, SLP, nursing, and non-therapy ancillary. Each component is driven by clinical characteristics documented in the MDS. When clinical severity is under-documented—because relevant data was trapped in a scanned document or an unstructured note—the PDPM classification understates the actual care burden.

    The revenue impact is not theoretical. When multimodal extraction captures clinical indicators that were previously invisible to MDS coding—comorbidities mentioned in transfer summaries, wound characteristics visible in photographs, functional limitations described in therapy narratives—PDPM classifications more accurately reflect clinical reality. Facilities that have implemented multimodal documentation intelligence report recovering $30–40 per patient per day in previously undocumented revenue.

    Across a 120-bed facility with 95% occupancy, that recovery compounds to meaningful revenue that can be reinvested in staffing, training, and resident care.
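    The compounding is simple arithmetic. Using the $30–40 per patient per day recovery figure reported above:

```python
beds = 120
occupancy = 0.95
residents = round(beds * occupancy)    # 114 occupied beds on average

low, high = 30, 40                     # recovered $/patient/day, from the text
annual_low = residents * low * 365     # $1,248,300
annual_high = residents * high * 365   # $1,664,400

print(f"${annual_low:,} - ${annual_high:,} per year")
```

    On the order of $1.2–1.7 million per year for a single facility of that size.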

    CMS Survey Readiness as a Retrieval Problem

    CMS surveys evaluate whether a facility's documentation supports the care it claims to provide. Surveyors select residents, review their records, and look for evidence that clinical assessments match documented care plans and interventions. The challenge is that evidence is distributed across systems, formats, and time periods.

    When documentation is indexed multimodally and retrievable by clinical concept, audit preparation shifts from a weeks-long manual process to an on-demand query. Need all documentation related to a resident's skin integrity over the past 90 days? That is a retrieval query across progress notes, wound photographs, treatment records, and physician orders—filtered by date range and clinical concept, ranked by relevance.
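    Conceptually, that query is a metadata filter plus a relevance ranking. The sketch below runs it over an in-memory list purely for illustration; a production system would query a hybrid vector/metadata index, and the record schema here is hypothetical:

```python
from datetime import date, timedelta

# Hypothetical indexed records spanning multiple modalities; "score" is
# a stand-in for a relevance score from a retrieval index.
records = [
    {"type": "progress_note", "concept": "skin_integrity",
     "date": date(2025, 5, 20), "score": 0.91},
    {"type": "wound_photo", "concept": "skin_integrity",
     "date": date(2025, 4, 2), "score": 0.88},
    {"type": "therapy_log", "concept": "mobility",
     "date": date(2025, 5, 1), "score": 0.75},
]

def evidence_package(concept, days, today=date(2025, 6, 1)):
    """Filter by clinical concept and date range, rank by relevance."""
    cutoff = today - timedelta(days=days)
    hits = [r for r in records
            if r["concept"] == concept and r["date"] >= cutoff]
    return sorted(hits, key=lambda r: r["score"], reverse=True)

pkg = evidence_package("skin_integrity", days=90)
```

    The result is exactly the audit artifact described above: every skin-integrity document in the window, across modalities, ordered by relevance.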

    Facilities that can generate audit-ready evidence packages in minutes instead of weeks are better prepared for surveys and better positioned to defend against F-tag citations.

    What a Multimodal Approach Looks Like in Practice

    A practical implementation for SNF documentation intelligence involves four stages:

    1. Ingest: Pull clinical content from the EHR via FHIR, batch-process scanned documents from storage, and accept uploaded photographs and handwritten notes.
    2. Extract and structure: Apply modality-appropriate extractors—OCR for scanned documents, image analysis for photographs, NER for clinical text, embeddings for semantic indexing.
    3. Classify and map: Use a clinical taxonomy to map extracted data to MDS 3.0 sections and item codes, building structured assessment drafts.
    4. Retrieve and review: Surface relevant documentation on demand for MDS coding, PDPM optimization, care planning, and audit preparation.

    The critical architectural choice is treating extraction, classification, and retrieval as a unified pipeline rather than three separate systems. When extraction feeds directly into classification which feeds directly into retrieval, the entire documentation workflow accelerates—from bedside charting to billed assessment.
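    In code terms, a unified pipeline means the four stages compose directly. The following sketch uses stub stage functions (all hypothetical) purely to show the data flow from ingestion to a retrievable store:

```python
def ingest(sources):
    # Stage 1: pull raw documents (FHIR resources, scanned PDFs, photos).
    return [{"name": s, "modality": "note", "raw": f"<content of {s}>"}
            for s in sources]

def extract(doc):
    # Stage 2: modality-appropriate extraction (OCR / vision / NER).
    # Hard-coded entities here stand in for real extractor output.
    return {**doc, "entities": ["pressure ulcer"]}

def classify(doc):
    # Stage 3: taxonomy mapping to MDS sections.
    doc["mds_sections"] = (["Section M"]
                           if "pressure ulcer" in doc["entities"] else [])
    return doc

def index(docs):
    # Stage 4: build a store retrievable by MDS section.
    store = {}
    for d in docs:
        for sec in d["mds_sections"]:
            store.setdefault(sec, []).append(d["name"])
    return store

# One unified pipeline: each stage feeds the next directly.
store = index(classify(extract(d)) for d in ingest(["note_001.txt"]))
```

    Because the stages share one data model, an extraction improvement immediately improves classification and retrieval, with no synchronization between separate systems.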

    The Scale of the Opportunity

    With 15,243 Medicare/Medicaid-certified SNFs in the US, approximately 1.3 million residents, and documentation burden consuming 40% of nursing time, the total addressable market for SNF documentation intelligence is substantial. Facilities that reduce documentation burden, improve PDPM capture, and streamline survey readiness gain competitive advantages in staffing retention (less burnout), financial performance (more accurate reimbursement), and regulatory standing (fewer citations).

    The technology to address this exists today. Multimodal extraction pipelines, clinical taxonomy classifiers, and hybrid retrieval systems are production-ready. The question for SNF operators is not whether to automate documentation intelligence, but how quickly they can deploy it.


    Mixpeek provides the multimodal infrastructure for clinical documentation intelligence. Our SNF Documentation Intelligence use case combines feature extractors, taxonomy classifiers, and hybrid retrievers to automate MDS workflows. The FHIR R4 connector integrates with Epic, Cerner, and other EHR systems. Learn more on our healthcare solutions page.