Run an LLM to extract findings

Use an LLM (such as ChatGPT, Claude, or your own model) to extract structured findings from your radiology reports. This page gives you the prompt to use, shows you what file to upload, and explains how to fix problems if the upload fails.

Step 1 — Copy the extraction prompt

The prompt below is auto-filled with your active taxonomy.

Step 2 — Run the prompt against your reports

Paste this prompt into your LLM. Add the text of your radiology reports below the prompt. Save the LLM's response as a .json file.

The reports CSV format is documented at the Reports CSV Format Guide.
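Step 2 can also be scripted. The sketch below is one possible approach, not part of the annotator: `build_llm_input`, `parse_llm_response`, and `save_llm_response` are hypothetical helper names, the "Record ID:" label used to separate reports is an assumption, and the call to your LLM is omitted because it depends on your provider. It assembles the prompt plus reports, then strips an optional Markdown code fence from the response and confirms that it parses as a JSON array before saving.

```python
import json
import re
from pathlib import Path

FENCE = "`" * 3  # Markdown code fence that some LLMs wrap around JSON output


def build_llm_input(prompt: str, reports: dict[str, str]) -> str:
    """Append each report to the prompt, labeled with its record ID."""
    parts = [prompt.strip()]
    for record_id, text in reports.items():
        # The "Record ID:" label is an assumption, not an annotator requirement.
        parts.append(f"Record ID: {record_id}\n{text.strip()}")
    return "\n\n".join(parts)


def parse_llm_response(response: str) -> list:
    """Strip an optional code fence and parse the response as a JSON array."""
    text = response.strip()
    pattern = FENCE + r"(?:json)?\s*(.*?)\s*" + FENCE
    match = re.fullmatch(pattern, text, flags=re.DOTALL)
    if match:
        text = match.group(1)
    findings = json.loads(text)  # raises json.JSONDecodeError on invalid JSON
    if not isinstance(findings, list):
        raise ValueError("expected a JSON array of finding objects")
    return findings


def save_llm_response(response: str, path: str) -> int:
    """Write the parsed findings to a .json file; return how many were saved."""
    findings = parse_llm_response(response)
    Path(path).write_text(json.dumps(findings, indent=2), encoding="utf-8")
    return len(findings)
```

Parsing before saving means a truncated or chatty response fails on your machine rather than at upload time.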

Step 3 — Upload the JSON file to the annotator

Open the annotator's home page. Click the Import LLM Extractions button in the sidebar and select your .json file.

If something doesn't import

The annotator validates every row before import and shows you what to fix. Common cases:

| Error | Cause | Fix |
| --- | --- | --- |
| missing required field | A row lacks record_id, finding_name, or source_text. | Every row must include all three. Update your extraction. |
| record_id not in loaded reports | The row references a record that wasn't imported as a report. | Re-import the reports CSV with this record included, or check for typos. |
| source_text not in named report | The text doesn't appear in the report's FINDINGS section. Often a sign of cross-attribution (LLM batched multiple reports with the wrong record_id) or paraphrase (the LLM didn't quote verbatim). | Check the validation panel: if the source_text matches another loaded report, your record_id is mismatched. Otherwise verify source_text is a verbatim quote. |
| source_text matches multiple sentences | The same body text appears under multiple section headers (templated report). | Add the section header to source_text. Example: "Celiac artery: no dissection." |
| presence value not recognized | A value other than present / absent / indeterminate. | Update the extraction to emit one of the three allowed values. |
| attribute value not recognized | A canonical enum attribute (laterality, severity, etc.) has a value outside its vocabulary. | Use the allowed values listed in the field reference below. |
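Several of these cases can be caught before upload with a local pre-check. The sketch below is an approximation built from the table above, not the annotator's actual validator: `precheck` is a hypothetical helper, `reports` is assumed to map each record ID to its FINDINGS text, and matching is a plain substring test, which may be stricter or looser than the real matching.

```python
REQUIRED = ("record_id", "finding_name", "source_text")
PRESENCE = {"present", "absent", "indeterminate"}


def precheck(finding: dict, reports: dict[str, str]) -> list[str]:
    """Return the error labels from the table above that apply to one finding."""
    if any(not finding.get(field) for field in REQUIRED):
        # The remaining checks need these fields, so stop here.
        return ["missing required field"]
    errors = []
    # A missing presence value is reported as unrecognized in this sketch.
    if finding.get("presence") not in PRESENCE:
        errors.append("presence value not recognized")
    record_id, source_text = finding["record_id"], finding["source_text"]
    if record_id not in reports:
        errors.append("record_id not in loaded reports")
        return errors
    count = reports[record_id].count(source_text)
    if count == 0:
        # Search other reports to tell cross-attribution apart from paraphrase.
        others = [rid for rid, text in reports.items() if source_text in text]
        hint = f" (found in {others[0]})" if others else ""
        errors.append("source_text not in named report" + hint)
    elif count > 1:
        errors.append("source_text matches multiple sentences")
    return errors
```

A "found in" hint points to a mismatched record_id; no hint suggests the LLM paraphrased instead of quoting.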

The validation panel offers a Download error report (CSV) button that exports every rejected row with a _validation_error column appended. Use it to fix issues upstream and re-upload.
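To see which problems dominate before fixing them upstream, the exported error report can be tallied by its _validation_error column. This is a minimal sketch, assuming the export is a standard comma-separated file with a header row. `tally_errors` is a hypothetical helper, the file name in the usage comment is a placeholder, and the exact wording of the messages in the column may differ from the table above.

```python
import csv
from collections import Counter
from typing import TextIO


def tally_errors(csv_file: TextIO) -> Counter:
    """Count rejected rows per value of the _validation_error column."""
    reader = csv.DictReader(csv_file)
    return Counter(row["_validation_error"] for row in reader)


# Usage (the file name is a placeholder):
# with open("error_report.csv", newline="", encoding="utf-8") as f:
#     for message, n in tally_errors(f).most_common():
#         print(n, message)
```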

JSON field reference

Each finding is one JSON object with these fields. Required fields are checked at import; optional fields are only validated if present.

| Field | Type | Required | Valid Values / Constraints | Example |
| --- | --- | --- | --- | --- |
| record_id | string | Required | Must match a record ID already loaded in the annotator | ARN-RSNA-CXR-0001 |
| finding_name | string | Required | A name from the active taxonomy (case-insensitive). If the term doesn't map, emit unmapped:your_term. The annotator preserves it for later review. | pleural_effusion |
| presence | enum | Required | One of: present, absent, indeterminate | present |
| source_text | string | Required | The verbatim sentence from FINDINGS supporting this finding. For templated reports where the same body text repeats under different sub-headers: include the section header as a prefix in source_text so each match is unique. | Small right pleural effusion. |

Canonical optional fields (include only when applicable):

| Field | Type | Required | Valid Values / Constraints | Example |
| --- | --- | --- | --- | --- |
| laterality | enum | Optional | left, right, bilateral | right |
| temporal_status | enum | Optional | unchanged, new, resolved, larger, smaller, increased_extent, decreased_extent, more_conspicuous, less_conspicuous, increased_complexity, decreased_complexity, indeterminate | unchanged |
| chronicity | enum | Optional | acute, subacute, chronic, remote, evolving, resolving, healing, healed, indeterminate | chronic |
| size | string | Optional | Numerical measurements with units only (e.g., "7 mm", "2.3 cm"). Qualitative descriptors like "small" or "large" go in severity. | 7 mm |
| severity | enum | Optional | Intensity: mild, moderate, severe. Extent: small, medium, large. | small |
| anatomic_site | string | Optional | Compound anatomic terms stay intact (e.g., "right upper lobe"). | right upper lobe |
| features | string or array | Optional | Descriptive properties of the finding (e.g., cavitation, loculated). Comma-separated string or JSON array. | "loculated, septated" |
| tip_location | string | Optional | For device findings (catheters, lines): where the device tip terminates. | SVC-RA junction |
| position_status | enum | Optional | satisfactory, malpositioned | satisfactory |
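The constraints in this reference can be checked mechanically before upload. The following is a sketch under assumptions, not the annotator's own code: `check_optional_fields` and `normalize_features` are hypothetical names, the vocabularies are copied from the table above, the size pattern is a guess at what "numerical measurements with units" permits, and enum matching is assumed to be exact and lowercase.

```python
import re

ENUMS = {
    "laterality": {"left", "right", "bilateral"},
    "temporal_status": {
        "unchanged", "new", "resolved", "larger", "smaller",
        "increased_extent", "decreased_extent", "more_conspicuous",
        "less_conspicuous", "increased_complexity", "decreased_complexity",
        "indeterminate",
    },
    "chronicity": {
        "acute", "subacute", "chronic", "remote", "evolving",
        "resolving", "healing", "healed", "indeterminate",
    },
    "severity": {"mild", "moderate", "severe", "small", "medium", "large"},
    "position_status": {"satisfactory", "malpositioned"},
}
# Assumed pattern: a number, an optional space, then a unit such as mm or cm.
SIZE_PATTERN = re.compile(r"\d+(?:\.\d+)?\s?[a-zA-Z]+")


def normalize_features(value) -> list[str]:
    """Accept a comma-separated string or a list and return a clean list."""
    items = value.split(",") if isinstance(value, str) else value
    return [item.strip() for item in items if item.strip()]


def check_optional_fields(finding: dict) -> list[str]:
    """Return a message for each optional field whose value is not allowed."""
    problems = []
    for field, allowed in ENUMS.items():
        # Optional fields are only checked when present.
        if field in finding and finding[field] not in allowed:
            problems.append(f"{field}: {finding[field]!r} not in vocabulary")
    if "size" in finding and not SIZE_PATTERN.fullmatch(finding["size"]):
        problems.append(f"size: {finding['size']!r} is not a number with units")
    return problems
```

A qualitative size such as "small" is flagged here so it can be moved to severity, as the size row describes.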

Example JSON

[
  {
    "record_id": "ARN-RSNA-CXR-0001",
    "finding_name": "pleural_effusion",
    "presence": "present",
    "source_text": "Small right pleural effusion.",
    "laterality": "right",
    "severity": "small"
  },
  {
    "record_id": "ARN-RSNA-CXR-0001",
    "finding_name": "cardiomegaly",
    "presence": "present",
    "source_text": "The heart is mildly enlarged.",
    "severity": "mild"
  }
]

Custom fields beyond the canonical set are preserved on each finding as free-text attributes. They can be edited, displayed, and exported alongside the canonical fields; only canonical enum fields are checked against a vocabulary.

CSV files that use these same field names as column headers can also be imported. JSON is the recommended format because LLMs produce it more reliably than CSV.
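If you do need CSV, a findings JSON array can be converted with the standard library. This is a minimal sketch, assuming array-valued fields such as features should be joined into a comma-separated string and that missing optional fields become empty cells. `findings_to_csv` is a hypothetical helper, not an annotator feature.

```python
import csv
import io


def findings_to_csv(findings: list[dict]) -> str:
    """Render findings as CSV text with field names as column headers."""
    # Collect column names in first-seen order across all findings.
    columns = list(dict.fromkeys(key for f in findings for key in f))
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=columns, restval="", lineterminator="\n"
    )
    writer.writeheader()
    for finding in findings:
        # Join list values (e.g. features) into one comma-separated cell.
        row = {
            key: ", ".join(value) if isinstance(value, list) else value
            for key, value in finding.items()
        }
        writer.writerow(row)
    return buffer.getvalue()
```

The csv module quotes any cell that contains a comma, so a joined features value stays in a single column.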