
Run an LLM to extract findings

Use an LLM (such as ChatGPT, Claude, or your own model) to extract structured findings from your radiology reports. This page gives you the prompt to use, shows you what file to upload, and explains how to fix problems if the upload fails.

Step 1 — Copy the extraction prompt

The prompt below is auto-filled with your active taxonomy.

Step 2 — Run the prompt against your reports

Paste this prompt into your LLM. Add the text of your radiology reports below the prompt. Save the LLM's response as a .json file.
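LLMs often wrap their JSON in a Markdown code fence or add a sentence before or after it, and either one makes the saved file invalid. The Python sketch below strips that wrapping before saving. It assumes the reply contains exactly one JSON array. The helper names `extract_json_array` and `save_findings` are illustrative and are not part of the annotator.

```python
import json
import re

# Matches a Markdown code fence, optionally tagged "json"; `{3} means three backticks.
FENCE_RE = re.compile(r"`{3}(?:json)?\s*(.*?)`{3}", re.DOTALL)


def extract_json_array(reply: str) -> list:
    """Return the JSON array found in an LLM reply (hypothetical helper)."""
    match = FENCE_RE.search(reply)
    text = match.group(1) if match else reply
    # Keep only the span from the first "[" to the last "]".
    start, end = text.find("["), text.rfind("]")
    if start == -1 or end < start:
        raise ValueError("no JSON array found in the reply")
    return json.loads(text[start:end + 1])


def save_findings(reply: str, path: str) -> int:
    """Parse the reply, write it to `path` as a .json file, and return the row count."""
    findings = extract_json_array(reply)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(findings, f, indent=2, ensure_ascii=False)
    return len(findings)
```

If the reply contains a stray "[" in its prose before the array, the slice will be wrong and `json.loads` will raise an error. In that case, edit the reply by hand.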

The reports CSV format is documented in the Reports CSV Format Guide.

Step 3 — Upload the JSON file to the annotator

Open the annotator's home page. Click the Import LLM Extractions button in the sidebar and select your .json file.

If something doesn't import

The annotator validates every row before import and shows you what to fix. Common cases:

Error	Cause	Fix
missing required field	A row lacks `record_id`, `finding_name`, or `source_text`.	Every row must include all three. Update your extraction.
record_id not in loaded reports	The row references a record that wasn't imported as a report.	Re-import the reports CSV with this record included, or check for typos.
source_text not in named report	The text doesn't appear in the named report's FINDINGS section. This often signals cross-attribution (the LLM processed several reports in one batch and attached the wrong record_id) or paraphrase (the LLM didn't quote verbatim).	Check the validation panel: if the source_text matches another loaded report, the record_id is mismatched. Otherwise, verify that source_text is a verbatim quote.
source_text matches multiple sentences	The same body text appears under multiple section headers (templated report).	Add the section header to source_text. Example: `"Celiac artery: no dissection."`
presence value not recognized	A value other than present / absent / indeterminate.	Update the extraction to emit one of the three allowed values.
attribute value not recognized	A canonical enum attribute (laterality, severity, etc.) has a value outside its vocabulary.	Use the allowed values listed in the field reference below.
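You can pre-check the three `source_text` cases in the error table above before uploading. The Python sketch below assumes you have a dict that maps each record_id to its FINDINGS text. It uses exact substring matching, and the annotator's own matching may normalize whitespace or case differently. The function name `locate_source_text` is hypothetical.

```python
def locate_source_text(record_id: str, source_text: str, reports: dict[str, str]) -> str:
    """Classify one row's source_text against loaded reports (hypothetical helper).

    `reports` maps record_id -> FINDINGS section text (an assumed structure).
    """
    if not source_text:
        return "missing required field"
    if record_id not in reports:
        return "record_id not in loaded reports"
    hits = reports[record_id].count(source_text)
    if hits == 1:
        return "ok"
    if hits > 1:
        # Templated report: prefix source_text with its section header.
        return "source_text matches multiple sentences"
    # Not in the named report: search the other reports for cross-attribution.
    others = [rid for rid, text in reports.items()
              if rid != record_id and source_text in text]
    if others:
        return "source_text not in named report (found in: " + ", ".join(others) + ")"
    return "source_text not in named report (check for paraphrase)"
```

For example, in a report containing "Celiac artery: no dissection. SMA: no dissection.", the text "no dissection." matches twice, but "Celiac artery: no dissection." matches once.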

The validation panel offers a Download error report (CSV) button that exports every rejected row with a _validation_error column appended. Use it to fix issues upstream and re-upload.
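When the error report is long, counting rejected rows by error shows which fix to make upstream first. The Python sketch below relies only on the `_validation_error` column described above. The function name is hypothetical, and the exact wording of each error value in the export is not assumed.

```python
import csv
from collections import Counter
from typing import Iterable


def summarize_errors(lines: Iterable[str]) -> Counter:
    """Count rejected rows per _validation_error value (hypothetical helper).

    `lines` can be an open file or any iterable of CSV lines.
    """
    reader = csv.DictReader(lines)
    return Counter(row["_validation_error"] for row in reader)
```

Usage: `with open("error_report.csv", newline="", encoding="utf-8") as f: print(summarize_errors(f).most_common())`. The filename here is a placeholder.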

JSON field reference

Each finding is one JSON object with these fields. Required fields are checked at import; optional fields are only validated if present.

Field	Type	Required	Valid Values / Constraints	Example
record_id	string	Required	Must match a record ID already loaded in the annotator	ARN-RSNA-CXR-0001
finding_name	string	Required	A name from the active taxonomy (case-insensitive). If the term doesn't map, emit `unmapped:your_term` — the annotator preserves it for later review.	pleural_effusion
presence	enum	Required	One of: `present`, `absent`, `indeterminate`	present
source_text	string	Required	The verbatim sentence from FINDINGS that supports this finding. For templated reports where the same body text repeats under different sub-headers, prefix `source_text` with the section header so each match is unique.	Small right pleural effusion.
Canonical optional fields below — include only when applicable
laterality	enum	Optional	`left`, `right`, `bilateral`	right
temporal_status	enum	Optional	`unchanged`, `new`, `resolved`, `larger`, `smaller`, `increased_extent`, `decreased_extent`, `more_conspicuous`, `less_conspicuous`, `increased_complexity`, `decreased_complexity`, `indeterminate`	unchanged
chronicity	enum	Optional	`acute`, `subacute`, `chronic`, `remote`, `evolving`, `resolving`, `healing`, `healed`, `indeterminate`	chronic
size	string	Optional	Numerical measurements with units only (e.g., "7 mm", "2.3 cm"). Qualitative descriptors like "small" or "large" go in `severity`.	7 mm
severity	enum	Optional	Intensity: `mild`, `moderate`, `severe`; extent: `small`, `medium`, `large`	small
anatomic_site	string	Optional	Compound anatomic terms stay intact (e.g., "right upper lobe").	right upper lobe
features	string or array	Optional	Descriptive properties of the finding (e.g., cavitation, loculated). Comma-separated string or JSON array.	"loculated, septated"
tip_location	string	Optional	For device findings (catheters, lines): where the device tip terminates.	SVC-RA junction
position_status	enum	Optional	`satisfactory`, `malpositioned`	satisfactory
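The field reference above can be turned into a pre-flight check that you run before uploading. The Python sketch below copies the required fields and enum vocabularies from the table. It assumes enum values are compared exactly, which may be stricter than the annotator. It also adds two checks of its own that are not listed in the error table. First, it flags finding names that are neither in the taxonomy nor prefixed with `unmapped:`. Second, it flags `size` values that contain no number. The `taxonomy` argument, a set of lowercase finding names, is a hypothetical input you would supply.

```python
REQUIRED = ("record_id", "finding_name", "presence", "source_text")

# Vocabularies copied from the field reference table.
ENUMS = {
    "presence": {"present", "absent", "indeterminate"},
    "laterality": {"left", "right", "bilateral"},
    "temporal_status": {
        "unchanged", "new", "resolved", "larger", "smaller",
        "increased_extent", "decreased_extent", "more_conspicuous",
        "less_conspicuous", "increased_complexity", "decreased_complexity",
        "indeterminate",
    },
    "chronicity": {
        "acute", "subacute", "chronic", "remote", "evolving",
        "resolving", "healing", "healed", "indeterminate",
    },
    "severity": {"mild", "moderate", "severe", "small", "medium", "large"},
    "position_status": {"satisfactory", "malpositioned"},
}


def check_finding(row: dict, taxonomy: set[str]) -> list[str]:
    """Return problems for one finding; an empty list means it passed (hypothetical helper)."""
    problems = []
    for field in REQUIRED:
        if not row.get(field):
            problems.append(f"missing required field: {field}")
    # Enum fields are checked only when the key is present in the row.
    for field, allowed in ENUMS.items():
        if field in row:
            value = row[field]
            if not isinstance(value, str) or value not in allowed:
                problems.append(f"{field} value not recognized: {value!r}")
    name = row.get("finding_name")
    if isinstance(name, str) and name:
        lowered = name.lower()  # taxonomy match is case-insensitive
        if not lowered.startswith("unmapped:") and lowered not in taxonomy:
            problems.append(f"finding_name not in taxonomy: {name!r}")
    # Heuristic: size should hold a measurement, not a word like "small".
    if "size" in row and not any(ch.isdigit() for ch in str(row["size"])):
        problems.append("size has no numeric measurement; use severity for qualitative terms")
    return problems
```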

Example JSON

[
  {
    "record_id": "ARN-RSNA-CXR-0001",
    "finding_name": "pleural_effusion",
    "presence": "present",
    "source_text": "Small right pleural effusion.",
    "laterality": "right",
    "severity": "small"
  },
  {
    "record_id": "ARN-RSNA-CXR-0001",
    "finding_name": "cardiomegaly",
    "presence": "present",
    "source_text": "The heart is mildly enlarged.",
    "severity": "mild"
  }
]

Custom fields beyond the canonical set are preserved on each finding as free-text attributes. They can be edited, displayed, and exported alongside canonical fields; only the canonical enum fields are checked against a vocabulary.
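To see which keys in an extraction will be kept as custom free-text attributes, and so will not be vocabulary-checked, you can split each finding against the field names in the reference table. The Python sketch below does this. The function name `split_fields` is hypothetical.

```python
# Field names taken from the JSON field reference table.
CANONICAL = {
    "record_id", "finding_name", "presence", "source_text", "laterality",
    "temporal_status", "chronicity", "size", "severity", "anatomic_site",
    "features", "tip_location", "position_status",
}


def split_fields(finding: dict) -> tuple[dict, dict]:
    """Separate canonical fields from custom attributes (hypothetical helper)."""
    canonical = {k: v for k, v in finding.items() if k in CANONICAL}
    custom = {k: v for k, v in finding.items() if k not in CANONICAL}
    return canonical, custom
```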

CSV files that use these same field names as column headers can also be imported. JSON is the recommended format because LLMs produce it more reliably than CSV.
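If you want to review extractions in a spreadsheet before importing, you can convert them from JSON to CSV yourself. The Python sketch below uses the JSON keys as column headers and leaves a cell blank when a finding lacks that field. It also joins array values such as `features` into the comma-separated string form that the field reference allows. The function name `findings_to_csv` is hypothetical.

```python
import csv
import io


def findings_to_csv(findings: list[dict]) -> str:
    """Render findings as CSV text with field names as headers (hypothetical helper)."""
    # Collect every key used by any finding, keeping first-seen order.
    columns: list[str] = []
    for f in findings:
        for key in f:
            if key not in columns:
                columns.append(key)
    buf = io.StringIO()
    # Missing fields are written as empty cells (DictWriter's default restval).
    writer = csv.DictWriter(buf, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    for f in findings:
        # Arrays (e.g. features) become comma-separated strings.
        writer.writerow({k: ", ".join(map(str, v)) if isinstance(v, list) else v
                         for k, v in f.items()})
    return buf.getvalue()
```

Write the result with `open(path, "w", newline="", encoding="utf-8")` so the CSV line endings are not altered.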