Reports CSV Format Guide

Field definitions and constraints for the report-text CSV uploaded via "Upload Reports CSV."

The Contract

The annotator deterministically splits each report into sentences and treats those sentences as the source of truth for finding alignment. You don't need to pre-format reports; the splitter handles common variations natively. If a report can't be parsed, validation flags it before any data is committed, so you never start labeling on broken input.

Required Columns

Column	Type	Constraints
record_id	text	Unique within the CSV. Auto-detected synonyms: `id`, `report_id`, `case_id`, `accession`.
report_text	text	The full report text including a `FINDINGS:` section. Auto-detected synonyms: `report`, `text`, `findings`, `rad_deid_report`.
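
The synonym detection described in the table can be sketched as follows. The synonym lists come from the table; the matching rules (case-insensitive, whitespace-trimmed, canonical name preferred over synonyms) and the function name `detect_columns` are assumptions for illustration, not the annotator's actual implementation.

```python
import csv
import io

# Synonym lists copied from the Required Columns table.
# The canonical name is listed first so it wins when several candidates are present (assumption).
SYNONYMS = {
    "record_id": ["record_id", "id", "report_id", "case_id", "accession"],
    "report_text": ["report_text", "report", "text", "findings", "rad_deid_report"],
}


def detect_columns(header):
    """Map each required column to the matching header name, or None if absent."""
    # Case-insensitive, whitespace-trimmed matching is an assumption.
    normalized = {name.strip().lower(): name for name in header}
    mapping = {}
    for canonical, candidates in SYNONYMS.items():
        mapping[canonical] = next(
            (normalized[c] for c in candidates if c in normalized), None
        )
    return mapping


sample = io.StringIO('Accession,Report\nA-1,"FINDINGS: No acute fracture."\n')
header = next(csv.reader(sample))
print(detect_columns(header))
```

A `None` value in the result means the CSV has no column the check could map to that required field.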

FINDINGS Section

Annotation operates on the FINDINGS section only. Each report must contain a `FINDINGS:` header followed by content. Anything before FINDINGS (clinical history, technique, comparison) is ignored, as is the IMPRESSION section and anything after it.

A report whose FINDINGS section can't be located, or whose FINDINGS section yields zero sentences, is flagged at upload. It appears in the validation panel with an actionable fix message before anything is committed.
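
To check your own reports before uploading, the section lookup can be approximated as below. This is a minimal sketch, not the annotator's parser: it assumes headers are spelled exactly `FINDINGS:` and `IMPRESSION:` at the start of a line, and that the FINDINGS body ends where the `IMPRESSION:` header begins. The name `extract_findings` is hypothetical.

```python
import re

# Assumed header patterns: exact, case-sensitive, at the start of a line.
FINDINGS_RE = re.compile(r"^\s*FINDINGS:", re.MULTILINE)
IMPRESSION_RE = re.compile(r"^\s*IMPRESSION:", re.MULTILINE)


def extract_findings(report_text):
    """Return the FINDINGS body, or None if the header is missing or the body is empty."""
    start = FINDINGS_RE.search(report_text)
    if start is None:
        return None
    body = report_text[start.end():]
    # Cut the body at the IMPRESSION header when one is present.
    end = IMPRESSION_RE.search(body)
    if end is not None:
        body = body[:end.start()]
    body = body.strip()
    return body or None


report = "HISTORY: fall\nFINDINGS:\nNo acute fracture.\n\nIMPRESSION:\nNormal."
print(extract_findings(report))
```

A `None` result corresponds to the kind of report that would be flagged at upload.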

Supported Report Formats

A single splitter handles all supported report styles. You don't pick a format; the parser absorbs the variation. In stored form, every sentence carries its section's header as a prefix (the UI strips the prefix for display).

1. Prose paragraphs

Multi-sentence paragraphs separated by blank lines. Sentences split where a period is followed by an uppercase letter.

FINDINGS:

Intracranial structures:
Mild patchy hypoattenuation in periventricular white matter. No acute hemorrhage. No mass effect.

Osseous structures: No acute fracture.
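
The period-plus-uppercase rule can be illustrated with a short regular expression. This is a sketch under one assumption: whitespace separates the period from the uppercase letter. The real splitter may handle more cases (abbreviations, for example), and `split_prose` is a hypothetical name.

```python
import re

# Assumed boundary: a period, then whitespace, then an uppercase letter.
# The lookbehind and lookahead keep the period and the capital in their sentences.
SENTENCE_BOUNDARY = re.compile(r"(?<=\.)\s+(?=[A-Z])")


def split_prose(paragraph):
    """Split a prose paragraph into sentences on period + uppercase boundaries."""
    return [s.strip() for s in SENTENCE_BOUNDARY.split(paragraph) if s.strip()]


text = (
    "Mild patchy hypoattenuation in periventricular white matter. "
    "No acute hemorrhage. No mass effect."
)
for sentence in split_prose(text):
    print(sentence)
```

Because the rule requires an uppercase letter after the period, decimal values such as "1.5 cm" stay inside one sentence.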

2. Bulleted findings

Each bullet on its own line, separated by blank lines. Each bullet becomes its own sentence.

FINDINGS:
Brain Parenchyma:

- Mild patchy hypoattenuation in periventricular white matter.

- No acute hemorrhage.

- No mass effect.

3. Numbered impressions

Numbered lists (1., 2., …) split on the boundary between items.

1. Chronic small vessel ischemic changes.
2. No acute infarct.
3. Mild generalized atrophy.
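
Bulleted and numbered lists can both be handled by starting a new item at each list marker. The sketch below assumes markers are `-` or `1.`-style numbers at the start of a line, and that the marker stays in the item text (the stored-form example for bullets keeps its `-`; for numbers this is an assumption). `split_list_items` is a hypothetical name.

```python
import re

# Assumed list markers: "-" bullets or "1." style numbers, followed by whitespace.
ITEM_START = re.compile(r"(?:-|\d+\.)\s+")


def split_list_items(block):
    """Return one string per list item, ignoring blank lines between items."""
    items = []
    for line in block.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        if ITEM_START.match(stripped) or not items:
            items.append(stripped)
        else:
            # A line without a marker continues the previous item.
            items[-1] += " " + stripped
    return items


numbered = "1. Chronic small vessel ischemic changes.\n2. No acute infarct.\n3. Mild generalized atrophy."
print(split_list_items(numbered))
```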

Section Headers and Templated Reports

The splitter recognizes section headers in two forms:

Bare header on its own line — e.g., "Brain Parenchyma:" followed by content below.
Inline header with content — e.g., "Brain Parenchyma: Mild patchy hypoattenuation..." on one line.

In both cases, every sentence within that section is stored with the header label prefixed (e.g., "Brain Parenchyma: - Mild patchy hypoattenuation..."). This is the integrity guarantee that makes templated reports work: a vascular CTA whose body text repeats the same observation under multiple per-vessel sub-headers (Celiac, SMA, IMA, etc.) produces unique sentences because each carries its vessel-specific section prefix.
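
The prefixing behavior can be sketched as follows, with one sentence per line for brevity. The header pattern (a short capitalized label ending in a colon) and the function name `prefix_with_headers` are assumptions, and the sample vessel text is made up for illustration.

```python
import re

# Assumed header shape: a capitalized label of letters, spaces, or slashes, then a colon.
# Group 2 captures inline content and is empty for a bare header.
HEADER = re.compile(r"^([A-Z][A-Za-z /]{0,40}):\s*(.*)$")


def prefix_with_headers(findings_body):
    """Prefix each line with the most recent section header, if any."""
    current = None
    stored = []
    for line in findings_body.splitlines():
        line = line.strip()
        if not line:
            continue
        match = HEADER.match(line)
        if match:
            current = match.group(1)
            line = match.group(2)
            if not line:
                continue  # bare header: content follows on later lines
        stored.append(f"{current}: {line}" if current else line)
    return stored


body = "Celiac:\nNo significant stenosis.\n\nSMA: No significant stenosis."
print(prefix_with_headers(body))
```

The two identical observations become distinct stored sentences because each one carries its own vessel header.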

When uploading extractions for templated reports, your `source_text` values must include the section header so that they match the sentences as the splitter stores them. See the LLM extractions playbook for details.

Validation Errors and Fixes

When upload validation rejects rows, they appear inline with fix suggestions. Common cases:

Error	Fix
Encoding error: unrecognized characters detected	Re-save the source CSV as UTF-8 (Excel: Save As → CSV UTF-8).
No parseable FINDINGS section	The report text doesn't contain a `FINDINGS:` header followed by content. Confirm the report is complete.
Zero sentences extracted from FINDINGS	The FINDINGS section is empty after parsing. Check that the section has actual content.
Duplicate ID	Each `record_id` must be unique within the CSV. Deduplicate upstream.

The upload review panel offers a "Skip N invalid and import valid reports" button so you can move forward with the clean rows while keeping a record of what failed.
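
You can catch most of these errors locally before uploading. The sketch below is a rough pre-check, not the annotator's validator: it assumes the canonical column names `record_id` and `report_text`, uses simple string checks in place of the real parser, and returns shortened labels that mirror the table above. `precheck` is a hypothetical name.

```python
import csv
import io
from collections import Counter


def precheck(csv_bytes):
    """Return a list of (record_id, problem) pairs for a reports CSV."""
    try:
        text = csv_bytes.decode("utf-8")
    except UnicodeDecodeError:
        return [(None, "Encoding error")]

    # newline="" leaves line endings inside quoted fields untouched.
    rows = list(csv.DictReader(io.StringIO(text, newline="")))
    problems = []

    counts = Counter(row["record_id"] for row in rows)
    for record_id, count in counts.items():
        if count > 1:
            problems.append((record_id, "Duplicate ID"))

    for row in rows:
        report = row["report_text"]
        if "FINDINGS:" not in report:
            problems.append((row["record_id"], "No parseable FINDINGS section"))
            continue
        # Approximation: the FINDINGS body ends at the IMPRESSION header.
        body = report.split("FINDINGS:", 1)[1].split("IMPRESSION:", 1)[0]
        if not body.strip():
            problems.append((row["record_id"], "Zero sentences extracted from FINDINGS"))
    return problems


data = b'record_id,report_text\nA,"FINDINGS:\nNo acute fracture."\nA,"HISTORY: fall"\n'
print(precheck(data))
```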

Example CSV

record_id,report_text
CASE-001,"FINDINGS:
Intracranial structures:
Mild patchy hypoattenuation in periventricular white matter. No acute hemorrhage. No mass effect.

Osseous structures:
No acute fracture.

IMPRESSION:
Chronic small vessel ischemic changes. No acute findings."
CASE-002,"FINDINGS:
Brain Parenchyma:

- Mild patchy hypoattenuation in periventricular white matter.

- No acute hemorrhage.

IMPRESSION:
Chronic microvascular ischemic disease."
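
Because report text spans multiple lines, each `report_text` value must be quoted, as in the example above. Generating the file with Python's standard `csv` module handles the quoting for you. This is a minimal sketch; the sample rows and the function name `write_reports_csv` are illustrative.

```python
import csv
import io

# Illustrative rows: (record_id, report_text) pairs with embedded newlines.
reports = [
    (
        "CASE-001",
        "FINDINGS:\nOsseous structures:\nNo acute fracture.\n\nIMPRESSION:\nNo acute findings.",
    ),
]


def write_reports_csv(rows, handle):
    """Write the header and rows; the csv module quotes multi-line fields."""
    writer = csv.writer(handle, quoting=csv.QUOTE_MINIMAL)
    writer.writerow(["record_id", "report_text"])
    writer.writerows(rows)


buffer = io.StringIO(newline="")
write_reports_csv(reports, buffer)
print(buffer.getvalue())
```

To write to disk instead, open the file with `open(path, "w", newline="", encoding="utf-8")` so the output is UTF-8 and line endings inside quoted fields are preserved.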

Related Guides

LLM extractions playbook — get the prompt for your taxonomy, see the upload format, and find help when imports fail.