Field definitions and constraints for the report-text CSV uploaded via "Upload Reports CSV."
The annotator parses your reports into sentences deterministically and uses them as the source of truth for finding alignment. You don't need to pre-format reports; the splitter handles common variations natively. If a report can't be parsed, validation surfaces it before any data is committed, so you never start labeling on broken input.
| Column | Type | Constraints |
|---|---|---|
| record_id | text | Unique within the CSV. Auto-detected synonyms: id, report_id, case_id, accession. |
| report_text | text | The full report text including a FINDINGS: section. Auto-detected synonyms: report, text, findings, rad_deid_report. |
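
The synonym detection described in the table can be pictured with a small sketch. The synonym lists are taken from the table; the matching rules (case-insensitive, whitespace-trimmed, exact name) and the helper name `detect_column` are assumptions made for illustration, not the annotator's documented behavior.

```python
# Synonyms copied from the column table; matching rules are assumed.
RECORD_ID_NAMES = ["record_id", "id", "report_id", "case_id", "accession"]
REPORT_TEXT_NAMES = ["report_text", "report", "text", "findings", "rad_deid_report"]


def detect_column(header, candidates):
    """Return the original header cell matching the first candidate found, or None."""
    # Map normalized names back to the header cells as they appear in the CSV.
    normalized = {cell.strip().lower(): cell for cell in header}
    for name in candidates:
        if name in normalized:
            return normalized[name]
    return None


header = ["Accession", "Report"]
print(detect_column(header, RECORD_ID_NAMES))
print(detect_column(header, REPORT_TEXT_NAMES))
```

If your export uses a column name outside these lists, renaming it to record_id or report_text before upload avoids relying on detection at all.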
Annotation operates on the FINDINGS section only. Reports must contain a FINDINGS: header followed by content. Anything before FINDINGS (clinical history, technique, comparison) and anything after IMPRESSION is ignored.
A report whose FINDINGS section can't be located, or that yields zero sentences, is flagged at upload. You'll see it in the validation panel with an actionable fix message before anything is committed.
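
To check reports before uploading, a rough approximation of the FINDINGS lookup can help. The sketch below assumes the headers appear literally as `FINDINGS:` and `IMPRESSION:` (case-sensitive) and that the section runs to the IMPRESSION header or the end of the text; `extract_findings` is a hypothetical helper, and the annotator's actual parser may be more lenient.

```python
import re


def extract_findings(report_text):
    """Return the FINDINGS body, or None if the header is missing or the body is empty."""
    # Lazily capture everything after FINDINGS: up to IMPRESSION: or end of text.
    match = re.search(r"FINDINGS:(.*?)(?=IMPRESSION:|\Z)", report_text, flags=re.DOTALL)
    if match is None:
        return None
    body = match.group(1).strip()
    return body or None


report = "HISTORY: Headache.\nFINDINGS:\nNo acute fracture.\nIMPRESSION:\nNormal."
print(extract_findings(report))
```

A `None` result here corresponds to the kind of report the validation panel would flag.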
One splitter handles multiple report styles. You don't pick a format; the parser absorbs variation. Every sentence in a section carries that section's header as a prefix in stored form (the UI strips it for display).
1. Prose paragraphs
Multi-sentence paragraphs separated by blank lines. Sentences are split where a period is followed by whitespace and an uppercase letter.
FINDINGS:
Intracranial structures:
Mild patchy hypoattenuation in periventricular white matter. No acute hemorrhage. No mass effect.
Osseous structures: No acute fracture.
2. Bulleted findings
Each bullet on its own line, separated by blank lines. Each bullet becomes its own sentence.
FINDINGS:
Brain Parenchyma:
- Mild patchy hypoattenuation in periventricular white matter.
- No acute hemorrhage.
- No mass effect.
3. Numbered impressions
Numbered lists (1., 2., …) split on the boundary between items.
1. Chronic small vessel ischemic changes.
2. No acute infarct.
3. Mild generalized atrophy.
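
The three styles above can be approximated in a few lines. This is a minimal sketch under stated assumptions, not the annotator's splitter: `split_block` is a hypothetical name, list items are kept with their markers, and prose is split with a single regular expression for "period, whitespace, uppercase letter".

```python
import re

# A line that starts with a bullet ("- ") or a list number ("1. ").
LIST_ITEM = re.compile(r"(-|\d+\.)\s+")
# The gap after a period when the next character is an uppercase letter.
SENTENCE_BOUNDARY = re.compile(r"(?<=\.)\s+(?=[A-Z])")


def split_block(block):
    """Split one block of report text into sentences (illustrative rules only)."""
    lines = [line.strip() for line in block.strip().splitlines() if line.strip()]
    if not lines:
        return []
    # Bulleted or numbered lists: each item is its own sentence.
    if all(LIST_ITEM.match(line) for line in lines):
        return lines
    # Prose: join wrapped lines, then split on period + uppercase boundaries.
    return SENTENCE_BOUNDARY.split(" ".join(lines))


print(split_block("No acute hemorrhage. No mass effect."))
print(split_block("- No acute hemorrhage.\n- No mass effect."))
```

Because the boundary requires whitespace after the period, a decimal such as "1.5 cm" is not split in this sketch.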
The splitter recognizes section headers in two forms:
"Brain Parenchyma:" followed by content below."Brain Parenchyma: Mild patchy hypoattenuation..." on one line.
In both cases, every sentence within that section is stored with the header label prefixed (e.g., "Brain Parenchyma: - Mild patchy hypoattenuation..."). This is the integrity guarantee that makes templated reports work: a vascular CTA whose body text repeats the same observation under multiple per-vessel sub-headers (Celiac, SMA, IMA, etc.) produces unique sentences because each carries its vessel-specific section prefix.
When uploading extractions for templated reports, your source_text values must include the section header to match the splitter's view. See the LLM extractions playbook for details.
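
To see why the prefix makes repeated observations unique, consider the sketch below. It assumes the input is the FINDINGS body with one sentence per line, and it uses a deliberately simple header pattern (a capitalized label followed by a colon); `prefix_with_headers` and the pattern are illustrative assumptions, not the splitter's actual rules.

```python
import re

# Assumed header shape: a capitalized label, then a colon, then optional inline content.
HEADER = re.compile(r"^([A-Z][A-Za-z /]*):\s*(.*)$")


def prefix_with_headers(findings_lines):
    """Prefix each sentence with the most recent section header label."""
    current = None
    stored = []
    for raw in findings_lines:
        line = raw.strip()
        if not line:
            continue
        match = HEADER.match(line)
        if match:
            current = match.group(1)
            line = match.group(2)
            if not line:
                continue  # header on its own line; content follows below
        stored.append(f"{current}: {line}" if current else line)
    return stored


lines = ["Celiac:", "No significant stenosis.", "SMA: No significant stenosis."]
print(prefix_with_headers(lines))
```

The two identical observations come out as distinct stored sentences, which is the form a source_text value would need to match for a templated report.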
When upload validation rejects rows, you'll see them inline with fix suggestions. Common cases:
| Error | Fix |
|---|---|
| Encoding error: unrecognized characters detected | Re-save the source CSV as UTF-8 (Excel: Save As → CSV UTF-8). |
| No parseable FINDINGS section | The report text doesn't contain a FINDINGS: header followed by content. Confirm the report is complete. |
| Zero sentences extracted from FINDINGS | The FINDINGS section is empty after parsing. Check that the section has actual content. |
| Duplicate ID | Each record_id must be unique within the CSV. Deduplicate upstream. |
The upload review panel offers a "Skip N invalid and import valid reports" button so you can move forward with the clean rows while keeping a record of what failed.
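
If you prefer to catch these problems before uploading, a local pre-check along the following lines may help. It is a sketch under assumptions: `partition_rows` is a hypothetical helper, the first occurrence of a duplicated record_id is treated as valid, and the two FINDINGS errors from the table are collapsed into one check. The annotator's own validation may behave differently on each of these points.

```python
def partition_rows(rows):
    """Split rows into (valid, invalid); invalid entries are (record_id, error) pairs."""
    seen = set()
    valid, invalid = [], []
    for row in rows:
        record_id = row["record_id"]
        if record_id in seen:
            invalid.append((record_id, "Duplicate ID"))
            continue
        seen.add(record_id)
        # Look for a FINDINGS: header with non-empty content before IMPRESSION:.
        _, header, rest = row["report_text"].partition("FINDINGS:")
        if not header or not rest.split("IMPRESSION:")[0].strip():
            invalid.append((record_id, "No parseable FINDINGS section"))
            continue
        valid.append(row)
    return valid, invalid


rows = [
    {"record_id": "A", "report_text": "FINDINGS:\nNo acute fracture."},
    {"record_id": "A", "report_text": "FINDINGS:\nNormal."},
    {"record_id": "B", "report_text": "IMPRESSION: Normal."},
]
valid, invalid = partition_rows(rows)
print(f"{len(valid)} valid, {len(invalid)} invalid")
for record_id, error in invalid:
    print(record_id, error)
```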
record_id,report_text
CASE-001,"FINDINGS:
Intracranial structures:
Mild patchy hypoattenuation in periventricular white matter. No acute hemorrhage. No mass effect.
Osseous structures:
No acute fracture.
IMPRESSION:
Chronic small vessel ischemic changes. No acute findings."
CASE-002,"FINDINGS:
Brain Parenchyma:
- Mild patchy hypoattenuation in periventricular white matter.
- No acute hemorrhage.
IMPRESSION:
Chronic microvascular ischemic disease."
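
The example relies on standard CSV quoting: a report spans several lines because the whole report_text value is wrapped in double quotes. As a quick sanity check, Python's built-in `csv` module reads such a file into one row per report. The sample string and the helper name `read_reports` are illustrative only.

```python
import csv
import io

# A shortened version of the example above; the quoted field contains line breaks.
SAMPLE = '''record_id,report_text
CASE-001,"FINDINGS:
Osseous structures:
No acute fracture.
IMPRESSION:
No acute findings."
'''


def read_reports(handle):
    """Parse a reports CSV into a list of dicts keyed by column name."""
    return list(csv.DictReader(handle))


rows = read_reports(io.StringIO(SAMPLE))
print(len(rows))
print(rows[0]["record_id"])
print(rows[0]["report_text"].splitlines()[0])
```

When reading from disk, opening the file with `open(path, newline="", encoding="utf-8")` follows the `csv` module's guidance for embedded newlines and matches the UTF-8 requirement noted in the error table.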