APA-Compliant Table Formatting: Implementation Plan
Goal
Add an optional "APA Table Style" output format that reformats extracted tables to comply with APA 7th Edition table guidelines. This is a user-selectable option, not the default, because most users want tables to match the original PDF.
APA 7th Edition Table Requirements
- Table number: bold, flush left, above table ("Table 1")
- Table title: italic, flush left, below number, title case
- Borders: horizontal rules only (top border, header separator, bottom border)
- No vertical lines anywhere
- No cell shading or background colors
- Column headers: bold, centered or left-aligned
- Body text: left-aligned; numbers right-aligned
- Table notes: below bottom border, flush left, in order: general note, specific notes (superscript letters), probability notes
- Font: same as document body (no special table font)
- Spacing: double-spaced (or single-spaced if journal permits)
Current Architecture
PDF → Struct-tree extractor (zero-cost) or Vision extractor (AI) → Semantic HTML (<table>/<thead>/<tbody>/<caption>/<th scope>) → CSS Designer (LLM-based styling) → Output (HTML / Markdown / PDF)
Key files
| File | Role |
|---|---|
| workers/api/src/services/struct-table-extractor.ts | Struct-tree table extraction, inline styles, STRUCT_TABLE_CSS, assembleStructTableHtml() |
| workers/api/src/services/vision-table-extractor/prompts.ts | Vision extraction prompt (Claude generates HTML tables) |
| workers/api/src/services/css-designer.ts | LLM-based CSS generation (row shading, header backgrounds) |
| workers/api/src/routes/convert.ts | Conversion router: chooses extractor, assembles output |
| workers/api/src/services/html-to-markdown.ts | convertTable(): markdown output |
| apps/web/src/app/dashboard/page.tsx | Dashboard UI: conversion options |
Current gaps
- No table numbering (sequence not tracked)
- Inline color/border styles baked into extractor output
- CSS designer auto-applies row shading and header backgrounds
- No cell-type detection (numeric vs text)
- Table notes/footnotes not extracted
Implementation Plan
Phase 1: User Option & Plumbing
Add apaTableStyle option to the conversion pipeline.
- @accessible-pdf/shared types: add apaTableStyle?: boolean to the conversion options schema
- Dashboard UI: add a toggle in conversion options: "Format tables in APA style"
- convert.ts: pass the apaTableStyle flag through to the table extraction and assembly functions
- css-designer.ts: accept an apaTableStyle param; when true, use the APA CSS template instead of the default table styling
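
The plumbing above could look roughly like the following sketch. Only the apaTableStyle field comes from the plan; the ConversionOptions shape, the outputFormat field, and the helper names resolveApaTableStyle and selectTableCss are hypothetical placeholders, not the project's actual types.

```typescript
// Hypothetical options shape; only `apaTableStyle` is taken from the plan.
interface ConversionOptions {
  outputFormat: "html" | "markdown" | "pdf";
  apaTableStyle?: boolean; // optional, so existing callers keep working unchanged
}

// Normalize once at the router boundary so downstream code sees a plain boolean.
function resolveApaTableStyle(options: ConversionOptions): boolean {
  return options.apaTableStyle === true;
}

// Hypothetical stand-in for the stylesheet choice inside css-designer.ts.
function selectTableCss(apaTableStyle: boolean, defaultCss: string, apaCss: string): string {
  return apaTableStyle ? apaCss : defaultCss;
}
```

Treating a missing flag as false keeps the current behavior (tables match the original PDF) as the default.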
Phase 2: APA CSS Preset
Create a dedicated APA table stylesheet.
File: workers/api/src/services/apa-table-css.ts
/* APA 7th Edition Table Styling */
table {
  border-collapse: collapse;
  width: 100%;
  font-family: inherit;
  margin: 1.5em 0;
}
/* Table number: bold, flush left */
.apa-table-number {
  font-weight: 700;
  margin-bottom: 0.25em;
}
/* Table title: italic, flush left */
table caption,
.apa-table-title {
  font-style: italic;
  text-align: left;
  caption-side: top;
  margin-bottom: 0.5em;
}
/* Horizontal rules only */
table {
  border-top: 2px solid black;
  border-bottom: 2px solid black;
}
thead {
  border-bottom: 1px solid black;
}
/* No vertical borders, no shading */
th, td {
  border-left: none;
  border-right: none;
  background: none !important;
  padding: 4px 12px;
}
th {
  font-weight: 700;
}
/* Right-align numeric columns (applied via class) */
.apa-numeric {
  text-align: right;
}
/* Table notes */
.apa-table-notes {
  font-size: 0.9em;
  margin-top: 0.5em;
  border-top: none;
}
Phase 3: Strip Inline Styles
When apaTableStyle is true, remove conflicting inline styles from extracted HTML.
In struct-table-extractor.ts:
- Add a post-processing function stripNonApaStyles(html: string) that removes:
  - background-color from all style attributes
  - border-left, border-right from cells
  - color from cells (use document default)
  - Struct CSS classes (.struct-section-header, .struct-summary backgrounds)
- Call this in assembleStructTableHtml() when APA mode is on
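
A minimal sketch of the stripping step, assuming the extractor emits double-quoted style attributes with simple semicolon-separated declarations. For brevity it filters every element rather than cells only, and it leaves the struct CSS classes alone; a real implementation would likely use an HTML parser instead of a regular expression.

```typescript
// Properties the plan lists as conflicting with APA styling.
const BLOCKED_PROPERTIES = new Set(["background-color", "border-left", "border-right", "color"]);

// Sketch only: drops blocked declarations from each style="..." attribute and
// removes the attribute entirely when nothing is left.
function stripNonApaStyles(html: string): string {
  return html.replace(/\sstyle="([^"]*)"/gi, (_match: string, styleValue: string) => {
    const kept = styleValue
      .split(";")
      .map((declaration) => declaration.trim())
      .filter((declaration) => declaration.length > 0)
      .filter((declaration) => {
        // Compare the exact property name so "border-color" is not mistaken for "color".
        const property = (declaration.split(":")[0] ?? "").trim().toLowerCase();
        return !BLOCKED_PROPERTIES.has(property);
      });
    return kept.length > 0 ? ` style="${kept.join("; ")}"` : "";
  });
}
```

Matching on the full property name, not a substring, is what keeps declarations such as padding or border-color intact.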
In vision-table-extractor/prompts.ts:
- Add an APA-specific instruction addendum:
  - "Do not include any inline styles, background colors, or border styles"
  - "Use <caption> for the table title only; do not include a table number"
Phase 4: Table Numbering
Track table sequence and prepend "Table N" above each table.
In struct-table-extractor.ts and vision-table-extractor/index.ts:
- Accept a tableStartNumber param (default 1)
- Wrap each table in:
  <div class="apa-table-wrapper">
    <p class="apa-table-number">Table {n}</p>
    <table><caption>Descriptive title here</caption>...</table>
  </div>
- Return the next table number so sequential pages maintain count
In convert.ts:
- Pass running counter through the page assembly loop
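
The counter hand-off described above might be sketched as follows. The function names numberTables and numberAcrossPages and the NumberedTables return shape are hypothetical; only tableStartNumber and the wrapper markup come from the plan.

```typescript
interface NumberedTables {
  html: string[];
  nextTableNumber: number; // handed to the next page so numbering continues
}

// Wraps each table in the APA wrapper and numbers it, starting at tableStartNumber.
function numberTables(tables: string[], tableStartNumber = 1): NumberedTables {
  let n = tableStartNumber;
  const html = tables.map((table) => {
    const wrapped =
      `<div class="apa-table-wrapper"><p class="apa-table-number">Table ${n}</p>${table}</div>`;
    n += 1;
    return wrapped;
  });
  return { html, nextTableNumber: n };
}

// Stand-in for the page assembly loop in convert.ts: thread the counter through pages.
function numberAcrossPages(pages: string[][]): string[] {
  let counter = 1;
  const output: string[] = [];
  for (const pageTables of pages) {
    const result = numberTables(pageTables, counter);
    output.push(...result.html);
    counter = result.nextTableNumber;
  }
  return output;
}
```

Returning the next number instead of mutating shared state keeps both extractors stateless and makes the sequence easy to unit test.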
Phase 5: Numeric Column Detection
Heuristic to detect and right-align numeric columns.
Add detectNumericColumns(tableHtml: string) utility:
- Parse the HTML table
- For each column, check if >70% of body cells match numeric patterns:
  - Integers, decimals, percentages, currency ($, €)
  - Negative numbers, parenthetical negatives
- Add class="apa-numeric" to matching <td> cells
- Run this as a post-processing step in the assembly function
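
The heuristic could be sketched like this, operating on body cell text that has already been parsed out of the table (the HTML parsing and class injection are left out). The helper names isNumericCell and detectNumericColumnIndexes are hypothetical, and skipping empty cells when computing the ratio is an assumption, not something the plan specifies.

```typescript
// Integers, decimals, thousands separators, percentages, $ or € currency,
// a leading minus, and parenthetical negatives such as (12.5).
// Sketch only: it does not verify that parentheses are balanced.
const NUMERIC_PATTERN = /^\(?-?[$€]?(?:\d{1,3}(?:,\d{3})+|\d+)?(?:\.\d+)?%?\)?$/;

function isNumericCell(text: string): boolean {
  const trimmed = text.trim();
  // Require at least one digit so strings like "%" or "()" are rejected.
  return /\d/.test(trimmed) && NUMERIC_PATTERN.test(trimmed);
}

// Returns the zero-based indexes of columns where more than `threshold`
// of the non-empty body cells look numeric.
function detectNumericColumnIndexes(bodyRows: string[][], threshold = 0.7): number[] {
  const columnCount = Math.max(0, ...bodyRows.map((row) => row.length));
  const numericColumns: number[] = [];
  for (let col = 0; col < columnCount; col++) {
    const cells = bodyRows
      .map((row) => row[col])
      .filter((cell): cell is string => cell !== undefined && cell.trim() !== "");
    if (cells.length === 0) continue;
    const numericCount = cells.filter((cell) => isNumericCell(cell)).length;
    if (numericCount / cells.length > threshold) numericColumns.push(col);
  }
  return numericColumns;
}
```

For example, a column holding "$1,200.00", "n/a", "(12.5)", and "3" has three numeric cells out of four (75%), so it clears the 70% threshold.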
Phase 6: Table Notes (Future)
Extract and render table footnotes. This is the hardest part, so defer it to a later iteration.
Approach:
- In vision extractor, add prompt instruction: "If there are notes below the table (marked with superscript letters or asterisks), extract them as a <div class='apa-table-notes'> block after the </table> tag"
- In struct extractor, detect paragraphs immediately following a table that contain superscript markers
- Classify notes into: general (prefixed "Note."), specific (superscript letters), probability (asterisks with p-values)
- Render in APA order below the table
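
One possible shape for the classification and ordering steps is sketched below. It assumes specific notes arrive with their marker as a leading <sup> element and that probability notes start with asterisks followed by a p comparison; the names classifyNote and orderNotes, and the fallback to "general" for anything unrecognized, are hypothetical choices.

```typescript
type NoteKind = "general" | "specific" | "probability";

// APA order: general note, then specific notes, then probability notes.
const APA_NOTE_ORDER: NoteKind[] = ["general", "specific", "probability"];

function classifyNote(text: string): NoteKind {
  const trimmed = text.trim();
  // e.g. "*p < .05." or "** p < .01."
  if (/^\*+\s*p\s*[<=>]/i.test(trimmed)) return "probability";
  // e.g. "<sup>a</sup>n = 25."
  if (/^<sup>[a-z]<\/sup>/i.test(trimmed)) return "specific";
  // Includes notes prefixed with "Note." and, as an assumption, anything unrecognized.
  return "general";
}

// Returns a new array in APA order; notes of the same kind keep their relative order
// because Array.prototype.sort is stable.
function orderNotes(notes: string[]): string[] {
  return [...notes].sort(
    (a, b) => APA_NOTE_ORDER.indexOf(classifyNote(a)) - APA_NOTE_ORDER.indexOf(classifyNote(b)),
  );
}
```

Keeping classification separate from ordering means the same classifier can serve both the vision and struct extraction paths.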
Testing
- Unit tests: stripNonApaStyles(), detectNumericColumns(), table numbering
- Snapshot tests: before/after HTML for known table PDFs
- Visual regression: render APA-formatted tables to PDF, compare against APA manual examples
- Test PDFs: use academic papers with complex tables (multi-level headers, merged cells, footnotes)
Rollout
- Ship behind feature flag (apaTableStyle conversion option)
- Add to UI as toggle in conversion settings
- Document in user docs: "APA Table Formatting" guide
- Announce to academic/institutional users
Scope Summary
| Phase | Effort | Delivers |
|---|---|---|
| 1. Option & plumbing | 1 day | Toggle flows through pipeline |
| 2. APA CSS preset | 1 day | Clean horizontal-only styling |
| 3. Strip inline styles | 1-2 days | No colors/shading/vertical borders |
| 4. Table numbering | 1 day | "Table 1", "Table 2", etc. |
| 5. Numeric detection | 1-2 days | Right-aligned number columns |
| 6. Table notes | 3-5 days | Footnote extraction (defer) |
| MVP (Phases 1-4) | 4-5 days | |
| Full (Phases 1-6) | 8-12 days | |