
APA-Compliant Table Formatting: Implementation Plan

Goal

Add an optional “APA Table Style” output format that reformats extracted tables to comply with APA 7th Edition table guidelines. This is a user-selectable option, not the default, because most users want tables to match the original PDF.

APA 7th Edition Table Requirements

  1. Table number: bold, flush left, above table (“Table 1”)
  2. Table title: italic, flush left, below number, title case
  3. Borders: horizontal rules only (top border, header separator, bottom border)
  4. No vertical lines anywhere
  5. No cell shading or background colors
  6. Column headers: bold, centered or left-aligned
  7. Body text: left-aligned; numbers right-aligned
  8. Table notes: below bottom border, flush left, in order: general note, specific notes (superscript letters), probability notes
  9. Font: same as document body (no special table font)
  10. Spacing: double-spaced (or single-spaced if journal permits)

Current Architecture

PDF → Struct-tree extractor (zero-cost) or Vision extractor (AI)
→ Semantic HTML (<table>/<thead>/<tbody>/<caption>/<th scope>)
→ CSS Designer (LLM-based styling)
→ Output (HTML / Markdown / PDF)

Key files

File | Role
workers/api/src/services/struct-table-extractor.ts | Struct-tree table extraction, inline styles, STRUCT_TABLE_CSS, assembleStructTableHtml()
workers/api/src/services/vision-table-extractor/prompts.ts | Vision extraction prompt (Claude generates HTML tables)
workers/api/src/services/css-designer.ts | LLM-based CSS generation (row shading, header backgrounds)
workers/api/src/routes/convert.ts | Conversion router: chooses extractor, assembles output
workers/api/src/services/html-to-markdown.ts | convertTable(): markdown output
apps/web/src/app/dashboard/page.tsx | Dashboard UI: conversion options

Current gaps

  • No table numbering (sequence not tracked)
  • Inline color/border styles baked into extractor output
  • CSS designer auto-applies row shading and header backgrounds
  • No cell-type detection (numeric vs text)
  • Table notes/footnotes not extracted

Implementation Plan

Phase 1: User Option & Plumbing

Add apaTableStyle option to the conversion pipeline.

  1. @accessible-pdf/shared types: add apaTableStyle?: boolean to the conversion options schema
  2. Dashboard UI: add a toggle in conversion options, “Format tables in APA style”
  3. convert.ts: pass the apaTableStyle flag through to the table extraction and assembly functions
  4. css-designer.ts: accept an apaTableStyle param; when true, use the APA CSS template instead of the default table styling
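A minimal sketch of how the flag might flow from the options type into the stylesheet choice. Only apaTableStyle comes from the plan; the ConversionOptions shape, the outputFormat field, selectTableCss, and both CSS constants are hypothetical placeholders.

```typescript
// Hypothetical shape of the shared conversion options; only `apaTableStyle`
// is named in the plan.
interface ConversionOptions {
  outputFormat: "html" | "markdown" | "pdf";
  apaTableStyle?: boolean; // optional, so existing callers keep the default
}

// Placeholder stylesheets standing in for the real default and APA presets.
const DEFAULT_TABLE_CSS = "table { border: 1px solid #ccc; }";
const APA_TABLE_CSS =
  "table { border-top: 2px solid black; border-bottom: 2px solid black; }";

// Pick the table stylesheet from the flag; absent or false means default.
function selectTableCss(options: ConversionOptions): string {
  return options.apaTableStyle === true ? APA_TABLE_CSS : DEFAULT_TABLE_CSS;
}
```

Keeping the field optional means requests that never mention it behave exactly as they do today.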

Phase 2: APA CSS Preset

Create a dedicated APA table stylesheet.

File: workers/api/src/services/apa-table-css.ts

/* APA 7th Edition Table Styling */
table {
  border-collapse: collapse;
  width: 100%;
  font-family: inherit;
  margin: 1.5em 0;
}
/* Table number: bold, flush left */
.apa-table-number {
  font-weight: 700;
  margin-bottom: 0.25em;
}
/* Table title: italic, flush left */
table caption,
.apa-table-title {
  font-style: italic;
  text-align: left;
  caption-side: top;
  margin-bottom: 0.5em;
}
/* Horizontal rules only */
table { border-top: 2px solid black; border-bottom: 2px solid black; }
thead { border-bottom: 1px solid black; }
/* No vertical borders, no shading */
th, td {
  border-left: none;
  border-right: none;
  background: none !important;
  padding: 4px 12px;
}
th { font-weight: 700; }
/* Right-align numeric columns (applied via class) */
.apa-numeric { text-align: right; }
/* Table notes */
.apa-table-notes {
  font-size: 0.9em;
  margin-top: 0.5em;
  border-top: none;
}

Phase 3: Strip Inline Styles

When apaTableStyle is true, remove conflicting inline styles from extracted HTML.

In struct-table-extractor.ts:

  1. Add a post-processing function stripNonApaStyles(html: string) that removes:
    • background-color from all style attributes
    • border-left, border-right from cells
    • color from cells (use document default)
    • Struct CSS classes (.struct-section-header, .struct-summary backgrounds)
  2. Call this in assembleStructTableHtml() when APA mode is on
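One way stripNonApaStyles() could work is sketched below, assuming extractor output uses double-quoted attributes. It is regex-based for brevity, applies the filter to every element rather than only to cells, and the list of blocked classes contains just the two names mentioned above; a real implementation may prefer an HTML parser.

```typescript
// CSS properties that conflict with APA styling (from the list above).
const BLOCKED_PROPERTIES = ["background-color", "border-left", "border-right", "color"];

// Struct class names to drop; the complete list is an assumption.
const BLOCKED_CLASSES = ["struct-section-header", "struct-summary"];

function stripNonApaStyles(html: string): string {
  // Filter the declarations inside each double-quoted style attribute.
  const withoutStyles = html.replace(/\sstyle="([^"]*)"/gi, (_match: string, css: string) => {
    const kept = css
      .split(";")
      .map((decl) => decl.trim())
      .filter((decl) => decl.length > 0)
      .filter((decl) => {
        const property = (decl.split(":")[0] || "").trim().toLowerCase();
        return BLOCKED_PROPERTIES.indexOf(property) === -1;
      });
    // Drop the attribute entirely when no declaration survives.
    return kept.length > 0 ? ` style="${kept.join("; ")}"` : "";
  });

  // Remove blocked class names, and the class attribute if it ends up empty.
  return withoutStyles.replace(/\sclass="([^"]*)"/gi, (_match: string, classes: string) => {
    const kept = classes
      .split(/\s+/)
      .filter((name) => name.length > 0 && BLOCKED_CLASSES.indexOf(name) === -1);
    return kept.length > 0 ? ` class="${kept.join(" ")}"` : "";
  });
}
```

Shorthand declarations such as background or border are not handled here and would need their own rules.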

In vision-table-extractor/prompts.ts:

  1. Add an APA-specific instruction addendum:
    • “Do not include any inline styles, background colors, or border styles”
    • “Use <caption> for the table title only; do not include a table number”
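The addendum could be appended conditionally, roughly as follows. The names APA_PROMPT_ADDENDUM and buildVisionPrompt are hypothetical, and the actual structure of prompts.ts may differ.

```typescript
// The two instructions from the plan, kept together as a reusable addendum.
const APA_PROMPT_ADDENDUM = [
  "Do not include any inline styles, background colors, or border styles.",
  "Use <caption> for the table title only; do not include a table number.",
].join("\n");

// Append the addendum only in APA mode so the default prompt stays unchanged.
function buildVisionPrompt(basePrompt: string, apaTableStyle: boolean): string {
  return apaTableStyle ? `${basePrompt}\n\n${APA_PROMPT_ADDENDUM}` : basePrompt;
}
```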

Phase 4: Table Numbering

Track table sequence and prepend “Table N” above each table.

In struct-table-extractor.ts and vision-table-extractor/index.ts:

  1. Accept a tableStartNumber param (default 1)
  2. Wrap each table in:
    <div class="apa-table-wrapper">
    <p class="apa-table-number">Table {n}</p>
    <table>
    <caption>Descriptive title here</caption>
    ...
    </table>
    </div>
  3. Return the next table number so sequential pages maintain count

In convert.ts:

  1. Pass running counter through the page assembly loop
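The wrapping and the running counter might look like the sketch below. The function name numberTables and the NumberedTables return shape are hypothetical, and the regex assumes tables are not nested.

```typescript
interface NumberedTables {
  html: string;
  nextTableNumber: number; // feed this into the next page's call
}

// Wrap every <table>...</table> in the APA wrapper and prepend "Table N".
function numberTables(html: string, tableStartNumber: number = 1): NumberedTables {
  let current = tableStartNumber;
  const numbered = html.replace(/<table\b[\s\S]*?<\/table>/gi, (table: string) => {
    const wrapped =
      `<div class="apa-table-wrapper">\n` +
      `<p class="apa-table-number">Table ${current}</p>\n` +
      `${table}\n` +
      `</div>`;
    current += 1;
    return wrapped;
  });
  return { html: numbered, nextTableNumber: current };
}
```

In the page assembly loop, convert.ts would start at 1 and pass each call's nextTableNumber into the call for the following page, so pages without tables leave the count unchanged.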

Phase 5: Numeric Column Detection

Heuristic to detect and right-align numeric columns.

Add detectNumericColumns(tableHtml: string) utility:

  1. Parse the HTML table
  2. For each column, check if >70% of body cells match numeric patterns:
    • Integers, decimals, percentages, currency ($, €)
    • Negative numbers, parenthetical negatives
  3. Add class="apa-numeric" to matching <td> cells
  4. Run this as a post-processing step in the assembly function
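A rough sketch of the heuristic follows, using regexes instead of a real HTML parser. It assumes a single <tbody>, no colspan or rowspan, and double-quoted attributes. Skipping empty cells when computing the ratio is an added assumption, and the helper names are hypothetical.

```typescript
// Integers, decimals (including ".05"), percentages, currency ($ or U+20AC euro),
// negatives (hyphen or U+2212 minus), and parenthetical negatives such as "(3.2)".
const NUMERIC_PATTERN = /^\(?[-+\u2212]?[$\u20AC]?(\d[\d,]*(\.\d+)?|\.\d+)%?\)?$/;

type CellVisitor = (column: number, tag: string, attrs: string, inner: string) => string;

function renderCell(tag: string, attrs: string, inner: string): string {
  return `<${tag}${attrs}>${inner}</${tag}>`;
}

// Visit each cell of each row, passing a zero-based column index.
function mapCells(body: string, visit: CellVisitor): string {
  return body.replace(/<tr\b[^>]*>[\s\S]*?<\/tr>/gi, (row: string) => {
    let column = 0;
    return row.replace(
      /<(td|th)\b([^>]*)>([\s\S]*?)<\/\1>/gi,
      (_cell: string, tag: string, attrs: string, inner: string) => visit(column++, tag, attrs, inner)
    );
  });
}

// Append a class name, creating the class attribute when it is missing.
function addClass(attrs: string, className: string): string {
  const classAttr = /(\sclass=")([^"]*)(")/i;
  if (!classAttr.test(attrs)) return `${attrs} class="${className}"`;
  return attrs.replace(classAttr, (_m: string, open: string, existing: string, close: string) =>
    open + `${existing} ${className}`.trim() + close);
}

function detectNumericColumns(tableHtml: string, threshold: number = 0.7): string {
  return tableHtml.replace(
    /(<tbody\b[^>]*>)([\s\S]*?)(<\/tbody>)/i,
    (_match: string, open: string, body: string, close: string) => {
      const numeric: number[] = [];
      const total: number[] = [];
      // Pass 1: count non-empty <td> cells and numeric matches per column.
      mapCells(body, (column, tag, attrs, inner) => {
        const text = inner.replace(/<[^>]*>/g, "").trim();
        if (tag.toLowerCase() === "td" && text.length > 0) {
          total[column] = (total[column] || 0) + 1;
          if (NUMERIC_PATTERN.test(text)) numeric[column] = (numeric[column] || 0) + 1;
        }
        return renderCell(tag, attrs, inner);
      });
      // Pass 2: tag <td> cells in columns whose numeric share exceeds the threshold.
      const updated = mapCells(body, (column, tag, attrs, inner) => {
        const share = (numeric[column] || 0) / (total[column] || 1);
        const mark = tag.toLowerCase() === "td" && share > threshold;
        return renderCell(tag, mark ? addClass(attrs, "apa-numeric") : attrs, inner);
      });
      return open + updated + close;
    }
  );
}
```

Tables without a <tbody> are returned unchanged, and merged cells would shift the column index, so multi-level headers need a parser-based version.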

Phase 6: Table Notes (Future)

Extract and render table footnotes. This is the hardest part, so defer it to a later iteration.

Approach:

  1. In vision extractor, add prompt instruction: “If there are notes below the table (marked with superscript letters or asterisks), extract them as a <div class='apa-table-notes'> block after the </table> tag”
  2. In struct extractor, detect paragraphs immediately following a table that contain superscript markers
  3. Classify notes into: general (prefixed “Note.”), specific (superscript letters), probability (asterisks with p-values)
  4. Render in APA order below the table
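Steps 3 and 4 could be prototyped along these lines. The marker patterns and the names classifyNote, orderNotes, and renderNotes are hypothetical, and treating anything unrecognized as a general note is an assumption rather than something the plan specifies.

```typescript
type NoteKind = "general" | "specific" | "probability";

interface TableNote {
  kind: NoteKind;
  text: string;
}

// APA order: general note, then specific notes, then probability notes.
const APA_NOTE_ORDER: NoteKind[] = ["general", "specific", "probability"];

// Classify one note by its leading marker.
function classifyNote(text: string): NoteKind {
  const trimmed = text.trim();
  if (/^\*+\s*p\b/i.test(trimmed)) return "probability"; // e.g. "*p < .05."
  if (/^<sup>[a-z]<\/sup>/i.test(trimmed)) return "specific"; // e.g. "<sup>a</sup>n = 25."
  return "general"; // "Note. ..." and, by assumption, anything unrecognized
}

// Sort into APA order, keeping the original order within each kind.
function orderNotes(notes: string[]): TableNote[] {
  return notes
    .map((text, index) => ({ kind: classifyNote(text), text: text.trim(), index }))
    .sort((a, b) =>
      APA_NOTE_ORDER.indexOf(a.kind) - APA_NOTE_ORDER.indexOf(b.kind) || a.index - b.index)
    .map(({ kind, text }) => ({ kind, text }));
}

// Render the ordered notes into the block styled by .apa-table-notes.
function renderNotes(notes: string[]): string {
  const items = orderNotes(notes).map((note) => `<p>${note.text}</p>`).join("\n");
  return `<div class="apa-table-notes">\n${items}\n</div>`;
}
```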

Testing

  1. Unit tests: stripNonApaStyles(), detectNumericColumns(), table numbering
  2. Snapshot tests: before/after HTML for known table PDFs
  3. Visual regression: render APA-formatted tables to PDF, compare against APA manual examples
  4. Test PDFs: use academic papers with complex tables (multi-level headers, merged cells, footnotes)

Rollout

  1. Ship behind feature flag (apaTableStyle conversion option)
  2. Add to UI as toggle in conversion settings
  3. Document in user docs: β€œAPA Table Formatting” guide
  4. Announce to academic/institutional users

Scope Summary

Phase | Effort | Delivers
1. Option & plumbing | 1 day | Toggle flows through pipeline
2. APA CSS preset | 1 day | Clean horizontal-only styling
3. Strip inline styles | 1-2 days | No colors/shading/vertical borders
4. Table numbering | 1 day | “Table 1”, “Table 2”, etc.
5. Numeric detection | 1-2 days | Right-aligned number columns
6. Table notes | 3-5 days | Footnote extraction (defer)
MVP (Phases 1-4) | 4-5 days |
Full (Phases 1-6) | 8-12 days |