Course-Map Review Workflow & Full-Degree Modeling (issue #1029)
CourseLeaf-only v1 of the #1029 plan: generated degree maps are validated, filled to the degree total, and gated behind a review-by-exception workflow.
State machine
course_maps.review_status: needs_review ↔ published
(migration 20260610_165_course_map_review_workflow.sql).
- Default published: user-uploaded maps have no generation pipeline to gate; existing rows keep their visibility.
- The program generator (POST /api/admin/programs/:id/generate-map) runs validation gates after generating. Any error-severity flag → needs_review; clean maps publish immediately (warnings don’t block). Flags are stored in course_maps.validation_flags (jsonb array).
- Publish gate: share-link creation (POST /api/course-maps/:id/shares) returns 409 NEEDS_REVIEW for any map not published, and the public share resolver (GET /api/shares/:token) returns 404 for demoted maps, so existing links stop serving when a map is pulled back for review. The map view and editor show a draft banner with the flags.
- Admins publish from /admin/review (calls POST /api/admin/review/maps/:id/publish, which stamps reviewed_by/reviewed_at). Publishing with outstanding flags is allowed: the human review is the override, and the backfill respects it (see below).
- Admins can view and edit any map regardless of owner (GET/PATCH /api/course-maps/:id are owner-or-admin); delete and share minting stay owner-only.
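The gating rules above can be sketched in a few lines. This is an illustration under assumptions, not the service code: the ValidationFlag shape and both function names are hypothetical, and only the 409 NEEDS_REVIEW response and the two status values come from the text.

```typescript
type ReviewStatus = "needs_review" | "published";

// Hypothetical flag shape; the real entries live in course_maps.validation_flags.
interface ValidationFlag {
  code: string;
  severity: "error" | "warning";
}

// Status for a freshly generated map: any error-severity flag gates it,
// warnings alone do not.
function statusAfterGeneration(flags: ValidationFlag[]): ReviewStatus {
  return flags.some((f) => f.severity === "error") ? "needs_review" : "published";
}

// Publish gate for share-link creation: returns the rejection to send,
// or null when the map may be shared.
function shareGate(status: ReviewStatus): { http: 409; code: "NEEDS_REVIEW" } | null {
  return status === "published" ? null : { http: 409, code: "NEEDS_REVIEW" };
}
```

A map carrying only missing_gen_ed (a warning) publishes immediately, while a single credits_below_total flag routes it to review and blocks new share links.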
Validation gates (services/map-validation.ts)
Error flags (block auto-publish): unknown_degree_total, major_only_total
(parsed table total < 75% of degree total — the Skidmore 43-credit case),
credits_below_total, no_prereq_edges (≥5 real courses, zero prereq
edges), implausible_total (backfill only, <90 credits, bachelor maps
only). Warning flags (informational): missing_gen_ed,
filled_free_electives. The full code list is the ValidationFlagCode
union in @course-map/shared.
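Two of the error checks are simple threshold tests and can be sketched directly from the description. The MapStats shape and the function names below are hypothetical, and the 120-credit degree total used for the 43-credit case is an illustrative assumption; the authoritative code list remains the ValidationFlagCode union.

```typescript
// Codes copied from the text; the grouping into two sets is this sketch's own.
const ERROR_CODES = new Set<string>([
  "unknown_degree_total",
  "major_only_total",
  "credits_below_total",
  "no_prereq_edges",
  "implausible_total",
]);
const WARNING_CODES = new Set<string>(["missing_gen_ed", "filled_free_electives"]);

function severityOf(code: string): "error" | "warning" | "unknown" {
  if (ERROR_CODES.has(code)) return "error";
  if (WARNING_CODES.has(code)) return "warning";
  return "unknown";
}

// Hypothetical summary of a generated map.
interface MapStats {
  parsedTableTotal: number; // credits summed from the parsed program table
  degreeTotal: number; // resolved degree total
  realCourses: number; // non-placeholder courses on the map
  prereqEdges: number; // prerequisite edges between those courses
}

function structuralFlags(stats: MapStats): string[] {
  const flags: string[] = [];
  // Parsed table total below 75% of the degree total: likely a major-only page.
  if (stats.parsedTableTotal < 0.75 * stats.degreeTotal) flags.push("major_only_total");
  // At least 5 real courses with zero prerequisite edges is suspicious.
  if (stats.realCourses >= 5 && stats.prereqEdges === 0) flags.push("no_prereq_edges");
  return flags;
}
```

With a 43-credit table against an assumed 120-credit degree, 43 < 90, so major_only_total fires.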
Degree total resolution (resolveDegreeTotal)
The parsed total_credits from a CourseLeaf program table is only trusted
for bachelor’s degrees when ≥90 — a major-only page reports the major’s
hours, not the degree’s. Fallback order: plausible parsed total →
institutions.default_degree_total → 120. Minors/certs/masters keep their
parsed total with no fallback defaults, and never receive fill: a minor
whose ingested rows fall short of its declared total means the parser missed
requirements, so the map gets credits_below_total and routes to review
instead of being padded with electives a minor doesn’t have. For bachelor
maps the generator closes gaps of ≥1 credit with “Free Elective” placeholder
slots, split into ~3-credit slots when larger than 6 credits — whole-credit
gaps always split into whole-credit slots (10 → 4+3+3, never 3.4+3.3+3.3).
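A minimal sketch of the fallback order and the fill split, under assumptions. The function names carry a Sketch suffix or are hypothetical, since the real resolveDegreeTotal signature is not shown here. The slot count (gap divided by 3, rounded) and putting any fractional remainder on the first slot are choices made for this example; the text only fixes the 10 → 4+3+3 case.

```typescript
type DegreeLevel = "bachelor" | "other";

// Fallback order for bachelor's degrees:
// plausible parsed total -> institution default -> 120.
// Other credentials keep the parsed total, with no fallback defaults.
function resolveDegreeTotalSketch(
  parsedTotal: number | null,
  level: DegreeLevel,
  institutionDefault: number | null,
): number | null {
  if (level !== "bachelor") return parsedTotal;
  if (parsedTotal !== null && parsedTotal >= 90) return parsedTotal;
  return institutionDefault ?? 120;
}

// Split a free-elective gap into placeholder slots.
// Gaps under 1 credit get no fill; gaps up to 6 credits stay one slot.
function splitFreeElectives(gap: number): number[] {
  if (gap < 1) return [];
  if (gap <= 6) return [gap];
  const whole = Math.floor(gap);
  const frac = gap - whole; // assumption: any fraction rides on the first slot
  const slots = Math.round(whole / 3); // assumption: aim for ~3 credits per slot
  const base = Math.floor(whole / slots);
  const extra = whole % slots; // the first `extra` slots get one more credit
  return Array.from({ length: slots }, (_, i) =>
    base + (i < extra ? 1 : 0) + (i === 0 ? frac : 0),
  );
}
```

For example, splitFreeElectives(10) yields [4, 3, 3], and resolveDegreeTotalSketch(43, "bachelor", null) falls through to 120 because 43 is below the 90-credit plausibility floor.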
Gen-ed layer
A catalog’s general-education page is ingested once per catalog and stored
as a GEN-ED pseudo-program (degree_type='GenEd', hidden from the program
picker) with all requirements forced to general_education; the source URL
is recorded on institution_catalogs.gen_ed_url for a future re-ingest flow.
Automatic (preferred): onboarding’s final phase (stepIngestGenEd) scans
the discovered departments for a gen-ed-looking slug or name
(core-curriculum, general-education, university-core, …) and ingests
the match as the gen-ed layer, logging the outcome in the job log. It is
idempotent (skips when a GEN-ED program already exists for the catalog) and
non-fatal: when nothing matches, the job log says so and bachelor maps carry
missing_gen_ed until the layer is added manually.
Manual (fallback): the program-ingest form on /admin/programs has a
“gen-ed / all-college requirements” checkbox (POST /api/admin/programs/ingest with gen_ed: true) for catalogs where
detection misses or the page needs replacing.
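The automatic detection can be pictured as a pattern scan over discovered departments. The sketch below is hypothetical: the Department shape, the normalization, and findGenEdDepartment are assumptions, and the pattern list holds only the three examples named above (the source list is longer and elided).

```typescript
// Hypothetical shape of a discovered department.
interface Department {
  slug: string;
  name: string;
}

// Only the three patterns the text names; the real list continues.
const GEN_ED_PATTERNS = ["core-curriculum", "general-education", "university-core"];

// Lowercase and collapse non-alphanumerics to hyphens so names and slugs compare alike.
function normalize(s: string): string {
  return s.toLowerCase().replace(/[^a-z0-9]+/g, "-").replace(/^-+|-+$/g, "");
}

// Returns the first department whose slug or name looks like a gen-ed page,
// or null so the caller can log the miss and continue (non-fatal).
function findGenEdDepartment(departments: Department[]): Department | null {
  for (const d of departments) {
    const haystacks = [normalize(d.slug), normalize(d.name)];
    if (GEN_ED_PATTERNS.some((p) => haystacks.some((h) => h.includes(p)))) return d;
  }
  return null;
}
```

A null result corresponds to the case where bachelor maps carry missing_gen_ed until the layer is added through the manual form.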
Every subsequent generate-map call for a bachelor’s program in that catalog merges the GEN-ED rows after the major’s rows (deduped by course id). Non-bachelor credentials (minors, certs, masters) are self-contained and never receive the gen-ed merge. A failed gen-ed lookup fails the generate request — it is never silently treated as “no gen-ed page”.
Crawl resilience
- Unknown CourseLeaf variants: the deterministic parser knows three template shapes (classic / split-detail / cobj). When a page has .courseblock markup that no variant matches, the crawl hands the blocks’ text to Gemini 2.5 Flash (llm-course-extract.ts), validates the output strictly (code shape, credit bounds, majority-rejection), and logs a WARN per page: “unknown CourseLeaf variant — LLM fallback extracted courses; add a deterministic parser variant.” Capped at 25 pages per ingestion run (~$0.50 worst case); spend is recorded in cost_ledger (metadata.kind = 'courseleaf-llm-fallback'). The WARN is the signal to codify a new variant and re-crawl.
- Rate limiting (SHSU incident): the program-discovery walk now spaces fetches (~400 ms + jitter), retries a rejected fetch once after a 3 s cool-down, and aborts the walk after 12 consecutive rejections with an operator-visible message. Courses already ingested are kept and a later onboarding re-run finishes the programs.
- Duplicate listings: the same course code on multiple crawled pages (undergrad + grad description trees) is deduped before the registry upsert, keeping the fullest listing.
- Non-program pages (second SHSU incident): a candidate page that fetches fine but has no sc_courselist table is classified NOT_A_PROGRAM, not FETCH_FAILED: it is never retried, never counts toward the rate-limit breaker, and shows up in the job log as “N non-program pages skipped” instead of ✗ failures. (Before this split, college landing pages walked as program candidates masqueraded as a rate-limit storm and falsely tripped the breaker.) At depth 0 a non-program page is treated as a possible container and gets one level of descent: its children become program candidates (capped at 30 containers per department).
- Hybrid catalog trees (SHSU): the level-page hub walk (/undergraduate/colleges-academic-departments/) now always runs, additively when the catalog root also has real colleges, instead of only as a zero-colleges fallback. Hub-shaped department slugs (college-departments) are promoted to their children when a sampled child has children of its own, so trees nested college → department → program land within reach of the walk. Course-description departments are excluded from the program walk entirely (they’re crawl seeds).
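The interaction between the rate-limit breaker and the NOT_A_PROGRAM classification can be sketched as a small counter. This is an assumption-laden illustration: createBreaker is a hypothetical name, and treating a non-program page as a reset of the streak (because the fetch itself succeeded) is this sketch's choice. The text only says such pages never count toward the breaker.

```typescript
type FetchOutcome = "OK" | "FETCH_FAILED" | "NOT_A_PROGRAM";

// Tracks consecutive rejected fetches; record() returns true once the walk should abort.
function createBreaker(limit = 12) {
  let consecutive = 0;
  return {
    record(outcome: FetchOutcome): boolean {
      if (outcome === "FETCH_FAILED") {
        consecutive += 1;
      } else {
        // OK and NOT_A_PROGRAM both mean the server answered,
        // so the rejection streak ends (assumption for NOT_A_PROGRAM).
        consecutive = 0;
      }
      return consecutive >= limit;
    },
  };
}
```

With this shape, a run of college landing pages classified as NOT_A_PROGRAM can never trip the breaker, which is the failure mode the second incident exposed.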
Minor-as-overlay
A minor is not a standalone plan — it overlays a major’s map. Generated maps
keep their major in course_maps.source_program_id; attached overlays
(degree types Minor, Cert, Concentration, Track) live in the
course_map_minors junction (migration 166).
- UI: on a generated map’s view page, admins see an “Add a minor” panel (hidden for non-admins, non-generated maps, and catalogs with no overlay programs). Applying regenerates the combined plan.
- API: GET /api/admin/programs/addable-minors/:mapId lists overlay programs from the map’s catalog not yet attached; POST /api/admin/programs/:minorId/apply-minor ({course_map_id}) attaches and regenerates.
- Merge semantics: requirement rows merge major → gen-ed → overlays, deduped by course id, so double-counted courses appear once. The degree total stays the major’s; free-elective fill shrinks as overlay courses consume elective slots, which is how real plans absorb a minor.
- Regeneration replaces the layout (manual edits are rebuilt from the catalog) and clears reviewed_by/reviewed_at before re-running the validation gates, so changed content can’t ride a stale human approval.
- Idempotent/self-healing: the attach is an upsert and regeneration always reads back the full attached set, so re-applying after a partial failure converges. An invalid overlay (wrong catalog, non-overlay degree type) is detached again on rejection.
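The merge order and its effect on the free-elective fill can be sketched as follows. The RequirementRow shape and both function names are hypothetical; only the major → gen-ed → overlays order, the dedupe by course id, and the fixed degree total come from the text.

```typescript
// Hypothetical row shape; real rows carry more fields.
interface RequirementRow {
  courseId: string;
  credits: number;
}

// Merge layers in priority order, keeping the first occurrence of each course id.
function mergeRequirementRows(
  major: RequirementRow[],
  genEd: RequirementRow[],
  overlays: RequirementRow[][],
): RequirementRow[] {
  const seen = new Set<string>();
  const merged: RequirementRow[] = [];
  for (const layer of [major, genEd, ...overlays]) {
    for (const row of layer) {
      if (seen.has(row.courseId)) continue; // double-counted course appears once
      seen.add(row.courseId);
      merged.push(row);
    }
  }
  return merged;
}

// The degree total stays the major's, so the fill is whatever the merged rows leave open.
function freeElectiveGap(degreeTotal: number, rows: RequirementRow[]): number {
  const planned = rows.reduce((sum, r) => sum + r.credits, 0);
  return Math.max(0, degreeTotal - planned);
}
```

Attaching an overlay that adds one new 3-credit course shrinks the gap by 3, while a course the overlay shares with the major or gen-ed layer changes nothing.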
Backfill
POST /api/admin/review/backfill-validation (button on /admin/review)
re-validates all maps with extraction_model='program-to-coursemap-v1'
using doc-level checks and pulls failing unreviewed published maps back
to needs_review. Idempotent and safe to re-run anytime; primarily needed
once after this feature first ships. It skips maps a human has published
(reviewed_at set) so it never undoes an admin’s override, and it joins the
source program’s degree_type so generated minor/cert maps don’t
false-positive on the 90-credit floor. Response reports checked,
flagged, skipped_reviewed, skipped_no_doc, and update_failures
(failed rows stay in their prior state — retry on failures).
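The per-map decision in the backfill can be sketched as a pure classifier plus a tally. Everything here is an assumption for illustration: the BackfillRow shape, the check order (reviewed before missing doc), and counting checked as the maps that actually reached validation. The counter names mirror the response fields; update_failures is omitted because it depends on the database write.

```typescript
// Hypothetical view of one generated map during the backfill.
interface BackfillRow {
  reviewedAt: string | null; // set when a human has published the map
  hasDoc: boolean; // whether the map has a doc to run doc-level checks on
  errorFlagCount: number; // error-severity flags produced by re-validation
}

type BackfillAction = "skipped_reviewed" | "skipped_no_doc" | "flagged" | "clean";

function classifyBackfillRow(row: BackfillRow): BackfillAction {
  if (row.reviewedAt !== null) return "skipped_reviewed"; // never undo an admin override
  if (!row.hasDoc) return "skipped_no_doc";
  return row.errorFlagCount > 0 ? "flagged" : "clean";
}

function summarizeBackfill(rows: BackfillRow[]) {
  const summary = { checked: 0, flagged: 0, skipped_reviewed: 0, skipped_no_doc: 0 };
  for (const row of rows) {
    const action = classifyBackfillRow(row);
    if (action === "skipped_reviewed" || action === "skipped_no_doc") {
      summary[action] += 1;
    } else {
      summary.checked += 1;
      if (action === "flagged") summary.flagged += 1;
    }
  }
  return summary;
}
```

Because the classifier depends only on the row's current state, running it twice over the same rows gives the same tally, which matches the idempotence claim above.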
Operational notes
- Migration 165 is backward-compatible: additive columns with defaults plus a widening of the catalog_programs degree_type CHECK (adds GenEd). Safe to apply before the Node rebuild on 10.1.1.4.
- Per-school setup for a new CourseLeaf institution: set institutions.default_degree_total if not 120, ingest the gen-ed page once per catalog year, then generate maps per program.