
Course-Map Review Workflow & Full-Degree Modeling (issue #1029)

CourseLeaf-only v1 of the #1029 plan: generated degree maps are validated, filled to the degree total, and gated behind a review-by-exception workflow.

State machine

course_maps.review_status is either needs_review or published (migration 20260610_165_course_map_review_workflow.sql).

  • The default is published: user-uploaded maps have no generation pipeline to gate, and existing rows keep their visibility.
  • The program generator (POST /api/admin/programs/:id/generate-map) runs validation gates after generating. Any error-severity flag → needs_review; clean maps publish immediately (warnings don’t block). Flags are stored in course_maps.validation_flags (jsonb array).
  • Publish gate: share-link creation (POST /api/course-maps/:id/shares) returns 409 NEEDS_REVIEW for any map not published, and the public share resolver (GET /api/shares/:token) returns 404 for demoted maps — existing links stop serving when a map is pulled back for review. The map view and editor show a draft banner with the flags.
  • Admins publish from /admin/review (calls POST /api/admin/review/maps/:id/publish, which stamps reviewed_by/reviewed_at). Publishing with outstanding flags is allowed — the human review is the override, and the backfill respects it (see below).
  • Admins can view and edit any map regardless of owner (GET/PATCH /api/course-maps/:id are owner-or-admin); delete and share minting stay owner-only.
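The two gates above can be sketched as pure functions. This is a minimal illustration, not the actual handler code: the ValidationFlag shape (code plus severity), the function names, and the ShareGateResult type are assumptions made for the example.

```typescript
// Hypothetical shapes: the real flag type lives in @course-map/shared.
type Severity = "error" | "warning";
type ReviewStatus = "needs_review" | "published";

interface ValidationFlag {
  code: string;
  severity: Severity;
}

// Any error-severity flag routes the generated map to review;
// warnings alone never block auto-publish.
function statusAfterGeneration(flags: ValidationFlag[]): ReviewStatus {
  return flags.some((flag) => flag.severity === "error") ? "needs_review" : "published";
}

type ShareGateResult =
  | { allowed: true }
  | { allowed: false; httpStatus: 409; code: "NEEDS_REVIEW" };

// Share-link creation is refused for any map that is not published.
function shareGate(status: ReviewStatus): ShareGateResult {
  if (status === "published") return { allowed: true };
  return { allowed: false, httpStatus: 409, code: "NEEDS_REVIEW" };
}
```

A map carrying only missing_gen_ed (a warning) would publish immediately under this sketch, while one carrying credits_below_total would be held and its share requests refused with 409.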

Validation gates (services/map-validation.ts)

Error flags (block auto-publish):

  • unknown_degree_total
  • major_only_total: parsed table total < 75% of degree total (the Skidmore 43-credit case)
  • credits_below_total
  • no_prereq_edges: ≥5 real courses, zero prereq edges
  • implausible_total: backfill only, <90 credits, bachelor maps only

Warning flags (informational): missing_gen_ed, filled_free_electives.

The full code list is the ValidationFlagCode union in @course-map/shared.
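As an illustration of the two error gates with explicit thresholds, here is a hedged sketch. The MapFacts input shape and the function name are hypothetical and are not taken from services/map-validation.ts; the remaining flags are omitted because their exact conditions are not spelled out here.

```typescript
// Hypothetical input shape for the example only.
interface MapFacts {
  parsedTableTotal: number; // total parsed from the program table
  degreeTotal: number;      // resolved degree total
  realCourseCount: number;  // courses that are not placeholders
  prereqEdgeCount: number;  // prerequisite edges in the generated map
}

type SketchedErrorFlag = "major_only_total" | "no_prereq_edges";

function sketchErrorFlags(facts: MapFacts): SketchedErrorFlag[] {
  const flags: SketchedErrorFlag[] = [];
  // A parsed total under 75% of the degree total suggests a major-only page.
  if (facts.parsedTableTotal < 0.75 * facts.degreeTotal) {
    flags.push("major_only_total");
  }
  // Five or more real courses with no prerequisite edges at all is suspicious.
  if (facts.realCourseCount >= 5 && facts.prereqEdgeCount === 0) {
    flags.push("no_prereq_edges");
  }
  return flags;
}
```

With a parsed total of 43 against an assumed degree total of 120, the threshold is 90 credits, so the map would be flagged major_only_total.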

Degree total resolution (resolveDegreeTotal)

For bachelor’s degrees, the parsed total_credits from a CourseLeaf program table is trusted only when it is ≥90, because a major-only page reports the major’s hours, not the degree’s. Fallback order: plausible parsed total → institutions.default_degree_total → 120. Minors, certs, and masters keep their parsed total with no fallback defaults and never receive fill: a minor whose ingested rows fall short of its declared total means the parser missed requirements, so the map gets credits_below_total and routes to review instead of being padded with electives a minor doesn’t have.

For bachelor maps, the generator closes gaps of ≥1 credit with “Free Elective” placeholder slots. Gaps larger than 6 credits are split into ~3-credit slots, and whole-credit gaps always split into whole-credit slots (10 → 4+3+3, never 3.4+3.3+3.3).
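The fallback order and the slot split can be sketched as follows. This is one way to satisfy the stated rules, not the actual resolveDegreeTotal or generator code: the function names, the Credential type, and the handling of fractional gaps (the fraction is added to the first slot) are assumptions.

```typescript
type Credential = "bachelor" | "other"; // "other" covers minors, certs, masters

function resolveDegreeTotalSketch(
  credential: Credential,
  parsedTotal: number | null,
  institutionDefault: number | null,
): number | null {
  // Non-bachelor credentials keep the parsed total, with no fallback defaults.
  if (credential !== "bachelor") return parsedTotal;
  // A bachelor total is plausible only at 90 credits or more.
  if (parsedTotal !== null && parsedTotal >= 90) return parsedTotal;
  return institutionDefault ?? 120;
}

function splitFreeElectives(gap: number): number[] {
  if (gap < 1) return [];     // gaps under 1 credit are not filled
  if (gap <= 6) return [gap]; // small gaps stay as a single slot
  const whole = Math.floor(gap);
  const fraction = gap - whole; // assumption: any fraction goes to the first slot
  const slotCount = Math.round(whole / 3);
  const base = Math.floor(whole / slotCount);
  const extra = whole % slotCount; // this many leading slots get one more credit
  return Array.from({ length: slotCount }, (_, i) =>
    base + (i < extra ? 1 : 0) + (i === 0 ? fraction : 0),
  );
}
```

Under this sketch a 10-credit gap yields three slots of 4, 3, and 3 credits, and a bachelor page reporting 43 credits with no institution default resolves to 120.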

Gen-ed layer

A catalog’s general-education page is ingested once per catalog and stored as a GEN-ED pseudo-program (degree_type='GenEd', hidden from the program picker) with all requirements forced to general_education; the source URL is recorded on institution_catalogs.gen_ed_url for a future re-ingest flow.

Automatic (preferred): onboarding’s final phase (stepIngestGenEd) scans the discovered departments for a gen-ed-looking slug or name (core-curriculum, general-education, university-core, …) and ingests the match as the gen-ed layer, logging the outcome in the job log. It is idempotent (skips when a GEN-ED program already exists for the catalog) and non-fatal: when nothing matches, the job log says so and bachelor maps carry missing_gen_ed until the layer is added manually.
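A rough sketch of the slug-or-name scan, for illustration only. The Department shape, the slugify helper, and the substring matching are assumptions, and the hint list contains only the three examples named above; the real list is longer.

```typescript
// Hypothetical shape for a discovered department.
interface Department {
  slug: string;
  name: string;
}

// Only the hints named in this document; the actual list has more entries.
const GEN_ED_HINTS = ["core-curriculum", "general-education", "university-core"];

// Normalize a display name into slug form so one hint list covers both fields.
function slugify(text: string): string {
  return text.toLowerCase().replace(/[^a-z0-9]+/g, "-").replace(/^-+|-+$/g, "");
}

// Returns the first department that looks like a gen-ed page, or undefined.
function findGenEdDepartment(departments: Department[]): Department | undefined {
  return departments.find((dept) => {
    const haystacks = [dept.slug.toLowerCase(), slugify(dept.name)];
    return GEN_ED_HINTS.some((hint) => haystacks.some((h) => h.includes(hint)));
  });
}
```

An undefined result corresponds to the non-fatal "nothing matched" path: the job log records it and bachelor maps carry missing_gen_ed until the layer is added manually.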

Manual (fallback): the program-ingest form on /admin/programs has a “gen-ed / all-college requirements” checkbox (POST /api/admin/programs/ingest with gen_ed: true) for catalogs where detection misses or the page needs replacing.

Every subsequent generate-map call for a bachelor’s program in that catalog merges the GEN-ED rows after the major’s rows (deduped by course id). Non-bachelor credentials (minors, certs, masters) are self-contained and never receive the gen-ed merge. A failed gen-ed lookup fails the generate request — it is never silently treated as “no gen-ed page”.
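The merge order and dedup rule can be illustrated with a small sketch. The RequirementRow shape and mergeLayers name are hypothetical; keeping the first occurrence (so the major's row wins over a gen-ed duplicate) is an assumption consistent with "after the major's rows", not confirmed behavior.

```typescript
// Hypothetical row shape for the example.
interface RequirementRow {
  courseId: string;
  credits: number;
  category: string;
}

// Merge layers in the order given, keeping the first row seen per course id.
function mergeLayers(...layers: RequirementRow[][]): RequirementRow[] {
  const seen = new Set<string>();
  const merged: RequirementRow[] = [];
  for (const layer of layers) {
    for (const row of layer) {
      if (seen.has(row.courseId)) continue; // already contributed by an earlier layer
      seen.add(row.courseId);
      merged.push(row);
    }
  }
  return merged;
}
```

Calling mergeLayers(majorRows, genEdRows) for a bachelor's program, and mergeLayers(minorRows) alone for a non-bachelor credential, mirrors the rule that only bachelor maps receive the gen-ed layer.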

Crawl resilience

  • Unknown CourseLeaf variants: the deterministic parser knows three template shapes (classic / split-detail / cobj). When a page has .courseblock markup that no variant matches, the crawl hands the blocks’ text to Gemini 2.5 Flash (llm-course-extract.ts), validates the output strictly (code shape, credit bounds, majority-rejection), and logs a WARN per page: “unknown CourseLeaf variant — LLM fallback extracted courses; add a deterministic parser variant.” Capped at 25 pages per ingestion run (~$0.50 worst case); spend is recorded in cost_ledger (metadata.kind = 'courseleaf-llm-fallback'). The WARN is the signal to codify a new variant and re-crawl.
  • Rate limiting (SHSU incident): the program-discovery walk now spaces fetches (~400 ms + jitter), retries a rejected fetch once after a 3 s cool-down, and aborts the walk after 12 consecutive rejections with an operator-visible message — courses already ingested are kept and a later onboarding re-run finishes the programs.
  • Duplicate listings: the same course code on multiple crawled pages (undergrad + grad description trees) is deduped before the registry upsert, keeping the fullest listing.
  • Non-program pages (second SHSU incident): a candidate page that fetches fine but has no sc_courselist table is classified NOT_A_PROGRAM, not FETCH_FAILED — it is never retried, never counts toward the rate-limit breaker, and shows up in the job log as “N non-program pages skipped” instead of ✗ failures. (Before this split, college landing pages walked as program candidates masqueraded as a rate-limit storm and falsely tripped the breaker.) At depth 0 a non-program page is treated as a possible container and gets one level of descent: its children become program candidates (capped at 30 containers per department).
  • Hybrid catalog trees (SHSU): the level-page hub walk (/undergraduate/colleges-academic-departments/) now always runs — additively when the catalog root also has real colleges — instead of only as a zero-colleges fallback. Hub-shaped department slugs (college-departments) are promoted to their children when a sampled child has children of its own, so trees nested college → department → program land within reach of the walk. Course-description departments are excluded from the program walk entirely (they’re crawl seeds).
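The split between NOT_A_PROGRAM and FETCH_FAILED matters mostly for the breaker, which the sketch below illustrates. It is a simplification under stated assumptions: the "OK" outcome, the WalkSummary shape, and the choice to reset the streak on any page that fetched fine (including non-program pages) are not confirmed by the source, and FETCH_FAILED here stands for a fetch that was still rejected after its single retry.

```typescript
type FetchOutcome = "OK" | "NOT_A_PROGRAM" | "FETCH_FAILED";

const MAX_CONSECUTIVE_REJECTIONS = 12;

interface WalkSummary {
  aborted: boolean;
  processed: number;
  nonProgramSkipped: number;
}

function summarizeWalk(outcomes: FetchOutcome[]): WalkSummary {
  let streak = 0; // consecutive rejected fetches
  let processed = 0;
  let nonProgramSkipped = 0;
  for (const outcome of outcomes) {
    processed += 1;
    if (outcome === "FETCH_FAILED") {
      streak += 1;
      if (streak >= MAX_CONSECUTIVE_REJECTIONS) {
        // Breaker trips: stop walking, keep whatever was already ingested.
        return { aborted: true, processed, nonProgramSkipped };
      }
      continue;
    }
    // Assumption: a page that fetched fine resets the streak.
    streak = 0;
    if (outcome === "NOT_A_PROGRAM") nonProgramSkipped += 1;
  }
  return { aborted: false, processed, nonProgramSkipped };
}
```

In this model a run of college landing pages only increments nonProgramSkipped, so it cannot trip the breaker the way it did before the classification split.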

Minor-as-overlay

A minor is not a standalone plan — it overlays a major’s map. Generated maps keep their major in course_maps.source_program_id; attached overlays (degree types Minor, Cert, Concentration, Track) live in the course_map_minors junction (migration 166).

  • UI: on a generated map’s view page, admins see an “Add a minor” panel (hidden for non-admins, non-generated maps, and catalogs with no overlay programs). Applying regenerates the combined plan.
  • API: GET /api/admin/programs/addable-minors/:mapId lists overlay programs from the map’s catalog not yet attached; POST /api/admin/programs/:minorId/apply-minor ({course_map_id}) attaches and regenerates.
  • Merge semantics: requirement rows merge major → gen-ed → overlays, deduped by course id, so double-counted courses appear once. The degree total stays the major’s; free-elective fill shrinks as overlay courses consume elective slots — how real plans absorb a minor.
  • Regeneration replaces the layout (manual edits are rebuilt from the catalog) and clears reviewed_by/reviewed_at before re-running the validation gates, so changed content can’t ride a stale human approval.
  • Idempotent/self-healing: the attach is an upsert and regeneration always reads back the full attached set, so re-applying after a partial failure converges. An invalid overlay (wrong catalog, non-overlay degree type) is detached again on rejection.
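To make the "fill shrinks as overlays are added" arithmetic concrete, here is a small sketch. The CreditRow shape and freeElectiveFill name are hypothetical, and clamping the result at zero is an assumption about how an over-full plan would be treated.

```typescript
// Hypothetical row shape for the example.
interface CreditRow {
  courseId: string;
  credits: number;
}

// Credits left for free-elective fill after counting each course once.
function freeElectiveFill(degreeTotal: number, rows: CreditRow[]): number {
  const creditsByCourse = new Map<string, number>();
  for (const row of rows) {
    // A course shared by the major and an overlay is counted once.
    if (!creditsByCourse.has(row.courseId)) creditsByCourse.set(row.courseId, row.credits);
  }
  let required = 0;
  creditsByCourse.forEach((credits) => {
    required += credits;
  });
  return Math.max(0, degreeTotal - required);
}
```

For example, with a 12-credit total, a major of two 3-credit courses leaves 6 credits of fill; attaching an overlay that shares one of those courses and adds one new 3-credit course leaves 3.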

Backfill

POST /api/admin/review/backfill-validation (button on /admin/review) re-validates all maps with extraction_model='program-to-coursemap-v1' using doc-level checks and pulls failing unreviewed published maps back to needs_review. It is idempotent and safe to re-run anytime, and is primarily needed once after this feature first ships. It skips maps a human has published (reviewed_at set) so it never undoes an admin’s override, and it joins the source program’s degree_type so generated minor/cert maps don’t false-positive on the 90-credit floor. The response reports checked, flagged, skipped_reviewed, skipped_no_doc, and update_failures; failed rows stay in their prior state, so re-run the backfill to retry them.
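The skip rules and counters can be sketched as a pure tally. This is illustrative only: the BackfillCandidate shape, the order of the skip checks, and the exact meaning of checked (here, maps that were actually validated) are assumptions, and update_failures is omitted because it depends on database writes.

```typescript
// Hypothetical per-map input; errorFlagCount stands in for the doc-level checks.
interface BackfillCandidate {
  reviewStatus: "needs_review" | "published";
  reviewedAt: string | null; // set when a human published the map
  hasDoc: boolean;
  errorFlagCount: number;
}

interface BackfillReport {
  checked: number;
  flagged: number;
  skipped_reviewed: number;
  skipped_no_doc: number;
}

function tallyBackfill(maps: BackfillCandidate[]): BackfillReport {
  const report: BackfillReport = { checked: 0, flagged: 0, skipped_reviewed: 0, skipped_no_doc: 0 };
  for (const map of maps) {
    if (map.reviewedAt !== null) {
      report.skipped_reviewed += 1; // never undo a human override
      continue;
    }
    if (!map.hasDoc) {
      report.skipped_no_doc += 1; // nothing to validate against
      continue;
    }
    report.checked += 1;
    // Only failing maps that are currently published get pulled back.
    if (map.reviewStatus === "published" && map.errorFlagCount > 0) report.flagged += 1;
  }
  return report;
}
```

Because a demoted map is no longer published on the next run, repeating the tally over the updated rows flags nothing new, which matches the idempotency claim.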

Operational notes

  • Migration 165 is backward-compatible: additive columns with defaults plus a widening of the catalog_programs degree_type CHECK (adds GenEd). Safe to apply before the Node rebuild on 10.1.1.4.
  • Per-school setup for a new CourseLeaf institution: set institutions.default_degree_total if not 120, ingest the gen-ed page once per catalog year, then generate maps per program.