Plan: `/data` shows everything that isn't gated, organised by tags

IMPLEMENTATION RULES: Before implementing this plan, read and follow:

WORKFLOW.md - The implementation process

PLANS.md - Plan structure and best practices

Status: Completed 2026-05-07 (Phases 1-5 shipped)

Last Updated: 2026-05-07 (Phase 1 closed: UIS PR #140 merged + GHCR republished; Phase 4 task 4.3 partial via PR #79; this update reflects the actual ship state vs the original plan).

What's done

Phase 1 — UIS schema exposure: ✅ closed 2026-05-07. UIS shipped the --schemas api_v1,marts,raw flag (per-app explicit opt-in, not the original "global default" framing). UIS PR #140 merged as f377fef; ghcr.io/helpers-no/uis-provision-host:latest @sha256:42cd40d5f66916a6f6071ab4d69fcf0080a2915b1cf93295bd3b169b8af42f31. Atlas's setup.md updated via PR #76. Schema-list flag thread documented end-to-end in talk.md (Messages 1-4 atlas + 1-3 uis). The reconfigure-already-deployed step is user-managed via the UIS tester CLI.
Phase 2 — manifest registry: ✅ shipped via PR #36 (catalogue 21 → 38 sources; manifest schema includes eu_theme, attribution, dimensions: block; recordIngestRun() lifecycle wrapper landed; _sources_manifest.csv + _sources_dimensions.csv seeds materialised at marts._sources_manifest / _sources_dimensions). Catalogue is now 41 sources after subsequent FHI / SSB / Bufdir / Cursor BG additions.
Phase 3 — meta marts + auto-wrap: ✅ shipped via PR #73 (3 new marts under models/marts/api/) + PR #77 (override-map → manifest.yml raw_tables: field refactor). api_v1.meta_sources (41 rows), api_v1.meta_endpoints (121 rows after refresh: 13 api_v1 + 61 marts + 47 raw), api_v1.meta_dimensions (215 rows). Lineage seed via new scripts/extract_lineage.py (129 edges). Tag inheritance uses union semantics; fact_kommune_indicators picks up 18 tags from its many indicator sources. mart_meta_dimensions cardinality enrichment deferred to a follow-up — see INVESTIGATE-mart-meta-dimensions-cardinality.md (PR #78) for the design.
Phase 4 — frontend rewrite: ✅ closed 2026-05-07. PR #79 shipped sources index. PR #85 shipped task 4.1 (full /data rewrite with tag-filter sidebar against meta_endpoints, 119 cards, 6 namespace-grouped facets, faceted-search counts) + the /data/[endpoint]/ → /data/[schema]/[table]/ route restructure (Accept-Profile dispatch) + the cache-no-store fix for first-load-empty + homepage copy update (4.4). PR #88 (this PR) shipped task 4.3 (per-source detail page at /data/sources/[source_id]/page.tsx rendering source metadata + freshness + raw ingest link + derived endpoints joined live against the marts.lineage seed) + task 4.5 (atlas-frontend README refresh covering every route, the lib's acceptProfile option, and the bookmarkable Tag URL patterns). Task 4.2 closed as no-op — the tag-driven catalog reads dynamic schemas from meta_endpoints, the typed api-types.ts union is no longer load-bearing for discovery; regen is now an optional contributor maintenance step.
Phase 5 — docs: ✅ closed 2026-05-07 in PR #88. Four files updated: setup.md (manifest convention + corrected customer-frontend section), ingest-modules.md (expanded manifest workflow with heuristic warnings), developers/index.md (open-by-default + Accept-Profile + tag-filter URL pattern), atlas-data/ingest/src/sources/README.md (added programmatic-access callout pointing at api_v1.meta_sources).

Goal (unchanged)

Execute INVESTIGATE-customer-frontend-data-display.md. After this PLAN, the customer frontend's /data page shows every queryable endpoint across api_v1, marts, and raw schemas (everything that isn't private_marts), each tagged with provider, topic, geo, cadence, eu_theme, and layer. A filter sidebar lets users slice the catalogue by any combination of tags. A first-class sources list (/data/sources + api_v1.meta_sources) carries provider, upstream URL, last-ingested timestamp, and downstream-model count for every Atlas ingest source — currently 41, growing as the cloud-agent pipeline drains the backlog.

Investigation

INVESTIGATE-customer-frontend-data-display.md — settled the open-by-default principle, the per-source manifest.yml shape ([Q2]), the dbt-model-as-substrate path ([Q3]), and the multi-namespace tag UX ([Q4]). Phase 2.10 + 2.11 extended the namespace set with eu_theme: (DCAT-AP alignment) and the editorial dimensions: block.

Prerequisites

✅ PostgREST live with api_v1.* (PLAN-004 + UIS PLAN-002 — verified 2026-04-30).
✅ PostgREST also serves marts.* and raw.* via Accept-Profile header (UIS PR #140 — Phase 1 of this PLAN).
✅ Customer frontend with /data, /data/[endpoint], /data/[endpoint]/spec (PLAN-005 — shipped at 2266f21).
✅ raw.ingest_runs populated by every ingest module (lifecycle wrapper from Phase 2.8).
✅ api_v1.meta_sources / meta_endpoints / meta_dimensions live (Phase 3).

Blocks

None remaining — UIS Phase 1 dependency closed 2026-05-07.

The `manifest.yml` shape

Phase 2's deliverable. One file per source folder. All structured catalogue metadata lives here; per-source READMEs are prose-only (what the script does, quirks, TODOs, references). After commit, the manifest is human-authored — ingest runs do NOT modify it.

# atlas-data/ingest/src/sources/ssb-08764/manifest.yml
source_id: ssb-08764
upstream_id: "08764"
upstream_url: https://www.ssb.no/statbank/table/08764
upstream_title: "08764: Personer under 18 år i husholdninger med lavinntekt (EU- og OECD-skala). Antall og prosent (K) (B) 2005-2024"
description: "Ingestion module for SSB statistikkbanktabell 08764 — Personer under 18 år i husholdninger med lavinntekt (EU- og OECD-skala)."
publisher: Statistisk sentralbyrå
license: NLOD
license_url: https://data.norge.no/nlod/no/2.0
periodicity: P1Y
eu_theme: SOCI
attribution: "Kilde: Statistisk sentralbyrå, tabell 08764"

tags:
  provider: ssb
  topic: income
  geo: kommune
  cadence: annual

dimensions:
  - code: Region
    meaning: Region (national / fylke / kommune / bydel / historical)
    value_format: "Numeric code: 0 national, 2-digit fylke, 4-digit kommune, 6-digit bydel"
    notes: "~1036 codes when pulling full range"
  - code: ContentsCode
    meaning: Statistic measure
    value_format: 5 codes
    notes: "Personer (count under 18), EUskala50/EUskala60 (% under 18 below 50%/60% of median, EU scale), OECDskala50/OECDskala60 (same, OECD scale)."
  - code: Tid
    meaning: Year
    value_format: 4-digit year as text
    notes: 2005–2024 (20 years); default v2-beta response is latest year only

Required top-level fields:

| Field | Purpose |
| --- | --- |
| `source_id` | Folder name; primary key; e.g. `ssb-08764`. |
| `upstream_id` | The upstream's own identifier (SSB table number, FHI dataset slug, etc.) so external developers can reconcile against upstream catalogues. |
| `upstream_url` | Canonical link to the source on the upstream's own site. |
| `upstream_title` | The source's authoritative title — usually Norwegian, sometimes bilingual. Gives developers something to search for in upstream tooling. |
| `description` | One paragraph framing the dataset for the customer-facing catalogue. |
| `publisher` | Institution that publishes the data (often = provider but sometimes different — e.g. an SSB table published on behalf of another body). |
| `license` + `license_url` | Critical for external developers building products. Default `NLOD` (Norwegian Licence for Open Government Data) for Norwegian public-sector sources; declare explicitly so consumers don't guess. |
| `periodicity` | ISO 8601 — `P1Y` annual, `P3M` quarterly, `P1M` monthly, `P1D` daily, `irregular` for ad-hoc / one-shot. More precise than the `cadence:` tag. |
| `eu_theme` | EU Publications Office Data Theme code (one of: `AGRI`, `ECON`, `EDUC`, `ENER`, `ENVI`, `GOVE`, `HEAL`, `INTR`, `JUST`, `REGI`, `SOCI`, `TECH`, `TRAN`). Aligns Atlas with Felles datakatalog (DCAT-AP `dcat:theme`). Auto-derived from `tags.topic` by `fill-manifest-todos.ts`; lookup table at `seeds/sources/eu_data_theme.csv`. |
| `attribution` | Citation string for academic / legal compliance (e.g. `Kilde: Statistisk sentralbyrå, tabell 08764`). External developers must use this when republishing or citing data. |

The tags: map carries the four declared namespaces (provider, topic, geo, cadence) — exactly one value per namespace per source. The cadence: tag is the human-readable shorthand of periodicity (so users can filter by cadence:annual without writing ISO 8601).
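The required-field and one-value-per-namespace rules can be checked mechanically. The following is a minimal Python sketch, assuming the manifest has already been parsed into a dict; `validate_manifest`, `EXAMPLE`, and the error strings are hypothetical names for illustration, not the project's actual `build_sources_seed.py` logic.

```python
REQUIRED_FIELDS = (
    "source_id", "upstream_id", "upstream_url", "upstream_title",
    "description", "publisher", "license", "license_url",
    "periodicity", "eu_theme", "attribution",
)
TAG_NAMESPACES = ("provider", "topic", "geo", "cadence")
EU_THEMES = {
    "AGRI", "ECON", "EDUC", "ENER", "ENVI", "GOVE", "HEAL",
    "INTR", "JUST", "REGI", "SOCI", "TECH", "TRAN",
}
PLACEHOLDERS = (None, "", "TODO")


def validate_manifest(manifest: dict) -> list:
    """Return a list of problems; an empty list means the manifest passes."""
    errors = []
    for field in REQUIRED_FIELDS:
        if manifest.get(field) in PLACEHOLDERS:
            errors.append(f"missing required field: {field}")
    theme = manifest.get("eu_theme")
    if theme not in PLACEHOLDERS and theme not in EU_THEMES:
        errors.append(f"unknown eu_theme: {theme}")
    tags = manifest.get("tags") or {}
    for ns in TAG_NAMESPACES:
        value = tags.get(ns)
        # Exactly one scalar value per namespace: reject lists and blanks.
        if not isinstance(value, str) or not value.strip():
            errors.append(f"tags.{ns} must be a single non-empty string")
    for ns in tags:
        if ns not in TAG_NAMESPACES:
            errors.append(f"undeclared tag namespace: {ns}")
    return errors


# Values copied from the ssb-08764 example manifest (description shortened).
EXAMPLE = {
    "source_id": "ssb-08764",
    "upstream_id": "08764",
    "upstream_url": "https://www.ssb.no/statbank/table/08764",
    "upstream_title": "08764: Personer under 18 år i husholdninger med lavinntekt",
    "description": "Ingestion module for SSB statistikkbanktabell 08764.",
    "publisher": "Statistisk sentralbyrå",
    "license": "NLOD",
    "license_url": "https://data.norge.no/nlod/no/2.0",
    "periodicity": "P1Y",
    "eu_theme": "SOCI",
    "attribution": "Kilde: Statistisk sentralbyrå, tabell 08764",
    "tags": {"provider": "ssb", "topic": "income", "geo": "kommune", "cadence": "annual"},
}
```

Calling `validate_manifest(EXAMPLE)` returns an empty list; turning `tags.topic` into a list or dropping `cadence` yields one error per offending namespace.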

The dimensions: list carries one entry per upstream dimension with code (upstream's own dimension name), meaning (short human-readable interpretation), value_format (encoding of values), notes (cardinality, gotchas). Hand-authored — this is editorial semantic content the catalogue can't compute. Phase 3's mart_meta_dimensions joins this with computed cardinality + example values from raw.* tables.
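The computed half of that join can be illustrated over in-memory rows. This is a sketch only: the actual model is SQL, `profile_dimension` is a hypothetical helper, and the frequency-then-alphabetical ordering and the cap of 10 examples follow the description in task 3.4.

```python
from collections import Counter


def profile_dimension(rows, column, max_examples=10):
    """Compute cardinality, example values and null count for one dimension column."""
    values = [row.get(column) for row in rows]
    non_null = [v for v in values if v is not None]
    counts = Counter(non_null)
    # Sort by frequency descending, then alphabetically so ties are stable.
    ordered = sorted(counts, key=lambda v: (-counts[v], str(v)))
    return {
        "cardinality": len(counts),
        "example_values": ordered[:max_examples],
        "null_count": len(values) - len(non_null),
    }
```

Given four rows whose `Tid` values are `"2023"`, `"2024"`, `"2024"` and a null, the helper reports a cardinality of 2, lists `"2024"` before `"2023"`, and counts one null.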

The layer: namespace is not declared here; it's derived per-endpoint in Phase 3 from the schema + dbt model path.

Plus a sibling change to capture upstream freshness: a new column on raw.ingest_runs named upstream_updated_at timestamptz (nullable). The shared recordIngestRun() wrapper at lib/ingest_run.ts extracts the upstream's own "updated" timestamp from the JSON-stat2 response and writes it. The lag between MAX(finished_at) (we ingested) and MAX(upstream_updated_at) (they published) is a meaningful signal in mart_meta_sources.
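To make the freshness signal concrete, here is a small Python sketch of the aggregation. It assumes run records shaped like rows of raw.ingest_runs with `finished_at`, `upstream_updated_at`, and `exit_code` (the column names used in task 3.1); `summarise_runs` and the `ingest_lag` key are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def summarise_runs(runs):
    """Aggregate successful runs for one source, ignoring failed runs."""
    ok = [r for r in runs if r["exit_code"] == 0]
    last_ingested = max((r["finished_at"] for r in ok), default=None)
    last_upstream = max(
        (r["upstream_updated_at"] for r in ok if r["upstream_updated_at"] is not None),
        default=None,
    )
    lag = None
    if last_ingested is not None and last_upstream is not None:
        # How long after the upstream published did we last ingest?
        lag = last_ingested - last_upstream
    return {
        "last_ingested_at": last_ingested,
        "last_upstream_update_at": last_upstream,
        "total_runs": len(ok),
        "ingest_lag": lag,
    }
```

A source whose ingest module never captures `upstream_updated_at` simply gets `None` for both the upstream timestamp and the lag, matching the nullable column.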

How a `manifest.yml` gets created — bootstrap once, human-authored after

Three-stage workflow:

(1) Skeleton — automatic. npm run sources:bootstrap-manifest -- <source_id> fetches upstream metadata and writes a starter manifest with source_id, upstream_id, upstream_url, upstream_title, publisher, periodicity, license (NLOD default for SSB / FHI / KOSTRA) populated. Other fields left as TODO.

(2) Auto-fillable fields — automatic. npm run sources:fill-manifest-todos parses the per-source README and fills description, attribution, the tags: namespaces (topic via regex, first match wins; cadence derived from periodicity; geo via priority kommune > fylke > bydel), and the top-level eu_theme (derived from topic via a static map). Idempotent: it only fills TODO or empty fields.

(3) Editorial — hand-authored. The contributor authors the dimensions: list by hand (semantic content the catalogue can't derive). Reviews the auto-filled fields. Commits.
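The stage-2 heuristics can be sketched in Python as follows. The real rules live in the TypeScript script `fill-manifest-todos.ts`; every pattern in `TOPIC_RULES`, the cadence values other than `annual`, and all function names below are assumptions made for illustration.

```python
import re

# Hypothetical rules; order matters because the first match wins.
TOPIC_RULES = [
    (re.compile(r"lavinntekt|income", re.I), "income"),
    (re.compile(r"helse|health", re.I), "health"),
]
GEO_PRIORITY = ["kommune", "fylke", "bydel"]  # highest priority first
# Only P1Y -> annual appears in the plan; the other tag values are assumed.
CADENCE_BY_PERIODICITY = {"P1Y": "annual", "P3M": "quarterly", "P1M": "monthly", "P1D": "daily"}
PLACEHOLDERS = (None, "", "TODO")


def derive_topic(readme):
    for pattern, topic in TOPIC_RULES:
        if pattern.search(readme):
            return topic  # later rules are never consulted
    return None


def derive_geo(readme):
    text = readme.lower()
    for level in GEO_PRIORITY:
        if level in text:
            return level
    return None


def fill_todos(manifest, readme):
    """Return a copy with TODO/empty tag values filled; authored values are kept."""
    filled = dict(manifest, tags=dict(manifest.get("tags") or {}))
    derived = {
        "topic": derive_topic(readme),
        "geo": derive_geo(readme),
        "cadence": CADENCE_BY_PERIODICITY.get(manifest.get("periodicity")),
    }
    for ns, value in derived.items():
        if filled["tags"].get(ns) in PLACEHOLDERS and value is not None:
            filled["tags"][ns] = value
    return filled
```

Because only placeholder values are overwritten, running `fill_todos` on its own output changes nothing, which is the idempotency property the workflow relies on.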

After commit the manifest is human-authored — ingest runs do NOT modify it. npm run ingest:<source_id> reads upstream data, writes rows to raw.<source_id>, captures upstream_updated_at to raw.ingest_runs — but does not touch manifest.yml. Avoids "PR diff has mystery edits from a CI run."

For the 21 existing sources (Phase 2.3): same flow in batch. SSB (14) + FHI (4) are covered by the bootstrap script, so 18 sources are auto-bootstrapped. The 3 outliers (redcross-branches, frr, ssb-klass-* if treated separately from SSB) use MANUAL_OVERRIDES in fill-manifest-todos.ts. Dimension blocks were hand-authored in ~30 minutes by reading each README's ## Response shape section.

Phase 1: UIS-side schema exposure

Cross-repo coordination with the UIS contributor. Atlas's atlas-postgrest instance starts serving marts.* and raw.* alongside api_v1.*. UIS-side change is a one-time configure-postgrest patch; after it lands, every new table in those schemas is queryable automatically (the existing ALTER DEFAULT PRIVILEGES clause already auto-grants SELECT on new tables).

Tasks

1.1 Open a new round of cross-repo coordination via talk.md. Inaugural message from atlas to uis lays out the change asked for: extend PGRST_DB_SCHEMAS from api_v1 to api_v1,marts,raw; add matching GRANT USAGE ON SCHEMA marts, raw TO <app>_web_anon and GRANT SELECT ON ALL TABLES IN SCHEMA marts, raw TO <app>_web_anon plus ALTER DEFAULT PRIVILEGES IN SCHEMA marts, raw GRANT SELECT ON TABLES TO <app>_web_anon to configure-postgrest.sh. private_marts stays excluded.
1.2 UIS contributor responded + shipped. Six-message thread in talk.md settled the design (UIS pushed back on the global-default framing in their Message 1; atlas accepted in Message 3 — the per-app --schemas flag avoids the GRANT-failure trap for non-Atlas consumers and keeps dbt-isms out of the platform tool). UIS PR #140 merged as f377fef on 2026-05-07; State Matrix dispatch with 5 reconcile paths; --schema (singular) removed entirely; PGRST_DB_SCHEMAS lives on the per-app secret + read by deploy template via secretKeyRef so configure/deploy can't drift.
1.3 Atlas-side validation passed against the contributor's local-image deployment (talk.md Message 4) — six spot-checks across api_v1 / marts / raw plus the privacy-boundary check confirming private_marts.frr_resources returns 404 by default and 406 with Accept-Profile: private_marts. Atlas's setup.md updated via PR #76 (configure line gains --schemas api_v1,marts,raw). The user's ./uis pull + reconfigure step is the final ack — runs through their UIS tester CLI; expected "status": "already_configured" no-op since the contributor's local image had identical semantics.

Outcome (Phase 1 — closed 2026-05-07): schema-list extension landed end-to-end. Single-day round-trip from atlas Message 4 (validation) to UIS Message 3 (PR + GHCR rebuild). PostgREST now serves marts.* and raw.* via Accept-Profile in addition to the default api_v1; private schemas (private_raw, private_marts) stay excluded by design. GHCR :latest SHA: 42cd40d5f66916a6f6071ab4d69fcf0080a2915b1cf93295bd3b169b8af42f31.

Operational gotcha logged for Phase 4 (talk40 Round 4 closeout): PostgREST routes header-less requests to the default schema only, which is the first one in --schemas, i.e. api_v1. To reach marts.* or raw.*, callers send Accept-Profile: <schema> on each request. Naive curl /dim_kommune returns 404 because PostgREST resolves it as api_v1.dim_kommune (which doesn't exist); that's correct routing, not a misconfiguration. The same holds for the OpenAPI document: curl / advertises only the default schema's ~14 paths; curl -H 'Accept-Profile: marts' / advertises ~64 marts paths. Sum across profiles ≈ 123, which roughly matches what api_v1.meta_endpoints carries (119 rows after the latest regen). The Phase 4.1 frontend rewrite must send Accept-Profile per row, keyed off meta_endpoints.schema.
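The per-row dispatch rule can be sketched in a few lines of Python. This is an illustration of the routing rule only, not the frontend's actual lib (which is TypeScript); `build_request` and the constants are hypothetical names.

```python
DEFAULT_SCHEMA = "api_v1"
EXPOSED_SCHEMAS = ("api_v1", "marts", "raw")


def build_request(base_url, schema, table, limit=None):
    """Return (url, headers) for reading one endpoint from PostgREST."""
    if schema not in EXPOSED_SCHEMAS:
        raise ValueError(f"schema not exposed: {schema}")
    url = f"{base_url.rstrip('/')}/{table}"
    if limit is not None:
        url += f"?limit={limit}"
    # Header-less requests resolve against the default schema, so only
    # non-default schemas need an explicit Accept-Profile.
    headers = {} if schema == DEFAULT_SCHEMA else {"Accept-Profile": schema}
    return url, headers
```

For a `meta_endpoints` row with schema `marts` and table `dim_kommune`, the helper yields the same URL a default-schema request would use plus an `Accept-Profile: marts` header; the schema never appears in the path.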

Validation

# Marts table is reachable via Accept-Profile
curl -fsS -H 'Accept-Profile: marts' "http://api-atlas.localhost/dim_kommune?limit=3" | jq 'length'      # → 3

# Raw table is reachable via Accept-Profile
curl -fsS -H 'Accept-Profile: raw' "http://api-atlas.localhost/ssb_08764?limit=2" | jq 'length'          # → 2

# private_marts.* is NOT reachable, even with explicit profile
curl -sS -o /dev/null -w "%{http_code}\n" "http://api-atlas.localhost/frr_resources"                                            # → 404
curl -sS -o /dev/null -w "%{http_code}\n" -H 'Accept-Profile: private_marts' "http://api-atlas.localhost/frr_resources"          # → 406

# OpenAPI: default profile (api_v1) advertises ~14 paths; multi-schema sum is exposed via meta_endpoints
curl -sS "http://api-atlas.localhost/" | jq '.paths | keys | length'                                                             # → 14 (api_v1 default)
curl -sS -i "http://api-atlas.localhost/meta_endpoints?limit=0" -H 'Prefer: count=exact' | grep -i 'content-range'                # → */119

Done when

marts.* and raw.* tables are queryable via api-atlas.localhost.
private_marts.* stays gated: 404 without a profile header, 406 with Accept-Profile: private_marts.
Customer frontend's existing /data catalog auto-discovers the new endpoints on next page load (no code change needed; introspection-driven design from PLAN-005 handles it).

Phase 2: Per-source `manifest.yml` registry

Promote the existing Markdown table at atlas-data/ingest/src/sources/README.md and the per-source READMEs into structured per-source manifest.yml files. First-pass tag curation across the 21 existing sources.

Tasks

Validation

# Every source folder has a manifest.yml (live count)
ls atlas-data/ingest/src/sources/*/manifest.yml | wc -l                       # → live count from current catalogue

# No remaining TODOs after the auto-fill pass
grep -l "TODO" atlas-data/ingest/src/sources/*/manifest.yml | wc -l           # → 0

# Topic distribution looks plausible (no "ngo-supply" misclassifications across SSB/FHI)
grep -h "  topic:" atlas-data/ingest/src/sources/*/manifest.yml | sort | uniq -c

# Re-running fill-manifest-todos is a no-op (idempotent — it only fills TODO/empty fields)
npm run sources:fill-manifest-todos                                           # → "filled 0 of 21"

# Build the seed CSV; validation fails loudly if any required field is missing
cd atlas-data/dbt && python scripts/build_sources_seed.py
ls -la seeds/sources/_sources_manifest.csv                                    # exists

# All four declared tag namespaces present per row
python -c "import csv; rows=list(csv.DictReader(open('seeds/sources/_sources_manifest.csv'))); print(all(set(t.split(':')[0] for t in r['tags'].split(',')) >= {'provider','topic','geo','cadence'} for r in rows))"  # → True

# Migration applied
psql "$DATABASE_URL" -c "\d raw.ingest_runs" | grep upstream_updated_at        # column visible

# SSB ingest modules write upstream_updated_at on next run
npm run ingest:ssb-08764
psql "$DATABASE_URL" -c "select source_slug, upstream_updated_at from raw.ingest_runs where source_slug='ssb-08764' order by run_id desc limit 1"  # non-null

# dbt seed loads the manifest
uv run --env-file ../ingest/.env dbt seed --select _sources_manifest          # success

Done when

All 21 source folders contain a valid manifest.yml with all eleven required top-level fields and four tag namespaces.
No TODO placeholders remain in any manifest.
bootstrap-manifest.ts + fill-manifest-todos.ts are both idempotent (re-running them is a no-op against a fully-populated state).
build_sources_seed.py produces a clean CSV; validation rejects missing fields.
raw.ingest_runs.upstream_updated_at migration applied; nullable.
SSB ingest modules populate upstream_updated_at on runs (14 sources).
dbt seed loads the manifest into marts._sources_manifest.
The legacy Markdown table at atlas-data/ingest/src/sources/README.md is either auto-generated from the YAMLs (preferred) or replaced with a pointer.
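The seed-building step can be sketched as below. This is not the project's `build_sources_seed.py`: `build_seed_csv` and the reduced `SEED_COLUMNS` list are hypothetical, and the comma-separated `namespace:value` encoding of the tags column is an assumption inferred from the validation one-liner above.

```python
import csv
import io

# A subset of columns for brevity; the real seed carries every required field.
SEED_COLUMNS = ["source_id", "publisher", "license", "periodicity", "eu_theme", "tags"]


def build_seed_csv(manifests):
    """Serialise parsed manifests to CSV text, refusing TODO placeholders."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=SEED_COLUMNS, lineterminator="\n")
    writer.writeheader()
    # Sort by source_id so regenerating the seed produces stable diffs.
    for manifest in sorted(manifests, key=lambda m: m["source_id"]):
        row = {col: manifest.get(col) for col in SEED_COLUMNS if col != "tags"}
        row["tags"] = ",".join(f"{ns}:{value}" for ns, value in manifest["tags"].items())
        for col, value in row.items():
            if value is None or str(value).strip() == "" or "TODO" in str(value):
                raise ValueError(f"{manifest['source_id']}: {col} is missing or TODO")
        writer.writerow(row)
    return buffer.getvalue()
```

Failing loudly on the first placeholder keeps a half-filled manifest from reaching `dbt seed`, which is the behaviour the "validation rejects missing fields" criterion asks for.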

Phase 3: `marts.meta_sources` + `marts.meta_endpoints` + `marts.meta_dimensions` dbt models

The joins. After this phase, three new mart_* views exist (and via the PLAN-004 generator, three new api_v1.meta_* wrappers) that carry the full tagged catalogue: per-source metadata + freshness, per-endpoint inventory + tag inheritance, and per-dimension editorial semantics joined with computed cardinality.

Tasks

3.1 Add atlas-data/dbt/models/marts/api/mart_meta_sources.sql:
- From: marts._sources_manifest (Phase 2 seed; currently 38 rows, growing)
- Left-join to raw.ingest_runs aggregates per source:
  - last_ingested_at: MAX(finished_at) WHERE exit_code = 0
  - last_upstream_update_at: MAX(upstream_updated_at) WHERE exit_code = 0 (nullable — only populated for sources whose ingest module captures it)
  - latest_row_count: rows_parsed from the most recent successful run
  - total_runs: COUNT(*) FILTER (WHERE exit_code = 0)
- Add downstream_model_count: count of distinct downstream models from the lineage seed (Phase 3.3).
- Output columns: source_id, upstream_id, upstream_url, upstream_landing_page, upstream_title, description, publisher, license, license_url, periodicity, eu_theme, attribution, tags (text[]), last_ingested_at, last_upstream_update_at, latest_row_count, total_runs, downstream_model_count.
- Add full schema.yml description per column (PLAN-001's gate enforces this).
3.2 Add atlas-data/dbt/models/marts/api/mart_meta_endpoints.sql:
- From: information_schema.tables filtered to table_schema in ('api_v1','marts','raw') (and not in ('private_marts') defensively). Skip marts._* private seeds (_sources_manifest, _sources_dimensions, eu_data_theme, lineage).
- Output columns: endpoint, schema, table, tags (text[]), row_count (via dynamic SQL or a daily-refreshed snapshot — see 3.3 for lineage), is_public_api (boolean: schema='api_v1')
- Tag derivation: layer:<schema> from the schema; union of all provider: / topic: / geo: / cadence: / eu_theme: tags from the source(s) the endpoint derives from (via the lineage seed in 3.3). Union over intersection: a mart_* derived from 17 indicator sources picks up every source's tag — easier to filter, "this mart involves something annual" is a more useful signal than "this mart is purely annual." Decision recorded inline so 3.2 doesn't re-litigate it.
- Add full schema.yml description per column.
3.3 Add atlas-data/dbt/scripts/extract_lineage.py that reads target/manifest.json after dbt parse, walks the dependency graph from each api_v1.* and marts.* model up to its root raw.* ancestors, and emits a dbt seed CSV at seeds/sources/lineage.csv with rows (model_name, source_id) — one row per (model, source) edge. Multiple rows per model when it derives from multiple sources (e.g. fact_kommune_indicators → many indicator sources). Hardcoded multi-table override map shipped via PR #73; moved into manifest.yml's raw_tables: field via PR #77 so the script is now generic.
[~] 3.4 Add atlas-data/dbt/models/marts/api/mart_meta_dimensions.sql. Editorial pass-through shipped (PR #73); cardinality / example_values / null_count columns deferred — design in INVESTIGATE-mart-meta-dimensions-cardinality.md (PR #78).
- From: marts._sources_dimensions (Phase 2.11 seed; ~198 rows = sum of dimensions across all 38 sources). Left-joined to per-(source, dimension) introspection of the corresponding raw.* table.
- For every (source_id, dim_code) pair, compute against the raw table:
  - cardinality: COUNT(DISTINCT <dim_column>) — how many unique values appear.
  - example_values: array of up to ~10 distinct values (sorted by frequency desc, then alpha) for users to see what the dimension actually contains.
  - null_count: rows where the dim value is null (should be 0 for non-degenerate dims).
- Output columns: source_id, code (upstream dim name), meaning, value_format, notes (from the seed), cardinality, example_values (text[]), null_count. Frontend renders "what each column means × what values it actually contains" in one card.
- Implementation note: introspecting raw.* tables means generating one SELECT per (source × dim) pair via dbt Jinja iteration over the seed contents. Use run_query() to read the seed and build a per-source UNION ALL; guard the call with {% if execute %}, because run_query() returns nothing while dbt is only parsing. Keep an eye on how much the extra query slows dbt runs; if it becomes a problem, fall back to a static CTE per source that the seed-gen script emits.
- Add full schema.yml description per column.
3.5 Run ./regenerate-api-v1.sh + ./apply-api-v1.sh. The PLAN-004 generator picks up mart_meta_sources, mart_meta_endpoints, and mart_meta_dimensions, emits api_v1.meta_sources / api_v1.meta_endpoints / api_v1.meta_dimensions wrappers, all five validation gates pass. Wrapper count went 10 → 13.
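Tasks 3.2 and 3.3 combine a graph walk with a set union, which the following Python sketch illustrates. It is not the shipped `extract_lineage.py`: the `parents` mapping stands in for the dependency information in `target/manifest.json`, `source_of` is a hypothetical map from raw nodes to source ids, and all function names are invented for the example.

```python
def root_sources(model, parents, source_of):
    """Walk the dependency graph upward and collect source ids at the roots."""
    found, stack, seen = set(), [model], set()
    while stack:
        node = stack.pop()
        if node in seen:
            continue  # shared ancestors are visited once
        seen.add(node)
        if node in source_of:
            found.add(source_of[node])
        stack.extend(parents.get(node, ()))
    return found


def lineage_rows(models, parents, source_of):
    """One (model_name, source_id) row per edge, sorted for stable CSV diffs."""
    return sorted((m, s) for m in models for s in root_sources(m, parents, source_of))


def endpoint_tags(schema, sources, source_tags):
    """layer:<schema> plus the union of every contributing source's tags."""
    tags = {f"layer:{schema}"}
    for source_id in sources:
        tags |= set(source_tags.get(source_id, ()))
    return sorted(tags)
```

With union semantics, a mart built from one SSB source and one FHI source carries both `provider:` tags, so it shows up under either provider filter.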

Validation

Counts assume the catalogue at the moment of running. Substitute the live count from select count(*) from marts._sources_manifest; for any "X rows" assertion below — the catalogue grows continuously.

cd atlas-data/dbt
uv run --env-file ../ingest/.env dbt seed --select sources
uv run --env-file ../ingest/.env dbt run --select mart_meta_sources mart_meta_endpoints mart_meta_dimensions
./regenerate-api-v1.sh && ./apply-api-v1.sh

# meta_sources row count matches manifest seed
N=$(psql "$DATABASE_URL" -tAc 'select count(*) from marts._sources_manifest;')
curl -sS "http://api-atlas.localhost/meta_sources" | jq 'length'              # → $N

# Every row has the required fields
curl -sS "http://api-atlas.localhost/meta_sources" | jq '[.[] | select(.license == null or .publisher == null or .upstream_title == null or .periodicity == null or .eu_theme == null)] | length'  # → 0

# Every row has all five declared tag namespaces (provider/topic/geo/cadence/eu_theme)
curl -sS "http://api-atlas.localhost/meta_sources" | jq '[.[] | select(.tags | length < 5)] | length'   # → 0

# Filter by tag (-g/--globoff stops curl from expanding the braces as a URL glob)
curl -g -sS "http://api-atlas.localhost/meta_sources?tags=cs.{provider:ssb}" | jq 'length'  # > 0

# SSB sources have last_upstream_update_at populated
curl -g -sS "http://api-atlas.localhost/meta_sources?tags=cs.{provider:ssb}" | jq '[.[] | select(.last_upstream_update_at != null)] | length'  # > 0 after a full ingest cycle

# meta_endpoints row count: roughly N indicators marts + dims + facts + supply marts + api_v1 wrappers
curl -sS "http://api-atlas.localhost/meta_endpoints" | jq 'length'            # → ~80+ at 38 sources, grows linearly

# Endpoints inherit tags from sources (-g keeps the braces literal)
curl -g -sS "http://api-atlas.localhost/meta_endpoints?tags=cs.{topic:income}" | jq 'length'  # > 0

# meta_dimensions has one row per (source × upstream-dimension); ~198 rows at 38 sources
curl -sS "http://api-atlas.localhost/meta_dimensions" | jq 'length'           # > 0
curl -sS "http://api-atlas.localhost/meta_dimensions?source_id=eq.ssb-08764" | jq 'length'  # → 3 (Region, ContentsCode, Tid)

# Every dimension row has cardinality + example_values populated (only meaningful once the deferred 3.4 enrichment ships)
curl -sS "http://api-atlas.localhost/meta_dimensions" | jq '[.[] | select(.cardinality == null or (.example_values | length) == 0)] | length'  # → 0

Done when

marts.meta_sources exists; each row has provider, topic, geo, cadence, eu_theme tags + last_ingested_at from raw.ingest_runs. Row count = _sources_manifest row count.
marts.meta_endpoints exists with one row per public endpoint (skipping marts._* private seeds + private_marts.*); each has a layer: tag plus inherited source tags via the union rule.
marts.meta_dimensions exists with one row per (source_id × upstream-dim); each has hand-authored meaning/value_format/notes joined with computed cardinality and example_values from raw.*.
api_v1.meta_sources, api_v1.meta_endpoints, and api_v1.meta_dimensions wrap them; all PLAN-004 validation gates pass.
PostgREST returns the same row counts under Prefer: count=exact for the three meta_* endpoints.

Phase 4: Customer frontend `/data` rewrite + per-source detail

Replace the existing flat catalogue with the tag-filter sidebar layout. Add a per-source detail page.

Tasks

4.1 Rewrite atlas-frontend/src/app/data/page.tsx:
- ✅ Fetches api_v1.meta_endpoints directly via server component (fetch() with next: { revalidate: 60 }).
- ✅ Reads searchParams.tag (string | string[]) for active filters; URL state ?tag=topic:income&tag=geo:kommune&q=oslo.
- ✅ Filtering happens in node, not via PostgREST ?tags=cs.{...} (the original plan said server-side via PostgREST). Pivot rationale: meta_endpoints is 119 rows; node-side filter trivially fast and supports the full faceted-search semantics (AND across namespaces, OR within) without composing complex PostgREST or=() clauses. Pure helpers extracted to src/lib/catalog-filter.ts for testability.
- ✅ Two-column layout: sidebar (18rem fixed) with 6 namespace-grouped checkboxes + faceted-search counts (counts re-compute as filters apply); endpoint cards on the right.
- ✅ Cards show layer-coloured badge (api_v1 emerald, marts sky, raw zinc), table_name in mono, layer-stripped tag pills (clickable to add), and right-aligned "View as table" + "View spec" links to /data/{schema}/{table} and /data/{schema}/{table}/spec.
- ✅ Pure URL-driven, no client JS, no React state.
Bundled scope (extension to original plan, atlas Phase 4.1 PR): the /data/[endpoint]/ route was restructured to /data/[schema]/[table]/ so the table + spec viewers know which schema to send Accept-Profile for. Without this, marts/raw cards on the new catalog would 404 on click. The route is hard-cut (no back-compat redirect from /data/[endpoint]); old URLs aren't externally linked yet. The viewers' fetchSpec / fetchRows / fetchCount calls all gain { acceptProfile: schema }; the lib drops the header for api_v1 so default-schema requests stay header-less (matches the talk40 gotcha note above).
4.2 Update npm run api:types so the new meta_sources and meta_endpoints endpoint types appear in api-types.ts. Closed as no-op — Phase 4.1's catalog rewrite reads meta_endpoints dynamically (the typed api-types.ts union is no longer load-bearing for catalog discovery), so the regen is a routine maintenance step the contributor runs whenever they want IDE autocompletion freshened. No explicit task.
4.3 Per-source detail page shipped at atlas-frontend/src/app/data/sources/[source_id]/page.tsx. Three parallel live PostgREST fetches (meta_sources filtered by source_id, meta_endpoints full list, marts.lineage filtered by source_id via Accept-Profile: marts); 404 when source_id missing. Renders: source-metadata card (provider / license / periodicity / EU theme / upstream id / attribution), freshness card (last_ingested_at / last_upstream_update_at / total_runs / latest_row_count), tags as click-throughs to /data?tag=..., upstream link, raw-ingest-table card, derived-endpoints list with View as table + View spec per row. PR #79's sources index now links source ids to this detail page; the prior interim direct-to-raw link is preserved in the action row as Raw data →.
4.4 Homepage copy updated in PR #85: primary button reads "Browse all endpoints →" (was "Browse the data"), and a sibling "Sources →" button + caption distinguishes the two surfaces.
4.5 atlas-frontend/README.md refreshed: lists every route (homepage / /data / /data/[schema]/[table] / /data/[schema]/[table]/spec / /data/sources / /data/sources/[source_id]); documents acceptProfile on the lib helpers; adds a "Tag URLs" section with five bookmarkable example query strings; cross-links to PLAN-005 (initial split) and PLAN-007 (this PLAN's open-by-default rewrite).
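The filter semantics from task 4.1 (AND across namespaces, OR within a namespace) can be sketched in Python. The shipped helpers are TypeScript in src/lib/catalog-filter.ts and are not reproduced here; the function names, the substring match for `q`, and the facet-count convention below are all assumptions.

```python
from collections import defaultdict
from urllib.parse import parse_qs, urlsplit


def parse_filters(url):
    """Group ?tag=ns:value params by namespace; return (filters, free-text query)."""
    params = parse_qs(urlsplit(url).query)
    filters = defaultdict(set)
    for tag in params.get("tag", []):
        namespace, _, value = tag.partition(":")
        if value:
            filters[namespace].add(tag)
    return dict(filters), (params.get("q") or [""])[0].lower()


def matches(endpoint, filters, query=""):
    """AND across namespaces, OR within a namespace, plus substring search."""
    tags = set(endpoint["tags"])
    if any(not (wanted & tags) for wanted in filters.values()):
        return False
    return query in endpoint["table"].lower()


def facet_counts(endpoints, filters, query=""):
    """Count, per tag, the endpoints that would match if that tag were selected.

    A namespace's own selections are ignored when counting inside it, a common
    faceted-search convention; the real sidebar may count differently.
    """
    counts = defaultdict(int)
    for endpoint in endpoints:
        for tag in endpoint["tags"]:
            namespace = tag.partition(":")[0]
            others = {ns: v for ns, v in filters.items() if ns != namespace}
            if matches(endpoint, others, query):
                counts[tag] += 1
    return dict(counts)
```

At roughly 119 rows, a linear scan per request is cheap, which is the rationale the task gives for filtering in node rather than composing PostgREST or=() clauses.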

Validation

cd atlas-frontend && npm run typecheck && npm run lint && npm run build       # all clean
npm run dev                                                                   # boots on :3001

# /data renders with the sidebar
curl -sS http://localhost:3001/data | grep -c "namespace-group\|filter-sidebar"  # > 0

# Tag filter URL works
curl -sS "http://localhost:3001/data?tag=provider:ssb" | grep -oE "ssb-[0-9]+" | sort -u  # 14 entries

# Per-source detail works
curl -sS -o /dev/null -w "%{http_code}\n" "http://localhost:3001/data/sources/ssb-08764"  # → 200
curl -sS -o /dev/null -w "%{http_code}\n" "http://localhost:3001/data/sources/notreal"    # → 404

Done when

/data renders the tag-filter sidebar + cards layout against the live API.
Every public endpoint visible (when no filter active); filtering by any tag combination works via URL params.
/data/sources/[source_id] renders for valid source IDs; 404 for invalid.
All PLAN-005 viewers (table viewer + spec viewer) carry forward at the restructured routes /data/[schema]/[table] and /data/[schema]/[table]/spec; the old /data/[endpoint] paths are hard-cut per task 4.1.

Phase 5: Docs

Tasks

5.1 setup.md: new "Per-source manifest.yml" subsection after "Set up the ingest layer", documenting the 11 required top-level fields (incl. eu_theme, attribution), the four tags: namespaces with allowed values, and the dimensions: block with example. Plus the "Set up the customer frontend" section refreshed to list the actual routes shipped (/data, /data/sources, /data/sources/[source_id], /data/[schema]/[table]) instead of the stale "lists every api_v1.* endpoint" copy.
5.2 ingest-modules.md: the manifest workflow at step 4 expanded into 4 explicit sub-steps (bootstrap → fill → review-with-heuristic-warnings → commit). Names the topic-regex first-match-wins behaviour and the geo priority (kommune > fylke > bydel). Documents MANUAL_OVERRIDES for sources without an ## Upstream README table. Closes with the "manifest is human-authored after commit; ingest runs don't modify it" rule.
5.3 developers/index.md: new "Open by default" section explaining the three-schema posture (api_v1 / marts / raw), how to reach non-default schemas via Accept-Profile, the catalog-as-queryable-endpoint pattern, and a tag-filter URL pattern table (7 examples) external developers can use to deep-link filtered views. The customer-app section's description updated to reflect the multi-schema dispatch + lib's acceptProfile option.
5.4 atlas-data/ingest/src/sources/README.md: the auto-generated table from build_sources_seed.py --readme was already in place (BEGIN/END markers since Phase 2.9). Added a "Programmatic access" callout above the table pointing at api_v1.meta_sources as the canonical live view, with a curl example, so external developers see the API path alongside the offline Markdown table.

Validation

User reads the updated docs and confirms a new contributor could (a) author a manifest.yml for a new source and (b) understand the tag-driven catalogue without consulting this PLAN.

Done when

All four doc files reflect the new shape; no stale references to "the 9 endpoints" remain in contributor or developer surfaces.

Acceptance criteria

Files to modify

New (atlas-data):

atlas-data/ingest/src/sources/<id>/manifest.yml — one per source, currently 38 (auto-bootstrapped + auto-filled + hand-authored dimensions: block)
atlas-data/ingest/scripts/bootstrap-manifest.ts — provider-aware bootstrap (SSB PxWebAPI extractor + FHI extractor + fallback template); npm alias sources:bootstrap-manifest
atlas-data/ingest/scripts/fill-manifest-todos.ts — README-parsing TODO-filler (description, upstream_id, upstream_title, license, tags) with topic/geo regex rules + MANUAL_OVERRIDES for redcross-branches/frr; npm alias sources:fill-manifest-todos
atlas-data/ingest/src/lib/ingest_run.ts — shared recordIngestRun(sourceId, work) wrapper that owns start/finish + sql lifecycle; replaces the original "one-place change" plan
atlas-data/migrations/028_raw_ingest_runs_upstream_updated.sql — adds upstream_updated_at column
atlas-data/dbt/scripts/build_sources_seed.py — YAML scanner → dbt seed CSV (validates required fields, refuses TODO placeholders)
atlas-data/dbt/scripts/extract_lineage.py — manifest.json → lineage seed CSV
atlas-data/dbt/seeds/sources/_sources_manifest.csv — generated, committed
atlas-data/dbt/seeds/sources/_sources_dimensions.csv — 90-row editorial dimension reference (one row per source × dimension)
atlas-data/dbt/seeds/sources/eu_data_theme.csv — 13-row EU Data Theme lookup
atlas-data/dbt/seeds/sources/schema.yml — column descriptions + tests for all three seeds (incl. eu_theme→eu_data_theme.code + dimensions.source_id→_sources_manifest.source_id relationships)
atlas-data/dbt/seeds/sources/lineage.csv — generated, committed
atlas-data/dbt/models/marts/api/mart_meta_sources.sql
atlas-data/dbt/models/marts/api/mart_meta_endpoints.sql
atlas-data/dbt/models/marts/api/mart_meta_dimensions.sql
atlas-data/dbt/models/marts/api/schema.yml — descriptions for all three new models
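
The topic and geo rules used by fill-manifest-todos.ts (first-match-wins for topics, kommune > fylke > bydel for geo) can be sketched in Python as below. The real script is TypeScript and its patterns are not reproduced here; the regexes and topic names in this example are hypothetical, and only the two ordering behaviours come from this PLAN.

```python
import re
from typing import Optional

# Hypothetical rules for illustration; the real patterns live in fill-manifest-todos.ts.
TOPIC_RULES = [
    (re.compile(r"fattigdom|inntekt|income", re.IGNORECASE), "income"),
    (re.compile(r"barn|child", re.IGNORECASE), "children"),
]

# Priority order stated in the plan: kommune > fylke > bydel.
GEO_PRIORITY = ["kommune", "fylke", "bydel"]


def topic_for(text: str) -> Optional[str]:
    """Return the topic of the first rule that matches; later rules are ignored."""
    for pattern, topic in TOPIC_RULES:
        if pattern.search(text):
            return topic
    return None


def geo_for(text: str) -> Optional[str]:
    """Return the highest-priority geo level mentioned in the text."""
    lowered = text.lower()
    for level in GEO_PRIORITY:
        if level in lowered:
            return level
    return None


# "Barnefattigdom" matches both rules; the first rule in the list wins.
topic = topic_for("Barnefattigdom per kommune og bydel")
geo = geo_for("Barnefattigdom per kommune og bydel")
```

Rule order is therefore part of the behaviour: reordering `TOPIC_RULES` changes which tag an ambiguous title receives, which is why the review step surfaces heuristic warnings.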

Updated (atlas-data):

atlas-data/dbt/dbt_project.yml — seed config for seeds/sources/
atlas-data/ingest/src/sources/README.md — auto-generated from YAMLs (or pointer)
atlas-data/ingest/src/sources/<id>/index.ts — SSB modules updated to capture upstream updated field and pass to run-record helper (one shared code path)
atlas-data/ingest/src/lib/ingest-runs.ts (or wherever the run-record write lives) — accepts upstream_updated_at from caller, writes to the new column
Generated: atlas-data/dbt/api_v1_generated.sql + api_v1_state.json (PLAN-004 generator output)

Updated (atlas-frontend):

atlas-frontend/src/app/data/page.tsx — rewritten as tag-filter sidebar
atlas-frontend/src/app/data/sources/[source_id]/page.tsx — new per-source detail route
atlas-frontend/src/app/page.tsx — minor copy update
atlas-frontend/README.md — mention the tag-driven catalogue
Regenerated: atlas-frontend/src/lib/api-types.ts (via npm run api:types)

Updated docs:

website/docs/contributors/setup.md
website/docs/contributors/ingest-modules.md
website/docs/developers/index.md

UIS-side (cross-repo):

urbalurba-infrastructure/provision-host/uis/lib/configure-postgrest.sh — extend PGRST_DB_SCHEMAS + grants
urbalurba-infrastructure/website/docs/services/integration/postgrest.md — document the new schema-set defaults

Out of scope

The auth story for private_marts.* — covered by INVESTIGATE-private-atlas-deployments.md.
Column-level descriptions on raw.* tables — they remain undocumented; external consumers see meta_sources.upstream_url for canonical docs.
Lineage visualisation (mermaid graphs) — meta_endpoints carries the data; rendering a graph is a v2 polish.
Tag governance / curation tooling — manual for v1 (a quarterly review by whoever's stewarding source ingests). Automate later if it gets messy.
Search-relevance scoring across sources — keep the existing free-text search; tags are the structured navigation, search is the unstructured complement.

Related work shipped during PLAN-007 execution

Captured here so the PLAN serves as project documentation, not just an aspirational checklist. Each entry is work that surfaced during the PLAN's execution but didn't fit a numbered task.

Catalogue growth + cloud-agent pipeline (parallel to Phase 2/3)

Catalogue grew 21 → 41 sources during the FHI / Bufdir / SSB-crime onboarding waves (2026-04-30 → 2026-05-06). 17 FHI sources from human-driven onboarding; 4 ssb-crime tables + bufdir-barnefattigdom + ssb-10826 from Cursor BG cloud-agent runs.
Cloud-agent runbook (AGENT-onboard-source.md + .cursor/rules/onboard-source.mdc) shipped via PR #36, refined via subsequent commits to support both queue-mode (issue-claim) and named-candidate-mode invocations.
npm run ingest:all catch-up script + raw.ingest_runs validation shipped via PR #80 — discovers every ingest:* script in package.json, runs sequentially, validates each via the recordIngestRun() lifecycle wrapper, prints per-source row count + duration. Closes the post-reset workflow's "ingest 36 sources by hand" gap.
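
The discovery step of the catch-up script can be sketched as follows. This is a Python illustration of the idea only: the package.json content and script commands are hypothetical, and excluding `ingest:all` itself and sorting the result are assumptions about how such a script would avoid recursion and keep runs deterministic.

```python
import json

# Illustrative package.json content; the script commands are hypothetical.
PACKAGE_JSON = """
{
  "scripts": {
    "build": "tsc",
    "ingest:all": "tsx scripts/ingest-all.ts",
    "ingest:ssb-10826": "tsx src/sources/ssb-10826/index.ts",
    "ingest:bufdir-barnefattigdom": "tsx src/sources/bufdir-barnefattigdom/index.ts"
  }
}
"""


def discover_ingest_scripts(package_json_text: str) -> list[str]:
    """Return every ingest:* script name except the catch-up script itself."""
    scripts = json.loads(package_json_text).get("scripts", {})
    return sorted(
        name
        for name in scripts
        if name.startswith("ingest:") and name != "ingest:all"
    )


to_run = discover_ingest_scripts(PACKAGE_JSON)
```

Discovering scripts by prefix means a newly onboarded source is picked up as soon as its `ingest:*` entry lands in package.json.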

Bufdir hardening track (PRs #67, #68, #69, #70, #71)

PR #67: split bufdir-barnefattigdom/index.ts into pure parse.ts + 29 golden-file tests; multi-tier ZIP-URL discovery (canonical → loose-date-format → loose-monitor → loose-bare with logged fallback tier).
PR #71: surrogate indicator_api_id migration — bf_zip_<24-hex> → bf_zip_ind_<N> (number-prefix); alias seed bufdir_indicator_alias.csv for renumbering events (Indikator 9 → 9a/9b split, Indikator 10 retired). Wraps via PLAN-004 generator as api_v1.bufdir_indicator_alias.
The lib/output.ts per-line streaming refactor (writeNdjson + new ndjsonStreamingWriter) shipped along the way to fix a V8 Invalid string length crash on bufdir's 395k-row output.

Cluster rebuild + setup workflow hardening (PRs #62, #65, #66)

Postgres + UIS cluster wiped + rebuilt 2026-05-05 (rancher-desktop reset). Surfaced gaps in the post-reset workflow:
- PR #62: setup.md gains the docker-psql fallback for hosts without libpq + an explicit "After a cluster reset / fresh start" recovery sequence.
- PR #66: Klass dim-spine ingests (ssb-klass-kommuner + ssb-klass-fylker) made mandatory in step 6 — without them, every relationships → dim_kommune test fails by definition (the dim builds but is empty).
- PR #65: dbt-osmosis canonicalisation fix for the YAML-style drift introduced by Cursor BG's bufdir descriptions.

Frontend scaffolding (PR #79; partial Phase 4 task 4.3)

/data/sources/page.tsx — sources index reading api_v1.meta_sources live. Same introspection-driven pattern as PLAN-005's /data catalog. Grouped by provider (pragmatic v1; full tag-filter sidebar is task 4.1).

Doc / process improvements (PRs #58, #59, #66, #76, #77)

PR #59: validated INVESTIGATE + PLAN, added mart_meta_dimensions to Phase 3 task list (it was missing despite the seed being built for it).
PR #76: setup.md --schemas api_v1,marts,raw flag added to ./uis configure postgrest line (paired with UIS PR #140's flag landing).
PR #77: moved extract_lineage.py's hardcoded multi-table override map into manifest.yml raw_tables: field. Closes a follow-up flagged in PR #73's outcome notes.
PR #78: INVESTIGATE for mart_meta_dimensions cardinality enrichment (deferred from Phase 3.4).

Open follow-ups (tracked outside PLAN-007)

mart_meta_dimensions cardinality + example_values + null_count — design in INVESTIGATE-mart-meta-dimensions-cardinality.md. Recommends Python extract script (analogous to extract_lineage.py) + optional column_name: field on each dim entry. Estimated half-day implementation once accepted.
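
For orientation, the three computed fields could be derived per column roughly as in the sketch below. This is not the design from the INVESTIGATE document; the function name, the `max_examples` parameter, and the sample values are hypothetical, and a real extract script would read values from a raw.* table rather than a Python list.

```python
from typing import Any, Sequence


def profile_column(values: Sequence[Any], max_examples: int = 3) -> dict:
    """Compute cardinality, example values, and null count for one column's values."""
    non_null = [v for v in values if v is not None]
    distinct = sorted(set(non_null))
    return {
        "cardinality": len(distinct),
        "example_values": distinct[:max_examples],
        "null_count": len(values) - len(non_null),
    }


# Illustrative values only.
profile = profile_column(["0301", "4601", None, "0301", "5001", None])
```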

Cross-references

INVESTIGATE-customer-frontend-data-display.md — the architectural commitments this PLAN executes.
PLAN-004-postgrest-api-v1-wrapper.md — the auto-generator wraps mart_meta_* into api_v1.meta_*. This PLAN reuses that pipeline unchanged.
PLAN-005-frontend-split-and-rebuild.md — built the introspection-driven /data catalogue this PLAN extends. The existing /data/[endpoint] table viewer + /data/[endpoint]/spec viewer are unaffected.
INVESTIGATE-frontend-data-access-architecture.md — established forkability + no-DB-role for the customer frontend; this PLAN preserves both.
atlas-data/ingest/src/sources/README.md — the legacy Markdown registry that becomes the structured manifest.yml set in Phase 2.
raw.ingest_runs — the run-history substrate meta_sources joins to.
talk.md — empty placeholder; Phase 1 opens a new round here.

Implementation notes

Phase 1 had cross-repo asynchrony — and the parallel sequencing worked as intended. Atlas Phase 2 + 3 + Phase 4 task 4.3 (sources index) all shipped against the existing single-schema PostgREST while UIS's PR was in flight. UIS Message 1 pushed back on the original "global default" framing in favour of an explicit per-app --schemas flag; atlas accepted in Message 3; UIS's PR #140 merged 2026-05-07 (single-day round-trip from atlas Message 4 validation to UIS Message 3 close-out). Total elapsed: 8 days from atlas Message 1 to Phase 1 close. Lesson for future cross-repo asks: validate against the contributor's local-image deployment before they push the PR — saved a CI round-trip here.
Tag inheritance — union, not intersection. Recorded inline at Phase 3.2. A mart_* derived from many sources picks up the union of source tags so filters like topic:income surface every mart that involves income data, not just marts where every source happens to be income-shaped. Don't re-litigate.
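
A minimal Python sketch of the difference, assuming hypothetical tags for two sources (the tag values below are invented for illustration and are not taken from the manifests):

```python
def inherited_tags(source_ids: list[str], tags_by_source: dict[str, set[str]]) -> set[str]:
    """Union the tags of every source a mart derives from."""
    result: set[str] = set()
    for source_id in source_ids:
        result |= tags_by_source.get(source_id, set())
    return result


# Hypothetical source tags for illustration.
TAGS_BY_SOURCE = {
    "ssb-10826": {"topic:income", "geo:kommune"},
    "bufdir-barnefattigdom": {"topic:children", "geo:kommune"},
}

mart_tags = inherited_tags(["ssb-10826", "bufdir-barnefattigdom"], TAGS_BY_SOURCE)
# An intersection would keep only the tags shared by every source.
shared_only = set.intersection(*TAGS_BY_SOURCE.values())
```

With the union, a `topic:income` filter finds this mart; with the intersection it would not, because only one of the two sources carries that tag.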
marts._* private seeds stay out of mart_meta_endpoints. _sources_manifest, _sources_dimensions, eu_data_theme, and the future lineage seed are dbt internals — they live in marts (so models can ref() them) but the underscore prefix marks them not-for-API. The auto-generator at regenerate-api-v1.sh already skips them by convention. mart_meta_endpoints's information_schema.tables query needs an explicit WHERE table_name NOT LIKE '\_%' filter to match.
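
The escape in that filter matters because `_` is a single-character wildcard in SQL `LIKE`. The sketch below demonstrates this with Python's built-in sqlite3 module as a stand-in; it is an illustration, not the mart's query. SQLite has no default escape character, so the example adds an explicit `ESCAPE` clause, whereas PostgreSQL treats backslash as the default `LIKE` escape.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tables (table_name TEXT)")
conn.executemany(
    "INSERT INTO tables VALUES (?)",
    [
        ("_sources_manifest",),
        ("_sources_dimensions",),
        ("fact_kommune_indicators",),
        ("dim_kommune",),
    ],
)

# Unescaped: "_" matches any single character, so every name is excluded.
unescaped = [
    row[0]
    for row in conn.execute(
        "SELECT table_name FROM tables "
        "WHERE table_name NOT LIKE '_%' ORDER BY table_name"
    )
]

# Escaped: only names that start with a literal underscore are excluded.
escaped = [
    row[0]
    for row in conn.execute(
        r"SELECT table_name FROM tables "
        r"WHERE table_name NOT LIKE '\_%' ESCAPE '\' ORDER BY table_name"
    )
]
```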
Editorial vs computed in mart_meta_dimensions. The _sources_dimensions seed carries hand-authored editorial content (meaning, value_format, notes — what the dimension is). The mart joins it with introspection of raw.* (cardinality, example_values — what the dimension actually contains). Both are valuable; one without the other gives only half the picture. The seed is deliberately the only source of editorial truth — don't add computed fields to the seed itself, and don't add hand-authored fields to the introspection layer.
Don't over-engineer the lineage extraction. A flat (model_name, source_id) seed is enough — recursive walks of the dbt graph happen at extract time, not at query time. PostgREST consumers see meta_endpoints.tags as an already-flattened array.
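
The extract-time walk can be sketched as below. This is not the contents of extract_lineage.py; it is a simplified Python illustration that assumes a parent map shaped like the `parent_map` in dbt's manifest.json, with node ids shortened to `model.<name>` and `source.<id>` for readability.

```python
def flatten_lineage(parent_map: dict[str, list[str]]) -> list[tuple[str, str]]:
    """Walk each model's ancestors and emit flat (model_name, source_id) rows."""

    def sources_of(node: str, seen: set[str]) -> set[str]:
        found: set[str] = set()
        for parent in parent_map.get(node, []):
            if parent in seen:
                continue  # already visited via another path
            seen.add(parent)
            if parent.startswith("source."):
                found.add(parent.split(".", 1)[1])
            else:
                found |= sources_of(parent, seen)
        return found

    rows = []
    for node in parent_map:
        if node.startswith("model."):
            model_name = node.split(".", 1)[1]
            rows.extend((model_name, src) for src in sorted(sources_of(node, set())))
    return sorted(rows)


# Simplified, hypothetical graph: node ids are shortened for readability.
PARENT_MAP = {
    "model.stg_a": ["source.src_a"],
    "model.stg_b": ["source.src_b"],
    "model.mart_x": ["model.stg_a", "model.stg_b"],
}

edges = flatten_lineage(PARENT_MAP)
```

The recursion runs once when the seed is generated; the resulting rows need no graph traversal to query.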
Catalogue grows continuously. Every Cursor BG run lands a new source. The PLAN's validation gates are expressed as live count(*) queries against _sources_manifest rather than as fixed numbers, which keeps the doc maintainable as the catalogue moves from 38 → 50 → 100+.
