The data journey — SSB 08764 end-to-end

If you've never seen Atlas's data pipeline before, this is the page to read first. It traces one source — SSB 08764 (children in low-income households) — from upstream to a user's browser, so the pieces have names and shapes when you read adding-a-source.md or ingest-modules.md.

SSB 08764 is the canonical example: a clean annual measurement source, ~10⁴ rows, no scraping, the same shape as ~16 other implemented sources. Where the journey wouldn't generalise (HTML scrapes, NGO supply data), there's a note at the end.

For the original design investigation that ratified this pattern (and the alternatives we considered), see INVESTIGATE-data-journey-pattern.md.


The journey at a glance

[SSB PxWebAPI — table 08764]                   upstream, not ours
  │ annual release (~March)
[TypeScript: src/sources/ssb-08764/index.ts]
  │ fetch + parse JSON-stat2 + upsert
[Postgres raw.ssb_08764]                       ← raw landing (verbatim)
  │
[dbt model: indicators__ssb_08764]             ← per-source passthrough
  │ one of ~16 sibling indicator models
[dbt fact: fact_kommune_indicators]            ← unioned cross-source mart
  │ joined to dim_kommune, dim_fylke
[Postgres marts.fact_kommune_indicators]       ← serving layer (table)
  │ + mart_*_summary, mart_coverage_gap_* views
[PostgREST api.atlas.helpers.no]               ← public API (forthcoming)
  │ + Atlas's Next.js dogfood (today: direct SQL)
[Next.js page: /coverage-gap/barnefattigdom]
  │
[User's browser]                               choropleth + chapters layer

Stage 1 — The source (upstream, not ours)

SSB publishes table 08764 annually. Typical release window: late February / early March, covering the prior calendar year.

  • Endpoint: https://data.ssb.no/api/pxwebapi/v2/tables/08764
  • Format: JSON-stat2 (dimension-oriented; standard across SSB tables)
  • Auth: none
  • Rate limits: 30 requests/minute/IP, 800 000 cells/request — plenty for one full table pull
  • Release calendar: SSB has a per-table metadata endpoint announcing the next update; we can poll it (sketched below)

Nothing we own; nothing we control. Everything downstream has to be resilient to the source being slow, stale, or temporarily broken.
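
That last bullet — polling the release calendar — could look roughly like this. This is a sketch under assumptions: the /metadata path and the nextUpdate field name are guesses at the PxWebAPI payload, not verified fields; check a real response before relying on either.

// poll-release.ts (hypothetical) — the metadata path and `nextUpdate`
// field are assumptions about the PxWebAPI payload, not verified fields.
const TABLE_ID = "08764";

export async function nextReleaseDate(): Promise<Date | null> {
  const res = await fetch(
    `https://data.ssb.no/api/pxwebapi/v2/tables/${TABLE_ID}/metadata`,
  );
  if (!res.ok) throw new Error(`metadata fetch failed: ${res.status}`);
  const meta = await res.json();
  // Assumed field name; verify against the real payload shape.
  return meta.nextUpdate ? new Date(meta.nextUpdate) : null;
}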

Stage 2 — TypeScript ingestion

The ingest module lives at atlas-data/ingest/src/sources/ssb-08764/index.ts. The full template is in ingest-modules.md; the shape is:

// src/sources/ssb-08764/index.ts (sketch)
import { fetchPxWebTable, parseJsonStat2 } from "../../lib/pxweb";
import { writeRawRows } from "../../lib/postgres";

export const SOURCE_ID = "ssb-08764";

export async function run() {
  const data = await fetchPxWebTable("08764", {
    Region: "*", // all kommuner, fylker, nasjon
    Tid: "*",    // all years
  });

  const rows = parseJsonStat2(data).map(row => ({
    region_code: row.dimensions.Region,
    year: Number(row.dimensions.Tid),
    contents_code: row.dimensions.ContentsCode,
    contents_label: row.dimensions.ContentsCodeLabel,
    value: row.value,
    status: row.status ?? null,
    loaded_at: new Date(),
  }));

  await writeRawRows("raw.ssb_08764", rows, {
    conflictKey: ["region_code", "year", "contents_code"],
  });

  return { rowCount: rows.length, latestYear: Math.max(...rows.map(r => r.year)) };
}

Key points:

  • Shared utilities (pxweb, postgres, etc.) live in src/lib/. Source modules should never reimplement them.
  • No reshaping — column names match upstream. Renaming happens in dbt, not here.
  • Idempotent — re-running against an already-loaded year is safe (upsert on the natural key; sketched below).
  • Duration: typically 3–6 seconds for a full pull (network-dominated).
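
On the idempotency point: the upsert behind writeRawRows amounts to an insert ... on conflict do update on the natural key. A sketch of the semantics only — the real helper lives in src/lib/postgres, and the sql export here is hypothetical:

// Semantics sketch, not the real writeRawRows implementation.
import { sql } from "../../lib/postgres"; // hypothetical export

type RawRow = {
  region_code: string;
  year: number;
  contents_code: string;
  contents_label: string | null;
  value: number | null;
  status: string | null;
};

async function upsertOne(r: RawRow) {
  await sql`
    insert into raw.ssb_08764
      (region_code, year, contents_code, contents_label, value, status)
    values (${r.region_code}, ${r.year}, ${r.contents_code},
            ${r.contents_label}, ${r.value}, ${r.status})
    on conflict (region_code, year, contents_code) do update set
      contents_label = excluded.contents_label,
      value          = excluded.value,
      status         = excluded.status,
      loaded_at      = now()
  `;
}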

Run locally:

cd atlas-data/ingest
npm run ingest:ssb-08764

Stage 3 — Raw landing in Postgres

Schema and table created once via a migration (atlas-data/migrations/):

create schema if not exists raw;

create table if not exists raw.ssb_08764 (
  region_code    text not null,
  year           int not null,
  contents_code  text not null,
  contents_label text,
  value          numeric,
  status         text,
  loaded_at      timestamptz not null default now(),
  primary key (region_code, year, contents_code)
);

comment on table raw.ssb_08764 is
  'SSB 08764 — Persons under 18 in low-income households (EU/OECD scale), per region, annual.';

The raw.* schema is the landing layer — verbatim upstream shape. Atlas does not expose raw.* to any consumer; reshaping happens in dbt.
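
A quick post-ingest sanity check against the landing table (illustrative — this borrows the tagged-template style the contributor frontend uses in Stage 8; any Postgres client works):

// The latest annual release should add exactly one new year;
// earlier years' row counts should stay stable run-to-run.
const counts = await sql`
  select year, count(*) as n
  from raw.ssb_08764
  group by year
  order by year desc
  limit 3
`;
console.table(counts);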

Stage 4 — dbt transformation

dbt models live under atlas-data/dbt/models/. Three transformation steps:

(a) Per-source model

-- dbt/models/indicators/indicators__ssb_08764.sql
{{ config(materialized='table', schema='marts') }}

select
    'ssb-08764' as source_id,  -- hard-coded literal
    region_code,
    case when length(region_code) = 4 then region_code end as kommune_nr,
    year,
    contents_code,
    contents_label,
    value,
    status,
    loaded_at as updated_at
from {{ source('raw', 'ssb_08764') }}

Every per-source model follows the same shape — source_id literal first, canonical column names per naming-conventions.md, loaded_at renamed to updated_at. ~20 lines per source.

(b) Cross-source fact

models/marts/fact_kommune_indicators.sql unions every indicators__* model and joins to the conformed dimensions:

{{ config(materialized='table', schema='marts') }}

with all_indicators as (

    -- one ref() per implemented source
    {{ dbt_utils.union_relations(relations=[
        ref('indicators__ssb_08764'),
        ref('indicators__ssb_06913'),
        ref('indicators__fhi_bor_alene')
    ]) }}

)
select
    i.source_id, i.kommune_nr, k.kommune_name, k.fylke_nr, f.fylke_name,
    i.year, i.contents_code, i.contents_label,
    i.value, i.status,
    k.is_active as kommune_is_active,
    i.updated_at
from all_indicators i
join {{ ref('dim_kommune') }} k using (kommune_nr)
left join {{ ref('dim_fylke') }} f using (fylke_nr)
where i.kommune_nr is not null

(c) API-shaped views

Frontend queries against fact_kommune_indicators would be too low-level for a public API. So Atlas adds purpose-built views under models/marts/api/ — one per public endpoint:

  • mart_indicator_summary — backs /data (per-indicator latest-year summary)
  • mart_coverage_gap_barnefattigdom — backs /coverage-gap/barnefattigdom (this map)
  • mart_kommune_local_chapters — backs /kommuner/[kommune_nr]
  • ...

These are the API contract. Each has full descriptions on every column (enforced by check-osmosis) so PostgREST projects them as documented OpenAPI endpoints.

For the full mart-views story, see PLAN-001.

Stage 5 — Tests catch anomalies

dbt tests run on every change:

  • not_null on key columns (kommune_nr, year, contents_code)
  • accepted_range on years (e.g. 2000 ≤ year ≤ 2040)
  • accepted_values on bounded enums (contents_code in (...))
  • unique_combination_of_columns on the natural key
  • relationships from kommune_nr → dim_kommune (FK enforcement)

Run them:

cd atlas-data/dbt
uv run --env-file ../ingest/.env dbt test --select mart_coverage_gap_barnefattigdom+

Test failures block merge. See testing.md for the full local-testing workflow.

Stage 6 — Serving layer in Postgres

After dbt, marts.mart_coverage_gap_barnefattigdom looks like:

 kommune_nr | kommune_name | fylke_name | year | value_pct | personer
------------+--------------+------------+------+-----------+----------
 5601       | Alta         | Finnmark   | 2023 |      11.2 |      487
 0301       | Oslo         | Oslo       | 2023 |      12.8 |    28402
 ...

Indexed on kommune_nr. Queries against it are millisecond-scale.

This is the schema the Next.js frontend reads from today. marts.* is the internal data layer — the stable contract internal Atlas code (frontend, ad-hoc analysts) builds on. The frontend has no knowledge of raw.*, no knowledge of PxWebAPI, no knowledge of dbt.
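
Read as TypeScript, a row of this mart matches the KommuneRow type the contributor frontend uses in Stage 8 — sketched here from the sample above; the authoritative contract is the column list + descriptions in schema.yml:

// Illustrative row type for marts.mart_coverage_gap_barnefattigdom,
// inferred from the sample above.
type KommuneRow = {
  kommune_nr: string;   // four-digit code, zero-padded ("0301" = Oslo)
  kommune_name: string;
  fylke_name: string;
  year: number;
  value_pct: number;    // share of children in low-income households
  personer: number;     // person count behind the percentage
};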

Stage 7 — PostgREST projects the public API via api_v1.*

PostgREST (owned + deployed by UIS) sits in front of an auto-generated api_v1.* schema — wrapper views over marts.mart_* (one per models/marts/api/ model). The wrapper layer is the external public contract; marts.* stays internal. PostgREST projects:

  • One GET endpoint per api_v1.* view
  • Filtering via ?col=eq.X query params
  • An auto-generated Swagger 2.0 spec at api.atlas.helpers.no/docs, with column descriptions sourced verbatim from schema.yml (propagated through dbt → COMMENT ON COLUMN on marts.* → copied onto api_v1.* by the generator)

So the descriptions you write in schema.yml are the docs an external developer reads. See api-v1.md for the wrapper layer + generator, and dbt-osmosis.md § why Atlas relies on it for the description propagation.
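
Concretely, an external developer's first call might look like this — illustrative, with the endpoint name taken from the Stage 8 view name and PostgREST's standard filter syntax:

// One GET endpoint per api_v1.* view; filters are ?col=op.value params.
const res = await fetch(
  "https://api.atlas.helpers.no/coverage_gap_barnefattigdom" +
    "?fylke_name=eq.Finnmark&order=value_pct.desc",
);
const rows = await res.json(); // one object per kommune-year row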

Stage 8 — Two frontends, two access patterns

Atlas has two Next.js apps that each take a different path through the data layer:

Contributor frontend (atlas-contributor-frontend/) — direct SQL against marts.* for ingestion verification, FK integrity checks, raw counts. Dev/staging only. Example query:

// atlas-contributor-frontend/app/coverage-gap/barnefattigdom/page.tsx
const rows = await sql<KommuneRow[]>`
  select kommune_nr, kommune_name, fylke_name, year, value_pct, personer
  from marts.mart_coverage_gap_barnefattigdom
  order by kommune_nr
`;

Customer frontend (atlas-frontend/) — fetches from api.atlas.helpers.no (PostgREST against api_v1.*). No DB role; consumes the same public API external developers use. This is the dogfood pattern. Example query:

// atlas-frontend/app/coverage-gap/barnefattigdom/page.tsx (Phase 5 of PLAN-005)
import { fetchRows } from "@/lib/api";
const rows = await fetchRows(
  "coverage_gap_barnefattigdom",
  "?order=value_pct.desc&limit=10",
);

Both apps render server components against the same underlying mart data; the contributor app sees it as SQL rows and the customer app sees it as typed API responses. The customer app is structured to be forkable — external developers clone it, change NEXT_PUBLIC_API_URL, and have a working starting point for their own UI on Atlas's data.
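
The fetchRows helper itself stays small. A hypothetical sketch — the real @/lib/api may differ, and the cache window here is an assumption:

// atlas-frontend/lib/api.ts (hypothetical sketch of the helper used above)
export async function fetchRows<T = unknown>(
  view: string,
  query = "",
): Promise<T[]> {
  const base = process.env.NEXT_PUBLIC_API_URL; // swap this to fork the app
  const res = await fetch(`${base}/${view}${query}`, {
    next: { revalidate: 3600 }, // Next.js fetch cache; the window is a guess
  });
  if (!res.ok) throw new Error(`API ${view} failed: ${res.status}`);
  return res.json();
}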

Stage 9 — Render in the browser

/coverage-gap/barnefattigdom is a React Server Component. It renders:

  • Choropleth base layer — 356 kommune polygons from a static Kartverket GeoJSON, fill colour mapped from each polygon's value_pct (sketched below)
  • Tooltip — kommune name + value + count
  • Chapter overlay (forthcoming) — Red Cross / Folkehjelp / etc. local chapters that operate in each kommune
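
The fill mapping from the first bullet reduces to a value_pct → colour function. A minimal sketch — the breakpoints and palette here are invented, not the page's actual scale:

// Illustrative only: the real breakpoints/palette are a design decision
// not documented on this page.
function fillFor(valuePct: number): string {
  if (valuePct < 5) return "#fee5d9";  // lightest = lowest share
  if (valuePct < 10) return "#fcae91";
  if (valuePct < 15) return "#fb6a4a";
  return "#a50f15";                    // darkest = highest share
}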

End of journey. The user sees a map.


What this generalises to

The same shape applies to every API-based measurement source:

Stage                   Same                              Differs
-----                   ----                              -------
1 — upstream            API-based                         endpoint, auth, format
2 — ingest              TypeScript module + shared libs   dimension parsing
3 — raw landing         raw.<source> table                column set
4a — per-source dbt     indicators__<source>.sql          column renaming, kommune_nr derivation
4b — cross-source fact  fact_kommune_indicators           unioned in
4c — api view           mart_* view                       per-feature shape
5 — tests               same battery                      per-source key columns
6–9 — downstream        identical                         n/a

For most new sources, the workflow in adding-a-source.md walks you through the per-source pieces in 11 steps.

Where this doesn't generalise

  • HTML scraping sources (Folkehjelp and other NGOs without an API) — Stage 2 has an extended folder layout (discover.ts, parse.ts, __tests__/fixtures/), Stage 3 raw tables have additional mandatory columns (url, record_hash, is_active). See ingest-modules.md § scraping convention.
  • NGO supply data (chapters, activities, coverage) — different fact / dim shape (fact_chapter_activities, dim_chapter, dim_activity); different per-NGO supply staging models (supply__redcross_*, supply__frr_*). The journey shape is the same; just different facts/dims.
  • Brreg legal-entity metadata — handled by the generic refresh:brreg-enheter flow; you don't add a new ingest source per NGO. See adding-a-source.md § prerequisites.

Cross-references