Plan 002: Apply hybrid decoding across indicator models
IMPLEMENTATION RULES: Before implementing this plan, read and follow:
- WORKFLOW.md - The implementation process
- PLANS.md - Plan structure and best practices
Status: Completed
Goal: Wire the five marts.ref_* seeds from PLAN-001 into the indicator models, parse the structured codes (age, age_group, period) into integer min/max columns, and unify the FHI cryptic column names (kjonn_code, sex_code) onto the canonical sex vocabulary at the marts boundary.
Last Updated: 2026-04-22 Completed: 2026-04-22
Investigation: INVESTIGATE-code-label-mapping.md
Prerequisites: PLAN-001 (seeds must exist as marts.ref_* tables) — completed 2026-04-22.
Blocks: PLAN-003 (tests + naming-conventions update) cannot start until this is done.
Priority: Medium
Overview
PLAN-001 delivered the seeds. This plan applies the hybrid decoding strategy from INVESTIGATE-code-label-mapping.md — CASE for tiny universal enums, left join ref_* for domain enums, and integer parsing for structured codes — across the 9 indicator models that surface coded fields. The other 8 indicator models (region+year+contents only) need no changes.
After this plan, every indicator row in marts.indicators__* carries human-readable labels alongside the raw codes. The Next.js /data explorer and downstream consumers see family_type_label_no = "Par uten barn" instead of family_type = "006".
Scope by model (probed from models/indicators/*.sql on 2026-04-22)
| Model | Current coded columns | Treatment |
|---|---|---|
indicators__ssb_06083 | family_type text | left join ref_ssb_family_type → add _label_no/_en |
indicators__ssb_06944 | household_type text | left join ref_ssb_household_type → add _label_no/_en |
indicators__ssb_07459 | sex (CASE done), age text | add age_int (null for 105+), age_min int |
indicators__ssb_09429 | sex (CASE done), education_level text | left join ref_ssb_nivaa → add _label_no/_en |
indicators__ssb_12944 | age_group (00-17 form), period (2022-2024 form) | add age_group_min/_max, period_start_year/_end_year ints |
indicators__fhi_bor_alene | age_group (16_120 form), period | add age_group_min/_max, period_*_year ints |
indicators__fhi_mobbing | kjonn_code, grade (readable), period | rename kjonn_code → sex + CASE; add period_*_year ints |
indicators__fhi_trangbodd | age_group, education_level (FHI scheme), housing_status (readable), period | left join ref_fhi_utdann; add age_group_* + period_*_year ints |
indicators__fhi_vgs_gjennomforing | kjonn_code → sex_code, parents_education (FHI), immigration_category (FHI), period | rename sex_code → sex + CASE; left joins for ref_fhi_utdann and ref_fhi_innvkat; period_*_year ints |
The other 8 indicator models (indicators__ssb_06913, _06947, _08764, _12063, _12131, _12132, _12292, _13995) carry no coded dims that need decoding — leave them alone.
Conventions adopted in this plan
- Label columns:
<field>_label_noand<field>_label_en(separate columns per language). Cheap; matches the seed schema; downstream picks one. - Parsed integer columns:
<field>_min,<field>_maxfor ranges (age_group, SSB single-yearage);period_start_year,period_end_yearfor periods. Allint, nullable. - FHI column renames at marts boundary:
kjonn_codeandsex_codebecomesex(text,male/female/all) — matching SSB indicator models that already CASE this. fact_kommune_indicatorsis not changed by this plan beyond what the synthetic-contents patterns need (see Phase 2). The fact stays narrow (region × year × contents). Per-source labels live onindicators__*.- Macros for repeated logic. Period parsing appears in 5+ models, age-range parsing in 4+. One small macro file keeps the SQL DRY without proliferating helpers.
Phase 1: Macros for repeated logic — DONE
Tasks
- 1.1 Created
atlas-data/dbt/macros/parse_codes.sqlwithdecode_sex,period_start_year,period_end_year,age_range_min(col, sep),age_range_max(col, sep). Period macros normalise both_and-separators in one path. Age macros also handle the067+open-ended form (min=67, max=null). ✓ - 1.2
dbt parseruns clean. ✓
Validation
cd atlas-data/dbt
uv run --env-file ../ingest/.env dbt parse
User confirms parse runs clean (no Jinja errors).
Phase 2: Apply to the 9 indicator models — DONE
Tasks
- 2.1
indicators__ssb_06083— left joinref_ssb_family_type;family_type_label_no/_enadded. ✓ - 2.2
indicators__ssb_06944— left joinref_ssb_household_type;household_type_label_no/_enadded. ✓ - 2.3
indicators__ssb_07459—decode_sexmacro applied;age_int(null for105+) andage_min(105 for105+) added. ✓ - 2.4
indicators__ssb_09429—decode_sexmacro applied; left joinref_ssb_nivaa;education_level_label_no/_enadded. ✓ - 2.5
indicators__ssb_12944—age_group_min/_max(separator-) andperiod_start_year/_end_yearadded.999A→ null/null verified;00-17→ 0/17;067+→ 67/null. ✓ - 2.6
indicators__fhi_bor_alene—age_group_min/_max(separator_) andperiod_start_year/_end_yearadded. ✓ - 2.7
indicators__fhi_mobbing—kjonn_codedropped;sex(viadecode_sex) replaces it;period_start_year/_end_yearadded. ✓ - 2.8
indicators__fhi_trangbodd—utdann_codealiased toparents_education, joined toref_fhi_utdann(Norwegian only);age_group_min/_max,period_*_yearadded. ✓ - 2.9
indicators__fhi_vgs_gjennomforing—kjonn_code→sexviadecode_sex(verifiedmale/female/all8180 rows each);parents_education+immigration_categoryjoined to their seeds;period_*_yearadded. ✓
dbt run --select indicators rebuilt all 17 models clean (4.7s). 9 with new columns, 8 untouched. Spot checks all pass.
Validation
cd atlas-data/dbt
uv run --env-file ../ingest/.env dbt run --select indicators
uv run --env-file ../ingest/.env dbt show --inline "select family_type, family_type_label_no, family_type_label_en from marts.indicators__ssb_06083 where family_type = '006' limit 1"
uv run --env-file ../ingest/.env dbt show --inline "select age_group, age_group_min, age_group_max from marts.indicators__fhi_bor_alene where age_group in ('0_120', '16_120') limit 4"
uv run --env-file ../ingest/.env dbt show --inline "select sex, count(*) from marts.indicators__fhi_vgs_gjennomforing group by sex"
User confirms: family-type label resolves to "Par uten barn"; age_group_min/max are 0/120 and 16/120; FHI vgs sex column is now male/female/all (no 0/1/2).
Phase 3: Update fact_kommune_indicators synthetic-contents patterns — DONE
Tasks
- 3.1 ssb_06083 CTE: synthetic contents_label now uses
family_type_label_no. Verified viadbt show— labels read "Familier (Enpersonfamilie)" etc. ✓ - 3.2 ssb_09429 CTE: synthetic contents_label uses
education_level_label_no. ✓ - 3.3 fhi_mobbing —
gradealready readable, no change. ✓ - 3.4 Discovered + fixed downstream stale references: the fhi_vgs CTE filtered on
sex_code = '0'(nowsex = 'all'); the fhi_trangbodd CTE filtered oneducation_level = '0'(nowparents_education = '0'per the column rename in 2.8). Fact rebuilt clean with 135 698 rows. ✓
Validation
cd atlas-data/dbt
uv run --env-file ../ingest/.env dbt run --select fact_kommune_indicators
uv run --env-file ../ingest/.env dbt show --inline "select source_id, contents_code, contents_label from marts.fact_kommune_indicators where source_id in ('ssb-06083','ssb-09429') order by 1, 2 limit 10"
User confirms contents_label now contains the Norwegian label, not the code.
Phase 4: Full rebuild + smoke test — DONE
Tasks
- 4.1
dbt run— 20/20 models built clean in 3.7s. ✓ - 4.2
dbt test— 290 PASS, 0 ERROR, 15 WARN (same warn count as onmain; the warns are pre-existing relationship warns for historical kommune codes inssb_06913/12131/12292/13995). Three stale schema.yml references caught and fixed:fhi_mobbingandfhi_vgs_gjennomforingunique-key combinations referred tosex_code(nowsex);fhi_trangboddnot_null + unique-key referred toeducation_level(nowparents_education). ✓ - 4.3
/dataexplorer (already-running dev server on:4000) returns HTTP 200 and the HTML carries the human labels: all 9 family types render asFamilier (Enpersonfamilie),Familier (Par uten barn),Familier (Andre familier), …; all 7 Nivaa levels render asPersoner 16 år og over (Utdanningsnivå i alt),… (Universitets- og høgskolenivå, kort), etc. ✓
Cosmetic note (out of scope)
The fact's synthetic-contents pattern wraps labels in ' (' || label || ')'. When labels themselves contain parens (e.g. "Par med små barn (yngste barn 0-5 år)"), this produces unbalanced parens in the rendered string: Familier (Par med små barn (yngste barn 0-5 år). Data is correct; this is a fact-layer formatting choice that worked fine for opaque codes and only became visible once labels got real text. Not in PLAN-002 scope; can be a one-liner cleanup later (e.g. drop the outer parens; or use a separator like ' — ').
Validation
cd atlas-data/dbt
uv run --env-file ../ingest/.env dbt run
uv run --env-file ../ingest/.env dbt test
cd ../.. # repo root
npm run dev # http://localhost:4000
# Open /data, click into ssb-06083 or ssb-09429 — confirm labels render.
User confirms dbt test passes (zero errors, warns excluded as today) AND the labels appear in the browser.
Acceptance Criteria
-
dbt runbuilds all 17 indicator models +fact_kommune_indicatorsclean. -
dbt testpasses (no new errors vs. main). - The 5 indicator models with seed joins (
ssb_06083,ssb_06944,ssb_09429,fhi_trangbodd,fhi_vgs_gjennomforing) expose<field>_label_nocolumns; SSB-sourced ones also expose_label_en. NULL labels only appear for codes that aren't in the corresponding seed. - The 5 indicator models with parsed-range columns expose integer
_min/_max(or_start_year/_end_year) alongside the original text code. - FHI
kjonn_code/sex_codecolumns are gone —sex(male/female/all) replaces them. -
fact_kommune_indicators.contents_labelfor ssb-06083 and ssb-09429 contains Norwegian labels, not codes. -
/dataexplorer in Next.js shows the human labels in the contents_label column for both sources.
Implementation Notes
- Why parse in dbt, not at ingest. Adding columns to
raw.*would mean migrations + re-ingest. Parsing in indicator models is reversible, requires no schema change inraw, and keepsrawfaithful to upstream. - Why separate
_label_no/_label_encolumns. Doubles the column count for label-bearing models — cheap. Lets a future English UI pick_label_enwithout re-running dbt. Matches the seed schema. - Why
decode_sexmacro instead of inline CASE. Two SSB models already inline this; FHI variants will newly do so. A macro avoids drift if SSB ever introduces a fourth code (e.g.9for unknown, hypothetical). - FHI UTDANN aliasing as
parents_education. Infhi_trangboddandfhi_vgs_gjennomforing, UTDANN is the parent's education level (children's overcrowding / VGS completion stratified by parental education). Calling iteducation_levelwould suggest the subject's own — misleading. The seed file is stillref_fhi_utdann.csv(matches the upstream dimension code); the column in indicators isparents_education. - What this plan does not do. No new tests (PLAN-003), no
naming-conventions.mdupdates (PLAN-003), no changes to non-dim-bearing indicator models, nofact_kommune_indicatorsshape change beyond label substitution in synthetic patterns.
Files to Modify
New:
atlas-data/dbt/macros/parse_codes.sql
Edit (9 indicator models):
atlas-data/dbt/models/indicators/indicators__ssb_06083.sqlatlas-data/dbt/models/indicators/indicators__ssb_06944.sqlatlas-data/dbt/models/indicators/indicators__ssb_07459.sqlatlas-data/dbt/models/indicators/indicators__ssb_09429.sqlatlas-data/dbt/models/indicators/indicators__ssb_12944.sqlatlas-data/dbt/models/indicators/indicators__fhi_bor_alene.sqlatlas-data/dbt/models/indicators/indicators__fhi_mobbing.sqlatlas-data/dbt/models/indicators/indicators__fhi_trangbodd.sqlatlas-data/dbt/models/indicators/indicators__fhi_vgs_gjennomforing.sql
Edit (1 mart):
atlas-data/dbt/models/marts/fact_kommune_indicators.sql