Investigate: Multi-NGO supply data-model extensions
IMPLEMENTATION RULES: Before implementing any plan from this investigation, read and follow:
- WORKFLOW.md — The implementation process
- PLANS.md — Plan structure and best practices
Status: Backlog
Goal: Capture three data-model extensions that surfaced during the Folkehjelp investigation but are multi-NGO concerns — dim_chapter.source_url, dim_chapter.chapter_subtype, and the chapter_kommune_coverage link table that lets chapter_level='regional' rows declare which kommuner they serve. All three apply to every NGO Atlas ingests, not only Folkehjelp.
Last Updated: 2026-04-24
Scope
In scope:
- Schema design and rationale for the three extensions.
- Population strategy (each NGO populates from its own ingest).
- Retroactive backfill for Red Cross.
Out of scope:
- Per-NGO scrape implementation — each NGO has its own investigation + PLANs.
- Generic scraping infrastructure — see
INVESTIGATE-ngo-scraping-infrastructure.md. - Coverage-gap mart consumption — designed so it can opt in later without a schema change.
Why this exists
The Folkehjelp investigation surfaced three changes to the supply data model that were originally framed as NF-specific but are actually generic:
source_url— once we scrape, every chapter has a public URL we want to deep-link to. Red Cross has it too (branchUrlin the API dump); we just didn't store it.chapter_subtype— non-geographic chapters (Solidaritetsungdom, hospital-anchored, student-anchored) need to be distinguishable from geographic chapters that are missing kommune data. NF has many; Red Cross's RØFF entities are similar; future NGOs likely have more.chapter_kommune_coverage— every multi-level NGO defines its own region structure (Red Cross 19 distrikter, NF 14 regions, N.K.S. independent foreninger, Frelsesarmeen 5 divisioner). Region boundaries don't align with SSB fylker. The data model knows the regional row exists but not which kommuner it serves.
Each is small individually; bundling them in one investigation keeps the rationale together and lets one PLAN ship them.
Section A — dim_chapter.source_url
A.1 Column
-- dim_chapter is a dbt-materialised table; in practice the column is added by
-- updating dim_chapter.sql's SELECT and re-running dbt. SQL shown here for
-- design intent.
ALTER TABLE marts.dim_chapter ADD COLUMN source_url TEXT;
NULL when:
- The chapter is synthesised (e.g., a regional rollup with no public page).
- The NGO doesn't expose per-chapter URLs.
Otherwise: full canonical URL of the chapter's public page on the NGO's website.
A.2 Population per NGO
dim_chapter.source_url is a marts column. Its raw-side source depends on whether the NGO ingest is scraper-sourced or API-sourced:
| NGO | Ingest type | Raw column | Notes |
|---|---|---|---|
| Red Cross | API-sourced (Red Cross JSON dump) | raw.redcross_branches.source_url (added by this PLAN; populated from branchUrl API field) | Scraping infra §C.5 explicitly does not apply to API-sourced raw tables; column name is local choice. |
| Folkehjelp | Scraper-sourced | raw.folkehjelp_chapters.url (the §C.5 mandatory column) | The §C.5 url column is the public chapter page URL — same data we want in dim_chapter.source_url. Supply staging propagates url → source_url. |
| Future NGO | Either | Scraper: raw.<source>_*.url (§C.5). API: per ingest convention. | Supply staging always lands the value in dim_chapter.source_url. |
A.3 Red Cross retroactive backfill
The current raw.redcross_branches doesn't store branchUrl. Steps:
- Add
source_url TEXTcolumn toraw.redcross_branches(migration). Naming choice:source_url(noturl) to make the API-vs-scraper distinction explicit; §C.5 doesn't apply because Red Cross is API-sourced. - Update
ingest/src/sources/redcross-branches/index.tsto write the field. - Re-run
npm run ingest:redcross-branchesto backfill. - Update
supply__redcross_branches.sqlto propagate the column todim_chapter.source_url.
Section B — dim_chapter.chapter_subtype
B.1 Column
ALTER TABLE marts.dim_chapter ADD COLUMN chapter_subtype TEXT;
NULL by default (normal geographic chapter). Filled when the chapter is non-geographic or has a structurally distinct role.
B.2 Initial vocabulary
| Value | Meaning | Example |
|---|---|---|
youth-political | Politically-engaged youth wing | NF Solidaritetsungdom |
youth-health | First-aid / rescue youth wing | NF Sanitetsungdom (when separately registered), Red Cross RØFF (when separately registered) |
student | University-anchored | NF Studentgruppe Blindern, Bislett, Kristiania |
hospital | Hospital-anchored | NF Sanitet Haukeland |
umbrella | National-level entity not tied to a kommune | NF Sentralt |
Frontend uses this to filter chapter-finder maps to "geographic chapters only" (chapter_subtype IS NULL) when showing kommune-anchored supply.
B.3 Constraint policy — [Q1] open
For v1: free-text TEXT, no accepted_values constraint. Vocabulary will stabilise as more NGOs ingest. Promote to enum-via-accepted_values test once 3+ NGOs are using subtypes consistently.
B.4 Why a column instead of inferring
We could infer "non-geographic" from kommune_nr IS NULL AND chapter_level = 'local'. Rejected: that pattern also matches "data missing" (chapter exists in source but we couldn't resolve a kommune), which is a different signal. Explicit subtype is a deliberate "this chapter is intentionally not kommune-anchored".
Section C — chapter_kommune_coverage link table
C.1 The problem
Every multi-level NGO defines its own region structure that does not necessarily align with SSB's 15 fylker:
- Red Cross: 19 distrikter (boundaries pre-date the 2020 fylke reform).
- Norsk Folkehjelp: 14 named regions (close to fylker but with consolidations).
- N.K.S.: federation of independent foreninger; loose regional structure.
- Frelsesarmeen: 5 divisioner that don't map to fylker at all.
Today's dim_chapter carries these as chapter_level='regional' rows with kommune_nr=NULL. The data model knows the row exists but does not know which kommuner it serves. So queries like "which chapters serve Stange?" can use the local-level rows directly, but cannot include the regional-level supply (Distrikt Innlandet's office in Hamar) without additional structure.
C.2 Proposed schema
-- New marts model: marts.chapter_kommune_coverage
-- One row per (regional_chapter, kommune_it_covers).
-- Local-level chapters do NOT need rows here — their kommune_nr lives on dim_chapter.
CREATE TABLE marts.chapter_kommune_coverage (
chapter_id TEXT NOT NULL REFERENCES dim_chapter(chapter_id) ON DELETE CASCADE,
kommune_nr TEXT NOT NULL REFERENCES dim_kommune(kommune_nr),
source TEXT NOT NULL, -- 'declared' | 'inferred'
updated_at TIMESTAMPTZ NOT NULL,
PRIMARY KEY (chapter_id, kommune_nr)
);
C.3 source distinguishes
declared— the NGO publishes the region's territorial coverage explicitly. Authoritative.inferred— derived from membership: "this region's lokallag are in kommuner X, Y, Z, so the region covers at least these". Cheap to compute, may under-report (region might cover kommuner with no lokallag).
C.4 Population per NGO
Built by per-NGO supply__<ngo>_chapter_kommune_coverage.sql staging models. UNION ALL into the marts model. For v1, most rows are inferred (rolled up from parent_chapter_id → child kommune_nr); declared rows wait for a future scrape of NGO-published region pages.
C.5 Retroactive backfill for Red Cross
Backfill inferred rows by joining dim_chapter to itself: for each regional chapter, find children where parent_chapter_id matches; collect their kommune_nr values. Same staging-model pattern that Folkehjelp will use.
C.6 Query implications
Coverage-gap queries change shape:
-- Without coverage: only local-level chapters count
select kommune_nr from dim_chapter where chapter_level='local' and is_active
-- With coverage: regional rows also count, via the link table
select kommune_nr from dim_chapter where chapter_level='local' and is_active
union
select kommune_nr from chapter_kommune_coverage cc
join dim_chapter dc on cc.chapter_id = dc.chapter_id
where dc.chapter_level='regional' and dc.is_active
Per the [Q48] resolution in INVESTIGATE-ngo-supply-data-model.md, v1 Coverage-gap still uses local-only. The link table is built so Coverage-gap can opt in to "regional presence counts" later without a schema change.
Decisions resolved
These decisions are inherited from the Folkehjelp investigation conversation (2026-04-23) and recorded here as the canonical home:
[Q2]Adddim_chapter.source_urlcolumn. Resolved 2026-04-23.[Q3]Adddim_chapter.chapter_subtypecolumn. Resolved 2026-04-23.[Q4]Addmarts.chapter_kommune_coveragelink table withsource IN ('declared', 'inferred'). Resolved 2026-04-23.[Q5]Populate from membership ('inferred') for v1; declared rows wait for future scrapes. Resolved 2026-04-23.[Q6]Retroactive backfill for Red Cross —source_urlfrombranchUrl, coverage fromparent_chapter_idrollup. Resolved 2026-04-23.
Open Questions
- [Q1]
chapter_subtypeaccepted_valuesconstraint — wait until vocabulary stabilises across 3+ NGOs. - [Q7] Should the staging models for coverage live under
dbt/models/supply/(with the per-NGO staging) ordbt/models/dimensions/(alongsidedim_chapterbecause the table feels dimensional)? Recommendsupply/to keep all per-source staging together. - [Q8] Do we want a
coverage_rolecolumn too — "primary chapter for this kommune" vs "secondary"? Probably yes once we ingest 3+ NGOs and start asking "who's the main provider in kommune X". Defer.
Next Steps
- PLAN-001-multi-ngo-supply-model-extensions.md (~3–4h)
- Add
source_url TEXTtoraw.redcross_branches(migration). - Update
ingest/src/sources/redcross-branches/index.tsto writebranchUrl. - Re-run
npm run ingest:redcross-branchesto backfill. - Add
source_urlandchapter_subtypecolumns todim_chapter.sqlSELECT anddim_chapterschema.yml. - Update
supply__redcross_branches.sqlto propagatesource_url. - New dbt model
marts.chapter_kommune_coverage+ schema.yml + tests. - New staging
supply__redcross_chapter_kommune_coverage.sql(Red Cross retro coverage). - Update
naming-conventions.mdwithsource_url,chapter_subtype,chapter_kommune_coverage. dbt run && dbt test.
- Add
This PLAN is a prerequisite for the Folkehjelp scrape PLAN, which expects source_url and chapter_subtype to exist on dim_chapter and writes its own coverage rows alongside chapter rows.
Files this investigation will produce
New tables:
marts.chapter_kommune_coverage
New columns on existing tables:
dim_chapter.source_urldim_chapter.chapter_subtyperaw.redcross_branches.source_url(raw-side change for Red Cross backfill)
New dbt models:
marts.chapter_kommune_coverage+ schema.yml + testsmarts.supply__redcross_chapter_kommune_coverage(Red Cross-specific staging)
Documentation:
- Extend
docs/stack/naming-conventions.md:source_url,chapter_subtype,chapter_kommune_coverage. - Auto-regenerate
docs/stack/erd.md.
Companion investigations
INVESTIGATE-ngo-scraping-infrastructure.md— generic scraping toolkit. Of particular relevance: §C.5 (mandatory raw-table columns includingurl) — for scraper-sourced NGOs, the public-facing URL surfaces asraw.<source>_*.urland propagates from there todim_chapter.source_url(see A.2). API-sourced NGOs (Red Cross today) are exempt and use a per-ingest column name likesource_urlinstead.INVESTIGATE-folkehjelp-supply.md— first new consumer of these extensions.