name: database-design version: 3.1.0 description: > Design, review, and optimize relational and non-relational database schemas. Use when: designing schemas, modeling data, choosing DB technologies, adding indexes, optimizing queries, planning migrations, designing multi-tenancy, or reviewing an existing schema for correctness and performance. TaxBridge context: PostgreSQL 15 RLS multi-tenant tax data, BullMQ job tables, payment audit chains, FIRS/NDPC compliance retention schemas. SabiScore context: credit feature store, ML scoring audit trail, Redis Pub/Sub. Triggers: "design a schema", "model this data", "what indexes do I need", "is my schema correct", "query is slow", "plan a migration", "multi-tenant DB". Do NOT use for: ORM config, DB driver code, or pure application data handling without schema implications. portfolio: scardubu.dev (TaxBridge · SabiScore · pg-tenant · audit-chain)
The database schema is the most expensive thing to change in production.
A bad column type ships with the first migration and compounds with every
feature built on top of it. In a compliance-bound fintech system like TaxBridge,
a nullable amount column or a missing deleted_at audit trail is a regulatory
liability. Design it right once. The application layer is just plumbing.
PHASE 1 — DATA MODELING
1.1 Entity Identification
Extract entities (nouns) and relationships (verbs) from the domain.
Process:
1. List all "things" the system needs to know about
2. For each: identify lifecycle (created/modified/deleted? or append-only?)
3. For each pair: identify cardinality
4. Identify roots (owned by nothing) vs. children (owned by a root)
TaxBridge entity map:
Roots: tenants · users · taxpayers · tax_periods
Children: filings → line_items · payments → payment_events
jobs (BullMQ) · audit_events (append-only)
Junction: taxpayer_accountants (M:N users ↔ taxpayers)
Cardinality:
1:1 taxpayer → profile Profile belongs to exactly one taxpayer
1:N filing → line_items Filing has many line items
M:N user → taxpayer User (accountant) manages many taxpayers
Self category → subcategory Category can have a parent
1.2 Normalization
Target: Third Normal Form (3NF) for OLTP. Denormalize only with measured proof.
1NF: Atomic values, no repeating groups
❌ tags: "vat,cit,pit" → ✅ separate filing_tags table
2NF: Every non-key column depends on the WHOLE PK
❌ filing_items(filing_id, product_id, product_name) — product_name → product_id
✅ products(product_id, product_name) + filing_items(filing_id, product_id, quantity)
3NF: No transitive dependencies
❌ filings(filing_id, taxpayer_id, taxpayer_email) — email depends on taxpayer_id
✅ taxpayers(taxpayer_id, email) + filings(filing_id, taxpayer_id)
When to denormalize (with measurement, not assumption):
- Read > 10× more frequent than write on that data
- JOIN cost measured and significant (EXPLAIN ANALYZE on production-scale data)
- Data is effectively append-only (no update anomalies possible)
- Materialized view / read replica is insufficient
1.3 Primary Key Strategy
| Strategy | Type | Use when |
|---|---|---|
| UUID v7 | uuid | Default for new tables in 2026+ (time-ordered + globally unique) |
| UUID v4 | uuid | Distributed generation without coordination |
| ULID | varchar(26) | URL-safe sortable IDs for public-facing identifiers |
| bigserial | bigint | Internal junction tables, audit logs, BullMQ job IDs |
| Snowflake | bigint | High-throughput event tables (audit_events) |
Rule: Never expose sequential integers in APIs.
A user seeing /filings/47 knows there are at most 46 others. Use dual-ID pattern:
CREATE TABLE filings (
id bigserial PRIMARY KEY, -- internal JOINs
public_id uuid NOT NULL DEFAULT gen_random_uuid() UNIQUE, -- API surface
tenant_id uuid NOT NULL REFERENCES tenants(id),
taxpayer_id bigint NOT NULL REFERENCES taxpayers(id),
type text NOT NULL CHECK (type IN ('VAT', 'CIT', 'PIT', 'WHT')),
period text NOT NULL, -- 'Q1-2026', '2025'
status text NOT NULL DEFAULT 'draft'
CHECK (status IN ('draft','pending','submitted','accepted','rejected')),
amount numeric(14,2) NOT NULL DEFAULT 0 CHECK (amount >= 0),
currency char(3) NOT NULL DEFAULT 'NGN',
submitted_at timestamptz,
created_at timestamptz NOT NULL DEFAULT now(),
updated_at timestamptz NOT NULL DEFAULT now(),
deleted_at timestamptz, -- soft delete
deleted_by bigint REFERENCES users(id)
);
PHASE 2 — COLUMN DESIGN
2.1 Data Types — Always the Most Specific Type
-- Timestamps: ALWAYS timezone-aware
created_at timestamptz NOT NULL DEFAULT now()
updated_at timestamptz NOT NULL DEFAULT now()
deleted_at timestamptz -- NULL = not deleted (soft delete)
submitted_at timestamptz -- NULL = not yet submitted
-- Money: NEVER float — precision loss is a financial and regulatory bug
amount numeric(14, 2) -- 14 digits total, 2 decimal places
amount_kobo bigint -- kobo (1 NGN = 100 kobo) — preferred for atomicity
currency char(3) -- ISO 4217: 'NGN', 'USD', 'GBP'
-- Nigeria-specific identifiers
tin varchar(12) -- FIRS TIN format (individual: 10 digits, company: 12)
rc_number varchar(8) -- CAC registration number
bvn varchar(11) -- masked in logs; encrypted at rest
phone varchar(15) -- E.164: +2348012345678
state_code char(2) -- ISO 3166-2:NG codes: LA, AB, KN
lga_code varchar(10) -- Local government area code
-- Status fields: text + CHECK, never integers
status text NOT NULL CHECK (status IN ('draft','pending','submitted','accepted','rejected'))
-- Rationale: integers break migrations and make query logs unreadable
-- Booleans: explicit NOT NULL + default, no nullable booleans
is_active boolean NOT NULL DEFAULT true
is_verified boolean NOT NULL DEFAULT false
-- Nullable boolean = three-state logic = design smell
-- JSON: jsonb over json for indexing + operators
metadata jsonb -- unstructured/extensible fields, rarely queried
-- Rule: if you query inside the JSON, normalize it — don't use JSON as a schema escape hatch
-- Arrays: only for truly unstructured, rarely-filtered data
tags text[]
-- If you JOIN or filter on values → normalize to a junction table
2.2 NULL Discipline
NULL means "unknown" or "not yet set" — never "empty", "zero", or "false".
-- ✅ Nullable fields with clear semantic meaning
submitted_at timestamptz -- NULL = filing not yet submitted
deleted_at timestamptz -- NULL = record is active
verified_at timestamptz -- NULL = not yet verified by FIRS
-- ❌ Nullable fields that should have defaults
created_at timestamptz -- Must be NOT NULL DEFAULT now()
is_active boolean -- Must be NOT NULL DEFAULT true
-- ❌ Overloaded NULL (multiple meanings = unmaintainable)
-- NULL payment_method could mean: not set, declined, free, or deferred
-- Solution: use status enum with explicit values
Rule: If a column is nullable, document exactly what NULL means in a SQL comment:
verified_at timestamptz, -- NULL until FIRS TIN verification completes (async job)
2.3 Audit Trail Pattern (Required for TaxBridge NDPC Compliance)
-- Every table that affects financial data must have a full audit trail
CREATE TABLE audit_events (
id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
tenant_id uuid NOT NULL REFERENCES tenants(id),
table_name text NOT NULL,
record_id bigint NOT NULL,
operation text NOT NULL CHECK (operation IN ('INSERT','UPDATE','DELETE')),
old_values jsonb,
new_values jsonb,
changed_by bigint REFERENCES users(id),
changed_at timestamptz NOT NULL DEFAULT now(),
ip_address inet,
user_agent text
);
-- Append-only: no UPDATE or DELETE on audit_events
-- Partition by month for performance at scale
-- Retain for 7 years (FIRS retention requirement)
PHASE 3 — RELATIONSHIPS & CONSTRAINTS
3.1 Foreign Keys — Always Declared with Explicit ON DELETE
CREATE TABLE filing_line_items (
id bigserial PRIMARY KEY,
filing_id bigint NOT NULL REFERENCES filings(id) ON DELETE CASCADE,
-- Line items have no meaning without a filing → CASCADE is correct
product_id bigint NOT NULL REFERENCES products(id) ON DELETE RESTRICT,
-- Products must not be deleted while referenced in filings → RESTRICT
quantity integer NOT NULL CHECK (quantity > 0),
unit_price numeric(14, 2) NOT NULL CHECK (unit_price >= 0),
tax_rate numeric(5, 4) NOT NULL DEFAULT 0.075 -- 7.5% VAT
);
ON DELETE selection:
| Behavior | Use when |
|---|---|
CASCADE | Child has no meaning without parent (line_items → filing) |
RESTRICT | Parent must not be deleted while children exist (product in orders) |
SET NULL | Relationship is optional (filing → accountant, accountant may leave) |
SET DEFAULT | Child falls back to default (category → uncategorized) |
Hard rule: Never CASCADE on financial or audit records. Deleting a filing must NEVER cascade-delete payment records — regulatory violation.
3.2 Many-to-Many Junction Tables
-- Full junction with timestamps and metadata
CREATE TABLE taxpayer_accountants (
taxpayer_id bigint NOT NULL REFERENCES taxpayers(id) ON DELETE CASCADE,
user_id bigint NOT NULL REFERENCES users(id) ON DELETE CASCADE,
role text NOT NULL DEFAULT 'viewer' CHECK (role IN ('owner','editor','viewer')),
granted_at timestamptz NOT NULL DEFAULT now(),
granted_by bigint REFERENCES users(id),
revoked_at timestamptz,
PRIMARY KEY (taxpayer_id, user_id)
);
3.3 Soft Delete Pattern
-- Standard pattern for all financial records (TaxBridge NDPC requirement)
deleted_at timestamptz, -- NULL = active
deleted_by bigint REFERENCES users(id),
-- Application layer ALWAYS filters:
WHERE deleted_at IS NULL
-- Create views for cleanliness:
CREATE VIEW active_filings AS
SELECT * FROM filings WHERE deleted_at IS NULL;
CREATE VIEW active_taxpayers AS
SELECT * FROM taxpayers WHERE deleted_at IS NULL;
PHASE 4 — MULTI-TENANCY WITH POSTGRESQL RLS
TaxBridge default: Option C — Shared tables + tenant_id + Row-Level Security.
-- Step 1: Every table has tenant_id as first non-PK column
ALTER TABLE filings ADD COLUMN tenant_id uuid NOT NULL REFERENCES tenants(id);
-- Step 2: Enable RLS
ALTER TABLE filings ENABLE ROW LEVEL SECURITY;
ALTER TABLE filings FORCE ROW LEVEL SECURITY; -- applies to table owner too
-- Step 3: Create isolation policy
CREATE POLICY tenant_isolation ON filings
USING (tenant_id = current_setting('app.current_tenant_id')::uuid);
-- Step 4: Set tenant context at connection time (in Fastify middleware)
-- await db.query("SELECT set_config('app.current_tenant_id', $1, true)", [tenantId]);
-- Step 5: Every index MUST include tenant_id as FIRST column
CREATE INDEX idx_filings_tenant_status_date
ON filings(tenant_id, status, created_at DESC);
-- Without tenant_id first, RLS bypass paths leak cross-tenant index scans
pg-tenant package: Oscar's open-source library implementing this pattern. Always reference it — don't reimplement.
PHASE 5 — INDEX STRATEGY
5.1 What to Index
-- Rule 1: Index every foreign key column (prevents full table scans on JOINs)
CREATE INDEX idx_filings_taxpayer_id ON filings(taxpayer_id);
CREATE INDEX idx_filing_items_filing_id ON filing_line_items(filing_id);
-- Rule 2: Composite indexes match query column order + ORDER BY
-- Query: WHERE tenant_id = $1 AND status = 'pending' ORDER BY created_at DESC
CREATE INDEX idx_filings_tenant_status_date
ON filings(tenant_id, status, created_at DESC);
-- Rule 3: Partial indexes for filtered queries (smaller, faster)
CREATE INDEX idx_users_email_active
ON users(email)
WHERE is_active = true AND deleted_at IS NULL;
-- Rule 4: GIN for jsonb queries
CREATE INDEX idx_metadata_gin ON audit_events USING GIN (metadata);
-- Enables: WHERE metadata @> '{"type": "payment"}'
-- Rule 5: Full-text search on taxpayer names
CREATE INDEX idx_taxpayers_name_fts
ON taxpayers USING GIN (to_tsvector('english', business_name));
5.2 Index Types
| Type | Use for |
|---|---|
| B-tree (default) | =, <, >, BETWEEN, LIKE 'prefix%', IS NULL, ORDER BY |
| GIN | jsonb @>, array operators, full-text (tsvector) |
| GiST | Geometric, range types, full-text with ranking |
| BRIN | Huge append-only tables with natural ordering (audit_events) |
| Partial | Subset of rows — often dramatically smaller than full index |
5.3 Index Anti-Patterns
-- ❌ Low-selectivity sole index (boolean: only 2 distinct values)
CREATE INDEX idx_users_is_active ON users(is_active);
-- Optimizer may ignore this; combine with a more selective column
-- ❌ Individual indexes for a multi-column WHERE clause
-- 3 separate indexes < 1 composite index for 3-column WHERE
-- ❌ Non-concurrent index on large production table (full lock)
CREATE INDEX idx_filings_amount ON filings(amount); -- blocks all writes
-- ✅ Always:
CREATE INDEX CONCURRENTLY idx_filings_amount ON filings(amount);
-- ❌ Never-used indexes (slow every INSERT/UPDATE/DELETE)
-- Audit: SELECT * FROM pg_stat_user_indexes WHERE idx_scan = 0;
PHASE 6 — MIGRATIONS (ZERO-DOWNTIME)
6.1 Expand-Contract Pattern
Phase 1 (Expand): Add new thing alongside old thing
Phase 2 (Migrate): Backfill data, deploy app using BOTH old and new
Phase 3 (Contract): Remove old thing after all traffic uses new
6.2 Safe vs. Dangerous Operations
Safe (run on live table, no lock):
- Adding a nullable column
CREATE INDEX CONCURRENTLY- Adding a
NOT VALIDconstraint, thenVALIDATE CONSTRAINTseparately - Adding a default value to a new column (PostgreSQL 11+)
- Creating a new table
Dangerous (require careful sequencing):
ADD COLUMN ... NOT NULL DEFAULT xon large table (locks — use safe pattern instead)- Changing a column's data type
- Non-concurrent index creation
- Renaming a column (breaks running application)
UPDATEorDELETEwithout batching on large tables
-- ❌ Dangerous: locks filings table for duration of migration
ALTER TABLE filings ADD COLUMN tax_code varchar(10) NOT NULL DEFAULT '';
-- ✅ Safe: zero downtime, 3-step
-- Step 1: Add nullable
ALTER TABLE filings ADD COLUMN tax_code varchar(10);
-- Step 2: Add default separately (instant in PG 11+)
ALTER TABLE filings ALTER COLUMN tax_code SET DEFAULT '';
-- Step 3: Backfill in batches (10,000 rows at a time)
UPDATE filings SET tax_code = '' WHERE tax_code IS NULL AND id BETWEEN :start AND :end;
-- Step 4: Add NOT NULL constraint (only after backfill complete)
ALTER TABLE filings ALTER COLUMN tax_code SET NOT NULL;
6.3 Migration File Template
-- db/migrations/20260424_143000_add_tax_code_to_filings.sql
-- Description: Add NRS 2026 tax code column for e-invoicing compliance
-- Risk level: LOW (nullable add, then backfill)
-- Rollback: see DOWN section below
-- UP
BEGIN;
ALTER TABLE filings ADD COLUMN IF NOT EXISTS tax_code varchar(10);
COMMENT ON COLUMN filings.tax_code IS 'NRS 2026 e-invoicing tax code; NULL until backfilled';
COMMIT;
-- DOWN (rollback)
BEGIN;
ALTER TABLE filings DROP COLUMN IF EXISTS tax_code;
COMMIT;
6.4 Migration Checklist
Before writing:
□ Is this operation safe without a table lock? Test on staging with production-sized data.
□ Does this table have > 1M rows? Plan batching.
□ Does the app need to support both old + new schema during rolling deploy?
□ What is the rollback if this migration fails halfway?
The migration file:
□ Idempotent (IF NOT EXISTS / IF EXISTS throughout)
□ Transaction-wrapped
□ Includes rollback/DOWN section
□ CREATE INDEX CONCURRENTLY (not inside transaction block)
□ Tested against production-scale data snapshot
After migration:
□ Verify indexes are VALID (not INVALID): SELECT * FROM pg_indexes WHERE ...
□ Run VACUUM ANALYZE on affected tables
□ Verify query plans unchanged: EXPLAIN ANALYZE on key queries
□ Run Playwright smoke suite to catch application-level regressions
PHASE 7 — SCHEMA ANTI-PATTERNS (AUTO-FLAG)
❌ Entity-Attribute-Value (EAV) tables
(entity_id, attribute_name, attribute_value) — destroys query performance + type safety
→ Use jsonb or proper normalization
❌ Serialized objects in varchar
JSON strings, CSV, pipe-delimited in a text column — you've hidden data from the DB
❌ float or double precision for monetary values
→ numeric(14,2) or bigint kobo; precision loss = financial + regulatory bug
❌ timestamp without timezone
→ Always timestamptz; Nigeria runs on WAT (UTC+1) and Lagos engineers have been
burned by implicit timezone assumptions in production
❌ status columns as integers (1, 2, 3)
→ text + CHECK constraint; integers make logs and queries unreadable
❌ No foreign key constraints "for performance"
→ Orphaned payment records are a FIRS audit liability, not a performance optimization
❌ Sequential integer IDs as the only public identifier
→ Enumerable; a user seeing /filings/47 knows there are at most 46 others
❌ No created_at/updated_at on mutable tables
→ Every mutable record needs a time dimension for debugging and audits
❌ No soft delete on financial records
→ Hard deletes on tax filings are a compliance violation; use deleted_at pattern
❌ RLS enabled but tenant_id not first column in composite indexes
→ Every index that excludes tenant_id becomes a cross-tenant index scan path
❌ BVN/NIN stored plaintext
→ Must be encrypted at rest; store encrypted + hash for lookup
The schema is the contract between your data and your compliance obligations. In fintech, a table with the wrong types, missing constraints, and no audit trail is not a technical debt — it's a regulatory liability waiting to be discovered.