Use when designing databases for data-heavy applications, making schema decisions for performance, choosing between normalization and denormalization, selecting storage/indexing strategies, planning for scale, or evaluating OLTP vs OLAP trade-offs. Also use when encountering N+1 queries, ORM issues, or concurrency problems.
name: data-systems-architecture
description: Use when designing databases for data-heavy applications, making schema decisions for performance, choosing between normalization and denormalization, selecting storage/indexing strategies, planning for scale, or evaluating OLTP vs OLAP trade-offs. Also use when encountering N+1 queries, ORM issues, or concurrency problems.
Data Systems Architecture
Overview
Core principle: Good data system architecture balances reliability (correct operation under faults), scalability (handling growth gracefully), and maintainability (enabling productive change over time). Every architectural decision involves trade-offs between these concerns.
This skill synthesizes knowledge from three foundational texts:
New system design?
├─ Yes → Read 01, then 02 for data model
└─ No → What's the problem?
├─ "Queries are slow" → Read 04 (indexing) + 03 (storage patterns)
├─ "Data is inconsistent" → Read 02 (modeling) + 06 (transactions)
├─ "Can't handle the load" → Read 05 (scaling) + 03 (OLTP vs OLAP)
├─ "App makes too many queries" → Read 07 (N+1, ORM patterns)
└─ "Race conditions/deadlocks" → Read 06 (concurrency)
Core Concepts (Quick Reference)
The Three Pillars
Concern
Definition
Key Question
Reliability
System works correctly under faults
What happens when things fail?
Scalability
Handles growth gracefully
What's 10x load look like?
Maintainability
Easy to operate and evolve
Can new engineers understand this?
Data Model Selection
Model
Best For
Avoid When
Relational
Many-to-many relationships, joins, consistency
Highly hierarchical data, constant schema changes
Document
Self-contained docs, tree structures
Need for joins, many-to-many
Graph
Highly connected data, recursive queries
Simple CRUD, no relationship traversal
OLTP vs OLAP
Aspect
OLTP
OLAP
Query pattern
Point lookups, few rows
Aggregates, many rows
Optimization
Index everything used in WHERE
Fewer indexes, full scans OK
Storage
Row-oriented
Consider column-oriented
Index Type Quick Reference
Type
Use Case
PostgreSQL
B-tree
Equality, range, sorting
Default, most queries
Hash
Equality only
Faster for exact match
GIN
Arrays, JSONB, full-text
@>, @@ operators
GiST
Geometric, range types
PostGIS, nearest-neighbor
BRIN
Large, naturally ordered tables
Time-series data
Isolation Levels
Level
Prevents
PostgreSQL Default?
Read Committed
Dirty reads
Yes
Repeatable Read
+ Non-repeatable reads
No
Serializable
All anomalies
No (uses SSI)
Design Checklist
Before finalizing a data architecture:
Identified load parameters (read/write ratio, data volume, latency requirements)
Chose appropriate data model (relational/document/graph hybrid?)
Normalized to 3NF first, denormalized only with measured justification
Designed indexes for actual query patterns (not hypothetical)
Considered 10x growth scenario
Established isolation level requirements
Defined where business logic lives (app vs DB vs both)
Planned for operations (backups, monitoring, migrations)
References
Kleppmann, M. Designing Data-Intensive Applications (O'Reilly, 2017)
Fontaine, D. The Art of PostgreSQL (2nd ed., 2020)