postgresql-table-design
Install this skill
npx skills add wshobson/agentsWorks across Claude Code, Cursor, Codex, Copilot & Antigravity
PostgreSQL table design focuses on schema efficiency, data integrity, and performance optimization within the relational model. This skill ensures your database architecture follows 3NF normalization standards while strategically implementing indexes, constraints, and appropriate data types to prevent common bottlenecks. It moves beyond basic CRUD operations to address technical nuances like MVCC row management, TOAST storage thresholds, and GIN/GiST index selection for advanced types. By applying these standards, you avoid silent data truncation, unnecessary sequence gaps anxiety, and performance issues caused by unindexed foreign keys. The methodology emphasizes explicit type selectionâprioritizing TEXT, TIMESTAMPTZ, and NUMERICâto maintain precise control over storage and query execution, ensuring your schema remains maintainable and performs predictably under high-concurrency workloads.
When to Use This Skill
- â˘Designing transactional schemas that require strict ACID compliance
- â˘Architecting event-log databases that utilize range types for time-window queries
- â˘Structuring complex relationship models using normalized foreign keys
- â˘Building full-text search indexes on large document repositories
How to Invoke This Skill
Example prompts that trigger this skill in Claude Code, Cursor, or Antigravity:
- âhow should I design my database schema for this app
- âwhat are the best practices for PostgreSQL primary keys
- âhow to model this data relationship in Postgres
- âshould I use UUID or BIGINT for my user table
- âhow to handle search on this text column in Postgres
Pro Tips
- đĄAlways start with normalization to 3NF and only denormalize if specific performance bottlenecks are identified and proven through testing.
- đĄPrioritize indexing on foreign key columns and frequently queried WHERE clause columns; avoid over-indexing, which can hurt write performance.
- đĄRegularly review `EXPLAIN ANALYZE` output for your most critical queries to validate your design choices and identify further optimization opportunities.
What this skill does
- â˘Configuring identity columns and UUID strategies for primary keys
- â˘Implementing constraint-based data validation instead of procedural checks
- â˘Selecting optimal storage types for specialized data like ranges, arrays, and network addresses
- â˘Constructing efficient index strategies for join keys and full-text search
- â˘Optimizing TOAST and storage strategies for large text or binary payloads
When not to use it
- âProjects requiring schemaless NoSQL flexibility for rapidly changing document structures
- âApplications where horizontal write-scaling takes absolute precedence over relational integrity
Example workflow
- Define entity relationships and normalize to 3NF
- Select appropriate data types for each column to maximize space efficiency
- Apply NOT NULL and DEFAULT constraints to ensure schema integrity
- Establish primary keys and explicitly index foreign key columns
- Create specialized indexes like GIN for arrays or GiST for ranges
- Verify storage strategy settings for columns prone to exceeding TOAST thresholds
Prerequisites
- âA running PostgreSQL instance
- âBasic knowledge of SQL DDL syntax
Pitfalls & limitations
- !Assuming PostgreSQL will automatically index foreign key columns
- !Relying on sequence numbers to remain consecutive after transaction rollbacks
- !Over-using denormalization, which creates significant data update anomalies
- !Neglecting the default storage behavior for large objects
FAQ
How it compares
Generic prompts often suggest basic CRUD schemas; this approach enforces specific performance-tuned patterns like FK indexing and appropriate index selection (GIN/GiST) that standard AI suggestions frequently overlook.
đ Full skill instructions â original source: wshobson/agents
## Core Rules
- Define a **PRIMARY KEY** for reference tables (users, orders, etc.). Not always needed for time-series/event/log data. When used, prefer
BIGINT GENERATED ALWAYS AS IDENTITY; use UUID only when global uniqueness/opacity is needed.- **Normalize first (to 3NF)** to eliminate data redundancy and update anomalies; denormalize **only** for measured, high-ROI reads where join performance is proven problematic. Premature denormalization creates maintenance burden.
- Add **NOT NULL** everywhere itâs semantically required; use **DEFAULT**s for common values.
- Create **indexes for access paths you actually query**: PK/unique (auto), **FK columns (manual!)**, frequent filters/sorts, and join keys.
- Prefer **TIMESTAMPTZ** for event time; **NUMERIC** for money; **TEXT** for strings; **BIGINT** for integer values, **DOUBLE PRECISION** for floats (or
NUMERIC for exact decimal arithmetic).## PostgreSQL âGotchasâ
- **Identifiers**: unquoted â lowercased. Avoid quoted/mixed-case names. Convention: use
snake_case for table/column names.- **Unique + NULLs**: UNIQUE allows multiple NULLs. Use
UNIQUE (...) NULLS NOT DISTINCT (PG15+) to restrict to one NULL.- **FK indexes**: PostgreSQL **does not** auto-index FK columns. Add them.
- **No silent coercions**: length/precision overflows error out (no truncation). Example: inserting 999 into
NUMERIC(2,0) fails with error, unlike some databases that silently truncate or round.- **Sequences/identity have gaps** (normal; don't "fix"). Rollbacks, crashes, and concurrent transactions create gaps in ID sequences (1, 2, 5, 6...). This is expected behaviorâdon't try to make IDs consecutive.
- **Heap storage**: no clustered PK by default (unlike SQL Server/MySQL InnoDB);
CLUSTER is one-off reorganization, not maintained on subsequent inserts. Row order on disk is insertion order unless explicitly clustered.- **MVCC**: updates/deletes leave dead tuples; vacuum handles themâdesign to avoid hot wide-row churn.
## Data Types
- **IDs**:
BIGINT GENERATED ALWAYS AS IDENTITY preferred (GENERATED BY DEFAULT also fine); UUID when merging/federating/used in a distributed system or for opaque IDs. Generate with uuidv7() (preferred if using PG18+) or gen_random_uuid() (if using an older PG version).- **Integers**: prefer
BIGINT unless storage space is critical; INTEGER for smaller ranges; avoid SMALLINT unless constrained.- **Floats**: prefer
DOUBLE PRECISION over REAL unless storage space is critical. Use NUMERIC for exact decimal arithmetic.- **Strings**: prefer
TEXT; if length limits needed, use CHECK (LENGTH(col) <= n) instead of VARCHAR(n); avoid CHAR(n). Use BYTEA for binary data. Large strings/binary (>2KB default threshold) automatically stored in TOAST with compression. TOAST storage: PLAIN (no TOAST), EXTENDED (compress + out-of-line), EXTERNAL (out-of-line, no compress), MAIN (compress, keep in-line if possible). Default EXTENDED usually optimal. Control with ALTER TABLE tbl ALTER COLUMN col SET STORAGE strategy and ALTER TABLE tbl SET (toast_tuple_target = 4096) for threshold. Case-insensitive: for locale/accent handling use non-deterministic collations; for plain ASCII use expression indexes on LOWER(col) (preferred unless column needs case-insensitive PK/FK/UNIQUE) or CITEXT.- **Money**:
NUMERIC(p,s) (never float).- **Time**:
TIMESTAMPTZ for timestamps; DATE for date-only; INTERVAL for durations. Avoid TIMESTAMP (without timezone). Use now() for transaction start time, clock_timestamp() for current wall-clock time.- **Booleans**:
BOOLEAN with NOT NULL constraint unless tri-state values are required.- **Enums**:
CREATE TYPE ... AS ENUM for small, stable sets (e.g. US states, days of week). For business-logic-driven and evolving values (e.g. order statuses) â use TEXT (or INT) + CHECK or lookup table.- **Arrays**:
TEXT[], INTEGER[], etc. Use for ordered lists where you query elements. Index with **GIN** for containment (@>, <@) and overlap (&&) queries. Access: arr[1] (1-indexed), arr[1:3] (slicing). Good for tags, categories; avoid for relationsâuse junction tables instead. Literal syntax: '{val1,val2}' or ARRAY[val1,val2].- **Range types**:
daterange, numrange, tstzrange for intervals. Support overlap (&&), containment (@>), operators. Index with **GiST**. Good for scheduling, versioning, numeric ranges. Pick a bounds scheme and use it consistently; prefer [) (inclusive/exclusive) by default.- **Network types**:
INET for IP addresses, CIDR for network ranges, MACADDR for MAC addresses. Support network operators (<<, >>, &&).- **Geometric types**:
POINT, LINE, POLYGON, CIRCLE for 2D spatial data. Index with **GiST**. Consider **PostGIS** for advanced spatial features.- **Text search**:
TSVECTOR for full-text search documents, TSQUERY for search queries. Index tsvector with **GIN**. Always specify language: to_tsvector('english', col) and to_tsquery('english', 'query'). Never use single-argument versions. This applies to both index expressions and queries.- **Domain types**:
CREATE DOMAIN email AS TEXT CHECK (VALUE ~ '^[^@]+@[^@]+$') for reusable custom types with validation. Enforces constraints across tables.- **Composite types**:
CREATE TYPE address AS (street TEXT, city TEXT, zip TEXT) for structured data within columns. Access with (col).field syntax.- **JSONB**: preferred over JSON; index with **GIN**. Use only for optional/semi-structured attrs. ONLY use JSON if the original ordering of the contents MUST be preserved.
- **Vector types**:
vector type by pgvector for vector similarity search for embeddings.### Do not use the following data types
- DO NOT use
timestamp (without time zone); DO use timestamptz instead.- DO NOT use
char(n) or varchar(n); DO use text instead.- DO NOT use
money type; DO use numeric instead.- DO NOT use
timetz type; DO use timestamptz instead.- DO NOT use
timestamptz(0) or any other precision specification; DO use timestamptz instead- DO NOT use
serial type; DO use generated always as identity instead.## Table Types
- **Regular**: default; fully durable, logged.
- **TEMPORARY**: session-scoped, auto-dropped, not logged. Faster for scratch work.
- **UNLOGGED**: persistent but not crash-safe. Faster writes; good for caches/staging.
## Row-Level Security
Enable with
ALTER TABLE tbl ENABLE ROW LEVEL SECURITY. Create policies: CREATE POLICY user_access ON orders FOR SELECT TO app_users USING (user_id = current_user_id()). Built-in user-based access control at the row level.## Constraints
- **PK**: implicit UNIQUE + NOT NULL; creates a B-tree index.
- **FK**: specify
ON DELETE/UPDATE action (CASCADE, RESTRICT, SET NULL, SET DEFAULT). Add explicit index on referencing columnâspeeds up joins and prevents locking issues on parent deletes/updates. Use DEFERRABLE INITIALLY DEFERRED for circular FK dependencies checked at transaction end.- **UNIQUE**: creates a B-tree index; allows multiple NULLs unless
NULLS NOT DISTINCT (PG15+). Standard behavior: (1, NULL) and (1, NULL) are allowed. With NULLS NOT DISTINCT: only one (1, NULL) allowed. Prefer NULLS NOT DISTINCT unless you specifically need duplicate NULLs.- **CHECK**: row-local constraints; NULL values pass the check (three-valued logic). Example:
CHECK (price > 0) allows NULL prices. Combine with NOT NULL to enforce: price NUMERIC NOT NULL CHECK (price > 0).- **EXCLUDE**: prevents overlapping values using operators.
EXCLUDE USING gist (room_id WITH =, booking_period WITH &&) prevents double-booking rooms. Requires appropriate index type (often GiST).## Indexing
- **B-tree**: default for equality/range queries (
=, <, >, BETWEEN, ORDER BY)- **Composite**: order mattersâindex used if equality on leftmost prefix (
WHERE a = ? AND b > ? uses index on (a,b), but WHERE b = ? does not). Put most selective/frequently filtered columns first.- **Covering**:
CREATE INDEX ON tbl (id) INCLUDE (name, email) - includes non-key columns for index-only scans without visiting table.- **Partial**: for hot subsets (
WHERE status = 'active' â CREATE INDEX ON tbl (user_id) WHERE status = 'active'). Any query with status = 'active' can use this index.- **Expression**: for computed search keys (
CREATE INDEX ON tbl (LOWER(email))). Expression must match exactly in WHERE clause: WHERE LOWER(email) = '[email protected]'.- **GIN**: JSONB containment/existence, arrays (
@>, ?), full-text search (@@)- **GiST**: ranges, geometry, exclusion constraints
- **BRIN**: very large, naturally ordered data (time-series)âminimal storage overhead. Effective when row order on disk correlates with indexed column (insertion order or after
CLUSTER).## Partitioning
- Use for very large tables (>100M rows) where queries consistently filter on partition key (often time/date).
- Alternate use: use for tables where data maintenance tasks dictates e.g. data pruned or bulk replaced periodically
- **RANGE**: common for time-series (
PARTITION BY RANGE (created_at)). Create partitions: CREATE TABLE logs_2024_01 PARTITION OF logs FOR VALUES FROM ('2024-01-01') TO ('2024-02-01'). **TimescaleDB** automates time-based or ID-based partitioning with retention policies and compression.- **LIST**: for discrete values (
PARTITION BY LIST (region)). Example: FOR VALUES IN ('us-east', 'us-west').- **HASH**: for even distribution when no natural key (
PARTITION BY HASH (user_id)). Creates N partitions with modulus.- **Constraint exclusion**: requires
CHECK constraints on partitions for query planner to prune. Auto-created for declarative partitioning (PG10+).- Prefer declarative partitioning or hypertables. Do NOT use table inheritance.
- **Limitations**: no global UNIQUE constraintsâinclude partition key in PK/UNIQUE. FKs from partitioned tables not supported; use triggers.
## Special Considerations
### Update-Heavy Tables
- **Separate hot/cold columns**âput frequently updated columns in separate table to minimize bloat.
- **Use
fillfactor=90** to leave space for HOT updates that avoid index maintenance.- **Avoid updating indexed columns**âprevents beneficial HOT updates.
- **Partition by update patterns**âseparate frequently updated rows in a different partition from stable data.
### Insert-Heavy Workloads
- **Minimize indexes**âonly create what you query; every index slows inserts.
- **Use
COPY or multi-row INSERT** instead of single-row inserts.- **UNLOGGED tables** for rebuildable staging dataâmuch faster writes.
- **Defer index creation** for bulk loadsâ>drop index, load data, recreate indexes.
- **Partition by time/hash** to distribute load. **TimescaleDB** automates partitioning and compression of insert-heavy data.
- **Use a natural key for primary key** such as a (timestamp, device_id) if enforcing global uniqueness is important many insert-heavy tables don't need a primary key at all.
- If you do need a surrogate key, **Prefer
BIGINT GENERATED ALWAYS AS IDENTITY over UUID**.### Upsert-Friendly Design
- **Requires UNIQUE index** on conflict target columnsâ
ON CONFLICT (col1, col2) needs exact matching unique index (partial indexes don't work).- **Use
EXCLUDED.column** to reference would-be-inserted values; only update columns that actually changed to reduce write overhead.- **
DO NOTHING faster** than DO UPDATE when no actual update needed.### Safe Schema Evolution
- **Transactional DDL**: most DDL operations can run in transactions and be rolled backâ
BEGIN; ALTER TABLE...; ROLLBACK; for safe testing.- **Concurrent index creation**:
CREATE INDEX CONCURRENTLY avoids blocking writes but can't run in transactions.- **Volatile defaults cause rewrites**: adding
NOT NULL columns with volatile defaults (e.g., now(), gen_random_uuid()) rewrites entire table. Non-volatile defaults are fast.- **Drop constraints before columns**:
ALTER TABLE DROP CONSTRAINT then DROP COLUMN to avoid dependency issues.- **Function signature changes**:
CREATE OR REPLACE with different arguments creates overloads, not replacements. DROP old version if no overload desired.## Generated Columns
-
... GENERATED ALWAYS AS (<expr>) STORED for computed, indexable fields. PG18+ adds VIRTUAL columns (computed on read, not stored).## Extensions
- **
pgcrypto**: crypt() for password hashing.- **
uuid-ossp**: alternative UUID functions; prefer pgcrypto for new projects.- **
pg_trgm**: fuzzy text search with % operator, similarity() function. Index with GIN for LIKE '%pattern%' acceleration.- **
citext**: case-insensitive text type. Prefer expression indexes on LOWER(col) unless you need case-insensitive constraints.- **
btree_gin/btree_gist**: enable mixed-type indexes (e.g., GIN index on both JSONB and text columns).- **
hstore**: key-value pairs; mostly superseded by JSONB but useful for simple string mappings.- **
timescaledb**: essential for time-seriesâautomated partitioning, retention, compression, continuous aggregates.- **
postgis**: comprehensive geospatial support beyond basic geometric typesâessential for location-based applications.- **
pgvector**: vector similarity search for embeddings.- **
pgaudit**: audit logging for all database activity.## JSONB Guidance
- Prefer
JSONB with **GIN** index.- Default:
CREATE INDEX ON tbl USING GIN (jsonb_col); â accelerates:- **Containment**
jsonb_col @> '{"k":"v"}'- **Key existence**
jsonb_col ? 'k', **any/all keys** ?\|, ?&- **Path containment** on nested docs
- **Disjunction**
jsonb_col @> ANY(ARRAY['{"status":"active"}', '{"status":"pending"}'])- Heavy
@> workloads: consider opclass jsonb_path_ops for smaller/faster containment-only indexes:-
CREATE INDEX ON tbl USING GIN (jsonb_col jsonb_path_ops);- **Trade-off**: loses support for key existence (
?, ?|, ?&) queriesâonly supports containment (@>)- Equality/range on a specific scalar field: extract and index with B-tree (generated column or expression):
-
ALTER TABLE tbl ADD COLUMN price INT GENERATED ALWAYS AS ((jsonb_col->>'price')::INT) STORED;-
CREATE INDEX ON tbl (price);- Prefer queries like
WHERE price BETWEEN 100 AND 500 (uses B-tree) over WHERE (jsonb_col->>'price')::INT BETWEEN 100 AND 500 without index.- Arrays inside JSONB: use GIN +
@> for containment (e.g., tags). Consider jsonb_path_ops if only doing containment.- Keep core relations in tables; use JSONB for optional/variable attributes.
- Use constraints to limit allowed JSONB values in a column e.g.
config JSONB NOT NULL CHECK(jsonb_typeof(config) = 'object')## Examples
### Users
CREATE TABLE users (
user_id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
email TEXT NOT NULL UNIQUE,
name TEXT NOT NULL,
created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
CREATE UNIQUE INDEX ON users (LOWER(email));
CREATE INDEX ON users (created_at);### Orders
CREATE TABLE orders (
order_id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
user_id BIGINT NOT NULL REFERENCES users(user_id),
status TEXT NOT NULL DEFAULT 'PENDING' CHECK (status IN ('PENDING','PAID','CANCELED')),
total NUMERIC(10,2) NOT NULL CHECK (total > 0),
created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
CREATE INDEX ON orders (user_id);
CREATE INDEX ON orders (created_at);### JSONB
CREATE TABLE profiles (
user_id BIGINT PRIMARY KEY REFERENCES users(user_id),
attrs JSONB NOT NULL DEFAULT '{}',
theme TEXT GENERATED ALWAYS AS (attrs->>'theme') STORED
);
CREATE INDEX profiles_attrs_gin ON profiles USING GIN (attrs);How to Use This Skill Unit
Option A: Project-Specific (Recommended)
- Click "Download" above
- In your project, create the directory:
.agent/skills/postgresql/ - Save the file as
SKILL.md - The agent will automatically discover the skill based on its description.
Option B: Global Installation (All Agents)
Save the file to these locations to make it available across all projects:
- Claude Code:
~/.claude/skills/wshobson/agents/postgresql/SKILL.md - Cursor:
~/.cursor/skills/wshobson/agents/postgresql/SKILL.md - Antigravity:
~/.gemini/antigravity/skills/wshobson/agents/postgresql/SKILL.md
đ Install with CLI:npx skills add wshobson/agents