Introduction to PostgreSQL

PostgreSQL, often simply called “Postgres,” is a powerful, open-source object-relational database management system (ORDBMS). First released in 1996, with roots in the POSTGRES project begun at the University of California, Berkeley, in 1986, it has earned a reputation for reliability, feature robustness, and performance.

Key features of PostgreSQL include:

  • ACID (Atomicity, Consistency, Isolation, Durability) compliance
  • Support for complex queries
  • Multi-Version Concurrency Control (MVCC)
  • Extensibility
  • Strong standards compliance

For developers coming from other database systems, PostgreSQL offers a familiar SQL interface with some unique advantages. Its ability to handle complex workloads, support for JSON, and extensive indexing options make it a versatile choice for a wide range of applications.

Installation and development environment

Before diving into PostgreSQL’s features, let’s set up a development environment. We’ll cover installation on major operating systems and basic configuration.

2.1 Installation

Ubuntu/Debian:


sudo apt update
sudo apt install postgresql postgresql-contrib

macOS (using Homebrew):


brew install postgresql

Windows: Download the installer from the official PostgreSQL website and follow the installation wizard.

2.2 Verifying installation

After installation, verify that PostgreSQL is running:


sudo systemctl status postgresql

On macOS or if using Homebrew:


brew services list

2.3 Basic configuration

PostgreSQL’s main configuration file is postgresql.conf. Its location varies by system, but you can find it using:


sudo -u postgres psql -c "SHOW config_file"

Key configuration parameters include:

  • max_connections: Maximum number of concurrent connections
  • shared_buffers: Memory used for caching data
  • effective_cache_size: Estimate of memory available for disk caching
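You can check a parameter's current value from psql, and change most of them without editing the file by hand (ALTER SYSTEM writes the override to postgresql.auto.conf):


SHOW shared_buffers;
ALTER SYSTEM SET shared_buffers = '512MB';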

Always restart PostgreSQL after changing configuration:


sudo systemctl restart postgresql

Basic PostgreSQL concepts

Before we start using PostgreSQL, let’s familiarize ourselves with some core concepts:

3.1 Clusters

A database cluster is a collection of databases managed by a single PostgreSQL server instance and stored in a single data directory.

3.2 Databases

A database in PostgreSQL is a named collection of SQL objects (tables, views, functions, etc.). Each cluster can contain multiple databases.

3.3 Schemas

Schemas are namespaces within a database that allow you to organize database objects. They help avoid naming conflicts and can be used for access control.
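For example, two schemas can each hold a table with the same name (the names here are illustrative):


CREATE SCHEMA sales;
CREATE SCHEMA archive;

CREATE TABLE sales.orders (id SERIAL PRIMARY KEY);
CREATE TABLE archive.orders (id SERIAL PRIMARY KEY);

SELECT * FROM sales.orders;  -- schema-qualified reference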

3.4 Tables

Tables are the primary data storage structures in PostgreSQL. They consist of rows (records) and columns (fields).

3.5 Views

Views are virtual tables based on the result of a SQL query. They don’t store data themselves but provide a way to encapsulate complex queries.

3.6 Functions and procedures

PostgreSQL supports user-defined functions and stored procedures, allowing you to encapsulate logic within the database.
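As a minimal sketch (the function name and the 20% rate are purely illustrative):


CREATE FUNCTION add_tax(price NUMERIC) RETURNS NUMERIC AS $$
    SELECT price * 1.20;
$$ LANGUAGE SQL;

SELECT add_tax(100);  -- returns 120.00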

3.7 Roles and privileges

PostgreSQL uses roles for authentication and authorization. Roles can be assigned various privileges to control access to database objects.

Data types

PostgreSQL offers a rich set of built-in data types. Understanding these types is crucial for efficient schema design and query performance.

4.1 Numeric types

  • INTEGER: 4-byte signed integer
  • BIGINT: 8-byte signed integer
  • NUMERIC or DECIMAL: Exact numeric with selectable precision
  • REAL: 4-byte floating-point number
  • DOUBLE PRECISION: 8-byte floating-point number

4.2 Character types

  • CHAR(n): Fixed-length character string
  • VARCHAR(n): Variable-length character string with limit
  • TEXT: Variable-length character string without limit

4.3 Date/Time types

  • DATE: Calendar date (year, month, day)
  • TIME: Time of day
  • TIMESTAMP: Date and time
  • INTERVAL: Time span
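Date/time arithmetic works directly with these types; a few illustrative queries:


SELECT CURRENT_DATE + INTERVAL '2 days';
SELECT DATE '2024-03-01' - DATE '2024-01-01' AS days_between;  -- 60
SELECT AGE(TIMESTAMP '2024-01-01', TIMESTAMP '2023-01-01');    -- 1 year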

4.4 Boolean type

  • BOOLEAN: True or false

4.5 Array types

PostgreSQL allows columns of a table to be defined as variable-length multidimensional arrays of any built-in or user-defined type.

Example:

CREATE TABLE matrix (
    id SERIAL PRIMARY KEY,
    data INTEGER[][]
);

INSERT INTO matrix (data) VALUES (ARRAY[[1,2,3], [4,5,6], [7,8,9]]);

4.6 JSON types

  • JSON: Stores JSON data that gets reparsed on each operation
  • JSONB: Stores JSON data in a binary format for faster processing
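A quick JSONB sketch (the table and keys are illustrative):


CREATE TABLE events (
    id SERIAL PRIMARY KEY,
    payload JSONB
);

INSERT INTO events (payload) VALUES ('{"type": "click", "page": "/home"}');

SELECT payload ->> 'type' AS event_type FROM events;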

4.7 Special types

  • UUID: Universally Unique Identifiers
  • BYTEA: Binary strings (“byte arrays”)
  • CIDR, INET, MACADDR: Network address types
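Since PostgreSQL 13, UUIDs can be generated without installing any extension:


SELECT gen_random_uuid();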

4.8 Composite types

You can create custom types that combine multiple fields:

CREATE TYPE inventory_item AS (
    name TEXT,
    supplier_id INTEGER,
    price NUMERIC
);

CREATE TABLE on_hand (
    item inventory_item,
    count INTEGER
);

Understanding these data types allows you to design efficient schemas and write performant queries. In the next section, we’ll dive into creating and managing databases in PostgreSQL.

Setting up PostgreSQL 16

5.1 Detailed installation process

Ubuntu/Debian:


# Create the file repository configuration
sudo sh -c 'echo "deb https://apt.postgresql.org/pub/repos/apt $(lsb_release -cs)-pgdg main" > /etc/apt/sources.list.d/pgdg.list'

# Import the repository signing key
# (apt-key is deprecated on recent Debian/Ubuntu releases; prefer a signed-by keyring there)
wget --quiet -O - https://www.postgresql.org/media/keys/ACCC4CF8.asc | sudo apt-key add -

# Update the package lists
sudo apt update

# Install PostgreSQL 16
sudo apt install postgresql-16 postgresql-contrib-16

Red Hat/CentOS:


# Install the repository RPM
sudo dnf install -y https://download.postgresql.org/pub/repos/yum/reporpms/EL-8-x86_64/pgdg-redhat-repo-latest.noarch.rpm

# Disable the built-in PostgreSQL module
sudo dnf -qy module disable postgresql

# Install PostgreSQL 16
sudo dnf install -y postgresql16-server

# Initialize the database
sudo /usr/pgsql-16/bin/postgresql-16-setup initdb

# Start and enable the service
sudo systemctl start postgresql-16
sudo systemctl enable postgresql-16

macOS (using Homebrew):


# Install Homebrew if not already installed
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

# Install PostgreSQL 16
brew install postgresql@16

# Start the PostgreSQL service
brew services start postgresql@16

5.2 Post-installation setup

After installation, follow these steps:

1. Set a password for the default ‘postgres’ user:


sudo -u postgres psql
ALTER USER postgres WITH PASSWORD 'your_secure_password';
\q

2. Create a new database and user:


sudo -u postgres createuser --interactive
sudo -u postgres createdb myapp

3. Configure client authentication by editing the pg_hba.conf file:


sudo nano /etc/postgresql/16/main/pg_hba.conf

Adjust the lines for local connections according to your security needs.
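For example, to keep peer authentication on the local socket while requiring passwords over local TCP (adapt to your environment):


# TYPE  DATABASE  USER  ADDRESS        METHOD
local   all       all                  peer
host    all       all   127.0.0.1/32   scram-sha-256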

5.3 Configuring PostgreSQL 16

The main configuration file is still postgresql.conf. Here are some important settings, with some adjustments specific to PostgreSQL 16:

1. Memory settings:


shared_buffers = 256MB  # 25% of RAM for dedicated servers
work_mem = 4MB          # Depends on max_connections and workload
maintenance_work_mem = 64MB

2. Checkpoint settings:


checkpoint_timeout = 15min
max_wal_size = 1GB
min_wal_size = 80MB

3. Query planner settings:


random_page_cost = 1.1  # For SSD storage
effective_cache_size = 3GB  # 50-75% of total RAM

4. Logging settings:


log_destination = 'csvlog'
logging_collector = on
log_directory = 'pg_log'
log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log'
log_rotation_age = 1d
log_rotation_size = 100MB
log_min_duration_statement = 250ms

5. Replication settings:


wal_keep_size = 1GB  # wal_keep_size replaced wal_keep_segments in PostgreSQL 13

Remember to restart PostgreSQL after making changes:


sudo systemctl restart postgresql@16-main

5.4 Setting up pgAdmin 4

For PostgreSQL 16, make sure you have the latest version of pgAdmin:

1. Install pgAdmin 4: For Ubuntu/Debian:


# Add the pgAdmin repository
curl https://www.pgadmin.org/static/packages_pgadmin_org.pub | sudo apt-key add -
sudo sh -c 'echo "deb https://ftp.postgresql.org/pub/pgadmin/pgadmin4/apt/$(lsb_release -cs) pgadmin4 main" > /etc/apt/sources.list.d/pgadmin4.list'

# Install pgAdmin
sudo apt update
sudo apt install pgadmin4

For macOS:


brew install --cask pgadmin4

2. Launch pgAdmin and create a new server connection as before, ensuring you select PostgreSQL 16 as the version.

5.5 Setting up connection pooling with PgBouncer

The PgBouncer setup remains largely the same, but ensure it’s compatible with PostgreSQL 16:

1. Install PgBouncer:


sudo apt install pgbouncer

2. Configure PgBouncer by editing /etc/pgbouncer/pgbouncer.ini:


[databases]
* = host=localhost port=5432

[pgbouncer]
listen_port = 6432
listen_addr = *
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = session
max_client_conn = 1000
default_pool_size = 20

3. Create the auth file and start PgBouncer.
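A minimal sketch of that step, assuming a single application user (replace the credentials with your own):


# Create the auth file -- format: "username" "password"
echo '"myapp_user" "my_secure_password"' | sudo tee /etc/pgbouncer/userlist.txt

# Start and enable the service
sudo systemctl enable --now pgbouncer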

5.6 Verifying your PostgreSQL 16 setup

1. Check PostgreSQL status:


sudo systemctl status postgresql@16-main

2. Connect to PostgreSQL:


psql -U postgres -d postgres

3. Run a simple query to check the version:


SELECT version();

This should display PostgreSQL 16.x.

4. Check server logs:


sudo tail -f /var/log/postgresql/postgresql-16-main.log

5. Verify PgBouncer (if installed):


psql -p 6432 -U postgres pgbouncer
SHOW POOLS;

By following these steps, you should now have PostgreSQL 16 set up and ready to use. This version includes several improvements over previous versions, including better performance for parallel queries, improved monitoring capabilities, and enhanced logical replication features.

Creating and managing databases

6.1 Creating databases

In PostgreSQL 16, you can create databases using either SQL commands or command-line tools.

Using SQL:


CREATE DATABASE myapp
  WITH 
  OWNER = postgres
  ENCODING = 'UTF8'
  LC_COLLATE = 'en_US.utf8'
  LC_CTYPE = 'en_US.utf8'
  TEMPLATE = template0
  ALLOW_CONNECTIONS = true;

Using command-line:


createdb -U postgres myapp

6.2 Listing databases

To list all databases:

Using SQL:


SELECT datname FROM pg_database;

Using command-line:


psql -U postgres -l

6.3 Connecting to a database

Connect to a specific database:


psql -U postgres -d myapp

6.4 Dropping a database

To delete a database:

Using SQL:


DROP DATABASE myapp;

Using command-line:


dropdb -U postgres myapp

6.5 Database configuration

PostgreSQL 16 allows per-database configuration. You can set parameters for specific databases:


ALTER DATABASE myapp SET work_mem = '16MB';

6.6 Database statistics

PostgreSQL 16 provides enhanced statistics views. To view database statistics:


SELECT * FROM pg_stat_database WHERE datname = 'myapp';

6.7 Database sizes

To check the size of your databases:


SELECT pg_size_pretty(pg_database_size('myapp'));
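To drill down to the largest tables inside the current database:


SELECT relname,
       pg_size_pretty(pg_total_relation_size(relid)) AS total_size
FROM pg_stat_user_tables
ORDER BY pg_total_relation_size(relid) DESC
LIMIT 5;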

6.8 Managing connections

PostgreSQL 16 improved connection management. To view current connections:


SELECT * FROM pg_stat_activity WHERE datname = 'myapp';

To limit connections for a database:


ALTER DATABASE myapp CONNECTION LIMIT 100;

6.9 Database maintenance

Regular maintenance is crucial. Note that VACUUM operates on tables, not databases, so connect to the target database and run it without a name (or name a specific table):


VACUUM (VERBOSE, ANALYZE);

6.10 Logical replication

PostgreSQL 16 enhances logical replication. To set up a publication:


CREATE PUBLICATION mypub FOR TABLE mytable;

And to create a subscription on another server:


CREATE SUBSCRIPTION mysub
  CONNECTION 'host=192.168.1.50 port=5432 dbname=myapp'
  PUBLICATION mypub;

6.11 Backup and restore

PostgreSQL 16 maintains compatibility with pg_dump and pg_restore for logical backups:

Backup:


pg_dump -U postgres -d myapp > myapp_backup.sql

Restore:


psql -U postgres -d myapp < myapp_backup.sql

6.12 Extensions

PostgreSQL 16 supports a wide range of extensions. To list available extensions:


SELECT * FROM pg_available_extensions;

To install an extension:


CREATE EXTENSION pg_stat_statements;

6.13 Tablespaces

Manage tablespaces for optimized storage:


CREATE TABLESPACE fastspace LOCATION '/ssd/postgresql/data';

To move a database to a tablespace:


ALTER DATABASE myapp SET TABLESPACE fastspace;

6.14 Role-based access control

PostgreSQL 16 enhances security with role-based access control:


CREATE ROLE readonly;
GRANT CONNECT ON DATABASE myapp TO readonly;
GRANT USAGE ON SCHEMA public TO readonly;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO readonly;

6.15 Full text search

PostgreSQL 16 improves full-text search capabilities:


CREATE INDEX textsearch_idx ON messages USING GIN (to_tsvector('english', body));

6.16 JSON and JSONB support

PostgreSQL 16 continues to excel in JSON support:


CREATE TABLE orders (
  id SERIAL PRIMARY KEY,
  info JSONB
);

INSERT INTO orders (info) VALUES ('{"customer": "John Doe", "items": [{"product": "Beer", "qty": 6}]}');

6.17 Monitoring and logging

PostgreSQL 16 provides enhanced monitoring capabilities. To enable detailed logging:


ALTER SYSTEM SET log_statement = 'all';
SELECT pg_reload_conf();

This comprehensive overview covers the main aspects of creating and managing databases in PostgreSQL 16. Each of these topics can be explored in much greater depth, depending on your specific needs and use cases.

Tables and schema design

7.1 Creating tables

Basic table creation syntax:


CREATE TABLE employees (
    id SERIAL PRIMARY KEY,
    first_name VARCHAR(50) NOT NULL,
    last_name VARCHAR(50) NOT NULL,
    email VARCHAR(100) UNIQUE,
    hire_date DATE DEFAULT CURRENT_DATE,
    salary NUMERIC(10, 2),
    department_id INTEGER REFERENCES departments(id)
);

7.2 Data types

PostgreSQL 16 supports a wide range of data types:

  • Numeric: INTEGER, BIGINT, NUMERIC, REAL, DOUBLE PRECISION
  • Character: CHAR, VARCHAR, TEXT
  • Date/Time: DATE, TIME, TIMESTAMP, INTERVAL
  • Boolean: BOOLEAN
  • Geometric: POINT, LINE, CIRCLE, POLYGON
  • Network: INET, CIDR, MACADDR
  • JSON: JSON, JSONB
  • Arrays: Any data type can be an array
  • Custom types: CREATE TYPE

Example using various data types:


CREATE TABLE product_inventory (
    id SERIAL PRIMARY KEY,
    name VARCHAR(100) NOT NULL,
    description TEXT,
    price NUMERIC(10, 2),
    created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
    is_available BOOLEAN DEFAULT true,
    tags TEXT[],
    properties JSONB
);

7.3 Constraints

PostgreSQL 16 supports various constraints to ensure data integrity:

  • PRIMARY KEY
  • FOREIGN KEY
  • UNIQUE
  • NOT NULL
  • CHECK
  • EXCLUSION

Example:


CREATE TABLE orders (
    id SERIAL PRIMARY KEY,
    customer_id INTEGER REFERENCES customers(id),
    order_date DATE DEFAULT CURRENT_DATE,
    total_amount NUMERIC(10, 2),
    status VARCHAR(20),
    CONSTRAINT check_positive_amount CHECK (total_amount > 0),
    CONSTRAINT valid_status CHECK (status IN ('Pending', 'Shipped', 'Delivered'))
);

7.4 Indexes

Indexes are crucial for query performance. PostgreSQL 16 supports various index types:

  • B-tree (default)
  • Hash
  • GiST
  • SP-GiST
  • GIN
  • BRIN

Example:


CREATE INDEX idx_employee_last_name ON employees (last_name);
CREATE INDEX idx_product_properties ON product_inventory USING GIN (properties);

7.5 Partitioning

PostgreSQL 16 enhances table partitioning for improved performance with large tables:


CREATE TABLE measurements (
    city_id         int not null,
    logdate         date not null,
    peaktemp        int,
    unitsales       int
) PARTITION BY RANGE (logdate);

CREATE TABLE measurements_y2023m01 PARTITION OF measurements
    FOR VALUES FROM ('2023-01-01') TO ('2023-02-01');

CREATE TABLE measurements_y2023m02 PARTITION OF measurements
    FOR VALUES FROM ('2023-02-01') TO ('2023-03-01');

7.6 Inheritance

While partitioning is preferred for most use cases, table inheritance is still supported:


CREATE TABLE cities (
    name            text,
    population      float,
    elevation       int     -- in feet
);

CREATE TABLE capitals (
    state           char(2)
) INHERITS (cities);

7.7 Schemas

Schemas help organize database objects:


CREATE SCHEMA hr;

CREATE TABLE hr.employees (
    id SERIAL PRIMARY KEY,
    name VARCHAR(100)
);

7.8 Views

Views provide a powerful way to encapsulate complex queries:


CREATE VIEW high_salary_employees AS
    SELECT * FROM employees
    WHERE salary > 100000;

7.9 Materialized views

Materialized views store the result of a query for faster access:


CREATE MATERIALIZED VIEW monthly_sales AS
    SELECT 
        date_trunc('month', order_date) AS month,
        SUM(total_amount) AS total_sales
    FROM orders
    GROUP BY 1;

7.10 Table inheritance vs. partitioning

While both allow splitting data across multiple tables, partitioning is generally preferred for performance reasons. Use inheritance for modeling class hierarchies and partitioning for large tables that need to be split based on certain criteria.

7.11 Generated columns

PostgreSQL 16 supports generated columns:


CREATE TABLE employees (
    id SERIAL PRIMARY KEY,
    first_name VARCHAR(50),
    last_name VARCHAR(50),
    full_name VARCHAR(101) GENERATED ALWAYS AS (first_name || ' ' || last_name) STORED
);

7.12 Temporary tables

Temporary tables are automatically dropped at the end of a session:


CREATE TEMPORARY TABLE temp_results (
    id INT,
    result VARCHAR(100)
);

7.13 Unlogged tables

Unlogged tables are faster but not crash-safe:


CREATE UNLOGGED TABLE fast_imports (
    id SERIAL,
    data TEXT
);

7.14 Table spaces

Tablespaces allow you to define locations on the file system where the files representing database objects can be stored:


CREATE TABLESPACE fastspace LOCATION '/ssd/postgresql/data';
CREATE TABLE fast_access_table (id INT, data TEXT) TABLESPACE fastspace;

7.15 Column storage and compression

PostgreSQL handles large field values automatically with TOAST (The Oversized-Attribute Storage Technique), compressing them and/or storing them out of line:


CREATE TABLE documents (
    id SERIAL PRIMARY KEY,
    content TEXT,
    CONSTRAINT content_length CHECK (length(content) < 1000000)
) WITH (fillfactor=80);
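You can also influence TOAST behavior per column; note that SET COMPRESSION requires PostgreSQL 14 or later and LZ4 support in the server build:


-- Store values out of line without compression:
ALTER TABLE documents ALTER COLUMN content SET STORAGE EXTERNAL;

-- Or keep compression but switch the algorithm to LZ4:
ALTER TABLE documents ALTER COLUMN content SET STORAGE EXTENDED;
ALTER TABLE documents ALTER COLUMN content SET COMPRESSION lz4;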

7.16 System columns

Every table has several system columns that are implicitly defined by the system:

  • tableoid
  • xmin
  • cmin
  • xmax
  • cmax
  • ctid

You can access these for advanced operations:


SELECT xmin, xmax, * FROM mytable;

This overview covers the main aspects of table and schema design in PostgreSQL 16. Each of these topics can be explored in greater depth depending on your specific requirements.

SQL basics

8.1 SELECT statements

The SELECT statement is the foundation of querying data in PostgreSQL.

Basic SELECT:


SELECT first_name, last_name, salary FROM employees;

Using WHERE clause:


SELECT * FROM employees WHERE department_id = 3;

ORDER BY clause:


SELECT * FROM employees ORDER BY salary DESC, last_name ASC;

LIMIT and OFFSET:


SELECT * FROM employees ORDER BY id LIMIT 10 OFFSET 20;

8.2 JOINs

PostgreSQL supports various types of JOINs.

INNER JOIN:


SELECT e.first_name, e.last_name, d.name AS department
FROM employees e
INNER JOIN departments d ON e.department_id = d.id;

LEFT JOIN:


SELECT c.name, o.id AS order_id
FROM customers c
LEFT JOIN orders o ON c.id = o.customer_id;

FULL OUTER JOIN:


SELECT e.name AS employee, p.name AS project
FROM employees e
FULL OUTER JOIN projects p ON e.project_id = p.id;

8.3 Aggregations

GROUP BY and aggregate functions:


SELECT department_id, AVG(salary) AS avg_salary, COUNT(*) AS employee_count
FROM employees
GROUP BY department_id
HAVING COUNT(*) > 5;

8.4 Subqueries

Subqueries can be used in various parts of a SQL statement:


SELECT first_name, last_name
FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees);

8.5 Common table expressions (CTEs)

CTEs provide a way to write auxiliary statements for use in a larger query:


WITH high_salary_employees AS (
    SELECT * FROM employees WHERE salary > 100000
)
SELECT department_id, COUNT(*) 
FROM high_salary_employees 
GROUP BY department_id;

8.6 Window functions

Window functions perform calculations across a set of rows:


SELECT 
    first_name, 
    last_name, 
    salary,
    RANK() OVER (ORDER BY salary DESC) AS salary_rank
FROM employees;

8.7 INSERT statements

Basic INSERT:


INSERT INTO employees (first_name, last_name, email, salary)
VALUES ('John', 'Doe', '[email protected]', 75000);

Multi-row INSERT:


INSERT INTO employees (first_name, last_name, email, salary)
VALUES 
    ('Jane', 'Smith', '[email protected]', 80000),
    ('Bob', 'Johnson', '[email protected]', 72000);

INSERT from SELECT:


INSERT INTO high_salary_employees
SELECT * FROM employees WHERE salary > 100000;

8.8 UPDATE statements

Basic UPDATE:


UPDATE employees
SET salary = salary * 1.1
WHERE department_id = 3;

UPDATE with JOIN:


UPDATE employees e
SET salary = salary * 1.15
FROM departments d
WHERE e.department_id = d.id AND d.name = 'Sales';

8.9 DELETE statements

Basic DELETE:


DELETE FROM employees WHERE last_login < '2022-01-01';

DELETE with subquery:


DELETE FROM orders
WHERE customer_id IN (SELECT id FROM customers WHERE last_purchase < '2022-01-01');

8.10 UPSERT (INSERT ON CONFLICT)

PostgreSQL provides an elegant way to handle insert-or-update scenarios:


INSERT INTO products (code, name, price)
VALUES ('A001', 'Widget', 19.99)
ON CONFLICT (code) 
DO UPDATE SET price = EXCLUDED.price;

8.11 RETURNING clause

The RETURNING clause allows you to retrieve data from rows affected by an INSERT, UPDATE, or DELETE:


UPDATE employees
SET salary = salary * 1.1
WHERE department_id = 3
RETURNING id, first_name, last_name, salary AS new_salary;

8.12 Lateral joins

LATERAL joins allow subqueries in the FROM clause to reference columns from preceding items:


SELECT e.name, e.department_id, latest_orders.order_id, latest_orders.order_date
FROM employees e,
LATERAL (
    SELECT order_id, order_date 
    FROM orders 
    WHERE orders.employee_id = e.id 
    ORDER BY order_date DESC 
    LIMIT 3
) AS latest_orders;

8.13 Full-text search

PostgreSQL offers powerful full-text search capabilities:


CREATE INDEX idx_fts_title ON books USING GIN (to_tsvector('english', title));

SELECT title
FROM books
WHERE to_tsvector('english', title) @@ to_tsquery('english', 'postgresql & database');

8.14 JSON operations

PostgreSQL has robust support for JSON data:


SELECT info -> 'customer' ->> 'name' AS customer_name
FROM orders
WHERE info -> 'items' @> '[{"product": "Laptop"}]';

8.15 Arrays

Working with array columns:


SELECT * FROM employees
WHERE skills && ARRAY['Python', 'SQL'];

UPDATE employees
SET skills = array_append(skills, 'JavaScript')
WHERE id = 1;

8.16 Regular expressions

PostgreSQL supports POSIX regular expressions:


SELECT * FROM employees
WHERE email ~ '^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$';

This overview covers the fundamental SQL operations in PostgreSQL, along with some of its more advanced features. Each of these topics can be explored in much greater depth, and there are many more advanced features and optimizations available in PostgreSQL.

Advanced queries and JOINs

9.1 Advanced JOIN techniques

9.1.1 Self-joins

Useful for hierarchical or comparative data within a table:


SELECT e1.name AS employee, e2.name AS manager
FROM employees e1
LEFT JOIN employees e2 ON e1.manager_id = e2.id;

9.1.2 Cross joins

Generates a Cartesian product of two tables:


SELECT products.name, colors.name
FROM products
CROSS JOIN colors;

9.1.3 Natural joins

Automatically join tables based on columns with the same name:


SELECT * FROM employees NATURAL JOIN departments;

Use natural joins cautiously: they silently join on every pair of identically named columns, so a schema change can quietly change the result.

9.2 Set operations

9.2.1 UNION and UNION ALL

Combines result sets of two or more SELECT statements:


(SELECT name FROM employees)
UNION
(SELECT name FROM customers);

-- UNION ALL doesn't remove duplicates
(SELECT product_id FROM orders_2022)
UNION ALL
(SELECT product_id FROM orders_2023);

9.2.2 INTERSECT

Returns rows that appear in both result sets:


(SELECT customer_id FROM orders_2022)
INTERSECT
(SELECT customer_id FROM orders_2023);

9.2.3 EXCEPT

Returns rows from the first query that don’t appear in the second query:


(SELECT customer_id FROM all_customers)
EXCEPT
(SELECT customer_id FROM orders);

9.3 Advanced subqueries

9.3.1 Correlated subqueries

Subqueries that depend on the outer query:


SELECT e.name, e.salary
FROM employees e
WHERE salary > (
    SELECT AVG(salary) 
    FROM employees 
    WHERE department_id = e.department_id
);

9.3.2 Subqueries in SELECT

Using subqueries to compute values in SELECT:


SELECT 
    name,
    salary,
    (SELECT AVG(salary) FROM employees) as company_avg,
    salary - (SELECT AVG(salary) FROM employees) as diff_from_avg
FROM employees;

9.3.3 Subqueries with EXISTS

Checks for the existence of rows that satisfy a subquery:


SELECT name
FROM departments d
WHERE EXISTS (
    SELECT 1 
    FROM employees e 
    WHERE e.department_id = d.id AND salary > 100000
);

9.4 Advanced aggregations

9.4.1 GROUPING SETS

Allows specification of multiple groupings in a single query:


SELECT 
    COALESCE(department, 'All Departments') as department,
    COALESCE(job_title, 'All Jobs') as job,
    AVG(salary)
FROM employees
GROUP BY GROUPING SETS ((department, job_title), (department), (job_title), ());

9.4.2 ROLLUP

Produces a result set including all subtotals and a grand total:


SELECT 
    COALESCE(department, 'Total') as department,
    COALESCE(job_title, 'Subtotal') as job,
    SUM(salary)
FROM employees
GROUP BY ROLLUP (department, job_title);

9.4.3 CUBE

Generates all possible combinations of the specified dimensions:


SELECT 
    COALESCE(department, 'All') as department,
    COALESCE(job_title, 'All') as job,
    SUM(salary)
FROM employees
GROUP BY CUBE (department, job_title);

9.5 Window functions

9.5.1 Ranking functions


SELECT 
    name, 
    department,
    salary,
    RANK() OVER (PARTITION BY department ORDER BY salary DESC) as dept_salary_rank,
    DENSE_RANK() OVER (ORDER BY salary DESC) as overall_salary_rank
FROM employees;

9.5.2 Offset functions


SELECT 
    name,
    department,
    hire_date,
    salary,
    LAG(salary) OVER (PARTITION BY department ORDER BY hire_date) as prev_salary,
    LEAD(salary) OVER (PARTITION BY department ORDER BY hire_date) as next_salary
FROM employees;

9.5.3 Aggregate window functions


SELECT 
    name,
    department,
    salary,
    AVG(salary) OVER (PARTITION BY department) as dept_avg_salary,
    salary - AVG(salary) OVER (PARTITION BY department) as diff_from_dept_avg
FROM employees;

9.6 Recursive queries

Useful for hierarchical or tree-structured data:


WITH RECURSIVE subordinates AS (
    SELECT id, name, manager_id
    FROM employees
    WHERE name = 'Jane Doe'
    UNION ALL
    SELECT e.id, e.name, e.manager_id
    FROM employees e
    INNER JOIN subordinates s ON s.id = e.manager_id
)
SELECT * FROM subordinates;

9.7 LATERAL Joins

Allows subqueries in the FROM clause to reference columns from preceding items:


SELECT 
    d.name AS department,
    e.name AS top_employee,
    e.salary
FROM 
    departments d,
    LATERAL (
        SELECT name, salary 
        FROM employees 
        WHERE department_id = d.id 
        ORDER BY salary DESC 
        LIMIT 1
    ) e;

9.8 Full-text search with advanced options


CREATE INDEX idx_fts_document ON documents USING GIN (to_tsvector('english', content));

SELECT title, ts_rank(to_tsvector('english', content), query) AS rank
FROM documents, to_tsquery('english', 'postgresql & database') query
WHERE to_tsvector('english', content) @@ query
ORDER BY rank DESC;

9.9 Complex JSON queries


SELECT 
    id,
    info -> 'customer' ->> 'name' AS customer_name,
    jsonb_array_elements(info -> 'items') ->> 'product' AS product,
    (jsonb_array_elements(info -> 'items') ->> 'price')::numeric AS price
FROM orders
WHERE (info -> 'total')::numeric > 1000;

These advanced querying techniques showcase the power and flexibility of PostgreSQL 16. They allow for complex data analysis, hierarchical data querying, and sophisticated data manipulations. Each of these concepts can be explored further and combined in various ways to solve complex data problems.

Indexing and performance optimization

10.1 Types of indexes

PostgreSQL 16 supports several types of indexes, each suited for different scenarios:

10.1.1 B-tree indexes

The default and most common index type, suitable for equality and range queries:


CREATE INDEX idx_employee_last_name ON employees (last_name);

10.1.2 Hash indexes

Optimized for equality comparisons:


CREATE INDEX idx_product_code ON products USING HASH (product_code);

10.1.3 GiST (Generalized Search Tree) indexes

Useful for full-text search, geometric data, and custom data types:


CREATE INDEX idx_geo_location ON stores USING GIST (location);

10.1.4 SP-GiST (Space-Partitioned GiST) indexes

Good for clustered data and tree-like structures:


CREATE INDEX idx_ip_address ON network_logs USING SPGIST (ip_address inet_ops);

10.1.5 GIN (Generalized Inverted Index) indexes

Efficient for multi-value columns like arrays and full-text search:


CREATE INDEX idx_tags ON products USING GIN (tags);

10.1.6 BRIN (Block Range INdex) indexes

Suitable for very large tables with natural clustering:


CREATE INDEX idx_timestamp ON large_table USING BRIN (timestamp);

10.2 Composite indexes

Indexes on multiple columns can be more efficient for queries that filter on these columns together:


CREATE INDEX idx_employee_dept_hire_date ON employees (department_id, hire_date);

10.3 Partial indexes

Indexes on a subset of rows can be smaller and more efficient:


CREATE INDEX idx_active_users ON users (last_login) WHERE active = true;

10.4 Expression indexes

Indexes can be created on expressions for queries that frequently use these expressions:


CREATE INDEX idx_lower_email ON users (lower(email));

10.5 Unique indexes

Enforce uniqueness while providing fast lookups:


CREATE UNIQUE INDEX idx_unique_email ON users (email);

10.6 Index-only scans

Design indexes to cover queries for optimal performance:


CREATE INDEX idx_employee_coverage ON employees (id, first_name, last_name, email);
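Since PostgreSQL 11 you can instead add non-key columns with INCLUDE, keeping the searchable part of the index small:


CREATE INDEX idx_employee_names ON employees (last_name) INCLUDE (first_name, email);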

10.7 Analyzing query performance

10.7.1 EXPLAIN

View the query plan:


EXPLAIN SELECT * FROM employees WHERE department_id = 5;

10.7.2 EXPLAIN ANALYZE

Execute the query and show actual timings:


EXPLAIN ANALYZE SELECT * FROM employees WHERE salary > 100000;

10.8 Query optimization techniques

10.8.1 Rewriting queries

Sometimes, rewriting a query can lead to better performance:


-- Instead of:
SELECT * FROM orders WHERE extract(year from order_date) = 2023;

-- Use:
SELECT * FROM orders WHERE order_date >= '2023-01-01' AND order_date < '2024-01-01';

10.8.2 Proper JOIN order

The planner normally picks the join order itself from table statistics; keep statistics fresh with ANALYZE so it can start from the smaller input:


SELECT * FROM small_table s
JOIN large_table l ON s.id = l.small_id;

10.8.3 Avoid functions on indexed columns

Functions on indexed columns can prevent index usage:


-- Instead of:
SELECT * FROM employees WHERE UPPER(last_name) = 'SMITH';

-- Use:
SELECT * FROM employees WHERE last_name = 'Smith';

10.9 Table partitioning

Divide large tables into smaller, more manageable chunks:


CREATE TABLE measurements (
    logdate         date not null,
    peaktemp        int,
    unitsales       int
) PARTITION BY RANGE (logdate);

CREATE TABLE measurements_y2023m01 PARTITION OF measurements
    FOR VALUES FROM ('2023-01-01') TO ('2023-02-01');

CREATE TABLE measurements_y2023m02 PARTITION OF measurements
    FOR VALUES FROM ('2023-02-01') TO ('2023-03-01');

10.10 Parallel query execution

PostgreSQL 16 can utilize multiple CPU cores for query execution:


SET max_parallel_workers_per_gather = 4;

10.11 Materialized views

Pre-compute and store complex query results:


CREATE MATERIALIZED VIEW mv_monthly_sales AS
SELECT 
    date_trunc('month', order_date) AS month,
    SUM(total_amount) AS total_sales
FROM orders
GROUP BY 1;

REFRESH MATERIALIZED VIEW mv_monthly_sales;

10.12 Vacuuming and analyzing

Regular maintenance is crucial for performance:


VACUUM ANALYZE;

10.13 JIT (just-in-time) compilation

PostgreSQL can compile parts of query execution plans to machine code:


SET jit = on;

10.14 Optimizing configuration

Adjust PostgreSQL configuration for your hardware and workload:


ALTER SYSTEM SET shared_buffers = '1GB';
ALTER SYSTEM SET effective_cache_size = '3GB';
ALTER SYSTEM SET work_mem = '16MB';

10.15 Monitoring and logging

Enable logging to identify slow queries:


ALTER SYSTEM SET log_min_duration_statement = '1000';  -- Log queries taking more than 1 second
SELECT pg_reload_conf();

10.16 Using pg_stat_statements

Enable pg_stat_statements (the module must also be listed in shared_preload_libraries) and use it for query performance analysis:


CREATE EXTENSION pg_stat_statements;

SELECT query, calls, total_exec_time, rows, 100.0 * shared_blks_hit /
    nullif(shared_blks_hit + shared_blks_read, 0) AS hit_percent
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 5;

These indexing and performance optimization techniques form the foundation for maintaining a fast and efficient PostgreSQL 16 database. Remember that optimization is an iterative process, and what works best can depend on your specific data, queries, and hardware.

Transactions and ACID properties

11.1 Understanding ACID properties

PostgreSQL fully supports ACID properties:

  • Atomicity: All operations in a transaction succeed or all fail.
  • Consistency: The database remains in a consistent state before and after the transaction.
  • Isolation: Concurrent transactions do not interfere with each other.
  • Durability: Once a transaction is committed, it remains so even in the event of system failure.

11.2 Basic transaction control

11.2.1 Starting a transaction


BEGIN;
-- or
START TRANSACTION;

11.2.2 Committing a transaction


COMMIT;

11.2.3 Rolling back a transaction


ROLLBACK;

11.2.4 Example of a transaction


BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
COMMIT;

11.3 Savepoints

Savepoints allow you to roll back parts of a transaction:


BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
SAVEPOINT my_savepoint;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
-- Oops, wrong account
ROLLBACK TO my_savepoint;
UPDATE accounts SET balance = balance + 100 WHERE id = 3;
COMMIT;

11.4 Transaction isolation levels

PostgreSQL 16 accepts all four standard isolation levels, although READ UNCOMMITTED behaves identically to READ COMMITTED:

11.4.1 Read uncommitted


SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;

11.4.2 Read committed (default in PostgreSQL)


SET TRANSACTION ISOLATION LEVEL READ COMMITTED;

11.4.3 Repeatable read


SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;

11.4.4 Serializable


SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;

11.5 Understanding isolation levels

Each isolation level prevents certain phenomena:

  • Dirty read: Reading data from an uncommitted transaction.
  • Non-repeatable read: Getting different results when reading the same data twice in the same transaction.
  • Phantom read: A transaction re-executing a query gets a different set of rows than the first execution.

PostgreSQL’s implementation prevents dirty reads in all isolation levels.
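A two-session sketch of how REPEATABLE READ prevents non-repeatable reads (session markers are comments; the values are illustrative):


-- Session A
BEGIN ISOLATION LEVEL REPEATABLE READ;
SELECT balance FROM accounts WHERE id = 1;  -- e.g. 500

-- Session B, while A is still open
UPDATE accounts SET balance = 600 WHERE id = 1;

-- Session A, same transaction
SELECT balance FROM accounts WHERE id = 1;  -- still 500: the snapshot is fixed
COMMIT;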

11.6 Explicit locking

11.6.1 Table-level locks


LOCK TABLE accounts IN EXCLUSIVE MODE;

11.6.2 Row-level locks


SELECT * FROM accounts WHERE id = 1 FOR UPDATE;

11.7 Deadlock detection

PostgreSQL automatically detects and resolves deadlocks:


SET deadlock_timeout = '1s';  -- how long to wait on a lock before checking for a deadlock

11.8 Advisory locks

Application-defined locks that don’t lock any database objects:


SELECT pg_advisory_lock(id) FROM accounts WHERE id = 1;
-- Do some work
SELECT pg_advisory_unlock(id) FROM accounts WHERE id = 1;

11.9 Transaction management in applications

When using PostgreSQL with an application, consider:

11.9.1 Connection pooling

Use tools like PgBouncer to manage database connections efficiently.

11.9.2 Error handling

Ensure proper error handling and transaction management in your application code.
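Within the database, PL/pgSQL can at least detect such errors; a sketch (the retry itself must happen in application code, since a failed transaction cannot be resumed):


DO $$
BEGIN
    UPDATE accounts SET balance = balance - 100 WHERE id = 1;
EXCEPTION
    WHEN serialization_failure THEN
        RAISE NOTICE 'Serialization failure: retry the whole transaction';
END $$;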

11.10 Monitoring transactions

11.10.1 Active transactions


SELECT * FROM pg_stat_activity WHERE state = 'active';

11.10.2 Long-running transactions


SELECT pid, now() - xact_start AS xact_runtime, query
FROM pg_stat_activity
WHERE xact_start IS NOT NULL
ORDER BY xact_start;

11.11 Two-phase commit (2PC)

For distributed transactions across multiple databases (requires max_prepared_transactions to be set above zero):


BEGIN;
-- Do some work
PREPARE TRANSACTION 'my_transaction';

-- On another system or later time
COMMIT PREPARED 'my_transaction';
-- or
ROLLBACK PREPARED 'my_transaction';

11.12 Transaction logging

PostgreSQL uses Write-Ahead Logging (WAL) to ensure durability:


SHOW wal_level;

11.13 Handling concurrent updates

Use conditional updates to handle concurrent modifications:


UPDATE accounts 
SET balance = balance - 100 
WHERE id = 1 AND balance >= 100;

11.14 Read-only transactions

For queries that don’t modify data:


BEGIN READ ONLY;
SELECT * FROM accounts;
COMMIT;

11.15 Deferrable constraints

Constraints can be checked at the end of a transaction:


CREATE TABLE orders (
    id INT PRIMARY KEY,
    customer_id INT,
    FOREIGN KEY (customer_id) REFERENCES customers(id) DEFERRABLE INITIALLY DEFERRED
);

BEGIN;
INSERT INTO orders (id, customer_id) VALUES (1, 100);
INSERT INTO customers (id, name) VALUES (100, 'New Customer');
COMMIT;

11.16 Statement and session timeouts

statement_timeout limits how long a single statement may run; idle_in_transaction_session_timeout ends sessions that sit idle inside an open transaction:


SET statement_timeout = '5min';
SET idle_in_transaction_session_timeout = '1min';

Understanding and properly implementing transactions and ACID properties is crucial for maintaining data integrity in PostgreSQL 16. These concepts ensure that your database remains consistent even in the face of concurrent access and system failures.

Concurrency control

12.1 Multi-version concurrency control (MVCC)

PostgreSQL uses MVCC to handle concurrent access to the database:

  • Each transaction sees a snapshot of data as it was at the start of the transaction.
  • Updates create new versions of rows, rather than overwriting old data.
  • This allows readers and writers to operate without blocking each other in most cases.

12.2 Transaction isolation levels revisited

Understanding how isolation levels affect concurrency:

12.2.1 Read committed

  • Provides a trade-off between consistency and concurrency.
  • Each SQL statement sees only committed data as of the start of that statement.

12.2.2 Repeatable read

  • Ensures that a transaction sees a consistent snapshot of the database throughout its lifetime.
  • Prevents non-repeatable reads and phantom reads.

12.2.3 Serializable

  • Provides the strictest isolation, simulating serial transaction execution.
  • May lead to serialization failures that require transaction retries.

12.3 Explicit locking mechanisms

12.3.1 Row-level locks


SELECT * FROM accounts WHERE id = 1 FOR UPDATE;

12.3.2 Table-level locks


LOCK TABLE accounts IN EXCLUSIVE MODE;

12.3.3 Advisory locks


SELECT pg_advisory_lock(id) FROM accounts WHERE id = 1;

12.4 Handling concurrent inserts

12.4.1 UNIQUE constraints

Enforce uniqueness even with concurrent inserts:


CREATE UNIQUE INDEX idx_unique_email ON users (email);

12.4.2 INSERT … ON CONFLICT

Handle potential conflicts during insert:


INSERT INTO users (email, name)
VALUES ('[email protected]', 'John Doe')
ON CONFLICT (email) DO UPDATE
SET name = EXCLUDED.name;

12.5 Optimistic locking

Implement optimistic locking using a version column:


UPDATE products
SET stock = stock - 1, version = version + 1
WHERE id = 123 AND version = 1;

If the UPDATE affects zero rows, another session changed the row first: re-read it and retry with the new version number.

12.6 Pessimistic locking

Use FOR UPDATE to lock rows pessimistically:


BEGIN;
SELECT * FROM accounts WHERE id = 1 FOR UPDATE;
-- Process the account
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
COMMIT;

12.7 Deadlock detection and prevention

12.7.1 Automatic deadlock detection

PostgreSQL automatically detects and resolves deadlocks by aborting one of the transactions.

12.7.2 Deadlock prevention

  • Acquire locks in a consistent order across transactions.
  • Use timeouts to prevent indefinite waiting:

SET lock_timeout = '10s';

12.8 SELECT FOR SHARE

Allow concurrent reads while preventing updates:


SELECT * FROM products WHERE id = 1 FOR SHARE;

12.9 NOWAIT and SKIP LOCKED

Handle concurrent access without waiting:


SELECT * FROM tasks WHERE status = 'pending'
FOR UPDATE SKIP LOCKED
LIMIT 1;

12.10 Handling hot spots

For frequently updated rows, consider techniques like:

  • Splitting hot rows into multiple rows
  • Using advisory locks instead of row locks
  • Implementing application-level queues

12.11 Concurrent index creation

Create indexes without blocking writes:


CREATE INDEX CONCURRENTLY idx_user_email ON users (email);

12.12 Monitoring concurrency

12.12.1 Checking for locks


SELECT * FROM pg_locks WHERE NOT granted;

12.12.2 Identifying blocking queries


SELECT blocked_locks.pid AS blocked_pid,
       blocking_locks.pid AS blocking_pid,
       blocked_activity.query AS blocked_statement,
       blocking_activity.query AS blocking_statement
FROM pg_catalog.pg_locks blocked_locks
JOIN pg_catalog.pg_stat_activity blocked_activity ON blocked_activity.pid = blocked_locks.pid
JOIN pg_catalog.pg_locks blocking_locks 
    ON blocking_locks.locktype = blocked_locks.locktype
    AND blocking_locks.database IS NOT DISTINCT FROM blocked_locks.database
    AND blocking_locks.relation IS NOT DISTINCT FROM blocked_locks.relation
    AND blocking_locks.page IS NOT DISTINCT FROM blocked_locks.page
    AND blocking_locks.tuple IS NOT DISTINCT FROM blocked_locks.tuple
    AND blocking_locks.virtualxid IS NOT DISTINCT FROM blocked_locks.virtualxid
    AND blocking_locks.transactionid IS NOT DISTINCT FROM blocked_locks.transactionid
    AND blocking_locks.classid IS NOT DISTINCT FROM blocked_locks.classid
    AND blocking_locks.objid IS NOT DISTINCT FROM blocked_locks.objid
    AND blocking_locks.objsubid IS NOT DISTINCT FROM blocked_locks.objsubid
    AND blocking_locks.pid != blocked_locks.pid
JOIN pg_catalog.pg_stat_activity blocking_activity ON blocking_activity.pid = blocking_locks.pid
WHERE NOT blocked_locks.granted;

12.13 Handling long-running queries

Use statement_timeout to prevent queries from running too long:


SET statement_timeout = '30s';

12.14 Connection pooling

Implement connection pooling (e.g., using PgBouncer) to manage database connections efficiently and reduce connection overhead.

12.15 Partition-level locks

In partitioned tables, locks can be acquired on individual partitions:


SELECT * FROM measurements_y2023m01 FOR UPDATE;

12.16 Conflict resolution in logical replication

PostgreSQL 16 has no declarative conflict-resolution setting: when an incoming change conflicts (for example, violates a unique constraint), the apply worker raises an error and replication pauses. Inspect the failure, then either fix the offending row on the subscriber or skip the failed transaction:


-- Per-subscription error counters (PostgreSQL 15+)
SELECT * FROM pg_stat_subscription_stats;

-- Skip the failed remote transaction; take the LSN from the error in the server log
ALTER SUBSCRIPTION my_subscription SKIP (lsn = '0/14C0378');

12.17 READ ONLY transactions

Use read-only transactions for queries that don’t need to modify data:


BEGIN READ ONLY;
SELECT * FROM accounts;
COMMIT;

Effective concurrency control is vital for maintaining data integrity and optimizing performance in multi-user PostgreSQL 16 environments. It requires a good understanding of PostgreSQL’s MVCC model, proper use of transactions and isolation levels, and careful application design.

Backup and recovery

13.1 Types of backups

13.1.1 Logical backups

  • Contain SQL statements to recreate the database
  • Flexible but slower to restore for large databases

13.1.2 Physical backups

  • Exact copy of the data files
  • Faster to restore but less flexible

13.1.3 Continuous archiving

  • Allows point-in-time recovery (PITR)
  • Combines full backups with Write-Ahead Log (WAL) archiving

13.2 Logical backups with pg_dump

13.2.1 Full database backup


pg_dump dbname > backup.sql

13.2.2 Specific table backup


pg_dump -t tablename dbname > table_backup.sql

13.2.3 Custom format backup (compressed)


pg_dump -Fc dbname > backup.dump

13.3 Logical backups with pg_dumpall

For backing up an entire PostgreSQL cluster, including roles and tablespaces:


pg_dumpall > full_backup.sql

13.4 Restoring logical backups

13.4.1 Plain text format


psql dbname < backup.sql

13.4.2 Custom format


pg_restore -d dbname backup.dump

13.5 Physical backups

13.5.1 File system level backup

This is only consistent if the server is stopped or the copy is taken from an atomic filesystem snapshot:


tar -cvf backup.tar /var/lib/postgresql/data

13.5.2 Using pg_basebackup


pg_basebackup -D /backup/path -Ft -z -P

13.6 Continuous archiving and Point-in-Time recovery (PITR)

13.6.1 Configuring WAL archiving

In postgresql.conf:


wal_level = replica
archive_mode = on
archive_command = 'test ! -f /mnt/server/archivedir/%f && cp %p /mnt/server/archivedir/%f'

13.6.2 Taking a base backup


pg_basebackup -D /backup/base -Ft -z -P

13.6.3 Performing PITR

Since PostgreSQL 12, recovery.conf has been replaced: add the recovery settings to postgresql.conf and create an empty recovery.signal file in the data directory:


restore_command = 'cp /mnt/server/archivedir/%f %p'
recovery_target_time = '2023-07-17 15:00:00'

13.7 Backup validation

Always validate your backups to ensure they can be restored:


pg_restore --list backup.dump

13.8 Incremental backups

While PostgreSQL doesn’t natively support incremental backups, you can achieve similar results using WAL archiving and PITR.

13.9 Logical replication for backup

Use logical replication to maintain a standby server that can serve as a backup:


CREATE PUBLICATION my_publication FOR ALL TABLES;

-- On the standby server:
CREATE SUBSCRIPTION my_subscription 
CONNECTION 'host=primary dbname=mydb' 
PUBLICATION my_publication;

13.10 Backup encryption

Encrypt your backups to protect sensitive data:


pg_dump dbname | gpg -c > encrypted_backup.sql.gpg

13.11 Automated backup scripts

Create scripts to automate your backup process. Here’s a simple example:


#!/bin/bash
BACKUP_DIR="/path/to/backups"
TIMESTAMP=$(date +"%Y%m%d_%H%M%S")
DB_NAME="your_database"

pg_dump -Fc $DB_NAME > $BACKUP_DIR/${DB_NAME}_${TIMESTAMP}.dump

# Keep only the last 7 days of backups
find $BACKUP_DIR -type f -mtime +7 -name '*.dump' -delete

13.12 Monitoring backup process

Use pg_stat_archiver to monitor WAL archiving:


SELECT * FROM pg_stat_archiver;

13.13 Testing recovery procedures

Regularly test your recovery procedures to ensure they work as expected:

  1. Set up a test environment
  2. Restore your backup
  3. Apply WAL files (for PITR)
  4. Verify data integrity

13.14 Backing up large databases

For very large databases, consider:

  • Using parallel dumps (directory format is required): pg_dump -j 4 -Fd -f backup_dir dbname
  • Splitting backups into multiple files
  • Using compression: pg_dump -Fc dbname > backup.dump

13.15 Handling tablespaces in backups

When using tablespaces, ensure your backup strategy includes all tablespace locations:


pg_basebackup -D /backup/base -Ft -z -P -T /path/to/tablespace=/backup/tablespaces

13.16 Backup retention policy

Implement a backup retention policy based on your business needs:

  • Keep daily backups for the last week
  • Keep weekly backups for the last month
  • Keep monthly backups for the last year

13.17 Cloud backup solutions

Consider using cloud storage solutions for off-site backup storage:

  • AWS S3
  • Google Cloud Storage
  • Azure Blob Storage

Implement scripts to automatically upload backups to cloud storage.
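As a sketch using the AWS CLI (the bucket name is a placeholder; the variables come from the script in section 13.11):


# Upload the latest dump to S3; requires the AWS CLI and configured credentials
aws s3 cp "$BACKUP_DIR/${DB_NAME}_${TIMESTAMP}.dump" "s3://my-backup-bucket/postgres/"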

13.18 Backup metadata management

Keep track of your backups:


CREATE TABLE backup_log (
    id SERIAL PRIMARY KEY,
    backup_time TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
    backup_type VARCHAR(50),
    file_name VARCHAR(255),
    file_size BIGINT,
    status VARCHAR(20)
);

INSERT INTO backup_log (backup_type, file_name, file_size, status)
VALUES ('FULL', 'backup_20230717.dump', 1234567890, 'SUCCESS');

Proper backup and recovery strategies are crucial for protecting your data and ensuring business continuity. Regular testing of backup and recovery procedures is essential to guarantee their effectiveness when needed.

PostgreSQL extensions

14.1 Understanding extensions

Extensions in PostgreSQL are packages of SQL objects that add functionality to the database. They can include new data types, functions, operators, index types, and more.

14.2 Listing available extensions

To see what extensions are available in your PostgreSQL installation:


SELECT * FROM pg_available_extensions;

14.3 Installing extensions

Most extensions can be installed with a simple SQL command:


CREATE EXTENSION extension_name;

14.4 Popular extensions

14.4.1 PostGIS

Adds support for geographic objects:


CREATE EXTENSION postgis;

-- Example usage
CREATE TABLE cities (
    id SERIAL PRIMARY KEY,
    name VARCHAR(50),
    location GEOMETRY(Point, 4326)
);

INSERT INTO cities (name, location)
VALUES ('New York', ST_SetSRID(ST_MakePoint(-74.006, 40.7128), 4326));

14.4.2 pg_stat_statements

Provides execution statistics for all SQL statements:


CREATE EXTENSION pg_stat_statements;

-- Query to see top time-consuming queries
SELECT query, calls, total_exec_time, rows, 100.0 * shared_blks_hit /
       nullif(shared_blks_hit + shared_blks_read, 0) AS hit_percent
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 5;

14.4.3 pgcrypto

Provides cryptographic functions:


CREATE EXTENSION pgcrypto;

-- Example: Hashing a password
SELECT crypt('my_password', gen_salt('bf'));

14.4.4 uuid-ossp

Generates universally unique identifiers (UUIDs):


CREATE EXTENSION "uuid-ossp";

-- Generate a UUID
SELECT uuid_generate_v4();

14.4.5 hstore

Provides a key-value pair storage data type:


CREATE EXTENSION hstore;

CREATE TABLE books (
    id SERIAL PRIMARY KEY,
    attributes hstore
);

INSERT INTO books (attributes)
VALUES ('author => "Jane Austen", title => "Pride and Prejudice", year => "1813"');

SELECT attributes -> 'author' AS author FROM books;

14.4.6 pg_trgm

Provides functions and operators for text similarity matching:


CREATE EXTENSION pg_trgm;

-- Create a GIN index for faster similarity searches
CREATE INDEX trgm_idx_products_name ON products USING GIN (name gin_trgm_ops);

-- Find similar product names
SELECT name, similarity(name, 'iPhone') AS sim
FROM products
WHERE name % 'iPhone'
ORDER BY sim DESC, name;

14.5 Custom extensions

You can also create your own extensions. Besides the SQL script, an extension needs a control file that declares at least its default version:


# In a file named my_extension.control
comment = 'My example extension'
default_version = '1.0'
relocatable = true

-- In a file named my_extension--1.0.sql
CREATE FUNCTION my_function() RETURNS TEXT AS $$
BEGIN
    RETURN 'Hello from my extension!';
END;
$$ LANGUAGE plpgsql;

-- In psql, once both files are installed in the server's extension directory
CREATE EXTENSION my_extension;

14.6 Updating extensions

To update an extension to the latest version:


ALTER EXTENSION extension_name UPDATE;

14.7 Removing extensions

To remove an extension:


DROP EXTENSION extension_name;

14.8 Extension dependencies

Some extensions depend on others. PostgreSQL tracks these dependencies and will install missing prerequisites for you if you add CASCADE to CREATE EXTENSION.
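For example, earthdistance builds on the cube extension:


CREATE EXTENSION earthdistance CASCADE;  -- installs cube first if needed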

14.9 Popular third-party extensions

14.9.1 TimescaleDB

Optimizes PostgreSQL for time-series data:


CREATE EXTENSION timescaledb;

CREATE TABLE sensors (
    time        TIMESTAMPTZ         NOT NULL,
    sensor_id   INTEGER             NOT NULL,
    temperature DOUBLE PRECISION    NULL,
    humidity    DOUBLE PRECISION    NULL
);

SELECT create_hypertable('sensors', 'time');

14.9.2 PL/Python

Allows writing functions in Python:


CREATE EXTENSION plpython3u;

CREATE FUNCTION pymax (a integer, b integer)
  RETURNS integer
AS $$
  if a > b:
    return a
  return b
$$ LANGUAGE plpython3u;

14.9.3 pg_cron

Allows scheduling PostgreSQL commands:

CREATE EXTENSION pg_cron;

-- Schedule a job to run every day at midnight
SELECT cron.schedule('0 0 * * *', $$VACUUM ANALYZE$$);

14.10 Performance considerations

While extensions can add powerful functionality, they may also impact performance. Always test thoroughly before deploying to production.

14.11 Security considerations

Some extensions may introduce security risks if not properly configured. Always review the security implications before installing an extension.

14.12 Extension management in applications

When developing applications, consider using extensions to simplify your database schema and improve functionality. However, be aware of the deployment implications when using extensions.

14.13 Exploring extension internals

You can explore how extensions work internally:


-- View extension contents
SELECT * FROM pg_extension WHERE extname = 'postgis';

-- View objects that belong to an extension
SELECT * FROM pg_depend
WHERE refclassid = 'pg_extension'::regclass
AND refobjid = (SELECT oid FROM pg_extension WHERE extname = 'postgis');

Extensions greatly enhance PostgreSQL’s capabilities, allowing you to tailor your database to specific needs without modifying the core PostgreSQL code. They’re a key feature that contributes to PostgreSQL’s flexibility and power.

Security and user management

15.1 User management

15.1.1 Creating users (Roles)


CREATE ROLE john LOGIN PASSWORD 'secure_password';
-- or
CREATE USER john WITH PASSWORD 'secure_password';

15.1.2 Altering users


ALTER ROLE john WITH PASSWORD 'new_secure_password';

15.1.3 Dropping users


DROP ROLE john;

15.2 Role attributes

15.2.1 Superuser


CREATE ROLE admin SUPERUSER LOGIN PASSWORD 'very_secure_password';

15.2.2 Create role


CREATE ROLE manager CREATEROLE LOGIN PASSWORD 'manager_password';

15.2.3 Create database


CREATE ROLE dba CREATEDB LOGIN PASSWORD 'dba_password';

15.3 Role membership

15.3.1 Creating group roles


CREATE ROLE developers;
GRANT developers TO john, jane;

15.3.2 Setting role inheritance


CREATE ROLE analyst INHERIT;
GRANT developers TO analyst;

15.4 Schema management

15.4.1 Creating schemas


CREATE SCHEMA hr;

15.4.2 Setting search path


SET search_path TO hr, public;

15.5 Privileges

15.5.1 Granting privileges


GRANT SELECT, INSERT ON TABLE employees TO hr_staff;
GRANT USAGE ON SCHEMA hr TO hr_staff;

15.5.2 Revoking privileges


REVOKE INSERT ON TABLE employees FROM hr_staff;

15.5.3 Default privileges


ALTER DEFAULT PRIVILEGES IN SCHEMA hr
GRANT SELECT ON TABLES TO read_only;

15.6 Row-level security (RLS)

15.6.1 Enabling RLS


ALTER TABLE employees ENABLE ROW LEVEL SECURITY;

15.6.2 Creating policies


CREATE POLICY employee_isolation ON employees
    USING (department = current_user);

15.7 Column-level security

Use views and grants to restrict column access:


CREATE VIEW public_employee_info AS
    SELECT id, name, department FROM employees;
GRANT SELECT ON public_employee_info TO PUBLIC;

15.8 Data encryption

15.8.1 Password encryption


CREATE EXTENSION pgcrypto;
UPDATE users SET password = crypt('rawpassword', gen_salt('bf'));

15.8.2 Data-at-rest encryption

Use file system-level encryption or tools like dm-crypt.

15.9 SSL configuration

15.9.1 Enabling SSL

In postgresql.conf:


ssl = on
ssl_cert_file = 'server.crt'
ssl_key_file = 'server.key'

15.9.2 Requiring SSL connections

In pg_hba.conf:


hostssl all all all md5

15.10 Authentication methods

15.10.1 Password authentication


host    all    all    0.0.0.0/0    md5

15.10.2 SCRAM authentication (more secure)


host    all    all    0.0.0.0/0    scram-sha-256

15.11 Connection limits

15.11.1 Per-role connection limits


ALTER ROLE worker CONNECTION LIMIT 5;

15.11.2 Per-database connection limits


ALTER DATABASE myapp CONNECTION LIMIT 100;

15.12 Auditing

15.12.1 Session logging

In postgresql.conf:


log_connections = on
log_disconnections = on

15.12.2 Statement logging


log_statement = 'mod'  -- Logs all data-modifying statements

15.13 Object ownership

Changing object ownership:


ALTER TABLE employees OWNER TO hr_manager;

15.14 Security best practices

  1. Use least privilege principle
  2. Regularly audit user access and permissions
  3. Use strong, unique passwords for each role
  4. Implement proper network security (firewalls, VPNs)
  5. Keep PostgreSQL updated with the latest security patches

15.15 Managing pg_hba.conf

The pg_hba.conf file controls client authentication:


# TYPE  DATABASE    USER    ADDRESS       METHOD
local   all         all                    peer
host    all         all     127.0.0.1/32   md5
host    all         all     ::1/128        md5

15.16 Password policies

PostgreSQL has no built-in password complexity rules. Enforce policies with the contrib passwordcheck module (loaded via shared_preload_libraries) or at the application layer.

15.17 Monitoring and logging

15.17.1 Failed authentication attempts

Failed logins do not show up in pg_stat_activity; they are written to the server log (enable log_connections). On Debian/Ubuntu, for example:


grep 'password authentication failed' /var/log/postgresql/postgresql-16-main.log

15.17.2 Logging suspicious activity

Set up log parsing and alerting for suspicious patterns in PostgreSQL logs.

15.18 Database firewall

Consider placing a connection proxy such as PgBouncer or Pgpool-II in front of PostgreSQL; combined with network firewall rules, it prevents clients from reaching the server directly.

15.19 Regular security audits

Perform regular security audits of your PostgreSQL installation, including:

  • User and role review
  • Permission checks
  • Configuration file review
  • Network security assessment

15.20 Secure backups

Ensure that your backups are encrypted and stored securely, with restricted access.

Security and user management in PostgreSQL 16 involve multiple layers of protection, from network-level security to fine-grained access controls within the database. Properly implementing these measures is crucial for protecting your data and ensuring compliance with security standards.

PostgreSQL vs other databases

16.1 PostgreSQL vs MySQL

16.1.1 Key differences

  • PostgreSQL is more feature-rich and adheres more closely to SQL standards.
  • MySQL is often considered easier to set up and manage for simpler applications.

16.1.2 Performance

  • PostgreSQL generally performs better for complex queries and read-heavy workloads.
  • MySQL traditionally performs better for simple, high-volume CRUD operations.

16.1.3 Data types

  • PostgreSQL offers a wider range of built-in data types, including advanced types like JSONB, ARRAY, and user-defined types.

16.1.4 Replication

  • Both offer robust replication solutions, but PostgreSQL’s logical replication is more flexible.

16.2 PostgreSQL vs Oracle

16.2.1 Licensing

  • PostgreSQL is open-source and free; Oracle is proprietary with significant licensing costs.

16.2.2 Features

  • Both are feature-rich, but Oracle has some enterprise features not present in PostgreSQL.
  • PostgreSQL often implements new features faster due to its open-source nature.

16.2.3 Scalability

  • Oracle traditionally has had an edge in very large database (VLDB) environments, but PostgreSQL has significantly closed this gap.

16.2.4 Extensibility

  • PostgreSQL is highly extensible with its extension ecosystem.

16.3 PostgreSQL vs SQL server

16.3.1 Platform support

  • PostgreSQL runs on virtually all platforms; SQL Server primarily targets Windows, though it has supported Linux since SQL Server 2017.

16.3.2 T-SQL vs PL/pgSQL

  • SQL Server’s procedural dialect is T-SQL; PostgreSQL’s default procedural language is PL/pgSQL. Both are powerful, but there are syntax and feature differences.

16.3.3 JSON support

  • PostgreSQL’s JSONB type often outperforms SQL Server’s JSON capabilities.

16.3.4 Costs

  • PostgreSQL is free and open-source; SQL Server has various editions with different costs.

16.4 PostgreSQL vs MongoDB

16.4.1 Data model

  • PostgreSQL is a relational database; MongoDB is a document-oriented NoSQL database.
  • PostgreSQL offers strong JSONB support, bridging some gaps with document stores.

16.4.2 Schema flexibility

  • MongoDB offers more schema flexibility out of the box.
  • PostgreSQL requires more upfront schema design but offers JSONB for schema-less data.

16.4.3 Transactions and ACID compliance

  • PostgreSQL offers full ACID compliance.
  • MongoDB has improved in this area but traditionally lagged in multi-document transactions.

16.5 PostgreSQL unique features

16.5.1 Extensibility

  • PostgreSQL’s extension system is unparalleled, allowing for significant customization without forking the core code.

16.5.2 Advanced data types

  • Arrays and full-text search are built in, and key-value storage is available through the bundled hstore extension.

16.5.3 Indexing options

  • PostgreSQL offers a wide variety of index types (B-tree, Hash, GiST, SP-GiST, GIN, and BRIN).

16.5.4 Full-text search

  • Built-in full-text search capabilities that rival dedicated search engines for many use cases.

16.6 When to choose PostgreSQL

16.6.1 Complex queries

  • If your application requires complex queries or data analysis, PostgreSQL’s query planner is excellent.

16.6.2 Data integrity

  • When strong data consistency and ACID compliance are crucial.

16.6.3 Extensibility needs

  • If you need to extend database functionality with custom types or functions.

16.6.4 Geospatial data

  • PostgreSQL with PostGIS is a powerful solution for geospatial applications.

16.7 Performance comparisons

While specific performance can vary based on workload and configuration:

  • PostgreSQL often excels in read-heavy workloads and complex queries.
  • For simple CRUD operations, PostgreSQL performance is competitive but may not always lead.
  • PostgreSQL’s MVCC implementation provides excellent performance for concurrent transactions.

16.8 Community and ecosystem

  • PostgreSQL has a strong, active open-source community.
  • Regular releases with new features and improvements.
  • Rich ecosystem of tools and extensions.

16.9 Cloud support

  • All major cloud providers offer managed PostgreSQL services.
  • PostgreSQL’s feature set makes it well-suited for cloud-native applications.

16.10 Emerging trends

  • PostgreSQL is well-positioned for microservices architectures with its robust transactional support and JSON capabilities.
  • Increasing adoption in big data ecosystems, with tools like Foreign Data Wrappers.

This comparison highlights PostgreSQL’s strengths in standards compliance, extensibility, and advanced features, while also acknowledging areas where other databases might have advantages. The choice of database should always be based on specific project requirements, including performance needs, budget constraints, existing team expertise, and specific feature requirements.

Best practices and common pitfalls

17.1 Database design best practices

17.1.1 Normalization

  • Properly normalize your database to reduce redundancy and improve data integrity.
  • Consider denormalization only when performance gains are significant and necessary.

17.1.2 Use appropriate data types

  • Choose the most appropriate data type for each column (e.g., use INTEGER for whole numbers, not VARCHAR).
  • Use BIGINT for IDs if you expect more than 2 billion rows.

17.1.3 Constraints

  • Use constraints (PRIMARY KEY, FOREIGN KEY, UNIQUE, NOT NULL) to enforce data integrity.
  • Implement CHECK constraints for domain integrity (see the sketch below).
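
An illustrative table pulling these constraints together (all names are hypothetical):


CREATE TABLE staff (
    id       BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    email    TEXT NOT NULL UNIQUE,
    dept_id  INTEGER NOT NULL REFERENCES departments (id),
    salary   NUMERIC(10,2) CHECK (salary > 0)
);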

17.2 Indexing best practices

17.2.1 Index wisely

  • Index columns used in WHERE, JOIN, and ORDER BY clauses.
  • Avoid over-indexing; each index adds overhead to writes.

17.2.2 Use appropriate index types

  • Use B-tree indexes for most scenarios.
  • Consider GIN indexes for full-text search or jsonb columns.
  • Use BRIN indexes for very large tables with natural clustering (see the sketches below).
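
For instance, assuming a documents table with a text body and an append-only events table:


-- GIN index serving full-text search over a text column
CREATE INDEX idx_documents_body ON documents
    USING gin (to_tsvector('english', body));

-- BRIN index for a huge table whose rows arrive in time order
CREATE INDEX idx_events_logged_at ON events USING brin (logged_at);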

17.2.3 Partial indexes

  • Use partial indexes to index only a subset of rows, reducing index size and improving performance, as sketched below.
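
A sketch, assuming an orders table where most queries target unshipped rows:


-- Only unshipped orders are indexed, keeping the index small
CREATE INDEX idx_orders_unshipped ON orders (created_at)
    WHERE shipped_at IS NULL;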

17.3 Query optimization

17.3.1 Use EXPLAIN ANALYZE

  • Regularly analyze your queries to understand and optimize execution plans.
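
For example (table names are illustrative):


EXPLAIN (ANALYZE, BUFFERS)
SELECT o.id, c.name
FROM orders o
JOIN customers c ON c.id = o.customer_id
WHERE o.created_at > now() - interval '7 days';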

17.3.2 Avoid SELECT *

  • Only select the columns you need to reduce I/O and network traffic.

17.3.3 Use JOINs appropriately

  • Prefer JOINs over correlated subqueries where possible; the planner can usually optimize joins more effectively.

17.3.4 Utilize Common Table Expressions (CTEs)

  • Use CTEs to improve query readability; since PostgreSQL 12 they are inlined by default, so they no longer act as optimization fences unless declared MATERIALIZED.
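
A readability-focused sketch (table names are illustrative):


WITH recent_orders AS (
    SELECT customer_id, count(*) AS order_count
    FROM orders
    WHERE created_at > now() - interval '30 days'
    GROUP BY customer_id
)
SELECT c.name, r.order_count
FROM customers c
JOIN recent_orders r ON r.customer_id = c.id;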

17.4 Performance tuning

17.4.1 Configure PostgreSQL for your hardware

  • Adjust settings like shared_buffers, work_mem, and effective_cache_size based on your server’s resources.
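
Illustrative starting points for a dedicated 16 GB server (assumptions to tune, not prescriptions):


shared_buffers = 4GB             # ~25% of RAM is a common starting point
work_mem = 64MB                  # per sort/hash operation, per query node
effective_cache_size = 12GB      # planner hint, not an allocation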

17.4.2 Regular VACUUM and ANALYZE

  • Schedule regular VACUUM and ANALYZE operations to maintain performance.
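
Autovacuum covers routine maintenance in most installations; manual runs remain useful after bulk changes:


VACUUM (ANALYZE, VERBOSE) orders;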

17.4.3 Use connection pooling

  • Implement connection pooling (e.g., PgBouncer) for applications with many short-lived connections.

17.5 Security best practices

17.5.1 Principle of least privilege

  • Grant users only the permissions they need.

17.5.2 Use SSL for connections

  • Enable SSL to encrypt data in transit.

17.5.3 Implement row-level security

  • Use RLS for fine-grained access control when needed.

17.6 Backup and recovery

17.6.1 Regular backups

  • Implement a robust backup strategy, including regular full backups and WAL archiving.
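
For example, a logical dump plus a physical base backup (paths and the database name are assumptions):


# Logical backup in custom format, restorable with pg_restore
pg_dump -Fc myapp -f /backup/myapp.dump

# Physical base backup with streamed WAL
pg_basebackup -D /backup/base -X stream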

17.6.2 Test recovery procedures

  • Regularly test your recovery procedures to ensure they work when needed.

17.7 Common pitfalls

17.7.1 N+1 query problem

  • Avoid executing a query for each result of a previous query. Use JOINs or subqueries instead.
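
For example, instead of looking up books once per author, fetch everything in one pass:


-- One query replaces 1 + N separate lookups
SELECT a.name, b.title
FROM authors a
JOIN books b ON b.author_id = a.id;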

17.7.2 Ignoring MVCC implications

  • Be aware of how MVCC affects your transactions and query results.

17.7.3 Overuse of SERIAL/BIGSERIAL

  • Consider using IDENTITY columns (introduced in PostgreSQL 10) instead of SERIAL for better standards compliance and fewer sequence-ownership surprises, as shown below.
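
For example:


CREATE TABLE invoices (
    id    BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    total NUMERIC(10,2) NOT NULL
);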

17.7.4 Not managing table and index bloat

  • Regular maintenance (VACUUM) is crucial to manage bloat and maintain performance.

17.7.5 Incorrect use of LIKE and regular expressions

  • Avoid LIKE with a leading wildcard (‘%string’) where possible; it prevents the use of ordinary B-tree indexes. If such searches are required, see the trigram sketch below.
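
A sketch using the pg_trgm extension, which lets a GIN index serve substring matches:


CREATE EXTENSION pg_trgm;
CREATE INDEX idx_products_name_trgm ON products
    USING gin (name gin_trgm_ops);

-- This leading-wildcard search can now use the index
SELECT * FROM products WHERE name LIKE '%widget%';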

17.7.6 Neglecting to update statistics

  • Regularly run ANALYZE to update statistics for the query planner.

17.8 Scalability considerations

17.8.1 Partitioning large tables

  • Use table partitioning for very large tables to improve query performance and manageability.
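
A range-partitioning sketch (the table and bounds are illustrative):


CREATE TABLE measurements (
    logged_at TIMESTAMPTZ NOT NULL,
    value     NUMERIC
) PARTITION BY RANGE (logged_at);

CREATE TABLE measurements_2024 PARTITION OF measurements
    FOR VALUES FROM ('2024-01-01') TO ('2025-01-01');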

17.8.2 Consider read replicas

  • Offload read-heavy workloads to read replicas to improve overall system performance.

17.9 Monitoring and logging

17.9.1 Set up proper logging

  • Configure logging to capture slow queries, errors, and other important events.
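
Illustrative postgresql.conf settings (the threshold is an assumption to tune for your workload):


log_min_duration_statement = 500    # log statements slower than 500 ms
log_line_prefix = '%m [%p] %u@%d '  # timestamp, pid, user@database
log_lock_waits = on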

17.9.2 Use monitoring tools

  • Implement monitoring solutions (e.g., pg_stat_statements, pgBadger) to track database performance.
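
pg_stat_statements must be preloaded (shared_preload_libraries = 'pg_stat_statements' in postgresql.conf); after that, for example:


CREATE EXTENSION pg_stat_statements;

-- Top 10 statements by total execution time
SELECT query, calls, total_exec_time, mean_exec_time
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;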

17.10 Version upgrades

17.10.1 Plan upgrades carefully

  • Always review release notes and test upgrades in a staging environment before applying to production.

17.10.2 Use pg_upgrade

  • Utilize pg_upgrade for faster major version upgrades when possible.
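
A typical invocation (binary and data directory paths are assumptions for a Debian-style layout):


pg_upgrade \
  -b /usr/lib/postgresql/15/bin -B /usr/lib/postgresql/16/bin \
  -d /var/lib/postgresql/15/main -D /var/lib/postgresql/16/main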

17.11 Development practices

17.11.1 Use database migrations

  • Implement a database migration strategy for managing schema changes across environments.

17.11.2 Avoid ORM over-reliance

  • While ORMs are convenient, be prepared to write raw SQL for complex queries to ensure optimal performance.

17.12 Concurrency management

17.12.1 Use advisory locks

  • Utilize advisory locks for application-level locking scenarios.
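
For example, serializing a job across application instances with a session-level advisory lock (the key 42 is an arbitrary application-chosen number):


SELECT pg_advisory_lock(42);      -- blocks until the lock is available
-- ... perform the critical section ...
SELECT pg_advisory_unlock(42);

-- Non-blocking variant: returns false immediately if the lock is held
SELECT pg_try_advisory_lock(42);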

17.12.2 Be aware of lock levels

  • Understand different lock levels and their implications on concurrent operations.

By following these best practices and avoiding common pitfalls, you can ensure a more robust, efficient, and maintainable PostgreSQL implementation. Remember that database management is an ongoing process, requiring regular monitoring, tuning, and adaptation to changing requirements and workloads.

Conclusion

18.1 Key takeaways

18.1.1 Feature-rich

PostgreSQL 16 offers a robust set of features, including:

  • Advanced data types (JSONB, Arrays, Geometric types)
  • Powerful indexing options (B-tree, GIN, BRIN, etc.)
  • Full-text search capabilities
  • Extensibility through custom functions and extensions

18.1.2 Standards compliant

PostgreSQL adheres closely to SQL standards, making it a reliable choice for applications requiring strict compliance.

18.1.3 Scalability

With features like table partitioning, parallel query execution, and logical replication, PostgreSQL can scale to handle large datasets and high-concurrency scenarios.

18.1.4 Security

PostgreSQL provides a comprehensive security model, including:

  • Role-based access control
  • Row-level security
  • Data encryption options

18.1.5 Performance

Through its advanced query planner and various optimization techniques, PostgreSQL offers excellent performance for a wide range of workloads.

18.2 PostgreSQL in modern architectures

18.2.1 Cloud-native applications

PostgreSQL’s features make it well-suited for cloud-native and microservices architectures.

18.2.2 Big data ecosystems

With Foreign Data Wrappers and JSON support, PostgreSQL integrates well with big data ecosystems.

18.2.3 OLTP and OLAP

PostgreSQL can handle both transactional (OLTP) and analytical (OLAP) workloads effectively.

18.3 Future outlook

18.3.1 Continuous improvement

The PostgreSQL community consistently delivers new features and improvements with each release.

18.3.2 Growing adoption

PostgreSQL’s adoption continues to grow in various sectors, from startups to large enterprises.

18.3.3 Emerging technologies

PostgreSQL is well-positioned to adapt to emerging technologies and changing data management needs.

18.4 Final recommendations

18.4.1 Invest in learning

Given PostgreSQL’s rich feature set, invest time in learning its advanced capabilities to fully leverage its power.

18.4.2 Stay updated

Keep abreast of new PostgreSQL releases and features to take advantage of improvements and new capabilities.

18.4.3 Engage with the community

Participate in the PostgreSQL community for support, knowledge sharing, and potential contributions.

18.4.4 Regular maintenance

Implement a regular maintenance schedule for your PostgreSQL databases to ensure optimal performance and reliability.

18.4.5 Tailored approach

While following best practices, remember to tailor your PostgreSQL implementation to your specific use case and requirements.

18.5 Closing thoughts

PostgreSQL 16 stands as a testament to the power of open-source development and community-driven innovation. Its combination of robust features, reliability, and flexibility makes it an excellent choice for a wide range of database needs, from small applications to large-scale enterprise systems.

As data continues to grow in volume and complexity, PostgreSQL’s advanced features and extensibility position it well to meet future challenges in data management and analysis. Whether you’re a developer, database administrator, or decision-maker, PostgreSQL offers a powerful, cost-effective solution worthy of serious consideration for your data management needs.

By mastering PostgreSQL, you’re equipping yourself with a valuable skill set that is increasingly in demand across the technology industry. Continue exploring, learning, and pushing the boundaries of what you can achieve with this powerful database system.

Last Update: 17/07/2024