Introduction to PostgreSQL
PostgreSQL, often simply called “Postgres,” is a powerful, open-source object-relational database management system (ORDBMS). First released under that name in 1996, with roots in the POSTGRES project begun at UC Berkeley in 1986, it has earned a reputation for reliability, feature robustness, and performance.
Key features of PostgreSQL include:
- ACID (Atomicity, Consistency, Isolation, Durability) compliance
- Support for complex queries
- Multi-Version Concurrency Control (MVCC)
- Extensibility
- Strong standards compliance
For developers coming from other database systems, PostgreSQL offers a familiar SQL interface with some unique advantages. Its ability to handle complex workloads, support for JSON, and extensive indexing options make it a versatile choice for a wide range of applications.
Installation and dev environment
Before diving into PostgreSQL’s features, let’s set up a development environment. We’ll cover installation on major operating systems and basic configuration.
2.1 Installation
Ubuntu/Debian:
sudo apt update
sudo apt install postgresql postgresql-contrib
macOS (using Homebrew):
brew install postgresql
Windows: Download the installer from the official PostgreSQL website and follow the installation wizard.
2.2 Verifying installation
After installation, verify that PostgreSQL is running:
sudo systemctl status postgresql
On macOS or if using Homebrew:
brew services list
2.3 Basic configuration
PostgreSQL’s main configuration file is postgresql.conf. Its location varies by system, but you can find it using:
sudo -u postgres psql -c "SHOW config_file"
Key configuration parameters include:
- max_connections: Maximum number of concurrent connections
- shared_buffers: Memory used for caching data
- effective_cache_size: Estimate of memory available for disk caching
Always restart PostgreSQL after changing configuration:
sudo systemctl restart postgresql
Basic PostgreSQL concepts
Before we start using PostgreSQL, let’s familiarize ourselves with some core concepts:
3.1 Clusters
A PostgreSQL server instance, together with its data directory, is called a cluster: a collection of databases managed by a single server process.
3.2 Databases
A database in PostgreSQL is a named collection of SQL objects (tables, views, functions, etc.). Each cluster can contain multiple databases.
3.3 Schemas
Schemas are namespaces within a database that allow you to organize database objects. They help avoid naming conflicts and can be used for access control.
3.4 Tables
Tables are the primary data storage structures in PostgreSQL. They consist of rows (records) and columns (fields).
3.5 Views
Views are virtual tables based on the result of a SQL query. They don’t store data themselves but provide a way to encapsulate complex queries.
3.6 Functions and procedures
PostgreSQL supports user-defined functions and stored procedures, allowing you to encapsulate logic within the database.
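For example, a minimal SQL function (the 20% tax rate here is purely illustrative):
CREATE FUNCTION add_tax(price NUMERIC) RETURNS NUMERIC AS $$
    SELECT price * 1.20;
$$ LANGUAGE SQL;
SELECT add_tax(100); -- 120.00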
3.7 Roles and privileges
PostgreSQL uses roles for authentication and authorization. Roles can be assigned various privileges to control access to database objects.
Data types
PostgreSQL offers a rich set of built-in data types. Understanding these types is crucial for efficient schema design and query performance.
4.1 Numeric types
- INTEGER: 4-byte signed integer
- BIGINT: 8-byte signed integer
- NUMERIC or DECIMAL: Exact numeric with selectable precision
- REAL: 4-byte floating-point number
- DOUBLE PRECISION: 8-byte floating-point number
4.2 Character types
- CHAR(n): Fixed-length character string
- VARCHAR(n): Variable-length character string with limit
- TEXT: Variable-length character string without limit
4.3 Date/Time types
- DATE: Calendar date (year, month, day)
- TIME: Time of day
- TIMESTAMP: Date and time
- INTERVAL: Time span
4.4 Boolean type
- BOOLEAN: True or false
4.5 Array types
PostgreSQL allows columns of a table to be defined as variable-length multidimensional arrays of any built-in or user-defined type.
Example:
CREATE TABLE matrix (
id SERIAL PRIMARY KEY,
data INTEGER[][]
);
INSERT INTO matrix (data) VALUES (ARRAY[[1,2,3], [4,5,6], [7,8,9]]);
4.6 JSON types
- JSON: Stores JSON data that gets reparsed on each operation
- JSONB: Stores JSON data in a binary format for faster processing
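A quick way to see the difference (JSON keeps the input text verbatim; JSONB parses it once, so duplicate keys collapse):
SELECT '{"a": 1, "a": 2}'::json; -- {"a": 1, "a": 2}
SELECT '{"a": 1, "a": 2}'::jsonb; -- {"a": 2}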
4.7 Special types
- UUID: Universally unique identifiers
- BYTEA: Binary strings (“byte arrays”)
- CIDR, INET, MACADDR: Network address types
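A small sketch combining these types (gen_random_uuid() is built in since PostgreSQL 13):
CREATE TABLE devices (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    ip INET,
    mac MACADDR
);
INSERT INTO devices (ip, mac) VALUES ('192.168.0.10', '08:00:2b:01:02:03');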
4.8 Composite types
You can create custom types that combine multiple fields:
CREATE TYPE inventory_item AS (
name TEXT,
supplier_id INTEGER,
price NUMERIC
);
CREATE TABLE on_hand (
item inventory_item,
count INTEGER
);
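To read a field of a composite column, parenthesize the column name so the parser doesn’t mistake it for a table name:
SELECT (item).name, count FROM on_hand;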
Understanding these data types allows you to design efficient schemas and write performant queries. In the next section, we’ll dive into creating and managing databases in PostgreSQL.
Setting up PostgreSQL 16
5.1 Detailed installation process
Ubuntu/Debian:
# Create the file repository configuration
sudo sh -c 'echo "deb https://apt.postgresql.org/pub/repos/apt $(lsb_release -cs)-pgdg main" > /etc/apt/sources.list.d/pgdg.list'
# Import the repository signing key
wget --quiet -O - https://www.postgresql.org/media/keys/ACCC4CF8.asc | sudo apt-key add -
# Update the package lists
sudo apt update
# Install PostgreSQL 16
sudo apt install postgresql-16 postgresql-contrib-16
Red Hat/CentOS:
# Install the repository RPM
sudo dnf install -y https://download.postgresql.org/pub/repos/yum/reporpms/EL-8-x86_64/pgdg-redhat-repo-latest.noarch.rpm
# Disable the built-in PostgreSQL module
sudo dnf -qy module disable postgresql
# Install PostgreSQL 16
sudo dnf install -y postgresql16-server
# Initialize the database
sudo /usr/pgsql-16/bin/postgresql-16-setup initdb
# Start and enable the service
sudo systemctl start postgresql-16
sudo systemctl enable postgresql-16
macOS (using Homebrew):
# Install Homebrew if not already installed
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
# Install PostgreSQL 16
brew install postgresql@16
# Start the PostgreSQL service
brew services start postgresql@16
5.2 Post-installation setup
After installation, follow these steps:
1. Set a password for the default ‘postgres’ user:
sudo -u postgres psql
ALTER USER postgres WITH PASSWORD 'your_secure_password';
\q
2. Create a new database and user:
sudo -u postgres createuser --interactive
sudo -u postgres createdb myapp
3. Configure client authentication by editing the pg_hba.conf file:
sudo nano /etc/postgresql/16/main/pg_hba.conf
Adjust the lines for local connections according to your security needs.
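For example, to require password authentication for local TCP connections (one reasonable development setup):
# TYPE  DATABASE  USER  ADDRESS       METHOD
host    all       all   127.0.0.1/32  scram-sha-256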
5.3 Configuring PostgreSQL 16
The main configuration file is still postgresql.conf. Here are some important settings with reasonable starting values:
1. Memory settings:
shared_buffers = 256MB # 25% of RAM for dedicated servers
work_mem = 4MB # Depends on max_connections and workload
maintenance_work_mem = 64MB
2. Checkpoint settings:
checkpoint_timeout = 15min
max_wal_size = 1GB
min_wal_size = 80MB
3. Query planner settings:
random_page_cost = 1.1 # For SSD storage
effective_cache_size = 3GB # 50-75% of total RAM
4. Logging settings:
log_destination = 'csvlog'
logging_collector = on
log_directory = 'pg_log'
log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log'
log_rotation_age = 1d
log_rotation_size = 100MB
log_min_duration_statement = 250ms
5. Replication settings:
wal_keep_size = 1GB # Replaces wal_keep_segments (renamed in PostgreSQL 13)
Remember to restart PostgreSQL after making changes:
sudo systemctl restart postgresql@16-main
(On Debian/Ubuntu the per-cluster unit is postgresql@16-main; plain postgresql restarts all clusters.)
5.4 Setting up pgAdmin 4
For PostgreSQL 16, make sure you have the latest version of pgAdmin:
1. Install pgAdmin 4: For Ubuntu/Debian:
# Add the pgAdmin repository
curl https://www.pgadmin.org/static/packages_pgadmin_org.pub | sudo apt-key add -
sudo sh -c 'echo "deb https://ftp.postgresql.org/pub/pgadmin/pgadmin4/apt/$(lsb_release -cs) pgadmin4 main" > /etc/apt/sources.list.d/pgadmin4.list'
# Install pgAdmin
sudo apt update
sudo apt install pgadmin4
For macOS:
brew install --cask pgadmin4
2. Launch pgAdmin and create a new server connection as before, ensuring you select PostgreSQL 16 as the version.
5.5 Setting up connection pooling with PgBouncer
The PgBouncer setup remains largely the same, but ensure it’s compatible with PostgreSQL 16:
1. Install PgBouncer:
sudo apt install pgbouncer
2. Configure PgBouncer by editing /etc/pgbouncer/pgbouncer.ini:
[databases]
* = host=localhost port=5432
[pgbouncer]
listen_port = 6432
listen_addr = *
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = session
max_client_conn = 1000
default_pool_size = 20
3. Create the auth file and start PgBouncer.
5.6 Verifying your PostgreSQL 16 setup
1. Check PostgreSQL status:
sudo systemctl status postgresql@16-main
2. Connect to PostgreSQL:
psql -U postgres -d postgres
3. Run a simple query to check the version:
SELECT version();
This should display PostgreSQL 16.x.
4. Check server logs:
sudo tail -f /var/log/postgresql/postgresql-16-main.log
5. Verify PgBouncer (if installed):
psql -p 6432 -U postgres pgbouncer
SHOW POOLS;
By following these steps, you should now have PostgreSQL 16 set up and ready to use. This version includes several improvements over previous versions, including better performance for parallel queries, improved monitoring capabilities, and enhanced logical replication features.
Creating and managing databases
6.1 Creating databases
In PostgreSQL 16, you can create databases using either SQL commands or command-line tools.
Using SQL:
CREATE DATABASE myapp
WITH
OWNER = postgres
ENCODING = 'UTF8'
LC_COLLATE = 'en_US.utf8'
LC_CTYPE = 'en_US.utf8'
TEMPLATE = template0
ALLOW_CONNECTIONS = true;
Using command-line:
createdb -U postgres myapp
6.2 Listing databases
To list all databases:
Using SQL:
SELECT datname FROM pg_database;
Using command-line:
psql -U postgres -l
6.3 Connecting to a database
Connect to a specific database:
psql -U postgres -d myapp
6.4 Dropping a database
To delete a database:
Using SQL:
DROP DATABASE myapp;
Using command-line:
dropdb -U postgres myapp
6.5 Database configuration
PostgreSQL 16 allows per-database configuration. You can set parameters for specific databases:
ALTER DATABASE myapp SET work_mem = '16MB';
6.6 Database statistics
PostgreSQL 16 provides enhanced statistics views. To view database statistics:
SELECT * FROM pg_stat_database WHERE datname = 'myapp';
6.7 Database sizes
To check the size of your databases:
SELECT pg_size_pretty(pg_database_size('myapp'));
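The same idea works for individual tables; pg_total_relation_size includes indexes and TOAST data (mytable is a placeholder):
SELECT pg_size_pretty(pg_total_relation_size('mytable'));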
6.8 Managing connections
PostgreSQL 16 improved connection management. To view current connections:
SELECT * FROM pg_stat_activity WHERE datname = 'myapp';
To limit connections for a database:
ALTER DATABASE myapp CONNECTION LIMIT 100;
6.9 Database maintenance
Regular maintenance is crucial. PostgreSQL 16 includes improved VACUUM and ANALYZE commands. Note that VACUUM targets tables, not databases:
VACUUM (VERBOSE, ANALYZE); -- all tables in the current database
VACUUM (VERBOSE, ANALYZE) mytable; -- a single table
6.10 Logical replication
PostgreSQL 16 enhances logical replication. To set up a publication:
CREATE PUBLICATION mypub FOR TABLE mytable;
And to create a subscription on another server:
CREATE SUBSCRIPTION mysub
CONNECTION 'host=192.168.1.50 port=5432 dbname=myapp'
PUBLICATION mypub;
6.11 Backup and restore
PostgreSQL 16 maintains compatibility with pg_dump and pg_restore for logical backups:
Backup:
pg_dump -U postgres -d myapp > myapp_backup.sql
Restore:
psql -U postgres -d myapp < myapp_backup.sql
6.12 Extensions
PostgreSQL 16 supports a wide range of extensions. To list available extensions:
SELECT * FROM pg_available_extensions;
To install an extension:
CREATE EXTENSION pg_stat_statements;
6.13 Tablespaces
Manage tablespaces for optimized storage:
CREATE TABLESPACE fastspace LOCATION '/ssd/postgresql/data';
To move a database to a tablespace:
ALTER DATABASE myapp SET TABLESPACE fastspace;
6.14 Role-based access control
PostgreSQL 16 enhances security with role-based access control:
CREATE ROLE readonly;
GRANT CONNECT ON DATABASE myapp TO readonly;
GRANT USAGE ON SCHEMA public TO readonly;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO readonly;
6.15 Full text search
PostgreSQL 16 improves full-text search capabilities:
CREATE INDEX textsearch_idx ON messages USING GIN (to_tsvector('english', body));
6.16 JSON and JSONB support
PostgreSQL 16 continues to excel in JSON support:
CREATE TABLE orders (
id SERIAL PRIMARY KEY,
info JSONB
);
INSERT INTO orders (info) VALUES ('{"customer": "John Doe", "items": [{"product": "Beer", "qty": 6}]}');
6.17 Monitoring and logging
PostgreSQL 16 provides enhanced monitoring capabilities. To enable detailed logging:
ALTER SYSTEM SET log_statement = 'all';
SELECT pg_reload_conf();
This comprehensive overview covers the main aspects of creating and managing databases in PostgreSQL 16. Each of these topics can be explored in much greater depth, depending on your specific needs and use cases.
Tables and schema design
7.1 Creating tables
Basic table creation syntax:
CREATE TABLE employees (
id SERIAL PRIMARY KEY,
first_name VARCHAR(50) NOT NULL,
last_name VARCHAR(50) NOT NULL,
email VARCHAR(100) UNIQUE,
hire_date DATE DEFAULT CURRENT_DATE,
salary NUMERIC(10, 2),
department_id INTEGER REFERENCES departments(id)
);
7.2 Data types
PostgreSQL 16 supports a wide range of data types:
- Numeric: INTEGER, BIGINT, NUMERIC, REAL, DOUBLE PRECISION
- Character: CHAR, VARCHAR, TEXT
- Date/Time: DATE, TIME, TIMESTAMP, INTERVAL
- Boolean: BOOLEAN
- Geometric: POINT, LINE, CIRCLE, POLYGON
- Network: INET, CIDR, MACADDR
- JSON: JSON, JSONB
- Arrays: Any data type can be an array
- Custom types: CREATE TYPE
Example using various data types:
CREATE TABLE product_inventory (
id SERIAL PRIMARY KEY,
name VARCHAR(100) NOT NULL,
description TEXT,
price NUMERIC(10, 2),
created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
is_available BOOLEAN DEFAULT true,
tags TEXT[],
properties JSONB
);
7.3 Constraints
PostgreSQL 16 supports various constraints to ensure data integrity:
- PRIMARY KEY
- FOREIGN KEY
- UNIQUE
- NOT NULL
- CHECK
- EXCLUSION
Example:
CREATE TABLE orders (
id SERIAL PRIMARY KEY,
customer_id INTEGER REFERENCES customers(id),
order_date DATE DEFAULT CURRENT_DATE,
total_amount NUMERIC(10, 2),
status VARCHAR(20),
CONSTRAINT check_positive_amount CHECK (total_amount > 0),
CONSTRAINT valid_status CHECK (status IN ('Pending', 'Shipped', 'Delivered'))
);
7.4 Indexes
Indexes are crucial for query performance. PostgreSQL 16 supports various index types:
- B-tree (default)
- Hash
- GiST
- SP-GiST
- GIN
- BRIN
Example:
CREATE INDEX idx_employee_last_name ON employees (last_name);
CREATE INDEX idx_product_properties ON product_inventory USING GIN (properties);
7.5 Partitioning
PostgreSQL 16 enhances table partitioning for improved performance with large tables:
CREATE TABLE measurements (
city_id int not null,
logdate date not null,
peaktemp int,
unitsales int
) PARTITION BY RANGE (logdate);
CREATE TABLE measurements_y2023m01 PARTITION OF measurements
FOR VALUES FROM ('2023-01-01') TO ('2023-02-01');
CREATE TABLE measurements_y2023m02 PARTITION OF measurements
FOR VALUES FROM ('2023-02-01') TO ('2023-03-01');
7.6 Inheritance
While partitioning is preferred for most use cases, table inheritance is still supported:
CREATE TABLE cities (
name text,
population float,
elevation int -- in feet
);
CREATE TABLE capitals (
state char(2)
) INHERITS (cities);
7.7 Schemas
Schemas help organize database objects:
CREATE SCHEMA hr;
CREATE TABLE hr.employees (
id SERIAL PRIMARY KEY,
name VARCHAR(100)
);
7.8 Views
Views provide a powerful way to encapsulate complex queries:
CREATE VIEW high_salary_employees AS
SELECT * FROM employees
WHERE salary > 100000;
7.9 Materialized views
Materialized views store the result of a query for faster access:
CREATE MATERIALIZED VIEW monthly_sales AS
SELECT
date_trunc('month', order_date) AS month,
SUM(total_amount) AS total_sales
FROM orders
GROUP BY 1;
7.10 Table inheritance vs. partitioning
While both allow splitting data across multiple tables, partitioning is generally preferred for performance reasons. Use inheritance for modeling class hierarchies and partitioning for large tables that need to be split based on certain criteria.
7.11 Generated columns
PostgreSQL 16 supports generated columns:
CREATE TABLE employees (
id SERIAL PRIMARY KEY,
first_name VARCHAR(50),
last_name VARCHAR(50),
full_name VARCHAR(101) GENERATED ALWAYS AS (first_name || ' ' || last_name) STORED
);
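Generated columns are computed automatically and cannot be written directly:
INSERT INTO employees (first_name, last_name) VALUES ('Ada', 'Lovelace');
SELECT full_name FROM employees; -- 'Ada Lovelace'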
7.12 Temporary tables
Temporary tables are automatically dropped at the end of a session:
CREATE TEMPORARY TABLE temp_results (
id INT,
result VARCHAR(100)
);
7.13 Unlogged tables
Unlogged tables are faster but not crash-safe:
CREATE UNLOGGED TABLE fast_imports (
id SERIAL,
data TEXT
);
7.14 Table spaces
Tablespaces allow you to define locations on the file system where the files representing database objects can be stored:
CREATE TABLESPACE fastspace LOCATION '/ssd/postgresql/data';
CREATE TABLE fast_access_table (id INT, data TEXT) TABLESPACE fastspace;
7.15 Column storage and compression
PostgreSQL automatically stores and compresses oversized field values out of line using TOAST (The Oversized-Attribute Storage Technique). You can also tune table-level storage parameters such as fillfactor:
CREATE TABLE documents (
id SERIAL PRIMARY KEY,
content TEXT,
CONSTRAINT content_length CHECK (length(content) < 1000000)
) WITH (fillfactor=80);
7.16 System columns
Every table has several system columns that are implicitly defined by the system:
- tableoid
- xmin
- cmin
- xmax
- cmax
- ctid
You can access these for advanced operations:
SELECT xmin, xmax, * FROM mytable;
This overview covers the main aspects of table and schema design in PostgreSQL 16. Each of these topics can be explored in greater depth depending on your specific requirements.
SQL basics
8.1 SELECT statements
The SELECT statement is the foundation of querying data in PostgreSQL.
Basic SELECT:
SELECT first_name, last_name, salary FROM employees;
Using WHERE clause:
SELECT * FROM employees WHERE department_id = 3;
ORDER BY clause:
SELECT * FROM employees ORDER BY salary DESC, last_name ASC;
LIMIT and OFFSET:
SELECT * FROM employees ORDER BY id LIMIT 10 OFFSET 20;
8.2 JOINs
PostgreSQL supports various types of JOINs.
INNER JOIN:
SELECT e.first_name, e.last_name, d.name AS department
FROM employees e
INNER JOIN departments d ON e.department_id = d.id;
LEFT JOIN:
SELECT c.name, o.id AS order_id
FROM customers c
LEFT JOIN orders o ON c.id = o.customer_id;
FULL OUTER JOIN:
SELECT e.name AS employee, p.name AS project
FROM employees e
FULL OUTER JOIN projects p ON e.project_id = p.id;
8.3 Aggregations
GROUP BY and aggregate functions:
SELECT department_id, AVG(salary) AS avg_salary, COUNT(*) AS employee_count
FROM employees
GROUP BY department_id
HAVING COUNT(*) > 5;
8.4 Subqueries
Subqueries can be used in various parts of a SQL statement:
SELECT first_name, last_name
FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees);
8.5 Common table expressions (CTEs)
CTEs provide a way to write auxiliary statements for use in a larger query:
WITH high_salary_employees AS (
SELECT * FROM employees WHERE salary > 100000
)
SELECT department_id, COUNT(*)
FROM high_salary_employees
GROUP BY department_id;
8.6 Window functions
Window functions perform calculations across a set of rows:
SELECT
first_name,
last_name,
salary,
RANK() OVER (ORDER BY salary DESC) AS salary_rank
FROM employees;
8.7 INSERT statements
Basic INSERT:
INSERT INTO employees (first_name, last_name, email, salary)
VALUES ('John', 'Doe', '[email protected]', 75000);
Multi-row INSERT:
INSERT INTO employees (first_name, last_name, email, salary)
VALUES
('Jane', 'Smith', '[email protected]', 80000),
('Bob', 'Johnson', '[email protected]', 72000);
INSERT from SELECT:
INSERT INTO high_salary_employees
SELECT * FROM employees WHERE salary > 100000;
8.8 UPDATE statements
Basic UPDATE:
UPDATE employees
SET salary = salary * 1.1
WHERE department_id = 3;
UPDATE with JOIN:
UPDATE employees e
SET salary = salary * 1.15
FROM departments d
WHERE e.department_id = d.id AND d.name = 'Sales';
8.9 DELETE statements
Basic DELETE:
DELETE FROM employees WHERE last_login < '2022-01-01';
DELETE with subquery:
DELETE FROM orders
WHERE customer_id IN (SELECT id FROM customers WHERE last_purchase < '2022-01-01');
8.10 UPSERT (INSERT ON CONFLICT)
PostgreSQL provides an elegant way to handle insert-or-update scenarios:
INSERT INTO products (code, name, price)
VALUES ('A001', 'Widget', 19.99)
ON CONFLICT (code)
DO UPDATE SET price = EXCLUDED.price;
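If you only want to ignore duplicates rather than update them, use DO NOTHING:
INSERT INTO products (code, name, price)
VALUES ('A001', 'Widget', 19.99)
ON CONFLICT (code) DO NOTHING;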
8.11 RETURNING clause
The RETURNING clause allows you to retrieve data from rows affected by an INSERT, UPDATE, or DELETE:
UPDATE employees
SET salary = salary * 1.1
WHERE department_id = 3
RETURNING id, first_name, last_name, salary AS new_salary;
8.12 Lateral Joins
LATERAL joins allow subqueries in the FROM clause to reference columns from preceding items:
SELECT e.name, e.department_id, latest_orders.order_id, latest_orders.order_date
FROM employees e,
LATERAL (
SELECT order_id, order_date
FROM orders
WHERE orders.employee_id = e.id
ORDER BY order_date DESC
LIMIT 3
) AS latest_orders;
8.13 Full Text Search
PostgreSQL offers powerful full-text search capabilities:
CREATE INDEX idx_fts_title ON books USING GIN (to_tsvector('english', title));
SELECT title
FROM books
WHERE to_tsvector('english', title) @@ to_tsquery('english', 'postgresql & database');
8.14 JSON operations
PostgreSQL has robust support for JSON data:
SELECT info -> 'customer' ->> 'name' AS customer_name
FROM orders
WHERE info -> 'items' @> '[{"product": "Laptop"}]';
8.15 Arrays
Working with array columns:
SELECT * FROM employees
WHERE skills && ARRAY['Python', 'SQL'];
UPDATE employees
SET skills = array_append(skills, 'JavaScript')
WHERE id = 1;
8.16 Regular expressions
PostgreSQL supports POSIX regular expressions:
SELECT * FROM employees
WHERE email ~ '^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Z|a-z]{2,}$';
This overview covers the fundamental SQL operations in PostgreSQL, along with some of its more advanced features. Each of these topics can be explored in much greater depth, and there are many more advanced features and optimizations available in PostgreSQL.
Advanced queries and JOINs
9.1 Advanced JOIN Techniques
9.1.1 Self-Joins
Useful for hierarchical or comparative data within a table:
SELECT e1.name AS employee, e2.name AS manager
FROM employees e1
LEFT JOIN employees e2 ON e1.manager_id = e2.id;
9.1.2 Cross joins
Generates a Cartesian product of two tables:
SELECT products.name, colors.name
FROM products
CROSS JOIN colors;
9.1.3 Natural joins
Automatically join tables on all columns that share the same name (use with care: adding a matching column later silently changes the join):
SELECT * FROM employees NATURAL JOIN departments;
9.2 Set operations
9.2.1 UNION and UNION ALL
Combines result sets of two or more SELECT statements:
(SELECT name FROM employees)
UNION
(SELECT name FROM customers);
-- UNION ALL doesn't remove duplicates
(SELECT product_id FROM orders_2022)
UNION ALL
(SELECT product_id FROM orders_2023);
9.2.2 INTERSECT
Returns rows that appear in both result sets:
(SELECT customer_id FROM orders_2022)
INTERSECT
(SELECT customer_id FROM orders_2023);
9.2.3 EXCEPT
Returns rows from the first query that don’t appear in the second query:
(SELECT customer_id FROM all_customers)
EXCEPT
(SELECT customer_id FROM orders);
9.3 Advanced subqueries
9.3.1 Correlated Subqueries
Subqueries that depend on the outer query:
SELECT e.name, e.salary
FROM employees e
WHERE salary > (
SELECT AVG(salary)
FROM employees
WHERE department_id = e.department_id
);
9.3.2 Subqueries in SELECT
Using subqueries to compute values in SELECT:
SELECT
name,
salary,
(SELECT AVG(salary) FROM employees) as company_avg,
salary - (SELECT AVG(salary) FROM employees) as diff_from_avg
FROM employees;
9.3.3 Subqueries with EXISTS
Checks for the existence of rows that satisfy a subquery:
SELECT name
FROM departments d
WHERE EXISTS (
SELECT 1
FROM employees e
WHERE e.department_id = d.id AND salary > 100000
);
9.4 Advanced Aggregations
9.4.1 GROUPING SETS
Allows specification of multiple groupings in a single query:
SELECT
COALESCE(department, 'All Departments') as department,
COALESCE(job_title, 'All Jobs') as job,
AVG(salary)
FROM employees
GROUP BY GROUPING SETS ((department, job_title), (department), (job_title), ());
9.4.2 ROLLUP
Produces a result set including all subtotals and a grand total:
SELECT
COALESCE(department, 'Total') as department,
COALESCE(job_title, 'Subtotal') as job,
SUM(salary)
FROM employees
GROUP BY ROLLUP (department, job_title);
9.4.3 CUBE
Generates all possible combinations of the specified dimensions:
SELECT
COALESCE(department, 'All') as department,
COALESCE(job_title, 'All') as job,
SUM(salary)
FROM employees
GROUP BY CUBE (department, job_title);
9.5 Window functions
9.5.1 Ranking functions
SELECT
name,
department,
salary,
RANK() OVER (PARTITION BY department ORDER BY salary DESC) as dept_salary_rank,
DENSE_RANK() OVER (ORDER BY salary DESC) as overall_salary_rank
FROM employees;
9.5.2 Offset functions
SELECT
name,
department,
hire_date,
salary,
LAG(salary) OVER (PARTITION BY department ORDER BY hire_date) as prev_salary,
LEAD(salary) OVER (PARTITION BY department ORDER BY hire_date) as next_salary
FROM employees;
9.5.3 Aggregate window functions
SELECT
name,
department,
salary,
AVG(salary) OVER (PARTITION BY department) as dept_avg_salary,
salary - AVG(salary) OVER (PARTITION BY department) as diff_from_dept_avg
FROM employees;
9.6 Recursive queries
Useful for hierarchical or tree-structured data:
WITH RECURSIVE subordinates AS (
SELECT id, name, manager_id
FROM employees
WHERE name = 'Jane Doe'
UNION ALL
SELECT e.id, e.name, e.manager_id
FROM employees e
INNER JOIN subordinates s ON s.id = e.manager_id
)
SELECT * FROM subordinates;
9.7 LATERAL Joins
Allows subqueries in the FROM clause to reference columns from preceding items:
SELECT
d.name AS department,
e.name AS top_employee,
e.salary
FROM
departments d,
LATERAL (
SELECT name, salary
FROM employees
WHERE department_id = d.id
ORDER BY salary DESC
LIMIT 1
) e;
9.8 Full Text Search with advanced options
CREATE INDEX idx_fts_document ON documents USING GIN (to_tsvector('english', content));
SELECT title, ts_rank(to_tsvector('english', content), query) AS rank
FROM documents, to_tsquery('english', 'postgresql & database') query
WHERE to_tsvector('english', content) @@ query
ORDER BY rank DESC;
9.9 Complex JSON queries
SELECT
id,
info -> 'customer' ->> 'name' AS customer_name,
jsonb_array_elements(info -> 'items') ->> 'product' AS product,
(jsonb_array_elements(info -> 'items') ->> 'price')::numeric AS price
FROM orders
WHERE (info -> 'total')::numeric > 1000;
These advanced querying techniques showcase the power and flexibility of PostgreSQL 16. They allow for complex data analysis, hierarchical data querying, and sophisticated data manipulations. Each of these concepts can be explored further and combined in various ways to solve complex data problems.
Indexing and performance optimization
10.1 Types of indexes
PostgreSQL 16 supports several types of indexes, each suited for different scenarios:
10.1.1 B-tree Indexes
The default and most common index type, suitable for equality and range queries:
CREATE INDEX idx_employee_last_name ON employees (last_name);
10.1.2 Hash indexes
Optimized for equality comparisons:
CREATE INDEX idx_product_code ON products USING HASH (product_code);
10.1.3 GiST (Generalized Search Tree) indexes
Useful for full-text search, geometric data, and custom data types:
CREATE INDEX idx_geo_location ON stores USING GIST (location);
10.1.4 SP-GiST (Space-Partitioned GiST) indexes
Good for clustered data and tree-like structures:
CREATE INDEX idx_ip_address ON network_logs USING SPGIST (ip_address inet_ops);
10.1.5 GIN (Generalized Inverted Index) indexes
Efficient for multi-value columns like arrays and full-text search:
CREATE INDEX idx_tags ON products USING GIN (tags);
10.1.6 BRIN (Block Range INdex) indexes
Suitable for very large tables with natural clustering:
CREATE INDEX idx_timestamp ON large_table USING BRIN (timestamp);
10.2 Composite indexes
Indexes on multiple columns can be more efficient for queries that filter on these columns together:
CREATE INDEX idx_employee_dept_hire_date ON employees (department_id, hire_date);
10.3 Partial indexes
Indexes on a subset of rows can be smaller and more efficient:
CREATE INDEX idx_active_users ON users (last_login) WHERE active = true;
10.4 Expression indexes
Indexes can be created on expressions for queries that frequently use these expressions:
CREATE INDEX idx_lower_email ON users (lower(email));
10.5 Unique indexes
Enforce uniqueness while providing fast lookups:
CREATE UNIQUE INDEX idx_unique_email ON users (email);
10.6 Index-only scans
Design indexes to cover queries for optimal performance:
CREATE INDEX idx_employee_coverage ON employees (id, first_name, last_name, email);
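Since PostgreSQL 11 you can instead keep the search key narrow and attach payload columns with INCLUDE, which still enables index-only scans:
CREATE INDEX idx_employee_covering ON employees (id) INCLUDE (first_name, last_name, email);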
10.7 Analyzing query performance
10.7.1 EXPLAIN
View the query plan:
EXPLAIN SELECT * FROM employees WHERE department_id = 5;
10.7.2 EXPLAIN ANALYZE
Execute the query and show actual timings:
EXPLAIN ANALYZE SELECT * FROM employees WHERE salary > 100000;
10.8 Query optimization techniques
10.8.1 Rewriting queries
Sometimes, rewriting a query can lead to better performance:
-- Instead of:
SELECT * FROM orders WHERE extract(year from order_date) = 2023;
-- Use:
SELECT * FROM orders WHERE order_date >= '2023-01-01' AND order_date < '2024-01-01';
10.8.2 Proper JOIN order
The planner chooses join order from table statistics; keep them fresh with ANALYZE so smaller result sets get joined first:
SELECT * FROM small_table s
JOIN large_table l ON s.id = l.small_id;
10.8.3 Avoid functions on indexed columns
Applying a function to an indexed column prevents the index from being used (unless you create a matching expression index, as in 10.4):
-- Instead of:
SELECT * FROM employees WHERE UPPER(last_name) = 'SMITH';
-- Use:
SELECT * FROM employees WHERE last_name = 'Smith';
10.9 Table partitioning
Divide large tables into smaller, more manageable chunks:
CREATE TABLE measurements (
logdate date not null,
peaktemp int,
unitsales int
) PARTITION BY RANGE (logdate);
CREATE TABLE measurements_y2023m01 PARTITION OF measurements
FOR VALUES FROM ('2023-01-01') TO ('2023-02-01');
CREATE TABLE measurements_y2023m02 PARTITION OF measurements
FOR VALUES FROM ('2023-02-01') TO ('2023-03-01');
10.10 Parallel query execution
PostgreSQL 16 can utilize multiple CPU cores for query execution:
SET max_parallel_workers_per_gather = 4;
10.11 Materialized views
Pre-compute and store complex query results:
CREATE MATERIALIZED VIEW mv_monthly_sales AS
SELECT
date_trunc('month', order_date) AS month,
SUM(total_amount) AS total_sales
FROM orders
GROUP BY 1;
REFRESH MATERIALIZED VIEW mv_monthly_sales;
10.12 Vacuuming and analyzing
Regular maintenance is crucial for performance:
VACUUM ANALYZE;
10.13 JIT (Just-In-Time) Compilation
PostgreSQL can compile parts of query execution plans to machine code:
SET jit = on;
10.14 Optimizing configuration
Adjust PostgreSQL configuration for your hardware and workload:
ALTER SYSTEM SET shared_buffers = '1GB';
ALTER SYSTEM SET effective_cache_size = '3GB';
ALTER SYSTEM SET work_mem = '16MB';
10.15 Monitoring and logging
Enable logging to identify slow queries:
ALTER SYSTEM SET log_min_duration_statement = '1000'; -- Log queries taking more than 1 second
SELECT pg_reload_conf();
10.16 Using pg_stat_statements
Enable and use pg_stat_statements for query performance analysis:
CREATE EXTENSION pg_stat_statements;
SELECT query, calls, total_exec_time, rows, 100.0 * shared_blks_hit /
nullif(shared_blks_hit + shared_blks_read, 0) AS hit_percent
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 5;
These indexing and performance optimization techniques form the foundation for maintaining a fast and efficient PostgreSQL 16 database. Remember that optimization is an iterative process, and what works best can depend on your specific data, queries, and hardware.
Transactions and ACID properties
11.1 Understanding ACID properties
PostgreSQL fully supports ACID properties:
- Atomicity: All operations in a transaction succeed or all fail.
- Consistency: The database remains in a consistent state before and after the transaction.
- Isolation: Concurrent transactions do not interfere with each other.
- Durability: Once a transaction is committed, it remains so even in the event of system failure.
11.2 Basic transaction control
11.2.1 Starting a transaction
BEGIN;
-- or
START TRANSACTION;
11.2.2 Committing a transaction
COMMIT;
11.2.3 Rolling back a transaction
ROLLBACK;
11.2.4 Example of a transaction
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
COMMIT;
11.3 Savepoints
Savepoints allow you to roll back parts of a transaction:
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
SAVEPOINT my_savepoint;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
-- Oops, wrong account
ROLLBACK TO my_savepoint;
UPDATE accounts SET balance = balance + 100 WHERE id = 3;
COMMIT;
11.4 Transaction isolation levels
PostgreSQL 16 accepts all four standard isolation levels (READ UNCOMMITTED is treated as READ COMMITTED):
11.4.1 Read uncommitted
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
11.4.2 Read committed (default in PostgreSQL)
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
11.4.3 Repeatable read
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
11.4.4 Serializable
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
11.5 Understanding isolation levels
Each isolation level prevents certain phenomena:
- Dirty read: Reading data from an uncommitted transaction.
- Non-repeatable read: Getting different results when reading the same data twice in the same transaction.
- Phantom read: A transaction re-executing a query gets a different set of rows than the first execution.
PostgreSQL’s implementation prevents dirty reads in all isolation levels.
11.6 Explicit locking
11.6.1 Table-level locks
LOCK TABLE accounts IN EXCLUSIVE MODE;
11.6.2 Row-level locks
SELECT * FROM accounts WHERE id = 1 FOR UPDATE;
11.7 Deadlock detection
PostgreSQL automatically detects and resolves deadlocks:
SET deadlock_timeout = '1s'; -- how long to wait on a lock before checking for a deadlock
11.8 Advisory locks
Application-defined locks that don’t lock any database objects:
SELECT pg_advisory_lock(id) FROM accounts WHERE id = 1;
-- Do some work
SELECT pg_advisory_unlock(id) FROM accounts WHERE id = 1;
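A non-blocking variant returns immediately instead of waiting:
SELECT pg_try_advisory_lock(1); -- true if acquired, false if another session holds it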
11.9 Transaction management in applications
When using PostgreSQL with an application, consider:
11.9.1 Connection pooling
Use tools like PgBouncer to manage database connections efficiently.
11.9.2 Error handling
Ensure proper error handling and transaction management in your application code.
11.10 Monitoring transactions
11.10.1 Active transactions
SELECT * FROM pg_stat_activity WHERE state = 'active';
11.10.2 Long-running transactions
SELECT pid, now() - xact_start AS xact_runtime, query
FROM pg_stat_activity
WHERE xact_start IS NOT NULL
ORDER BY xact_start;
11.11 Two-phase commit (2PC)
For distributed transactions across multiple databases:
BEGIN;
-- Do some work
PREPARE TRANSACTION 'my_transaction';
-- On another system or later time
COMMIT PREPARED 'my_transaction';
-- or
ROLLBACK PREPARED 'my_transaction';
11.12 Transaction logging
PostgreSQL uses Write-Ahead Logging (WAL) to ensure durability:
SHOW wal_level;
11.13 Handling concurrent updates
Use conditional updates to handle concurrent modifications:
UPDATE accounts
SET balance = balance - 100
WHERE id = 1 AND balance >= 100;
11.14 Read-only transactions
For queries that don’t modify data:
BEGIN READ ONLY;
SELECT * FROM accounts;
COMMIT;
11.15 Deferrable constraints
Constraints can be checked at the end of a transaction:
CREATE TABLE orders (
id INT PRIMARY KEY,
customer_id INT,
FOREIGN KEY (customer_id) REFERENCES customers(id) DEFERRABLE INITIALLY DEFERRED
);
BEGIN;
INSERT INTO orders (id, customer_id) VALUES (1, 100);
INSERT INTO customers (id, name) VALUES (100, 'New Customer');
COMMIT;
11.16 Statement timeout
Set a timeout to cancel statements that run too long (statement_timeout applies per statement; to catch sessions left idle inside a transaction, use idle_in_transaction_session_timeout):
SET statement_timeout = '5min';
Understanding and properly implementing transactions and ACID properties is crucial for maintaining data integrity in PostgreSQL 16. These concepts ensure that your database remains consistent even in the face of concurrent access and system failures.
Concurrency control
12.1 Multi-Version Concurrency Control (MVCC)
PostgreSQL uses MVCC to handle concurrent access to the database:
- Each transaction sees a snapshot of data as it was at the start of the transaction.
- Updates create new versions of rows, rather than overwriting old data.
- This allows readers and writers to operate without blocking each other in most cases.
12.2 Transaction isolation levels revisited
Understanding how isolation levels affect concurrency:
12.2.1 Read committed
- Provides a trade-off between consistency and concurrency.
- Each SQL statement sees only committed data as of the start of that statement.
12.2.2 Repeatable read
- Ensures that a transaction sees a consistent snapshot of the database throughout its lifetime.
- Prevents non-repeatable reads and phantom reads.
12.2.3 Serializable
- Provides the strictest isolation, simulating serial transaction execution.
- May lead to serialization failures that require transaction retries.
12.3 Explicit locking mechanisms
12.3.1 Row-level locks
SELECT * FROM accounts WHERE id = 1 FOR UPDATE;
12.3.2 Table-level locks
LOCK TABLE accounts IN EXCLUSIVE MODE;
12.3.3 Advisory locks
SELECT pg_advisory_lock(id) FROM accounts WHERE id = 1;
12.4 Handling concurrent inserts
12.4.1 UNIQUE constraints
Enforce uniqueness even with concurrent inserts:
CREATE UNIQUE INDEX idx_unique_email ON users (email);
12.4.2 INSERT … ON CONFLICT
Handle potential conflicts during insert:
INSERT INTO users (email, name)
VALUES ('[email protected]', 'John Doe')
ON CONFLICT (email) DO UPDATE
SET name = EXCLUDED.name;
12.5 Optimistic locking
Implement optimistic locking using a version column:
UPDATE products
SET stock = stock - 1, version = version + 1
WHERE id = 123 AND version = 1;
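If the UPDATE reports zero affected rows, another transaction changed the row first; reload the current version and retry.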
12.6 Pessimistic locking
Use FOR UPDATE to lock rows pessimistically:
BEGIN;
SELECT * FROM accounts WHERE id = 1 FOR UPDATE;
-- Process the account
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
COMMIT;
12.7 Deadlock detection and prevention
12.7.1 Automatic deadlock detection
PostgreSQL automatically detects and resolves deadlocks by aborting one of the transactions.
12.7.2 Deadlock prevention
- Acquire locks in a consistent order across transactions.
- Use timeouts to prevent indefinite waiting:
SET lock_timeout = '10s';
12.8 SELECT FOR SHARE
Allow concurrent reads while preventing updates:
SELECT * FROM products WHERE id = 1 FOR SHARE;
12.9 NOWAIT and SKIP LOCKED
Handle concurrent access without waiting:
SELECT * FROM tasks WHERE status = 'pending'
FOR UPDATE SKIP LOCKED
LIMIT 1;
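A common job-queue sketch builds on this: one worker claims a pending task atomically (table and column names are illustrative):
UPDATE tasks
SET status = 'running'
WHERE id = (
    SELECT id FROM tasks
    WHERE status = 'pending'
    ORDER BY id
    FOR UPDATE SKIP LOCKED
    LIMIT 1
)
RETURNING id;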
12.10 Handling hot spots
For frequently updated rows, consider techniques like:
- Splitting hot rows into multiple rows
- Using advisory locks instead of row locks
- Implementing application-level queues
12.11 Concurrent index creation
Create indexes without blocking writes:
CREATE INDEX CONCURRENTLY idx_user_email ON users (email);
12.12 Monitoring concurrency
12.12.1 Checking for locks
SELECT * FROM pg_locks WHERE NOT granted;
12.12.2 Identifying blocking queries
SELECT blocked_locks.pid AS blocked_pid,
blocking_locks.pid AS blocking_pid,
blocked_activity.query AS blocked_statement,
blocking_activity.query AS blocking_statement
FROM pg_catalog.pg_locks blocked_locks
JOIN pg_catalog.pg_stat_activity blocked_activity ON blocked_activity.pid = blocked_locks.pid
JOIN pg_catalog.pg_locks blocking_locks
ON blocking_locks.locktype = blocked_locks.locktype
AND blocking_locks.database IS NOT DISTINCT FROM blocked_locks.database
AND blocking_locks.relation IS NOT DISTINCT FROM blocked_locks.relation
AND blocking_locks.page IS NOT DISTINCT FROM blocked_locks.page
AND blocking_locks.tuple IS NOT DISTINCT FROM blocked_locks.tuple
AND blocking_locks.virtualxid IS NOT DISTINCT FROM blocked_locks.virtualxid
AND blocking_locks.transactionid IS NOT DISTINCT FROM blocked_locks.transactionid
AND blocking_locks.classid IS NOT DISTINCT FROM blocked_locks.classid
AND blocking_locks.objid IS NOT DISTINCT FROM blocked_locks.objid
AND blocking_locks.objsubid IS NOT DISTINCT FROM blocked_locks.objsubid
AND blocking_locks.pid != blocked_locks.pid
JOIN pg_catalog.pg_stat_activity blocking_activity ON blocking_activity.pid = blocking_locks.pid
WHERE NOT blocked_locks.granted;
12.13 Handling long-running queries
Use statement_timeout to prevent queries from running too long:
SET statement_timeout = '30s';
12.14 Connection pooling
Implement connection pooling (e.g., using PgBouncer) to manage database connections efficiently and reduce connection overhead.
12.15 Partition-level locks
In partitioned tables, locks can be acquired on individual partitions:
SELECT * FROM measurements_y2023m01 FOR UPDATE;
12.16 Conflict resolution in logical replication
When an incoming change conflicts with local data on a subscriber (for example, a unique-constraint violation), the apply worker stops and the error is reported in the server log. To resolve the conflict, either fix the offending row on the subscriber or skip the failed transaction using the LSN reported in the log (supported since PostgreSQL 15):
-- Skip the transaction that finishes at the reported LSN
ALTER SUBSCRIPTION my_subscription SKIP (lsn = '0/14C0378');
12.17 READ ONLY transactions
Use read-only transactions for queries that don’t need to modify data:
BEGIN READ ONLY;
SELECT * FROM accounts;
COMMIT;
Effective concurrency control is vital for maintaining data integrity and optimizing performance in multi-user PostgreSQL 16 environments. It requires a good understanding of PostgreSQL’s MVCC model, proper use of transactions and isolation levels, and careful application design.
Backup and recovery
13.1 Types of backups
13.1.1 Logical backups
- Contain SQL statements to recreate the database
- Flexible but slower to restore for large databases
13.1.2 Physical backups
- Exact copy of the data files
- Faster to restore but less flexible
13.1.3 Continuous archiving
- Allows point-in-time recovery (PITR)
- Combines full backups with Write-Ahead Log (WAL) archiving
13.2 Logical backups with pg_dump
13.2.1 Full database backup
pg_dump dbname > backup.sql
13.2.2 Specific table backup
pg_dump -t tablename dbname > table_backup.sql
13.2.3 Custom format backup (compressed)
pg_dump -Fc dbname > backup.dump
13.3 Logical backups with pg_dumpall
For backing up an entire PostgreSQL cluster, including roles and tablespaces:
pg_dumpall > full_backup.sql
13.4 Restoring logical backups
13.4.1 Plain text format
psql dbname < backup.sql
13.4.2 Custom format
pg_restore -d dbname backup.dump
13.5 Physical backups
13.5.1 File system level backup
tar -cvf backup.tar /var/lib/postgresql/data
13.5.2 Using pg_basebackup
pg_basebackup -D /backup/path -Ft -z -P
13.6 Continuous archiving and point-in-time recovery (PITR)
13.6.1 Configuring WAL archiving
In postgresql.conf:
wal_level = replica
archive_mode = on
archive_command = 'test ! -f /mnt/server/archivedir/%f && cp %p /mnt/server/archivedir/%f'
13.6.2 Taking a base backup
pg_basebackup -D /backup/base -Ft -z -P
13.6.3 Performing PITR
In PostgreSQL 12 and later, create an empty recovery.signal file in the data directory (older releases used recovery.conf) and set the recovery parameters in postgresql.conf:
restore_command = 'cp /mnt/server/archivedir/%f %p'
recovery_target_time = '2023-07-17 15:00:00'
13.7 Backup validation
Always validate your backups to ensure they can be restored:
pg_restore --list backup.dump
13.8 Incremental backups
While PostgreSQL doesn’t natively support incremental backups, you can achieve similar results using WAL archiving and PITR.
13.9 Logical replication for backup
Use logical replication to maintain a standby server that can serve as a backup:
CREATE PUBLICATION my_publication FOR ALL TABLES;
-- On the standby server:
CREATE SUBSCRIPTION my_subscription
CONNECTION 'host=primary dbname=mydb'
PUBLICATION my_publication;
13.10 Backup encryption
Encrypt your backups to protect sensitive data:
pg_dump dbname | gpg -c > encrypted_backup.sql.gpg
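To restore, decrypt and pipe straight into psql:
gpg -d encrypted_backup.sql.gpg | psql dbname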
13.11 Automated backup scripts
Create scripts to automate your backup process. Here’s a simple example:
#!/bin/bash
BACKUP_DIR="/path/to/backups"
TIMESTAMP=$(date +"%Y%m%d_%H%M%S")
DB_NAME="your_database"
pg_dump -Fc $DB_NAME > $BACKUP_DIR/${DB_NAME}_${TIMESTAMP}.dump
# Keep only the last 7 days of backups
find $BACKUP_DIR -type f -mtime +7 -name '*.dump' -delete
13.12 Monitoring backup process
Use pg_stat_archiver to monitor WAL archiving:
SELECT * FROM pg_stat_archiver;
13.13 Testing recovery procedures
Regularly test your recovery procedures to ensure they work as expected:
- Set up a test environment
- Restore your backup
- Apply WAL files (for PITR)
- Verify data integrity
13.14 Backing up large databases
For very large databases, consider:
- Using parallel dumps (parallel jobs require the directory output format):
pg_dump -Fd -j 4 -f backup_dir dbname
- Splitting backups into multiple files
- Using compression:
pg_dump -Fc dbname > backup.dump
13.15 Handling tablespaces in backups
When using tablespaces, ensure your backup strategy includes all tablespace locations:
pg_basebackup -D /backup/base -Ft -z -P -T /path/to/tablespace=/backup/tablespaces
13.16 Backup retention policy
Implement a backup retention policy based on your business needs:
- Keep daily backups for the last week
- Keep weekly backups for the last month
- Keep monthly backups for the last year
13.17 Cloud backup solutions
Consider using cloud storage solutions for off-site backup storage:
- AWS S3
- Google Cloud Storage
- Azure Blob Storage
Implement scripts to automatically upload backups to cloud storage.
13.18 Backup metadata management
Keep track of your backups:
CREATE TABLE backup_log (
id SERIAL PRIMARY KEY,
backup_time TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
backup_type VARCHAR(50),
file_name VARCHAR(255),
file_size BIGINT,
status VARCHAR(20)
);
INSERT INTO backup_log (backup_type, file_name, file_size, status)
VALUES ('FULL', 'backup_20230717.dump', 1234567890, 'SUCCESS');
Proper backup and recovery strategies are crucial for protecting your data and ensuring business continuity. Regular testing of backup and recovery procedures is essential to guarantee their effectiveness when needed.
PostgreSQL extensions
14.1 Understanding extensions
Extensions in PostgreSQL are packages of SQL objects that add functionality to the database. They can include new data types, functions, operators, index types, and more.
14.2 Listing available extensions
To see what extensions are available in your PostgreSQL installation:
SELECT * FROM pg_available_extensions;
14.3 Installing extensions
Most extensions can be installed with a simple SQL command:
CREATE EXTENSION extension_name;
14.4 Popular extensions
14.4.1 PostGIS
Adds support for geographic objects:
CREATE EXTENSION postgis;
-- Example usage
CREATE TABLE cities (
id SERIAL PRIMARY KEY,
name VARCHAR(50),
location GEOMETRY(Point, 4326)
);
INSERT INTO cities (name, location)
VALUES ('Sofia', ST_SetSRID(ST_MakePoint(23.3219, 42.6977), 4326));
14.4.2 pg_stat_statements
Provides execution statistics for all SQL statements:
CREATE EXTENSION pg_stat_statements;
-- Query to see top time-consuming queries
SELECT query, calls, total_exec_time, rows, 100.0 * shared_blks_hit /
nullif(shared_blks_hit + shared_blks_read, 0) AS hit_percent
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 5;
14.4.3 pgcrypto
Provides cryptographic functions:
CREATE EXTENSION pgcrypto;
-- Example: Hashing a password
SELECT crypt('my_password', gen_salt('bf'));
14.4.4 uuid-ossp
Generates universally unique identifiers (UUIDs):
CREATE EXTENSION "uuid-ossp";
-- Generate a UUID
SELECT uuid_generate_v4();
14.4.5 hstore
Provides a key-value pair storage data type:
CREATE EXTENSION hstore;
CREATE TABLE books (
id SERIAL PRIMARY KEY,
attributes hstore
);
INSERT INTO books (attributes)
VALUES ('author => "Jane Austen", title => "Pride and Prejudice", year => "1813"');
SELECT attributes -> 'author' AS author FROM books;
14.4.6 pg_trgm
Provides functions and operators for text similarity matching:
CREATE EXTENSION pg_trgm;
-- Create a GIN index for faster similarity searches
CREATE INDEX trgm_idx_products_name ON products USING GIN (name gin_trgm_ops);
-- Find similar product names
SELECT name, similarity(name, 'iPhone') AS sim
FROM products
WHERE name % 'iPhone'
ORDER BY sim DESC, name;
14.5 Custom extensions
You can also create your own extensions (a complete extension also needs a my_extension.control file describing it):
-- In a file named my_extension--1.0.sql
CREATE FUNCTION my_function() RETURNS TEXT AS $$
BEGIN
RETURN 'Hello from my extension!';
END;
$$ LANGUAGE plpgsql;
-- In psql
CREATE EXTENSION my_extension;
14.6 Updating extensions
To update an extension to the latest version:
ALTER EXTENSION extension_name UPDATE;
14.7 Removing extensions
To remove an extension:
DROP EXTENSION extension_name;
14.8 Extension dependencies
Some extensions depend on others. PostgreSQL handles these dependencies automatically.
14.9 Popular third-party extensions
14.9.1 TimescaleDB
Optimizes PostgreSQL for time-series data:
CREATE EXTENSION timescaledb;
CREATE TABLE sensors (
time TIMESTAMPTZ NOT NULL,
sensor_id INTEGER NOT NULL,
temperature DOUBLE PRECISION NULL,
humidity DOUBLE PRECISION NULL
);
SELECT create_hypertable('sensors', 'time');
14.9.2 PL/Python
Allows writing functions in Python:
CREATE EXTENSION plpython3u;
CREATE FUNCTION pymax (a integer, b integer)
RETURNS integer
AS $$
if a > b:
return a
return b
$$ LANGUAGE plpython3u;
14.9.3 pg_cron
Allows scheduling PostgreSQL commands:
CREATE EXTENSION pg_cron;
-- Schedule a job to run every day at midnight
SELECT cron.schedule('0 0 * * *', $$VACUUM ANALYZE$$);
14.10 Performance considerations
While extensions can add powerful functionality, they may also impact performance. Always test thoroughly before deploying to production.
14.11 Security considerations
Some extensions may introduce security risks if not properly configured. Always review the security implications before installing an extension.
14.12 Extension management in applications
When developing applications, consider using extensions to simplify your database schema and improve functionality. However, be aware of the deployment implications when using extensions.
14.13 Exploring extension internals
You can explore how extensions work internally:
-- View extension contents
SELECT * FROM pg_extension WHERE extname = 'postgis';
-- View objects that belong to an extension
SELECT * FROM pg_depend
WHERE refclassid = 'pg_extension'::regclass
AND refobjid = (SELECT oid FROM pg_extension WHERE extname = 'postgis');
Extensions greatly enhance PostgreSQL’s capabilities, allowing you to tailor your database to specific needs without modifying the core PostgreSQL code. They’re a key feature that contributes to PostgreSQL’s flexibility and power.
Security and user management
15.1 User management
15.1.1 Creating users (Roles)
CREATE ROLE john LOGIN PASSWORD 'secure_password';
-- or
CREATE USER john WITH PASSWORD 'secure_password';
15.1.2 Altering users
ALTER ROLE john WITH PASSWORD 'new_secure_password';
15.1.3 Dropping users
DROP ROLE john;
15.2 Role attributes
15.2.1 Superuser
CREATE ROLE admin SUPERUSER LOGIN PASSWORD 'very_secure_password';
15.2.2 Create role
CREATE ROLE manager CREATEROLE LOGIN PASSWORD 'manager_password';
15.2.3 Create database
CREATE ROLE dba CREATEDB LOGIN PASSWORD 'dba_password';
15.3 Role membership
15.3.1 Creating group roles
CREATE ROLE developers;
GRANT developers TO john, jane;
15.3.2 Setting role inheritance
CREATE ROLE analyst INHERIT;
GRANT developers TO analyst;
15.4 Schema management
15.4.1 Creating schemas
CREATE SCHEMA hr;
15.4.2 Setting search path
SET search_path TO hr, public;
15.5 Privileges
15.5.1 Granting privileges
GRANT SELECT, INSERT ON TABLE employees TO hr_staff;
GRANT USAGE ON SCHEMA hr TO hr_staff;
15.5.2 Revoking privileges
REVOKE INSERT ON TABLE employees FROM hr_staff;
15.5.3 Default privileges
ALTER DEFAULT PRIVILEGES IN SCHEMA hr
GRANT SELECT ON TABLES TO read_only;
15.6 Row-level security (RLS)
15.6.1 Enabling RLS
ALTER TABLE employees ENABLE ROW LEVEL SECURITY;
15.6.2 Creating policies
CREATE POLICY employee_isolation ON employees
USING (department = current_user);
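Note that table owners bypass row-level security by default; to apply policies to the owner as well:
ALTER TABLE employees FORCE ROW LEVEL SECURITY;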
15.7 Column-level security
Use views and grants to restrict column access:
CREATE VIEW public_employee_info AS
SELECT id, name, department FROM employees;
GRANT SELECT ON public_employee_info TO PUBLIC;
15.8 Data encryption
15.8.1 Password encryption
CREATE EXTENSION pgcrypto;
UPDATE users SET password = crypt('rawpassword', gen_salt('bf'));
15.8.2 Data-at-Rest encryption
Use file system-level encryption or tools like dm-crypt.
15.9 SSL configuration
15.9.1 Enabling SSL
In postgresql.conf:
ssl = on
ssl_cert_file = 'server.crt'
ssl_key_file = 'server.key'
15.9.2 Requiring SSL connections
In pg_hba.conf:
hostssl all all all md5
15.10 Authentication methods
15.10.1 Password authentication
host all all 0.0.0.0/0 md5
15.10.2 SCRAM authentication (more secure)
host all all 0.0.0.0/0 scram-sha-256
15.11 Connection limits
15.11.1 Per-role connection limits
ALTER ROLE worker CONNECTION LIMIT 5;
15.11.2 Per-database connection limits
ALTER DATABASE myapp CONNECTION LIMIT 100;
15.12 Auditing
15.12.1 Session logging
In postgresql.conf:
log_connections = on
log_disconnections = on
15.12.2 Statement logging
log_statement = 'mod' -- Logs all data-modifying statements
15.13 Object ownership
Changing object ownership:
ALTER TABLE employees OWNER TO hr_manager;
15.14 Security best practices
- Use least privilege principle
- Regularly audit user access and permissions
- Use strong, unique passwords for each role
- Implement proper network security (firewalls, VPNs)
- Keep PostgreSQL updated with the latest security patches
15.15 Managing pg_hba.conf
The pg_hba.conf file controls client authentication:
# TYPE DATABASE USER ADDRESS METHOD
local all all peer
host all all 127.0.0.1/32 md5
host all all ::1/128 md5
15.16 Password policies
Implement password policies using PL/pgSQL functions and triggers.
15.17 Monitoring and logging
15.17.1 Failed authentication attempts
Failed login attempts do not show up in pg_stat_activity; they are written to the server log. Enable log_connections and review the logs (or feed them to a log analyzer) to track failed attempts.
15.17.2 Logging suspicious activity
Set up log parsing and alerting for suspicious patterns in PostgreSQL logs.
15.18 Database firewall
Consider placing a proxy such as PgBouncer or Pgpool-II in front of PostgreSQL to restrict and funnel client connections, complemented by network-level firewall rules.
15.19 Regular security audits
Perform regular security audits of your PostgreSQL installation, including:
- User and role review
- Permission checks
- Configuration file review
- Network security assessment
15.20 Secure backups
Ensure that your backups are encrypted and stored securely, with restricted access.
Security and user management in PostgreSQL 16 involve multiple layers of protection, from network-level security to fine-grained access controls within the database. Properly implementing these measures is crucial for protecting your data and ensuring compliance with security standards.
PostgreSQL vs other databases
16.1 PostgreSQL vs MySQL
16.1.1 Key differences
- PostgreSQL is more feature-rich and adheres more closely to SQL standards.
- MySQL is often considered easier to set up and manage for simpler applications.
16.1.2 Performance
- PostgreSQL generally performs better for complex queries and read-heavy workloads.
- MySQL traditionally performs better for simple, high-volume CRUD operations.
16.1.3 Data types
- PostgreSQL offers a wider range of built-in data types, including advanced types like JSONB, ARRAY, and user-defined types.
16.1.4 Replication
- Both offer robust replication solutions, but PostgreSQL’s logical replication is more flexible.
16.2 PostgreSQL vs Oracle
16.2.1 Licensing
- PostgreSQL is open-source and free; Oracle is proprietary with significant licensing costs.
16.2.2 Features
- Both are feature-rich, but Oracle has some enterprise features not present in PostgreSQL.
- PostgreSQL often implements new features faster due to its open-source nature.
16.2.3 Scalability
- Oracle traditionally has had an edge in very large database (VLDB) environments, but PostgreSQL has significantly closed this gap.
16.2.4 Extensibility
- PostgreSQL is highly extensible with its extension ecosystem.
16.3 PostgreSQL vs SQL server
16.3.1 Platform support
- PostgreSQL runs on virtually all platforms; SQL Server primarily targets Windows (though Linux support has improved).
16.3.2 T-SQL vs PL/pgSQL
- SQL Server uses T-SQL, while PostgreSQL uses PL/pgSQL. Both are powerful, but there are syntax and feature differences.
16.3.3 JSON support
- PostgreSQL’s JSONB type often outperforms SQL Server’s JSON capabilities.
16.3.4 Costs
- PostgreSQL is free and open-source; SQL Server has various editions with different costs.
16.4 PostgreSQL vs MongoDB
16.4.1 Data model
- PostgreSQL is a relational database; MongoDB is a document-oriented NoSQL database.
- PostgreSQL offers strong JSONB support, bridging some gaps with document stores.
16.4.2 Schema flexibility
- MongoDB offers more schema flexibility out of the box.
- PostgreSQL requires more upfront schema design but offers JSONB for schema-less data.
16.4.3 Transactions and ACID compliance
- PostgreSQL offers full ACID compliance.
- MongoDB has improved in this area but traditionally lagged in multi-document transactions.
16.5 PostgreSQL unique features
16.5.1 Extensibility
- PostgreSQL’s extension system is unparalleled, allowing for significant customization without forking the core code.
16.5.2 Advanced data types
- Support for arrays, hstore, and full-text search capabilities are built-in.
16.5.3 Indexing options
- PostgreSQL offers a wide variety of index types (B-tree, Hash, GiST, SP-GiST, GIN, and BRIN).
16.5.4 Full-text search
- Built-in full-text search capabilities that rival dedicated search engines for many use cases.
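A self-contained taste of the syntax; the @@ operator tests whether a tsquery matches a tsvector:
SELECT to_tsvector('english', 'PostgreSQL ships with built-in full-text search')
       @@ to_tsquery('english', 'search & postgresql') AS matches;  -- returns true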
16.6 When to choose PostgreSQL
16.6.1 Complex queries
- If your application requires complex queries or data analysis, PostgreSQL’s query planner is excellent.
16.6.2 Data integrity
- When strong data consistency and ACID compliance are crucial.
16.6.3 Extensibility needs
- If you need to extend database functionality with custom types or functions.
16.6.4 Geospatial data
- PostgreSQL with PostGIS is a powerful solution for geospatial applications.
16.7 Performance comparisons
While specific performance can vary based on workload and configuration:
- PostgreSQL often excels in read-heavy workloads and complex queries.
- For simple CRUD operations, PostgreSQL performance is competitive but may not always lead.
- PostgreSQL’s MVCC implementation provides excellent performance for concurrent transactions.
16.8 Community and ecosystem
- PostgreSQL has a strong, active open-source community.
- Regular releases with new features and improvements.
- Rich ecosystem of tools and extensions.
16.9 Cloud support
- All major cloud providers offer managed PostgreSQL services.
- PostgreSQL’s feature set makes it well-suited for cloud-native applications.
16.10 Emerging trends
- PostgreSQL is well-positioned for microservices architectures with its robust transactional support and JSON capabilities.
- Increasing adoption in big data ecosystems, with tools like Foreign Data Wrappers.
This comparison highlights PostgreSQL’s strengths in standards compliance, extensibility, and advanced features, while also acknowledging areas where other databases might have advantages. The choice of database should always be based on specific project requirements, including performance needs, budget constraints, existing team expertise, and specific feature requirements.
Best practices and common pitfalls
17.1 Database design best practices
17.1.1 Normalization
- Properly normalize your database to reduce redundancy and improve data integrity.
- Consider denormalization only when performance gains are significant and necessary.
17.1.2 Use appropriate data types
- Choose the most appropriate data type for each column (e.g., use INTEGER for whole numbers, not VARCHAR).
- Use BIGINT for surrogate keys if the table may ever exceed INTEGER’s range of roughly 2.1 billion values.
17.1.3 Constraints
- Use constraints (PRIMARY KEY, FOREIGN KEY, UNIQUE, NOT NULL) to enforce data integrity.
- Implement CHECK constraints for domain integrity.
17.2 Indexing best practices
17.2.1 Index wisely
- Index columns used in WHERE, JOIN, and ORDER BY clauses.
- Avoid over-indexing; each index adds overhead to writes.
17.2.2 Use appropriate index types
- Use B-tree indexes for most scenarios.
- Consider GIN indexes for full-text search or jsonb columns.
- Use BRIN indexes for very large tables whose physical row order correlates with the indexed column (e.g., append-only timestamps); GIN and BRIN usage is sketched after this list.
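Two sketches of the non-default index types (table and column names are illustrative):
-- GIN: containment and key-existence queries on a jsonb column
CREATE INDEX idx_events_payload ON events USING gin (payload);
-- BRIN: tiny index for an append-only table ordered by time
CREATE INDEX idx_events_created ON events USING brin (created_at);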
17.2.3 Partial indexes
- Use partial indexes to index only a subset of rows, reducing index size and improving performance.
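A sketch, assuming a hypothetical orders table where only pending rows are queried frequently:
-- Indexes just the hot subset; smaller and cheaper to maintain
CREATE INDEX idx_orders_pending ON orders (created_at)
WHERE status = 'pending';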
17.3 Query optimization
17.3.1 Use EXPLAIN ANALYZE
- Regularly analyze your queries to understand and optimize execution plans.
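For example (table names illustrative); ANALYZE actually executes the query, and BUFFERS adds I/O detail:
EXPLAIN (ANALYZE, BUFFERS)
SELECT o.id, c.name
FROM orders o
JOIN customers c ON c.id = o.customer_id
WHERE o.created_at > now() - interval '1 day';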
17.3.2 Avoid SELECT *
- Only select the columns you need to reduce I/O and network traffic.
17.3.3 Use JOINs appropriately
- Prefer JOINs over correlated subqueries; the planner handles most uncorrelated subqueries well, but a correlated subquery can be re-executed once per outer row.
17.3.4 Utilize Common Table Expressions (CTEs)
- Use CTEs to improve query readability. Since PostgreSQL 12 the planner can inline non-recursive CTEs, so they are no longer optimization fences by default (add MATERIALIZED to restore the old behavior).
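A readability-focused sketch (tables illustrative): the CTE names an intermediate result that the outer query then joins:
WITH recent_orders AS (
    SELECT customer_id, count(*) AS order_count
    FROM orders
    WHERE created_at > now() - interval '30 days'
    GROUP BY customer_id
)
SELECT c.name, r.order_count
FROM recent_orders r
JOIN customers c ON c.id = r.customer_id
ORDER BY r.order_count DESC;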
17.4 Performance tuning
17.4.1 Configure PostgreSQL for your hardware
- Adjust settings like shared_buffers, work_mem, and effective_cache_size based on your server’s resources.
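Illustrative starting points for a dedicated 16 GB server; treat these as a sketch to benchmark against, not prescriptions:
# postgresql.conf
shared_buffers = 4GB           # commonly ~25% of RAM
work_mem = 64MB                # per sort/hash node, per connection — raise carefully
effective_cache_size = 12GB    # planner estimate, not an allocation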
17.4.2 Regular VACUUM and ANALYZE
- Schedule regular VACUUM and ANALYZE operations to maintain performance.
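Autovacuum normally handles this, but explicit runs help after bulk changes (table name illustrative):
VACUUM (ANALYZE, VERBOSE) orders;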
17.4.3 Use connection pooling
- Implement connection pooling (e.g., PgBouncer) for applications with many short-lived connections.
17.5 Security best practices
17.5.1 Principle of least privilege
- Grant users only the permissions they need.
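A sketch with illustrative names — grant the application role only the DML it needs:
GRANT SELECT, INSERT, UPDATE ON orders TO app_user;
REVOKE ALL ON orders FROM PUBLIC;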
17.5.2 Use SSL for connections
- Enable SSL to encrypt data in transit.
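Server-side, TLS is switched on in postgresql.conf (certificate file names illustrative; they must exist in the data directory or be absolute paths):
ssl = on
ssl_cert_file = 'server.crt'
ssl_key_file = 'server.key'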
17.5.3 Implement row-level security
- Use RLS for fine-grained access control when needed.
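A minimal sketch, assuming a hypothetical documents table with an owner column holding role names:
ALTER TABLE documents ENABLE ROW LEVEL SECURITY;
CREATE POLICY documents_owner_only ON documents
    USING (owner = current_user);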
17.6 Backup and recovery
17.6.1 Regular backups
- Implement a robust backup strategy, including regular full backups and WAL archiving.
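One possible shape of such a strategy (paths illustrative):
# Full base backup over the replication protocol
pg_basebackup -D /backups/base -Ft -z -P
# postgresql.conf — archive each completed WAL segment (restart required
# for archive_mode; use a robust script instead of cp in production)
archive_mode = on
archive_command = 'cp %p /backups/wal/%f'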
17.6.2 Test recovery procedures
- Regularly test your recovery procedures to ensure they work when needed.
17.7 Common pitfalls
17.7.1 N+1 query problem
- Avoid executing a query for each result of a previous query. Use JOINs or subqueries instead.
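Illustrative fix: one query instead of one-per-customer issued from application code:
-- Replaces a loop of "SELECT * FROM orders WHERE customer_id = $1"
SELECT c.id, c.name, o.id AS order_id, o.total
FROM customers c
LEFT JOIN orders o ON o.customer_id = c.id;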
17.7.2 Ignoring MVCC implications
- Be aware of how MVCC affects your transactions and query results.
17.7.3 Overuse of SERIAL/BIGSERIAL
- Consider using IDENTITY columns (introduced in PostgreSQL 10) instead of SERIAL for better standards compliance and performance.
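A sketch of the preferred form (table illustrative):
CREATE TABLE invoices (
    id    BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    total NUMERIC(10,2) NOT NULL
);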
17.7.4 Not managing table and index bloat
- Regular maintenance (VACUUM) is crucial to manage bloat and maintain performance.
17.7.5 Incorrect use of LIKE and regular expressions
- Avoid LIKE with a leading wildcard (‘%string’) on hot paths; it defeats ordinary B-tree indexes (a trigram index can help, as sketched below).
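When such searches are unavoidable, a trigram index from the contrib pg_trgm extension can serve them (names illustrative):
CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE INDEX idx_products_name_trgm ON products USING gin (name gin_trgm_ops);
-- This LIKE can now use the index instead of a sequential scan:
SELECT * FROM products WHERE name LIKE '%widget%';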
17.7.6 Neglecting to update statistics
- Regularly run ANALYZE to update statistics for the query planner.
17.8 Scalability considerations
17.8.1 Partitioning large tables
- Use table partitioning for very large tables to improve query performance and manageability.
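A declarative range-partitioning sketch (table and bounds illustrative):
CREATE TABLE measurements (
    recorded_at TIMESTAMPTZ NOT NULL,
    value       DOUBLE PRECISION
) PARTITION BY RANGE (recorded_at);

CREATE TABLE measurements_2024_01 PARTITION OF measurements
    FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');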
17.8.2 Consider read replicas
- Offload read-heavy workloads to read replicas to improve overall system performance.
17.9 Monitoring and logging
17.9.1 Set up proper logging
- Configure logging to capture slow queries, errors, and other important events.
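One widely useful setting is a slow-query threshold (the value is illustrative):
# postgresql.conf — log any statement taking longer than 500 ms
log_min_duration_statement = 500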
17.9.2 Use monitoring tools
- Implement monitoring solutions (e.g., pg_stat_statements, pgBadger) to track database performance.
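pg_stat_statements must be preloaded before the extension can be created; afterwards a query like this surfaces the most expensive statements:
-- postgresql.conf (restart required): shared_preload_libraries = 'pg_stat_statements'
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

SELECT query, calls, round(total_exec_time) AS total_ms, rows
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 5;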
17.10 Version upgrades
17.10.1 Plan upgrades carefully
- Always review release notes and test upgrades in a staging environment before applying to production.
17.10.2 Use pg_upgrade
- Utilize pg_upgrade for faster major version upgrades when possible.
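In outline (binary and data paths are illustrative and distribution-specific; Debian/Ubuntu users may prefer the pg_upgradecluster wrapper):
pg_upgrade \
  -b /usr/lib/postgresql/15/bin  -B /usr/lib/postgresql/16/bin \
  -d /var/lib/postgresql/15/main -D /var/lib/postgresql/16/main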
17.11 Development practices
17.11.1 Use database migrations
- Implement a database migration strategy for managing schema changes across environments.
17.11.2 Avoid ORM over-reliance
- While ORMs are convenient, be prepared to write raw SQL for complex queries to ensure optimal performance.
17.12 Concurrency management
17.12.1 Use advisory locks
- Utilize advisory locks for application-level locking scenarios.
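A sketch: two application instances agree on an arbitrary key (42 here) to serialize a job:
SELECT pg_try_advisory_lock(42) AS acquired;  -- false means another instance holds it
-- ... run the protected job only if acquired ...
SELECT pg_advisory_unlock(42);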
17.12.2 Be aware of lock levels
- Understand different lock levels and their implications on concurrent operations.
By following these best practices and avoiding common pitfalls, you can ensure a more robust, efficient, and maintainable PostgreSQL implementation. Remember that database management is an ongoing process, requiring regular monitoring, tuning, and adaptation to changing requirements and workloads.
Conclusion
18.1 Key takeaways
18.1.1 Feature-rich
PostgreSQL 16 offers a robust set of features, including:
- Advanced data types (JSONB, Arrays, Geometric types)
- Powerful indexing options (B-tree, GIN, BRIN, etc.)
- Full-text search capabilities
- Extensibility through custom functions and extensions
18.1.2 Standards compliant
PostgreSQL adheres closely to SQL standards, making it a reliable choice for applications requiring strict compliance.
18.1.3 Scalability
With features like table partitioning, parallel query execution, and logical replication, PostgreSQL can scale to handle large datasets and high-concurrency scenarios.
18.1.4 Security
PostgreSQL provides a comprehensive security model, including:
- Role-based access control
- Row-level security
- Data encryption options
18.1.5 Performance
Through its advanced query planner and various optimization techniques, PostgreSQL offers excellent performance for a wide range of workloads.
18.2 PostgreSQL in modern architectures
18.2.1 Cloud-native applications
PostgreSQL’s features make it well-suited for cloud-native and microservices architectures.
18.2.2 Big data ecosystems
With Foreign Data Wrappers and JSON support, PostgreSQL integrates well with big data ecosystems.
18.2.3 OLTP and OLAP
PostgreSQL can handle both transactional (OLTP) and analytical (OLAP) workloads effectively.
18.3 Future outlook
18.3.1 Continuous improvement
The PostgreSQL community consistently delivers new features and improvements with each release.
18.3.2 Growing adoption
PostgreSQL’s adoption continues to grow in various sectors, from startups to large enterprises.
18.3.3 Emerging technologies
PostgreSQL is well-positioned to adapt to emerging technologies and changing data management needs.
18.4 Final recommendations
18.4.1 Invest in learning
Given PostgreSQL’s rich feature set, invest time in learning its advanced capabilities to fully leverage its power.
18.4.2 Stay updated
Keep abreast of new PostgreSQL releases and features to take advantage of improvements and new capabilities.
18.4.3 Engage with the community
Participate in the PostgreSQL community for support, knowledge sharing, and potential contributions.
18.4.4 Regular maintenance
Implement a regular maintenance schedule for your PostgreSQL databases to ensure optimal performance and reliability.
18.4.5 Tailored approach
While following best practices, remember to tailor your PostgreSQL implementation to your specific use case and requirements.
18.5 Closing thoughts
PostgreSQL 16 stands as a testament to the power of open-source development and community-driven innovation. Its combination of robust features, reliability, and flexibility makes it an excellent choice for a wide range of database needs, from small applications to large-scale enterprise systems.
As data continues to grow in volume and complexity, PostgreSQL’s advanced features and extensibility position it well to meet future challenges in data management and analysis. Whether you’re a developer, database administrator, or decision-maker, PostgreSQL offers a powerful, cost-effective solution worthy of serious consideration for your data management needs.
By mastering PostgreSQL, you’re equipping yourself with a valuable skill set that is increasingly in demand across the technology industry. Continue exploring, learning, and pushing the boundaries of what you can achieve with this powerful database system.