PostgreSQL 13 : présentation

La nouvelle release officielle de PostgreSQL est sorti le 24 septembre 2020. Place à la version 13.

Celle ci est bien entendu, téléchargeable sur le site PostgreSQL.org sur lequel vous pourrez retrouver les sources, les derniers packages.
Il sera aussi, comme pour les autres versions, possible de mettre à jour vos fichiers “sources” pour les utilisateurs Linux et profiter de cette nouvelle version.

Les nouveautés

Comme à chaque version, PostgreSQL nous apporte son lot de nouveautés pour cette release 13.
Notons cette liste, non exhaustive, des quelques améliorations dans les domaines suivants :

Performances
- Mise en place des statistiques étendues pour l’amélioration des plans d’exécution, nouvelle vue “pg_statistic_ext_data“
- Le tri incrémentiel, afin d’accélérer le tri des données sur jointures, contrôlé via le paramètre enable_incremental_sort
- Optimisation sur les index B-Tree avec l’agrégation des valeurs en doublons.
- Agrégation de hachage et utilisation du stockage disque pour l’utilisation de grands ensembles, le tri peut aller au del de la valeur “work_mem”
- fonction FETCH FIRST WITH TIES pour ramener les n meilleurs résultats avec leurs exæquos (équivaut à un rank <= n).
Maintenance
- wal_keep_size remplace wal_keep_segments. La valeur à définir sera donc : Nb Wals x 16Mb.
- Parallélisation du VACUUM de table pour les index associés afin d’accélérer le traitement ( clause PARALLEL)
- Jointure directe entre partitions de tables “partitionwise joins“
- Estimation de la taille d’un backup avec “pg_basebackup” par défaut.
- Outil “pg_verifybackup” afin de vérifier l’intégrité d’un backup via “pg_basebackup” en mode “plain”.
- Les “trusted extensions”, un ‘superuser’ peut permettre à un autre utilisateur d’installer une extension qu’il aura validé auparavant.
- “pg_dump” peut inclure les données des tables externes (foreign table) avec le paramètre “–include-foreign-data”
- Paramètre “maintenance_io_concurrency” permettant de paralléliser certaines opérations des tâches de maintenance
Monitoring
- Surveiller la progression d’un backup en cours avec la vue pg_stat_progress_basebackup
- Surveiller la progression des stats en cours avec la vue pg_stat_progress_analyze
- Ajout du champ “leader_pid” dans ‘pg_stat_activty” dans le cas de l’utilisation du parallélisme
- Granularité plus fine sur les informations dans la mémoire partagée avec la vue “pg_shmem_allocations“

Quelques exemples d’améliorations

Regardons de plus près les quelques avancées de cette nouvelle release. Voyons si celles ci sont réellement pertinentes !

Le paritionning généré par PgBench

Nous pourrons, avec la release 13, créer des tables partitionnées via “pgbench”. L’option est la suivante

[postgres@pg13]$ pgbench --help | grep -i partition
--partition-method=(range|hash)
partition pgbench_accounts with this method (default: range)
--partitions=NUM partition pgbench_accounts into NUM parts (default: 0)

Pour notre article, partons sur un partitionnement de type range avec 10 partitions pour le moment.

[postgres@pg13]$ pgbench -i -s 10 --partitions=10 --partition-method=range --foreign-keys -p 5433 manuelo
dropping old tables...
NOTICE: table "pgbench_accounts" does not exist, skipping
NOTICE: table "pgbench_branches" does not exist, skipping
NOTICE: table "pgbench_history" does not exist, skipping
NOTICE: table "pgbench_tellers" does not exist, skipping
creating tables...
creating 10 partitions...
generating data (client-side)...
1000000 of 1000000 tuples (100%) done (elapsed 5.75 s, remaining 0.00 s)
vacuuming...
creating primary keys...
creating foreign keys...
done in 9.51 s (drop tables 0.00 s, create tables 0.04 s, client-side generate 5.80 s, vacuum 0.49 s, primary keys 2.80 s, foreign keys 0.37 s).

S’il l’on liste les objets créés dans notre base ‘manuelo’

manuelo=# \dt+
List of relations
Schema | Name | Type | Owner | Persistence | Size | Description
--------+---------------------+-------------------+----------+-------------+------------+-------------
public | pgbench_accounts | partitioned table | postgres | permanent | 0 bytes |
public | pgbench_accounts_1 | table | postgres | permanent | 13 MB |
public | pgbench_accounts_10 | table | postgres | permanent | 13 MB |
public | pgbench_accounts_2 | table | postgres | permanent | 13 MB |
public | pgbench_accounts_3 | table | postgres | permanent | 13 MB |
public | pgbench_accounts_4 | table | postgres | permanent | 13 MB |
public | pgbench_accounts_5 | table | postgres | permanent | 13 MB |
public | pgbench_accounts_6 | table | postgres | permanent | 13 MB |
public | pgbench_accounts_7 | table | postgres | permanent | 13 MB |
public | pgbench_accounts_8 | table | postgres | permanent | 13 MB |
public | pgbench_accounts_9 | table | postgres | permanent | 13 MB |
public | pgbench_branches | table | postgres | permanent | 40 kB |
public | pgbench_history | table | postgres | permanent | 0 bytes |
public | pgbench_tellers | table | postgres | permanent | 40 kB |

Nous retrouvons nos tables ‘pgbench’ créées sur le schéma public, dont le propriétaire est ‘postgres’.

Nous remarquons la création de “pgbench_accounts” qui est partitionnée.

Afin d’effectuer les tests de comparaison nécessaires pour la suite de l’article, on copie les données de la base “manuelo” sur un moteur PostgreSQL 12. La méthode “pg_dump/pg_restore” sera utilisée pour cette copie

[postgres@pg13]$ pg_dump -v -f manuelo.dmp -n public manuelo
[postgres@pg12]$ psql -d manuelo -f manuelo.dmp

Le FETCH FIRST WITH TIES ça apporte quoi au juste ?

Dans certaines opérations de tris sur une requête, il est parfois utile de regarder quelles sont les n valeurs les plus fortes, en prenant en compte également leurs exæquos. Cela est possible en passant par un “ranking” via une sous requête d’un ensemble donnée. Avec PostgreSQL 13, ceci est simplifier avec l’écriture “WITH TIES”.
En version postgreSQL 12, on connaissait déjà le FETCH FIRST n ROWS ONLY. Voyons ce que peut apporter le WITH TIES.

Reprenons nos données chargées via PgBench auparavant.

PostgreSQL 12

manuelo=# show server_version;
server_version
----------------
12.0
(1 row)

On sait que notre table “pgbench_accounts” est partitionnée méthode RANGE sur le champ “aid”.

manuelo=# \d+ pgbench_accounts
Partitioned table "public.pgbench_accounts"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
----------+---------------+-----------+----------+---------+----------+--------------+-------------
aid | integer | | not null | | plain | |
bid | integer | | | | plain | |
abalance | integer | | | | plain | |
filler | character(84) | | | | extended | |
Partition key: RANGE (aid)
Indexes:
"pgbench_accounts_pkey" PRIMARY KEY, btree (aid)
Foreign-key constraints:
"pgbench_accounts_bid_fkey" FOREIGN KEY (bid) REFERENCES pgbench_branches(bid)
Referenced by:
TABLE "pgbench_history" CONSTRAINT "pgbench_history_aid_fkey" FOREIGN KEY (aid) REFERENCES pgbench_accounts(aid)
Partitions: pgbench_accounts_1 FOR VALUES FROM (MINVALUE) TO (100001),
pgbench_accounts_10 FOR VALUES FROM (900001) TO (MAXVALUE),
pgbench_accounts_2 FOR VALUES FROM (100001) TO (200001),
pgbench_accounts_3 FOR VALUES FROM (200001) TO (300001),
pgbench_accounts_4 FOR VALUES FROM (300001) TO (400001),
pgbench_accounts_5 FOR VALUES FROM (400001) TO (500001),
pgbench_accounts_6 FOR VALUES FROM (500001) TO (600001),
pgbench_accounts_7 FOR VALUES FROM (600001) TO (700001),
pgbench_accounts_8 FOR VALUES FROM (700001) TO (800001),
pgbench_accounts_9 FOR VALUES FROM (800001) TO (900001)

Chaque partition contient donc 100 000 lignes.

Voyons donc s’il l’on interroge le TOP 10 des plus grands “account id” et leurs “branche id” attachés avec tri descendant sur branche id.

manuelo=# select aid , bid from pgbench_accounts order by bid desc fetch first 10 rows only;
aid | bid
--------+-----
900002 | 10
900003 | 10
900004 | 10
900005 | 10
900006 | 10
900007 | 10
900008 | 10
900009 | 10
900010 | 10
900001 | 10
(10 rows)

le plan d’exécution est le suivant :

manuelo=# explain analyze select aid , bid from pgbench_accounts order by bid desc fetch first 10 rows only;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=34369.82..34370.98 rows=10 width=8) (actual time=2736.423..2736.528 rows=10 loops=1)
-> Gather Merge (cost=34369.82..131599.60 rows=833340 width=8) (actual time=2736.421..2736.495 rows=10 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Sort (cost=33369.79..34411.47 rows=416670 width=8) (actual time=2681.387..2681.394 rows=10 loops=3)
Sort Key: pgbench_accounts_1.bid DESC
Sort Method: top-N heapsort Memory: 25kB
Worker 0: Sort Method: top-N heapsort Memory: 25kB
Worker 1: Sort Method: top-N heapsort Memory: 25kB
-> Parallel Append (cost=0.00..24365.70 rows=416670 width=8) (actual time=0.012..1945.815 rows=333333 loops=3)
-> Parallel Seq Scan on pgbench_accounts_1 (cost=0.00..2228.24 rows=58824 width=8) (actual time=0.006..240.764 rows=100000 loops=1)
-> Parallel Seq Scan on pgbench_accounts_2 (cost=0.00..2228.24 rows=58824 width=8) (actual time=0.013..296.642 rows=100000 loops=1)
-> Parallel Seq Scan on pgbench_accounts_3 (cost=0.00..2228.24 rows=58824 width=8) (actual time=0.032..220.039 rows=100000 loops=1)
-> Parallel Seq Scan on pgbench_accounts_4 (cost=0.00..2228.24 rows=58824 width=8) (actual time=0.012..254.346 rows=100000 loops=1)
-> Parallel Seq Scan on pgbench_accounts_5 (cost=0.00..2228.24 rows=58824 width=8) (actual time=0.025..202.572 rows=100000 loops=1)
-> Parallel Seq Scan on pgbench_accounts_6 (cost=0.00..2228.24 rows=58824 width=8) (actual time=0.010..296.606 rows=100000 loops=1)
-> Parallel Seq Scan on pgbench_accounts_7 (cost=0.00..2228.24 rows=58824 width=8) (actual time=0.014..69.118 rows=33333 loops=3)
-> Parallel Seq Scan on pgbench_accounts_8 (cost=0.00..2228.24 rows=58824 width=8) (actual time=0.006..184.851 rows=100000 loops=1)
-> Parallel Seq Scan on pgbench_accounts_9 (cost=0.00..2228.24 rows=58824 width=8) (actual time=0.007..250.804 rows=100000 loops=1)
-> Parallel Seq Scan on pgbench_accounts_10 (cost=0.00..2228.24 rows=58824 width=8) (actual time=0.010..231.848 rows=100000 loops=1)
Planning Time: 0.186 ms
Execution Time: 2736.588 ms
(22 rows)

Time: 2737.157 ms (00:02.737)

Il nous faut environ 2 secondes pour ramener le résultat de l’analyse. Rappelons que l’optimiseur parcourt l’ensemble des partitions dans la mesure on nous n’avons pas de clause WHERE.

Voyons maintenant le plan d’exécution si l’on souhaite ramener le TOP 10 des plus grands “account id” et leurs “branche id” attachés triés par “branche id” descendant, ainsi que les exæquos. Au vu de la conception de notre champs “bid”(valeurs égales par partition de la table “pgbench_accounts”), nous devrions ramener toutes les lignes de la partition 10 soit 100 000 lignes

manuelo=# explain analyze select aid , bid from (select aid , bid , rank() over (order by bid desc) from pgbench_accounts) as acc where rank <= 10 order by bid desc;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------
Subquery Scan on acc (cost=144732.34..174732.34 rows=333333 width=8) (actual time=4288.950..9857.908 rows=100000 loops=1)
Filter: (acc.rank <= 10)
Rows Removed by Filter: 900000
-> WindowAgg (cost=144732.34..162232.34 rows=1000000 width=16) (actual time=4288.946..9109.634 rows=1000000 loops=1)
-> Sort (cost=144732.34..147232.34 rows=1000000 width=8) (actual time=2834.834..3555.152 rows=1000000 loops=1)
Sort Key: pgbench_accounts_1.bid DESC
Sort Method: external merge Disk: 17696kB
-> Append (cost=0.00..31400.00 rows=1000000 width=8) (actual time=0.014..2052.294 rows=1000000 loops=1)
-> Seq Scan on pgbench_accounts_1 (cost=0.00..2640.00 rows=100000 width=8) (actual time=0.012..75.332 rows=100000 loops=1)
-> Seq Scan on pgbench_accounts_2 (cost=0.00..2640.00 rows=100000 width=8) (actual time=0.012..75.270 rows=100000 loops=1)
-> Seq Scan on pgbench_accounts_3 (cost=0.00..2640.00 rows=100000 width=8) (actual time=0.012..77.936 rows=100000 loops=1)
-> Seq Scan on pgbench_accounts_4 (cost=0.00..2640.00 rows=100000 width=8) (actual time=0.024..76.067 rows=100000 loops=1)
-> Seq Scan on pgbench_accounts_5 (cost=0.00..2640.00 rows=100000 width=8) (actual time=0.010..78.318 rows=100000 loops=1)
-> Seq Scan on pgbench_accounts_6 (cost=0.00..2640.00 rows=100000 width=8) (actual time=0.019..81.886 rows=100000 loops=1)
-> Seq Scan on pgbench_accounts_7 (cost=0.00..2640.00 rows=100000 width=8) (actual time=1.330..82.675 rows=100000 loops=1)
-> Seq Scan on pgbench_accounts_8 (cost=0.00..2640.00 rows=100000 width=8) (actual time=0.010..77.019 rows=100000 loops=1)
-> Seq Scan on pgbench_accounts_9 (cost=0.00..2640.00 rows=100000 width=8) (actual time=0.011..76.075 rows=100000 loops=1)
-> Seq Scan on pgbench_accounts_10 (cost=0.00..2640.00 rows=100000 width=8) (actual time=0.011..76.073 rows=100000 loops=1)
Planning Time: 0.232 ms
Execution Time: 9924.785 ms
(20 rows)

Time: 9925.463 ms (00:09.925)

Nous mettons un peu plus de 9 secondes pour cette opération, avec du tri sur disque (un peu plus de 17 Mo).

Voyons maintenant comment cela se comporte en PostgreSQL 13

PostgreSQL 13

Exécutons les mêmes requêtes

manuelo=# show server_version;
server_version
----------------
13.0
(1 row)

la 1ere requête avec le TOP 10 des plus grands “account id” et leurs “branche id” attachés triés par “branche id” descendant.

manuelo=# select aid , bid from pgbench_accounts order by bid desc fetch first 10 rows only;
aid | bid
--------+-----
900002 | 10
900003 | 10
900004 | 10
900005 | 10
900006 | 10
900007 | 10
900008 | 10
900009 | 10
900010 | 10
900001 | 10
(10 rows)

Pour le plan d’exécution c’est le même qu’en 12 avec quasiment les mêmes perfs, soit à peu près 2 secondes d’analyse.

manuelo=# explain analyze select aid , bid from pgbench_accounts order by bid desc fetch first 10 rows only;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=34369.82..34370.98 rows=10 width=8) (actual time=2713.365..2713.483 rows=10 loops=1)
-> Gather Merge (cost=34369.82..131599.60 rows=833340 width=8) (actual time=2713.362..2713.465 rows=10 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Sort (cost=33369.79..34411.47 rows=416670 width=8) (actual time=2691.259..2691.275 rows=10 loops=3)
Sort Key: pgbench_accounts.bid DESC
Sort Method: top-N heapsort Memory: 25kB
Worker 0: Sort Method: top-N heapsort Memory: 25kB
Worker 1: Sort Method: top-N heapsort Memory: 25kB
-> Parallel Append (cost=0.00..24365.70 rows=416670 width=8) (actual time=0.010..1921.334 rows=333333 loops=3)
-> Parallel Seq Scan on pgbench_accounts_1 (cost=0.00..2228.24 rows=58824 width=8) (actual time=0.006..275.988 rows=100000 loops=1)
-> Parallel Seq Scan on pgbench_accounts_2 (cost=0.00..2228.24 rows=58824 width=8) (actual time=0.009..187.432 rows=100000 loops=1)
-> Parallel Seq Scan on pgbench_accounts_3 (cost=0.00..2228.24 rows=58824 width=8) (actual time=0.006..186.142 rows=100000 loops=1)
-> Parallel Seq Scan on pgbench_accounts_4 (cost=0.00..2228.24 rows=58824 width=8) (actual time=0.005..206.952 rows=100000 loops=1)
-> Parallel Seq Scan on pgbench_accounts_5 (cost=0.00..2228.24 rows=58824 width=8) (actual time=0.010..164.090 rows=100000 loops=1)
-> Parallel Seq Scan on pgbench_accounts_6 (cost=0.00..2228.24 rows=58824 width=8) (actual time=0.007..163.891 rows=100000 loops=1)
-> Parallel Seq Scan on pgbench_accounts_7 (cost=0.00..2228.24 rows=58824 width=8) (actual time=0.007..91.653 rows=33333 loops=3)
-> Parallel Seq Scan on pgbench_accounts_8 (cost=0.00..2228.24 rows=58824 width=8) (actual time=0.004..140.797 rows=100000 loops=1)
-> Parallel Seq Scan on pgbench_accounts_9 (cost=0.00..2228.24 rows=58824 width=8) (actual time=0.005..184.767 rows=100000 loops=1)
-> Parallel Seq Scan on pgbench_accounts_10 (cost=0.00..2228.24 rows=58824 width=8) (actual time=0.008..165.512 rows=100000 loops=1)
Planning Time: 0.201 ms
Execution Time: 2713.549 ms
(22 rows)

Puis le TOP 10 des plus grands “account id” et leurs “branche id” attachés ainsi que les exæquos avec la méthode via la fonction ‘ranking’.

manuelo=# explain analyze select aid , bid from (select aid , bid , rank() over (order by bid desc) from pgbench_accounts) as acc where rank <= 10 order by bid desc;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------
Subquery Scan on acc (cost=131057.84..161057.84 rows=333333 width=8) (actual time=2962.088..5939.797 rows=100000 loops=1)
Filter: (acc.rank <= 10)
Rows Removed by Filter: 900000
-> WindowAgg (cost=131057.84..148557.84 rows=1000000 width=16) (actual time=2962.085..5234.609 rows=1000000 loops=1)
-> Sort (cost=131057.84..133557.84 rows=1000000 width=8) (actual time=2816.923..3508.142 rows=1000000 loops=1)
Sort Key: pgbench_accounts.bid DESC
Sort Method: external merge Disk: 17624kB
-> Append (cost=0.00..31400.00 rows=1000000 width=8) (actual time=0.011..2029.690 rows=1000000 loops=1)
-> Seq Scan on pgbench_accounts_1 (cost=0.00..2640.00 rows=100000 width=8) (actual time=0.009..78.495 rows=100000 loops=1)
-> Seq Scan on pgbench_accounts_2 (cost=0.00..2640.00 rows=100000 width=8) (actual time=0.009..77.523 rows=100000 loops=1)
-> Seq Scan on pgbench_accounts_3 (cost=0.00..2640.00 rows=100000 width=8) (actual time=0.008..82.124 rows=100000 loops=1)
-> Seq Scan on pgbench_accounts_4 (cost=0.00..2640.00 rows=100000 width=8) (actual time=0.009..76.659 rows=100000 loops=1)
-> Seq Scan on pgbench_accounts_5 (cost=0.00..2640.00 rows=100000 width=8) (actual time=0.032..82.713 rows=100000 loops=1)
-> Seq Scan on pgbench_accounts_6 (cost=0.00..2640.00 rows=100000 width=8) (actual time=0.006..76.210 rows=100000 loops=1)
-> Seq Scan on pgbench_accounts_7 (cost=0.00..2640.00 rows=100000 width=8) (actual time=0.012..75.231 rows=100000 loops=1)
-> Seq Scan on pgbench_accounts_8 (cost=0.00..2640.00 rows=100000 width=8) (actual time=0.011..74.480 rows=100000 loops=1)
-> Seq Scan on pgbench_accounts_9 (cost=0.00..2640.00 rows=100000 width=8) (actual time=0.010..74.386 rows=100000 loops=1)
-> Seq Scan on pgbench_accounts_10 (cost=0.00..2640.00 rows=100000 width=8) (actual time=0.011..77.242 rows=100000 loops=1)
Planning Time: 0.216 ms
Execution Time: 6006.397 ms
(20 rows)

Les performances sont identiques à la version 12.
Maintenant, utilisons la nouvelle méthode FETCH FIRST WITH TIES pour nous retourner le même résultat de données.

manuelo=# explain analyze select aid , bid from pgbench_accounts order by bid desc fetch first 10 rows with ties;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=34369.82..34370.98 rows=10 width=8) (actual time=2741.817..3067.730 rows=100000 loops=1)
-> Gather Merge (cost=34369.82..131599.60 rows=833340 width=8) (actual time=2741.814..2941.271 rows=100001 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Sort (cost=33369.79..34411.47 rows=416670 width=8) (actual time=2714.034..2736.566 rows=34426 loops=3)
Sort Key: pgbench_accounts.bid DESC
Sort Method: quicksort Memory: 28076kB
Worker 0: Sort Method: quicksort Memory: 27841kB
Worker 1: Sort Method: quicksort Memory: 27824kB
-> Parallel Append (cost=0.00..24365.70 rows=416670 width=8) (actual time=0.010..1973.108 rows=333333 loops=3)
-> Parallel Seq Scan on pgbench_accounts_1 (cost=0.00..2228.24 rows=58824 width=8) (actual time=0.006..254.568 rows=100000 loops=1)
-> Parallel Seq Scan on pgbench_accounts_2 (cost=0.00..2228.24 rows=58824 width=8) (actual time=0.011..208.208 rows=100000 loops=1)
-> Parallel Seq Scan on pgbench_accounts_3 (cost=0.00..2228.24 rows=58824 width=8) (actual time=0.007..207.545 rows=100000 loops=1)
-> Parallel Seq Scan on pgbench_accounts_4 (cost=0.00..2228.24 rows=58824 width=8) (actual time=0.007..207.043 rows=100000 loops=1)
-> Parallel Seq Scan on pgbench_accounts_5 (cost=0.00..2228.24 rows=58824 width=8) (actual time=0.005..229.658 rows=100000 loops=1)
-> Parallel Seq Scan on pgbench_accounts_6 (cost=0.00..2228.24 rows=58824 width=8) (actual time=0.005..163.504 rows=100000 loops=1)
-> Parallel Seq Scan on pgbench_accounts_7 (cost=0.00..2228.24 rows=58824 width=8) (actual time=0.008..80.356 rows=33333 loops=3)
-> Parallel Seq Scan on pgbench_accounts_8 (cost=0.00..2228.24 rows=58824 width=8) (actual time=0.004..273.048 rows=100000 loops=1)
-> Parallel Seq Scan on pgbench_accounts_9 (cost=0.00..2228.24 rows=58824 width=8) (actual time=0.004..173.430 rows=100000 loops=1)
-> Parallel Seq Scan on pgbench_accounts_10 (cost=0.00..2228.24 rows=58824 width=8) (actual time=0.008..162.951 rows=100000 loops=1)
Planning Time: 0.196 ms
Execution Time: 3129.996 ms
(22 rows)

Nous ramenons également 100 000 lignes, mais avec une exécution en 3 secondes pour ce EXPLAIN ANALYZE au lieu des 6 secondes avec le mode ranking.
Sur la version PostgreSQL 12, nous avions même un temps d’exécution d’environ 9 secondes.
De plus, on voit que tous nos tris se font avec 2 workers, et en mémoire — > Sort Method: quicksort Memory: 28076kB

Nous optimisons donc les IO et le temps d’exécution avec cette fonction.

Le tri incrémental (incremental sort), optimiser les tris dans vos jointures !

Autre nouveauté de la version 13 de PostgreSQL , le tri incrémental. Très intéressant pour des requêtes de type “TOP n” avec “order by” et “limit”, pour l’optimiseur, il n’est pas nécessaire de revérifier les ensembles de tris précédents lorsque vous effectuez un tri complet de données.
Avec un exemple concret, ce sera bien mieux :

Soit 2 colonnes A et B, nous pourrions avoir les valeurs suivantes :

Avec le “incremental sort”, l’optimiseur PostgreSQL va alors classer les données en fonction de la colonne A pour faire des sous ensembles avec les valeurs de B, soit :

25,65 25,77678 25,90
32,89 32,176889 32,7
55,89 55,779

On constatera alors un gain en terme de coût, et en IO car l’optimiseur effectuera des tris ensemblistes sur bien moins de lignes .

PostgreSQL 12

Prenons un exemple avec une jointure entre la table “pgbench_accounts” et “pgbench_branches”.

manuelo=# select a.aid, b.bid, a.abalance from pgbench_accounts a,pgbench_branches b 
where a.bid=b.bid order by a.aid desc,b.bid desc fetch first 10 rows only;
aid | bid | abalance
---------+-----+----------
1000000 | 10 | 0
999999 | 10 | 0
999998 | 10 | 0
999997 | 10 | 0
999996 | 10 | 0
999995 | 10 | 0
999994 | 10 | 0
999993 | 10 | 0
999992 | 10 | 0
999991 | 10 | 0
(10 rows)

Time: 429.901 ms

400ms pour l’exécution de cette requête. Regardons le plan.

manuelo=# explain analyze select a.aid, b.bid, a.abalance from pgbench_accounts a,pgbench_branches b 
where a.bid=b.bid order by a.aid desc,b.bid desc fetch first 10 rows only;

QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=35928.28..35929.45 rows=10 width=12) (actual time=4207.405..4207.509 rows=10 loops=1)
-> Gather Merge (cost=35928.28..133157.37 rows=833334 width=12) (actual time=4207.402..4207.479 rows=10 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Sort (cost=34928.26..35969.92 rows=416667 width=12) (actual time=4186.212..4186.219 rows=10 loops=3)
Sort Key: a.aid DESC, b.bid DESC
Sort Method: top-N heapsort Memory: 25kB
Worker 0: Sort Method: top-N heapsort Memory: 25kB
Worker 1: Sort Method: top-N heapsort Memory: 25kB
-> Hash Join (cost=1.23..25924.23 rows=416667 width=12) (actual time=3.593..3388.957 rows=333333 loops=3)
Hash Cond: (a.bid = b.bid)
-> Parallel Append (cost=0.00..24365.70 rows=416670 width=12) (actual time=3.483..1976.369 rows=333333 loops=3)
-> Parallel Seq Scan on pgbench_accounts_1 a (cost=0.00..2228.24 rows=58824 width=12) (actual time=0.007..187.719 rows=100000 loops=1)
-> Parallel Seq Scan on pgbench_accounts_2 a_1 (cost=0.00..2228.24 rows=58824 width=12) (actual time=10.429..277.525 rows=100000 loops=1)
-> Parallel Seq Scan on pgbench_accounts_3 a_2 (cost=0.00..2228.24 rows=58824 width=12) (actual time=10.436..232.550 rows=100000 loops=1)
-> Parallel Seq Scan on pgbench_accounts_4 a_3 (cost=0.00..2228.24 rows=58824 width=12) (actual time=0.005..253.731 rows=100000 loops=1)
-> Parallel Seq Scan on pgbench_accounts_5 a_4 (cost=0.00..2228.24 rows=58824 width=12) (actual time=0.012..187.681 rows=100000 loops=1)
-> Parallel Seq Scan on pgbench_accounts_6 a_5 (cost=0.00..2228.24 rows=58824 width=12) (actual time=10.865..234.133 rows=100000 loops=1)
-> Parallel Seq Scan on pgbench_accounts_7 a_6 (cost=0.00..2228.24 rows=58824 width=12) (actual time=3.713..88.257 rows=33333 loops=3)
-> Parallel Seq Scan on pgbench_accounts_8 a_7 (cost=0.00..2228.24 rows=58824 width=12) (actual time=0.004..253.161 rows=100000 loops=1)
-> Parallel Seq Scan on pgbench_accounts_9 a_8 (cost=0.00..2228.24 rows=58824 width=12) (actual time=0.005..230.784 rows=100000 loops=1)
-> Parallel Seq Scan on pgbench_accounts_10 a_9 (cost=0.00..2228.24 rows=58824 width=12) (actual time=0.007..120.950 rows=100000 loops=1)
-> Hash (cost=1.10..1.10 rows=10 width=4) (actual time=0.029..0.030 rows=10 loops=3)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> Seq Scan on pgbench_branches b (cost=0.00..1.10 rows=10 width=4) (actual time=0.009..0.017 rows=10 loops=3)
Planning Time: 0.335 ms
Execution Time: 4207.612 ms
(27 rows)

Pour ramener les 10 lignes de résultat, nous effectuons un tri sur plus de 416000 lignes.

PostgreSQL 13

Voyons maintenant cette même requête en version 13. Le plan d’exécution sera le suivant

manuelo=# explain analyze select a.aid, b.bid, a.abalance from pgbench_accounts a,pgbench_branches b 
where a.bid=b.bid order by a.aid desc,b.bid desc fetch first 10 rows only;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=3.12..5.42 rows=10 width=12) (actual time=0.229..0.256 rows=10 loops=1)
-> Incremental Sort (cost=3.12..230201.32 rows=1000000 width=12) (actual time=0.227..0.241 rows=10 loops=1)
Sort Key: a.aid DESC, b.bid DESC
Presorted Key: a.aid
Full-sort Groups: 1 Sort Method: quicksort Average Memory: 25kB Peak Memory: 25kB
-> Nested Loop (cost=2.92..185201.32 rows=1000000 width=12) (actual time=0.050..0.218 rows=11 loops=1)
Join Filter: (a.bid = b.bid)
Rows Removed by Join Filter: 99
-> Append (cost=2.92..47472.92 rows=1000000 width=12) (actual time=0.013..0.036 rows=11 loops=1)
-> Index Scan Backward using pgbench_accounts_10_pkey on pgbench_accounts_10 a_10 (cost=0.29..4247.29 rows=100000 width=12) (actual time=0.011..0.020 rows=11 loops=1)
-> Index Scan Backward using pgbench_accounts_9_pkey on pgbench_accounts_9 a_9 (cost=0.29..4247.29 rows=100000 width=12) (never executed)
-> Index Scan Backward using pgbench_accounts_8_pkey on pgbench_accounts_8 a_8 (cost=0.29..4247.29 rows=100000 width=12) (never executed)
-> Index Scan Backward using pgbench_accounts_7_pkey on pgbench_accounts_7 a_7 (cost=0.29..4247.29 rows=100000 width=12) (never executed)
-> Index Scan Backward using pgbench_accounts_6_pkey on pgbench_accounts_6 a_6 (cost=0.29..4247.29 rows=100000 width=12) (never executed)
-> Index Scan Backward using pgbench_accounts_5_pkey on pgbench_accounts_5 a_5 (cost=0.29..4247.29 rows=100000 width=12) (never executed)
-> Index Scan Backward using pgbench_accounts_4_pkey on pgbench_accounts_4 a_4 (cost=0.29..4247.29 rows=100000 width=12) (never executed)
-> Index Scan Backward using pgbench_accounts_3_pkey on pgbench_accounts_3 a_3 (cost=0.29..4247.29 rows=100000 width=12) (never executed)
-> Index Scan Backward using pgbench_accounts_2_pkey on pgbench_accounts_2 a_2 (cost=0.29..4247.29 rows=100000 width=12) (never executed)
-> Index Scan Backward using pgbench_accounts_1_pkey on pgbench_accounts_1 a_1 (cost=0.29..4247.29 rows=100000 width=12) (never executed)
-> Materialize (cost=0.00..1.15 rows=10 width=4) (actual time=0.001..0.008 rows=10 loops=11)
-> Seq Scan on pgbench_branches b (cost=0.00..1.10 rows=10 width=4) (actual time=0.005..0.012 rows=10 loops=1)
Planning Time: 0.328 ms
Execution Time: 0.308 ms
(23 rows)

On ramène ces 10 lignes en effectuant du tri incrémental sur seulement 100 000 lignes avec un temps de parsing de 0,2400ms.

A l’exécution, le résultat est sans appel :

manuelo=# select a.aid, b.bid, a.abalance from pgbench_accounts a,pgbench_branches b 
where a.bid=b.bid order by a.aid desc,b.bid desc fetch first 10 rows only;
aid | bid | abalance
---------+-----+----------
1000000 | 10 | 0
999999 | 10 | 0
999998 | 10 | 0
999997 | 10 | 0
999996 | 10 | 0
999995 | 10 | 0
999994 | 10 | 0
999993 | 10 | 0
999992 | 10 | 0
999991 | 10 | 0
(10 rows)

Time: 0.895 ms

Moins de 1ms alors que sur la version 12, nous étions à plus de 400ms.

Pourquoi certains de mes index B-Tree prennent moins de place en PostgreSQL 13 ?

Une nouvelle fonctionnalité également apparu avec la version 13, c’est l’agrégation des valeurs en doublons dans les index. En quelque sorte, nous ne répétons pas, dans l’index, chacune des valeurs en double, nous faisons de la déduplication !
Voyons comment ca marche.

Prenons notre champs “bid” de notre table “pgbench_accounts”. Nous savons que celui ci comporte le numéro de branche pour chaque “account id” de notre table. Nous aurons donc des valeurs qui vont se répéter sur chaque partition. Dans notre exemple, nous travaillerons par partition.

PostgreSQL 12.

Créons 2 index sur la partition 1 de notre table “pgbench_accounts”.
– 1 index pour le champs “aid”
– 1 index pour le champs “bid”

manuelo=# \d pgbench_accounts
Partitioned table "public.pgbench_accounts"
Column | Type | Collation | Nullable | Default
----------+---------------+-----------+----------+---------
aid | integer | | not null |
bid | integer | | |
abalance | integer | | |
filler | character(84) | | |
Partition key: RANGE (aid)

On sait que pour “bid”, la valeur 1 se répète pour cette partition 1

manuelo=# select * from pgbench_accounts_1 limit 10;
aid | bid | abalance | filler
-----+-----+----------+--------------------------------------------------------------------------------------
1 | 1 | 0 |
2 | 1 | 0 |
3 | 1 | 0 |
4 | 1 | 0 |
5 | 1 | 0 |
6 | 1 | 0 |
7 | 1 | 0 |
8 | 1 | 0 |
9 | 1 | 0 |
10 | 1 | 0 |
(10 rows)

manuelo=# create index account_id_1 on pgbench_accounts_1(aid);
CREATE INDEX


manuelo=# create index account_bid_1 on pgbench_accounts_1(bid);
CREATE INDEX

On passe les stats sur cette partition

manuelo=# vacuum analyse pgbench_accounts_1;
VACUUM

Voyons maintenant les tailles de chacun de ces index

manuelo=# select pg_table_size('account_id_1')/1024 as "Taille aid Ko";
Taille aid Ko
-----------
2208


manuelo=# select pg_table_size('account_bid_1')/1024 as "Taille bid Ko";
Taille bid Ko
-------------
2224

Nous avons donc 2 index sur cette table, champs “aid” et “bid”, qui font à peu près la même taille, soit 2Mo.

PostgreSQL 13

Effectuons la même opération sur PostgreSQL 13

manuelo=# create index account_bid_1 on pgbench_accounts_1(bid);
CREATE INDEX


manuelo=# create index account_id_1 on pgbench_accounts_1(aid);
CREATE INDEX


manuelo=# vacuum analyse pgbench_accounts_1;
VACUUM

Regardons les tailles de chacun des index

manuelo=# select pg_table_size('account_id_1')/1024 as "Taille aid Ko";
Taille aid Ko
---------------
2208
(1 row)

manuelo=# select pg_table_size('account_bid_1')/1024 as "Taille bid Ko";
Taille bid Ko
---------------
696
(1 row)

On voit que l’index sur “aid” fait toujours environ 2Mo, en effet, celui ci travaille sur des valeurs distinctes car c’est une contrainte unique sur ce champs.

En revanche, pour “bid”, comme nous n’avons que des 1 en valeur, celui ci ne fait plus que 600ko, PostgreSQL sait agréger la valeur 1 et la déduplique afin d’optimiser la place disque.

Vérifier l’intégrité de ses backup

Dans vos scripts de maintenance PostgreSQL, vous utilisez très certainement des appels aux outils tels “pg_basebackup” et “pg_archivecleanup” pour gérer les sauvegardes de bases et des WALs associés.
Il faudra maintenant compter sur un nouvel outil pour vérifier l’intégrité de vos sauvegardes, qui est “pg_verifybackup”.

L’outil “pg_verifybackup” va effectuer des opérations de checksum dans le backup à la manière d’un RESTORE VERIFY ONLY pour SQL Server. C’est pour cela qu’il est nécessaire de faire un backup via “pg_basebackup” en mode “plain” et non pas tar.

Effectuons un backup de notre instance PostgreSQL 13, mode “plain”, avec les WALs en mode fetch.

[postgres@pg13]$ mkdir -p /data/postgres/backup/bkp_PG13
[postgres@pg13]$ pg_basebackup -D /data/postgres/backup/bkp_PG13 -Fp -Xf -v -P
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/B4000028 on timeline 2
204392/204392 kB (100%), 1/1 tablespace
pg_basebackup: write-ahead log end point: 0/B4000100
pg_basebackup: syncing data to disk ...
pg_basebackup: renaming backup_manifest.tmp to backup_manifest
pg_basebackup: base backup completed

S’il l’on regarde le contenu du répertoire, nous avons les fichiers PostgreSQL classiques comme nous savons le trouver dans un backup en mode “plain”.
Nous trouvons maintenant un fichier nommé “backup_manifest”.

Ce fichier est éditable, nous avons toutes les informations sur les différents fichiers composants notre instance. Le nom du fichier, la date de modification, l’algorithme de checksum, sa taille, et la valeur du checksum.
Voici un extrait de celui ci :

[postgres@PG13]$ head -n 15 backup_manifest
{ "PostgreSQL-Backup-Manifest-Version": 1,
"Files": [
{ "Path": "backup_label", "Size": 226, "Last-Modified": "2020-10-16 15:59:48 GMT", "Checksum-Algorithm": "CRC32C", "Checksum": "ea4e7ad9" },
{ "Path": "current_logfiles", "Size": 30, "Last-Modified": "2020-10-16 09:19:39 GMT", "Checksum-Algorithm": "CRC32C", "Checksum": "f6838adf" },
{ "Path": "pg_multixact/members/0000", "Size": 8192, "Last-Modified": "2020-10-12 16:20:35 GMT", "Checksum-Algorithm": "CRC32C", "Checksum": "23464490" },
{ "Path": "pg_multixact/offsets/0000", "Size": 8192, "Last-Modified": "2020-10-16 09:24:39 GMT", "Checksum-Algorithm": "CRC32C", "Checksum": "23464490" },
{ "Path": "base/16403/16576", "Size": 8192, "Last-Modified": "2020-10-12 16:56:22 GMT", "Checksum-Algorithm": "CRC32C", "Checksum": "4fbeb31c" },
{ "Path": "base/16403/16659", "Size": 2260992, "Last-Modified": "2020-10-13 15:41:00 GMT", "Checksum-Algorithm": "CRC32C", "Checksum": "0fc9222c" },
{ "Path": "base/16403/16461", "Size": 16384, "Last-Modified": "2020-10-13 15:44:15 GMT", "Checksum-Algorithm": "CRC32C", "Checksum": "0e627e6f" },
{ "Path": "base/16403/3381", "Size": 0, "Last-Modified": "2020-10-12 16:20:35 GMT", "Checksum-Algorithm": "CRC32C", "Checksum": "00000000" },
{ "Path": "base/16403/16475", "Size": 16384, "Last-Modified": "2020-10-12 16:55:58 GMT", "Checksum-Algorithm": "CRC32C", "Checksum": "ccdbb572" },
{ "Path": "base/16403/16489", "Size": 8192, "Last-Modified": "2020-10-12 16:56:01 GMT", "Checksum-Algorithm": "CRC32C", "Checksum": "82a0e924" },
{ "Path": "base/16403/2602_fsm", "Size": 24576, "Last-Modified": "2020-10-12 16:20:35 GMT", "Checksum-Algorithm": "CRC32C", "Checksum": "5786bbba" },
{ "Path": "base/16403/16648_fsm", "Size": 24576, "Last-Modified": "2020-10-13 15:41:02 GMT", "Checksum-Algorithm": "CRC32C", "Checksum": "9e571b03" },
{ "Path": "base/16403/16497", "Size": 16384, "Last-Modified": "2020-10-13 15:44:16 GMT", "Checksum-Algorithm": "CRC32C", "Checksum": "bb227b8f" },

Ceci ressemble fortement à un formatage JSON qui sera exploité par “pg_verifybackup”, afin de valider notre sauvegarde.
On lance la commande :

[postgres@pg13]$ pg_verifybackup -e /data/postgres/backup/bkp_PG13

Attention, il est possible que vous ayez une erreur lors de la vérification des WALs, en effet, “pg_verifybackup” s’appuie sur l’outil “pg_waldump” pour chercher les WALs dans la sauvegarde, selon la Timeline en cours.
Ceci pourra se vérifier dans le fichier manifest :

"WAL-Ranges": [
{ "Timeline": 2, "Start-LSN": "0/9B0000A0", "End-LSN": "0/C1000100" }
],

L’erreur que vous pourriez rencontré est la suivante :

pg_waldump: fatal: could not find file "00000002000000000000009B": No such file or directory
pg_verifybackup: error: WAL parsing failed for timeline 2

D’autant plus, si vous êtes en mode “archive_mode” avec un “archive_command” renseigné qui envoie les WALs vers un répertoire d’archive.
En effet, “pg_basebackup” ne sauvegarde pas les archives envoyés vers un répertoire autre que ce qu’il y’a dans “pg_wal”.
Si vous souhaitez éviter de contrôler tous les WALs liés à cette ‘timeline’ courante, il sera possible de passer la commande avec l’option “-n” :

[postgres@ip-172-44-2-98 pg_wal]$ pg_verifybackup --help | grep "wal"
-n, --no-parse-wal do not try to parse WAL files

[postgres@ip-172-44-2-98 pg_wal]$ pg_verifybackup -e /data/postgres/backup/bkp_PG13 -n
backup successfully verified

Merci à vous !

N’hésitez pas à laisser un commentaire.

Continuez votre lecture sur le blog :

“Pruning” de partitions sous PostgreSQL ou comment bien élaguer ! (Capdata team) [PostgreSQL]
PostgreSQL Basics : lire un plan d’exécution comme un·e pro (ou presque) (Sarah FAVEERE) [PostgreSQL]
PostgreSQL : optimiser vos opérations vacuum et analyze ! (Emmanuel RAMI) [PostgreSQL]
Nouveautés pg_stat_statements avec PostgreSQL 15 (David Baffaleuf) [PostgreSQL]
PostgreSQL : évolution du partitionnement de 9.6 à 12 (2/2) (Capdata team) [PostgreSQL]

Les nouveautés

Quelques exemples d’améliorations

Le paritionning généré par PgBench

Le FETCH FIRST WITH TIES ça apporte quoi au juste ?

PostgreSQL 12

PostgreSQL 13

Le tri incrémental (incremental sort), optimiser les tris dans vos jointures !

Pourquoi certains de mes index B-Tree prennent moins de place en PostgreSQL 13 ?

Vérifier l’intégrité de ses backup

Continuez votre lecture sur le blog :

Emmanuel RAMI

Laisser un commentaire

GROUPE OSMOZIUM

CAP DATA CONSULTING

CAP DATA NANTES