From f04f3807c665ac5b5e219aceff5456eef6c7ee22 Mon Sep 17 00:00:00 2001
From: Leonardo Cecchi
Date: Thu, 16 Oct 2025 15:18:10 +0200
Subject: [PATCH] fix: disable management of end-of-wal file flag during backup restoration
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

When the end of the WAL stream is reached, the parallel WAL restore
feature attempts to predict the names of subsequent WAL files to
restore and records the first missing WAL file. On high-availability
(HA) replicas, if PostgreSQL requests the first missing WAL file, the
code returns an error status that prompts PostgreSQL to switch to
streaming replication.

Currently, the code assumes a wal_segment_size of 16MB for predicting
the next WAL file names. If the configured WAL segment size exceeds
16MB, it may request non-existent WAL files.

For instance, with 16MB segments, the names would range from
000000010000000100000000 to 0000000100000001000000FF before moving to
the next segment. For 1GB segments, they would range from
000000010000000100000000 to 000000010000000100000003. With the
assumption of a 16MB segment size, the code will not find the WALs
from 000000010000000100000004 to 0000000100000001000000FF.

While this assumption does not affect HA replicas, which can shift to
streaming mode, it's problematic for a PostgreSQL instance seeking
consistency after a restore, as the restore process will fail.

This patch disables end-of-wal file marker management during
replication, addressing restore issues for backups that were:

1. using a custom WAL file segment size
2. utilizing parallel WAL recovery
3. initiated on one WAL segment and concluded on a different one

Signed-off-by: Leonardo Cecchi
---
 internal/cnpgi/common/wal.go | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/internal/cnpgi/common/wal.go b/internal/cnpgi/common/wal.go
index 0bd42ec..8e58cb4 100644
--- a/internal/cnpgi/common/wal.go
+++ b/internal/cnpgi/common/wal.go
@@ -428,7 +428,14 @@ func isStreamingAvailable(cluster *cnpgv1.Cluster, podName string) bool {
 		return false
 	}
 
-	// Easy case: If this pod is a replica, the streaming is always available
+	// Easy case take 1: we are helping PostgreSQL to create the first
+	// instance of a Cluster. No streaming connection is possible.
+	if cluster.Status.CurrentPrimary == "" {
+		return false
+	}
+
+	// Easy case take 2: If this pod is a replica, the streaming is always
+	// available
 	if cluster.Status.CurrentPrimary != podName {
 		return true
 	}