fix: disable management of end-of-wal file flag during backup restoration (#604)

When the end of the WAL stream is reached, the parallel WAL restore
feature attempts to predict the names of subsequent WAL files to restore
and records the first missing WAL file.

On high-availability (HA) replicas, if PostgreSQL requests the first
missing WAL file, the code returns an error status that prompts
PostgreSQL to switch to streaming replication.

Currently, the code assumes a `wal_segment_size` of 16MB when predicting
the next WAL file names. If the configured WAL segment size exceeds
16MB, it may request non-existent WAL files. For instance, with 16MB
segments, the names range from `000000010000000100000000` to
`0000000100000001000000FF` before rolling over to the next log ID (the
middle eight hex digits). With 1GB segments, they range only from
`000000010000000100000000` to `000000010000000100000003`.

With a 1GB segment size, the 16MB assumption makes the code look for the
WALs from `000000010000000100000004` to `0000000100000001000000FF`,
which never exist.

While this assumption does not affect HA replicas - which can shift to
streaming mode - it's problematic for a PostgreSQL instance seeking
consistency after a restore, as the restore process will fail.

This patch disables end-of-wal file marker management during backup
restoration, addressing restore issues for backups that were:

1. using a custom WAL file segment size
2. utilizing parallel WAL recovery
3. initiated on one WAL segment and concluded on a different one

Fixes: #603

Signed-off-by: Leonardo Cecchi <leonardo.cecchi@enterprisedb.com>
Leonardo Cecchi 2025-10-17 19:16:54 +02:00 committed by GitHub
parent 8ec400aae7
commit 931a06a407
GPG Key ID: B5690EEEBB952194

@@ -428,7 +428,14 @@ func isStreamingAvailable(cluster *cnpgv1.Cluster, podName string) bool {
 		return false
 	}
 
-	// Easy case: If this pod is a replica, the streaming is always available
+	// Easy case take 1: we are helping PostgreSQL to create the first
+	// instance of a Cluster. No streaming connection is possible.
+	if cluster.Status.CurrentPrimary == "" {
+		return false
+	}
+
+	// Easy case take 2: If this pod is a replica, the streaming is always
+	// available
 	if cluster.Status.CurrentPrimary != podName {
 		return true
 	}
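
The two early returns in this hunk can be exercised in isolation with a
small sketch. It is not the plugin's code: `easyCases` is a hypothetical
helper that takes plain strings instead of a `cnpgv1.Cluster`. Because
the hunk does not show the rest of `isStreamingAvailable`, the sketch
reports "undecided" rather than guessing what the remaining logic
returns.

```go
package main

import "fmt"

// easyCases mirrors only the two early returns shown in the hunk.
// It is a hypothetical stand-in: decided is false when neither easy
// case applies and the rest of the real function would have to run.
func easyCases(currentPrimary, podName string) (available, decided bool) {
	// Take 1: no primary yet, so the first instance of the cluster is
	// being created (for example from a backup). No streaming possible.
	if currentPrimary == "" {
		return false, true
	}

	// Take 2: this pod is a replica, streaming is always available.
	if currentPrimary != podName {
		return true, true
	}

	// This pod is the current primary: not covered by the easy cases.
	return false, false
}

func main() {
	fmt.Println(easyCases("", "cluster-1"))          // false true
	fmt.Println(easyCases("cluster-1", "cluster-2")) // true true
	fmt.Println(easyCases("cluster-1", "cluster-1")) // false false
}
```

The first check has to come before the replica check: with an empty
`CurrentPrimary`, the comparison against `podName` would otherwise
succeed and report streaming as available during a restore.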