mirror of https://github.com/cloudnative-pg/plugin-barman-cloud.git synced 2026-01-12 13:43:10 +01:00

Jeff Mealo 8aabd2860b docs(web): add troubleshooting guide and improve navigation

- Create troubleshooting page with debugging steps and known issues
- Add hero subtitle and quick links to landing page
- Document kubectl-cnpg backup command syntax changes
- Include migration notes for users switching from in-tree backup

Signed-off-by: Jeff Mealo <jmealo@protonmail.com>

2025-09-19 10:35:07 +02:00

18 KiB

Raw Blame History

sidebar_position
50

Troubleshooting

This guide helps you diagnose and resolve common issues with the Barman Cloud Plugin.

Before You Begin

Recommended Upgrades

:::important CloudNativePG 1.27.0 offers significantly improved error and status reporting for plugins. If you're experiencing issues, we strongly recommend upgrading to version 1.27.0 or later for better diagnostics.

Upgrade CloudNativePG: Follow the official upgrade guide
Update kubectl-cnpg plugin: Install or update the kubectl plugin for better debugging capabilities. See the kubectl plugin documentation :::

Viewing Logs

To effectively troubleshoot issues, you need to check logs from multiple sources:

:::note The following commands assume you've installed the CloudNativePG operator in the default cnpg-system namespace. If you've installed it in a different namespace, adjust the commands accordingly. :::

# View operator logs (contains plugin interaction logs)
# Assumes operator is installed in the default cnpg-system namespace
kubectl logs -n cnpg-system deployment/cnpg-controller-manager -f

# View sidecar container logs (barman-cloud operations)
kubectl logs -n <namespace> <cluster-pod-name> -c plugin-barman-cloud -f

# View plugin manager logs
kubectl logs -n cnpg-system deployment/barman-cloud -f

# View all containers in a pod
kubectl logs -n <namespace> <cluster-pod-name> --all-containers=true

# View previous container logs (if container restarted)
kubectl logs -n <namespace> <cluster-pod-name> -c plugin-barman-cloud --previous

Common Issues

Plugin Installation Issues

Plugin pods not starting

Symptoms:

Plugin pods are in CrashLoopBackOff or Error state
Plugin deployment is not ready

Possible causes and solutions:

Certificate issues

# Check if cert-manager is installed and running
kubectl get pods -n cert-manager

# Check if the plugin certificate is created
kubectl get certificates -n cnpg-system

If cert-manager is not installed, install it first:

kubectl apply -f https://github.com/cert-manager/cert-manager/releases/latest/download/cert-manager.yaml

Image pull errors

# Check pod events for image pull errors
kubectl describe pod -n cnpg-system -l app.kubernetes.io/name=barman-cloud

Verify the image exists and you have proper credentials if using a private registry.

Resource constraints
```
# Check node resources
kubectl top nodes
kubectl describe nodes
```
Ensure your cluster has sufficient CPU and memory resources.

Backup Failures

Quick Backup Troubleshooting Checklist

When a backup fails, follow these steps in order:

Check backup status: kubectl get backups.postgresql.cnpg.io -n <namespace>

Get error details and target pod:

kubectl describe backups.postgresql.cnpg.io -n <namespace> <backup-name>
# Or extract just the target pod name
kubectl get backups.postgresql.cnpg.io -n <namespace> <backup-name> -o jsonpath='{.status.instanceID.podName}'

Check the specific target pod's sidecar logs:

TARGET_POD=$(kubectl get backups.postgresql.cnpg.io -n <namespace> <backup-name> -o jsonpath='{.status.instanceID.podName}')
kubectl logs -n <namespace> $TARGET_POD -c plugin-barman-cloud --tail=100 | grep -E "ERROR|FATAL|panic"

Check cluster events: kubectl get events -n <namespace> --field-selector involvedObject.name=<cluster-name> --sort-by='.lastTimestamp'
Verify plugin is running: kubectl get pods -n cnpg-system -l app.kubernetes.io/name=barman-cloud
Check operator logs: kubectl logs -n cnpg-system deployment/cnpg-controller-manager --tail=100 | grep -i "backup\|plugin"
Check plugin manager logs: kubectl logs -n cnpg-system deployment/barman-cloud --tail=100

Backup job fails immediately

Symptoms:

Backup pods terminate with error
No backup files appear in object storage
Backup shows failed phase with various error messages

Common failure modes and solutions:

"requested plugin is not available" errors

ERROR: requested plugin is not available: barman
ERROR: requested plugin is not available: barman-cloud
ERROR: requested plugin is not available: barman-cloud.cloudnative-pg.io

Cause: The plugin name in the Cluster configuration doesn't match the deployed plugin or the plugin isn't registered

Solution:

a. Check plugin registration status:

# If you have kubectl-cnpg plugin installed (v1.27.0+)
kubectl cnpg status -n <namespace> <cluster-name>

Look for the "Plugins status" section:

Plugins status
Name                            Version  Status  Reported Operator Capabilities
----                            -------  ------  ------------------------------
barman-cloud.cloudnative-pg.io  0.6.0    N/A     Reconciler Hooks, Lifecycle Service

:::tip If the Plugins status section is missing:

Install or update kubectl-cnpg plugin to the latest version
Ensure CloudNativePG operator is v1.27.0 or later :::

b. Verify correct plugin name in Cluster spec:

apiVersion: postgresql.cnpg.io/v1
kind: Cluster
spec:
  plugins:
    - name: barman-cloud.cloudnative-pg.io
      parameters:
        barmanObjectStore: <your-objectstore-name>

c. Check plugin deployment is running:

kubectl get deployment -n cnpg-system barman-cloud

"rpc error: code = Unknown desc = panic caught: assignment to entry in nil map" errors

Cause: Configuration issue, often a typo or missing required field in the ObjectStore configuration

Solution:
- Check the sidecar container logs for detailed error messages:
```
kubectl logs -n <namespace> <cluster-pod> -c plugin-barman-cloud
```
- Verify your ObjectStore configuration has all required fields
- Common issues include:
  - Missing or incorrect secret references
  - Typos in configuration parameters
  - Missing required environment variables in secrets

General debugging steps:

Check backup status and identify the target instance

# List all backups and their status
kubectl get backups.postgresql.cnpg.io -n <namespace>

# Using kubectl-cnpg plugin (if installed)
kubectl cnpg backup list -n <namespace>

# Get detailed backup information including error messages and target instance
kubectl describe backups.postgresql.cnpg.io -n <namespace> <backup-name>

# Extract the target pod name from a failed backup
kubectl get backups.postgresql.cnpg.io -n <namespace> <backup-name> -o jsonpath='{.status.instanceID.podName}'

# Or get more details including the target pod, method, phase and error
kubectl get backups.postgresql.cnpg.io -n <namespace> <backup-name> -o jsonpath='Pod: {.status.instanceID.podName}{"\n"}Method: {.status.method}{"\n"}Phase: {.status.phase}{"\n"}Error: {.status.error}{"\n"}'

# Check the cluster status for backup-related information
kubectl cnpg status <cluster-name> -n <namespace> --verbose

Check sidecar logs on the backup target pod

# First, identify which pod was the backup target (from step 1)
TARGET_POD=$(kubectl get backups.postgresql.cnpg.io -n <namespace> <backup-name> -o jsonpath='{.status.instanceID.podName}')
echo "Backup target pod: $TARGET_POD"

# Check the sidecar logs on the specific target pod
kubectl logs -n <namespace> $TARGET_POD -c plugin-barman-cloud --tail=100

# Follow the logs in real-time to see ongoing issues
kubectl logs -n <namespace> $TARGET_POD -c plugin-barman-cloud -f

# Check for specific errors in the target pod around the backup time
kubectl logs -n <namespace> $TARGET_POD -c plugin-barman-cloud --since=10m | grep -E "ERROR|FATAL|panic|failed"

# Alternative: List all cluster pods and their roles
kubectl get pods -n <namespace> -l cnpg.io/cluster=<cluster-name> \
  -o custom-columns=NAME:.metadata.name,ROLE:.metadata.labels.cnpg\\.io/instanceRole,INSTANCE:.metadata.labels.cnpg\\.io/instanceName

# Check sidecar logs on ALL cluster pods for any errors (if target is unclear)
for pod in $(kubectl get pods -n <namespace> -l cnpg.io/cluster=<cluster-name> -o name); do
  echo "=== Checking $pod ==="
  kubectl logs -n <namespace> $pod -c plugin-barman-cloud --tail=20 | grep -i error || echo "No errors found"
done

Check events for backup-related issues

# Check events for the cluster
kubectl get events -n <namespace> --field-selector involvedObject.name=<cluster-name>

# Check events for failed backups
kubectl get events -n <namespace> --field-selector involvedObject.kind=Backup

# Get all recent events in the namespace
kubectl get events -n <namespace> --sort-by='.lastTimestamp' | tail -20

Verify ObjectStore configuration

# Check the ObjectStore resource
kubectl get objectstores.barmancloud.cnpg.io -n <namespace> <objectstore-name> -o yaml

# Verify the secret exists and has correct keys
kubectl get secret -n <namespace> <secret-name> -o yaml

Common error messages and solutions:
- "AccessDenied" or "403 Forbidden": Check cloud credentials and bucket permissions
- "NoSuchBucket": Verify the bucket exists and the endpoint URL is correct
- "Connection timeout": Check network connectivity and firewall rules
- "SSL certificate problem": For self-signed certificates, check CA bundle configuration

Backup performance issues

Symptoms:

Backups take extremely long
Backups timeout

Plugin-specific considerations:

Check ObjectStore parallelism settings
- Adjust maxParallel in ObjectStore configuration
- Monitor sidecar container resource usage during backups
Verify plugin resource allocation
- Check if the sidecar container has sufficient CPU/memory
- Review plugin container logs for resource-related warnings

:::tip For Barman-specific features like compression, encryption, and performance tuning, refer to the Barman documentation. :::

WAL Archiving Issues

WAL archiving through plugin stops working

Symptoms:

WAL files accumulating on primary
Cluster warnings about WAL archiving
Plugin sidecar logs show WAL archive errors

Plugin-specific debugging:

Check plugin sidecar logs for WAL archiving errors

# Check recent WAL archive operations in sidecar
kubectl logs -n <namespace> <primary-pod> -c plugin-barman-cloud --tail=50 | grep -i wal

Verify the plugin is handling archive_command

# The archive_command should be routing through the plugin
kubectl exec -n <namespace> <primary-pod> -c postgres -- psql -U postgres -c "SHOW archive_command;"

Check ObjectStore configuration for WAL settings
- Ensure ObjectStore has proper WAL retention settings
- Verify credentials have permissions for WAL operations

Restore Issues

Restore fails during recovery

Symptoms:

New cluster stuck in recovery mode
Plugin sidecar shows restore errors
PostgreSQL won't start

Plugin-specific debugging:

Check plugin sidecar logs during restore

# Check the sidecar logs on the recovering cluster pods
kubectl logs -n <namespace> <cluster-pod-name> -c plugin-barman-cloud --tail=100

# Look for restore-related errors
kubectl logs -n <namespace> <cluster-pod-name> -c plugin-barman-cloud | grep -E "restore|recovery|ERROR"

Verify plugin can access backups

# Check if ObjectStore is properly configured for restore
kubectl get objectstores.barmancloud.cnpg.io -n <namespace> <objectstore-name> -o yaml

# Check PostgreSQL recovery logs
kubectl logs -n <namespace> <cluster-pod> -c postgres | grep -i recovery

:::tip For detailed Barman restore operations and troubleshooting, refer to the Barman documentation. :::

Point-in-time recovery (PITR) configuration issues

Symptoms:

PITR target time not reached
Plugin sidecar shows WAL access errors
Recovery stops before target

Plugin-specific configuration:

Verify plugin configuration for PITR

apiVersion: postgresql.cnpg.io/v1
kind: Cluster
spec:
  plugins:
    - name: barman-cloud.cloudnative-pg.io
      parameters:
        barmanObjectStore: <objectstore-name>
  bootstrap:
    recovery:
      recoveryTarget:
        targetTime: "2024-01-15 10:30:00"
        targetTimezone: "UTC"

Check plugin sidecar for WAL access

# Check sidecar logs during recovery for WAL-related errors
kubectl logs -n <namespace> <cluster-pod> -c plugin-barman-cloud | grep -i wal

:::note For detailed PITR configuration and WAL management, see the Barman PITR documentation. :::

Plugin Configuration Issues

Plugin cannot connect to object storage

Symptoms:

Plugin sidecar logs show connection errors
Backups fail with authentication or network errors
ObjectStore resource shows errors

Plugin-specific solutions:

Verify ObjectStore CRD configuration

# Check ObjectStore resource status
kubectl get objectstores.barmancloud.cnpg.io -n <namespace> <objectstore-name> -o yaml

# Verify the secret exists and has correct keys for your provider
kubectl get secret -n <namespace> <secret-name> -o jsonpath='{.data}' | jq 'keys'

Check plugin sidecar connectivity

# Check sidecar logs for connection errors
kubectl logs -n <namespace> <cluster-pod> -c plugin-barman-cloud | grep -E "connection|timeout|SSL|certificate"

Provider-specific configuration
- See Object Store Configuration for provider-specific settings
- Ensure endpointURL and s3UsePathStyle match your storage type
- Verify network policies allow egress to your storage provider

Diagnostic Commands

Using kubectl-cnpg plugin

The kubectl-cnpg plugin provides enhanced debugging capabilities. Make sure you have it installed and updated:

# Install or update kubectl-cnpg plugin
kubectl krew install cnpg
# Or download directly from: https://github.com/cloudnative-pg/cloudnative-pg/releases

# Check plugin status (requires CNPG 1.27.0+)
kubectl cnpg status <cluster-name> -n <namespace>

# View cluster status in detail
kubectl cnpg status <cluster-name> -n <namespace> --verbose

# Check backup status
kubectl cnpg backup list -n <namespace>

# View plugin capabilities
kubectl cnpg plugin list -n <namespace>

Getting Help

If you continue to experience issues:

Check the documentation
- Review the Installation Guide
- Check Object Store Configuration for provider-specific settings
- Review Usage Examples for correct configuration patterns

Gather diagnostic information

# Create a diagnostic bundle (⚠️sanitize these before sharing!)
kubectl get objectstores.barmancloud.cnpg.io -A -o yaml > /tmp/objectstores.yaml
kubectl get clusters.postgresql.cnpg.io -A -o yaml > /tmp/clusters.yaml
kubectl logs -n cnpg-system deployment/barman-cloud --tail=1000 > /tmp/plugin.log

Community support
- CloudNativePG Slack: #cloudnativepg-users
- GitHub Issues: plugin-barman-cloud
When reporting issues, include:
- CloudNativePG version
- Barman Cloud Plugin version
- Kubernetes version
- Cloud provider and region
- Relevant configuration (⚠️sanitize/redact sensitive information)
- Error messages and logs
- Steps to reproduce

Known Issues and Limitations

Current Known Issues

WAL overwrite protection: Unlike the in-tree Barman archiver, the plugin doesn't prevent WAL overwrites when multiple clusters share the same name and object store path (#263)

Migration compatibility: After migrating from in-tree backup to the plugin, the kubectl cnpg backup command syntax has changed (#353):

# Old command (in-tree, no longer works after migration)
kubectl cnpg backup -n <namespace> <cluster-name> --method=barmanObjectStore

# New command (plugin-based)
kubectl cnpg backup -n <namespace> <cluster-name> --method=plugin --plugin-name=barman-cloud.cloudnative-pg.io

Plugin Limitations

Installation method: Currently only supports manifest and Kustomize installation (#351 - Helm chart requested)
Sidecar resource sharing: The plugin sidecar container shares pod resources with PostgreSQL
Plugin restart behavior: Restarting the sidecar container requires restarting the entire PostgreSQL pod

Compatibility Matrix

Plugin Version	CloudNativePG Version	Kubernetes Version	Notes
0.6.x	1.26.x, 1.27.x (recommended)	1.28+	CNPG 1.27.0+ provides enhanced plugin status reporting
0.5.x	1.25.x, 1.26.x	1.27+	Limited plugin diagnostics

:::tip Always check the Release Notes for version-specific known issues and fixes. :::

18 KiB Raw Blame History Unescape Escape

Troubleshooting

Before You Begin

Recommended Upgrades

Viewing Logs

Common Issues

Plugin Installation Issues

Plugin pods not starting

Backup Failures

Quick Backup Troubleshooting Checklist

Backup job fails immediately

Backup performance issues

WAL Archiving Issues

WAL archiving through plugin stops working

Restore Issues

Restore fails during recovery

Point-in-time recovery (PITR) configuration issues

Plugin Configuration Issues

Plugin cannot connect to object storage

Diagnostic Commands

Using kubectl-cnpg plugin

Getting Help

Known Issues and Limitations

Current Known Issues

Plugin Limitations

Compatibility Matrix

18 KiB

Raw Blame History