From 8aabd2860b70ea277ad771e3b2c0adf6c0b11d9e Mon Sep 17 00:00:00 2001 From: Jeff Mealo Date: Tue, 2 Sep 2025 15:23:52 -0400 Subject: [PATCH] docs(web): add troubleshooting guide and improve navigation - Create troubleshooting page with debugging steps and known issues - Add hero subtitle and quick links to landing page - Document kubectl-cnpg backup command syntax changes - Include migration notes for users switching from in-tree backup Signed-off-by: Jeff Mealo --- web/docs/intro.md | 7 +- web/docs/troubleshooting.md | 491 ++++++++++++++++++ web/docs/usage.md | 30 +- .../components/HomepageFeatures/feature.tsx | 67 ++- .../HomepageFeatures/styles.module.css | 16 + web/src/pages/index.tsx | 10 + web/versioned_docs/version-0.6.0/intro.md | 7 +- .../version-0.6.0/troubleshooting.md | 491 ++++++++++++++++++ 8 files changed, 1096 insertions(+), 23 deletions(-) create mode 100644 web/docs/troubleshooting.md create mode 100644 web/versioned_docs/version-0.6.0/troubleshooting.md diff --git a/web/docs/intro.md b/web/docs/intro.md index f22f383..39aa639 100644 --- a/web/docs/intro.md +++ b/web/docs/intro.md @@ -23,9 +23,14 @@ for detailed instructions. To use the Barman Cloud Plugin, you need: -- [CloudNativePG](https://cloudnative-pg.io) version **1.26** +- [CloudNativePG](https://cloudnative-pg.io) version **1.26** or later + - **Version 1.27.0 or later is strongly recommended** for enhanced plugin error and status reporting + - See the [upgrade guide](https://cloudnative-pg.io/documentation/current/installation_upgrade) if you need to upgrade - [cert-manager](https://cert-manager.io/) to enable TLS communication between the plugin and the operator +- [kubectl-cnpg plugin](https://cloudnative-pg.io/documentation/current/kubectl-plugin/) (recommended) + - Provides enhanced debugging and status checking capabilities + - Install via `kubectl krew install cnpg` or download from [releases](https://github.com/cloudnative-pg/cloudnative-pg/releases) ## Key Features diff --git a/web/docs/troubleshooting.md b/web/docs/troubleshooting.md new file mode 100644 index 0000000..0d61401 --- /dev/null +++ b/web/docs/troubleshooting.md @@ -0,0 +1,491 @@ +--- +sidebar_position: 50 +--- + +# Troubleshooting + + + +This guide helps you diagnose and resolve common issues with the Barman Cloud Plugin. + +## Before You Begin + +### Recommended Upgrades + +:::important +**CloudNativePG 1.27.0** offers significantly improved error and status reporting for plugins. If you're experiencing issues, we strongly recommend upgrading to version 1.27.0 or later for better diagnostics. + +- **Upgrade CloudNativePG**: Follow the [official upgrade guide](https://cloudnative-pg.io/documentation/current/installation_upgrade) +- **Update kubectl-cnpg plugin**: Install or update the kubectl plugin for better debugging capabilities. See the [kubectl plugin documentation](https://cloudnative-pg.io/documentation/current/kubectl-plugin/) +::: + +### Viewing Logs + +To effectively troubleshoot issues, you need to check logs from multiple sources: + +:::note +The following commands assume you've installed the CloudNativePG operator in the default `cnpg-system` namespace. If you've installed it in a different namespace, adjust the commands accordingly. +::: + +```bash +# View operator logs (contains plugin interaction logs) +# Assumes operator is installed in the default cnpg-system namespace +kubectl logs -n cnpg-system deployment/cnpg-controller-manager -f + +# View sidecar container logs (barman-cloud operations) +kubectl logs -n -c plugin-barman-cloud -f + +# View plugin manager logs +kubectl logs -n cnpg-system deployment/barman-cloud -f + +# View all containers in a pod +kubectl logs -n --all-containers=true + +# View previous container logs (if container restarted) +kubectl logs -n -c plugin-barman-cloud --previous +``` + +## Common Issues + +### Plugin Installation Issues + +#### Plugin pods not starting + +**Symptoms:** +- Plugin pods are in `CrashLoopBackOff` or `Error` state +- Plugin deployment is not ready + +**Possible causes and solutions:** + +1. **Certificate issues** + ```bash + # Check if cert-manager is installed and running + kubectl get pods -n cert-manager + + # Check if the plugin certificate is created + kubectl get certificates -n cnpg-system + ``` + + If cert-manager is not installed, install it first: + ```bash + kubectl apply -f https://github.com/cert-manager/cert-manager/releases/latest/download/cert-manager.yaml + ``` + +2. **Image pull errors** + ```bash + # Check pod events for image pull errors + kubectl describe pod -n cnpg-system -l app.kubernetes.io/name=barman-cloud + ``` + + Verify the image exists and you have proper credentials if using a private registry. + +3. **Resource constraints** + ```bash + # Check node resources + kubectl top nodes + kubectl describe nodes + ``` + + Ensure your cluster has sufficient CPU and memory resources. + +### Backup Failures + +#### Quick Backup Troubleshooting Checklist + +When a backup fails, follow these steps in order: + +1. **Check backup status**: `kubectl get backups.postgresql.cnpg.io -n ` +2. **Get error details and target pod**: + ```bash + kubectl describe backups.postgresql.cnpg.io -n + # Or extract just the target pod name + kubectl get backups.postgresql.cnpg.io -n -o jsonpath='{.status.instanceID.podName}' + ``` +3. **Check the specific target pod's sidecar logs**: + ```bash + TARGET_POD=$(kubectl get backups.postgresql.cnpg.io -n -o jsonpath='{.status.instanceID.podName}') + kubectl logs -n $TARGET_POD -c plugin-barman-cloud --tail=100 | grep -E "ERROR|FATAL|panic" + ``` +4. **Check cluster events**: `kubectl get events -n --field-selector involvedObject.name= --sort-by='.lastTimestamp'` +5. **Verify plugin is running**: `kubectl get pods -n cnpg-system -l app.kubernetes.io/name=barman-cloud` +6. **Check operator logs**: `kubectl logs -n cnpg-system deployment/cnpg-controller-manager --tail=100 | grep -i "backup\|plugin"` +7. **Check plugin manager logs**: `kubectl logs -n cnpg-system deployment/barman-cloud --tail=100` + +#### Backup job fails immediately + +**Symptoms:** +- Backup pods terminate with error +- No backup files appear in object storage +- Backup shows `failed` phase with various error messages + +**Common failure modes and solutions:** + +1. **"requested plugin is not available" errors** + ``` + ERROR: requested plugin is not available: barman + ERROR: requested plugin is not available: barman-cloud + ERROR: requested plugin is not available: barman-cloud.cloudnative-pg.io + ``` + + **Cause:** The plugin name in the Cluster configuration doesn't match the deployed plugin or the plugin isn't registered + + **Solution:** + + a. **Check plugin registration status**: + ```bash + # If you have kubectl-cnpg plugin installed (v1.27.0+) + kubectl cnpg status -n + ``` + + Look for the "Plugins status" section: + ``` + Plugins status + Name Version Status Reported Operator Capabilities + ---- ------- ------ ------------------------------ + barman-cloud.cloudnative-pg.io 0.6.0 N/A Reconciler Hooks, Lifecycle Service + ``` + + :::tip + If the Plugins status section is missing: + - Install or update kubectl-cnpg plugin to the latest version + - Ensure CloudNativePG operator is v1.27.0 or later + ::: + + b. **Verify correct plugin name in Cluster spec**: + ```yaml + apiVersion: postgresql.cnpg.io/v1 + kind: Cluster + spec: + plugins: + - name: barman-cloud.cloudnative-pg.io + parameters: + barmanObjectStore: + ``` + + c. **Check plugin deployment is running**: + ```bash + kubectl get deployment -n cnpg-system barman-cloud + ``` + +2. **"rpc error: code = Unknown desc = panic caught: assignment to entry in nil map" errors** + + **Cause:** Configuration issue, often a typo or missing required field in the ObjectStore configuration + + **Solution:** + - Check the sidecar container logs for detailed error messages: + ```bash + kubectl logs -n -c plugin-barman-cloud + ``` + - Verify your ObjectStore configuration has all required fields + - Common issues include: + - Missing or incorrect secret references + - Typos in configuration parameters + - Missing required environment variables in secrets + +**General debugging steps:** + +1. **Check backup status and identify the target instance** + ```bash + # List all backups and their status + kubectl get backups.postgresql.cnpg.io -n + + # Using kubectl-cnpg plugin (if installed) + kubectl cnpg backup list -n + + # Get detailed backup information including error messages and target instance + kubectl describe backups.postgresql.cnpg.io -n + + # Extract the target pod name from a failed backup + kubectl get backups.postgresql.cnpg.io -n -o jsonpath='{.status.instanceID.podName}' + + # Or get more details including the target pod, method, phase and error + kubectl get backups.postgresql.cnpg.io -n -o jsonpath='Pod: {.status.instanceID.podName}{"\n"}Method: {.status.method}{"\n"}Phase: {.status.phase}{"\n"}Error: {.status.error}{"\n"}' + + # Check the cluster status for backup-related information + kubectl cnpg status -n --verbose + ``` + +2. **Check sidecar logs on the backup target pod** + ```bash + # First, identify which pod was the backup target (from step 1) + TARGET_POD=$(kubectl get backups.postgresql.cnpg.io -n -o jsonpath='{.status.instanceID.podName}') + echo "Backup target pod: $TARGET_POD" + + # Check the sidecar logs on the specific target pod + kubectl logs -n $TARGET_POD -c plugin-barman-cloud --tail=100 + + # Follow the logs in real-time to see ongoing issues + kubectl logs -n $TARGET_POD -c plugin-barman-cloud -f + + # Check for specific errors in the target pod around the backup time + kubectl logs -n $TARGET_POD -c plugin-barman-cloud --since=10m | grep -E "ERROR|FATAL|panic|failed" + + # Alternative: List all cluster pods and their roles + kubectl get pods -n -l cnpg.io/cluster= \ + -o custom-columns=NAME:.metadata.name,ROLE:.metadata.labels.cnpg\\.io/instanceRole,INSTANCE:.metadata.labels.cnpg\\.io/instanceName + + # Check sidecar logs on ALL cluster pods for any errors (if target is unclear) + for pod in $(kubectl get pods -n -l cnpg.io/cluster= -o name); do + echo "=== Checking $pod ===" + kubectl logs -n $pod -c plugin-barman-cloud --tail=20 | grep -i error || echo "No errors found" + done + ``` + +3. **Check events for backup-related issues** + ```bash + # Check events for the cluster + kubectl get events -n --field-selector involvedObject.name= + + # Check events for failed backups + kubectl get events -n --field-selector involvedObject.kind=Backup + + # Get all recent events in the namespace + kubectl get events -n --sort-by='.lastTimestamp' | tail -20 + ``` + +4. **Verify ObjectStore configuration** + ```bash + # Check the ObjectStore resource + kubectl get objectstores.barmancloud.cnpg.io -n -o yaml + + # Verify the secret exists and has correct keys + kubectl get secret -n -o yaml + ``` + +5. **Common error messages and solutions:** + + - **"AccessDenied" or "403 Forbidden"**: Check cloud credentials and bucket permissions + - **"NoSuchBucket"**: Verify the bucket exists and the endpoint URL is correct + - **"Connection timeout"**: Check network connectivity and firewall rules + - **"SSL certificate problem"**: For self-signed certificates, check CA bundle configuration + +#### Backup performance issues + +**Symptoms:** +- Backups take extremely long +- Backups timeout + +**Plugin-specific considerations:** + +1. **Check ObjectStore parallelism settings** + - Adjust `maxParallel` in ObjectStore configuration + - Monitor sidecar container resource usage during backups + +2. **Verify plugin resource allocation** + - Check if the sidecar container has sufficient CPU/memory + - Review plugin container logs for resource-related warnings + +:::tip +For Barman-specific features like compression, encryption, and performance tuning, refer to the [Barman documentation](https://docs.pgbarman.org/latest/). +::: + +### WAL Archiving Issues + +#### WAL archiving through plugin stops working + +**Symptoms:** +- WAL files accumulating on primary +- Cluster warnings about WAL archiving +- Plugin sidecar logs show WAL archive errors + +**Plugin-specific debugging:** + +1. **Check plugin sidecar logs for WAL archiving errors** + ```bash + # Check recent WAL archive operations in sidecar + kubectl logs -n -c plugin-barman-cloud --tail=50 | grep -i wal + ``` + +2. **Verify the plugin is handling archive_command** + ```bash + # The archive_command should be routing through the plugin + kubectl exec -n -c postgres -- psql -U postgres -c "SHOW archive_command;" + ``` + +3. **Check ObjectStore configuration for WAL settings** + - Ensure ObjectStore has proper WAL retention settings + - Verify credentials have permissions for WAL operations + +### Restore Issues + +#### Restore fails during recovery + +**Symptoms:** +- New cluster stuck in recovery mode +- Plugin sidecar shows restore errors +- PostgreSQL won't start + +**Plugin-specific debugging:** + +1. **Check plugin sidecar logs during restore** + ```bash + # Check the sidecar logs on the recovering cluster pods + kubectl logs -n -c plugin-barman-cloud --tail=100 + + # Look for restore-related errors + kubectl logs -n -c plugin-barman-cloud | grep -E "restore|recovery|ERROR" + ``` + +2. **Verify plugin can access backups** + ```bash + # Check if ObjectStore is properly configured for restore + kubectl get objectstores.barmancloud.cnpg.io -n -o yaml + + # Check PostgreSQL recovery logs + kubectl logs -n -c postgres | grep -i recovery + ``` + +:::tip +For detailed Barman restore operations and troubleshooting, refer to the [Barman documentation](https://docs.pgbarman.org/latest/barman-cloud-restore.html). +::: + +#### Point-in-time recovery (PITR) configuration issues + +**Symptoms:** +- PITR target time not reached +- Plugin sidecar shows WAL access errors +- Recovery stops before target + +**Plugin-specific configuration:** + +1. **Verify plugin configuration for PITR** + ```yaml + apiVersion: postgresql.cnpg.io/v1 + kind: Cluster + spec: + plugins: + - name: barman-cloud.cloudnative-pg.io + parameters: + barmanObjectStore: + bootstrap: + recovery: + recoveryTarget: + targetTime: "2024-01-15 10:30:00" + targetTimezone: "UTC" + ``` + +2. **Check plugin sidecar for WAL access** + ```bash + # Check sidecar logs during recovery for WAL-related errors + kubectl logs -n -c plugin-barman-cloud | grep -i wal + ``` + +:::note +For detailed PITR configuration and WAL management, see the [Barman PITR documentation](https://docs.pgbarman.org/latest/). +::: + +### Plugin Configuration Issues + +#### Plugin cannot connect to object storage + +**Symptoms:** +- Plugin sidecar logs show connection errors +- Backups fail with authentication or network errors +- ObjectStore resource shows errors + +**Plugin-specific solutions:** + +1. **Verify ObjectStore CRD configuration** + ```bash + # Check ObjectStore resource status + kubectl get objectstores.barmancloud.cnpg.io -n -o yaml + + # Verify the secret exists and has correct keys for your provider + kubectl get secret -n -o jsonpath='{.data}' | jq 'keys' + ``` + +2. **Check plugin sidecar connectivity** + ```bash + # Check sidecar logs for connection errors + kubectl logs -n -c plugin-barman-cloud | grep -E "connection|timeout|SSL|certificate" + ``` + +3. **Provider-specific configuration** + - See [Object Store Configuration](object_stores.md) for provider-specific settings + - Ensure `endpointURL` and `s3UsePathStyle` match your storage type + - Verify network policies allow egress to your storage provider + +## Diagnostic Commands + +### Using kubectl-cnpg plugin + +The kubectl-cnpg plugin provides enhanced debugging capabilities. Make sure you have it installed and updated: + +```bash +# Install or update kubectl-cnpg plugin +kubectl krew install cnpg +# Or download directly from: https://github.com/cloudnative-pg/cloudnative-pg/releases + +# Check plugin status (requires CNPG 1.27.0+) +kubectl cnpg status -n + +# View cluster status in detail +kubectl cnpg status -n --verbose + +# Check backup status +kubectl cnpg backup list -n + +# View plugin capabilities +kubectl cnpg plugin list -n +``` + +## Getting Help + +If you continue to experience issues: + +1. **Check the documentation** + - Review the [Installation Guide](installation.mdx) + - Check [Object Store Configuration](object_stores.md) for provider-specific settings + - Review [Usage Examples](usage.md) for correct configuration patterns + +2. **Gather diagnostic information** + ```bash + # Create a diagnostic bundle (⚠️sanitize these before sharing!) + kubectl get objectstores.barmancloud.cnpg.io -A -o yaml > /tmp/objectstores.yaml + kubectl get clusters.postgresql.cnpg.io -A -o yaml > /tmp/clusters.yaml + kubectl logs -n cnpg-system deployment/barman-cloud --tail=1000 > /tmp/plugin.log + ``` + +3. **Community support** + - CloudNativePG Slack: [#cloudnativepg-users](https://cloud-native.slack.com/messages/cloudnativepg-users) + - GitHub Issues: [plugin-barman-cloud](https://github.com/cloudnative-pg/plugin-barman-cloud/issues) + +4. **When reporting issues, include:** + - CloudNativePG version + - Barman Cloud Plugin version + - Kubernetes version + - Cloud provider and region + - Relevant configuration (⚠️sanitize/redact sensitive information) + - Error messages and logs + - Steps to reproduce + +## Known Issues and Limitations + +### Current Known Issues + +1. **WAL overwrite protection**: Unlike the in-tree Barman archiver, the plugin doesn't prevent WAL overwrites when multiple clusters share the same name and object store path ([#263](https://github.com/cloudnative-pg/plugin-barman-cloud/issues/263)) +2. **Migration compatibility**: After migrating from in-tree backup to the plugin, the `kubectl cnpg backup` command syntax has changed ([#353](https://github.com/cloudnative-pg/plugin-barman-cloud/issues/353)): + ```bash + # Old command (in-tree, no longer works after migration) + kubectl cnpg backup -n --method=barmanObjectStore + + # New command (plugin-based) + kubectl cnpg backup -n --method=plugin --plugin-name=barman-cloud.cloudnative-pg.io + ``` + +### Plugin Limitations + +1. **Installation method**: Currently only supports manifest and Kustomize installation ([#351](https://github.com/cloudnative-pg/plugin-barman-cloud/issues/351) - Helm chart requested) +2. **Sidecar resource sharing**: The plugin sidecar container shares pod resources with PostgreSQL +3. **Plugin restart behavior**: Restarting the sidecar container requires restarting the entire PostgreSQL pod + +### Compatibility Matrix + +| Plugin Version | CloudNativePG Version | Kubernetes Version | Notes | +|---------------|----------------------|-------------------|--------| +| 0.6.x | 1.26.x, **1.27.x** (recommended) | 1.28+ | CNPG 1.27.0+ provides enhanced plugin status reporting | +| 0.5.x | 1.25.x, 1.26.x | 1.27+ | Limited plugin diagnostics | + +:::tip +Always check the [Release Notes](https://github.com/cloudnative-pg/plugin-barman-cloud/releases) for version-specific known issues and fixes. +::: \ No newline at end of file diff --git a/web/docs/usage.md b/web/docs/usage.md index dcff072..17ac3b0 100644 --- a/web/docs/usage.md +++ b/web/docs/usage.md @@ -81,8 +81,34 @@ This configuration enables both WAL archiving and data directory backups. ## Performing a Base Backup -Once WAL archiving is enabled, the cluster is ready for backups. To issue an -on-demand backup, use the following configuration with the plugin method: +Once WAL archiving is enabled, the cluster is ready for backups. + +### Using kubectl-cnpg plugin + +The quickest way to create an on-demand backup is using the kubectl-cnpg plugin: + +```bash +kubectl cnpg backup -n \ + --method=plugin \ + --plugin-name=barman-cloud.cloudnative-pg.io +``` + +:::note Migration from in-tree backup +If you're migrating from the in-tree backup system, note that the command has changed from: +```bash +# Old command (in-tree backup) +kubectl cnpg backup -n --method=barmanObjectStore +``` +to: +```bash +# New command (plugin backup) +kubectl cnpg backup -n --method=plugin --plugin-name=barman-cloud.cloudnative-pg.io +``` +::: + +### Using YAML manifests + +Alternatively, you can create a backup using a YAML manifest: ```yaml apiVersion: postgresql.cnpg.io/v1 diff --git a/web/src/components/HomepageFeatures/feature.tsx b/web/src/components/HomepageFeatures/feature.tsx index 3a64556..d08144c 100644 --- a/web/src/components/HomepageFeatures/feature.tsx +++ b/web/src/components/HomepageFeatures/feature.tsx @@ -1,5 +1,7 @@ import type {ComponentProps, ComponentType, ReactElement} from "react"; import clsx from "clsx"; + +import Link from "@docusaurus/Link"; import styles from "@site/src/components/HomepageFeatures/styles.module.css"; import Heading from "@theme/Heading"; @@ -25,25 +27,52 @@ function Feature({title, Svg, description}: FeatureItem): ReactElement { return ( -
- +
+ - - -
+ /> + + +
+
+
+ Quick Links +
+ + Installation Guide + + + Core Concepts + + + Usage Examples + + + Object Store Setup + + + Migration Guide + + + Troubleshooting + +
+
+
+ ) } diff --git a/web/src/components/HomepageFeatures/styles.module.css b/web/src/components/HomepageFeatures/styles.module.css index b248eb2..7714348 100644 --- a/web/src/components/HomepageFeatures/styles.module.css +++ b/web/src/components/HomepageFeatures/styles.module.css @@ -9,3 +9,19 @@ height: 200px; width: 200px; } + +.quickLinksSection { + margin-top: 3rem; +} + +.quickLinksTitle { + text-align: center; + margin-bottom: 2rem; +} + +.quickLinksContainer { + display: flex; + justify-content: center; + gap: 1rem; + flex-wrap: wrap; +} diff --git a/web/src/pages/index.tsx b/web/src/pages/index.tsx index 427ca31..5481b26 100644 --- a/web/src/pages/index.tsx +++ b/web/src/pages/index.tsx @@ -16,6 +16,16 @@ function HomepageHeader(): ReactElement { {siteConfig.title} +

+ Enable online continuous physical backups of PostgreSQL clusters to object storage +

+
+ + Get Started with the Documentation → + +
); diff --git a/web/versioned_docs/version-0.6.0/intro.md b/web/versioned_docs/version-0.6.0/intro.md index f22f383..39aa639 100644 --- a/web/versioned_docs/version-0.6.0/intro.md +++ b/web/versioned_docs/version-0.6.0/intro.md @@ -23,9 +23,14 @@ for detailed instructions. To use the Barman Cloud Plugin, you need: -- [CloudNativePG](https://cloudnative-pg.io) version **1.26** +- [CloudNativePG](https://cloudnative-pg.io) version **1.26** or later + - **Version 1.27.0 or later is strongly recommended** for enhanced plugin error and status reporting + - See the [upgrade guide](https://cloudnative-pg.io/documentation/current/installation_upgrade) if you need to upgrade - [cert-manager](https://cert-manager.io/) to enable TLS communication between the plugin and the operator +- [kubectl-cnpg plugin](https://cloudnative-pg.io/documentation/current/kubectl-plugin/) (recommended) + - Provides enhanced debugging and status checking capabilities + - Install via `kubectl krew install cnpg` or download from [releases](https://github.com/cloudnative-pg/cloudnative-pg/releases) ## Key Features diff --git a/web/versioned_docs/version-0.6.0/troubleshooting.md b/web/versioned_docs/version-0.6.0/troubleshooting.md new file mode 100644 index 0000000..0d61401 --- /dev/null +++ b/web/versioned_docs/version-0.6.0/troubleshooting.md @@ -0,0 +1,491 @@ +--- +sidebar_position: 50 +--- + +# Troubleshooting + + + +This guide helps you diagnose and resolve common issues with the Barman Cloud Plugin. + +## Before You Begin + +### Recommended Upgrades + +:::important +**CloudNativePG 1.27.0** offers significantly improved error and status reporting for plugins. If you're experiencing issues, we strongly recommend upgrading to version 1.27.0 or later for better diagnostics. + +- **Upgrade CloudNativePG**: Follow the [official upgrade guide](https://cloudnative-pg.io/documentation/current/installation_upgrade) +- **Update kubectl-cnpg plugin**: Install or update the kubectl plugin for better debugging capabilities. See the [kubectl plugin documentation](https://cloudnative-pg.io/documentation/current/kubectl-plugin/) +::: + +### Viewing Logs + +To effectively troubleshoot issues, you need to check logs from multiple sources: + +:::note +The following commands assume you've installed the CloudNativePG operator in the default `cnpg-system` namespace. If you've installed it in a different namespace, adjust the commands accordingly. +::: + +```bash +# View operator logs (contains plugin interaction logs) +# Assumes operator is installed in the default cnpg-system namespace +kubectl logs -n cnpg-system deployment/cnpg-controller-manager -f + +# View sidecar container logs (barman-cloud operations) +kubectl logs -n -c plugin-barman-cloud -f + +# View plugin manager logs +kubectl logs -n cnpg-system deployment/barman-cloud -f + +# View all containers in a pod +kubectl logs -n --all-containers=true + +# View previous container logs (if container restarted) +kubectl logs -n -c plugin-barman-cloud --previous +``` + +## Common Issues + +### Plugin Installation Issues + +#### Plugin pods not starting + +**Symptoms:** +- Plugin pods are in `CrashLoopBackOff` or `Error` state +- Plugin deployment is not ready + +**Possible causes and solutions:** + +1. **Certificate issues** + ```bash + # Check if cert-manager is installed and running + kubectl get pods -n cert-manager + + # Check if the plugin certificate is created + kubectl get certificates -n cnpg-system + ``` + + If cert-manager is not installed, install it first: + ```bash + kubectl apply -f https://github.com/cert-manager/cert-manager/releases/latest/download/cert-manager.yaml + ``` + +2. **Image pull errors** + ```bash + # Check pod events for image pull errors + kubectl describe pod -n cnpg-system -l app.kubernetes.io/name=barman-cloud + ``` + + Verify the image exists and you have proper credentials if using a private registry. + +3. **Resource constraints** + ```bash + # Check node resources + kubectl top nodes + kubectl describe nodes + ``` + + Ensure your cluster has sufficient CPU and memory resources. + +### Backup Failures + +#### Quick Backup Troubleshooting Checklist + +When a backup fails, follow these steps in order: + +1. **Check backup status**: `kubectl get backups.postgresql.cnpg.io -n ` +2. **Get error details and target pod**: + ```bash + kubectl describe backups.postgresql.cnpg.io -n + # Or extract just the target pod name + kubectl get backups.postgresql.cnpg.io -n -o jsonpath='{.status.instanceID.podName}' + ``` +3. **Check the specific target pod's sidecar logs**: + ```bash + TARGET_POD=$(kubectl get backups.postgresql.cnpg.io -n -o jsonpath='{.status.instanceID.podName}') + kubectl logs -n $TARGET_POD -c plugin-barman-cloud --tail=100 | grep -E "ERROR|FATAL|panic" + ``` +4. **Check cluster events**: `kubectl get events -n --field-selector involvedObject.name= --sort-by='.lastTimestamp'` +5. **Verify plugin is running**: `kubectl get pods -n cnpg-system -l app.kubernetes.io/name=barman-cloud` +6. **Check operator logs**: `kubectl logs -n cnpg-system deployment/cnpg-controller-manager --tail=100 | grep -i "backup\|plugin"` +7. **Check plugin manager logs**: `kubectl logs -n cnpg-system deployment/barman-cloud --tail=100` + +#### Backup job fails immediately + +**Symptoms:** +- Backup pods terminate with error +- No backup files appear in object storage +- Backup shows `failed` phase with various error messages + +**Common failure modes and solutions:** + +1. **"requested plugin is not available" errors** + ``` + ERROR: requested plugin is not available: barman + ERROR: requested plugin is not available: barman-cloud + ERROR: requested plugin is not available: barman-cloud.cloudnative-pg.io + ``` + + **Cause:** The plugin name in the Cluster configuration doesn't match the deployed plugin or the plugin isn't registered + + **Solution:** + + a. **Check plugin registration status**: + ```bash + # If you have kubectl-cnpg plugin installed (v1.27.0+) + kubectl cnpg status -n + ``` + + Look for the "Plugins status" section: + ``` + Plugins status + Name Version Status Reported Operator Capabilities + ---- ------- ------ ------------------------------ + barman-cloud.cloudnative-pg.io 0.6.0 N/A Reconciler Hooks, Lifecycle Service + ``` + + :::tip + If the Plugins status section is missing: + - Install or update kubectl-cnpg plugin to the latest version + - Ensure CloudNativePG operator is v1.27.0 or later + ::: + + b. **Verify correct plugin name in Cluster spec**: + ```yaml + apiVersion: postgresql.cnpg.io/v1 + kind: Cluster + spec: + plugins: + - name: barman-cloud.cloudnative-pg.io + parameters: + barmanObjectStore: + ``` + + c. **Check plugin deployment is running**: + ```bash + kubectl get deployment -n cnpg-system barman-cloud + ``` + +2. **"rpc error: code = Unknown desc = panic caught: assignment to entry in nil map" errors** + + **Cause:** Configuration issue, often a typo or missing required field in the ObjectStore configuration + + **Solution:** + - Check the sidecar container logs for detailed error messages: + ```bash + kubectl logs -n -c plugin-barman-cloud + ``` + - Verify your ObjectStore configuration has all required fields + - Common issues include: + - Missing or incorrect secret references + - Typos in configuration parameters + - Missing required environment variables in secrets + +**General debugging steps:** + +1. **Check backup status and identify the target instance** + ```bash + # List all backups and their status + kubectl get backups.postgresql.cnpg.io -n + + # Using kubectl-cnpg plugin (if installed) + kubectl cnpg backup list -n + + # Get detailed backup information including error messages and target instance + kubectl describe backups.postgresql.cnpg.io -n + + # Extract the target pod name from a failed backup + kubectl get backups.postgresql.cnpg.io -n -o jsonpath='{.status.instanceID.podName}' + + # Or get more details including the target pod, method, phase and error + kubectl get backups.postgresql.cnpg.io -n -o jsonpath='Pod: {.status.instanceID.podName}{"\n"}Method: {.status.method}{"\n"}Phase: {.status.phase}{"\n"}Error: {.status.error}{"\n"}' + + # Check the cluster status for backup-related information + kubectl cnpg status -n --verbose + ``` + +2. **Check sidecar logs on the backup target pod** + ```bash + # First, identify which pod was the backup target (from step 1) + TARGET_POD=$(kubectl get backups.postgresql.cnpg.io -n -o jsonpath='{.status.instanceID.podName}') + echo "Backup target pod: $TARGET_POD" + + # Check the sidecar logs on the specific target pod + kubectl logs -n $TARGET_POD -c plugin-barman-cloud --tail=100 + + # Follow the logs in real-time to see ongoing issues + kubectl logs -n $TARGET_POD -c plugin-barman-cloud -f + + # Check for specific errors in the target pod around the backup time + kubectl logs -n $TARGET_POD -c plugin-barman-cloud --since=10m | grep -E "ERROR|FATAL|panic|failed" + + # Alternative: List all cluster pods and their roles + kubectl get pods -n -l cnpg.io/cluster= \ + -o custom-columns=NAME:.metadata.name,ROLE:.metadata.labels.cnpg\\.io/instanceRole,INSTANCE:.metadata.labels.cnpg\\.io/instanceName + + # Check sidecar logs on ALL cluster pods for any errors (if target is unclear) + for pod in $(kubectl get pods -n -l cnpg.io/cluster= -o name); do + echo "=== Checking $pod ===" + kubectl logs -n $pod -c plugin-barman-cloud --tail=20 | grep -i error || echo "No errors found" + done + ``` + +3. **Check events for backup-related issues** + ```bash + # Check events for the cluster + kubectl get events -n --field-selector involvedObject.name= + + # Check events for failed backups + kubectl get events -n --field-selector involvedObject.kind=Backup + + # Get all recent events in the namespace + kubectl get events -n --sort-by='.lastTimestamp' | tail -20 + ``` + +4. **Verify ObjectStore configuration** + ```bash + # Check the ObjectStore resource + kubectl get objectstores.barmancloud.cnpg.io -n -o yaml + + # Verify the secret exists and has correct keys + kubectl get secret -n -o yaml + ``` + +5. **Common error messages and solutions:** + + - **"AccessDenied" or "403 Forbidden"**: Check cloud credentials and bucket permissions + - **"NoSuchBucket"**: Verify the bucket exists and the endpoint URL is correct + - **"Connection timeout"**: Check network connectivity and firewall rules + - **"SSL certificate problem"**: For self-signed certificates, check CA bundle configuration + +#### Backup performance issues + +**Symptoms:** +- Backups take extremely long +- Backups timeout + +**Plugin-specific considerations:** + +1. **Check ObjectStore parallelism settings** + - Adjust `maxParallel` in ObjectStore configuration + - Monitor sidecar container resource usage during backups + +2. **Verify plugin resource allocation** + - Check if the sidecar container has sufficient CPU/memory + - Review plugin container logs for resource-related warnings + +:::tip +For Barman-specific features like compression, encryption, and performance tuning, refer to the [Barman documentation](https://docs.pgbarman.org/latest/). +::: + +### WAL Archiving Issues + +#### WAL archiving through plugin stops working + +**Symptoms:** +- WAL files accumulating on primary +- Cluster warnings about WAL archiving +- Plugin sidecar logs show WAL archive errors + +**Plugin-specific debugging:** + +1. **Check plugin sidecar logs for WAL archiving errors** + ```bash + # Check recent WAL archive operations in sidecar + kubectl logs -n -c plugin-barman-cloud --tail=50 | grep -i wal + ``` + +2. **Verify the plugin is handling archive_command** + ```bash + # The archive_command should be routing through the plugin + kubectl exec -n -c postgres -- psql -U postgres -c "SHOW archive_command;" + ``` + +3. **Check ObjectStore configuration for WAL settings** + - Ensure ObjectStore has proper WAL retention settings + - Verify credentials have permissions for WAL operations + +### Restore Issues + +#### Restore fails during recovery + +**Symptoms:** +- New cluster stuck in recovery mode +- Plugin sidecar shows restore errors +- PostgreSQL won't start + +**Plugin-specific debugging:** + +1. **Check plugin sidecar logs during restore** + ```bash + # Check the sidecar logs on the recovering cluster pods + kubectl logs -n -c plugin-barman-cloud --tail=100 + + # Look for restore-related errors + kubectl logs -n -c plugin-barman-cloud | grep -E "restore|recovery|ERROR" + ``` + +2. **Verify plugin can access backups** + ```bash + # Check if ObjectStore is properly configured for restore + kubectl get objectstores.barmancloud.cnpg.io -n -o yaml + + # Check PostgreSQL recovery logs + kubectl logs -n -c postgres | grep -i recovery + ``` + +:::tip +For detailed Barman restore operations and troubleshooting, refer to the [Barman documentation](https://docs.pgbarman.org/latest/barman-cloud-restore.html). +::: + +#### Point-in-time recovery (PITR) configuration issues + +**Symptoms:** +- PITR target time not reached +- Plugin sidecar shows WAL access errors +- Recovery stops before target + +**Plugin-specific configuration:** + +1. **Verify plugin configuration for PITR** + ```yaml + apiVersion: postgresql.cnpg.io/v1 + kind: Cluster + spec: + plugins: + - name: barman-cloud.cloudnative-pg.io + parameters: + barmanObjectStore: + bootstrap: + recovery: + recoveryTarget: + targetTime: "2024-01-15 10:30:00" + targetTimezone: "UTC" + ``` + +2. **Check plugin sidecar for WAL access** + ```bash + # Check sidecar logs during recovery for WAL-related errors + kubectl logs -n -c plugin-barman-cloud | grep -i wal + ``` + +:::note +For detailed PITR configuration and WAL management, see the [Barman PITR documentation](https://docs.pgbarman.org/latest/). +::: + +### Plugin Configuration Issues + +#### Plugin cannot connect to object storage + +**Symptoms:** +- Plugin sidecar logs show connection errors +- Backups fail with authentication or network errors +- ObjectStore resource shows errors + +**Plugin-specific solutions:** + +1. **Verify ObjectStore CRD configuration** + ```bash + # Check ObjectStore resource status + kubectl get objectstores.barmancloud.cnpg.io -n -o yaml + + # Verify the secret exists and has correct keys for your provider + kubectl get secret -n -o jsonpath='{.data}' | jq 'keys' + ``` + +2. **Check plugin sidecar connectivity** + ```bash + # Check sidecar logs for connection errors + kubectl logs -n -c plugin-barman-cloud | grep -E "connection|timeout|SSL|certificate" + ``` + +3. **Provider-specific configuration** + - See [Object Store Configuration](object_stores.md) for provider-specific settings + - Ensure `endpointURL` and `s3UsePathStyle` match your storage type + - Verify network policies allow egress to your storage provider + +## Diagnostic Commands + +### Using kubectl-cnpg plugin + +The kubectl-cnpg plugin provides enhanced debugging capabilities. Make sure you have it installed and updated: + +```bash +# Install or update kubectl-cnpg plugin +kubectl krew install cnpg +# Or download directly from: https://github.com/cloudnative-pg/cloudnative-pg/releases + +# Check plugin status (requires CNPG 1.27.0+) +kubectl cnpg status -n + +# View cluster status in detail +kubectl cnpg status -n --verbose + +# Check backup status +kubectl cnpg backup list -n + +# View plugin capabilities +kubectl cnpg plugin list -n +``` + +## Getting Help + +If you continue to experience issues: + +1. **Check the documentation** + - Review the [Installation Guide](installation.mdx) + - Check [Object Store Configuration](object_stores.md) for provider-specific settings + - Review [Usage Examples](usage.md) for correct configuration patterns + +2. **Gather diagnostic information** + ```bash + # Create a diagnostic bundle (⚠️sanitize these before sharing!) + kubectl get objectstores.barmancloud.cnpg.io -A -o yaml > /tmp/objectstores.yaml + kubectl get clusters.postgresql.cnpg.io -A -o yaml > /tmp/clusters.yaml + kubectl logs -n cnpg-system deployment/barman-cloud --tail=1000 > /tmp/plugin.log + ``` + +3. **Community support** + - CloudNativePG Slack: [#cloudnativepg-users](https://cloud-native.slack.com/messages/cloudnativepg-users) + - GitHub Issues: [plugin-barman-cloud](https://github.com/cloudnative-pg/plugin-barman-cloud/issues) + +4. **When reporting issues, include:** + - CloudNativePG version + - Barman Cloud Plugin version + - Kubernetes version + - Cloud provider and region + - Relevant configuration (⚠️sanitize/redact sensitive information) + - Error messages and logs + - Steps to reproduce + +## Known Issues and Limitations + +### Current Known Issues + +1. **WAL overwrite protection**: Unlike the in-tree Barman archiver, the plugin doesn't prevent WAL overwrites when multiple clusters share the same name and object store path ([#263](https://github.com/cloudnative-pg/plugin-barman-cloud/issues/263)) +2. **Migration compatibility**: After migrating from in-tree backup to the plugin, the `kubectl cnpg backup` command syntax has changed ([#353](https://github.com/cloudnative-pg/plugin-barman-cloud/issues/353)): + ```bash + # Old command (in-tree, no longer works after migration) + kubectl cnpg backup -n --method=barmanObjectStore + + # New command (plugin-based) + kubectl cnpg backup -n --method=plugin --plugin-name=barman-cloud.cloudnative-pg.io + ``` + +### Plugin Limitations + +1. **Installation method**: Currently only supports manifest and Kustomize installation ([#351](https://github.com/cloudnative-pg/plugin-barman-cloud/issues/351) - Helm chart requested) +2. **Sidecar resource sharing**: The plugin sidecar container shares pod resources with PostgreSQL +3. **Plugin restart behavior**: Restarting the sidecar container requires restarting the entire PostgreSQL pod + +### Compatibility Matrix + +| Plugin Version | CloudNativePG Version | Kubernetes Version | Notes | +|---------------|----------------------|-------------------|--------| +| 0.6.x | 1.26.x, **1.27.x** (recommended) | 1.28+ | CNPG 1.27.0+ provides enhanced plugin status reporting | +| 0.5.x | 1.25.x, 1.26.x | 1.27+ | Limited plugin diagnostics | + +:::tip +Always check the [Release Notes](https://github.com/cloudnative-pg/plugin-barman-cloud/releases) for version-specific known issues and fixes. +::: \ No newline at end of file