# CloudNativePG Setup Guide

## Overview

CloudNativePG is a Kubernetes operator that manages PostgreSQL clusters using Kubernetes custom resources. It provides:

- High availability with automatic failover
- Automated backups to S3-compatible storage
- Point-in-time recovery (PITR)
- Rolling updates with zero downtime
- Connection pooling with PgBouncer
- Monitoring with Prometheus

## Architecture

```ascii
        ┌─────────────────────────────────────┐
        │        CloudNativePG Operator       │
        │    (Manages PostgreSQL Clusters)    │
        └─────────────────────────────────────┘
                 │
                 ├────────────────────────┐
                 │                        │
        ┌────────▼────────┐       ┌───────▼────────┐
        │   PostgreSQL    │       │   PostgreSQL   │
        │     Primary     │◄─────►│    Replica     │
        │  (Read/Write)   │       │   (Read-only)  │
        └─────────────────┘       └────────────────┘
                 │
                 │ (Backups)
                 ▼
        ┌─────────────────┐
        │  Ceph S3 (RGW)  │
        │ Object Storage  │
        └─────────────────┘
```

## Current Configuration

### Operator Settings

- **Namespace**: `cnpg-system`
- **Monitoring**: Enabled (PodMonitor for Prometheus)
- **Grafana Dashboard**: Auto-created
- **Priority Class**: `system-cluster-critical`
- **Resource Limits**: Conservative (50m CPU, 100Mi RAM)

### Example Cluster (Commented Out)

The `values.yaml` includes a commented example cluster configuration. See "Creating Your First Cluster" below.

## Creating Your First Cluster

### Option 1: Using extraObjects in values.yaml

Uncomment the `extraObjects` section in `values.yaml` and customize:

```yaml
extraObjects:
  - apiVersion: postgresql.cnpg.io/v1
    kind: Cluster
    metadata:
      name: my-postgres
      namespace: cnpg-system
    spec:
      instances: 2  # 1 primary + 1 replica
      storage:
        size: 50Gi
        storageClass: ceph-block
```

### Option 2: Separate Application

For production, create a separate ArgoCD Application for each database cluster:

```bash
mkdir -p apps/databases/my-app-db
```

Create `apps/databases/my-app-db/cluster.yaml`:

```yaml
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: my-app-db
  namespace: my-app
spec:
  instances: 3
  postgresql:
    parameters:
      max_connections: "200"
      shared_buffers: "256MB"
  storage:
    size: 100Gi
    storageClass: ceph-block
  monitoring:
    enablePodMonitor: true
  backup:
    retentionPolicy: "30d"
    barmanObjectStore:
      destinationPath: s3://my-app-backups/
      endpointURL: http://rook-ceph-rgw-ceph-objectstore.rook-ceph.svc:80
      s3Credentials:
        accessKeyId:
          name: backup-credentials
          key: ACCESS_KEY_ID
        secretAccessKey:
          name: backup-credentials
          key: ACCESS_SECRET_KEY
      data:
        compression: bzip2
      wal:
        compression: bzip2
```

## Backup Configuration

### Prerequisites

1. **Create S3 Bucket** (using Ceph Object Storage):

   ```yaml
   apiVersion: objectbucket.io/v1alpha1
   kind: ObjectBucketClaim
   metadata:
     name: postgres-backups
     namespace: cnpg-system
   spec:
     bucketName: postgres-backups
     storageClassName: ceph-bucket
     additionalConfig:
       maxSize: "500Gi"
   ```

2. **Create Credentials Secret**:

   After creating the ObjectBucketClaim, Rook will generate credentials:

   ```bash
   # Get the generated credentials
   kubectl get secret postgres-backups -n cnpg-system -o yaml

   # The secret will contain:
   # - AWS_ACCESS_KEY_ID
   # - AWS_SECRET_ACCESS_KEY
   ```

3. **Reference in Cluster Spec**: Use these credentials in your PostgreSQL cluster backup configuration (see the cluster example above and the key-mapping sketch below).
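Note that the Rook-generated secret uses the `AWS_ACCESS_KEY_ID`/`AWS_SECRET_ACCESS_KEY` key names, while the cluster example above expects a secret called `backup-credentials` with `ACCESS_KEY_ID`/`ACCESS_SECRET_KEY` keys. One way to bridge the two is to copy the values into a secret under the expected names; the command below is a sketch assuming both secrets live in `cnpg-system` (alternatively, point `s3Credentials` directly at the OBC secret and its `AWS_*` keys):

```bash
# Copy the OBC-generated credentials into the key names the cluster spec references
kubectl create secret generic backup-credentials -n cnpg-system \
  --from-literal=ACCESS_KEY_ID="$(kubectl get secret postgres-backups -n cnpg-system -o jsonpath='{.data.AWS_ACCESS_KEY_ID}' | base64 -d)" \
  --from-literal=ACCESS_SECRET_KEY="$(kubectl get secret postgres-backups -n cnpg-system -o jsonpath='{.data.AWS_SECRET_ACCESS_KEY}' | base64 -d)"
```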
## Database Access

### Connect to Primary (Read/Write)

```bash
# Service name pattern: <cluster-name>-rw
kubectl port-forward -n cnpg-system svc/my-postgres-rw 5432:5432

# Connect with psql
psql -h localhost -U postgres -d postgres
```

### Connect to Replica (Read-Only)

```bash
# Service name pattern: <cluster-name>-ro
kubectl port-forward -n cnpg-system svc/my-postgres-ro 5432:5432
```

### Get Superuser Password

```bash
# Password stored in secret: <cluster-name>-superuser
kubectl get secret my-postgres-superuser -n cnpg-system -o jsonpath='{.data.password}' | base64 -d
```

### Create Application User

```bash
# Connect to the database
kubectl exec -it -n cnpg-system my-postgres-1 -- psql
```

Then, at the `psql` prompt:

```sql
-- Create database and user
CREATE DATABASE myapp;
CREATE USER myapp_user WITH PASSWORD 'secure-password';
GRANT ALL PRIVILEGES ON DATABASE myapp TO myapp_user;
```

## Connection from Applications

### Create Secret for Application

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: postgres-credentials
  namespace: my-app
type: Opaque
stringData:
  username: myapp_user
  password: secure-password
  database: myapp
  host: my-postgres-rw.cnpg-system.svc
  port: "5432"
```

### Use in Application

```yaml
env:
  - name: POSTGRES_USER
    valueFrom:
      secretKeyRef:
        name: postgres-credentials
        key: username
  - name: POSTGRES_PASSWORD
    valueFrom:
      secretKeyRef:
        name: postgres-credentials
        key: password
  - name: POSTGRES_HOST
    valueFrom:
      secretKeyRef:
        name: postgres-credentials
        key: host
  - name: POSTGRES_PORT
    valueFrom:
      secretKeyRef:
        name: postgres-credentials
        key: port
  - name: POSTGRES_DATABASE
    valueFrom:
      secretKeyRef:
        name: postgres-credentials
        key: database
  # DATABASE_URL must be defined after the variables it references
  - name: DATABASE_URL
    value: "postgresql://$(POSTGRES_USER):$(POSTGRES_PASSWORD)@$(POSTGRES_HOST):$(POSTGRES_PORT)/$(POSTGRES_DATABASE)"
```

## Monitoring

### Prometheus Metrics

CloudNativePG exposes metrics via PodMonitor. Check Prometheus for:

- `cnpg_pg_stat_database_*` - Database statistics
- `cnpg_pg_replication_*` - Replication lag
- `cnpg_backends_*` - Backend connection statistics

### Check Cluster Status

```bash
# Get cluster status
kubectl get cluster -n cnpg-system

# Detailed cluster info
kubectl describe cluster my-postgres -n cnpg-system

# Check pods
kubectl get pods -n cnpg-system -l cnpg.io/cluster=my-postgres
```

### View Logs

```bash
# Operator logs
kubectl logs -n cnpg-system -l app.kubernetes.io/name=cloudnative-pg --tail=100

# PostgreSQL logs
kubectl logs -n cnpg-system my-postgres-1 --tail=100
```

## Backup and Recovery

### Manual Backup

```bash
kubectl cnpg backup my-postgres -n cnpg-system
```

### List Backups

```bash
kubectl get backup -n cnpg-system
```

### Point-in-Time Recovery

Create a new cluster from a backup:

```yaml
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: restored-cluster
spec:
  instances: 2
  bootstrap:
    recovery:
      source: my-postgres
      recoveryTarget:
        targetTime: "2024-11-09 10:00:00"
  externalClusters:
    - name: my-postgres
      barmanObjectStore:
        destinationPath: s3://postgres-backups/
        endpointURL: http://rook-ceph-rgw-ceph-objectstore.rook-ceph.svc:80
        s3Credentials:
          accessKeyId:
            name: backup-credentials
            key: ACCESS_KEY_ID
          secretAccessKey:
            name: backup-credentials
            key: ACCESS_SECRET_KEY
```
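### Scheduled Backups

The manual backup above creates a one-off `Backup`; the production recommendations below also call for automated, recurring backups. A minimal `ScheduledBackup` sketch, assuming the `my-postgres` cluster and backup configuration from the earlier examples (CloudNativePG's `schedule` field uses a six-field cron expression that includes seconds):

```yaml
apiVersion: postgresql.cnpg.io/v1
kind: ScheduledBackup
metadata:
  name: my-postgres-daily
  namespace: cnpg-system
spec:
  # Every day at 02:00 (seconds, minutes, hours, day-of-month, month, day-of-week)
  schedule: "0 0 2 * * *"
  backupOwnerReference: self
  cluster:
    name: my-postgres
```

Backups created by the schedule show up alongside manual ones in `kubectl get backup`.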
## Maintenance Operations

### Scale Replicas

```bash
kubectl cnpg scale my-postgres --replicas=3 -n cnpg-system
```

### Switchover (Promote Replica)

```bash
kubectl cnpg promote my-postgres-2 -n cnpg-system
```

### Restart Cluster

```bash
kubectl cnpg restart my-postgres -n cnpg-system
```

## Production Recommendations

### 1. High Availability

- Use at least 3 instances (1 primary + 2 replicas)
- Spread across availability zones using pod anti-affinity
- Configure automatic failover thresholds

### 2. Resource Configuration

- **Small databases** (<10GB): 2 CPU, 4Gi RAM, 2 instances
- **Medium databases** (10-100GB): 4 CPU, 8Gi RAM, 3 instances
- **Large databases** (>100GB): 8 CPU, 16Gi RAM, 3+ instances

### 3. PostgreSQL Tuning

- Adjust `shared_buffers` to 25% of RAM
- Set `effective_cache_size` to 50-75% of RAM
- Tune connection limits based on application needs
- Use `random_page_cost: 1.1` for SSD storage

### 4. Backup Strategy

- Enable automated backups to S3
- Set a retention policy (e.g., 30 days)
- Test recovery procedures regularly
- Monitor backup success

### 5. Monitoring

- Set up alerts for replication lag (an example rule follows the "Useful Commands" section)
- Monitor connection pool saturation
- Track query performance
- Watch disk space usage

### 6. Security

- Use strong passwords (stored in Kubernetes secrets)
- Enable SSL/TLS for connections
- Implement network policies
- Apply security updates regularly

## Troubleshooting

### Cluster Not Starting

```bash
# Check operator logs
kubectl logs -n cnpg-system -l app.kubernetes.io/name=cloudnative-pg

# Check events
kubectl get events -n cnpg-system --sort-by='.lastTimestamp'

# Check PVC status
kubectl get pvc -n cnpg-system
```

### Replication Issues

```bash
# Check replication status
kubectl exec -it -n cnpg-system my-postgres-1 -- psql -c "SELECT * FROM pg_stat_replication;"

# Check cluster status
kubectl get cluster my-postgres -n cnpg-system -o yaml
```

### Backup Failures

```bash
# Check backup status
kubectl get backup -n cnpg-system

# View backup logs
kubectl logs -n cnpg-system my-postgres-1 | grep -i backup

# Test S3 connectivity
kubectl exec -it -n cnpg-system my-postgres-1 -- curl -I http://rook-ceph-rgw-ceph-objectstore.rook-ceph.svc:80
```

### High Resource Usage

```bash
# Check resource usage
kubectl top pods -n cnpg-system

# Check PostgreSQL connections
kubectl exec -it -n cnpg-system my-postgres-1 -- psql -c "SELECT count(*) FROM pg_stat_activity;"

# Identify slow queries
kubectl exec -it -n cnpg-system my-postgres-1 -- psql -c "SELECT pid, now() - pg_stat_activity.query_start AS duration, query FROM pg_stat_activity WHERE state = 'active' ORDER BY duration DESC;"
```

## Useful Commands

```bash
# Install the kubectl-cnpg plugin
curl -sSfL https://github.com/cloudnative-pg/cloudnative-pg/raw/main/hack/install-cnpg-plugin.sh | sh -s -- -b /usr/local/bin

# List all clusters
kubectl get cluster -n cnpg-system

# Get cluster details
kubectl cnpg status my-postgres -n cnpg-system

# Promote a replica
kubectl cnpg promote my-postgres-2 -n cnpg-system

# Create a backup
kubectl cnpg backup my-postgres -n cnpg-system

# Reload configuration
kubectl cnpg reload my-postgres -n cnpg-system

# Get the superuser password
kubectl get secret my-postgres-superuser -n cnpg-system -o jsonpath='{.data.password}' | base64 -d && echo
```
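## Example Alert Rule

Production recommendation 5 and next step 6 call for alerting on replication lag. The sketch below assumes the Prometheus Operator CRDs are installed and that the default CloudNativePG `cnpg_pg_replication_lag` metric (replay lag in seconds) is scraped through the PodMonitor; the name, namespace, and threshold are illustrative, and depending on how Prometheus is deployed the rule may also need a matching label selector:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: cnpg-replication-lag
  namespace: cnpg-system
spec:
  groups:
    - name: cloudnative-pg
      rules:
        - alert: PostgresReplicationLagHigh
          # Fires when any replica reports more than 5 minutes of replay lag
          expr: max by (pod) (cnpg_pg_replication_lag) > 300
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "Replication lag on {{ $labels.pod }} exceeds 5 minutes"
```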
## Next Steps

1. Deploy the operator: `git add . && git commit -m "Add CloudNativePG" && git push`
2. Wait for ArgoCD to sync
3. Create your first cluster (uncomment `extraObjects` or create a separate app)
4. Set up S3 backups with Ceph Object Storage
5. Test backup and recovery
6. Configure monitoring and alerts

## Additional Resources

- [CloudNativePG Documentation](https://cloudnative-pg.io/documentation/)
- [API Reference](https://cloudnative-pg.io/documentation/current/api_reference/)
- [Best Practices](https://cloudnative-pg.io/documentation/current/guidelines/)
- [Monitoring Guide](https://cloudnative-pg.io/documentation/current/monitoring/)