diff --git a/apps/cloudnative-pg/Chart.yaml b/apps/cloudnative-pg/Chart.yaml
new file mode 100644
index 0000000..6e25e7a
--- /dev/null
+++ b/apps/cloudnative-pg/Chart.yaml
@@ -0,0 +1,11 @@
+apiVersion: v2
+name: cloudnative-pg
+description: CloudNativePG operator wrapper chart
+type: application
+version: 1.0.0
+appVersion: "1.27.1"
+
+dependencies:
+  - name: cloudnative-pg
+    version: 0.26.1
+    repository: https://cloudnative-pg.github.io/charts
diff --git a/apps/cloudnative-pg/README.md b/apps/cloudnative-pg/README.md
new file mode 100644
index 0000000..2736763
--- /dev/null
+++ b/apps/cloudnative-pg/README.md
@@ -0,0 +1,586 @@
+# CloudNativePG Setup Guide
+
+## Overview
+
+CloudNativePG is a Kubernetes operator that manages PostgreSQL clusters using Kubernetes custom resources. It provides:
+
+- High availability with automatic failover
+- Automated backups to S3-compatible storage
+- Point-in-time recovery (PITR)
+- Rolling updates with zero downtime
+- Connection pooling with PgBouncer
+- Monitoring with Prometheus
+
+## Architecture
+
+```ascii
+┌─────────────────────────────────────┐
+│       CloudNativePG Operator        │
+│   (Manages PostgreSQL Clusters)     │
+└─────────────────────────────────────┘
+                  │
+                  ├──────────────────────────┐
+                  │                          │
+         ┌────────▼────────┐        ┌───────▼────────┐
+         │   PostgreSQL    │        │   PostgreSQL   │
+         │     Primary     │◄──────►│    Replica     │
+         │  (Read/Write)   │        │  (Read-only)   │
+         └─────────────────┘        └────────────────┘
+                  │
+                  │ (Backups)
+                  ▼
+         ┌─────────────────┐
+         │  Ceph S3 (RGW)  │
+         │ Object Storage  │
+         └─────────────────┘
+```
+
+## Current Configuration
+
+### Operator Settings
+
+- **Namespace**: `cnpg-system`
+- **Monitoring**: Enabled (PodMonitor for Prometheus)
+- **Grafana Dashboard**: Auto-created
+- **Priority Class**: `system-cluster-critical`
+- **Resources**: Conservative requests (50m CPU, 100Mi RAM) with a 256Mi memory limit
+
+### Example Cluster (Commented Out)
+
+The `values.yaml` includes a commented example cluster configuration with:
+
+- **Storage**: `local-path` StorageClass (for development)
+- **Backup**: Barman Cloud (`barmanObjectStore`) with an S3 (Ceph RGW) backend
+- **Note**: See "Storage Considerations" section below
+
+## ⚠️ Storage Considerations
+
+### Local Path vs Ceph Block
+
+The example cluster uses the `local-path` StorageClass, which is suitable for:
+
+- ✅ **Development/Testing**: Quick setup, no Ceph dependency
+- ✅ **Single-node scenarios**: When HA isn't required
+- ✅ **Learning/Experimentation**: Testing PostgreSQL features
+
+**For production use, change to `ceph-block`:**
+
+```yaml
+storage:
+  storageClass: ceph-block  # Instead of local-path
+  size: 50Gi
+```
+
+### Why Ceph Block for Production?
+
+| Feature | local-path | ceph-block |
+|---------|-----------|------------|
+| **High Availability** | ❌ No | ✅ Yes |
+| **Data Replication** | ❌ No | ✅ 2x copies |
+| **Pod Mobility** | ❌ Pinned to node | ✅ Can move |
+| **Snapshots** | ❌ No | ✅ Yes |
+| **Auto Resize** | ❌ No | ✅ Yes |
+| **Node Failure** | ❌ Data unavailable | ✅ Survives |
+
+### Hybrid Approach (Recommended for Dev)
+
+Even with local-path storage, the S3 backup provides safety:
+
+- **Primary storage**: local-path (fast, simple)
+- **Backups**: Ceph S3 (safe, replicated, off-node)
+- **Recovery**: Restore from S3 if the node fails
+
+This gives you:
+
+- ✅ Point-in-time recovery
+- ✅ Off-node backup storage
+- ✅ Disaster recovery capability
+- ✅ Fast local performance
+- ⚠️ But no automatic HA
+
+## Barman-Cloud Backup Plugin
+
+CloudNativePG uses the modern barman-cloud toolset for backups.
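+
+Once a cluster with backups enabled is running, it is worth confirming that continuous WAL archiving actually works before relying on it. A minimal check, assuming a cluster named `my-postgres` in `cnpg-system` (adjust the names to your setup; the condition type reflects current operator versions):
+
+```bash
+# Overall cluster health, including the continuous backup section
+# (requires the kubectl-cnpg plugin, see "Useful Commands" below)
+kubectl cnpg status my-postgres -n cnpg-system
+
+# Inspect the ContinuousArchiving condition on the Cluster resource
+kubectl get cluster my-postgres -n cnpg-system \
+  -o jsonpath='{.status.conditions[?(@.type=="ContinuousArchiving")].status}'
+```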
+
+### Configuration Features:
+
+```yaml
+backup:
+  barmanObjectStore:
+    # Parallel processing
+    data:
+      compression: bzip2
+      jobs: 2  # Parallel compression threads
+    wal:
+      compression: bzip2
+      maxParallel: 2  # Parallel WAL uploads
+
+    # Metadata tags
+    tags:
+      environment: "development"
+      managed-by: "cloudnative-pg"
+
+    # Backup lineage tracking
+    historyTags:
+      environment: "development"
+```
+
+### Plugin Benefits:
+
+- ✅ **Better S3 compatibility**: Works with all S3-compatible stores
+- ✅ **Improved parallelism**: Faster backups for large databases
+- ✅ **Enhanced error handling**: Better retry logic
+- ✅ **Cloud-native design**: Optimized for object storage
+- ✅ **Metadata tagging**: Better backup organization
+
+### Backup Strategy:
+
+1. **Continuous WAL archiving**: Real-time transaction logs to S3
+2. **Scheduled full backups**: Complete database snapshots
+3. **Point-in-time recovery**: Restore to any timestamp
+4. **Retention policies**: Automatic cleanup of old backups
+
+## Creating Your First Cluster
+
+### Option 1: Using extraObjects in values.yaml (Development)
+
+Uncomment the `extraObjects` section in `values.yaml` for a development cluster:
+
+```yaml
+extraObjects:
+  - apiVersion: postgresql.cnpg.io/v1
+    kind: Cluster
+    metadata:
+      name: my-postgres
+      namespace: cnpg-system
+    spec:
+      instances: 2  # 1 primary + 1 replica
+
+      # Development: local-path for fast local storage
+      storage:
+        size: 50Gi
+        storageClass: local-path
+
+      # Backup to Ceph S3 for safety
+      backup:
+        retentionPolicy: "30d"
+        barmanObjectStore:
+          destinationPath: s3://postgres-backups/
+          endpointURL: http://rook-ceph-rgw-ceph-objectstore.rook-ceph.svc:80
+          s3Credentials:
+            accessKeyId:
+              name: postgres-backup-credentials
+              key: ACCESS_KEY_ID
+            secretAccessKey:
+              name: postgres-backup-credentials
+              key: ACCESS_SECRET_KEY
+          data:
+            compression: bzip2
+            jobs: 2
+          wal:
+            compression: bzip2
+            maxParallel: 2
+```
+
+### Option 2: Separate Application (Production)
+
+For production, create a separate ArgoCD Application with ceph-block storage:
+
+```bash
+mkdir -p apps/databases/my-app-db
+```
+
+Create `apps/databases/my-app-db/cluster.yaml`:
+
+```yaml
+apiVersion: postgresql.cnpg.io/v1
+kind: Cluster
+metadata:
+  name: my-app-db
+  namespace: my-app
+spec:
+  instances: 3
+
+  postgresql:
+    parameters:
+      max_connections: "200"
+      shared_buffers: "256MB"
+
+  # Production: ceph-block for HA
+  storage:
+    size: 100Gi
+    storageClass: ceph-block
+
+  monitoring:
+    enablePodMonitor: true
+
+  # Barman-cloud backup configuration
+  backup:
+    retentionPolicy: "30d"
+    barmanObjectStore:
+      destinationPath: s3://my-app-backups/
+      endpointURL: http://rook-ceph-rgw-ceph-objectstore.rook-ceph.svc:80
+      s3Credentials:
+        accessKeyId:
+          name: backup-credentials
+          key: ACCESS_KEY_ID
+        secretAccessKey:
+          name: backup-credentials
+          key: ACCESS_SECRET_KEY
+      data:
+        compression: bzip2
+        jobs: 2  # Parallel compression
+      wal:
+        compression: bzip2
+        maxParallel: 2  # Parallel WAL uploads
+      tags:
+        environment: "production"
+        application: "my-app"
+```
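+
+The new directory still needs an ArgoCD Application pointing at it. A minimal sketch, modelled on this chart's own `application.yaml` (the `repoURL` and `targetRevision` are copied from this repo's existing Application; adjust them, the project, and the path to your layout):
+
+```yaml
+apiVersion: argoproj.io/v1alpha1
+kind: Application
+metadata:
+  name: my-app-db
+  namespace: argocd
+spec:
+  project: default
+  source:
+    repoURL: https://git.mvzijl.nl/marco/veda.git
+    targetRevision: applicationset-rewrite
+    path: apps/databases/my-app-db  # Plain directory of manifests, no Helm
+  destination:
+    server: https://kubernetes.default.svc
+    namespace: my-app
+  syncPolicy:
+    automated:
+      prune: true
+      selfHeal: true
+    syncOptions:
+      - CreateNamespace=true
+```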
+
+## Backup Configuration
+
+### Prerequisites
+
+1. **Create S3 Bucket** (using Ceph Object Storage):
+
+```yaml
+apiVersion: objectbucket.io/v1alpha1
+kind: ObjectBucketClaim
+metadata:
+  name: postgres-backups
+  namespace: cnpg-system
+spec:
+  bucketName: postgres-backups
+  storageClassName: ceph-bucket
+  additionalConfig:
+    maxSize: "500Gi"
+```
+
+2. **Create Credentials Secret**:
+
+After creating the ObjectBucketClaim, Rook will generate credentials:
+
+```bash
+# Get the generated credentials
+kubectl get secret postgres-backups -n cnpg-system -o yaml
+
+# The secret will contain:
+# - AWS_ACCESS_KEY_ID
+# - AWS_SECRET_ACCESS_KEY
+```
+
+3. **Reference in Cluster Spec**:
+
+Use these credentials in your PostgreSQL cluster backup configuration (see example above). Note that the cluster examples reference a secret named `postgres-backup-credentials` with keys `ACCESS_KEY_ID` and `ACCESS_SECRET_KEY`, while Rook names the generated secret after the ObjectBucketClaim and uses `AWS_`-prefixed keys. Either copy the generated values into a secret matching the examples, or update the `s3Credentials` reference to point at the generated secret and its key names.
+
+## Database Access
+
+### Connect to Primary (Read/Write)
+
+```bash
+# Service name pattern: <cluster-name>-rw
+kubectl port-forward -n cnpg-system svc/my-postgres-rw 5432:5432
+
+# Connect with psql
+psql -h localhost -U postgres -d postgres
+```
+
+### Connect to Replica (Read-Only)
+
+```bash
+# Service name pattern: <cluster-name>-ro
+kubectl port-forward -n cnpg-system svc/my-postgres-ro 5432:5432
+```
+
+### Get Superuser Password
+
+```bash
+# Password stored in secret: <cluster-name>-superuser
+kubectl get secret my-postgres-superuser -n cnpg-system -o jsonpath='{.data.password}' | base64 -d
+```
+
+### Create Application User
+
+```bash
+# Connect to the database
+kubectl exec -it -n cnpg-system my-postgres-1 -- psql
+```
+
+```sql
+-- Create database and user
+CREATE DATABASE myapp;
+CREATE USER myapp_user WITH PASSWORD 'secure-password';
+GRANT ALL PRIVILEGES ON DATABASE myapp TO myapp_user;
+```
+
+## Connection from Applications
+
+### Create Secret for Application
+
+```yaml
+apiVersion: v1
+kind: Secret
+metadata:
+  name: postgres-credentials
+  namespace: my-app
+type: Opaque
+stringData:
+  username: myapp_user
+  password: secure-password
+  database: myapp
+  host: my-postgres-rw.cnpg-system.svc
+  port: "5432"
+```
+
+### Use in Application
+
+```yaml
+env:
+  - name: POSTGRES_USER
+    valueFrom:
+      secretKeyRef:
+        name: postgres-credentials
+        key: username
+  - name: POSTGRES_PASSWORD
+    valueFrom:
+      secretKeyRef:
+        name: postgres-credentials
+        key: password
+  - name: POSTGRES_HOST
+    valueFrom:
+      secretKeyRef:
+        name: postgres-credentials
+        key: host
+  - name: POSTGRES_PORT
+    valueFrom:
+      secretKeyRef:
+        name: postgres-credentials
+        key: port
+  - name: POSTGRES_DATABASE
+    valueFrom:
+      secretKeyRef:
+        name: postgres-credentials
+        key: database
+  # Dependent variables must be declared after the variables they reference,
+  # so DATABASE_URL comes last
+  - name: DATABASE_URL
+    value: "postgresql://$(POSTGRES_USER):$(POSTGRES_PASSWORD)@$(POSTGRES_HOST):$(POSTGRES_PORT)/$(POSTGRES_DATABASE)"
+```
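+
+Alternatively, when a cluster is bootstrapped with the default `initdb` method, CloudNativePG creates an application database and owner itself and maintains a secret named `<cluster>-app` in the cluster's namespace. A sketch of consuming that secret directly, assuming the `my-postgres` example cluster and an application running in the same namespace (recent operator versions also include a full connection `uri` key; check the secret's keys if yours differ):
+
+```yaml
+env:
+  - name: DATABASE_URL
+    valueFrom:
+      secretKeyRef:
+        name: my-postgres-app  # created and rotated by the operator
+        key: uri               # full connection URI (user, password, host, dbname)
+```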
+
+## Monitoring
+
+### Prometheus Metrics
+
+CloudNativePG exposes metrics via PodMonitor. Check Prometheus for:
+
+- `cnpg_pg_stat_database_*` - Database statistics
+- `cnpg_pg_replication_*` - Replication lag
+- `cnpg_backends_*` - Backend (connection) counts
+
+### Check Cluster Status
+
+```bash
+# Get cluster status
+kubectl get cluster -n cnpg-system
+
+# Detailed cluster info
+kubectl describe cluster my-postgres -n cnpg-system
+
+# Check pods
+kubectl get pods -n cnpg-system -l cnpg.io/cluster=my-postgres
+```
+
+### View Logs
+
+```bash
+# Operator logs
+kubectl logs -n cnpg-system -l app.kubernetes.io/name=cloudnative-pg --tail=100
+
+# PostgreSQL logs
+kubectl logs -n cnpg-system my-postgres-1 --tail=100
+```
+
+## Backup and Recovery
+
+### Manual Backup
+
+```bash
+kubectl cnpg backup my-postgres -n cnpg-system
+```
+
+### List Backups
+
+```bash
+kubectl get backup -n cnpg-system
+```
+
+### Point-in-Time Recovery
+
+Create a new cluster from a backup:
+
+```yaml
+apiVersion: postgresql.cnpg.io/v1
+kind: Cluster
+metadata:
+  name: restored-cluster
+spec:
+  instances: 2
+
+  bootstrap:
+    recovery:
+      source: my-postgres
+      recoveryTarget:
+        targetTime: "2024-11-09 10:00:00"
+
+  externalClusters:
+    - name: my-postgres
+      barmanObjectStore:
+        destinationPath: s3://postgres-backups/
+        endpointURL: http://rook-ceph-rgw-ceph-objectstore.rook-ceph.svc:80
+        s3Credentials:
+          accessKeyId:
+            name: backup-credentials
+            key: ACCESS_KEY_ID
+          secretAccessKey:
+            name: backup-credentials
+            key: ACCESS_SECRET_KEY
+```
+
+## Maintenance Operations
+
+### Scale Replicas
+
+Scaling is declarative: change `spec.instances` on the Cluster (in Git when the cluster is managed by ArgoCD, or directly for a test cluster):
+
+```bash
+kubectl patch cluster my-postgres -n cnpg-system --type merge -p '{"spec":{"instances":3}}'
+```
+
+### Switchover (Promote Replica)
+
+```bash
+kubectl cnpg promote my-postgres my-postgres-2 -n cnpg-system
+```
+
+### Restart Cluster
+
+```bash
+kubectl cnpg restart my-postgres -n cnpg-system
+```
+
+## Production Recommendations
+
+### 1. High Availability
+
+- Use at least 3 instances (1 primary + 2 replicas)
+- Spread across availability zones using pod anti-affinity
+- Configure automatic failover thresholds
+
+### 2. Resource Configuration
+
+- **Small databases** (<10GB): 2 CPU, 4Gi RAM, 2 instances
+- **Medium databases** (10-100GB): 4 CPU, 8Gi RAM, 3 instances
+- **Large databases** (>100GB): 8 CPU, 16Gi RAM, 3+ instances
+
+### 3. PostgreSQL Tuning
+
+- Adjust `shared_buffers` to 25% of RAM
+- Set `effective_cache_size` to 50-75% of RAM
+- Tune connection limits based on application needs
+- Use `random_page_cost: 1.1` for SSD storage
+
+### 4. Backup Strategy
+
+- Enable automated backups to S3
+- Set retention policy (e.g., 30 days)
+- Test recovery procedures regularly
+- Monitor backup success
+
+### 5. Monitoring
+
+- Set up alerts for replication lag (see the sketch below)
+- Monitor connection pool saturation
+- Track query performance
+- Watch disk space usage
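+
+A replication-lag alert can be expressed as a PrometheusRule, assuming your monitoring stack runs the Prometheus Operator (the PodMonitor above already implies it). The alert name, the 30-second threshold, and any extra labels your Prometheus needs for rule discovery are assumptions; `cnpg_pg_replication_lag` is the operator's default lag metric in seconds:
+
+```yaml
+apiVersion: monitoring.coreos.com/v1
+kind: PrometheusRule
+metadata:
+  name: cnpg-replication-lag
+  namespace: cnpg-system
+spec:
+  groups:
+    - name: cloudnative-pg
+      rules:
+        - alert: PostgresReplicationLagHigh
+          expr: cnpg_pg_replication_lag > 30
+          for: 5m
+          labels:
+            severity: warning
+          annotations:
+            summary: "Replication lag on {{ $labels.pod }} is above 30s"
+```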
+
+### 6. Security
+
+- Use strong passwords (stored in Kubernetes secrets)
+- Enable SSL/TLS for connections
+- Implement network policies
+- Regular security updates
+
+## Troubleshooting
+
+### Cluster Not Starting
+
+```bash
+# Check operator logs
+kubectl logs -n cnpg-system -l app.kubernetes.io/name=cloudnative-pg
+
+# Check events
+kubectl get events -n cnpg-system --sort-by='.lastTimestamp'
+
+# Check PVC status
+kubectl get pvc -n cnpg-system
+```
+
+### Replication Issues
+
+```bash
+# Check replication status
+kubectl exec -it -n cnpg-system my-postgres-1 -- psql -c "SELECT * FROM pg_stat_replication;"
+
+# Check cluster status
+kubectl get cluster my-postgres -n cnpg-system -o yaml
+```
+
+### Backup Failures
+
+```bash
+# Check backup status
+kubectl get backup -n cnpg-system
+
+# View backup logs
+kubectl logs -n cnpg-system my-postgres-1 | grep -i backup
+
+# Test S3 connectivity
+kubectl exec -it -n cnpg-system my-postgres-1 -- curl -I http://rook-ceph-rgw-ceph-objectstore.rook-ceph.svc:80
+```
+
+### High Resource Usage
+
+```bash
+# Check resource usage
+kubectl top pods -n cnpg-system
+
+# Check PostgreSQL connections
+kubectl exec -it -n cnpg-system my-postgres-1 -- psql -c "SELECT count(*) FROM pg_stat_activity;"
+
+# Identify slow queries
+kubectl exec -it -n cnpg-system my-postgres-1 -- psql -c "SELECT pid, now() - pg_stat_activity.query_start AS duration, query FROM pg_stat_activity WHERE state = 'active' ORDER BY duration DESC;"
+```
+
+## Useful Commands
+
+```bash
+# Install kubectl-cnpg plugin
+curl -sSfL https://github.com/cloudnative-pg/cloudnative-pg/raw/main/hack/install-cnpg-plugin.sh | sh -s -- -b /usr/local/bin
+
+# List all clusters
+kubectl get cluster --all-namespaces
+
+# Get cluster details
+kubectl cnpg status my-postgres -n cnpg-system
+
+# Promote a replica
+kubectl cnpg promote my-postgres my-postgres-2 -n cnpg-system
+
+# Create backup
+kubectl cnpg backup my-postgres -n cnpg-system
+
+# Reload configuration
+kubectl cnpg reload my-postgres -n cnpg-system
+
+# Get the superuser password
+kubectl get secret my-postgres-superuser -n cnpg-system -o jsonpath='{.data.password}' | base64 -d && echo
+```
+
+## Next Steps
+
+1. Deploy the operator: `git add . && git commit -m "Add CloudNativePG" && git push`
+2. Wait for ArgoCD to sync
+3. Create your first cluster (uncomment extraObjects or create separate app)
+4. Set up S3 backups with Ceph Object Storage
+5. Test backup and recovery
+6. Configure monitoring and alerts
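+
+A quick way to verify steps 1-3, assuming the default names used in this chart (namespace `cnpg-system` and the `my-postgres` example cluster):
+
+```bash
+# Operator deployment is running
+kubectl get deployment -n cnpg-system
+
+# CRDs registered by the operator
+kubectl get crd | grep cnpg.io
+
+# Cluster should eventually report a healthy status once instances are ready
+kubectl get cluster my-postgres -n cnpg-system
+```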
+
+## Additional Resources
+
+- [CloudNativePG Documentation](https://cloudnative-pg.io/documentation/)
+- [API Reference](https://cloudnative-pg.io/documentation/current/api_reference/)
+- [Best Practices](https://cloudnative-pg.io/documentation/current/guidelines/)
+- [Monitoring Guide](https://cloudnative-pg.io/documentation/current/monitoring/)
diff --git a/apps/cloudnative-pg/application.yaml b/apps/cloudnative-pg/application.yaml
new file mode 100644
index 0000000..03361e6
--- /dev/null
+++ b/apps/cloudnative-pg/application.yaml
@@ -0,0 +1,29 @@
+apiVersion: argoproj.io/v1alpha1
+kind: Application
+metadata:
+  name: cloudnative-pg
+  namespace: argocd
+  annotations:
+    argocd.argoproj.io/sync-wave: "0"
+  finalizers:
+    - resources-finalizer.argocd.argoproj.io
+spec:
+  project: default
+  source:
+    repoURL: https://git.mvzijl.nl/marco/veda.git
+    targetRevision: applicationset-rewrite
+    path: apps/cloudnative-pg
+    helm:
+      releaseName: cloudnative-pg
+      valueFiles:
+        - values.yaml
+  destination:
+    server: https://kubernetes.default.svc
+    namespace: cnpg-system
+  syncPolicy:
+    automated:
+      prune: true
+      selfHeal: true
+    syncOptions:
+      - CreateNamespace=true
+      - ServerSideApply=true
diff --git a/apps/cloudnative-pg/values.yaml b/apps/cloudnative-pg/values.yaml
new file mode 100644
index 0000000..d8bbde5
--- /dev/null
+++ b/apps/cloudnative-pg/values.yaml
@@ -0,0 +1,87 @@
+cloudnative-pg:
+
+  monitoring:
+    podMonitorEnabled: true
+    grafanaDashboard:
+      create: true
+
+  resources:
+    requests:
+      cpu: 50m
+      memory: 100Mi
+    limits:
+      memory: 256Mi
+
+  priorityClassName: system-cluster-critical
+
+# Example PostgreSQL cluster configuration
+# Uncomment and customize to create a test cluster
+# extraObjects:
+#   - apiVersion: postgresql.cnpg.io/v1
+#     kind: Cluster
+#     metadata:
+#       name: postgres-example
+#       namespace: cnpg-system
+#     spec:
+#       instances: 2
+#       resources:
+#         requests:
+#           memory: 128Mi
+#           cpu: 100m
+#         limits:
+#           memory: 1Gi
+#           cpu: '1'
+#       postgresql:
+#         parameters:
+#           max_connections: "200"
+#           shared_buffers: "128MB"
+#           effective_cache_size: "256MB"
+#           maintenance_work_mem: "16MB"
+#           random_page_cost: "1.1"
+#           effective_io_concurrency: "300"
+#       monitoring:
+#         enablePodMonitor: true
+#
+#       # Use local-path-provisioner for storage
+#       storage:
+#         size: 50Gi
+#         storageClass: local-path
+#
+#       # Backup configuration (Barman Cloud object store)
+#       backup:
+#         retentionPolicy: "30d"
+#
+#         # Volume snapshot backups (note: className must be a CSI
+#         # VolumeSnapshotClass; local-path has no snapshot support)
+#         volumeSnapshot:
+#           className: local-path
+#
+#         # S3 backup via Barman Cloud
+#         barmanObjectStore:
+#           destinationPath: s3://postgres-backups/
+#           endpointURL: http://rook-ceph-rgw-ceph-objectstore.rook-ceph.svc:80
+#
+#           # S3 credentials reference
+#           s3Credentials:
+#             accessKeyId:
+#               name: postgres-backup-credentials
+#               key: ACCESS_KEY_ID
+#             secretAccessKey:
+#               name: postgres-backup-credentials
+#               key: ACCESS_SECRET_KEY
+#
+#           # Compression settings
+#           data:
+#             compression: bzip2
+#             jobs: 2
+#           wal:
+#             compression: bzip2
+#             maxParallel: 2
+#
+#           # Tags for backup organization
+#           tags:
+#             environment: "development"
+#             managed-by: "cloudnative-pg"
+#
+#           # Backup history and retention
+#           historyTags:
+#             environment: "development"
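+#
+# # Optional: a daily ScheduledBackup for the example cluster. This is a
+# # sketch to uncomment together with the Cluster above; the schedule uses
+# # the six-field cron format (with seconds) that CloudNativePG expects,
+# # and the name/time are placeholders to adjust.
+#   - apiVersion: postgresql.cnpg.io/v1
+#     kind: ScheduledBackup
+#     metadata:
+#       name: postgres-example-daily
+#       namespace: cnpg-system
+#     spec:
+#       schedule: "0 0 3 * * *"  # every day at 03:00
+#       backupOwnerReference: self
+#       cluster:
+#         name: postgres-example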