Add Application for CloudNativePG

This commit is contained in:
Marco van Zijl 2025-11-09 09:23:21 +01:00
parent 9d626b45d1
commit 81b69bc8e3
4 changed files with 713 additions and 0 deletions


@@ -0,0 +1,11 @@
apiVersion: v2
name: cloudnative-pg
description: CloudNativePG operator wrapper chart
type: application
version: 1.0.0
appVersion: "1.27.1"
dependencies:
  - name: cloudnative-pg
    version: 0.26.1
    repository: https://cloudnative-pg.github.io/charts


@@ -0,0 +1,586 @@
# CloudNativePG Setup Guide
## Overview
CloudNativePG is a Kubernetes operator that manages PostgreSQL clusters using Kubernetes custom resources. It provides:
- High availability with automatic failover
- Automated backups to S3-compatible storage
- Point-in-time recovery (PITR)
- Rolling updates with zero downtime
- Connection pooling with PgBouncer
- Monitoring with Prometheus
## Architecture
```ascii
┌─────────────────────────────────────┐
│        CloudNativePG Operator       │
│   (Manages PostgreSQL Clusters)     │
└─────────────────────────────────────┘
         │
         ├───────────────────────┐
         │                       │
┌────────▼────────┐      ┌───────▼────────┐
│   PostgreSQL    │      │   PostgreSQL   │
│    Primary      │◄────►│    Replica     │
│  (Read/Write)   │      │  (Read-only)   │
└─────────────────┘      └────────────────┘
         │ (Backups)
         ▼
┌─────────────────┐
│  Ceph S3 (RGW)  │
│ Object Storage  │
└─────────────────┘
```
## Current Configuration
### Operator Settings
- **Namespace**: `cnpg-system`
- **Monitoring**: Enabled (PodMonitor for Prometheus)
- **Grafana Dashboard**: Auto-created
- **Priority Class**: `system-cluster-critical`
- **Resources**: Conservative (50m CPU / 100Mi RAM requests, 256Mi memory limit)
### Example Cluster (Commented Out)
The `values.yaml` includes a commented example cluster configuration with:
- **Storage**: `local-path` StorageClass (for development)
- **Backup**: Barman-cloud plugin with S3 (Ceph RGW) backend
- **Note**: See "Storage Considerations" section below
## ⚠️ Storage Considerations
### Local Path vs Ceph Block
The example cluster uses `local-path` StorageClass, which is suitable for:
- ✅ **Development/Testing**: Quick setup, no Ceph dependency
- ✅ **Single-node scenarios**: When HA isn't required
- ✅ **Learning/Experimentation**: Testing PostgreSQL features
**For production use, change to `ceph-block`:**
```yaml
storage:
  storageClass: ceph-block  # Instead of local-path
  size: 50Gi
```
### Why Ceph Block for Production?
| Feature | local-path | ceph-block |
|---------|-----------|------------|
| **High Availability** | ❌ No | ✅ Yes |
| **Data Replication** | ❌ No | ✅ 2x copies |
| **Pod Mobility** | ❌ Pinned to node | ✅ Can move |
| **Snapshots** | ❌ No | ✅ Yes |
| **Auto Resize** | ❌ No | ✅ Yes |
| **Node Failure** | ❌ Data unavailable | ✅ Survives |
### Hybrid Approach (Recommended for Dev)
Even with local-path storage, the S3 backup provides safety:
- **Primary storage**: local-path (fast, simple)
- **Backups**: Ceph S3 (safe, replicated, off-node)
- **Recovery**: Restore from S3 if node fails
This gives you:
- ✅ Point-in-time recovery
- ✅ Off-node backup storage
- ✅ Disaster recovery capability
- ✅ Fast local performance
- ⚠️ But no automatic HA
## Barman-Cloud Backup Plugin
CloudNativePG uses the modern barman-cloud toolset for backups.
### Configuration Features:
```yaml
backup:
  barmanObjectStore:
    # Parallel processing
    data:
      compression: bzip2
      jobs: 2  # Parallel compression threads
    wal:
      compression: bzip2
      maxParallel: 2  # Parallel WAL uploads
    # Metadata tags
    tags:
      environment: "development"
      managed-by: "cloudnative-pg"
    # Backup lineage tracking
    historyTags:
      environment: "development"
```
### Plugin Benefits:
- ✅ **Better S3 compatibility**: Works with all S3-compatible stores
- ✅ **Improved parallelism**: Faster backups for large databases
- ✅ **Enhanced error handling**: Better retry logic
- ✅ **Cloud-native design**: Optimized for object storage
- ✅ **Metadata tagging**: Better backup organization
### Backup Strategy:
1. **Continuous WAL archiving**: Real-time transaction logs to S3
2. **Scheduled full backups**: Complete database snapshots (see the `ScheduledBackup` sketch after this list)
3. **Point-in-time recovery**: Restore to any timestamp
4. **Retention policies**: Automatic cleanup of old backups
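Scheduled full backups are declared with a `ScheduledBackup` resource. A minimal sketch for the example cluster below; the resource name `my-postgres-daily` is illustrative:
```yaml
apiVersion: postgresql.cnpg.io/v1
kind: ScheduledBackup
metadata:
  name: my-postgres-daily   # hypothetical name
  namespace: cnpg-system
spec:
  schedule: "0 0 2 * * *"   # six-field cron (seconds first): daily at 02:00
  backupOwnerReference: self
  cluster:
    name: my-postgres
```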
## Creating Your First Cluster
### Option 1: Using extraObjects in values.yaml (Development)
Uncomment the `extraObjects` section in `values.yaml` for a development cluster:
```yaml
extraObjects:
  - apiVersion: postgresql.cnpg.io/v1
    kind: Cluster
    metadata:
      name: my-postgres
      namespace: cnpg-system
    spec:
      instances: 2  # 1 primary + 1 replica
      # Development: local-path for fast local storage
      storage:
        size: 50Gi
        storageClass: local-path
      # Backup to Ceph S3 for safety
      backup:
        retentionPolicy: "30d"
        barmanObjectStore:
          destinationPath: s3://postgres-backups/
          endpointURL: http://rook-ceph-rgw-ceph-objectstore.rook-ceph.svc:80
          s3Credentials:
            accessKeyId:
              name: postgres-backup-credentials
              key: ACCESS_KEY_ID
            secretAccessKey:
              name: postgres-backup-credentials
              key: ACCESS_SECRET_KEY
          data:
            compression: bzip2
            jobs: 2
          wal:
            compression: bzip2
            maxParallel: 2
```
### Option 2: Separate Application (Production)
For production, create a separate ArgoCD Application with ceph-block storage:
```bash
mkdir -p apps/databases/my-app-db
```
Create `apps/databases/my-app-db/cluster.yaml`:
```yaml
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: my-app-db
  namespace: my-app
spec:
  instances: 3
  postgresql:
    parameters:
      max_connections: "200"
      shared_buffers: "256MB"
  # Production: ceph-block for HA
  storage:
    size: 100Gi
    storageClass: ceph-block
  monitoring:
    enablePodMonitor: true
  # Barman-cloud backup configuration
  backup:
    retentionPolicy: "30d"
    barmanObjectStore:
      destinationPath: s3://my-app-backups/
      endpointURL: http://rook-ceph-rgw-ceph-objectstore.rook-ceph.svc:80
      s3Credentials:
        accessKeyId:
          name: backup-credentials
          key: ACCESS_KEY_ID
        secretAccessKey:
          name: backup-credentials
          key: ACCESS_SECRET_KEY
      data:
        compression: bzip2
        jobs: 2  # Parallel compression
      wal:
        compression: bzip2
        maxParallel: 2  # Parallel WAL uploads
      tags:
        environment: "production"
        application: "my-app"
```
## Backup Configuration
### Prerequisites
1. **Create S3 Bucket** (using Ceph Object Storage):
```yaml
apiVersion: objectbucket.io/v1alpha1
kind: ObjectBucketClaim
metadata:
  name: postgres-backups
  namespace: cnpg-system
spec:
  bucketName: postgres-backups
  storageClassName: ceph-bucket
  additionalConfig:
    maxSize: "500Gi"
```
2. **Create Credentials Secret**:
After creating the ObjectBucketClaim, Rook will generate credentials:
```bash
# Get the generated credentials
kubectl get secret postgres-backups -n cnpg-system -o yaml
# The secret will contain:
# - AWS_ACCESS_KEY_ID
# - AWS_SECRET_ACCESS_KEY
```
3. **Reference in Cluster Spec**:
Use these credentials in your PostgreSQL cluster backup configuration (see example above); a sketch that points directly at the Rook-generated secret follows.
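Note that the Rook-generated secret uses the key names `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY`, while the cluster examples above assume a `postgres-backup-credentials` secret with `ACCESS_KEY_ID`/`ACCESS_SECRET_KEY` keys. A minimal sketch of the `s3Credentials` block when referencing the Rook secret directly:
```yaml
s3Credentials:
  accessKeyId:
    name: postgres-backups        # secret created by Rook for the OBC
    key: AWS_ACCESS_KEY_ID
  secretAccessKey:
    name: postgres-backups
    key: AWS_SECRET_ACCESS_KEY
```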
## Database Access
### Connect to Primary (Read/Write)
```bash
# Service name pattern: <cluster-name>-rw
kubectl port-forward -n cnpg-system svc/my-postgres-rw 5432:5432
# Connect with psql
psql -h localhost -U postgres -d postgres
```
### Connect to Replica (Read-Only)
```bash
# Service name pattern: <cluster-name>-ro
kubectl port-forward -n cnpg-system svc/my-postgres-ro 5432:5432
```
### Get Superuser Password
```bash
# Password stored in secret: <cluster-name>-superuser
kubectl get secret my-postgres-superuser -n cnpg-system -o jsonpath='{.data.password}' | base64 -d
```
### Create Application User
```bash
# Connect to the database with psql
kubectl exec -it -n cnpg-system my-postgres-1 -- psql
```
```sql
-- Create database and user
CREATE DATABASE myapp;
CREATE USER myapp_user WITH PASSWORD 'secure-password';
GRANT ALL PRIVILEGES ON DATABASE myapp TO myapp_user;
```
## Connection from Applications
### Create Secret for Application
```yaml
apiVersion: v1
kind: Secret
metadata:
  name: postgres-credentials
  namespace: my-app
type: Opaque
stringData:
  username: myapp_user
  password: secure-password
  database: myapp
  host: my-postgres-rw.cnpg-system.svc
  port: "5432"
```
### Use in Application
```yaml
env:
  # Dependent $(VAR) expansion only resolves variables defined earlier in
  # the list, so DATABASE_URL must come after the variables it references.
  - name: POSTGRES_USER
    valueFrom:
      secretKeyRef:
        name: postgres-credentials
        key: username
  - name: POSTGRES_PASSWORD
    valueFrom:
      secretKeyRef:
        name: postgres-credentials
        key: password
  - name: POSTGRES_HOST
    valueFrom:
      secretKeyRef:
        name: postgres-credentials
        key: host
  - name: POSTGRES_PORT
    valueFrom:
      secretKeyRef:
        name: postgres-credentials
        key: port
  - name: POSTGRES_DATABASE
    valueFrom:
      secretKeyRef:
        name: postgres-credentials
        key: database
  - name: DATABASE_URL
    value: "postgresql://$(POSTGRES_USER):$(POSTGRES_PASSWORD)@$(POSTGRES_HOST):$(POSTGRES_PORT)/$(POSTGRES_DATABASE)"
```
## Monitoring
### Prometheus Metrics
CloudNativePG exposes metrics via PodMonitor. Check Prometheus for:
- `cnpg_pg_stat_database_*` - Database statistics
- `cnpg_pg_replication_*` - Replication lag
- `cnpg_backends_*` - Backend connection statistics
### Check Cluster Status
```bash
# Get cluster status
kubectl get cluster -n cnpg-system
# Detailed cluster info
kubectl describe cluster my-postgres -n cnpg-system
# Check pods
kubectl get pods -n cnpg-system -l cnpg.io/cluster=my-postgres
```
### View Logs
```bash
# Operator logs
kubectl logs -n cnpg-system -l app.kubernetes.io/name=cloudnative-pg --tail=100
# PostgreSQL logs
kubectl logs -n cnpg-system my-postgres-1 --tail=100
```
## Backup and Recovery
### Manual Backup
```bash
kubectl cnpg backup my-postgres -n cnpg-system
```
### List Backups
```bash
kubectl get backup -n cnpg-system
```
### Point-in-Time Recovery
Create a new cluster from a backup:
```yaml
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: restored-cluster
spec:
  instances: 2
  bootstrap:
    recovery:
      source: my-postgres
      recoveryTarget:
        targetTime: "2024-11-09 10:00:00"
  externalClusters:
    - name: my-postgres
      barmanObjectStore:
        destinationPath: s3://postgres-backups/
        endpointURL: http://rook-ceph-rgw-ceph-objectstore.rook-ceph.svc:80
        s3Credentials:
          accessKeyId:
            name: backup-credentials
            key: ACCESS_KEY_ID
          secretAccessKey:
            name: backup-credentials
            key: ACCESS_SECRET_KEY
```
## Maintenance Operations
### Scale Replicas
```bash
# Scaling is declarative: raise spec.instances on the Cluster resource
kubectl patch cluster my-postgres -n cnpg-system --type merge -p '{"spec":{"instances":3}}'
```
### Switchover (Promote Replica)
```bash
# Arguments: cluster name, then the instance to promote
kubectl cnpg promote my-postgres my-postgres-2 -n cnpg-system
```
### Restart Cluster
```bash
kubectl cnpg restart my-postgres -n cnpg-system
```
## Production Recommendations
### 1. High Availability
- Use at least 3 instances (1 primary + 2 replicas)
- Spread across availability zones using pod anti-affinity (see the sketch after this list)
- Configure automatic failover thresholds
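Anti-affinity is configured on the Cluster resource. A minimal sketch, assuming your nodes carry the standard `topology.kubernetes.io/zone` label:
```yaml
spec:
  instances: 3
  affinity:
    enablePodAntiAffinity: true
    podAntiAffinityType: required   # never co-locate two instances
    topologyKey: topology.kubernetes.io/zone
```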
### 2. Resource Configuration
- **Small databases** (<10GB): 2 CPU, 4Gi RAM, 2 instances
- **Medium databases** (10-100GB): 4 CPU, 8Gi RAM, 3 instances (resources sketched after this list)
- **Large databases** (>100GB): 8 CPU, 16Gi RAM, 3+ instances
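For the medium tier, the corresponding Cluster fragment would look roughly like this; setting requests equal to limits gives the pods a Guaranteed QoS class:
```yaml
spec:
  instances: 3
  resources:
    requests:
      cpu: "4"
      memory: 8Gi
    limits:
      cpu: "4"
      memory: 8Gi
```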
### 3. PostgreSQL Tuning
- Adjust `shared_buffers` to 25% of RAM
- Set `effective_cache_size` to 50-75% of RAM
- Tune connection limits based on application needs
- Use `random_page_cost: 1.1` for SSD storage (see the sketch after this list)
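Applied to the medium tier above (8Gi RAM per instance), those guidelines translate to roughly the following fragment; the values are illustrative starting points, not tuned settings:
```yaml
spec:
  postgresql:
    parameters:
      shared_buffers: "2GB"         # ~25% of 8Gi RAM
      effective_cache_size: "6GB"   # ~75% of RAM
      random_page_cost: "1.1"       # SSD-backed storage
      max_connections: "200"        # size to actual application need
```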
### 4. Backup Strategy
- Enable automated backups to S3
- Set retention policy (e.g., 30 days)
- Test recovery procedures regularly
- Monitor backup success
### 5. Monitoring
- Set up alerts for replication lag (see the sketch after this list)
- Monitor connection pool saturation
- Track query performance
- Watch disk space usage
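With the PodMonitor enabled, a replication-lag alert can be expressed as a PrometheusRule. A sketch assuming the Prometheus Operator CRDs and the default `cnpg_pg_replication_lag` metric (in seconds); the rule name and 300s threshold are illustrative:
```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: cnpg-replication-lag   # hypothetical name
  namespace: cnpg-system
spec:
  groups:
    - name: cloudnative-pg
      rules:
        - alert: PostgresReplicationLagHigh
          expr: cnpg_pg_replication_lag > 300
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "Replica {{ $labels.pod }} is more than 5 minutes behind"
```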
### 6. Security
- Use strong passwords (stored in Kubernetes secrets)
- Enable SSL/TLS for connections
- Implement network policies (see the sketch after this list)
- Regular security updates
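A sketch of such a policy for the example cluster, selecting pods by the `cnpg.io/cluster` label used earlier; it keeps intra-namespace traffic open (replication, operator probes) and admits application traffic from `my-app` on port 5432. Adjust selectors to your environment:
```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: postgres-allow-my-app   # hypothetical name
  namespace: cnpg-system
spec:
  podSelector:
    matchLabels:
      cnpg.io/cluster: my-postgres
  policyTypes:
    - Ingress
  ingress:
    # Replication and operator health checks within cnpg-system
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: cnpg-system
    # Application traffic
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: my-app
      ports:
        - protocol: TCP
          port: 5432
```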
## Troubleshooting
### Cluster Not Starting
```bash
# Check operator logs
kubectl logs -n cnpg-system -l app.kubernetes.io/name=cloudnative-pg
# Check events
kubectl get events -n cnpg-system --sort-by='.lastTimestamp'
# Check PVC status
kubectl get pvc -n cnpg-system
```
### Replication Issues
```bash
# Check replication status
kubectl exec -it -n cnpg-system my-postgres-1 -- psql -c "SELECT * FROM pg_stat_replication;"
# Check cluster status
kubectl get cluster my-postgres -n cnpg-system -o yaml
```
### Backup Failures
```bash
# Check backup status
kubectl get backup -n cnpg-system
# View backup logs
kubectl logs -n cnpg-system my-postgres-1 | grep -i backup
# Test S3 connectivity
kubectl exec -it -n cnpg-system my-postgres-1 -- curl -I http://rook-ceph-rgw-ceph-objectstore.rook-ceph.svc:80
```
### High Resource Usage
```bash
# Check resource usage
kubectl top pods -n cnpg-system
# Check PostgreSQL connections
kubectl exec -it -n cnpg-system my-postgres-1 -- psql -c "SELECT count(*) FROM pg_stat_activity;"
# Identify slow queries
kubectl exec -it -n cnpg-system my-postgres-1 -- psql -c "SELECT pid, now() - pg_stat_activity.query_start AS duration, query FROM pg_stat_activity WHERE state = 'active' ORDER BY duration DESC;"
```
## Useful Commands
```bash
# Install kubectl-cnpg plugin
curl -sSfL https://github.com/cloudnative-pg/cloudnative-pg/raw/main/hack/install-cnpg-plugin.sh | sh -s -- -b /usr/local/bin
# List all clusters
kubectl get clusters -n cnpg-system
# Get cluster details
kubectl cnpg status my-postgres -n cnpg-system
# Promote a replica
kubectl cnpg promote my-postgres my-postgres-2 -n cnpg-system
# Create backup
kubectl cnpg backup my-postgres -n cnpg-system
# Reload configuration
kubectl cnpg reload my-postgres -n cnpg-system
# Get connection info
kubectl get secret my-postgres-superuser -n cnpg-system -o jsonpath='{.data.password}' | base64 -d && echo
```
## Next Steps
1. Deploy the operator: `git add . && git commit -m "Add CloudNativePG" && git push`
2. Wait for ArgoCD to sync
3. Create your first cluster (uncomment extraObjects or create separate app)
4. Set up S3 backups with Ceph Object Storage
5. Test backup and recovery
6. Configure monitoring and alerts
## Additional Resources
- [CloudNativePG Documentation](https://cloudnative-pg.io/documentation/)
- [API Reference](https://cloudnative-pg.io/documentation/current/api_reference/)
- [Best Practices](https://cloudnative-pg.io/documentation/current/guidelines/)
- [Monitoring Guide](https://cloudnative-pg.io/documentation/current/monitoring/)


@@ -0,0 +1,29 @@
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: cloudnative-pg
  namespace: argocd
  annotations:
    argocd.argoproj.io/sync-wave: "0"
  finalizers:
    - resources-finalizer.argocd.argoproj.io
spec:
  project: default
  source:
    repoURL: https://git.mvzijl.nl/marco/veda.git
    targetRevision: applicationset-rewrite
    path: apps/cloudnative-pg
    helm:
      releaseName: cloudnative-pg
      valueFiles:
        - values.yaml
  destination:
    server: https://kubernetes.default.svc
    namespace: cnpg-system
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true
      - ServerSideApply=true


@@ -0,0 +1,87 @@
cloudnative-pg:
  monitoring:
    podMonitorEnabled: true
    grafanaDashboard:
      create: true
  resources:
    requests:
      cpu: 50m
      memory: 100Mi
    limits:
      memory: 256Mi
  priorityClassName: system-cluster-critical

# Example PostgreSQL cluster configuration
# Uncomment and customize to create a test cluster
# extraObjects:
#   - apiVersion: postgresql.cnpg.io/v1
#     kind: Cluster
#     metadata:
#       name: postgres-example
#       namespace: cnpg-system
#     spec:
#       instances: 2
#       resources:
#         requests:
#           memory: 128Mi
#           cpu: 100m
#         limits:
#           memory: 1Gi
#           cpu: '1'
#       postgresql:
#         parameters:
#           max_connections: "200"
#           shared_buffers: "128MB"
#           effective_cache_size: "256MB"
#           maintenance_work_mem: "16MB"
#           random_page_cost: "1.1"
#           effective_io_concurrency: "300"
#       monitoring:
#         enablePodMonitor: true
#
#       # Use local-path-provisioner for storage
#       storage:
#         size: 50Gi
#         storageClass: local-path
#
#       # Backup configuration using new plugin system
#       backup:
#         retentionPolicy: "30d"
#
#         # Volume for barman backups (uses same StorageClass as main storage)
#         volumeSnapshot:
#           className: local-path
#
#         # S3 backup using barman-cloud plugin
#         barmanObjectStore:
#           destinationPath: s3://postgres-backups/
#           endpointURL: http://rook-ceph-rgw-ceph-objectstore.rook-ceph.svc:80
#
#           # S3 credentials reference
#           s3Credentials:
#             accessKeyId:
#               name: postgres-backup-credentials
#               key: ACCESS_KEY_ID
#             secretAccessKey:
#               name: postgres-backup-credentials
#               key: ACCESS_SECRET_KEY
#
#           # Compression settings
#           data:
#             compression: bzip2
#             jobs: 2
#           wal:
#             compression: bzip2
#             maxParallel: 2
#
#           # Tags for backup organization
#           tags:
#             environment: "development"
#             managed-by: "cloudnative-pg"
#
#           # Backup history and retention
#           historyTags:
#             environment: "development"