CloudNativePG Setup Guide

Overview

CloudNativePG is a Kubernetes operator that manages PostgreSQL clusters using Kubernetes custom resources. It provides:

  • High availability with automatic failover
  • Automated backups to S3-compatible storage
  • Point-in-time recovery (PITR)
  • Rolling updates with zero downtime
  • Connection pooling with PgBouncer
  • Monitoring with Prometheus

Architecture

┌─────────────────────────────────────┐
│   CloudNativePG Operator            │
│   (Manages PostgreSQL Clusters)     │
└─────────────────────────────────────┘
              │
              ├─────────────────────────┐
              │                         │
     ┌────────▼────────┐       ┌───────▼────────┐
     │  PostgreSQL     │       │  PostgreSQL    │
     │  Primary        │◄─────►│  Replica       │
     │  (Read/Write)   │       │  (Read-only)   │
     └─────────────────┘       └────────────────┘
              │
              │ (Backups)
              ▼
     ┌─────────────────┐
     │  Ceph S3 (RGW)  │
     │  Object Storage │
     └─────────────────┘

Current Configuration

Operator Settings

  • Namespace: cnpg-system
  • Monitoring: Enabled (PodMonitor for Prometheus)
  • Grafana Dashboard: Auto-created
  • Priority Class: system-cluster-critical
  • Resource Limits: Conservative (50m CPU, 100Mi RAM)
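
These settings map roughly onto the following values.yaml keys; a sketch assuming the official cloudnative-pg Helm chart (the top-level key depends on how the chart is wrapped in this repo):

cloudnative-pg:
  priorityClassName: system-cluster-critical
  resources:
    requests:
      cpu: 50m
      memory: 100Mi
  monitoring:
    podMonitorEnabled: true      # creates the PodMonitor for Prometheus
    grafanaDashboard:
      create: true               # auto-creates the Grafana dashboard ConfigMap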

Example Cluster (Commented Out)

The values.yaml includes a commented example cluster configuration. See "Creating Your First Cluster" below.

Creating Your First Cluster

Option 1: Using extraObjects in values.yaml

Uncomment the extraObjects section in values.yaml and customize:

extraObjects:
  - apiVersion: postgresql.cnpg.io/v1
    kind: Cluster
    metadata:
      name: my-postgres
      namespace: cnpg-system
    spec:
      instances: 2  # 1 primary + 1 replica
      storage:
        size: 50Gi
        storageClass: ceph-block

Option 2: Separate Application

For production, create a separate ArgoCD Application for each database cluster:

mkdir -p apps/databases/my-app-db

Create apps/databases/my-app-db/cluster.yaml:

apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: my-app-db
  namespace: my-app
spec:
  instances: 3
  
  postgresql:
    parameters:
      max_connections: "200"
      shared_buffers: "256MB"
  
  storage:
    size: 100Gi
    storageClass: ceph-block
  
  monitoring:
    enablePodMonitor: true
  
  backup:
    retentionPolicy: "30d"
    barmanObjectStore:
      destinationPath: s3://my-app-backups/
      endpointURL: http://rook-ceph-rgw-ceph-objectstore.rook-ceph.svc:80
      s3Credentials:
        accessKeyId:
          name: backup-credentials
          key: AWS_ACCESS_KEY_ID
        secretAccessKey:
          name: backup-credentials
          key: AWS_SECRET_ACCESS_KEY
      data:
        compression: bzip2
      wal:
        compression: bzip2
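
To have ArgoCD deploy that directory, an Application manifest along these lines can be added (repository URL, project, and sync policy are placeholders to adapt):

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: my-app-db
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://example.com/your/gitops-repo.git  # placeholder
    targetRevision: main
    path: apps/databases/my-app-db
  destination:
    server: https://kubernetes.default.svc
    namespace: my-app
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true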

Backup Configuration

Prerequisites

  1. Create S3 Bucket (using Ceph Object Storage):
apiVersion: objectbucket.io/v1alpha1
kind: ObjectBucketClaim
metadata:
  name: postgres-backups
  namespace: cnpg-system
spec:
  bucketName: postgres-backups
  storageClassName: ceph-bucket
  additionalConfig:
    maxSize: "500Gi"
  2. Create Credentials Secret:

After creating the ObjectBucketClaim, Rook will generate credentials:

# Get the generated credentials
kubectl get secret postgres-backups -n cnpg-system -o yaml

# The secret will contain:
# - AWS_ACCESS_KEY_ID
# - AWS_SECRET_ACCESS_KEY
  3. Reference in Cluster Spec:

Use these credentials in your PostgreSQL cluster backup configuration (see example above).
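
If the database runs in a different namespace than the ObjectBucketClaim, the credentials must be copied there. A sketch (secret and namespace names match the examples above and are illustrative):

# Extract the Rook-generated credentials
ACCESS_KEY=$(kubectl get secret postgres-backups -n cnpg-system -o jsonpath='{.data.AWS_ACCESS_KEY_ID}' | base64 -d)
SECRET_KEY=$(kubectl get secret postgres-backups -n cnpg-system -o jsonpath='{.data.AWS_SECRET_ACCESS_KEY}' | base64 -d)

# Recreate them as the secret referenced by the Cluster spec
kubectl create secret generic backup-credentials -n my-app \
  --from-literal=AWS_ACCESS_KEY_ID="$ACCESS_KEY" \
  --from-literal=AWS_SECRET_ACCESS_KEY="$SECRET_KEY"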

Database Access

Connect to Primary (Read/Write)

# Service name pattern: <cluster-name>-rw
kubectl port-forward -n cnpg-system svc/my-postgres-rw 5432:5432

# Connect with psql
psql -h localhost -U postgres -d postgres

Connect to Replica (Read-Only)

# Service name pattern: <cluster-name>-ro
kubectl port-forward -n cnpg-system svc/my-postgres-ro 5432:5432

Get Superuser Password

# Password stored in secret: <cluster-name>-superuser
# (recent operator versions disable superuser access by default;
# set spec.enableSuperuserAccess: true on the Cluster for this secret to exist)
kubectl get secret my-postgres-superuser -n cnpg-system -o jsonpath='{.data.password}' | base64 -d

Create Application User

# Connect to database
kubectl exec -it -n cnpg-system my-postgres-1 -- psql

-- Create database and user
CREATE DATABASE myapp;
CREATE USER myapp_user WITH PASSWORD 'secure-password';
GRANT ALL PRIVILEGES ON DATABASE myapp TO myapp_user;

-- On PostgreSQL 15+, CREATE on the public schema is no longer granted
-- to all users; connect to the new database and grant it explicitly:
\c myapp
GRANT ALL ON SCHEMA public TO myapp_user;

Connection from Applications

Create Secret for Application

apiVersion: v1
kind: Secret
metadata:
  name: postgres-credentials
  namespace: my-app
type: Opaque
stringData:
  username: myapp_user
  password: secure-password
  database: myapp
  host: my-postgres-rw.cnpg-system.svc
  port: "5432"

Use in Application

Note that Kubernetes only expands $(VAR) references to variables defined earlier in the same env list, so DATABASE_URL must come after the variables it is composed from:

env:
  - name: POSTGRES_USER
    valueFrom:
      secretKeyRef:
        name: postgres-credentials
        key: username
  - name: POSTGRES_PASSWORD
    valueFrom:
      secretKeyRef:
        name: postgres-credentials
        key: password
  - name: POSTGRES_HOST
    valueFrom:
      secretKeyRef:
        name: postgres-credentials
        key: host
  - name: POSTGRES_PORT
    valueFrom:
      secretKeyRef:
        name: postgres-credentials
        key: port
  - name: POSTGRES_DATABASE
    valueFrom:
      secretKeyRef:
        name: postgres-credentials
        key: database
  - name: DATABASE_URL
    value: "postgresql://$(POSTGRES_USER):$(POSTGRES_PASSWORD)@$(POSTGRES_HOST):$(POSTGRES_PORT)/$(POSTGRES_DATABASE)"

Monitoring

Prometheus Metrics

CloudNativePG exposes metrics through a PodMonitor. Useful metric families to check in Prometheus (an example alert rule follows the list):

  • cnpg_pg_stat_database_* - Database statistics
  • cnpg_pg_replication_* - Replication lag
  • cnpg_backends_* - Backend (client connection) counts
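
For instance, the replication-lag metric can back an alert. A sketch of a PrometheusRule, assuming the Prometheus Operator CRDs are installed (name and threshold are illustrative):

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: cnpg-alerts
  namespace: cnpg-system
spec:
  groups:
    - name: cloudnative-pg
      rules:
        - alert: CNPGHighReplicationLag
          # cnpg_pg_replication_lag is reported in seconds
          expr: max(cnpg_pg_replication_lag) > 30
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "A PostgreSQL replica is more than 30s behind its primary"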

Check Cluster Status

# Get cluster status
kubectl get cluster -n cnpg-system

# Detailed cluster info
kubectl describe cluster my-postgres -n cnpg-system

# Check pods
kubectl get pods -n cnpg-system -l cnpg.io/cluster=my-postgres

View Logs

# Operator logs
kubectl logs -n cnpg-system -l app.kubernetes.io/name=cloudnative-pg --tail=100

# PostgreSQL logs
kubectl logs -n cnpg-system my-postgres-1 --tail=100

Backup and Recovery

Manual Backup

kubectl cnpg backup my-postgres -n cnpg-system
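
Backups can also be declared once and run automatically using the ScheduledBackup resource; a minimal sketch (name and schedule are illustrative; note the six-field cron format with a leading seconds field):

apiVersion: postgresql.cnpg.io/v1
kind: ScheduledBackup
metadata:
  name: my-postgres-daily
  namespace: cnpg-system
spec:
  schedule: "0 0 2 * * *"  # every day at 02:00
  cluster:
    name: my-postgres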

List Backups

kubectl get backup -n cnpg-system

Point-in-Time Recovery

Create a new cluster from a backup:

apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: restored-cluster
spec:
  instances: 2
  
  bootstrap:
    recovery:
      source: my-postgres
      recoveryTarget:
        targetTime: "2024-11-09 10:00:00+01"  # include a UTC offset to avoid timezone ambiguity
  
  externalClusters:
    - name: my-postgres
      barmanObjectStore:
        destinationPath: s3://postgres-backups/
        endpointURL: http://rook-ceph-rgw-ceph-objectstore.rook-ceph.svc:80
        s3Credentials:
          accessKeyId:
            name: backup-credentials
            key: AWS_ACCESS_KEY_ID
          secretAccessKey:
            name: backup-credentials
            key: AWS_SECRET_ACCESS_KEY
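
Once the restored cluster is up, its recovery progress can be inspected with the plugin:

# add -n <namespace> if the cluster is not in the current namespace
kubectl cnpg status restored-cluster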

Maintenance Operations

Scale Replicas

# Scaling is done by changing spec.instances on the Cluster resource
kubectl patch cluster my-postgres -n cnpg-system --type merge -p '{"spec":{"instances":3}}'

Switchover (Promote Replica)

# Arguments are the cluster name followed by the instance to promote
kubectl cnpg promote my-postgres my-postgres-2 -n cnpg-system

Restart Cluster

kubectl cnpg restart my-postgres -n cnpg-system

Production Recommendations

1. High Availability

  • Use at least 3 instances (1 primary + 2 replicas)
  • Spread across availability zones using pod anti-affinity (sketch below)
  • Configure automatic failover thresholds
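
A sketch of zone-aware spreading using the Cluster spec's built-in affinity fields (values are illustrative):

spec:
  instances: 3
  affinity:
    podAntiAffinityType: required             # hard anti-affinity between instances
    topologyKey: topology.kubernetes.io/zone  # spread across zones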

2. Resource Configuration

  • Small databases (<10GB): 2 CPU, 4Gi RAM, 2 instances
  • Medium databases (10-100GB): 4 CPU, 8Gi RAM, 3 instances
  • Large databases (>100GB): 8 CPU, 16Gi RAM, 3+ instances

3. PostgreSQL Tuning

  • Adjust shared_buffers to 25% of RAM
  • Set effective_cache_size to 50-75% of RAM
  • Tune connection limits based on application needs
  • Use random_page_cost = 1.1 for SSD storage (example below)
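
Applied to a Cluster spec, these recommendations might look like the following sketch (values assume roughly 8Gi of RAM per instance):

postgresql:
  parameters:
    shared_buffers: "2GB"          # ~25% of RAM
    effective_cache_size: "6GB"    # ~75% of RAM
    max_connections: "200"         # size to the application's needs
    random_page_cost: "1.1"        # SSD-backed storage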

4. Backup Strategy

  • Enable automated backups to S3
  • Set retention policy (e.g., 30 days)
  • Test recovery procedures regularly
  • Monitor backup success

5. Monitoring

  • Set up alerts for replication lag
  • Monitor connection pool saturation
  • Track query performance
  • Watch disk space usage

6. Security

  • Use strong passwords (stored in Kubernetes secrets)
  • Enable SSL/TLS for connections
  • Implement network policies (example below)
  • Regular security updates
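
A minimal NetworkPolicy sketch restricting client traffic to the database pods (labels and namespaces match the examples above; a real setup must also allow the operator and Prometheus to reach the pods):

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-app-to-postgres
  namespace: cnpg-system
spec:
  podSelector:
    matchLabels:
      cnpg.io/cluster: my-postgres
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: my-app
      ports:
        - protocol: TCP
          port: 5432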

Troubleshooting

Cluster Not Starting

# Check operator logs
kubectl logs -n cnpg-system -l app.kubernetes.io/name=cloudnative-pg

# Check events
kubectl get events -n cnpg-system --sort-by='.lastTimestamp'

# Check PVC status
kubectl get pvc -n cnpg-system

Replication Issues

# Check replication status
kubectl exec -it -n cnpg-system my-postgres-1 -- psql -c "SELECT * FROM pg_stat_replication;"

# Check cluster status
kubectl get cluster my-postgres -n cnpg-system -o yaml

Backup Failures

# Check backup status
kubectl get backup -n cnpg-system

# View backup logs
kubectl logs -n cnpg-system my-postgres-1 | grep -i backup

# Test S3 connectivity
kubectl exec -it -n cnpg-system my-postgres-1 -- curl -I http://rook-ceph-rgw-ceph-objectstore.rook-ceph.svc:80

High Resource Usage

# Check resource usage
kubectl top pods -n cnpg-system

# Check PostgreSQL connections
kubectl exec -it -n cnpg-system my-postgres-1 -- psql -c "SELECT count(*) FROM pg_stat_activity;"

# Identify slow queries
kubectl exec -it -n cnpg-system my-postgres-1 -- psql -c "SELECT pid, now() - pg_stat_activity.query_start AS duration, query FROM pg_stat_activity WHERE state = 'active' ORDER BY duration DESC;"

Useful Commands

# Install kubectl-cnpg plugin
curl -sSfL https://github.com/cloudnative-pg/cloudnative-pg/raw/main/hack/install-cnpg-plugin.sh | sh -s -- -b /usr/local/bin

# List all clusters
kubectl get cluster -n cnpg-system

# Get cluster details
kubectl cnpg status my-postgres -n cnpg-system

# Promote a replica (cluster name, then instance name)
kubectl cnpg promote my-postgres my-postgres-2 -n cnpg-system

# Create backup
kubectl cnpg backup my-postgres -n cnpg-system

# Reload configuration
kubectl cnpg reload my-postgres -n cnpg-system

# Get connection info
kubectl get secret my-postgres-superuser -n cnpg-system -o jsonpath='{.data.password}' | base64 -d && echo

Next Steps

  1. Deploy the operator: git add . && git commit -m "Add CloudNativePG" && git push
  2. Wait for ArgoCD to sync
  3. Create your first cluster (uncomment extraObjects or create separate app)
  4. Set up S3 backups with Ceph Object Storage
  5. Test backup and recovery
  6. Configure monitoring and alerts

Additional Resources