CloudNativePG Setup Guide
Overview
CloudNativePG is a Kubernetes operator that manages PostgreSQL clusters using Kubernetes custom resources. It provides:
- High availability with automatic failover
- Automated backups to S3-compatible storage
- Point-in-time recovery (PITR)
- Rolling updates with zero downtime
- Connection pooling with PgBouncer
- Monitoring with Prometheus
Architecture
┌─────────────────────────────────────┐
│       CloudNativePG Operator        │
│    (Manages PostgreSQL Clusters)    │
└─────────────────────────────────────┘
                   │
                   ├────────────────────────┐
                   │                        │
          ┌────────▼────────┐       ┌───────▼────────┐
          │   PostgreSQL    │       │   PostgreSQL   │
          │     Primary     │◄─────►│    Replica     │
          │  (Read/Write)   │       │  (Read-only)   │
          └─────────────────┘       └────────────────┘
                   │
                   │ (Backups)
                   ▼
          ┌─────────────────┐
          │  Ceph S3 (RGW)  │
          │ Object Storage  │
          └─────────────────┘
Current Configuration
Operator Settings
- Namespace: `cnpg-system`
- Monitoring: Enabled (PodMonitor for Prometheus)
- Grafana Dashboard: Auto-created
- Priority Class: `system-cluster-critical`
- Resource Limits: Conservative (50m CPU, 100Mi RAM)
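For reference, these settings map to values.yaml roughly as in the sketch below. The key names are assumptions based on the upstream cloudnative-pg chart; check this repo's values.yaml for the exact structure.
monitoring:
  podMonitorEnabled: true      # assumed chart key for the PodMonitor
  grafanaDashboard:
    create: true               # assumed chart key for the dashboard ConfigMap
priorityClassName: system-cluster-critical
resources:
  requests:
    cpu: 50m
    memory: 100Mi
  limits:
    cpu: 50m
    memory: 100Mi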
Example Cluster (Commented Out)
The values.yaml includes a commented example cluster configuration. See "Creating Your First Cluster" below.
Creating Your First Cluster
Option 1: Using extraObjects in values.yaml
Uncomment the extraObjects section in values.yaml and customize:
extraObjects:
  - apiVersion: postgresql.cnpg.io/v1
    kind: Cluster
    metadata:
      name: my-postgres
      namespace: cnpg-system
    spec:
      instances: 2  # 1 primary + 1 replica
      storage:
        size: 50Gi
        storageClass: ceph-block
Option 2: Separate Application
For production, create a separate ArgoCD Application for each database cluster:
mkdir -p apps/databases/my-app-db
Create apps/databases/my-app-db/cluster.yaml:
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: my-app-db
  namespace: my-app
spec:
  instances: 3
  postgresql:
    parameters:
      max_connections: "200"
      shared_buffers: "256MB"
  storage:
    size: 100Gi
    storageClass: ceph-block
  monitoring:
    enablePodMonitor: true
  backup:
    retentionPolicy: "30d"
    barmanObjectStore:
      destinationPath: s3://my-app-backups/
      endpointURL: http://rook-ceph-rgw-ceph-objectstore.rook-ceph.svc:80
      s3Credentials:
        accessKeyId:
          name: backup-credentials
          key: ACCESS_KEY_ID
        secretAccessKey:
          name: backup-credentials
          key: ACCESS_SECRET_KEY
      data:
        compression: bzip2
      wal:
        compression: bzip2
Backup Configuration
Prerequisites
- Create S3 Bucket (using Ceph Object Storage):
apiVersion: objectbucket.io/v1alpha1
kind: ObjectBucketClaim
metadata:
  name: postgres-backups
  namespace: cnpg-system
spec:
  bucketName: postgres-backups
  storageClassName: ceph-bucket
  additionalConfig:
    maxSize: "500Gi"
- Create Credentials Secret:
After creating the ObjectBucketClaim, Rook will generate credentials:
# Get the generated credentials
kubectl get secret postgres-backups -n cnpg-system -o yaml
# The secret will contain:
# - AWS_ACCESS_KEY_ID
# - AWS_SECRET_ACCESS_KEY
- Reference in Cluster Spec:
Use these credentials in your PostgreSQL cluster backup configuration (see example above).
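For example, the backup section can point directly at the Rook-generated secret (the secret name matches the ObjectBucketClaim; bucket and retention values here are illustrative):
backup:
  retentionPolicy: "30d"
  barmanObjectStore:
    destinationPath: s3://postgres-backups/
    endpointURL: http://rook-ceph-rgw-ceph-objectstore.rook-ceph.svc:80
    s3Credentials:
      accessKeyId:
        name: postgres-backups        # secret created by Rook for the OBC
        key: AWS_ACCESS_KEY_ID
      secretAccessKey:
        name: postgres-backups
        key: AWS_SECRET_ACCESS_KEY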
Database Access
Connect to Primary (Read/Write)
# Service name pattern: <cluster-name>-rw
kubectl port-forward -n cnpg-system svc/my-postgres-rw 5432:5432
# Connect with psql
psql -h localhost -U postgres -d postgres
Connect to Replica (Read-Only)
# Service name pattern: <cluster-name>-ro
kubectl port-forward -n cnpg-system svc/my-postgres-ro 5432:5432
Get Superuser Password
# Password stored in secret: <cluster-name>-superuser
kubectl get secret my-postgres-superuser -n cnpg-system -o jsonpath='{.data.password}' | base64 -d
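Note: on recent operator versions, superuser access (and therefore the <cluster-name>-superuser secret) may be disabled by default. If the secret is missing, enable it in the cluster spec:
spec:
  enableSuperuserAccess: true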
Create Application User
# Connect to database
kubectl exec -it -n cnpg-system my-postgres-1 -- psql
-- Create database and user
CREATE DATABASE myapp;
CREATE USER myapp_user WITH PASSWORD 'secure-password';
GRANT ALL PRIVILEGES ON DATABASE myapp TO myapp_user;
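Alternatively, the database and its owner role can be created declaratively at bootstrap time via initdb; the operator then also generates an application secret (named <cluster-name>-app) with the credentials:
spec:
  bootstrap:
    initdb:
      database: myapp
      owner: myapp_user
      # the password is auto-generated and stored in the <cluster-name>-app secret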
Connection from Applications
Create Secret for Application
apiVersion: v1
kind: Secret
metadata:
  name: postgres-credentials
  namespace: my-app
type: Opaque
stringData:
  username: myapp_user
  password: secure-password
  database: myapp
  host: my-postgres-rw.cnpg-system.svc
  port: "5432"
Use in Application
env:
  - name: POSTGRES_USER
    valueFrom:
      secretKeyRef:
        name: postgres-credentials
        key: username
  - name: POSTGRES_PASSWORD
    valueFrom:
      secretKeyRef:
        name: postgres-credentials
        key: password
  - name: POSTGRES_HOST
    valueFrom:
      secretKeyRef:
        name: postgres-credentials
        key: host
  - name: POSTGRES_PORT
    valueFrom:
      secretKeyRef:
        name: postgres-credentials
        key: port
  - name: POSTGRES_DATABASE
    valueFrom:
      secretKeyRef:
        name: postgres-credentials
        key: database
  # DATABASE_URL must come last: Kubernetes only expands $(VAR) references
  # to variables defined earlier in the env list.
  - name: DATABASE_URL
    value: "postgresql://$(POSTGRES_USER):$(POSTGRES_PASSWORD)@$(POSTGRES_HOST):$(POSTGRES_PORT)/$(POSTGRES_DATABASE)"
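If the application opens many short-lived connections, you can put PgBouncer in front of the cluster using the operator's Pooler resource and point the application at the pooler's service instead. A minimal sketch (names and parameters are illustrative):
apiVersion: postgresql.cnpg.io/v1
kind: Pooler
metadata:
  name: my-postgres-pooler-rw
  namespace: cnpg-system
spec:
  cluster:
    name: my-postgres
  instances: 2
  type: rw            # pools connections to the primary (read/write)
  pgbouncer:
    poolMode: session
    parameters:
      max_client_conn: "1000"
      default_pool_size: "10"
The application would then use my-postgres-pooler-rw.cnpg-system.svc as its host instead of the -rw service.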
Monitoring
Prometheus Metrics
CloudNativePG exposes metrics via PodMonitor. Check Prometheus for:
- `cnpg_pg_stat_database_*` - Database statistics
- `cnpg_pg_replication_*` - Replication lag
- `cnpg_backends_*` - Connection pool stats
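If the Prometheus Operator (kube-prometheus-stack) is installed, a basic replication-lag alert might look like the sketch below; the metric name and threshold are assumptions, so verify them against the metrics your clusters actually expose:
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: cnpg-replication-lag
  namespace: cnpg-system
spec:
  groups:
    - name: cloudnative-pg
      rules:
        - alert: PostgresReplicationLagHigh
          expr: cnpg_pg_replication_lag > 30
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "Replication lag above 30s on {{ $labels.pod }}"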
Check Cluster Status
# Get cluster status
kubectl get cluster -n cnpg-system
# Detailed cluster info
kubectl describe cluster my-postgres -n cnpg-system
# Check pods
kubectl get pods -n cnpg-system -l cnpg.io/cluster=my-postgres
View Logs
# Operator logs
kubectl logs -n cnpg-system -l app.kubernetes.io/name=cloudnative-pg --tail=100
# PostgreSQL logs
kubectl logs -n cnpg-system my-postgres-1 --tail=100
Backup and Recovery
Manual Backup
kubectl cnpg backup my-postgres -n cnpg-system
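Recurring backups are defined with a ScheduledBackup resource (note the six-field cron syntax, with seconds first). For example, a daily backup at 02:00:
apiVersion: postgresql.cnpg.io/v1
kind: ScheduledBackup
metadata:
  name: my-postgres-daily
  namespace: cnpg-system
spec:
  schedule: "0 0 2 * * *"   # seconds minutes hours day-of-month month day-of-week
  backupOwnerReference: self
  cluster:
    name: my-postgres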
List Backups
kubectl get backup -n cnpg-system
Point-in-Time Recovery
Create a new cluster from a backup:
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: restored-cluster
spec:
  instances: 2
  bootstrap:
    recovery:
      source: my-postgres
      recoveryTarget:
        targetTime: "2024-11-09 10:00:00"
  externalClusters:
    - name: my-postgres
      barmanObjectStore:
        destinationPath: s3://postgres-backups/
        endpointURL: http://rook-ceph-rgw-ceph-objectstore.rook-ceph.svc:80
        s3Credentials:
          accessKeyId:
            name: backup-credentials
            key: ACCESS_KEY_ID
          secretAccessKey:
            name: backup-credentials
            key: ACCESS_SECRET_KEY
Maintenance Operations
Scale Replicas
# Scaling is done by changing spec.instances on the Cluster resource.
# In GitOps, commit the change; imperatively you can patch it:
kubectl patch cluster my-postgres -n cnpg-system --type merge -p '{"spec":{"instances":3}}'
Switchover (Promote Replica)
kubectl cnpg promote my-postgres my-postgres-2 -n cnpg-system
Restart Cluster
kubectl cnpg restart my-postgres -n cnpg-system
Production Recommendations
1. High Availability
- Use at least 3 instances (1 primary + 2 replicas)
- Spread across availability zones using pod anti-affinity
- Configure automatic failover thresholds
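Anti-affinity is configured directly in the cluster spec; for example, to require instances to land in different zones (assuming your nodes carry the standard topology labels):
spec:
  affinity:
    enablePodAntiAffinity: true
    topologyKey: topology.kubernetes.io/zone
    podAntiAffinityType: required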
2. Resource Configuration
- Small databases (<10GB): 2 CPU, 4Gi RAM, 2 instances
- Medium databases (10-100GB): 4 CPU, 8Gi RAM, 3 instances
- Large databases (>100GB): 8 CPU, 16Gi RAM, 3+ instances
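These sizes translate to the cluster's resources block, e.g. for a medium database:
spec:
  instances: 3
  resources:
    requests:
      cpu: "4"
      memory: 8Gi
    limits:
      cpu: "4"
      memory: 8Gi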
3. PostgreSQL Tuning
- Adjust `shared_buffers` to 25% of RAM
- Set `effective_cache_size` to 50-75% of RAM
- Tune connection limits based on application needs
- Use `random_page_cost: 1.1` for SSD storage
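Applied to a cluster with 8Gi of RAM, the tuning above looks roughly like this (values are illustrative; adjust to your workload):
spec:
  postgresql:
    parameters:
      shared_buffers: "2GB"         # ~25% of 8Gi RAM
      effective_cache_size: "6GB"   # ~75% of RAM
      max_connections: "200"
      random_page_cost: "1.1"       # SSD-backed storage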
4. Backup Strategy
- Enable automated backups to S3
- Set retention policy (e.g., 30 days)
- Test recovery procedures regularly
- Monitor backup success
5. Monitoring
- Set up alerts for replication lag
- Monitor connection pool saturation
- Track query performance
- Watch disk space usage
6. Security
- Use strong passwords (stored in Kubernetes secrets)
- Enable SSL/TLS for connections
- Implement network policies
- Regular security updates
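A NetworkPolicy restricting database access to the application namespace could be sketched as follows; the selectors are assumptions, and you must also keep operator, replication, and monitoring traffic allowed:
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: my-postgres-access
  namespace: cnpg-system
spec:
  podSelector:
    matchLabels:
      cnpg.io/cluster: my-postgres
  policyTypes:
    - Ingress
  ingress:
    # application traffic from the my-app namespace
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: my-app
      ports:
        - protocol: TCP
          port: 5432
    # replication between cluster pods and traffic from the operator namespace
    - from:
        - podSelector:
            matchLabels:
              cnpg.io/cluster: my-postgres
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: cnpg-system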
Troubleshooting
Cluster Not Starting
# Check operator logs
kubectl logs -n cnpg-system -l app.kubernetes.io/name=cloudnative-pg
# Check events
kubectl get events -n cnpg-system --sort-by='.lastTimestamp'
# Check PVC status
kubectl get pvc -n cnpg-system
Replication Issues
# Check replication status
kubectl exec -it -n cnpg-system my-postgres-1 -- psql -c "SELECT * FROM pg_stat_replication;"
# Check cluster status
kubectl get cluster my-postgres -n cnpg-system -o yaml
Backup Failures
# Check backup status
kubectl get backup -n cnpg-system
# View backup logs
kubectl logs -n cnpg-system my-postgres-1 | grep -i backup
# Test S3 connectivity
kubectl exec -it -n cnpg-system my-postgres-1 -- curl -I http://rook-ceph-rgw-ceph-objectstore.rook-ceph.svc:80
High Resource Usage
# Check resource usage
kubectl top pods -n cnpg-system
# Check PostgreSQL connections
kubectl exec -it -n cnpg-system my-postgres-1 -- psql -c "SELECT count(*) FROM pg_stat_activity;"
# Identify slow queries
kubectl exec -it -n cnpg-system my-postgres-1 -- psql -c "SELECT pid, now() - pg_stat_activity.query_start AS duration, query FROM pg_stat_activity WHERE state = 'active' ORDER BY duration DESC;"
Useful Commands
# Install kubectl-cnpg plugin
curl -sSfL https://github.com/cloudnative-pg/cloudnative-pg/raw/main/hack/install-cnpg-plugin.sh | sh -s -- -b /usr/local/bin
# List all clusters
kubectl get cluster -A
# Get cluster details
kubectl cnpg status my-postgres -n cnpg-system
# Promote a replica
kubectl cnpg promote my-postgres my-postgres-2 -n cnpg-system
# Create backup
kubectl cnpg backup my-postgres -n cnpg-system
# Reload configuration
kubectl cnpg reload my-postgres -n cnpg-system
# Get connection info
kubectl get secret my-postgres-superuser -n cnpg-system -o jsonpath='{.data.password}' | base64 -d && echo
Next Steps
- Deploy the operator: `git add . && git commit -m "Add CloudNativePG" && git push`
- Wait for ArgoCD to sync
- Create your first cluster (uncomment extraObjects or create separate app)
- Set up S3 backups with Ceph Object Storage
- Test backup and recovery
- Configure monitoring and alerts