# CloudNativePG Setup Guide

## Overview

CloudNativePG is a Kubernetes operator that manages PostgreSQL clusters using Kubernetes custom resources. It provides:

- High availability with automatic failover
- Automated backups to S3-compatible storage
- Point-in-time recovery (PITR)
- Rolling updates with zero downtime
- Connection pooling with PgBouncer
- Monitoring with Prometheus
## Architecture

```ascii
┌─────────────────────────────────────┐
│      CloudNativePG Operator         │
│   (Manages PostgreSQL Clusters)     │
└─────────────────────────────────────┘
         │
         ├────────────────────────┐
         │                        │
┌────────▼────────┐       ┌───────▼────────┐
│   PostgreSQL    │       │   PostgreSQL   │
│    Primary      │◄─────►│    Replica     │
│  (Read/Write)   │       │  (Read-only)   │
└─────────────────┘       └────────────────┘
         │
         │ (Backups)
         ▼
┌─────────────────┐
│  Ceph S3 (RGW)  │
│  Object Storage │
└─────────────────┘
```
## Current Configuration

### Operator Settings

- **Namespace**: `cnpg-system`
- **Monitoring**: Enabled (PodMonitor for Prometheus)
- **Grafana Dashboard**: Auto-created
- **Priority Class**: `system-cluster-critical`
- **Resource Limits**: Conservative (50m CPU, 100Mi RAM)

### Example Cluster (Commented Out)

The `values.yaml` file includes a commented-out example cluster configuration; see "Creating Your First Cluster" below.
## Creating Your First Cluster

### Option 1: Using extraObjects in values.yaml

Uncomment the `extraObjects` section in `values.yaml` and customize it:

```yaml
extraObjects:
  - apiVersion: postgresql.cnpg.io/v1
    kind: Cluster
    metadata:
      name: my-postgres
      namespace: cnpg-system
    spec:
      instances: 2  # 1 primary + 1 replica
      storage:
        size: 50Gi
        storageClass: ceph-block
```
### Option 2: Separate Application

For production, create a separate ArgoCD Application for each database cluster:

```bash
mkdir -p apps/databases/my-app-db
```

Create `apps/databases/my-app-db/cluster.yaml`:

```yaml
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: my-app-db
  namespace: my-app
spec:
  instances: 3

  postgresql:
    parameters:
      max_connections: "200"
      shared_buffers: "256MB"

  storage:
    size: 100Gi
    storageClass: ceph-block

  monitoring:
    enablePodMonitor: true

  backup:
    retentionPolicy: "30d"
    barmanObjectStore:
      destinationPath: s3://my-app-backups/
      endpointURL: http://rook-ceph-rgw-ceph-objectstore.rook-ceph.svc:80
      s3Credentials:
        accessKeyId:
          name: backup-credentials
          key: ACCESS_KEY_ID
        secretAccessKey:
          name: backup-credentials
          key: ACCESS_SECRET_KEY
      data:
        compression: bzip2
      wal:
        compression: bzip2
```
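If the cluster manifest lives in its own directory like this, ArgoCD needs an Application pointing at it. A minimal sketch; the repository URL, target revision, and ArgoCD namespace are placeholders you adjust for your environment:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: my-app-db
  namespace: argocd            # placeholder: wherever ArgoCD is installed
spec:
  project: default
  source:
    repoURL: https://github.com/example/homelab.git   # placeholder: your GitOps repo
    targetRevision: main
    path: apps/databases/my-app-db
  destination:
    server: https://kubernetes.default.svc
    namespace: my-app
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true
```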
## Backup Configuration

### Prerequisites

1. **Create S3 Bucket** (using Ceph Object Storage):

   ```yaml
   apiVersion: objectbucket.io/v1alpha1
   kind: ObjectBucketClaim
   metadata:
     name: postgres-backups
     namespace: cnpg-system
   spec:
     bucketName: postgres-backups
     storageClassName: ceph-bucket
     additionalConfig:
       maxSize: "500Gi"
   ```

2. **Create Credentials Secret**:

   After creating the ObjectBucketClaim, Rook will generate credentials:

   ```bash
   # Get the generated credentials
   kubectl get secret postgres-backups -n cnpg-system -o yaml

   # The secret will contain:
   # - AWS_ACCESS_KEY_ID
   # - AWS_SECRET_ACCESS_KEY
   ```

3. **Reference in Cluster Spec**:

   Use these credentials in your PostgreSQL cluster backup configuration (see the example above).
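Note that the keys in the Rook-generated secret (`AWS_ACCESS_KEY_ID` / `AWS_SECRET_ACCESS_KEY`) differ from the `backup-credentials` keys used in the cluster example above. One option, sketched below for a cluster in the same namespace as the ObjectBucketClaim (like the `my-postgres` example in `extraObjects`), is to point `s3Credentials` straight at the generated `postgres-backups` secret instead of maintaining a separate secret:

```yaml
s3Credentials:
  accessKeyId:
    name: postgres-backups        # secret created by Rook for the ObjectBucketClaim
    key: AWS_ACCESS_KEY_ID
  secretAccessKey:
    name: postgres-backups
    key: AWS_SECRET_ACCESS_KEY
```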
## Database Access

### Connect to Primary (Read/Write)

```bash
# Service name pattern: <cluster-name>-rw
kubectl port-forward -n cnpg-system svc/my-postgres-rw 5432:5432

# Connect with psql
psql -h localhost -U postgres -d postgres
```

### Connect to Replica (Read-Only)

```bash
# Service name pattern: <cluster-name>-ro
kubectl port-forward -n cnpg-system svc/my-postgres-ro 5432:5432
```

### Get Superuser Password

```bash
# Password stored in secret: <cluster-name>-superuser
# (this secret only exists when superuser access is enabled on the cluster)
kubectl get secret my-postgres-superuser -n cnpg-system -o jsonpath='{.data.password}' | base64 -d
```

### Create Application User

```bash
# Connect to the primary
kubectl exec -it -n cnpg-system my-postgres-1 -- psql

-- Then, inside psql, create the database and user
CREATE DATABASE myapp;
CREATE USER myapp_user WITH PASSWORD 'secure-password';
GRANT ALL PRIVILEGES ON DATABASE myapp TO myapp_user;
```
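Alternatively, in a GitOps setup the application database and its owner can be declared at cluster creation time via `bootstrap.initdb`. A minimal sketch, assuming a pre-created `myapp-db-credentials` secret of type `kubernetes.io/basic-auth` holding the owner's username and password:

```yaml
spec:
  bootstrap:
    initdb:
      database: myapp
      owner: myapp_user
      secret:
        name: myapp-db-credentials   # assumed: basic-auth secret with username/password keys
```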
## Connection from Applications

### Create Secret for Application

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: postgres-credentials
  namespace: my-app
type: Opaque
stringData:
  username: myapp_user
  password: secure-password
  database: myapp
  host: my-postgres-rw.cnpg-system.svc
  port: "5432"
```

### Use in Application

Kubernetes only expands `$(VAR)` references to variables defined earlier in the `env` list, so `DATABASE_URL` must come after the variables it references:

```yaml
env:
  - name: POSTGRES_USER
    valueFrom:
      secretKeyRef:
        name: postgres-credentials
        key: username
  - name: POSTGRES_PASSWORD
    valueFrom:
      secretKeyRef:
        name: postgres-credentials
        key: password
  - name: POSTGRES_HOST
    valueFrom:
      secretKeyRef:
        name: postgres-credentials
        key: host
  - name: POSTGRES_PORT
    valueFrom:
      secretKeyRef:
        name: postgres-credentials
        key: port
  - name: POSTGRES_DATABASE
    valueFrom:
      secretKeyRef:
        name: postgres-credentials
        key: database
  - name: DATABASE_URL
    value: "postgresql://$(POSTGRES_USER):$(POSTGRES_PASSWORD)@$(POSTGRES_HOST):$(POSTGRES_PORT)/$(POSTGRES_DATABASE)"
```
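The PgBouncer integration mentioned in the overview can also sit between applications and the cluster. A minimal sketch of a `Pooler` targeting the read/write service; the pool mode and sizing values are illustrative, not tuned recommendations:

```yaml
apiVersion: postgresql.cnpg.io/v1
kind: Pooler
metadata:
  name: my-postgres-pooler-rw
  namespace: cnpg-system
spec:
  cluster:
    name: my-postgres
  instances: 2
  type: rw
  pgbouncer:
    poolMode: session
    parameters:
      max_client_conn: "1000"    # illustrative value
      default_pool_size: "10"    # illustrative value
```

Applications would then set `host` to the pooler's service (`my-postgres-pooler-rw.cnpg-system.svc`) instead of the `-rw` service.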
## Monitoring

### Prometheus Metrics

CloudNativePG exposes metrics via a PodMonitor. Check Prometheus for:

- `cnpg_pg_stat_database_*` - Database statistics
- `cnpg_pg_replication_*` - Replication lag
- `cnpg_backends_*` - Backend connection counts

### Check Cluster Status

```bash
# Get cluster status
kubectl get cluster -n cnpg-system

# Detailed cluster info
kubectl describe cluster my-postgres -n cnpg-system

# Check pods
kubectl get pods -n cnpg-system -l cnpg.io/cluster=my-postgres
```

### View Logs

```bash
# Operator logs
kubectl logs -n cnpg-system -l app.kubernetes.io/name=cloudnative-pg --tail=100

# PostgreSQL logs
kubectl logs -n cnpg-system my-postgres-1 --tail=100
```
## Backup and Recovery

### Manual Backup

Requires the `kubectl-cnpg` plugin (see "Useful Commands" below):

```bash
kubectl cnpg backup my-postgres -n cnpg-system
```
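For recurring backups, CloudNativePG also provides a `ScheduledBackup` resource. A minimal sketch; the daily-at-midnight schedule is just an example, and note the six-field cron format (it includes seconds):

```yaml
apiVersion: postgresql.cnpg.io/v1
kind: ScheduledBackup
metadata:
  name: my-postgres-daily
  namespace: cnpg-system
spec:
  schedule: "0 0 0 * * *"      # seconds minutes hours day-of-month month day-of-week
  backupOwnerReference: self
  cluster:
    name: my-postgres
```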
### List Backups

```bash
kubectl get backup -n cnpg-system
```

### Point-in-Time Recovery

Create a new cluster from a backup:

```yaml
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: restored-cluster
spec:
  instances: 2

  bootstrap:
    recovery:
      source: my-postgres
      recoveryTarget:
        targetTime: "2024-11-09 10:00:00+00"   # timestamp with time zone

  externalClusters:
    - name: my-postgres
      barmanObjectStore:
        destinationPath: s3://postgres-backups/
        endpointURL: http://rook-ceph-rgw-ceph-objectstore.rook-ceph.svc:80
        s3Credentials:
          accessKeyId:
            name: backup-credentials
            key: ACCESS_KEY_ID
          secretAccessKey:
            name: backup-credentials
            key: ACCESS_SECRET_KEY
```
## Maintenance Operations

### Scale Replicas

Scaling is declarative: change `spec.instances` on the Cluster (ideally in Git so ArgoCD applies it), or patch it directly for a quick test:

```bash
kubectl patch cluster my-postgres -n cnpg-system --type merge -p '{"spec":{"instances":3}}'
```

### Switchover (Promote Replica)

```bash
# kubectl cnpg promote <cluster-name> <instance-name>
kubectl cnpg promote my-postgres my-postgres-2 -n cnpg-system
```

### Restart Cluster

```bash
kubectl cnpg restart my-postgres -n cnpg-system
```
## Production Recommendations

### 1. High Availability

- Use at least 3 instances (1 primary + 2 replicas)
- Spread instances across availability zones using pod anti-affinity (see the sketch below)
- Configure automatic failover thresholds
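A sketch of the anti-affinity settings in the Cluster spec; `topology.kubernetes.io/zone` assumes your nodes carry zone labels (a single-zone homelab might use `kubernetes.io/hostname` instead):

```yaml
affinity:
  enablePodAntiAffinity: true
  podAntiAffinityType: required
  topologyKey: topology.kubernetes.io/zone   # assumption: nodes are labeled with zones
```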
### 2. Resource Configuration

- **Small databases** (<10GB): 2 CPU, 4Gi RAM, 2 instances
- **Medium databases** (10-100GB): 4 CPU, 8Gi RAM, 3 instances
- **Large databases** (>100GB): 8 CPU, 16Gi RAM, 3+ instances
### 3. PostgreSQL Tuning

- Adjust `shared_buffers` to roughly 25% of RAM
- Set `effective_cache_size` to 50-75% of RAM
- Tune connection limits based on application needs
- Use `random_page_cost: 1.1` for SSD storage

These settings go under `postgresql.parameters` in the Cluster spec, as sketched below.
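For example, for an instance sized around 16Gi of RAM, the guidance above might translate to something like this (values are illustrative, not tuned advice):

```yaml
postgresql:
  parameters:
    shared_buffers: "4GB"           # ~25% of RAM
    effective_cache_size: "12GB"    # ~75% of RAM
    max_connections: "200"
    random_page_cost: "1.1"         # SSD-backed storage
```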
### 4. Backup Strategy

- Enable automated backups to S3 (see the ScheduledBackup sketch above)
- Set a retention policy (e.g., 30 days)
- Test recovery procedures regularly
- Monitor backup success
### 5. Monitoring

- Set up alerts for replication lag (see the example rule below)
- Monitor connection pool saturation
- Track query performance
- Watch disk space usage
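A sketch of a replication-lag alert as a PrometheusRule; it assumes the Prometheus Operator is installed and selects rules in this namespace, and the 300-second threshold is arbitrary:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: cnpg-replication-lag
  namespace: cnpg-system
spec:
  groups:
    - name: cloudnative-pg
      rules:
        - alert: CNPGReplicationLagHigh
          expr: cnpg_pg_replication_lag > 300   # seconds; threshold is an example
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "Replication lag on {{ $labels.pod }} is above 300s"
```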
### 6. Security

- Use strong passwords (stored in Kubernetes secrets)
- Enable SSL/TLS for connections
- Implement network policies (see the sketch below)
- Apply security updates regularly
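A starting-point sketch of a NetworkPolicy for the example cluster. It assumes the application runs in `my-app` and Prometheus in `monitoring`, and it deliberately allows all traffic from within the namespace so replication and operator traffic keep working; adjust before relying on it:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: my-postgres-ingress
  namespace: cnpg-system
spec:
  podSelector:
    matchLabels:
      cnpg.io/cluster: my-postgres
  policyTypes:
    - Ingress
  ingress:
    # Replication and operator traffic from within the namespace
    - from:
        - podSelector: {}
    # Application connections from the my-app namespace (assumed)
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: my-app
      ports:
        - port: 5432
    # Prometheus scraping the metrics endpoint (namespace assumed)
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: monitoring
      ports:
        - port: 9187
```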
## Troubleshooting

### Cluster Not Starting

```bash
# Check operator logs
kubectl logs -n cnpg-system -l app.kubernetes.io/name=cloudnative-pg

# Check events
kubectl get events -n cnpg-system --sort-by='.lastTimestamp'

# Check PVC status
kubectl get pvc -n cnpg-system
```
### Replication Issues

```bash
# Check replication status (run on the primary)
kubectl exec -it -n cnpg-system my-postgres-1 -- psql -c "SELECT * FROM pg_stat_replication;"

# Check cluster status
kubectl get cluster my-postgres -n cnpg-system -o yaml
```
### Backup Failures

```bash
# Check backup status
kubectl get backup -n cnpg-system

# View backup logs
kubectl logs -n cnpg-system my-postgres-1 | grep -i backup

# Test S3 connectivity (curl may not be present in the minimal PostgreSQL image;
# if it is missing, run a temporary debug pod instead)
kubectl exec -it -n cnpg-system my-postgres-1 -- curl -I http://rook-ceph-rgw-ceph-objectstore.rook-ceph.svc:80
```
### High Resource Usage

```bash
# Check resource usage
kubectl top pods -n cnpg-system

# Check PostgreSQL connections
kubectl exec -it -n cnpg-system my-postgres-1 -- psql -c "SELECT count(*) FROM pg_stat_activity;"

# Identify slow queries
kubectl exec -it -n cnpg-system my-postgres-1 -- psql -c "SELECT pid, now() - pg_stat_activity.query_start AS duration, query FROM pg_stat_activity WHERE state = 'active' ORDER BY duration DESC;"
```
## Useful Commands

```bash
# Install the kubectl-cnpg plugin
curl -sSfL https://github.com/cloudnative-pg/cloudnative-pg/raw/main/hack/install-cnpg-plugin.sh | sh -s -- -b /usr/local/bin

# List all clusters
kubectl get clusters -n cnpg-system

# Get cluster details
kubectl cnpg status my-postgres -n cnpg-system

# Promote a replica
kubectl cnpg promote my-postgres my-postgres-2 -n cnpg-system

# Create a backup
kubectl cnpg backup my-postgres -n cnpg-system

# Reload configuration
kubectl cnpg reload my-postgres -n cnpg-system

# Get the superuser password
kubectl get secret my-postgres-superuser -n cnpg-system -o jsonpath='{.data.password}' | base64 -d && echo
```
## Next Steps

1. Deploy the operator: `git add . && git commit -m "Add CloudNativePG" && git push`
2. Wait for ArgoCD to sync
3. Create your first cluster (uncomment `extraObjects` or create a separate app)
4. Set up S3 backups with Ceph Object Storage
5. Test backup and recovery
6. Configure monitoring and alerts
## Additional Resources

- [CloudNativePG Documentation](https://cloudnative-pg.io/documentation/)
- [API Reference](https://cloudnative-pg.io/documentation/current/api_reference/)
- [Best Practices](https://cloudnative-pg.io/documentation/current/guidelines/)
- [Monitoring Guide](https://cloudnative-pg.io/documentation/current/monitoring/)