@@ -1,132 +0,0 @@
# Mirroring CloudNativePG Barman Plugin

## Setup Mirror Repository

1. **Clone the upstream repository:**

   ```bash
   cd /tmp
   git clone --mirror https://github.com/cloudnative-pg/plugin-barman-cloud.git
   cd plugin-barman-cloud.git
   ```

2. **Push to your Git server:**

   ```bash
   # Create the repo on your Git server first (git.mvzijl.nl)
   # Then push:
   git push --mirror https://git.mvzijl.nl/marco/plugin-barman-cloud.git
   ```

3. **Set up periodic sync (optional):**

   ```bash
   # Create a script to sync weekly
   cat > /usr/local/bin/sync-barman-plugin.sh <<'EOF'
   #!/bin/bash
   cd /var/git/mirrors/plugin-barman-cloud.git
   git fetch --prune origin
   git push --mirror https://git.mvzijl.nl/marco/plugin-barman-cloud.git
   EOF

   chmod +x /usr/local/bin/sync-barman-plugin.sh

   # Add to cron (weekly on Sunday at 2 AM); append so existing entries survive
   (crontab -l 2>/dev/null; echo "0 2 * * 0 /usr/local/bin/sync-barman-plugin.sh") | crontab -
   ```

## Update Application Reference

After mirroring, update the application.yaml to use your mirror:

```yaml
spec:
  source:
    repoURL: https://git.mvzijl.nl/marco/plugin-barman-cloud.git
    targetRevision: main # or a specific tag like v1.0.0
    path: deployments/manifests
```

## Version Pinning Strategy

Instead of tracking `main`, pin to specific releases:

```yaml
spec:
  source:
    repoURL: https://git.mvzijl.nl/marco/plugin-barman-cloud.git
    targetRevision: v1.0.0 # Pin to a specific version
    path: deployments/manifests
```

This gives you:
- ✅ Predictable deployments
- ✅ Controlled updates
- ✅ Rollback capability
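Rolling back is then just the reverse of an update. A minimal sketch, assuming the application.yaml path used in the "Update Process" section below; the version numbers are illustrative:

```bash
# Edit targetRevision back to the previous tag, then:
git add apps/cloudnative-pg-plugin/application.yaml
git commit -m "Roll back barman plugin to v1.0.0"
git push
```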

## Update Process

When a new version is released:

1. **Check upstream for updates:**

   ```bash
   cd /var/git/mirrors/plugin-barman-cloud.git
   git fetch origin
   git tag -l
   ```

2. **Review changes:**

   ```bash
   git log HEAD..origin/main --oneline
   git diff HEAD..origin/main deployments/manifests/
   ```

3. **Sync to your mirror:**

   ```bash
   git push --mirror https://git.mvzijl.nl/marco/plugin-barman-cloud.git
   ```

4. **Update application.yaml:**

   ```yaml
   targetRevision: v1.1.0 # Update to the new version
   ```

5. **Test and deploy:**

   ```bash
   git add apps/cloudnative-pg-plugin/application.yaml
   git commit -m "Update barman plugin to v1.1.0"
   git push
   ```

## Monitoring Upstream

Subscribe to releases:
- GitHub: Watch → Custom → Releases only
- RSS: `https://github.com/cloudnative-pg/plugin-barman-cloud/releases.atom`
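For a scripted check, a minimal sketch that prints the latest release title from the Atom feed above (assumes `curl` and GNU `grep` with PCRE support; the first `<title>` in the feed is the feed's own title, so the second one is the newest release):

```bash
#!/bin/bash
# Print the most recent upstream release from the Atom feed
FEED="https://github.com/cloudnative-pg/plugin-barman-cloud/releases.atom"
curl -s "$FEED" | grep -oP '(?<=<title>)[^<]+' | sed -n '2p'
```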
## Alternative: Subtree Approach

Instead of mirroring, you could use git subtree:

```bash
cd /Users/marco/Documents/Hobby/Veda/talos
git subtree add --prefix vendor/plugin-barman-cloud \
  https://github.com/cloudnative-pg/plugin-barman-cloud.git main --squash

# Then reference in the application:
# path: vendor/plugin-barman-cloud/deployments/manifests
```

Update when needed:

```bash
git subtree pull --prefix vendor/plugin-barman-cloud \
  https://github.com/cloudnative-pg/plugin-barman-cloud.git main --squash
```

## Recommended Approach

For your setup, I recommend:

1. **Mirror to your Git server** at `git.mvzijl.nl/marco/plugin-barman-cloud`
2. **Pin to specific versions** (not `main`)
3. **Review updates** before applying
4. **Set up monitoring** for new releases

This gives you the best balance of control and maintainability.
@@ -1,301 +0,0 @@
# CloudNativePG Barman-Cloud Plugin

## Overview

The Barman Cloud Plugin provides object storage backup capabilities for CloudNativePG using the Barman toolset.

**Important**: As of CloudNativePG v1.26+, the native `barmanObjectStore` backup method is **deprecated**. You should use this plugin instead.

## Why This Plugin is Required

From the CloudNativePG 1.27 documentation:

> Starting with version 1.26, native backup and recovery capabilities are being progressively phased out of the core operator and moved to official CNPG-I plugins.

The built-in barman integration (`method: barmanObjectStore`) is deprecated and will be removed in future versions. This plugin provides the official replacement.

## What This Plugin Provides

- ✅ **WAL archiving** to S3-compatible object stores
- ✅ **Base backups** with compression and encryption
- ✅ **Point-in-time recovery (PITR)**
- ✅ **Retention policies** for automated cleanup
- ✅ **Backup from standby** servers
- ✅ **Support for multiple storage backends**: S3, Azure Blob, GCS, MinIO, Ceph S3 (RGW)

## Installation

This application deploys the plugin to the `cnpg-system` namespace where the CloudNativePG operator runs.

The plugin will be available for all PostgreSQL clusters managed by CloudNativePG.

## Configuration in PostgreSQL Clusters

### Using the Plugin (New Method)

```yaml
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: my-cluster
spec:
  backup:
    target: prefer-standby

    # Use the plugin method (required for v1.26+)
    method: plugin

    # Plugin configuration
    pluginConfiguration:
      name: barman-cloud.cloudnative-pg.io

      # S3 configuration
      barmanObjectStore:
        destinationPath: s3://postgres-backups/
        endpointURL: http://rook-ceph-rgw-ceph-objectstore.rook-ceph.svc:80

        # Credentials
        s3Credentials:
          accessKeyId:
            name: backup-credentials
            key: ACCESS_KEY_ID
          secretAccessKey:
            name: backup-credentials
            key: ACCESS_SECRET_KEY

        # Compression and parallelism
        data:
          compression: bzip2
          jobs: 2
          immediateCheckpoint: true

        wal:
          compression: bzip2
          maxParallel: 2

        # Retention policy
        retentionPolicy: "30d"

        # Tags for organization
        tags:
          environment: "production"
          cluster: "my-cluster"
```

### Old Method (Deprecated)

```yaml
# ❌ DON'T USE - This is deprecated
spec:
  backup:
    method: barmanObjectStore # Deprecated!
    barmanObjectStore:
      # ... config
```

## WAL Archiving

The plugin also handles WAL archiving. Configure it at the cluster level:

```yaml
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: my-cluster
spec:
  backup:
    # Backup configuration (as above)
    ...

  # WAL archiving uses the same plugin configuration
  # Automatically enabled when backup is configured
```
## Scheduled Backups

Create scheduled backups using the plugin:

```yaml
apiVersion: postgresql.cnpg.io/v1
kind: ScheduledBackup
metadata:
  name: daily-backup
spec:
  schedule: "0 0 2 * * *" # 2 AM daily
  backupOwnerReference: self
  cluster:
    name: my-cluster

  # Use plugin method
  method: plugin

  # Plugin configuration (or inherits from cluster)
  pluginConfiguration:
    name: barman-cloud.cloudnative-pg.io
```

## On-Demand Backups

Trigger manual backups:

```yaml
apiVersion: postgresql.cnpg.io/v1
kind: Backup
metadata:
  name: manual-backup
spec:
  cluster:
    name: my-cluster

  method: plugin

  pluginConfiguration:
    name: barman-cloud.cloudnative-pg.io
```

Or use kubectl:

```bash
kubectl cnpg backup my-cluster --method plugin
```

## Retention Policies

The plugin supports advanced retention policies:

```yaml
pluginConfiguration:
  barmanObjectStore:
    retentionPolicy: "30d" # Keep backups for 30 days
    # or
    # retentionPolicy: "7 days"
    # retentionPolicy: "4 weeks"
    # retentionPolicy: "3 months"
```
## Supported Storage Backends

### AWS S3
```yaml
destinationPath: s3://bucket-name/
# endpointURL not needed for AWS S3
```

### Ceph S3 (RGW) - Your Setup
```yaml
destinationPath: s3://postgres-backups/
endpointURL: http://rook-ceph-rgw-ceph-objectstore.rook-ceph.svc:80
```

### Azure Blob Storage
```yaml
destinationPath: https://storageaccount.blob.core.windows.net/container/
```

### Google Cloud Storage
```yaml
destinationPath: gs://bucket-name/
```

### MinIO
```yaml
destinationPath: s3://bucket-name/
endpointURL: http://minio:9000
```

## Verification

After deploying, verify the plugin is running:

```bash
# Check plugin deployment
kubectl get deployment -n cnpg-system | grep plugin

# Check plugin pods
kubectl get pods -n cnpg-system -l app=barman-cloud-plugin

# Verify plugin is registered
kubectl get configmap -n cnpg-system cnpg-plugin-registry -o yaml
```
## Troubleshooting

### Plugin Not Found

If you see errors like "plugin not found":

```bash
# Check if plugin is deployed
kubectl get pods -n cnpg-system -l app=barman-cloud-plugin

# Check operator logs
kubectl logs -n cnpg-system -l app.kubernetes.io/name=cloudnative-pg
```

### Backup Failures

```bash
# Check backup status
kubectl get backup -n <namespace>

# Check backup logs
kubectl describe backup <backup-name> -n <namespace>

# Check PostgreSQL pod logs
kubectl logs -n <namespace> <postgres-pod> | grep -i backup
```

### WAL Archiving Issues

```bash
# Check WAL archive status
kubectl exec -it -n <namespace> <postgres-pod> -- \
  psql -c "SELECT * FROM pg_stat_archiver;"

# Check plugin logs
kubectl logs -n cnpg-system -l app=barman-cloud-plugin
```
## Migration from Built-in to Plugin

If you're migrating from the deprecated `barmanObjectStore` method:

1. **Deploy this plugin application**
2. **Update your Cluster resource**:

   ```yaml
   spec:
     backup:
       method: plugin # Changed from barmanObjectStore
       pluginConfiguration:
         name: barman-cloud.cloudnative-pg.io
         barmanObjectStore:
           # Keep the same configuration
   ```

3. **Existing backups remain accessible** - the plugin can read backups created by the built-in method

## Best Practices

1. ✅ **Always use the plugin** for CloudNativePG v1.26+
2. ✅ **Configure retention policies** to manage storage costs
3. ✅ **Enable backup from standby** to reduce primary load
4. ✅ **Use compression** (bzip2) to reduce storage usage
5. ✅ **Set up scheduled backups** for automated protection
6. ✅ **Test recovery procedures** regularly
7. ✅ **Monitor backup status** with Prometheus metrics
8. ✅ **Tag backups** for easy identification and filtering

## Next Steps

1. Deploy this application: `git add . && git commit && git push`
2. Wait for ArgoCD to sync
3. Update your PostgreSQL Cluster to use `method: plugin`
4. Create an S3 bucket for backups (ObjectBucketClaim) - see the sketch after this list
5. Configure backup credentials
6. Test with an on-demand backup
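Steps 4 and 5 can be covered by one manifest. A minimal sketch, assuming the Rook-managed `ceph-bucket` StorageClass used elsewhere in this repo; the claim name is illustrative, and the secret name and keys match the Cluster example above (Rook writes the real credentials into a Secret generated for the ObjectBucketClaim, which can replace the placeholders):

```yaml
apiVersion: objectbucket.io/v1alpha1
kind: ObjectBucketClaim
metadata:
  name: postgres-backups
  namespace: cnpg-system
spec:
  bucketName: postgres-backups
  storageClassName: ceph-bucket
---
# Credentials referenced by s3Credentials in the Cluster spec
apiVersion: v1
kind: Secret
metadata:
  name: backup-credentials
  namespace: cnpg-system
type: Opaque
stringData:
  ACCESS_KEY_ID: placeholder
  ACCESS_SECRET_KEY: placeholder
```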

## Additional Resources

- [Barman Cloud Plugin Documentation](https://cloudnative-pg.io/plugin-barman-cloud/)
- [CloudNativePG Backup Guide](https://cloudnative-pg.io/documentation/1.27/backup/)
- [CNPG-I Plugin Architecture](https://cloudnative-pg.io/documentation/1.27/cnpg_i/)
@@ -1,32 +0,0 @@
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: cloudnative-pg-plugin
  namespace: argocd
  annotations:
    argocd.argoproj.io/sync-wave: "0"
  finalizers:
    - resources-finalizer.argocd.argoproj.io
spec:
  project: default
  source:
    repoURL: https://git.mvzijl.nl/marco/plugin-barman-cloud.git
    targetRevision: 0.9.0
    path: deployments/manifests
  destination:
    server: https://kubernetes.default.svc
    namespace: cnpg-system
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=false
      - ServerSideApply=true
    # Ensure operator is healthy before deploying plugin
    retry:
      limit: 5
      backoff:
        duration: 5s
        factor: 2
        maxDuration: 3m
@@ -47,13 +47,98 @@ CloudNativePG is a Kubernetes operator that manages PostgreSQL clusters using Ku

### Example Cluster (Commented Out)

The `values.yaml` includes a commented example cluster configuration with:

- **Storage**: `local-path` StorageClass (for development)
- **Backup**: Barman-cloud plugin with S3 (Ceph RGW) backend
- **Note**: See "Storage Considerations" section below

## ⚠️ Storage Considerations

### Local Path vs Ceph Block

The example cluster uses the `local-path` StorageClass, which is suitable for:

- ✅ **Development/Testing**: Quick setup, no Ceph dependency
- ✅ **Single-node scenarios**: When HA isn't required
- ✅ **Learning/Experimentation**: Testing PostgreSQL features

**For production use, change to `ceph-block`:**

```yaml
storage:
  storageClass: ceph-block # Instead of local-path
  size: 50Gi
```

### Why Ceph Block for Production?

| Feature | local-path | ceph-block |
|---------|-----------|------------|
| **High Availability** | ❌ No | ✅ Yes |
| **Data Replication** | ❌ No | ✅ 2x copies |
| **Pod Mobility** | ❌ Pinned to node | ✅ Can move |
| **Snapshots** | ❌ No | ✅ Yes |
| **Auto Resize** | ❌ No | ✅ Yes |
| **Node Failure** | ❌ Data unavailable | ✅ Survives |

### Hybrid Approach (Recommended for Dev)

Even with local-path storage, the S3 backup provides safety:

- **Primary storage**: local-path (fast, simple)
- **Backups**: Ceph S3 (safe, replicated, off-node)
- **Recovery**: Restore from S3 if the node fails (a sketch follows this list)

This gives you:

- ✅ Point-in-time recovery
- ✅ Off-node backup storage
- ✅ Disaster recovery capability
- ✅ Fast local performance
- ⚠️ But no automatic HA
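A minimal recovery sketch, assuming the classic `barmanObjectStore` external-cluster flow used elsewhere in this file; cluster and secret names are illustrative, and a `recoveryTarget` with a `targetTime` can be added under `recovery` for point-in-time restores:

```yaml
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: cnpg-cluster-restored
  namespace: cnpg-system
spec:
  instances: 2
  storage:
    size: 50Gi
    storageClass: local-path
  bootstrap:
    recovery:
      source: origin # Restore from the old cluster's backups
  externalClusters:
    - name: origin
      barmanObjectStore:
        serverName: cnpg-cluster # The name the backups were written under
        destinationPath: s3://postgres-backups/
        endpointURL: http://rook-ceph-rgw-ceph-objectstore.rook-ceph.svc:80
        s3Credentials:
          accessKeyId:
            name: postgres-backup-credentials
            key: ACCESS_KEY_ID
          secretAccessKey:
            name: postgres-backup-credentials
            key: ACCESS_SECRET_KEY
```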

## Barman-Cloud Backup Plugin

CloudNativePG uses the modern barman-cloud toolset for backups.

### Configuration Features:

```yaml
backup:
  barmanObjectStore:
    # Parallel processing
    data:
      compression: bzip2
      jobs: 2 # Parallel compression threads
    wal:
      compression: bzip2
      maxParallel: 2 # Parallel WAL uploads

    # Metadata tags
    tags:
      environment: "development"
      managed-by: "cloudnative-pg"

    # Backup lineage tracking
    historyTags:
      environment: "development"
```

### Plugin Benefits:

- ✅ **Better S3 compatibility**: Works with all S3-compatible stores
- ✅ **Improved parallelism**: Faster backups for large databases
- ✅ **Enhanced error handling**: Better retry logic
- ✅ **Cloud-native design**: Optimized for object storage
- ✅ **Metadata tagging**: Better backup organization

### Backup Strategy:

1. **Continuous WAL archiving**: Real-time transaction logs to S3
2. **Scheduled full backups**: Complete database snapshots (see the sketch after this list)
3. **Point-in-time recovery**: Restore to any timestamp
4. **Retention policies**: Automatic cleanup of old backups
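A minimal sketch of a scheduled full backup, using the six-field cron format (seconds first) that CloudNativePG's ScheduledBackup expects; the cluster name is illustrative:

```yaml
apiVersion: postgresql.cnpg.io/v1
kind: ScheduledBackup
metadata:
  name: cnpg-cluster-daily
  namespace: cnpg-system
spec:
  schedule: "0 0 3 * * *" # Daily at 3 AM
  backupOwnerReference: self
  cluster:
    name: cnpg-cluster
```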

## Creating Your First Cluster

### Option 1: Using extraObjects in values.yaml (Development)

Uncomment the `extraObjects` section in `values.yaml` for a development cluster:

```yaml
extraObjects:
```

@@ -64,14 +149,36 @@ extraObjects:

```yaml
      namespace: cnpg-system
    spec:
      instances: 2 # 1 primary + 1 replica

      # Development: local-path for fast local storage
      storage:
        size: 50Gi
        storageClass: local-path

      # Backup to Ceph S3 for safety
      backup:
        retentionPolicy: "30d"
        barmanObjectStore:
          destinationPath: s3://postgres-backups/
          endpointURL: http://rook-ceph-rgw-ceph-objectstore.rook-ceph.svc:80
          s3Credentials:
            accessKeyId:
              name: postgres-backup-credentials
              key: ACCESS_KEY_ID
            secretAccessKey:
              name: postgres-backup-credentials
              key: ACCESS_SECRET_KEY
          data:
            compression: bzip2
            jobs: 2
          wal:
            compression: bzip2
            maxParallel: 2
```

### Option 2: Separate Application (Production)

For production, create a separate ArgoCD Application with ceph-block storage:

```bash
mkdir -p apps/databases/my-app-db
```

@@ -92,6 +199,7 @@ spec:

```yaml
      max_connections: "200"
      shared_buffers: "256MB"

  # Production: ceph-block for HA
  storage:
    size: 100Gi
    storageClass: ceph-block
```

@@ -99,6 +207,7 @@ spec:

```yaml
  monitoring:
    enablePodMonitor: true

  # Barman-cloud backup configuration
  backup:
    retentionPolicy: "30d"
    barmanObjectStore:
```

@@ -113,6 +222,13 @@ spec:

```yaml
            key: ACCESS_SECRET_KEY
        data:
          compression: bzip2
          jobs: 2 # Parallel compression
        wal:
          compression: bzip2
          maxParallel: 2 # Parallel WAL uploads
        tags:
          environment: "production"
          application: "my-app"
```

@@ -41,14 +41,26 @@ cloudnative-pg:
#         effective_io_concurrency: "300"
#     monitoring:
#       enablePodMonitor: true
#
#     # Use local-path-provisioner for storage
#     storage:
#       size: 50Gi
#       storageClass: local-path
#
#     # Backup configuration using new plugin system
#     backup:
#       retentionPolicy: "30d"
#
#       # Volume for barman backups (uses same StorageClass as main storage)
#       volumeSnapshot:
#         className: local-path
#
#       # S3 backup using barman-cloud plugin
#       barmanObjectStore:
#         destinationPath: s3://postgres-backups/
#         endpointURL: http://rook-ceph-rgw-ceph-objectstore.rook-ceph.svc:80
#
#         # S3 credentials reference
#         s3Credentials:
#           accessKeyId:
#             name: postgres-backup-credentials

@@ -56,7 +68,20 @@ cloudnative-pg:
#           secretAccessKey:
#             name: postgres-backup-credentials
#             key: ACCESS_SECRET_KEY
#
#         # Compression settings
#         data:
#           compression: bzip2
#           jobs: 2
#         wal:
#           compression: bzip2
#           maxParallel: 2
#
#         # Tags for backup organization
#         tags:
#           environment: "development"
#           managed-by: "cloudnative-pg"
#
#         # Backup history and retention
#         historyTags:
#           environment: "development"
@@ -1,11 +0,0 @@
apiVersion: v2
name: loki
description: Grafana Loki logging stack wrapper chart
type: application
version: 1.0.0
appVersion: "3.5.7"

dependencies:
  - name: loki
    version: 6.46.0
    repository: https://grafana.github.io/helm-charts

@@ -1,30 +0,0 @@
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: loki
  namespace: argocd
  annotations:
    argocd.argoproj.io/sync-wave: "1"
  finalizers:
    - resources-finalizer.argocd.argoproj.io
spec:
  project: default
  source:
    repoURL: https://git.mvzijl.nl/marco/veda.git
    targetRevision: applicationset-rewrite
    path: apps/logging/loki
    helm:
      releaseName: loki
      valueFiles:
        - values.yaml
  destination:
    server: https://kubernetes.default.svc
    namespace: logging
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true
      - ServerSideApply=true
      - SkipDryRunOnMissingResource=true

@@ -1,4 +0,0 @@
{{- range .Values.extraObjects }}
---
{{ toYaml . }}
{{- end }}
@@ -1,160 +0,0 @@
loki:
  # Single binary deployment mode
  deploymentMode: SingleBinary

  # Disable other deployment modes
  backend:
    replicas: 0
  read:
    replicas: 0
  write:
    replicas: 0

  loki:
    # Authentication
    auth_enabled: false

    # Common configuration
    commonConfig:
      replication_factor: 1

    # Storage configuration
    schemaConfig:
      configs:
        - from: "2024-01-01"
          store: tsdb
          object_store: s3
          schema: v13
          index:
            prefix: loki_index_
            period: 24h

    # Storage backend configuration
    storage:
      type: s3
      bucketNames:
        chunks: loki-logs
        ruler: loki-logs
        admin: loki-logs
      s3:
        endpoint: rook-ceph-rgw-ceph-objectstore.rook-ceph.svc:80
        region: us-east-1
        insecure: true
        s3ForcePathStyle: true
        accessKeyId: ${AWS_ACCESS_KEY_ID}
        secretAccessKey: ${AWS_SECRET_ACCESS_KEY}

    # Limits and retention
    limits_config:
      retention_period: 90d
      ingestion_rate_mb: 10
      ingestion_burst_size_mb: 20
      max_query_series: 10000
      max_query_parallelism: 32
      reject_old_samples: true
      reject_old_samples_max_age: 168h

    # Compactor configuration for retention
    compactor:
      working_directory: /var/loki/compactor
      compaction_interval: 10m
      retention_enabled: true
      retention_delete_delay: 2h
      retention_delete_worker_count: 150

    # Storage config
    storage_config:
      tsdb_shipper:
        active_index_directory: /var/loki/tsdb-index
        cache_location: /var/loki/tsdb-cache
        shared_store: s3

    # Hedging requests
    hedging:
      at: 250ms
      max_per_second: 20
      up_to: 3

    # Query configuration
    query_scheduler:
      max_outstanding_requests_per_tenant: 2048

    # Frontend configuration
    frontend:
      max_outstanding_per_tenant: 2048

  # Single binary configuration
  singleBinary:
    replicas: 1
    persistence:
      enabled: true
      storageClass: ceph-block
      size: 10Gi

    resources:
      requests:
        cpu: 100m
        memory: 256Mi
      limits:
        memory: 1Gi

    extraEnv:
      - name: AWS_ACCESS_KEY_ID
        valueFrom:
          secretKeyRef:
            name: loki-objstore-secret
            key: AWS_ACCESS_KEY_ID
      - name: AWS_SECRET_ACCESS_KEY
        valueFrom:
          secretKeyRef:
            name: loki-objstore-secret
            key: AWS_SECRET_ACCESS_KEY

  # Gateway
  gateway:
    enabled: true
    replicas: 1
    resources:
      requests:
        cpu: 50m
        memory: 128Mi
      limits:
        memory: 256Mi

  # Monitoring
  monitoring:
    selfMonitoring:
      enabled: true
      grafanaAgent:
        installOperator: false
    serviceMonitor:
      enabled: true

  # Service configuration
  service:
    type: ClusterIP

# S3 Bucket and credentials provisioning
extraObjects:
  # ObjectBucketClaim for Loki logs
  - apiVersion: objectbucket.io/v1alpha1
    kind: ObjectBucketClaim
    metadata:
      name: loki-logs
      namespace: logging
    spec:
      bucketName: loki-logs
      storageClassName: ceph-bucket
      additionalConfig:
        maxSize: "200Gi"

  # Secret with S3 credentials (populated by Rook from OBC)
  - apiVersion: v1
    kind: Secret
    metadata:
      name: loki-objstore-secret
      namespace: logging
    type: Opaque
    stringData:
      AWS_ACCESS_KEY_ID: placeholder
      AWS_SECRET_ACCESS_KEY: placeholder
@@ -1,11 +0,0 @@
apiVersion: v2
name: promtail
description: Promtail log collection agent wrapper chart
type: application
version: 1.0.0
appVersion: "3.3.2"

dependencies:
  - name: promtail
    version: 6.17.1
    repository: https://grafana.github.io/helm-charts

@@ -1,29 +0,0 @@
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: promtail
  namespace: argocd
  annotations:
    argocd.argoproj.io/sync-wave: "3"
  finalizers:
    - resources-finalizer.argocd.argoproj.io
spec:
  project: default
  source:
    repoURL: https://git.mvzijl.nl/marco/veda.git
    targetRevision: applicationset-rewrite
    path: apps/logging/promtail
    helm:
      releaseName: promtail
      valueFiles:
        - values.yaml
  destination:
    server: https://kubernetes.default.svc
    namespace: logging
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=false
      - ServerSideApply=true

@@ -1,163 +0,0 @@
promtail:
  # DaemonSet configuration
  daemonset:
    enabled: true

  # Resources
  resources:
    requests:
      cpu: 50m
      memory: 128Mi
    limits:
      memory: 256Mi

  # Configuration
  config:
    # Loki endpoint
    clients:
      - url: http://loki-gateway.logging.svc.cluster.local/loki/api/v1/push
        tenant_id: ""
        batchwait: 1s
        batchsize: 1048576
        timeout: 10s

    # Positions file (persisted)
    positions:
      filename: /run/promtail/positions.yaml

    # Server config
    server:
      log_level: info
      http_listen_port: 3101

    # Scrape configs
    scrape_configs:
      # Kubernetes pods
      - job_name: kubernetes-pods
        pipeline_stages:
          # Extract log level
          - regex:
              expression: '(?i)(?P<level>trace|debug|info|warn|warning|error|err|fatal|critical|panic)'

          # Parse JSON logs
          - json:
              expressions:
                level: level
                timestamp: timestamp
                message: message

          # Drop high-cardinality labels
          - labeldrop:
              - pod_uid
              - container_id
              - image_id
              - stream

          # Add log level as label (only keep certain levels)
          - labels:
              level:

        kubernetes_sd_configs:
          - role: pod

        relabel_configs:
          # Only scrape running pods
          - source_labels: [__meta_kubernetes_pod_phase]
            action: keep
            regex: Running

          # Keep essential labels
          - source_labels: [__meta_kubernetes_namespace]
            target_label: namespace

          - source_labels: [__meta_kubernetes_pod_name]
            target_label: pod

          - source_labels: [__meta_kubernetes_pod_label_app]
            target_label: app

          - source_labels: [__meta_kubernetes_pod_container_name]
            target_label: container

          - source_labels: [__meta_kubernetes_pod_node_name]
            target_label: node

          # Add cluster label
          - replacement: homelab
            target_label: cluster

          # Drop pods in kube-system namespace (optional)
          # - source_labels: [__meta_kubernetes_namespace]
          #   action: drop
          #   regex: kube-system

          # Container log path
          - source_labels: [__meta_kubernetes_pod_uid, __meta_kubernetes_pod_container_name]
            target_label: __path__
            separator: /
            replacement: /var/log/pods/*$1/*.log

      # Journald logs (systemd)
      - job_name: systemd-journal
        journal:
          path: /var/log/journal
          max_age: 12h
          labels:
            job: systemd-journal
            cluster: homelab

        pipeline_stages:
          # Parse priority to log level
          - match:
              selector: '{job="systemd-journal"}'
              stages:
                - template:
                    source: level
                    template: '{{ if eq .PRIORITY "0" }}fatal{{ else if eq .PRIORITY "1" }}alert{{ else if eq .PRIORITY "2" }}crit{{ else if eq .PRIORITY "3" }}error{{ else if eq .PRIORITY "4" }}warning{{ else if eq .PRIORITY "5" }}notice{{ else if eq .PRIORITY "6" }}info{{ else }}debug{{ end }}'

                - labels:
                    level:

        relabel_configs:
          - source_labels: [__journal__systemd_unit]
            target_label: unit

          - source_labels: [__journal__hostname]
            target_label: node

          - source_labels: [__journal_syslog_identifier]
            target_label: syslog_identifier

  # Volumes
  extraVolumes:
    - name: journal
      hostPath:
        path: /var/log/journal

    - name: positions
      hostPath:
        path: /var/lib/promtail/positions
        type: DirectoryOrCreate

  extraVolumeMounts:
    - name: journal
      mountPath: /var/log/journal
      readOnly: true

    - name: positions
      mountPath: /run/promtail

  # Tolerations to run on all nodes
  tolerations:
    - effect: NoSchedule
      operator: Exists

  # Service Monitor
  serviceMonitor:
    enabled: true

  # Update strategy
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1
@@ -1,11 +0,0 @@
apiVersion: v2
name: grafana
description: Grafana visualization platform wrapper chart
type: application
version: 1.0.0
appVersion: "12.2.1"

dependencies:
  - name: grafana
    version: 10.1.4
    repository: https://grafana.github.io/helm-charts

@@ -1,38 +0,0 @@
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: grafana
  namespace: argocd
  annotations:
    argocd.argoproj.io/sync-wave: "2"
  finalizers:
    - resources-finalizer.argocd.argoproj.io
spec:
  project: default
  source:
    repoURL: https://git.mvzijl.nl/marco/veda.git
    targetRevision: applicationset-rewrite
    path: apps/monitoring/grafana
    helm:
      releaseName: grafana
      valueFiles:
        - values.yaml
  destination:
    server: https://kubernetes.default.svc
    namespace: monitoring
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=false
      - ServerSideApply=true
  ignoreDifferences:
    - group: gateway.networking.k8s.io
      kind: HTTPRoute
      jsonPointers:
        - /spec/parentRefs/0/group
        - /spec/parentRefs/0/kind
        - /spec/rules/0/backendRefs/0/group
        - /spec/rules/0/backendRefs/0/kind
        - /spec/rules/0/backendRefs/0/weight

@@ -1,223 +0,0 @@
grafana:

  adminUser: admin
  adminPassword: changeme # TODO: Use secret management

  # Disable local persistence - using PostgreSQL database
  persistence:
    enabled: false

  resources:
    requests:
      cpu: 100m
      memory: 256Mi
    limits:
      memory: 512Mi

  extraSecretMounts:
    - name: db-secret
      secretName: grafana-pg-cluster-app
      mountPath: /secrets/my-db
      readOnly: true

  datasources:
    datasources.yaml:
      apiVersion: 1
      datasources:
        - name: Prometheus
          type: prometheus
          access: proxy
          url: http://prometheus-kube-prometheus-prometheus.monitoring.svc.cluster.local:9090
          isDefault: true
          editable: false
          jsonData:
            timeInterval: 30s
            queryTimeout: 60s

        - name: Loki
          type: loki
          access: proxy
          url: http://loki-gateway.logging.svc.cluster.local
          editable: false
          jsonData:
            maxLines: 1000
            derivedFields:
              - datasourceUid: Prometheus
                matcherRegex: "traceID=(\\w+)"
                name: TraceID
                url: "$${__value.raw}"

  dashboardProviders:
    dashboardproviders.yaml:
      apiVersion: 1
      providers:
        - name: 'default'
          orgId: 1
          folder: ''
          type: file
          disableDeletion: false
          editable: true
          options:
            path: /var/lib/grafana/dashboards/default
        - name: 'kubernetes'
          orgId: 1
          folder: 'Kubernetes'
          type: file
          disableDeletion: false
          editable: true
          options:
            path: /var/lib/grafana/dashboards/kubernetes

  dashboards:
    default:
      node-exporter:
        gnetId: 1860
        revision: 37
        datasource: Prometheus

      k8s-cluster:
        gnetId: 7249
        revision: 1
        datasource: Prometheus

    kubernetes:
      k8s-pods:
        gnetId: 6417
        revision: 1
        datasource: Prometheus

      loki-logs:
        gnetId: 13639
        revision: 2
        datasource: Loki

  grafana.ini:
    server:
      root_url: https://grafana.noxxos.nl
      serve_from_sub_path: false

    database:
      type: postgres
      host: "$__file{/secrets/my-db/host}:$__file{/secrets/my-db/port}"
      name: "$__file{/secrets/my-db/dbname}"
      user: "$__file{/secrets/my-db/user}"
      password: "$__file{/secrets/my-db/password}"

    auth.generic_oauth:
      enabled: false # Enable after configuring secret
      name: Authentik
      client_id: grafana
      # client_secret should be set via envValueFrom or existingSecret
      scopes: openid profile email
      auth_url: https://auth.noxxos.nl/application/o/authorize/
      token_url: https://auth.noxxos.nl/application/o/token/
      api_url: https://auth.noxxos.nl/application/o/userinfo/
      role_attribute_path: contains(groups[*], 'Grafana Admins') && 'Admin' || contains(groups[*], 'Grafana Editors') && 'Editor' || 'Viewer'
      allow_sign_up: true

    analytics:
      reporting_enabled: false
      check_for_updates: false

    log:
      mode: console
      level: info

    users:
      auto_assign_org: true
      auto_assign_org_role: Viewer

  serviceMonitor:
    enabled: false

  plugins:
    - grafana-piechart-panel
    - grafana-clock-panel

  route:
    main:
      enabled: true
      hostnames:
        - grafana.noxxos.nl
      parentRefs:
        - name: traefik-gateway
          namespace: traefik
          sectionName: websecure

extraObjects:
  - apiVersion: postgresql.cnpg.io/v1
    kind: Cluster
    metadata:
      name: grafana-pg-cluster
      namespace: monitoring
    spec:
      instances: 2
      postgresql:
        parameters:
          max_connections: "20"
          shared_buffers: "25MB"
          effective_cache_size: "75MB"
          maintenance_work_mem: "6400kB"
          checkpoint_completion_target: "0.9"
          wal_buffers: "768kB"
          default_statistics_target: "100"
          random_page_cost: "1.1"
          effective_io_concurrency: "300"
          work_mem: "640kB"
          huge_pages: "off"
          max_wal_size: "128MB"
      bootstrap:
        initdb:
          database: grafana
          owner: grafana
      storage:
        size: 1Gi
        storageClass: ceph-block
      resources:
        requests:
          cpu: 100m
          memory: 100Mi
        limits:
          memory: 512Mi
      backup:
        method: plugin
        pluginConfiguration:
          name: barman-cloud.cloudnative-pg.io
        retentionPolicy: "30d"
        barmanObjectStore:
          destinationPath: s3://postgresql-backups/grafana
          endpointURL: http://rook-ceph-rgw-ceph-objectstore.rook-ceph.svc:80
          s3Credentials:
            accessKeyId:
              name: grafana-pg-backup-creds
              key: AWS_ACCESS_KEY_ID
            secretAccessKey:
              name: grafana-pg-backup-creds
              key: AWS_SECRET_ACCESS_KEY
          wal:
            compression: bzip2
          data:
            compression: bzip2
      scheduledBackups:
        - name: daily-backup
          schedule: "0 2 * * *" # 2 AM daily
          backupOwnerReference: self

  - apiVersion: objectbucket.io/v1alpha1
    kind: ObjectBucketClaim
    metadata:
      name: grafana-pg-backups
      namespace: monitoring
    spec:
      bucketName: postgresql-backups
      storageClassName: ceph-bucket
      additionalConfig:
        maxSize: "50Gi"

  - apiVersion: v1
    kind: Secret
    metadata:
      name: grafana-pg-backup-creds
      namespace: monitoring
    type: Opaque
    stringData:
      AWS_ACCESS_KEY_ID: placeholder
      AWS_SECRET_ACCESS_KEY: placeholder
@@ -1,11 +0,0 @@
apiVersion: v2
name: prometheus
description: Prometheus monitoring stack with Thanos sidecar wrapper chart
type: application
version: 1.0.0
appVersion: "0.86.2"

dependencies:
  - name: kube-prometheus-stack
    version: 79.4.1
    repository: oci://ghcr.io/prometheus-community/charts

@@ -1,30 +0,0 @@
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: prometheus
  namespace: argocd
  annotations:
    argocd.argoproj.io/sync-wave: "2"
  finalizers:
    - resources-finalizer.argocd.argoproj.io
spec:
  project: default
  source:
    repoURL: https://git.mvzijl.nl/marco/veda.git
    targetRevision: applicationset-rewrite
    path: apps/monitoring/prometheus
    helm:
      releaseName: prometheus
      valueFiles:
        - values.yaml
  destination:
    server: https://kubernetes.default.svc
    namespace: monitoring
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=false
      - ServerSideApply=true
      - SkipDryRunOnMissingResource=true

@@ -1,118 +0,0 @@
kube-prometheus-stack:

  crds:
    enabled: true

  defaultRules:
    create: false

  alertmanager:
    enabled: false

  grafana:
    enabled: false

  kubeProxy:
    enabled: false

  kubeControllerManager:
    enabled: false

  kubeEtcd:
    enabled: false

  prometheusOperator:
    enabled: true
    resources:
      requests:
        cpu: 50m
        memory: 128Mi
      limits:
        memory: 256Mi
    networkPolicy:
      enabled: true
      flavor: Cilium

  prometheus:
    enabled: true
    networkPolicy:
      enabled: true
      flavor: Cilium
      cilium: {}

    # Disable Thanos integration
    thanosService:
      enabled: false
    thanosServiceMonitor:
      enabled: false
    thanosServiceExternal:
      enabled: false
    thanosIngress:
      enabled: false

    route:
      main:
        enabled: true
        hostnames:
          - prometheus.noxxos.nl
        parentRefs:
          - name: traefik-gateway
            namespace: traefik
            sectionName: websecure
    serviceMonitor:
      selfMonitor: false
    prometheusSpec:
      # Enable compaction (was disabled for Thanos)
      disableCompaction: false
      scrapeInterval: 30s

      # 3 months retention (~90 days)
      retention: 90d
      retentionSize: 100GB

      replicas: 1
      resources:
        requests:
          cpu: 100m
          memory: 400Mi
        limits:
          memory: 2Gi

      # Increased storage for 3 month retention
      storageSpec:
        volumeClaimTemplate:
          spec:
            storageClassName: ceph-block
            accessModes: ["ReadWriteOnce"]
            resources:
              requests:
                storage: 150Gi

      # Service monitors
      scrapeConfigSelectorNilUsesHelmValues: false
      serviceMonitorSelectorNilUsesHelmValues: false
      podMonitorSelectorNilUsesHelmValues: false
      ruleSelectorNilUsesHelmValues: false

      # Additional scrape configs
      additionalScrapeConfigs: []

  # Node Exporter
  nodeExporter:
    enabled: true
    resources:
      requests:
        cpu: 50m
        memory: 64Mi
      limits:
        memory: 128Mi

  # Kube State Metrics
  kubeStateMetrics:
    enabled: true
    resources:
      requests:
        cpu: 50m
        memory: 128Mi
      limits:
        memory: 256Mi
@@ -1,281 +0,0 @@
#!/bin/bash
# Kubernetes/Helm Configuration Validator
# Validates all applications without deploying them

# Note: no 'set -e' here - a failing validation returns 1, which must not abort
# the script before the remaining checks run and the summary is printed

# Colors for output
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
BLUE='\033[0;34m'
NC='\033[0m' # No Color

# Counters
TOTAL=0
PASSED=0
FAILED=0

echo -e "${BLUE}=== Kubernetes Configuration Validator ===${NC}\n"

# Function to validate a Helm chart
validate_helm_chart() {
    local app_path=$1
    local app_name=$(basename "$app_path")
    local namespace=$2

    TOTAL=$((TOTAL + 1))

    echo -e "${YELLOW}[$TOTAL] Validating: $app_name (namespace: $namespace)${NC}"

    # Check if Chart.yaml exists
    if [ ! -f "$app_path/Chart.yaml" ]; then
        echo -e "${YELLOW}  → Not a Helm chart - skipping Helm validation${NC}\n"
        TOTAL=$((TOTAL - 1))
        return 0
    fi

    # Check if dependencies are built (build to temp location if not)
    local temp_dir=""
    if [ -f "$app_path/Chart.yaml" ] && grep -q "dependencies:" "$app_path/Chart.yaml"; then
        if [ ! -d "$app_path/charts" ]; then
            echo "  → Dependencies not built - building to temporary location..."

            # Create temp directory
            temp_dir=$(mktemp -d)

            # Copy chart to temp location (remove trailing slash if present)
            local clean_path="${app_path%/}"
            cp -r "$clean_path" "$temp_dir/"
            local temp_chart="$temp_dir/$(basename "$clean_path")"

            # Build dependencies in temp location
            if ! (cd "$temp_chart" && helm dependency build > /dev/null 2>&1); then
                echo -e "${RED}  ✗ Failed to build dependencies${NC}\n"
                rm -rf "$temp_dir"
                FAILED=$((FAILED + 1))
                return 1
            fi

            # Use temp location for validation
            app_path="$temp_chart"
        fi
    fi

    # Lint the chart
    echo "  → Running Helm lint..."
    if ! (cd "$app_path" && helm lint . 2>&1 | grep -q "0 chart(s) failed"); then
        echo -e "${RED}  ✗ Helm lint failed${NC}"
        (cd "$app_path" && helm lint .)
        echo ""
        FAILED=$((FAILED + 1))
        return 1
    fi

    # Template the chart
    echo "  → Rendering Helm templates..."

    # Try rendering with validation first (redirect to temp file to avoid hanging on large output)
    local temp_output=$(mktemp)
    if (cd "$app_path" && helm template "$app_name" . --namespace "$namespace" --validate > "$temp_output" 2>&1); then
        template_exit=0
    else
        template_exit=$?
    fi

    if [ $template_exit -ne 0 ]; then
        # Check if it's just CRD validation warnings
        if grep -Eqi "(no matches for kind|ensure CRDs are installed)" "$temp_output"; then
            echo -e "${YELLOW}  ⚠ Template validation skipped - requires CRDs to be installed${NC}"
            # Still try to render without validation
            if (cd "$app_path" && helm template "$app_name" . --namespace "$namespace" > /dev/null 2>&1); then
                # Rendering works without validation, this is acceptable
                rm -f "$temp_output"
                # Continue with other checks...
            else
                echo -e "${RED}  ✗ Helm template rendering failed${NC}"
                head -20 "$temp_output"
                echo ""
                rm -f "$temp_output"
                FAILED=$((FAILED + 1))
                return 1
            fi
        elif grep -qi "exists and cannot be imported into the current release" "$temp_output"; then
            echo -e "${YELLOW}  ⚠ Resource ownership validation skipped - resources may already exist in cluster${NC}"
            # This is expected when resources already exist, try without validation
            if (cd "$app_path" && helm template "$app_name" . --namespace "$namespace" > /dev/null 2>&1); then
                rm -f "$temp_output"
                # Continue with other checks...
            else
                echo -e "${RED}  ✗ Helm template rendering failed${NC}"
                head -20 "$temp_output"
                echo ""
                rm -f "$temp_output"
                FAILED=$((FAILED + 1))
                return 1
            fi
        else
            echo -e "${RED}  ✗ Helm template failed${NC}"
            head -20 "$temp_output"
            echo ""
            rm -f "$temp_output"
            FAILED=$((FAILED + 1))
            return 1
        fi
    fi

    rm -f "$temp_output"

    # Validate with kubeval (if installed)
    if command -v kubeval &> /dev/null; then
        echo "  → Validating manifests with kubeval..."
        if ! (cd "$app_path" && helm template "$app_name" . --namespace "$namespace" | kubeval --ignore-missing-schemas > /dev/null 2>&1); then
            echo -e "${YELLOW}  ⚠ Kubeval warnings (may be acceptable)${NC}"
        fi
    fi

    # Check for common issues
    echo "  → Checking for common issues..."
    local rendered=$(cd "$app_path" && helm template "$app_name" . --namespace "$namespace" 2>&1)

    # Check for placeholder secrets
    if echo "$rendered" | grep -qi "changeme\|placeholder\|CHANGE_ME\|TODO"; then
        echo -e "${YELLOW}  ⚠ Warning: Found placeholder values (changeme/placeholder/TODO)${NC}"
    fi

    # Check for resource requests/limits
    if ! echo "$rendered" | grep -q "resources:"; then
        echo -e "${YELLOW}  ⚠ Warning: No resource requests/limits found${NC}"
    fi

    # Cleanup temp directory if created
    if [ -n "$temp_dir" ] && [ -d "$temp_dir" ]; then
        rm -rf "$temp_dir"
    fi

    echo -e "${GREEN}  ✓ Validation passed${NC}\n"
    PASSED=$((PASSED + 1))
    return 0
}

# Function to validate an ArgoCD Application manifest
validate_argocd_app() {
    local app_file=$1
    local app_name=$(basename "$(dirname "$app_file")")

    TOTAL=$((TOTAL + 1))

    echo -e "${YELLOW}[$TOTAL] Validating ArgoCD Application: $app_name${NC}"

    # Check YAML syntax using yq or basic validation
    if command -v yq &> /dev/null; then
        if ! yq eval '.' "$app_file" > /dev/null 2>&1; then
            echo -e "${RED}  ✗ Invalid YAML syntax${NC}\n"
            FAILED=$((FAILED + 1))
            return 1
        fi
    elif ! grep -q "^apiVersion:" "$app_file"; then
        echo -e "${RED}  ✗ Invalid YAML - missing apiVersion${NC}\n"
        FAILED=$((FAILED + 1))
        return 1
    fi

    # Check for required fields
    local missing_fields=()
    grep -q "kind: Application" "$app_file" || missing_fields+=("kind: Application")
    grep -q "metadata:" "$app_file" || missing_fields+=("metadata")
    grep -q "spec:" "$app_file" || missing_fields+=("spec")
    grep -q "source:" "$app_file" || missing_fields+=("source")
    grep -q "destination:" "$app_file" || missing_fields+=("destination")

    if [ ${#missing_fields[@]} -gt 0 ]; then
        echo -e "${RED}  ✗ Missing required fields: ${missing_fields[*]}${NC}\n"
        FAILED=$((FAILED + 1))
        return 1
    fi

    echo -e "${GREEN}  ✓ Validation passed${NC}\n"
    PASSED=$((PASSED + 1))
    return 0
}

# Main validation flow
echo -e "${BLUE}Validating Monitoring Stack...${NC}\n"

# Thanos
if [ -d "monitoring/thanos" ]; then
    validate_helm_chart "monitoring/thanos" "monitoring"
    validate_argocd_app "monitoring/thanos/application.yaml"
fi

# Prometheus
if [ -d "monitoring/prometheus" ]; then
    validate_helm_chart "monitoring/prometheus" "monitoring"
    validate_argocd_app "monitoring/prometheus/application.yaml"
fi

# Grafana
if [ -d "monitoring/grafana" ]; then
    validate_helm_chart "monitoring/grafana" "monitoring"
    validate_argocd_app "monitoring/grafana/application.yaml"
fi

echo -e "${BLUE}Validating Logging Stack...${NC}\n"

# Loki
if [ -d "logging/loki" ]; then
    validate_helm_chart "logging/loki" "logging"
    validate_argocd_app "logging/loki/application.yaml"
fi

# Promtail
if [ -d "logging/promtail" ]; then
    validate_helm_chart "logging/promtail" "logging"
    validate_argocd_app "logging/promtail/application.yaml"
fi

# Additional apps (if they exist)
echo -e "${BLUE}Validating Other Applications...${NC}\n"

for app_dir in */; do
    # Skip special directories
    if [[ "$app_dir" == "monitoring/" ]] || [[ "$app_dir" == "logging/" ]]; then
        continue
    fi

    # Check if it's a Helm chart
    if [ -f "$app_dir/Chart.yaml" ] && [ -f "$app_dir/application.yaml" ]; then
        app_name=$(basename "$app_dir")
        # Try to extract namespace from application.yaml
        namespace=$(grep -A 10 "destination:" "$app_dir/application.yaml" | grep "namespace:" | head -1 | awk '{print $2}')
        [ -z "$namespace" ] && namespace="default"
        validate_helm_chart "$app_dir" "$namespace"
        validate_argocd_app "$app_dir/application.yaml"
    fi

    # Check for nested charts (like ceph/operator, ceph/cluster)
    for nested_dir in "$app_dir"*/; do
        if [ -f "$nested_dir/Chart.yaml" ] && [ -f "$nested_dir/application.yaml" ]; then
            nested_name=$(basename "$nested_dir")
            # Try to extract namespace from application.yaml
            namespace=$(grep -A 10 "destination:" "$nested_dir/application.yaml" | grep "namespace:" | head -1 | awk '{print $2}')
            [ -z "$namespace" ] && namespace="default"
            validate_helm_chart "$nested_dir" "$namespace"
            validate_argocd_app "$nested_dir/application.yaml"
        fi
    done
done

# Summary
echo -e "${BLUE}=== Validation Summary ===${NC}"
echo -e "Total checks: $TOTAL"
echo -e "${GREEN}Passed: $PASSED${NC}"
echo -e "${RED}Failed: $FAILED${NC}\n"

if [ $FAILED -eq 0 ]; then
    echo -e "${GREEN}✓ All validations passed!${NC}"
    exit 0
else
    echo -e "${RED}✗ Some validations failed. Please review the errors above.${NC}"
    exit 1
fi