From 29771edc0d7d6603eb9210ddda3182b7c2fec1a5 Mon Sep 17 00:00:00 2001 From: Gabriele Bartolini Date: Fri, 2 May 2025 15:03:42 +0200 Subject: [PATCH] docs: review Signed-off-by: Gabriele Bartolini --- web/docs/object_stores.md | 324 +++++++++++++++++--------------------- 1 file changed, 148 insertions(+), 176 deletions(-) diff --git a/web/docs/object_stores.md b/web/docs/object_stores.md index 6ca8f55..8f516af 100644 --- a/web/docs/object_stores.md +++ b/web/docs/object_stores.md @@ -2,46 +2,52 @@ sidebar_position: 99 --- -# Object store providers +# Object Store Providers -The Barman Cloud Plugin allows you to store Postgres clusters backup files in any object store service that is supported by the Barman Cloud infrastructure. Barman Cloud supports the following providers: +The Barman Cloud Plugin enables the storage of PostgreSQL cluster backup files +in any object storage service supported by the +[Barman Cloud infrastructure](https://docs.pgbarman.org/release/latest/). + +Currently, Barman Cloud supports the following providers: - [Amazon S3](#aws-s3) - [Microsoft Azure Blob Storage](#azure-blob-storage) - [Google Cloud Storage](#google-cloud-storage) -You can also use any compatible implementation of the supported services. +You may also use any S3- or Azure-compatible implementation of the above +services. -Barman Cloud Plugin requires you to define an [`ObjectStore` object](plugin-barman-cloud.v1.md#objectstore) that provides the link with the object store service. +To configure object storage with Barman Cloud, you must define an +[`ObjectStore` object](plugin-barman-cloud.v1.md#objectstore), which +establishes the connection between your PostgreSQL cluster and the object +storage backend. -The required setup depends on the chosen storage provider and is discussed in the following sections. +The configuration steps vary depending on the storage provider. The following +sections detail the setup for each. + +--- ## AWS S3 -[AWS Simple Storage Service (S3)](https://aws.amazon.com/s3/) is -a very popular object storage service offered by Amazon. +[AWS Simple Storage Service (S3)](https://aws.amazon.com/s3/) is one of the +most widely adopted object storage solutions. -As far as CloudNativePG backup is concerned, you can define the permissions to -store backups in S3 buckets in two ways: +The Barman Cloud plugin for CloudNativePG integrates with S3 through two +primary authentication mechanisms: -- If CloudNativePG is running in EKS. you may want to use the - [IRSA authentication method](https://docs.aws.amazon.com/eks/latest/userguide/iam-roles-for-service-accounts.html) -- Alternatively, you can use the `ACCESS_KEY_ID` and `ACCESS_SECRET_KEY` credentials +- [IAM Roles for Service Accounts (IRSA)](https://docs.aws.amazon.com/eks/latest/userguide/iam-roles-for-service-accounts.html) — + recommended for clusters running on EKS +- Access keys — using `ACCESS_KEY_ID` and `ACCESS_SECRET_KEY` credentials -### AWS Access key +### Access Keys -You will need the following information about your environment: +To authenticate using access keys, you’ll need: -- `ACCESS_KEY_ID`: the ID of the access key that will be used - to upload files into S3 +- `ACCESS_KEY_ID`: the public key used to authenticate to S3 +- `ACCESS_SECRET_KEY`: the corresponding secret key +- `ACCESS_SESSION_TOKEN`: (optional) a temporary session token, if required -- `ACCESS_SECRET_KEY`: the secret part of the access key mentioned above - -- `ACCESS_SESSION_TOKEN`: the optional session token, in case it is required - -The access key used must have permission to upload files into -the bucket. Given that, you must create a Kubernetes secret with the -credentials, and you can do that with the following command: +These credentials must be stored securely in a Kubernetes secret: ```sh kubectl create secret generic aws-creds \ @@ -50,11 +56,10 @@ kubectl create secret generic aws-creds \ # --from-literal=ACCESS_SESSION_TOKEN= # if required ``` -The credentials will be stored inside Kubernetes and will be encrypted -if encryption at rest is configured in your installation. +The credentials will be encrypted at rest if your Kubernetes environment +supports it. -Once that secret has been created, you can configure your cluster like in -the following example: +You can then reference the secret in your `ObjectStore` definition: ```yaml kind: ObjectStore @@ -62,7 +67,7 @@ metadata: name: aws-store spec: configuration: - destinationPath: "" + destinationPath: "s3://BUCKET_NAME/path/to/folder" s3Credentials: accessKeyId: name: aws-creds @@ -73,23 +78,16 @@ spec: [...] ``` -The destination path can be any URL pointing to a folder where -the instance can upload the WAL files, e.g. -`s3://BUCKET_NAME/path/to/folder`. - ### IAM Role for Service Account (IRSA) -In order to use IRSA you need to set an `annotation` in the `ServiceAccount` of -the Postgres cluster. - -We can configure CloudNativePG to inject them using the `serviceAccountTemplate` -stanza: +To use IRSA with EKS, configure the service account of the PostgreSQL cluster +with the appropriate annotation: ```yaml apiVersion: postgresql.cnpg.io/v1 kind: Cluster metadata: -[...] + [...] spec: serviceAccountTemplate: metadata: @@ -98,25 +96,31 @@ spec: [...] ``` -### S3 lifecycle policy +### S3 Lifecycle Policy -Barman Cloud writes objects to S3, then does not update them until they are -deleted by the Barman Cloud retention policy. A recommended approach for an S3 -lifecycle policy is to expire the current version of objects a few days longer -than the Barman retention policy, enable object versioning, and expire -non-current versions after a number of days. Such a policy protects against -accidental deletion, and also allows for restricting permissions to the -CloudNativePG workload so that it may delete objects from S3 without granting -permissions to permanently delete objects. +Barman Cloud uploads backup files to S3 but does not modify or delete them afterward. +To enhance data durability and protect against accidental or malicious loss, +it's recommended to implement the following best practices: -### Other S3-compatible Object Storages providers +- Enable object versioning +- Enable object locking to prevent objects from being deleted or overwritten + for a defined period or indefinitely (this provides an additional layer of + protection against accidental deletion and ransomware attacks) +- Set lifecycle rules to expire current versions a few days after your Barman + retention window +- Expire non-current versions after a longer period -In case you're using S3-compatible object storage, like **MinIO** or -**Linode Object Storage**, you can specify an endpoint instead of using the -default S3 one. +These strategies help you safeguard backups without requiring broad delete +permissions, ensuring both security and compliance with minimal operational +overhead. -In this example, it will use the `bucket` of **Linode** in the region -`us-east1`. + +### S3-Compatible Storage Providers + +You can use S3-compatible services like **MinIO**, **Linode (Akamai) Object Storage**, +or **DigitalOcean Spaces** by specifying a custom `endpointURL`. + +Example with Linode (Akamai) Object Storage (`us-east1`): ```yaml kind: ObjectStore @@ -124,15 +128,14 @@ metadata: name: linode-store spec: configuration: - destinationPath: "s3://bucket/" + destinationPath: "s3://BUCKET_NAME/" endpointURL: "https://us-east1.linodeobjects.com" s3Credentials: [...] [...] ``` -In case you're using **Digital Ocean Spaces**, you will have to use the Path-style syntax. -In this example, it will use the `bucket` from **Digital Ocean Spaces** in the region `SFO3`. +Example with DigitalOcean Spaces (SFO3, path-style): ```yaml kind: ObjectStore @@ -140,29 +143,30 @@ metadata: name: digitalocean-store spec: configuration: - destinationPath: "s3://[your-bucket-name]/[your-backup-folder]/" + destinationPath: "s3://BUCKET_NAME/path/to/folder" endpointURL: "https://sfo3.digitaloceanspaces.com" s3Credentials: [...] [...] ``` -### Using Object Storage with a private CA +### Using Object Storage with a Private CA -Suppose you configure an Object Storage provider which uses a certificate -signed with a private CA, for example when using MinIO via HTTPS. In that case, -you need to set the option `endpointCA` inside `barmanObjectStore` referring -to a secret containing the CA bundle, so that Barman can verify the certificate -correctly. -You can find instructions on creating a secret using your cert files in the -certificates.md document. -Once you have created the secret, you can populate the `endpointCA` as in the -following example: +For object storage services (e.g., MinIO) that use HTTPS with certificates +signed by a private CA, set the `endpointCA` field in the `ObjectStore` +definition. Unless you already have it, create a Kubernetes `Secret` with the +CA bundle: -``` yaml +```sh +kubectl create secret generic my-ca-secret --from-file=ca.crt +``` + +Then reference it: + +```yaml kind: ObjectStore metadata: - name: digitalocean-store + name: minio-store spec: configuration: endpointURL: @@ -174,28 +178,30 @@ spec: :::note -If you want ConfigMaps and Secrets to be **automatically** reloaded by instances, you can -add a label with key `cnpg.io/reload` to the Secrets/ConfigMaps. Otherwise, you will have to reload -the instances using the `kubectl cnpg reload` subcommand. +If you want ConfigMaps and Secrets to be **automatically** reloaded by +instances, you can add a label with key `cnpg.io/reload` to the +Secrets/ConfigMaps. Otherwise, you will have to reload the instances using the +`kubectl cnpg reload` subcommand. ::: +--- + ## Azure Blob Storage -[Azure Blob Storage](https://azure.microsoft.com/en-us/services/storage/blobs/) is the -object storage service provided by Microsoft. +[Azure Blob Storage](https://azure.microsoft.com/en-us/services/storage/blobs/) +is Microsoft’s cloud-based object storage solution. -In order to access your storage account for backup and recovery of -CloudNativePG managed databases, you will need one of the following -combinations of credentials: +Barman Cloud supports the following authentication methods: -- [Connection String](https://docs.microsoft.com/en-us/azure/storage/common/storage-configure-connection-string#configure-a-connection-string-for-an-azure-storage-account) -- Storage account name and [Storage account access key](https://docs.microsoft.com/en-us/azure/storage/common/storage-account-keys-manage) -- Storage account name and [Storage account SAS Token](https://docs.microsoft.com/en-us/azure/storage/blobs/sas-service-create) -- Storage account name and [Azure AD Workload Identity](https://azure.github.io/azure-workload-identity/docs/introduction.html) -properly configured. +- [Connection String](https://learn.microsoft.com/en-us/azure/storage/common/storage-configure-connection-string) +- Storage Account Name + [Access Key](https://learn.microsoft.com/en-us/azure/storage/common/storage-account-keys-manage) +- Storage Account Name + [SAS Token](https://learn.microsoft.com/en-us/azure/storage/blobs/sas-service-create) +- [Azure AD Workload Identity](https://azure.github.io/azure-workload-identity/docs/introduction.html) -Using **Azure AD Workload Identity**, you can avoid saving the credentials into a Kubernetes Secret, -and have a Cluster configuration adding the `inheritFromAzureAD` as follows: +### Azure AD Workload Identity + +This method avoids storing credentials in Kubernetes via the +`.spec.configuration.inheritFromAzureAD` option: ```yaml kind: ObjectStore @@ -209,11 +215,11 @@ spec: [...] ``` -On the other side, using both **Storage account access key** or **Storage account SAS Token**, -the credentials need to be stored inside a Kubernetes Secret, adding data entries only when -needed. The following command performs that: +### Access Key, SAS Token, or Connection String -``` sh +Store credentials in a Kubernetes secret: + +```sh kubectl create secret generic azure-creds \ --from-literal=AZURE_STORAGE_ACCOUNT= \ --from-literal=AZURE_STORAGE_KEY= \ @@ -221,11 +227,7 @@ kubectl create secret generic azure-creds \ --from-literal=AZURE_STORAGE_CONNECTION_STRING= ``` -The credentials will be encrypted at rest, if this feature is enabled in the used -Kubernetes cluster. - -Given the previous secret, the provided credentials can be injected inside the cluster -configuration: +Then reference the required keys in your `ObjectStore`: ```yaml kind: ObjectStore @@ -250,48 +252,41 @@ spec: [...] ``` -When using the Azure Blob Storage, the `destinationPath` fulfills the following -structure: +For Azure Blob, the destination path format is: -``` sh -://..core.windows.net/ +``` +://..core.windows.net// ``` -where `` is `/`. The **account name**, -which is also called **storage account name**, is included in the used host name. +### Azure-Compatible Providers -### Other Azure Blob Storage compatible providers +If you're using a different implementation (e.g., Azurite or emulator): -If you are using a different implementation of the Azure Blob Storage APIs, -the `destinationPath` will have the following structure: - -``` sh -://:// +``` +://:/// ``` -In that case, `` is the first component of the path. - -This is required if you are testing the Azure support via the Azure Storage -Emulator or [Azurite](https://github.com/Azure/Azurite). +--- ## Google Cloud Storage -Currently, the CloudNativePG operator supports two authentication methods for -[Google Cloud Storage](https://cloud.google.com/storage/): +[Google Cloud Storage](https://cloud.google.com/storage/) is supported with two +authentication modes: -- the first one assumes that the pod is running inside a Google Kubernetes Engine cluster -- the second one leverages the environment variable `GOOGLE_APPLICATION_CREDENTIALS` +- **GKE Workload Identity** (recommended inside Google Kubernetes Engine) +- **Service Account JSON key** via the `GOOGLE_APPLICATION_CREDENTIALS` environment variable -### Running inside Google Kubernetes Engine +### GKE Workload Identity -When running inside Google Kubernetes Engine you can configure your backups to -simply rely on [Workload Identity](https://cloud.google.com/kubernetes-engine/docs/how-to/workload-identity), -without having to set any credentials. In particular, you need to: +Use the [Workload Identity authentication](https://cloud.google.com/kubernetes-engine/docs/how-to/workload-identity) +when running in GKE: -- set `.spec.configuration.googleCredentials.gkeEnvironment` to `true` in the `ObjectStore` object store provided by Barman Cloud Plugin -- set the `iam.gke.io/gcp-service-account` annotation in the `serviceAccountTemplate` stanza of the Postgres `Cluster` +1. Set `googleCredentials.gkeEnvironment` to `true` in the `ObjectStore` + resource +2. Annotate the `serviceAccountTemplate` in the `Cluster` resource with the GCP + service account -Please use the following example as a reference: +For example, in the `ObjectStore` resource: ```yaml kind: ObjectStore @@ -299,47 +294,39 @@ metadata: name: google-store spec: configuration: - destinationPath: "gs://" + destinationPath: "gs:///" googleCredentials: gkeEnvironment: true ``` -And then: +And in the `Cluster` resource: ```yaml apiVersion: postgresql.cnpg.io/v1 kind: Cluster -[...] spec: - [...] serviceAccountTemplate: metadata: annotations: - iam.gke.io/gcp-service-account: [...].iam.gserviceaccount.com - [...] + iam.gke.io/gcp-service-account: [...].iam.gserviceaccount.com ``` -### Using authentication +### Service Account JSON Key -Following the [instruction from Google](https://cloud.google.com/docs/authentication/getting-started) -you will get a JSON file that contains all the required information to authenticate. +Follow Google’s [authentication setup](https://cloud.google.com/docs/authentication/getting-started), +then: -The content of the JSON file must be provided using a `Secret` that can be created -with the following command: - -```shell +```sh kubectl create secret generic backup-creds --from-file=gcsCredentials=gcs_credentials_file.json ``` -This will create the `Secret` with the name `backup-creds` to be used in the yaml file like this: - ```yaml kind: ObjectStore metadata: name: google-store spec: configuration: - destinationPath: "gs://" + destinationPath: "gs:///" googleCredentials: applicationCredentials: name: backup-creds @@ -347,42 +334,37 @@ spec: [...] ``` -Now the plugin will use the credentials to authenticate against Google Cloud Storage. - :::important -This way of authentication will create a JSON file inside the container with all the needed -information to access your Google Cloud Storage bucket, meaning that if someone gets access to the pod -will also have write permissions to the bucket. +This authentication method generates a JSON file within the container +containing all the credentials required to access your Google Cloud Storage +bucket. As a result, if someone gains access to the pod, they will also have +write permissions to the bucket. ::: +--- + + ## MinIO Gateway -Optionally, you can use MinIO Gateway as a common interface which -relays backup objects to other cloud storage solutions, like S3 or GCS. +MinIO Gateway can proxy requests to cloud object storage providers like S3 or GCS. For more information, please refer to [MinIO official documentation](https://docs.min.io/). -Specifically, the CloudNativePG cluster can directly point to a local -MinIO Gateway as an endpoint, using previously created credentials and service. +### Setup -MinIO secrets will be used by both the PostgreSQL cluster and the MinIO instance. -Therefore, you must create them in the same namespace: +Create MinIO access credentials: ```sh kubectl create secret generic minio-creds \ - --from-literal=MINIO_ACCESS_KEY= \ - --from-literal=MINIO_SECRET_KEY= + --from-literal=MINIO_ACCESS_KEY= \ + --from-literal=MINIO_SECRET_KEY= ``` :::note -Cloud Object Storage credentials will be used only by MinIO Gateway in this case. +Cloud Object Storage credentials will be used only by MinIO Gateway in this +case. ::: -:::important -In order to allow PostgreSQL to reach MinIO Gateway, it is necessary to create a -`ClusterIP` service on port `9000` bound to the MinIO Gateway instance. -::: - -For example: +Expose MinIO Gateway via `ClusterIP`: ```yaml apiVersion: v1 @@ -399,17 +381,7 @@ spec: app: minio ``` -:::warning -At the time of writing this documentation, the official -[MinIO Operator](https://github.com/minio/minio-operator/issues/71) -for Kubernetes does not support the gateway feature. As such, we will use a -`deployment` instead. -::: - -The MinIO deployment will use cloud storage credentials to upload objects to the -remote bucket and relay backup files to different locations. - -Here is an example using AWS S3 as Cloud Object Storage: +Here follows an excerpt of an example of deployment relaying to S3: ```yaml apiVersion: apps/v1 @@ -419,11 +391,10 @@ spec: containers: - name: minio image: minio/minio:RELEASE.2020-06-03T22-13-49Z - args: - - gateway - - s3 + args: ["gateway", "s3"] + ports: + - containerPort: 9000 env: - # MinIO access key and secret key - name: MINIO_ACCESS_KEY valueFrom: secretKeyRef: @@ -434,7 +405,6 @@ spec: secretKeyRef: name: minio-creds key: MINIO_SECRET_KEY - # AWS credentials - name: AWS_ACCESS_KEY_ID valueFrom: secretKeyRef: @@ -451,12 +421,10 @@ spec: # secretKeyRef: # name: aws-creds # key: ACCESS_SESSION_TOKEN - ports: - - containerPort: 9000 ``` -Proceed by configuring MinIO Gateway service as the `endpointURL` in the `Cluster` -definition, then choose a bucket name to replace `BUCKET_NAME`: +Proceed by configuring MinIO Gateway service as the `endpointURL` in the +`ObjectStore` definition, then choose a bucket name to replace `BUCKET_NAME`: ```yaml kind: ObjectStore @@ -476,5 +444,9 @@ spec: [...] ``` +:::important Verify on `s3://BUCKET_NAME/` the presence of archived WAL files before proceeding with a backup. +::: + +---