Configuring KubeArchive

This document explains how to configure KubeArchive so you can archive resources and query them.

Prerequisites

The KubeArchiveConfig resource

To configure KubeArchive to archive or delete resources from the cluster, create a KubeArchiveConfig custom resource. KubeArchiveConfigs are limited to one per namespace, and kubearchive is the only name allowed. KubeArchiveConfigs have this general form:

---
apiVersion: kubearchive.org/v1
kind: KubeArchiveConfig
metadata:
  name: kubearchive
  namespace: default
spec:
  resources: [...]  # (1)

(1) spec.resources is a list of entries, each defining rules for a specific kind, so that KubeArchive knows when to archive or delete resources of that kind.

selector: Selecting Resources

The selector key within a KubeArchiveConfig identifies the resources that a rule applies to. It requires two keys: kind and apiVersion. Each entry in spec.resources requires a selector:

---
apiVersion: kubearchive.org/v1
kind: KubeArchiveConfig
metadata:
  name: kubearchive
  namespace: default
spec:
  resources:
    - selector:
        apiVersion: v1
        kind: Pod
      ...
    - selector:
        apiVersion: batch/v1
        kind: Job
      ...
    - selector:
        apiVersion: apps/v1
        kind: Deployment
      ...

Each entry in spec.resources also requires one of archiveWhen, deleteWhen, or archiveOnDelete. Each of these keys accepts a string containing an expression written in CEL (Common Expression Language). When a resource matched by the selector changes or gets deleted, KubeArchive evaluates the expressions. Each expression must evaluate to either true or false.
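
The structural rules described so far can be checked before a manifest is applied. The following is a minimal Python sketch, not part of KubeArchive: the function name check_config and the RULE_KEYS set are hypothetical, and accepting keepLastWhen as a rule key is an assumption based on the keepLastWhen example later in this document. It works on a manifest that has already been loaded into a dictionary.

```python
# Hypothetical pre-flight check for a KubeArchiveConfig manifest loaded as a dict.
# It only encodes rules stated in this document; it is not KubeArchive's own validation.

# Assumption: keepLastWhen counts as a rule key, as its example in this document suggests.
RULE_KEYS = {"archiveWhen", "deleteWhen", "archiveOnDelete", "keepLastWhen"}


def check_config(manifest: dict) -> list:
    """Return a list of human-readable problems; an empty list means none were found."""
    problems = []
    if manifest.get("kind") != "KubeArchiveConfig":
        problems.append("kind must be KubeArchiveConfig")
    if manifest.get("metadata", {}).get("name") != "kubearchive":
        problems.append("metadata.name must be kubearchive")
    resources = manifest.get("spec", {}).get("resources", [])
    for index, entry in enumerate(resources):
        selector = entry.get("selector") or {}
        for key in ("apiVersion", "kind"):
            if key not in selector:
                problems.append(f"resources[{index}].selector is missing {key}")
        # An entry needs a rule key in addition to its selector.
        if RULE_KEYS.isdisjoint(entry):
            problems.append(f"resources[{index}] sets none of {sorted(RULE_KEYS)}")
    return problems


example = {
    "apiVersion": "kubearchive.org/v1",
    "kind": "KubeArchiveConfig",
    "metadata": {"name": "kubearchive", "namespace": "default"},
    "spec": {
        "resources": [
            {
                "selector": {"apiVersion": "v1", "kind": "Pod"},
                "archiveWhen": 'status.phase == "Succeeded"',
            },
            {"selector": {"kind": "Job"}},  # deliberately incomplete entry
        ]
    },
}

for problem in check_config(example):
    print(problem)
```

For the sample manifest, the check reports two problems, both for the second entry: its selector has no apiVersion and it sets no rule key. The cluster may enforce further rules that this sketch does not know about.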

archiveWhen: Archiving Resources

The most basic feature that KubeArchive offers is archiving. Use it with the archiveWhen key within the entry for a resource. The following example configures KubeArchive to archive pods when they match the status.phase == "Succeeded" condition:

---
apiVersion: kubearchive.org/v1
kind: KubeArchiveConfig
metadata:
  name: kubearchive
  namespace: default
spec:
  resources:
    - selector:
        apiVersion: v1
        kind: Pod
      archiveWhen: status.phase == "Succeeded"

archiveWhen: "true" is also a valid expression. It configures KubeArchive to archive the resource every time it is updated.

archiveWhen is processed by both the controller and vacuums. Resources processed by the controller are handled immediately after Kubernetes sends the event. Resources processed by vacuums are handled based on how often the vacuum operations are scheduled to run.

To see it in action, apply the KubeArchiveConfig in your namespace and run a pod:

kubectl run fedora --image quay.io/fedora/fedora:latest --restart Never -- sleep 10

The pod does not appear in KubeArchive until it completes. After completion, it is present in KubeArchive:

$ curl --insecure \
    -H "Authorization: Bearer ${SA_TOKEN}" \
    https://localhost:8081/api/v1/namespaces/default/pods \
    | jq -r '.items[] | [.metadata.name, .metadata.uid] | @csv'

"fedora","a3bdb6d2-b683-4913-9d24-e01af60c94e3"

The examples include jq to reduce the output length.
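
When jq is not available, the same query can be made from a script. The following is a minimal Python sketch using only the standard library. The function names fetch_items and to_csv are hypothetical, and the sketch assumes that the response is JSON with a top-level items list, as the jq filter above implies. Like curl --insecure, it disables certificate verification, so it is only suitable for local testing.

```python
import csv
import io
import json
import os
import ssl
import urllib.request


def fetch_items(url: str, token: str) -> list:
    """GET a KubeArchive API URL and return the 'items' list of the JSON response."""
    # Mirrors curl --insecure: certificate verification is disabled. Local testing only.
    context = ssl.create_default_context()
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    request = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
    with urllib.request.urlopen(request, context=context) as response:
        return json.load(response).get("items", [])


def to_csv(items: list) -> str:
    """Render the name and UID of each item as CSV with every field quoted."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, quoting=csv.QUOTE_ALL, lineterminator="\n")
    for item in items:
        writer.writerow([item["metadata"]["name"], item["metadata"]["uid"]])
    return buffer.getvalue()


# Only contact the API when a token is provided through the environment.
if __name__ == "__main__" and "SA_TOKEN" in os.environ:
    url = "https://localhost:8081/api/v1/namespaces/default/pods"
    print(to_csv(fetch_items(url, os.environ["SA_TOKEN"])), end="")
```

The script reads the token from the SA_TOKEN environment variable, the same variable the curl examples use, and does nothing if it is not set.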

The pod remains in Kubernetes, occupying space:

$ kubectl get pods --namespace default
NAME     READY   STATUS      RESTARTS   AGE
fedora   0/1     Completed   0          4m50s

deleteWhen: Deleting Resources

The key feature of KubeArchive is deleting resources from the Kubernetes cluster. This feature keeps the Kubernetes cluster free of resources that are not needed anymore. To enable deletion of resources use the deleteWhen key.

KubeArchive archives resources deleted using deleteWhen.

deleteWhen is processed by the controller only. Resources are handled immediately after Kubernetes sends the event.

The following KubeArchiveConfig configures KubeArchive to delete (and archive) pods when they match status.phase == "Succeeded":

---
apiVersion: kubearchive.org/v1
kind: KubeArchiveConfig
metadata:
  name: kubearchive
  namespace: default
spec:
  resources:
    - selector:
        apiVersion: v1
        kind: Pod
      deleteWhen: status.phase == "Succeeded"

To see it in action, apply the KubeArchiveConfig in your namespace and run a pod:

kubectl run auto-deleted --image quay.io/fedora/fedora:latest --restart Never -- sleep 10

Watch pods to see that after they complete, KubeArchive removes them automatically:

$ kubectl get pods -w
NAME           READY   STATUS              RESTARTS   AGE
auto-deleted   0/1     ContainerCreating   0          2s
auto-deleted   1/1     Running             0          2s
auto-deleted   0/1     Completed           0          13s
auto-deleted   0/1     Completed           0          14s
auto-deleted   0/1     Terminating         0          14s
auto-deleted   0/1     Terminating         0          14s

After KubeArchive removes the pod from the cluster, retrieve it using the command:

$ curl --insecure \
    -H "Authorization: Bearer ${SA_TOKEN}" \
    https://localhost:8081/api/v1/namespaces/default/pods \
    | jq -r '.items[] | [.metadata.name, .metadata.uid] | @csv'

...
"auto-deleted","64c48176-ba8c-4f2a-a662-1fd660f7a3b6"

keepLastWhen: Keeping the Last N Resources

The keepLastWhen key provides a way to automatically retain only the most recent N resources that match specific criteria, while deleting older ones. This is particularly useful for managing resources like completed jobs, where you want to keep only the latest few for reference while cleaning up older ones.

KubeArchive archives resources deleted using keepLastWhen.

keepLastWhen is processed by vacuums only. Resources are handled based on how often the vacuum operations are scheduled to run, not immediately after Kubernetes sends events.

The following KubeArchiveConfig configures KubeArchive to keep only the last 3 completed jobs:

---
apiVersion: kubearchive.org/v1
kind: KubeArchiveConfig
metadata:
  name: kubearchive
  namespace: default
spec:
  resources:
    - selector:
        apiVersion: batch/v1
        kind: Job
      keepLastWhen:
        keep:
          - when: "has(status.completionTime)"
            count: 3

To see it in action, apply the KubeArchiveConfig and create several completed jobs:

kubectl create job job-1 --image=busybox -- /bin/sh -c "exit 0"
kubectl create job job-2 --image=busybox -- /bin/sh -c "exit 0"
kubectl create job job-3 --image=busybox -- /bin/sh -c "exit 0"
kubectl create job job-4 --image=busybox -- /bin/sh -c "exit 0"
kubectl create job job-5 --image=busybox -- /bin/sh -c "exit 0"

Wait for the vacuum to run. After the vacuum processes these jobs, only the 3 most recent completed jobs remain:

$ kubectl get jobs --namespace default
NAME    COMPLETIONS   DURATION   AGE
job-3   1/1           2s         3m
job-4   1/1           2s         2m
job-5   1/1           2s         1m

The older jobs (job-1 and job-2) are deleted from the cluster and archived in KubeArchive:

$ curl --insecure \
    -H "Authorization: Bearer ${SA_TOKEN}" \
    https://localhost:8081/api/v1/namespaces/default/jobs \
    | jq -r '.items[] | [.metadata.name, .metadata.uid] | @csv'

"job-1","..."
"job-2","..."

For more details about keepLastWhen configuration options, including how to override cluster-wide rules and use custom sort fields, see KubeArchiveConfig CRD.
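
The selection that a vacuum run performs for keepLastWhen can be pictured with a short Python sketch. This is an illustration of the behavior described above, not KubeArchive's implementation. The function names are hypothetical, the timestamps are made-up sample data, and ordering by metadata.creationTimestamp is an assumption, since the sort field can be customized.

```python
from datetime import datetime


def split_keep_last(resources: list, matches, count: int):
    """Split matching resources into (kept, removed), keeping the `count` newest.

    Resources that do not match are left out of both lists.
    Assumption: "most recent" is judged by metadata.creationTimestamp.
    """
    def created(resource):
        # Kubernetes timestamps look like 2024-01-01T10:00:00Z.
        stamp = resource["metadata"]["creationTimestamp"]
        return datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%SZ")

    matching = sorted((r for r in resources if matches(r)), key=created, reverse=True)
    return matching[:count], matching[count:]


def has_completion_time(job):
    # Python stand-in for the CEL expression has(status.completionTime)
    return "completionTime" in job.get("status", {})


# Five completed jobs, created one minute apart (illustrative data only).
jobs = [
    {
        "metadata": {"name": f"job-{n}", "creationTimestamp": f"2024-01-01T10:0{n}:00Z"},
        "status": {"completionTime": f"2024-01-01T10:0{n}:02Z"},
    }
    for n in range(1, 6)
]

kept, removed = split_keep_last(jobs, has_completion_time, count=3)
print("kept:", [job["metadata"]["name"] for job in kept])
print("removed:", [job["metadata"]["name"] for job in removed])
```

With count set to 3, the sketch keeps job-5, job-4, and job-3 and marks job-2 and job-1 for removal, which matches the example output above.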

archiveOnDelete: Archiving on Deletion From the Cluster

You can use KubeArchive alongside other applications that clean up resources. This lets you keep using a specialized tool for deletion while KubeArchive stores the resources. The following KubeArchiveConfig configures KubeArchive to archive pods when they are deleted from the cluster, but only if they match the condition status.phase == "Succeeded", so failed pods that get deleted are not archived:

---
apiVersion: kubearchive.org/v1
kind: KubeArchiveConfig
metadata:
  name: kubearchive
  namespace: default
spec:
  resources:
    - selector:
        apiVersion: v1
        kind: Pod
      archiveOnDelete: status.phase == "Succeeded"

archiveOnDelete is processed by the controller only. Resources are handled immediately after Kubernetes sends the event.
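
The three keys that accept a CEL expression (archiveWhen, deleteWhen, and archiveOnDelete) can be summarized as a small decision function. The following Python sketch is an illustration only: the function decide and the action names are hypothetical, Python predicates stand in for CEL expressions, and the way the keys combine when several are set on the same entry is an assumption.

```python
def decide(event: str, resource: dict, rules: dict) -> str:
    """Return the action this document describes for one event on one resource.

    `event` is "update" or "delete"; `rules` maps a rule key to a Python
    predicate standing in for its CEL expression.
    """
    def holds(key):
        predicate = rules.get(key)
        return predicate is not None and predicate(resource)

    if event == "delete":
        # archiveOnDelete only matters when something else deleted the resource.
        return "archive" if holds("archiveOnDelete") else "ignore"
    if holds("deleteWhen"):
        # Resources deleted through deleteWhen are archived as well.
        return "archive and delete"
    if holds("archiveWhen"):
        return "archive"
    return "ignore"


def succeeded(pod):
    # Stand-in for the CEL expression status.phase == "Succeeded"
    return pod.get("status", {}).get("phase") == "Succeeded"


done = {"status": {"phase": "Succeeded"}}
failed = {"status": {"phase": "Failed"}}

print(decide("update", done, {"archiveWhen": succeeded}))
print(decide("update", done, {"deleteWhen": succeeded}))
print(decide("delete", done, {"archiveOnDelete": succeeded}))
print(decide("delete", failed, {"archiveOnDelete": succeeded}))
```

The last two calls correspond to the archiveOnDelete configuration above: a deleted pod that succeeded is archived, and a deleted pod that failed is ignored.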

To see it in action, apply the KubeArchiveConfig in your namespace and run a couple of pods:

kubectl run failed --image quay.io/fedora/fedora:latest --restart Never -- false
kubectl run archived-on-deletion --image quay.io/fedora/fedora:latest --restart Never -- echo "hello world"

Wait for one pod to fail and the other to complete, then delete both:

kubectl delete pod archived-on-deletion
kubectl delete pod failed

Query KubeArchive to check that it only archives the pod that completed correctly (archived-on-deletion):

$ curl --insecure \
    -H "Authorization: Bearer ${SA_TOKEN}" \
    https://localhost:8081/api/v1/namespaces/default/pods \
    | jq -r '.items[] | [.metadata.name, .metadata.uid] | @csv'

...
"archived-on-deletion","2c5fd5f6-cdab-4d6b-b008-b3f5cff5df9e"

Next Steps

These are the main features KubeArchive offers for archiving resources. Explore the rest of the documentation to learn more about KubeArchive, and go to cel.dev to learn more about the expression language KubeArchive uses.