Logging Integrations

Overview

KubeArchive supports logging, but it is not a logging system and does not implement one itself. Instead, KubeArchive integrates with existing logging systems and provides URLs for retrieving the log files that the logging system holds for a specific Kubernetes resource.

It is important to note that logs are tied to Pods. When a user requests the logs for a Tekton PipelineRun, what they expect to get back are the logs attached to the Pods that were part of the PipelineRun. Similar cases exist for requesting logs for Jobs and CronJobs. KubeArchive handles this seamlessly for the user.

KubeArchiveConfig Configuration

KubeArchive retrieves log URLs using the owner references field of a resource. When logs for a resource are requested, a query is made to find all the resources that have that initial resource as an owner. Then each resource returned is processed similarly, eventually building up a list of Pods and from those a list of log file links. This generic approach works for any resource.
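
The traversal described above can be sketched roughly as follows. This is a minimal illustration of the idea, not KubeArchive's actual implementation; resources are simplified to plain dicts:

```python
# Rough sketch of the owner-reference walk: starting from an archived
# resource's UID, find every resource that names it as an owner, recurse,
# and collect the Pods reached along the way. Illustrative only.

def collect_pods(owner_uid, resources):
    """resources: iterable of dicts with 'uid', 'kind', and 'ownerUids'."""
    pods = []
    for child in resources:
        if owner_uid not in child["ownerUids"]:
            continue
        if child["kind"] == "Pod":
            pods.append(child)
        else:
            pods.extend(collect_pods(child["uid"], resources))
    return pods

# A PipelineRun owning a TaskRun, which in turn owns a Pod:
resources = [
    {"uid": "tr-1", "kind": "TaskRun", "ownerUids": ["pr-1"]},
    {"uid": "pod-1", "kind": "Pod", "ownerUids": ["tr-1"]},
]
print([p["uid"] for p in collect_pods("pr-1", resources)])  # ['pod-1']
```

Because the walk only relies on owner references, it behaves the same whether the starting resource is a PipelineRun, Job, CronJob, or any other kind.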

To support this, a KubeArchiveConfig must be configured so that the initial resource and all dependent resources, down to and including the Pods, are archived.

Example of KubeArchiveConfig that allows the retrieval of Job logs
---
apiVersion: kubearchive.org/v1
kind: KubeArchiveConfig
metadata:
  name: kubearchive
  namespace: test
spec:
  resources:
    - deleteWhen: has(status.completionTime)
      selector:
        apiVersion: "batch/v1"
        kind: Job
    - archiveOnDelete: true
      selector:
        apiVersion: "v1"
        kind: Pod

In this example, the Job is configured to be archived and deleted when its status contains a "completionTime" key. When that deletion happens, Kubernetes will in turn delete the associated Pods. Since archiveOnDelete is set to true for Pods, KubeArchive archives each Pod and generates the URLs for all its associated logs.

  • KubeArchive is not responsible for sending logs to the logging system. Log collection and forwarding must be configured separately, outside of KubeArchive.

  • When the Pod is archived, the URLs for accessing the logs are generated and stored with it. No attempt is made to query the logging system to verify that the logs exist.

Example of KubeArchiveConfig that allows the retrieval of PipelineRun and TaskRun logs
---
apiVersion: kubearchive.org/v1
kind: KubeArchiveConfig
metadata:
  name: kubearchive
  namespace: test
spec:
  resources:
    - selector:
        apiVersion: tekton.dev/v1
        kind: PipelineRun
      deleteWhen: has(status.completionTime)
    - selector:
        apiVersion: tekton.dev/v1
        kind: TaskRun
      archiveOnDelete: true
    - selector:
        apiVersion: v1
        kind: Pod
      archiveOnDelete: has(body.metadata.labels["tekton.dev/pipeline"])

In this example the following happens:

  • PipelineRuns are archived and then deleted when they complete.

  • TaskRuns are archived when they are deleted.

  • Pods are archived when they are deleted, but only if they are part of a Tekton Pipeline.

Configuration

The logging configuration is read once at startup. Changes to the logging ConfigMaps or Secret require restarting the affected components:

kubectl rollout restart deployment --selector=app=kubearchive-sink   (1)
kubectl rollout restart deployment --selector=app=kubearchive-api-server  (2)
1 Restart the Sink after changes to the writer ConfigMap.
2 Restart the API Server after changes to the reader ConfigMap or the secret.

The logging configuration is split into three Kubernetes resources:

  • kubearchive-logging-writer ConfigMap — used by the Sink to generate log URLs and query metadata at archival time.

  • kubearchive-logging-reader ConfigMap — used by the API Server to know how to query each logging backend.

  • kubearchive-logging Secret — used by the API Server to authenticate requests to the logging backends.

The API Server mounts the reader ConfigMap and the secret together as a projected volume. The Sink only mounts the writer ConfigMap.
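
Conceptually, the projected mount combines both sources into a single directory. A hedged sketch of what such a volume definition looks like follows; the volume name is illustrative and not taken from KubeArchive's actual manifests:

```yaml
# Illustrative projected volume combining the reader ConfigMap and the
# Secret; the volume name is hypothetical.
volumes:
  - name: logging-config
    projected:
      sources:
        - configMap:
            name: kubearchive-logging-reader
        - secret:
            name: kubearchive-logging
```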

Writer ConfigMap

The kubearchive-logging-writer ConfigMap is used by the Sink. It contains entries for generating log URLs and query metadata when resources are archived. The key LOG_URL is required and specifies the base URL of the logging backend. Other keys define template variables whose values are extracted from the resource using CEL expressions.

Example of kubearchive-logging-writer ConfigMap
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: kubearchive-logging-writer
  namespace: kubearchive
data:
  POD_ID: "cel:metadata.uid" (1)
  QUERY: "kubernetes.pod_id:{POD_ID} AND kubernetes.container_name:{CONTAINER_NAME}" (2)
  LOG_URL: "https://my-logging-backend.example.com:9200" (3)
1 Values prefixed with cel: are CEL expressions evaluated against the resource body. {POD_ID} and {CONTAINER_NAME} are substituted at URL generation time.
2 QUERY is stored alongside the log URL and made available to the reader at query time. The {CONTAINER_NAME} variable is always provided by KubeArchive.
3 The base URL of the logging backend. This URL is used as a key to match a provider in the reader ConfigMap and the secret headers.

Additional supported keys include NAMESPACE, START, and END, whose values are also stored and made available for variable substitution at query time.
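
The placeholder substitution can be illustrated with a small sketch. The helper below is hypothetical (KubeArchive's internals may differ); each {NAME} placeholder is replaced with the value resolved for that resource:

```python
# Illustrative placeholder substitution, as performed when log URLs and
# query metadata are generated at archival time. Values shown are made up.

def render(template, variables):
    for name, value in variables.items():
        template = template.replace("{" + name + "}", value)
    return template

query_template = "kubernetes.pod_id:{POD_ID} AND kubernetes.container_name:{CONTAINER_NAME}"
variables = {
    "POD_ID": "0f67c1b2-example-uid",  # would come from cel:metadata.uid
    "CONTAINER_NAME": "step-build",    # always provided by KubeArchive
}
print(render(query_template, variables))
# kubernetes.pod_id:0f67c1b2-example-uid AND kubernetes.container_name:step-build
```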

Reader ConfigMap

The kubearchive-logging-reader ConfigMap is used by the API Server. It contains a single key LOG_PROVIDERS with a YAML value that defines how to query each logging backend. The top-level keys in the YAML are the base URLs of the logging backends (matching the LOG_URL from the writer ConfigMap). Each backend defines a tail and/or full endpoint configuration.

Example of kubearchive-logging-reader ConfigMap
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: kubearchive-logging-reader
  namespace: kubearchive
data:
  LOG_PROVIDERS: |
    https://my-logging-backend.example.com:9200: (1)
      tail: (2)
        reverse: true (3)
        path: /index/_search (4)
        method: GET (5)
        params: (6)
          q: ${QUERY}
          size: ${TAIL_LINES}
          sort: "@timestamp:desc"
        json-path: "$.hits.hits[*]._source.message" (7)
      full: (8)
        reverse: false
        path: /index/_search
        method: GET
        params:
          q: ${QUERY}
          size: 10000
          sort: "@timestamp:asc"
        json-path: "$.hits.hits[*]._source.message"
1 The base URL must match the LOG_URL in the writer ConfigMap.
2 tail defines the endpoint used when the tailLines query parameter is provided.
3 When reverse is true, the API Server buffers all results and reverses them before returning. Buffering also occurs when the tailLines query parameter is set, so that only the last N lines are returned. When neither condition applies, results are streamed directly to the client as they are read.
4 The path appended to the base URL.
5 HTTP method. Supports GET (with query parameters) and POST (with JSON body).
6 Query parameters for GET requests. For POST requests, use body instead. Template variables like ${QUERY}, ${TAIL_LINES}, ${START}, ${END}, and ${NAMESPACE} are substituted at request time.
7 Optional JSONPath expression applied to the response body to extract log lines.
8 full defines the endpoint used when no tailLines parameter is provided.
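
The tail/full selection and the buffering rules described above can be sketched as a simplified model. This is illustrative only; fetch_backend stands in for the actual HTTP request to the logging backend:

```python
# Simplified model of endpoint selection and result handling in the reader.
# `fetch_backend` is a stand-in for the real HTTP call to the backend.

def get_logs(provider, fetch_backend, tail_lines=None):
    # tail endpoint when tailLines was supplied, full endpoint otherwise
    endpoint = provider["tail"] if tail_lines is not None else provider["full"]
    lines = fetch_backend(endpoint)
    if endpoint.get("reverse") or tail_lines is not None:
        buffered = list(lines)                 # buffer all results
        if endpoint.get("reverse"):
            buffered.reverse()                 # restore chronological order
        if tail_lines is not None:
            buffered = buffered[-tail_lines:]  # keep only the last N lines
        return buffered
    return lines                               # stream through unchanged

provider = {
    "tail": {"reverse": True, "path": "/index/_search"},
    "full": {"reverse": False, "path": "/index/_search"},
}
# A backend sorted with "@timestamp:desc" returns newest lines first:
newest_first = lambda endpoint: ["line-3", "line-2", "line-1"]
print(get_logs(provider, newest_first, tail_lines=2))  # ['line-2', 'line-3']
```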

Secret

Authentication headers for each logging backend are stored in a kubearchive-logging Secret under a single HEADERS key. The value is a YAML document that maps each backend base URL to its required HTTP headers.

Example of kubearchive-logging Secret
---
apiVersion: v1
kind: Secret
metadata:
  name: kubearchive-logging
  namespace: kubearchive
type: Opaque
stringData:
  HEADERS: | (1)
    https://my-logging-backend.example.com:9200: (2)
      Authorization: "Basic YWRtaW46cGFzc3dvcmQ=" (3)
1 The HEADERS key contains a YAML document with per-backend headers.
2 The base URL must match the LOG_URL in the writer ConfigMap.
3 HTTP headers sent with every request to this backend.

Supported Logging Systems

KubeArchive currently integrates with Elasticsearch, Splunk, and Loki.

Because the reader ConfigMap and the secret are keyed by base URL, multiple logging backends can coexist at the same time. This is useful when migrating from one backend to another — both can be configured in LOG_PROVIDERS and HEADERS simultaneously, allowing KubeArchive to serve logs from either backend depending on which URL was stored with the resource at archival time.
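
Because both maps share the base URL as their key, backend selection at query time reduces to a prefix match against the log URL stored with the resource. A hedged sketch of that lookup, with all URLs and structures made up for the example:

```python
# Illustrative provider lookup: the log URL stored at archival time begins
# with one backend's base URL, which selects that backend's reader config
# and headers. All names here are hypothetical.

log_providers = {
    "https://old-backend.example.com:9200": {"tail": {}, "full": {}},
    "http://new-backend.example.com:3100": {"tail": {}, "full": {}},
}
headers = {
    "https://old-backend.example.com:9200": {"Authorization": "Basic ..."},
    "http://new-backend.example.com:3100": {"X-Scope-OrgID": "kubearchive"},
}

def select_backend(stored_log_url):
    for base_url, provider in log_providers.items():
        if stored_log_url.startswith(base_url):
            return provider, headers.get(base_url, {})
    raise LookupError("no provider configured for " + stored_log_url)

_, hdrs = select_backend("http://new-backend.example.com:3100/loki/api/v1/query_range")
print(hdrs)  # {'X-Scope-OrgID': 'kubearchive'}
```

Resources archived before a migration keep resolving against the old backend, while newly archived resources resolve against the new one.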

Elasticsearch

Writer ConfigMap for Elasticsearch
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: kubearchive-logging-writer
data:
  POD_ID: "cel:metadata.uid"
  QUERY: "kubernetes.pod_id:{POD_ID} AND kubernetes.container_name:{CONTAINER_NAME}"
  LOG_URL: "https://kubearchive-es-http.elastic-system.svc.cluster.local:9200"
Reader ConfigMap for Elasticsearch
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: kubearchive-logging-reader
data:
  LOG_PROVIDERS: |
    https://kubearchive-es-http.elastic-system.svc.cluster.local:9200:
      tail:
        reverse: true
        path: /fluentd/_search
        method: GET
        params:
          q: ${QUERY}
          _source_includes: message
          sort: "@timestamp:desc"
          size: ${TAIL_LINES}
        json-path: "$.hits.hits[*]._source.message"
      full:
        reverse: false
        path: /fluentd/_search
        method: GET
        params:
          q: ${QUERY}
          _source_includes: message
          sort: "@timestamp:asc"
          size: 10000
        json-path: "$.hits.hits[*]._source.message"

Splunk

Writer ConfigMap for Splunk
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: kubearchive-logging-writer
data:
  POD_ID: "cel:metadata.uid"
  QUERY: 'search * | spath "kubernetes.pod_id" | search "kubernetes.pod_id"="{POD_ID}" | spath "kubernetes.container_name" | search "kubernetes.container_name"="{CONTAINER_NAME}" | sort time | table "message"'
  LOG_URL: "https://splunk-single-standalone-service.splunk-operator.svc.cluster.local:8089"
Reader ConfigMap for Splunk
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: kubearchive-logging-reader
data:
  LOG_PROVIDERS: |
    https://splunk-single-standalone-service.splunk-operator.svc.cluster.local:8089:
      tail:
        reverse: false
        path: /services/search/jobs/export
        method: GET
        params:
          search: ${QUERY} | head ${TAIL_LINES}
          output_mode: json
        json-path: "$.result.message"
      full:
        reverse: false
        path: /services/search/jobs/export
        method: GET
        params:
          search: ${QUERY}
          output_mode: json
        json-path: "$.result.message"
Secret for Splunk
---
apiVersion: v1
kind: Secret
metadata:
  name: kubearchive-logging
stringData:
  HEADERS: |
    https://splunk-single-standalone-service.splunk-operator.svc.cluster.local:8089:
      Authorization: "Basic YWRtaW46cGFzc3dvcmQ="

Loki

Writer ConfigMap for Loki
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: kubearchive-logging-writer
data:
  NAMESPACE: "cel:metadata.namespace"
  POD_ID: "cel:metadata.uid"
  START: "cel:status.?startTime == optional.none() ? int(now()-duration('1h'))*1000000000: status.startTime"
  QUERY: '{stream="{NAMESPACE}"} | pod_id = `{POD_ID}` | container = `{CONTAINER_NAME}`'
  LOG_URL: "http://loki-gateway.grafana-loki.svc.cluster.local:80"
Reader ConfigMap for Loki
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: kubearchive-logging-reader
data:
  LOG_PROVIDERS: |
    http://loki-gateway.grafana-loki.svc.cluster.local:80:
      tail:
        reverse: true
        path: /loki/api/v1/query_range
        method: GET
        params:
          query: ${QUERY}
          start: ${START}
          limit: ${TAIL_LINES}
          direction: backward
        json-path: "$.data.result[*].values[*][1]"
      full:
        reverse: false
        path: /loki/api/v1/query_range
        method: GET
        params:
          query: ${QUERY}
          start: ${START}
          limit: 10000
          direction: forward
        json-path: "$.data.result[*].values[*][1]"
Secret for Loki
---
apiVersion: v1
kind: Secret
metadata:
  name: kubearchive-logging
stringData:
  HEADERS: |
    http://loki-gateway.grafana-loki.svc.cluster.local:80:
      X-Scope-OrgID: "kubearchive"