Logging Integrations
Overview
KubeArchive supports logging, but it is not a logging system itself and does not implement one. Instead, KubeArchive integrates with existing logging systems and provides URLs for retrieving the log files that the logging system holds for a specific Kubernetes resource.
It is important to note that logs are tied to Pods. When a user requests the logs
for a Tekton PipelineRun, what they expect to get back are the logs attached to the
Pods that were part of the PipelineRun. Similar cases exist for requesting logs for
Jobs and CronJobs. KubeArchive handles this seamlessly for the user.
KubeArchiveConfig Configuration
KubeArchive retrieves log URLs using the owner references field of a resource.
When logs for a resource are requested, a query is made to find all the resources
that have that initial resource as an owner. Then each resource returned is
processed similarly, eventually building up a list of Pods and from those a
list of log file links. This generic approach works for any resource.
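For illustration, here is a minimal sketch of the ownerReferences entry a Pod created by a Job would carry (the names and UID are made up); KubeArchive walks these references in the opposite direction, from the requested resource down to its Pods:
kind: Pod
metadata:
  name: hello-job-abc12          # hypothetical Pod name
  ownerReferences:
    - apiVersion: batch/v1
      kind: Job
      name: hello-job            # the owning Job
      uid: 1a2b3c4d-5e6f-7a8b-9c0d-1e2f3a4b5c6d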
A KubeArchiveConfig must be set up to support this, meaning it must be configured so that
the initial resource and any dependent resources, all the way down to and including the
Pods, are archived.
Here's a sample KubeArchiveConfig:
---
apiVersion: kubearchive.kubearchive.org/v1alpha1
kind: KubeArchiveConfig
metadata:
  name: kubearchive
  namespace: test
spec:
  resources:
    - deleteWhen: has(status.completionTime)
      selector:
        apiVersion: ""
        kind: CronJob
    - archiveOnDelete: true
      selector:
        apiVersion: ""
        kind: Pod
In this example, the CronJob is configured to be archived and deleted when
the status contains a "completionTime" key. When that deletion happens,
Kubernetes will in turn delete the associated Pod. Since we have
configured archiveOnDelete for Pods to be true, KubeArchive will archive
the Pod itself and generate the URLs for all the associated logs.
Here’s another sample KubeArchiveConfig for PipelineRuns:
---
apiVersion: kubearchive.kubearchive.org/v1alpha1
kind: KubeArchiveConfig
metadata:
  name: kubearchive
  namespace: test
spec:
  resources:
    - selector:
        apiVersion: tekton.dev/v1
        kind: PipelineRun
      deleteWhen: has(status.completionTime)
    - selector:
        apiVersion: tekton.dev/v1
        kind: TaskRun
      archiveOnDelete: true
    - selector:
        apiVersion: v1
        kind: Pod
      archiveOnDelete: has(body.metadata.labels["tekton.dev/pipeline"])
In this example, the following happens:
- PipelineRuns are archived when they complete.
- TaskRuns are archived when they are deleted.
- Pods are archived when they are deleted and are also part of a Tekton Pipeline (see the example Pod metadata below).
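For reference, a Pod created by Tekton for a TaskRun that is part of a Pipeline typically carries labels like the following (the names here are made up), which is what the archiveOnDelete expression above checks for:
kind: Pod
metadata:
  name: build-run-fetch-source-pod     # hypothetical name
  labels:
    tekton.dev/pipeline: build-pipeline
    tekton.dev/pipelineRun: build-run
    tekton.dev/taskRun: build-run-fetch-source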
Configuring Log URL Generation
Logging URL generation in KubeArchive is controlled using the ConfigMap
named kubearchive-logging in the KubeArchive installation namespace.
URLs are generated using parameters that allow the URL to point to the
specific log file associated with the given Pod. This ConfigMap
requires a single entry, LOG_URL, whose string value will be interpolated
using the other variables defined in the ConfigMap. For example, take
this ConfigMap for Splunk:
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: kubearchive-logging
  namespace: kubearchive
data:
  POD_ID: cel:metadata.uid
  POD: 'spath "kubernetes.pod_id" | search "kubernetes.pod_id"="{POD_ID}"'
  CONTAINER: 'spath "kubernetes.container_name" | search "kubernetes.container_name"="{CONTAINER_NAME}"'
  LOG_URL: http://127.0.0.1:8111/app/search/search?q=search * | {POD} | {CONTAINER}
The value of each variable is either a plain string or a CEL expression. A value that begins with the prefix "cel:" is evaluated as a CEL expression against the body of the cloud event (that is, the archived resource) to determine the actual value used in the substitution. For example, POD_ID: cel:metadata.uid above resolves to the metadata.uid of the Pod being archived.
When a Pod is archived, KubeArchive generates the log URLs to be stored as follows:
- All variables containing CEL expressions are evaluated against the Pod resource being archived.
- The value of LOG_URL is then interpolated recursively using the values in the ConfigMap until no more substitutions can be made, resulting in the final URL to the log in the logging system (see the worked example below).
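As a worked example against the Splunk ConfigMap above, assume a Pod with metadata.uid 1a2b3c4d-5e6f-7a8b-9c0d-1e2f3a4b5c6d and a container named step-build (both values are made up):
POD_ID    -> 1a2b3c4d-5e6f-7a8b-9c0d-1e2f3a4b5c6d
POD       -> spath "kubernetes.pod_id" | search "kubernetes.pod_id"="1a2b3c4d-5e6f-7a8b-9c0d-1e2f3a4b5c6d"
CONTAINER -> spath "kubernetes.container_name" | search "kubernetes.container_name"="step-build"
LOG_URL   -> http://127.0.0.1:8111/app/search/search?q=search * | spath "kubernetes.pod_id" | search "kubernetes.pod_id"="1a2b3c4d-5e6f-7a8b-9c0d-1e2f3a4b5c6d" | spath "kubernetes.container_name" | search "kubernetes.container_name"="step-build"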
NOTE: The variable CONTAINER_NAME is provided by KubeArchive at URL generation time and does not need to be defined in the ConfigMap.

NOTE: The KubeArchive Sink is not aware of changes to the kubearchive-logging ConfigMap.
Supported Logging Systems
KubeArchive currently integrates with both Splunk and Elasticsearch.
Elasticsearch
Following is a sample ConfigMap that generates log URLs for Elasticsearch.
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: kubearchive-logging
  namespace: kubearchive
data:
  # NOTE: CONTAINER_NAME is provided at URL generation time by KubeArchive.
  POD_ID: "cel:metadata.uid"
  LOG_URL: "https://localhost:9200/fluentd/_search?_source_includes=message&size=10000&sort=_doc&q=kubernetes.pod_id:{POD_ID}%20AND%20kubernetes.container_name:{CONTAINER_NAME}"
  LOG_URL_JSONPATH: "$.hits.hits[*]._source.message"
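With the same hypothetical Pod UID and container name used earlier (1a2b3c4d-5e6f-7a8b-9c0d-1e2f3a4b5c6d and step-build), the generated URL would be:
https://localhost:9200/fluentd/_search?_source_includes=message&size=10000&sort=_doc&q=kubernetes.pod_id:1a2b3c4d-5e6f-7a8b-9c0d-1e2f3a4b5c6d%20AND%20kubernetes.container_name:step-build
The LOG_URL_JSONPATH expression then selects the message field from each hit in the Elasticsearch response, yielding the individual log lines.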
Splunk
Following is a sample ConfigMap for generating URLs for Splunk.
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: kubearchive-logging
  namespace: kubearchive
data:
  # NOTE: CONTAINER_NAME is provided at URL generation time by KubeArchive.
  POD_ID: "cel:metadata.uid"
  LOG_URL: "https://localhost:8089/services/search/jobs/export?search=search%20%2A%20%7C%20spath%20%22kubernetes.pod_id%22%20%7C%20search%20%22kubernetes.pod_id%22%3D%22{POD_ID}%22%20%7C%20spath%20%22kubernetes.container_name%22%20%7C%20search%20%22kubernetes.container_name%22%3D%22{CONTAINER_NAME}%22%20%7C%20sort%20time%20%7C%20table%20%22message%22&output_mode=json"
  LOG_URL_JSONPATH: "$[*].result.message"
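For readability, the URL-encoded search parameter in LOG_URL above decodes to the following Splunk query, with {POD_ID} and {CONTAINER_NAME} substituted at URL generation time:
search * | spath "kubernetes.pod_id" | search "kubernetes.pod_id"="{POD_ID}" | spath "kubernetes.container_name" | search "kubernetes.container_name"="{CONTAINER_NAME}" | sort time | table "message"
The output_mode=json parameter makes Splunk return JSON, from which the LOG_URL_JSONPATH expression extracts the message field of each result.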