Logging Integration

Overview

KubeArchive needs to support logging, but it is not a logging system itself and does not implement logging. Instead, KubeArchive will integrate with logging systems and provide URLs for retrieving log files from the logging system for a specific Kubernetes resource.

It is important to note that logs are tied to Pods. When a user requests the logs for a Tekton PipelineRun, what they expect to get back are the logs attached to the Pods that were part of the PipelineRun. Similar cases exist for requesting logs for Jobs and CronJobs. KubeArchive has to be able to handle this seamlessly for the user.

Retrieving Log Information

In generic terms, this can be done using the owner reference field in a resource, a sort of backwards recursive search. When a PipelineRun is deleted, all the TaskRuns associated with the PipelineRun, and Pods associated with those TaskRuns, are deleted. This is done using the owner references.

KubeArchive can do things similarly. When logs for a resource are requested, a query is made to find all the resources that have that initial resource as an owner. Then each resource returned is processed similarly, eventually building up a list of log file links. This generic approach should work for any resource.

This also implies that KubeArchive is configured correctly to support this. It must be configured so that the initial resource and any dependent resources, all the way down to and including the Pods, are archived.

Here’s a sample KubeArchiveConfig as an example:

---
apiVersion: kubearchive.org/v1
kind: KubeArchiveConfig
metadata:
  name: kubearchive
  namespace: test
spec:
  resources:
    - deleteWhen: has(status.completionTime)
      selector:
        apiVersion: ""
        kind: CronJob
    - archiveOnDelete: true
      selector:
        apiVersion: ""
        kind: Pod

So in this case, the CronJob is configured to be archived and deleted when the status contains a "completionTime" key. When that deletion happens, kubernetes will turn around and delete the associated Pod. Since we have configured archiveOnDelete for Pods to be true, KubeArchive will archive the Pod itself and generate the URLs for all the associated logs. The configuration would be similar for PipelineRuns, with the addition of the archiving of the TaskRuns.

KubeArchive has no responsibility for sending the logs to the logging system. This is all configured elsewhere and outside of KubeArchive.
When the Pod is archived, the URL for accessing the log should be generated and stored with it. There is no attempt to query the logging system to verify the existence of the log.

Here’s another sample KubeArchiveConfig for PipelineRuns:

---
apiVersion: kubearchive.org/v1
kind: KubeArchiveConfig
metadata:
  name: kubearchive
  namespace: test
spec:
  resources:
    - selector:
        apiVersion: tekton.dev/v1
        kind: PipelineRun
      deleteWhen: has(status.completionTime)
    - selector:
        apiVersion: tekton.dev/v1
        kind: TaskRun
      archiveOnDelete: true
    - selector:
        apiVersion: v1
        kind: Pod
      archiveOnDelete: has(body.metadata.labels["tekton.dev/pipeline"])

In this case the following happens:

PipelineRuns are archived when they complete.
TaskRuns are archived when they are deleted.
Pods are archived when they are deleted and are also part of a Tekton Pipeline.

Configuration

The logging system is an integration in KubeArchive. The integration requires the successful generation of a URL to access the log for a specific Pod in the logging system and the credentials to access that logging system. URLs will ONLY be generated for Pods.

ConfigMap

To support multiple logging systems, the URLs must be able to be parameterized based on the logging system. This is done via a ConfigMap named kubearchive-logging. The ConfigMap contains entries that are used to generate logging URLs. The only required key in this ConfigMap is LOG_URL.

Example of kubearchive-logging ConfigMap for Splunk

---
apiVersion: v1
kind: ConfigMap
metadata:
  name: kubearchive-logging
  namespace: kubearchive
data:
  # CONTAINER_NAME: "cel:spec.containers.map(m, m.name)" (1)
  POD_ID: "cel:metadata.uid" (2)
  LOG_URL: "https://splunk-single-standalone-service.splunk-operator.svc.cluster.local:8089/services/search/jobs/export?search=search%20%2A%20%7C%20spath%20%22kubernetes.pod_id%22%20%7C%20search%20%22kubernetes.pod_id%22%3D%22{POD_ID}%22%20%7C%20spath%20%22kubernetes.container_name%22%20%7C%20search%20%22kubernetes.container_name%22%3D%22{CONTAINER_NAME}%22%20%7C%20sort%20time%20%7C%20table%20%22message%22&output_mode=json" (3)
  LOG_URL_JSONPATH: "$.hits.hits[*]._source.message" (4)

1	The `CONTAINER_NAME` parameter is provided at URL generation time by KubeArchive.
2	The string to replace `POD_ID` inside the `LOG_URL` template. If it’s prefixed with `cel:`, it is considered as a CEL expression to locate its value within the body of the Cloud Event where the resource is stored.
3	The template for the log URL. The `CONTAINER_NAME` variable is allowed even if it’s not defined in the `ConfigMap` as it’s provided at URL generation time.
4	Optional JSONPath Expression applied by the API Server to the output of the response body.

Secret

The credentials to authorize the access to the logging backend API are stored in a Secret named kubearhive-logging with the mandatory keys USER and PASSWORD:

Example of kubearchive-logging Secret

---
apiVersion: v1
kind: Secret
metadata:
  name: kubearchive-logging
  namespace: kubearchive
type: Opaque
stringData: (1)
  USER: user
  PASSWORD: password # notsecret

1	The user and password used for HTTP Basic Access Authentication

Implementation

Database

The KubeArchive database will have a table named log_url with three fields:

A uuid field which is a foreign key to resource.uuid.
A url field which is the URL for one of the logs.
A container_name field which indicates the container that generated the log.

The uuid field should point back to a Pod entry in the resource table.

Sink

When the sink archives a Pod, it must take the additional step go gather all the log information and generate the log URL for each. These are stored in the log_url table.

The sink should first delete any existing entries in the log_url table for the Pod being archived. Earlier archival requests may have already created records in the log_url table, and they should be removed to avoid duplicates.

The sink will mount and use the kubearchive-logging ConfigMap for logging.

When generating the logging URL to be stored when a Pod is archived, the sink does the following steps:

A map is created and populated all non-CEL expression key-value pairs from the ConfigMap.
The key CONTAINER_NAME with the value cel:spec.containers.map(m, m.name) is added to the map. If the ConfigMap contained the key CONTAINER_NAME, its value is overwritten
All variables containing CEL expression variables are added to the map, and the value for each of these variables is the value returned by evaluating the CEL expression.
The value for LOG_URL is then interpolated recursively using this map until no more substitutions are done, resulting in the final URL to the log in the logging system.

API

The API mounts and use the kubearchive-logging Secret to authenticate against the backend logging API and the ConfigMap to retrieve the optional JSON_PATH key.

The API exposes the logs under the endpoints:

/:version/namespaces/:namespace/:resourceType/:name/log for core resources
/:group/:version/namespaces/:namespace/:resourceType/:name/log for non core resources

The API traverses the owner references to gather all the logs associated with the given resource. Note that logs could be queried for any resource. Most resource will not have any logs associated with them or their descendents, but queries on logs for TaskRuns and Pods are possible.

The API perform post-processing on the output from the response to log URL in order to get the actual log output if a JSONPath expression is configured to be applied to the response body.

CLI

The CLI implements a logs command similar to kubectl logs.

ka logs resource name

This command will return the log contents for the default container in the Pod unless specified with the -c option.

Example

ka logs PipelineRun generate-logs-9fkp8 -n generate-logs-pipelines -c generate

This will return the log URLs associated with the PipelineRun named "generate-logs-9fkp8" for the container "generate".