Logging Integration
Overview
KubeArchive needs to support logging, but it is not a logging system itself and does not implement logging. Instead, KubeArchive will integrate with logging systems and provide URLs for retrieving log files from the logging system for a specific Kubernetes resource.
It is important to note that logs are tied to Pods. When a user requests the logs
for a Tekton PipelineRun, what they expect to get back are the logs attached to the
Pods that were part of the PipelineRun. Similar cases exist for requesting logs for
Jobs and CronJobs. KubeArchive has to be able to handle this seamlessly for the user.
Retrieving Log Information
In generic terms, this can be done using the owner reference field in a resource, a
sort of backwards recursive search. When a PipelineRun is deleted, all the TaskRuns
associated with the PipelineRun, and Pods associated with those TaskRuns, are
deleted. This is done using the owner references.
KubeArchive can do things similarly. When logs for a resource are requested, a query is made to find all the resources that have that initial resource as an owner. Then each resource returned is processed similarly, eventually building up a list of log file links. This generic approach should work for any resource.
This also implies that KubeArchive is configured correctly to support this. It must
be configured so that the initial resource and any dependent resources, all the way
down to and including the Pods, are archived.
Here’s a sample KubeArchiveConfig as an example:
---
apiVersion: kubearchive.org/v1
kind: KubeArchiveConfig
metadata:
name: kubearchive
namespace: test
spec:
resources:
- deleteWhen: has(status.completionTime)
selector:
apiVersion: ""
kind: CronJob
- archiveOnDelete: true
selector:
apiVersion: ""
kind: Pod
So in this case, the CronJob is configured to be archived and deleted when
the status contains a "completionTime" key. When that deletion happens,
kubernetes will turn around and delete the associated Pod. Since we have
configured archiveOnDelete for Pods to be true, KubeArchive will archive
the Pod itself and generate the URLs for all the associated logs. The
configuration would be similar for PipelineRuns, with the addition of
the archiving of the TaskRuns.
|
Here’s another sample KubeArchiveConfig for PipelineRuns:
---
apiVersion: kubearchive.org/v1
kind: KubeArchiveConfig
metadata:
name: kubearchive
namespace: test
spec:
resources:
- selector:
apiVersion: tekton.dev/v1
kind: PipelineRun
deleteWhen: has(status.completionTime)
- selector:
apiVersion: tekton.dev/v1
kind: TaskRun
archiveOnDelete: true
- selector:
apiVersion: v1
kind: Pod
archiveOnDelete: has(body.metadata.labels["tekton.dev/pipeline"])
In this case the following happens:
-
PipelineRunsare archived when they complete. -
TaskRunsare archived when they are deleted. -
Podsare archived when they are deleted and are also part of a TektonPipeline.
Configuration
The logging system is an integration in KubeArchive.
The integration requires the successful generation of a URL to access the log for a
specific Pod in the logging system and the credentials
to access that logging system. URLs will ONLY be generated for Pods.
ConfigMap
To support multiple logging systems, the URLs must be able to be parameterized
based on the logging system. This is done via a ConfigMap named
kubearchive-logging.
The ConfigMap contains entries that are used to generate logging URLs.
The only required key in this ConfigMap is LOG_URL.
---
apiVersion: v1
kind: ConfigMap
metadata:
name: kubearchive-logging
namespace: kubearchive
data:
# CONTAINER_NAME: "cel:spec.containers.map(m, m.name)" (1)
POD_ID: "cel:metadata.uid" (2)
LOG_URL: "https://splunk-single-standalone-service.splunk-operator.svc.cluster.local:8089/services/search/jobs/export?search=search%20%2A%20%7C%20spath%20%22kubernetes.pod_id%22%20%7C%20search%20%22kubernetes.pod_id%22%3D%22{POD_ID}%22%20%7C%20spath%20%22kubernetes.container_name%22%20%7C%20search%20%22kubernetes.container_name%22%3D%22{CONTAINER_NAME}%22%20%7C%20sort%20time%20%7C%20table%20%22message%22&output_mode=json" (3)
LOG_URL_JSONPATH: "$.hits.hits[*]._source.message" (4)
| 1 | The CONTAINER_NAME parameter is provided at URL generation time by KubeArchive. |
| 2 | The string to replace POD_ID inside the LOG_URL template.
If it’s prefixed with cel:, it is considered as a CEL expression to locate its value
within the body of the Cloud Event where the resource is stored. |
| 3 | The template for the log URL. The CONTAINER_NAME variable is allowed
even if it’s not defined in the ConfigMap as it’s provided at URL generation time. |
| 4 | Optional JSONPath Expression applied by the API Server to the output of the response body. |
Secret
The credentials to authorize the access to the logging backend API are stored in a Secret
named kubearhive-logging with the mandatory keys USER and PASSWORD:
---
apiVersion: v1
kind: Secret
metadata:
name: kubearchive-logging
namespace: kubearchive
type: Opaque
stringData: (1)
USER: user
PASSWORD: password # notsecret
| 1 | The user and password used for HTTP Basic Access Authentication |
Implementation
Database
The KubeArchive database will have a table named log_url with three fields:
-
A
uuidfield which is a foreign key toresource.uuid. -
A
urlfield which is the URL for one of the logs. -
A
container_namefield which indicates the container that generated the log.
The uuid field should point back to a Pod entry in the resource table.
Sink
When the sink archives a Pod, it must take the additional step go gather all
the log information and generate the log URL for each. These are stored in the
log_url table.
The sink should first delete any existing entries in the log_url table for the
Pod being archived. Earlier archival requests may have already created records
in the log_url table, and they should be removed to avoid duplicates.
The sink will mount and use the kubearchive-logging ConfigMap
for logging.
When generating the logging URL to be stored when a Pod is archived,
the sink does the following steps:
-
A map is created and populated all non-CEL expression key-value pairs from the
ConfigMap. -
The key
CONTAINER_NAMEwith the valuecel:spec.containers.map(m, m.name)is added to the map. If theConfigMapcontained the keyCONTAINER_NAME, its value is overwritten -
All variables containing CEL expression variables are added to the map, and the value for each of these variables is the value returned by evaluating the CEL expression.
-
The value for LOG_URL is then interpolated recursively using this map until no more substitutions are done, resulting in the final URL to the log in the logging system.
API
The API mounts and use the kubearchive-logging Secret to authenticate
against the backend logging API and the ConfigMap to retrieve the optional JSON_PATH
key.
The API exposes the logs under the endpoints:
-
/:version/namespaces/:namespace/:resourceType/:name/logfor core resources -
/:group/:version/namespaces/:namespace/:resourceType/:name/logfor non core resources
The API traverses the owner references to gather all the logs associated
with the given resource. Note that logs could be queried for any resource. Most
resource will not have any logs associated with them or their descendents, but queries on
logs for TaskRuns and Pods are possible.
The API perform post-processing on the output from the response to log URL in order to get the actual log output if a JSONPath expression is configured to be applied to the response body.
CLI
The CLI implements a logs command similar to kubectl logs.
ka logs resource name
This command will return the log contents for the default container in the Pod unless
specified with the -c option.
ka logs PipelineRun generate-logs-9fkp8 -n generate-logs-pipelines -c generate
This will return the log URLs associated with the PipelineRun named "generate-logs-9fkp8"
for the container "generate".