Logging Integration
Overview
KubeArchive needs to support logging, but it is not a logging system itself and does not implement logging. Instead, KubeArchive will integrate with logging systems and provide URLs for retrieving log files from the logging system for a specific Kubernetes resource.
It is important to note that logs are tied to Pods
. When a user requests the logs
for a Tekton PipelineRun
, what they expect to get back are the logs attached to the
Pods
that were part of the PipelineRun
. Similar cases exist for requesting logs for
Jobs
and CronJobs
. KubeArchive has to be able to handle this seamlessly for the user.
Retrieving Log Information
In generic terms, this can be done using the owner reference field in a resource, a
sort of backwards recursive search. When a PipelineRun
is deleted, all the TaskRuns
associated with the PipelineRun
, and Pods
associated with those TaskRuns
, are
deleted. This is done using the owner references.
KubeArchive can do things similarly. When logs for a resource are requested, a query is made to find all the resources that have that initial resource as an owner. Then each resource returned is processed similarly, eventually building up a list of log file links. This generic approach should work for any resource.
This also implies that KubeArchive is configured correctly to support this. It must
be configured so that the initial resource and any dependent resources, all the way
down to and including the Pods
, are archived.
Here’s a sample KubeArchiveConfig
as an example:
---
apiVersion: kubearchive.kubearchive.org/v1alpha1
kind: KubeArchiveConfig
metadata:
name: kubearchive
namespace: test
spec:
resources:
- deleteWhen: has(status.completionTime)
selector:
apiVersion: ""
kind: CronJob
- archiveOnDelete: true
selector:
apiVersion: ""
kind: Pod
So in this case, the CronJob
is configured to be archived and deleted when
the status contains a "completionTime" key. When that deletion happens,
kubernetes will turn around and delete the associated Pod
. Since we have
configured archiveOnDelete
for Pods
to be true, KubeArchive will archive
the Pod
itself and generate the URLs for all the associated logs. The
configuration would be similar for PipelineRuns
, with the addition of
the archiving of the TaskRuns
.
|
Here’s another sample KubeArchiveConfig
for PipelineRuns
:
---
apiVersion: kubearchive.kubearchive.org/v1alpha1
kind: KubeArchiveConfig
metadata:
name: kubearchive
namespace: test
spec:
resources:
- selector:
apiVersion: tekton.dev/v1
kind: PipelineRun
deleteWhen: has(status.completionTime)
- selector:
apiVersion: tekton.dev/v1
kind: TaskRun
archiveOnDelete: true
- selector:
apiVersion: v1
kind: Pod
archiveOnDelete: has(body.metadata.labels["tekton.dev/pipeline"])
In this case the following happens:
-
PipelineRuns
are archived when they complete. -
TaskRuns
are archived when they are deleted. -
Pods
are archived when they are deleted and are also part of a TektonPipeline
.
Configuration
The logging system is an integration in KubeArchive.
The integration requires the successful generation of a URL to access the log for a
specific Pod
in the logging system and the credentials
to access that logging system. URLs will ONLY be generated for Pods
.
ConfigMap
To support multiple logging systems, the URLs must be able to be parameterized
based on the logging system. This is done via a ConfigMap
named
kubearchive-logging
.
The ConfigMap contains entries that are used to generate logging URLs.
The only required key in this ConfigMap is LOG_URL
.
---
apiVersion: v1
kind: ConfigMap
metadata:
name: kubearchive-logging
namespace: kubearchive
data:
# CONTAINER_NAME: "cel:spec.containers.map(m, m.name)" (1)
POD_ID: "cel:metadata.uid" (2)
LOG_URL: "https://splunk-single-standalone-service.splunk-operator.svc.cluster.local:8089/services/search/jobs/export?search=search%20%2A%20%7C%20spath%20%22kubernetes.pod_id%22%20%7C%20search%20%22kubernetes.pod_id%22%3D%22{POD_ID}%22%20%7C%20spath%20%22kubernetes.container_name%22%20%7C%20search%20%22kubernetes.container_name%22%3D%22{CONTAINER_NAME}%22%20%7C%20sort%20time%20%7C%20table%20%22message%22&output_mode=json" (3)
LOG_URL_JSONPATH: "$.hits.hits[*]._source.message" (4)
1 | The CONTAINER_NAME parameter is provided at URL generation time by KubeArchive. |
2 | The string to replace POD_ID inside the LOG_URL template.
If it’s prefixed with cel: , it is considered as a CEL expression to locate its value
within the body of the Cloud Event where the resource is stored. |
3 | The template for the log URL. The CONTAINER_NAME variable is allowed
even if it’s not defined in the ConfigMap as it’s provided at URL generation time. |
4 | Optional JSONPath Expression applied by the API Server to the output of the response body. |
Secret
The credentials to authorize the access to the logging backend API are stored in a Secret
named kubearhive-logging
with the mandatory keys USER
and PASSWORD
:
---
apiVersion: v1
kind: Secret
metadata:
name: kubearchive-logging
namespace: kubearchive
type: Opaque
stringData: (1)
USER: user
PASSWORD: password # notsecret
1 | The user and password used for HTTP Basic Access Authentication |
Implementation
Database
The KubeArchive database will have a table named log_url
with three fields:
-
A
uuid
field which is a foreign key toresource.uuid
. -
A
url
field which is the URL for one of the logs. -
A
container_name
field which indicates the container that generated the log.
The uuid
field should point back to a Pod
entry in the resource
table.
Sink
When the sink archives a Pod
, it must take the additional step go gather all
the log information and generate the log URL for each. These are stored in the
log_url
table.
The sink should first delete any existing entries in the log_url
table for the
Pod
being archived. Earlier archival requests may have already created records
in the log_url
table, and they should be removed to avoid duplicates.
The sink will mount and use the kubearchive-logging
ConfigMap
for logging.
When generating the logging URL to be stored when a Pod
is archived,
the sink does the following steps:
-
A map is created and populated all non-CEL expression key-value pairs from the
ConfigMap
. -
The key
CONTAINER_NAME
with the valuecel:spec.containers.map(m, m.name)
is added to the map. If theConfigMap
contained the keyCONTAINER_NAME
, its value is overwritten -
All variables containing CEL expression variables are added to the map, and the value for each of these variables is the value returned by evaluating the CEL expression.
-
The value for LOG_URL is then interpolated recursively using this map until no more substitutions are done, resulting in the final URL to the log in the logging system.
API
The API mounts and use the kubearchive-logging
Secret
to authenticate
against the backend logging API and the ConfigMap
to retrieve the optional JSON_PATH
key.
The API exposes the logs under the endpoints:
-
/:version/namespaces/:namespace/:resourceType/:name/log
for core resources -
/:group/:version/namespaces/:namespace/:resourceType/:name/log
for non core resources
The API traverses the owner references to gather all the logs associated
with the given resource. Note that logs could be queried for any resource. Most
resource will not have any logs associated with them or their descendents, but queries on
logs for TaskRuns
and Pods
are possible.
The API perform post-processing on the output from the response to log URL in order to get the actual log output if a JSONPath expression is configured to be applied to the response body.
CLI
The CLI implements a logs
command similar to kubectl logs
.
ka logs resource name
This command will return the log contents for the default container in the Pod
unless
specified with the -c
option.
ka logs PipelineRun generate-logs-9fkp8 -n generate-logs-pipelines -c generate
This will return the log URLs associated with the PipelineRun
named "generate-logs-9fkp8"
for the container "generate".