Logging Integration
Overview
KubeArchive needs to support logging, but it is not a logging system itself and does not implement logging. Instead, KubeArchive will integrate with logging systems and provide URLs for retrieving log files from the logging system for a specific Kubernetes resource.
It is important to note that logs are tied to Pods
. When a user requests the logs
for a Tekton PipelineRun
, what they expect to get back are the logs attached to the
Pods
that were part of the PipelineRun
. Similar cases exist for requesting logs for
Jobs
and CronJobs
. KubeArchive has to be able to handle this seamlessly for the user.
Retrieving Log Information
In generic terms, this can be done using the owner reference field in a resource, a
sort of backwards recursive search. When a PipelineRun
is deleted, all the TaskRuns
associated with the PipelineRun
, and Pods
associated with those TaskRuns
, are
deleted. This is done using the owner references.
KubeArchive can do things similarly. When logs for a resource are requested, a query is made to find all the resources that have that initial resource as an owner. Then each resource returned is processed similarly, eventually building up a list of log file links. This generic approach should work for any resource.
This also implies that KubeArchive is configured correctly to support this. It must
be configured so that the initial resource and any dependent resources, all the way
down to and including the Pods
, are archived.
Here’s a sample KubeArchiveConfig
as an example:
---
apiVersion: kubearchive.kubearchive.org/v1alpha1
kind: KubeArchiveConfig
metadata:
name: kubearchive
namespace: test
spec:
resources:
- deleteWhen: has(status.completionTime)
selector:
apiVersion: ""
kind: CronJob
- archiveOnDelete: true
selector:
apiVersion: ""
kind: Pod
So in this case, the CronJob
is configured to be archived and deleted when
the status contains a "completionTime" key. When that deletion happens,
kubernetes will turn around and delete the associated Pod
. Since we have
configured archiveOnDelete
for Pods
to be true, KubeArchive will archive
the Pod
itself and generate the URLs for all the associated logs. The
configuration would be similar for PipelineRuns
, with the addition of
the archiving of the TaskRuns
.
|
Here’s another sample KubeArchiveConfig
for PipelineRuns
:
---
apiVersion: kubearchive.kubearchive.org/v1alpha1
kind: KubeArchiveConfig
metadata:
name: kubearchive
namespace: test
spec:
resources:
- selector:
apiVersion: tekton.dev/v1
kind: PipelineRun
deleteWhen: has(status.completionTime)
- selector:
apiVersion: tekton.dev/v1
kind: TaskRun
archiveOnDelete: true
- selector:
apiVersion: v1
kind: Pod
archiveOnDelete: has(body.metadata.labels["tekton.dev/pipeline"])
In this case the following happens:
-
PipelineRuns
are archived when they complete. -
TaskRuns
are archived when they are deleted. -
Pods
are archived when they are deleted and are also part of a TektonPipeline
.
Generating Log URLs
The logging system is an integration in KubeArchive. All that is required for
the integration is the successful generation of a URL to access the log for a
specific Pod
in the logging system. URLs will ONLY be generated for Pods
.
To support multiple logging systems, the URLs must be able to be parameterized
based on the logging system. This is done via a ConfigMap
named
kubearchive-logging
. This ConfigMap
requires a single entry,
LOG_URL, whose string value will be interpolated using the other variables
defined in the ConfigMap
. For example, take this ConfigMap
for Splunk:
---
apiVersion: v1
kind: ConfigMap
metadata:
name: kubearchive-logging
namespace: kubearchive
data:
# NOTE: CONTAINER_NAME is provided at URL generation time by KubeArchive.
POD_ID: "cel:metadata.uid"
LOG_URL: "https://localhost:8089/services/search/jobs/export?search=search%20%2A%20%7C%20spath%20%22kubernetes.pod_id%22%20%7C%20search%20%22kubernetes.pod_id%22%3D%22{POD_ID}%22%20%7C%20spath%20%22kubernetes.container_name%22%20%7C%20search%20%22kubernetes.container_name%22%3D%22{CONTAINER_NAME}%22%20%7C%20sort%20time%20%7C%20table%20%22message%22&output_mode=json"
LOG_URL_JSONPATH: "$.hits.hits[*]._source.message"
The value of each variable is either a string or a CEL expression. A value that begins with the prefix "cel:" will be evaluated as a CEL expression against the body of the cloud event (that is the resource) to determine the real value of that variable used in the substitution. For example:
So when generating the logging URL to be stored when a Pod
is archived,
the following steps are done:
-
A map is created and populated all non-CEL expression key-value pairs from the
ConfigMap
. -
The key
CONTAINER_NAME
with the valuecel:spec.containers.map(m, m.name)
is added to the map. If theConfigMap
contained the keyCONTAINER_NAME
, its value is overwritten -
All variables containing CEL expression variables are added to the map, and the value for each of these variables is the value returned by evaluating the CEL expression.
-
The value for LOG_URL is then interpolated recursively using this map until no more substitutions are done, resulting in the final URL to the log in the logging system.
If the |
Implementation
Database
The KubeArchive database will have a table named log_url
with three fields:
-
A
uuid
field which is a foreign key toresource.uuid
. -
A
url
field which is the URL for one of the logs. -
A
container_name
field which indicates the container that generated the log.
The uuid
field should point back to a Pod
entry in the resource
table.
Sink
When the sink archives a Pod
, it must take the additional step go gather all
the log information and generate the log URL for each. These are stored in the
log_url
table.
The sink should first delete any existing entries in the log_url
table for the
Pod
being archived. Earlier archival requests may have already created records
in the log_url
table, and they should be removed to avoid duplicates.
The sink will mount and use the kubearchive-logging
ConfigMap
for logging. Additionally, the sink will overwrite the ConfigMap
value for CONTAINER_NAME
as described in the Generating Log URLs section.
CLI
The CLI will implement a logs
command similar to kubectl logs
.
ka logs resource name
This command will return the log contents for the the default container in the Pod
.
example:
ka logs PipelineRun generate-logs-9fkp8 -n generate-logs-pipelines -c generate
This will return the log URLs associated with the PipelineRun
named "generate-logs-9fkp8"
for the container "generate".
The CLI will have to traverse the owner references to gather all of the logs associated
with the given resource. Note that logs could be queried for any resource. Most
resource will not have any logs associated with them or their descendents, but queries on
logs for TaskRuns
and Pods
are possible.
The CLI will perform post-processing on the output from the response to
log URL in order to get the actual log output. This post-processing is a JSONPath expression
to be applied to the response body. This post-processing only needs to be done if the
pre-defined entry LOG_URL_JSONPATH
is contained in the logging ConfigMap
.