Fluentd not tailing pod logs (Solved)
I have an AKS cluster with a number of pods, and I want to aggregate the logs from all of the pods to a third-party service (in my case, Papertrail) using Fluentd. So I signed up with Papertrail and followed their guide to integrate a Fluentd daemonset into the cluster and start ingesting logs into my Papertrail account.
But I had a couple of problems:
- None of the pod logs were appearing in Papertrail :(
- The Fluentd pods would crash and restart after a few minutes
This post covers the diagnostic steps I took and how I solved the problem.
AKS cluster
My AKS cluster is running Kubernetes version 1.20.2, which uses containerd as the container runtime.
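If you want to confirm which runtime your own nodes are using, kubectl will tell you:

kubectl get nodes -o wide

The CONTAINER-RUNTIME column in the output shows containerd:// or docker:// for each node.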
Using Papertrail’s Fluentd image
On Papertrail’s documentation page, under the Log from all containers and systems section, they provide a link to a Docker image which is intended to get you up and running with minimal effort. They also provide a link to a Daemonset manifest file.
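To give a feel for what this involves, here is a minimal sketch of such a Daemonset (not Papertrail's actual manifest; the image, namespace, env var names and values are placeholders, and details like RBAC and tolerations are omitted). The idea is simply to run Fluentd on every node, pass in your Papertrail destination as environment variables, and mount the node's log directory:

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluentd-papertrail
  namespace: core
spec:
  selector:
    matchLabels:
      app: fluentd-papertrail
  template:
    metadata:
      labels:
        app: fluentd-papertrail
    spec:
      containers:
        - name: fluentd
          image: <the Fluentd image linked from Papertrail's docs>
          env:
            # Placeholder names/values - use whatever your chosen image expects
            - name: FLUENT_PAPERTRAIL_HOST
              value: logsN.papertrailapp.com
            - name: FLUENT_PAPERTRAIL_PORT
              value: "XXXXX"
          volumeMounts:
            - name: varlog
              mountPath: /var/log
      volumes:
        - name: varlog
          hostPath:
            path: /var/log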
Following their steps and configuring my Papertrail host and port details, I was able to see some logs appear in Papertrail:
Apr 17 10:11:43 mycluster-aks-agentpool-31130745-vmss000001 kubelet I0417 02:11:42.563823 4782 logs.go:317] Finish parsing log file "/var/log/pods/kube-system_kube-proxy-pqpxg_829a536d-24bd-4603-91b8-a0d4a9492397/kube-proxy/0.log"
Apr 17 10:15:33 mycluster-aks-agentpool-31130745-vmss000000 kubelet I0417 02:15:32.606277 4607 logs.go:317] Finish parsing log file "/var/log/pods/kube-system_kube-proxy-rcph5_33226e53-894c-4153-8f05-2684d7af9043/kube-proxy/0.log"
Apr 17 10:15:34 mycluster-aks-agentpool-31130745-vmss000001 kubelet I0417 02:15:33.000796 4782 logs.go:317] Finish parsing log file "/var/log/pods/kube-system_kube-proxy-pqpxg_829a536d-24bd-4603-91b8-a0d4a9492397/kube-proxy/0.log"
Apr 17 10:19:40 mycluster-aks-agentpool-31130745-vmss000000 kubelet I0417 02:19:39.224809 4607 logs.go:317] Finish parsing log file "/var/log/pods/kube-system_kube-proxy-rcph5_33226e53-894c-4153-8f05-2684d7af9043/kube-proxy/0.log"
But nothing from my pods was being logged. Furthermore, after five minutes or so the Fluentd pods would crash and restart.
Checking resource usage of pods
I suspected the crashes may have had something to do with the Fluentd pods exceeding their memory limits. Running kubectl top pods -n core showed this:
C:\Users\ryan>kubectl top pods -n core
NAME                       CPU(cores)   MEMORY(bytes)
fluentd-papertrail-tc2fq   100m         106Mi
fluentd-papertrail-tmjpt   97m          108Mi
Memory looks fine, but the CPU is right at the 100m limit I had defined in the manifest. It looks like Fluentd is doing far more work than it should be.
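For reference, the resources section in my manifest was along these lines (a sketch; the memory values here are illustrative, the 100m CPU limit is the relevant part):

resources:
  limits:
    cpu: 100m
    memory: 200Mi
  requests:
    cpu: 100m
    memory: 200Mi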
Backslashes in Fluentd logs
Checking the logs from the Fluentd pods showed a lot of messages composed of a large number of backslashes:
2021-04-10 08:48:00 +0000 [warn]: #0 pattern not match: "2021-04-10T08:47:57.316098541Z stdout F \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\ etc...
Googling this indicated that it’s likely caused by Fluentd tailing its own logs. Checking the Fluentd logs right after a pod restart confirmed this:
2021-04-10 08:47:56 +0000 [info]: #0 following tail of /var/log/containers/fluentd-papertrail-7dzdt_core_fluentd-57857a9c614ff5a34be5a910af6ab0191337f5480f4bcfeba6c5650405d62944.log
Configuring Fluentd to ignore its own logs
To stop Fluentd from tailing its own logs, I learned about the FLUENT_CONTAINER_TAIL_EXCLUDE_PATH environment variable, which is used to exclude certain container logs from tailing. So I added this to my Daemonset manifest under the env section:
- name: FLUENT_CONTAINER_TAIL_EXCLUDE_PATH
  value: /var/log/containers/fluent*
On redeploying to the cluster, nothing had changed; Fluentd was still tailing logs full of backslashes. Then I noticed this note in the Fluentd GitHub repository regarding the FLUENT_CONTAINER_TAIL_EXCLUDE_PATH environment variable:
Since v1.9.3 or later images.
Suspecting the Papertrail image might have been created using an earlier version of Fluentd, I changed the Daemonset manifest to use the latest image from the Fluentd project:
image: fluent/fluentd-kubernetes-daemonset:v1.11-debian-papertrail-1
After deploying, things were much quieter! No more logs with backslashes.
However, there were still no logs from my pods coming through in Papertrail, and I started seeing entries like this in the Fluentd logs:
2021-04-17 07:05:41 +0000 [warn]: #0 [in_tail_container_logs] pattern not matched: "2021-04-17T07:05:40.347800353Z stdout F 100.101.102.103 - - [17/Apr/2021:07:05:40 +0000] \"GET / HTTP/2.0\" 200 316 \"https://mywebapp.com\" \"Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:87.0) Gecko/20100101 Firefox/87.0\" 59 0.137 [core-mywebapp-api-80] [] 10.240.0.26:3001 316 0.136 200 34d669e9-cd7b-4d2b-8cd4-e1045930c917"
This is a log entry from a pod for my web application, so it’s definitely tailing the logs now but for some reason is not matching the log format.
Configuring container tail parser type
Whilst I was googling how to prevent Fluentd from tailing its own logs, I had come across a post that mentioned the FLUENT_CONTAINER_TAIL_PARSER_TYPE environment variable. The author also mentioned they were using containerd instead of Docker, and containerd is the runtime used by AKS in versions >= 1.20…🤔
After a bit of research, it turns out we need to tell Fluentd how to parse containerd logs (containerd writes plain-text lines of the form timestamp, stream, partial/full flag, message, rather than the JSON lines Docker produces), and one way is to add this to the Daemonset manifest in the env section:
- name: FLUENT_CONTAINER_TAIL_PARSER_TYPE
  value: /^(?<time>.+) (?<stream>stdout|stderr) [^ ]* (?<log>.*)$/
After deploying this latest change, Papertrail logs started appearing for my pods.
Running kubectl top again to see the resource usage:
C:\Users\ryan>kubectl top pods -n core
NAME                       CPU(cores)   MEMORY(bytes)
fluentd-papertrail-lq4vx   6m           109Mi
fluentd-papertrail-mn8qm   4m           110Mi
That’s looking much healthier: CPU usage is way down from the 100m it was sitting at previously. I haven’t seen any crashes either, so hopefully that issue has been resolved, but I’ll continue to monitor the pods for crashes.
Summary
- If you see a bunch of backslashes in the Fluentd logs, it’s tailing its own logs, so you will want to suppress that. I used the FLUENT_CONTAINER_TAIL_EXCLUDE_PATH environment variable to solve that.
- If your cluster is running containerd as the container runtime, you will need to tell Fluentd to recognise the log format. I used the FLUENT_CONTAINER_TAIL_PARSER_TYPE environment variable to solve that.
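Putting it all together, the relevant part of my Daemonset container spec ended up looking roughly like this (a sketch; adjust the image tag and exclude path to suit your setup):

containers:
  - name: fluentd
    image: fluent/fluentd-kubernetes-daemonset:v1.11-debian-papertrail-1
    env:
      # Stop Fluentd tailing its own container logs
      - name: FLUENT_CONTAINER_TAIL_EXCLUDE_PATH
        value: /var/log/containers/fluent*
      # Parse containerd (CRI) log lines instead of Docker's JSON format
      - name: FLUENT_CONTAINER_TAIL_PARSER_TYPE
        value: /^(?<time>.+) (?<stream>stdout|stderr) [^ ]* (?<log>.*)$/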