Monitoring a home Docker setup

When running a home server consisting of one or more nodes, with some or all services in Docker, you may feel the need to monitor your environment, or even better, have full observability.

The most frequently mentioned option for this is a combination of Prometheus and Grafana: a solution that requires a lot of work to set up fully, plus work on one's applications and detailed configuration for full visibility. Another possibility is the free tier of New Relic, which offers remote insight into metrics and logs, but it also requires work on containers or applications to get full visibility.

Monitoring with Beszel

The first and simplest candidate to make this possible is Beszel. Beszel can run as a local service or in Docker and consists of a web frontend (the hub) and an agent that can be installed on multiple systems; the agent even supports Windows and macOS. Installation in Docker is an easy job, and once it's running there is insightful information on system metrics and Docker services, and even some logs.
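For a first try, a Docker-based setup can be sketched roughly as below. Treat the image names (henrygd/beszel and henrygd/beszel-agent), the default hub port 8090 and the KEY variable as assumptions from memory of the Beszel docs and verify them there before use.

# Hub (web frontend), assumed to listen on 8090 and store its data in /beszel_data
docker run -d --name beszel \
  -p 8090:8090 \
  -v ./beszel_data:/beszel_data \
  henrygd/beszel

# Agent, one per monitored host; KEY is the public key shown in the hub when
# adding a system, and the Docker socket mount lets it report container metrics
docker run -d --name beszel-agent \
  --network host \
  -e KEY="<public key from the hub>" \
  -v /var/run/docker.sock:/var/run/docker.sock:ro \
  henrygd/beszel-agent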

Observability with Coroot

My personal choice for monitoring a home server is Coroot. In my current setup on a Rocky Linux 9.x system, Coroot consists of a ClickHouse server to store metrics, logs, traces and profiles, the Coroot service itself, a node-agent and a cluster-agent. The node-agent collects the metrics and logs of all present services through eBPF, and the cluster-agent is needed if one wants detailed information on databases like MySQL, Postgres or Redis.

Another advantage of Coroot is its AI root cause analysis, which can provide helpful insights when investigating incidents. With an account you get ten analyses for free each month.


Because of the control I want over ClickHouse, it runs as a local service for convenience. That control is about scaling down ClickHouse's memory usage, scaling down logging on disk and in the database, and making easy changes to the data. The only downside is that ClickHouse has to be updated manually with yum/dnf.
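That manual update then comes down to something like:

sudo dnf update -y clickhouse-server clickhouse-client
sudo systemctl restart clickhouse-server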

The Coroot services run in Docker through a docker-compose file. In a standard setup Prometheus is required; in this setup ClickHouse is used as a supported alternative.

Installing ClickHouse

Installing ClickHouse is done by adding the repository, installing the packages, making some adjustments and starting it up.

sudo dnf install -y yum-utils
sudo dnf config-manager --add-repo https://packages.clickhouse.com/rpm/clickhouse.repo

sudo dnf install -y clickhouse-server clickhouse-client

Before starting the service, create the file /etc/clickhouse-server/config.d/z_log_disable.xml and put the following content in it:

<?xml version="1.0"?>
<clickhouse>
    <asynchronous_metric_log remove="1"/>
    <metric_log remove="1"/>
    <latency_log remove="1"/>
    <query_thread_log remove="1"/>
    <query_log remove="1"/>
    <query_views_log remove="1"/>
    <part_log remove="1"/>
    <session_log remove="1"/>
    <text_log remove="1"/>
    <trace_log remove="1"/>
    <crash_log remove="1"/>
    <opentelemetry_span_log remove="1"/>
    <zookeeper_log remove="1"/>
</clickhouse>
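To catch a typo in this file before the first start, it can be validated with xmllint (on Rocky Linux it comes with the libxml2 package, usually already installed):

sudo dnf install -y libxml2   # provides xmllint
xmllint --noout /etc/clickhouse-server/config.d/z_log_disable.xml && echo "XML OK"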

After this adjust cache sizes in /etc/clickhouse-server/config.xml:

  <mark_cache_size>268435456</mark_cache_size>                 <!-- 256 MiB -->
  <index_mark_cache_size>67108864</index_mark_cache_size>      <!-- 64 MiB -->
  <uncompressed_cache_size>16777216</uncompressed_cache_size>  <!-- 16 MiB -->

Adjust the memory usage ratio in /etc/clickhouse-server/config.xml:

<max_server_memory_usage_to_ram_ratio>0.75</max_server_memory_usage_to_ram_ratio> <!-- cap ClickHouse at 75% of RAM -->

Lower the thread pool size in /etc/clickhouse-server/config.xml:

 <!-- default: <max_thread_pool_size>10000</max_thread_pool_size> -->
 <max_thread_pool_size>5000</max_thread_pool_size>

And start things up:

sudo systemctl daemon-reload
sudo systemctl enable clickhouse-server
sudo systemctl start clickhouse-server
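A quick sanity check that the server is running and that the disabled log tables are not being created (add --password if you set one for the default user during installation):

sudo systemctl status clickhouse-server --no-pager
clickhouse-client --query "SELECT version()"
clickhouse-client --query "SHOW TABLES FROM system LIKE '%log'"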

Installing Coroot

Before installing Coroot, check whether the requirements are met: at least kernel 5.1, although 4.2 is also supported. This installation differs from the original docker-compose file: Prometheus is not used, and ClickHouse runs as a local service. Another difference is data retention: normally that is seven days for traces, logs, profiles and metrics, with Coroot keeping its own local cache of metrics for 30 days. In this setup the retention of the data stored in ClickHouse is set to 21 days (the --*-ttl flags in the compose file below). With eighteen local and Docker services, the amount of data kept for all of this averages around 3 GB on my system.
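Checking the kernel version and whether BTF type information is exposed (used by eBPF-based tooling) is quick; Rocky Linux 9 ships a 5.14 kernel, so this should be fine out of the box:

uname -r                     # kernel version
ls /sys/kernel/btf/vmlinux   # present when the kernel is built with BTF support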

Coroot, the node-agent and the cluster-agent run as Docker services via docker-compose, with the following content in a docker-compose.yaml that you have to create locally.

name: coroot

volumes:
  node_agent_data: {}
  cluster_agent_data: {}
  coroot_data: {}

services:
  coroot:
    restart: always
    image: ghcr.io/coroot/coroot${LICENSE_KEY:+-ee} # set 'coroot-ee' as the image if LICENSE_KEY is defined
    pull_policy: always
    user: root
    volumes:
      - coroot_data:/data
    ports:
      - 8080:8080 # note: published ports are ignored when network_mode is host
    command:
      - '--data-dir=/data'
      - '--bootstrap-refresh-interval=15s'
      - '--bootstrap-clickhouse-address=127.0.0.1:9000'
      - '--bootstrap-prometheus-url=http://127.0.0.1:9090'
      - '--global-prometheus-use-clickhouse'
      - '--global-prometheus-url=http://127.0.0.1:9090'
      - '--global-refresh-interval=15s'
      - '--cache-ttl=31d'
      - '--traces-ttl=21d'
      - '--logs-ttl=21d'
      - '--profiles-ttl=21d'
      - '--metrics-ttl=21d'
    environment:
      - LICENSE_KEY=${LICENSE_KEY:-}
      - GLOBAL_PROMETHEUS_USE_CLICKHOUSE
      - CLICKHOUSE_SPACE_MANAGER_USAGE_THRESHOLD=75         # Set cleanup threshold to 75%
      - CLICKHOUSE_SPACE_MANAGER_MIN_PARTITIONS=2           # Always keep at least 2 partitions
    network_mode: host

  node-agent:
    restart: always
    image: ghcr.io/coroot/coroot-node-agent
    pull_policy: always
    privileged: true
    pid: "host"
    volumes:
      - /sys/kernel/tracing:/sys/kernel/tracing
      - /sys/kernel/debug:/sys/kernel/debug
      - /sys/fs/cgroup:/host/sys/fs/cgroup
      - node_agent_data:/data
    command:
      - '--collector-endpoint=http://192.168.1.160:8080'
      - '--cgroupfs-root=/host/sys/fs/cgroup'
      - '--wal-dir=/data'

  cluster-agent:
    restart: always
    image: ghcr.io/coroot/coroot-cluster-agent
    pull_policy: always
    volumes:
      - cluster_agent_data:/data
    command:
      - '--coroot-url=http://192.168.1.160:8080'
      - '--metrics-scrape-interval=15s'
      - '--metrics-wal-dir=/data'
    depends_on:
      - coroot

After creating this file and making some adjustments to your own liking and network preferences, run docker compose up -d, then go to your IP address on port 8080; you now have access to Coroot, where it asks you to set the admin credentials.
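Bringing it up and keeping an eye on the first start could look like this (service names as defined in the compose file above):

docker compose up -d
docker compose ps                # all three services should be running
docker compose logs -f coroot    # watch the Coroot service start up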

In my setup Watchtower takes care of updating docker containers and this works well for the Coroot services.
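For completeness, a minimal Watchtower sketch; it needs access to the Docker socket to pull new images and restart containers, and the interval (in seconds) is just an example value:

docker run -d --name watchtower \
  --restart always \
  -v /var/run/docker.sock:/var/run/docker.sock \
  containrrr/watchtower --interval 86400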

Happy observability :-)
