Infrastructure Monitoring & Edge Routing Stack

Overview

This repository provides a standardized, portable Docker-based stack for:

  • Host-level monitoring
  • Metrics collection and visualization
  • Service uptime tracking
  • Reverse proxy routing for internal services

It is designed to be deployed consistently across multiple nodes, with minimal per-host customization via environment variables.


Core Components

Component      Purpose
Traefik        Edge router and reverse proxy with automatic TLS
Prometheus     Metrics collection and storage
Grafana        Metrics visualization and dashboards
Node Exporter  Host-level metrics (CPU, memory, disk, etc.)
Uptime Kuma    Service uptime monitoring and alerting

Architecture Summary

  • Each host runs:
    • A local monitoring stack
    • A Traefik instance for routing
  • Services are exposed internally and routed via Traefik using host-based rules
  • TLS certificates are issued and renewed automatically through an ACME DNS challenge against Cloudflare
  • Configuration is environment-driven for portability

Repository Structure

infra/
  docker/
    monitoring/
      docker-compose.yml
      prometheus.yml
    traefik/
      docker-compose.yml
      traefik.yml
      middlewares.yml
  env/
    example.env
    prod/
      node1.env
      node2.env
  scripts/
    bootstrap.sh
  Makefile


Prerequisites

  • Docker + Docker Compose
  • A configured external Docker network:
docker network create frontend
  • Cloudflare API token (for TLS certificate provisioning)
  • Basic understanding of Docker and reverse proxies
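
Because docker network create reports an error when the network already exists, a re-runnable setup might check for it first. The following is a minimal sketch, assuming a POSIX shell; the ensure_network helper is hypothetical and not part of this repository.

```shell
#!/bin/sh
# ensure_network NAME: create the Docker network only if it is missing.
# (Hypothetical helper; not part of this repository.)
ensure_network() {
  name="$1"
  if docker network inspect "$name" >/dev/null 2>&1; then
    echo "network '$name' already exists"
  else
    docker network create "$name" >/dev/null && echo "network '$name' created"
  fi
}

# Usage:
#   ensure_network frontend
```

Calling it twice in a row should be harmless, which makes it convenient inside a bootstrap script.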


Configuration

All configuration is driven via .env files.

Setup

  1. Copy the example environment file:
cp env/example.env .env
  2. Modify values as needed:
    • Domains (e.g., grafana.vpn.savant.io)
    • Ports
    • Credentials
    • File paths

Deployment

Start Traefik
docker compose -f docker/traefik/docker-compose.yml up -d
Start Monitoring Stack
docker compose -f docker/monitoring/docker-compose.yml up -d

Access Points

Service      URL
Traefik UI   https://traefik.vpn.savant.io
Grafana      https://grafana.vpn.savant.io
Prometheus   http://:9090
Uptime Kuma  http://:3001

Customization

Adding a New Service Behind Traefik

Add labels to any container:

labels:
  - "traefik.enable=true"
  - "traefik.http.routers.myapp.rule=Host(`myapp.example.com`)"
  - "traefik.http.routers.myapp.entrypoints=websecure"
  - "traefik.http.routers.myapp.tls=true"
  - "traefik.http.services.myapp.loadbalancer.server.port=3000"
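
Only three values change from one service to the next in the labels above: the router/service name, the hostname, and the container port. As an illustration, the sketch below prints the same label block for arbitrary values. The traefik_labels helper is hypothetical; it only generates text to paste into a compose file and does not talk to Traefik.

```shell
#!/bin/sh
# traefik_labels NAME HOST PORT: print the label block for one service.
# (Hypothetical helper; NAME is used as both the router and the service name.)
traefik_labels() {
  name="$1" host="$2" port="$3"
  printf 'labels:\n'
  printf '  - "traefik.enable=true"\n'
  printf '  - "traefik.http.routers.%s.rule=Host(`%s`)"\n' "$name" "$host"
  printf '  - "traefik.http.routers.%s.entrypoints=websecure"\n' "$name"
  printf '  - "traefik.http.routers.%s.tls=true"\n' "$name"
  printf '  - "traefik.http.services.%s.loadbalancer.server.port=%s"\n' "$name" "$port"
}

# Usage:
#   traefik_labels myapp myapp.example.com 3000
```

The router and service names must be unique per Traefik instance, so reusing the application name for both is a simple way to avoid collisions.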

Per-Node Configuration

Each node can override:

  • Domain names
  • Ports
  • Credentials
  • File paths

These overrides live in the node's own .env file:

env/prod/nodeX.env
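
One way to apply a node-specific file is to pass it explicitly with Docker Compose's --env-file option. The sketch below assumes the env/prod/<node>.env layout shown in this README; the node_env_file helper is hypothetical and only resolves and checks the path.

```shell
#!/bin/sh
# node_env_file NODE: print the env file path for NODE, or fail if it is missing.
# (Hypothetical helper; the env/prod/<node>.env layout follows this README.)
node_env_file() {
  file="env/prod/$1.env"
  if [ -f "$file" ]; then
    printf '%s\n' "$file"
  else
    echo "missing env file: $file" >&2
    return 1
  fi
}

# Usage, from the repository root:
#   env_file="$(node_env_file node1)" &&
#     docker compose --env-file "$env_file" -f docker/traefik/docker-compose.yml up -d
```

Failing early on a missing file avoids starting a stack with unset variables.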

Operational Notes

  • Do not expose Grafana or Traefik publicly without authentication
  • Change default credentials immediately
  • Ensure acme.json has correct permissions:
chmod 600 /opt/traefik/acme.json
  • node-exporter runs in host mode and requires elevated visibility into the system
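
The acme.json permission step can be folded into provisioning so that the file exists with the right mode before Traefik first starts. This is a minimal sketch; the ensure_acme_file helper is hypothetical, and /opt/traefik/acme.json is simply the path used in the note above.

```shell
#!/bin/sh
# ensure_acme_file PATH: create PATH if needed and restrict it to owner read/write.
# (Hypothetical helper; not part of this repository.)
ensure_acme_file() {
  path="$1"
  mkdir -p "$(dirname "$path")" || return 1
  [ -e "$path" ] || touch "$path" || return 1
  chmod 600 "$path"
}

# Usage:
#   ensure_acme_file /opt/traefik/acme.json
```

An existing file is left in place and only its mode is reset, so stored certificates are not lost on a re-run.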

Scaling Strategy

For multiple nodes:

  • Maintain a single Git repository
  • Use per-node .env files
  • Deploy via:
    • Manual git pull + docker compose
    • Or automation tools (e.g., Ansible)
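
For the manual route, the per-node .env files can double as the node inventory. The sketch below derives node names from env/prod/*.env; the list_nodes helper is hypothetical, and the SSH loop in the usage comment assumes that each node name is a reachable SSH host with the repository checked out at ~/infra, neither of which this README states.

```shell
#!/bin/sh
# list_nodes: print one node name per env/prod/*.env file (node1.env -> node1).
# (Hypothetical helper; run from the repository root.)
list_nodes() {
  for file in env/prod/*.env; do
    [ -f "$file" ] || continue   # skip the unexpanded glob when no files match
    basename "$file" .env
  done
}

# Usage (host names and checkout path are assumptions):
#   for node in $(list_nodes); do
#     ssh "$node" 'cd ~/infra && git pull'
#   done
```

Once the node count grows beyond a handful, an automation tool such as Ansible is likely a better fit than a loop like this.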

Future Improvements

  • Centralized logging (Loki / ELK)
  • Alerting integration (Alertmanager)
  • SSO / authentication layer (e.g., OAuth / Authelia)
  • GitOps-based deployment model

Quick Start (TL;DR)

git clone <repo>
cd infra

cp env/example.env .env
# edit .env
vim .env

docker network create frontend

docker compose -f docker/traefik/docker-compose.yml up -d
docker compose -f docker/monitoring/docker-compose.yml up -d

Purpose

This project aims to provide:

  • A repeatable infrastructure baseline
  • Immediate visibility into host and service health
  • A clean entry point for expanding internal services

It is intentionally minimal, composable, and environment-driven.
