# Infrastructure Monitoring & Edge Routing Stack

## Overview

This repository provides a standardized, portable Docker-based stack for:

- Host-level monitoring
- Metrics collection and visualization
- Service uptime tracking
- Reverse proxy routing for internal services

It is designed to be deployed consistently across multiple nodes, with minimal per-host customization via environment variables.

---
## Core Components

| Component     | Purpose                                          |
|---------------|--------------------------------------------------|
| Traefik       | Edge router and reverse proxy with automatic TLS |
| Prometheus    | Metrics collection and storage                   |
| Grafana       | Metrics visualization and dashboards             |
| Node Exporter | Host-level metrics (CPU, memory, disk, etc.)     |
| Uptime Kuma   | Service uptime monitoring and alerting           |

---
## Architecture Summary

- Each host runs:
  - A local monitoring stack
  - A Traefik instance for routing
- Services are exposed internally and routed through Traefik using host-based (`Host()`) rules
- TLS certificates are issued and renewed automatically by Traefik via the ACME DNS challenge against Cloudflare
- Configuration is environment-driven for portability

---
## Repository Structure

```
infra/
├── docker/
│   ├── monitoring/
│   │   ├── docker-compose.yml
│   │   └── prometheus.yml
│   └── traefik/
│       ├── docker-compose.yml
│       ├── traefik.yml
│       └── middlewares.yml
├── env/
│   ├── example.env
│   └── prod/
│       ├── node1.env
│       └── node2.env
├── scripts/
│   └── bootstrap.sh
└── Makefile
```

---
## Prerequisites

- Docker and Docker Compose v2 (the `docker compose` plugin)
- The external Docker network `frontend`, created once per host with `docker network create frontend`
- A Cloudflare API token with permission to edit DNS records for your zone (used for TLS certificate provisioning)
- Basic familiarity with Docker and reverse proxies
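`docker network create` fails if the network already exists, so re-running setup on a host that is already configured errors out. A minimal sketch of an idempotent prerequisite check, assuming a POSIX shell; `check_prereqs` and `ensure_network` are hypothetical helpers, not part of this repository:

```bash
#!/bin/sh
# Hypothetical helpers; not part of the repository.

# Fail early if Docker or the Compose v2 plugin is missing.
check_prereqs() {
  command -v docker >/dev/null 2>&1 || { echo "docker not found" >&2; return 1; }
  docker compose version >/dev/null 2>&1 || { echo "docker compose (v2) not available" >&2; return 1; }
}

# Create the external network only if it does not exist yet.
ensure_network() {
  net="${1:-frontend}"
  if docker network inspect "$net" >/dev/null 2>&1; then
    echo "network $net already exists"
  else
    docker network create "$net" >/dev/null && echo "created network $net"
  fi
}

# Usage: check_prereqs && ensure_network frontend
```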
---
## Configuration

All configuration is driven by `.env` files.

### Setup

1. Copy the example environment file:

   ```bash
   cp env/example.env .env
   ```

2. Edit the values as needed:

   - Domains (e.g., `grafana.vpn.savant.io`)
   - Ports
   - Credentials
   - File paths
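After copying the example file, it is easy to leave a value blank. A minimal sketch that flags empty assignments, assuming `.env` uses plain `KEY=VALUE` lines with `#` comments; `check_env_file` is a hypothetical helper, and it treats quoted empty strings such as `KEY=""` as non-empty:

```bash
#!/bin/sh
# Hypothetical helper: report keys whose value is empty in a KEY=VALUE env file.
# Blank lines and lines starting with '#' are ignored.
check_env_file() {
  file="${1:-.env}"
  [ -r "$file" ] || { echo "cannot read $file" >&2; return 2; }
  missing=$(grep -Ev '^[[:space:]]*(#|$)' "$file" | grep -E '^[^=]+=[[:space:]]*$' | cut -d= -f1)
  if [ -n "$missing" ]; then
    echo "empty values in $file:" >&2
    printf '%s\n' "$missing" >&2
    return 1
  fi
  echo "ok"
}
```

Running `check_env_file .env` before deploying prints `ok` or lists the keys that still need values.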
## Deployment

Run the following from the repository root (`infra/`).

### Start Traefik

```bash
docker compose -f docker/traefik/docker-compose.yml up -d
```

### Start Monitoring Stack

```bash
docker compose -f docker/monitoring/docker-compose.yml up -d
```
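Depending on the Compose version, `.env` may be resolved relative to the project directory (by default, the directory of the first `-f` file) rather than the shell's working directory, so the root `.env` might not be used for variable interpolation. Passing it explicitly with `--env-file` removes that ambiguity. A minimal sketch; `compose_up` is a hypothetical wrapper that assumes the `docker/<stack>/docker-compose.yml` layout from the repository structure:

```bash
#!/bin/sh
# Hypothetical wrapper: start one stack with an explicit env file.
# Usage: compose_up <stack> [env-file]   e.g. compose_up traefik .env
compose_up() {
  stack="$1"
  env_file="${2:-.env}"
  compose_file="docker/$stack/docker-compose.yml"
  [ -f "$compose_file" ] || { echo "no compose file at $compose_file" >&2; return 1; }
  [ -f "$env_file" ] || { echo "env file $env_file not found" >&2; return 1; }
  docker compose --env-file "$env_file" -f "$compose_file" up -d
}
```

Calling `compose_up traefik` and then `compose_up monitoring` mirrors the order of the commands above.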
## Access Points

| Service     | URL                             |
|-------------|---------------------------------|
| Traefik UI  | `https://traefik.vpn.savant.io` |
| Grafana     | `https://grafana.vpn.savant.io` |
| Prometheus  | `http://<host>:9090`            |
| Uptime Kuma | `http://<host>:3001`            |

`<host>` is the node's address. Prometheus and Uptime Kuma are reached on their published ports rather than through Traefik.
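Once both stacks are up, a quick reachability check confirms each endpoint responds. A minimal sketch using `curl`; `check_url` is a hypothetical helper, and the example paths assume the upstream health endpoints (`/-/healthy` for Prometheus, `/api/health` for Grafana), which may differ if those services are served under a sub-path:

```bash
#!/bin/sh
# Hypothetical helper: print the HTTP status for a URL; succeed only on 2xx/3xx.
# curl reports 000 when the connection itself fails.
check_url() {
  code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 "$1")
  echo "$1 -> $code"
  case "$code" in
    2??|3??) return 0 ;;
    *) return 1 ;;
  esac
}

# Example usage (replace <host> with the node's address):
# check_url "http://<host>:9090/-/healthy"
# check_url "https://grafana.vpn.savant.io/api/health"
# check_url "http://<host>:3001"
```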
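## Customization

### Adding a New Service Behind Traefik

Add labels like the following to the container's service definition:

```yaml
labels:
  - "traefik.enable=true"
  - "traefik.http.routers.myapp.rule=Host(`myapp.example.com`)"
  - "traefik.http.routers.myapp.entrypoints=websecure"
  - "traefik.http.routers.myapp.tls=true"
  - "traefik.http.services.myapp.loadbalancer.server.port=3000"
```

Here `myapp` names the router and service and should be unique per service, and `loadbalancer.server.port` is the port the application listens on inside the container. The container must also share a Docker network with Traefik (presumably the external `frontend` network); if it is attached to several networks, add `traefik.docker.network=frontend` so Traefik uses the right one. If the `websecure` entrypoint in `traefik.yml` does not set a default certificate resolver, `tls=true` alone will not request an ACME certificate; add `traefik.http.routers.myapp.tls.certresolver=<resolver>` with the resolver name defined in `traefik.yml`.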
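Traefik can only proxy to a container it shares a Docker network with. Assuming that shared network is `frontend`, a minimal sketch to confirm a container joined it; `on_network` is a hypothetical helper built on `docker network inspect` with a Go template over the network's attached containers:

```bash
#!/bin/sh
# Hypothetical helper: succeed if container <name> is attached to network <net>.
# Usage: on_network <container-name> [network]
on_network() {
  name="$1"
  net="${2:-frontend}"
  # Print one attached container name per line, then match the whole line.
  docker network inspect --format '{{range .Containers}}{{println .Name}}{{end}}' "$net" \
    | grep -qx "$name"
}
```

Compose-generated container names usually include the project and an index (e.g. `<project>-myapp-1`), so pass the actual container name rather than the service name.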
### Per-Node Configuration

Each node can override the following values through its own env file, `env/prod/nodeX.env`:

- Domain names
- Ports
- Credentials
- File paths
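To see what a node's env file actually resolves to before deploying, `docker compose config` renders a compose file with variables interpolated. A minimal sketch, assuming per-node files follow the `env/prod/<node>.env` naming; `preview_node_config` is a hypothetical helper:

```bash
#!/bin/sh
# Hypothetical helper: render a stack's compose file using a node's env file.
# Usage (from the repository root): preview_node_config node1 traefik
preview_node_config() {
  node="$1"
  stack="${2:-traefik}"
  env_file="env/prod/$node.env"
  [ -f "$env_file" ] || { echo "no env file for node '$node' at $env_file" >&2; return 1; }
  docker compose --env-file "$env_file" -f "docker/$stack/docker-compose.yml" config
}
```

The rendered output includes credentials from the env file, so avoid pasting it into shared logs or tickets.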
## Operational Notes

- Do not expose Grafana or the Traefik dashboard publicly without authentication.
- Change default credentials immediately.
- Ensure `acme.json` exists and has the permissions Traefik requires (`600`):

  ```bash
  touch /opt/traefik/acme.json
  chmod 600 /opt/traefik/acme.json
  ```

- `node-exporter` runs in host mode and requires elevated visibility into the host system.
## Scaling Strategy

For multiple nodes:

- Maintain a single Git repository.
- Keep one env file per node under `env/prod/`.
- Deploy either:
  - manually, with `git pull` followed by `docker compose`, or
  - with automation tools (e.g., Ansible).
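The manual path (`git pull` followed by `docker compose`) can be wrapped in a small per-node script. A minimal sketch, assuming it runs from the repository root and that node env files live at `env/prod/<node>.env`; `deploy_node` is hypothetical and is not the contents of the repository's `scripts/bootstrap.sh` or `Makefile`:

```bash
#!/bin/sh
# Hypothetical per-node deploy: update the checkout, then start Traefik
# before the monitoring stack, both using the node's env file.
# Usage (from the repository root): deploy_node node1
deploy_node() {
  node="$1"
  env_file="env/prod/$node.env"
  [ -n "$node" ] || { echo "usage: deploy_node <node>" >&2; return 2; }
  [ -f "$env_file" ] || { echo "missing $env_file" >&2; return 1; }

  # --ff-only refuses to create a merge commit if the node's history diverged.
  git pull --ff-only || return 1

  for stack in traefik monitoring; do
    docker compose --env-file "$env_file" \
      -f "docker/$stack/docker-compose.yml" up -d || return 1
  done
}
```

An automation tool such as Ansible would run equivalent steps on each host; this sketch covers only the manual path.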
## Future Improvements

- Centralized logging (Loki / ELK)
- Alerting integration (Alertmanager)
- SSO / authentication layer (e.g., OAuth / Authelia)
- GitOps-based deployment model
## Quick Start (TL;DR)

```bash
git clone <repo>
cd infra

cp env/example.env .env
vim .env   # set domains, ports, credentials, and file paths

docker network create frontend

docker compose -f docker/traefik/docker-compose.yml up -d
docker compose -f docker/monitoring/docker-compose.yml up -d
```
## Purpose

This project aims to provide:

- A repeatable infrastructure baseline
- Immediate visibility into host and service health
- A clean entry point for expanding internal services

It is intentionally minimal, composable, and environment-driven.