first commit
179
README.md
Normal file
# Infrastructure Monitoring & Edge Routing Stack

## Overview

This repository provides a standardized, portable Docker-based stack for:

- Host-level monitoring
- Metrics collection and visualization
- Service uptime tracking
- Reverse proxy routing for internal services

It is designed to be deployed consistently across multiple nodes, with minimal per-host customization via environment variables.

---

## Core Components

| Component     | Purpose                                          |
|---------------|--------------------------------------------------|
| Traefik       | Edge router and reverse proxy with automatic TLS |
| Prometheus    | Metrics collection and storage                   |
| Grafana       | Metrics visualization and dashboards             |
| Node Exporter | Host-level metrics (CPU, memory, disk, etc.)     |
| Uptime Kuma   | Service uptime monitoring and alerting           |

---

## Architecture Summary

- Each host runs:
  - A local monitoring stack
  - A Traefik instance for routing
- Services are exposed internally and routed via Traefik using host-based rules
- TLS certificates are automatically managed via Cloudflare DNS
- Configuration is environment-driven for portability

---

## Repository Structure

    infra/
      docker/
        monitoring/
          docker-compose.yml
          prometheus.yml
        traefik/
          docker-compose.yml
          traefik.yml
          middlewares.yml
      env/
        example.env
        prod/
          node1.env
          node2.env
      scripts/
        bootstrap.sh
      Makefile

---

## Prerequisites

- Docker and Docker Compose
- A configured external Docker network:
  `docker network create frontend`
- Cloudflare API token (for TLS certificate provisioning)
- Basic understanding of Docker and reverse proxies
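
`docker network create` exits with an error when a network of that name already exists, so rerunning the setup on an already provisioned host stops at this step. A minimal sketch of an idempotent variant, assuming the network name `frontend` from the list above; `ensure_network` is a hypothetical helper, not part of this repository, and would be called as `ensure_network frontend`:

```bash
# Create the external Docker network only if it does not exist yet.
# ensure_network is a hypothetical helper for illustration.
ensure_network() {
  name="$1"
  if docker network inspect "$name" >/dev/null 2>&1; then
    echo "network '$name' already exists"
  else
    docker network create "$name"
  fi
}
```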

---

## Configuration

All configuration is driven via `.env` files.

### Setup

1. Copy the example environment file:
   ```bash
   cp env/example.env .env
   ```
2. Modify values as needed:
   - Domains (e.g., `grafana.vpn.savant.io`)
   - Ports
   - Credentials
   - File paths
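
Compose substitutes an empty string (with a warning) for variables that are not set, so a forgotten key can go unnoticed until a service misbehaves. A small pre-flight check can catch this earlier. The sketch below assumes plain `KEY=value` lines; `missing_env_keys` is a hypothetical helper, and the key names in the usage comment are placeholders rather than the names actually used in `env/example.env`:

```bash
# Print every required key that is missing or empty in an env file.
# Returns 0 when all keys are set, 1 otherwise.
# missing_env_keys is a hypothetical helper; assumes plain KEY=value lines.
missing_env_keys() {
  file="$1"; shift
  status=0
  for key in "$@"; do
    # Match "KEY=" followed by at least one character.
    if ! grep -Eq "^${key}=.+" "$file"; then
      echo "$key"
      status=1
    fi
  done
  return "$status"
}

# Usage (key names are placeholders):
#   missing_env_keys .env GRAFANA_DOMAIN CF_DNS_API_TOKEN
```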

### Deployment

#### Start Traefik

```bash
docker compose -f docker/traefik/docker-compose.yml up -d
```

#### Start Monitoring Stack

```bash
docker compose -f docker/monitoring/docker-compose.yml up -d
```

### Access Points

| Service     | URL                           |
|-------------|-------------------------------|
| Traefik UI  | https://traefik.vpn.savant.io |
| Grafana     | https://grafana.vpn.savant.io |
| Prometheus  | `http://<host>:9090`          |
| Uptime Kuma | `http://<host>:3001`          |
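
After deployment, a quick probe of each endpoint can confirm that routing and the services themselves respond. The sketch below uses `curl`; `/-/healthy` is Prometheus's health endpoint and `/api/health` is Grafana's, while `check_url` and the `monitor-host` placeholder are assumptions made for illustration:

```bash
# Report whether a URL answers with a successful HTTP status.
# check_url is a hypothetical helper, not part of this repository.
check_url() {
  name="$1"
  url="$2"
  # -f: fail on HTTP errors, -sS: quiet but still print errors, 5 s timeout
  if curl -fsS -o /dev/null --max-time 5 "$url"; then
    echo "OK   $name"
  else
    echo "FAIL $name"
    return 1
  fi
}

# Usage (replace monitor-host with the node's address):
#   check_url grafana    https://grafana.vpn.savant.io/api/health
#   check_url prometheus http://monitor-host:9090/-/healthy
```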

### Customization

#### Adding a New Service Behind Traefik

Add labels to any container:

```yaml
labels:
  - "traefik.enable=true"
  - "traefik.http.routers.myapp.rule=Host(`myapp.example.com`)"
  - "traefik.http.routers.myapp.entrypoints=websecure"
  - "traefik.http.routers.myapp.tls=true"
  - "traefik.http.services.myapp.loadbalancer.server.port=3000"
```
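
The same labels can be attached to a container started outside Compose. A minimal sketch, assuming the `frontend` network from the prerequisites; `run_behind_traefik` is a hypothetical helper, and the image name, hostname, and port in the usage comment are placeholders. The backticks in the router rule are escaped because the shell would otherwise treat them as command substitution:

```bash
# Start a container on the shared network with Traefik routing labels.
# run_behind_traefik is a hypothetical helper; its arguments are placeholders.
run_behind_traefik() {
  name="$1"; host="$2"; port="$3"; image="$4"
  # \` yields a literal backtick inside double quotes.
  docker run -d --name "$name" --network frontend \
    --label "traefik.enable=true" \
    --label "traefik.http.routers.${name}.rule=Host(\`${host}\`)" \
    --label "traefik.http.routers.${name}.entrypoints=websecure" \
    --label "traefik.http.routers.${name}.tls=true" \
    --label "traefik.http.services.${name}.loadbalancer.server.port=${port}" \
    "$image"
}

# Usage:
#   run_behind_traefik myapp myapp.example.com 3000 myorg/myapp:latest
```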

#### Per-Node Configuration

Each node can override the following values in its own `.env` file:

- Domain names
- Ports
- Credentials
- File paths

Per-node files follow this naming pattern:

```
env/prod/nodeX.env
```
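
To apply a node's file without copying it to `.env`, it can be passed to Compose with `--env-file`, which also avoids relying on Compose's default `.env` lookup. A minimal sketch, assuming the files follow the `env/prod/<node>.env` pattern and the command runs from the repository root; `deploy_node` is a hypothetical helper:

```bash
# Bring up both stacks for one node using its own env file.
# deploy_node is a hypothetical helper, not part of this repository.
deploy_node() {
  node="$1"
  env_file="env/prod/${node}.env"
  if [ ! -f "$env_file" ]; then
    echo "missing $env_file" >&2
    return 1
  fi
  # Traefik first, then the monitoring stack, as in the Deployment section.
  for stack in traefik monitoring; do
    docker compose --env-file "$env_file" \
      -f "docker/${stack}/docker-compose.yml" up -d || return 1
  done
}

# Usage: deploy_node node1
```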

#### Operational Notes

- Do not expose Grafana or Traefik publicly without authentication
- Change default credentials immediately
- Ensure `acme.json` has correct permissions:
  ```bash
  chmod 600 /opt/traefik/acme.json
  ```
- `node-exporter` runs in host mode and requires elevated visibility into the system
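
Traefik rejects an `acme.json` whose permissions are broader than `600`, so it helps to create the file with the right mode before the first start. A minimal sketch, assuming the `/opt/traefik/acme.json` path from the note above; `ensure_acme_file` is a hypothetical helper:

```bash
# Create acme.json if it is missing and restrict it to the owner.
# An existing file is left intact so issued certificates are not lost.
# ensure_acme_file is a hypothetical helper, not part of this repository.
ensure_acme_file() {
  path="$1"
  mkdir -p "$(dirname "$path")"
  [ -e "$path" ] || : > "$path"
  chmod 600 "$path"
}

# Usage: ensure_acme_file /opt/traefik/acme.json
```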

#### Scaling Strategy

For multiple nodes:

- Maintain a single Git repository
- Use per-node `.env` files
- Deploy via:
  - Manual `git pull` + `docker compose`
  - Or automation tools (e.g., Ansible)

### Future Improvements

- Centralized logging (Loki / ELK)
- Alerting integration (Alertmanager)
- SSO / authentication layer (e.g., OAuth / Authelia)
- GitOps-based deployment model

### Quick Start (TL;DR)

```bash
git clone <repo>
cd infra

cp env/example.env .env
# edit .env
vim .env

docker network create frontend

docker compose -f docker/traefik/docker-compose.yml up -d
docker compose -f docker/monitoring/docker-compose.yml up -d
```

### Purpose

This project aims to provide:

- A repeatable infrastructure baseline
- Immediate visibility into host and service health
- A clean entry point for expanding internal services

It is intentionally minimal, composable, and environment-driven.