Hello, community! Today I’m going to share a project I developed to create a complete and isolated monitoring lab with Zabbix, running on a Docker Swarm cluster inside a local virtual machine. Using Vagrant for VM provisioning and Ansible for automation, the environment includes Traefik as a reverse proxy with TLS, PostgreSQL with TimescaleDB, and Grafana for advanced metrics visualization. This setup is ideal for testing, learning, and developing monitoring solutions without relying on external infrastructure.
The lab simulates a full monitoring environment, allowing exploration from data collection to dashboard presentation, all within a controlled and easily reproducible setup. Let’s dive into the technical details and the role of each component.
Stack and Technologies
The lab integrates several open-source technologies, each with a specific role in the architecture. Here, I briefly explain each component and its contribution:
Vagrant + VirtualBox
Vagrant is an infrastructure-as-code (IaC) tool that automates the creation and configuration of virtual machines. In the lab, it provisions a VM on VirtualBox, allocating resources such as 2 CPUs and 4 GB of RAM (configurable in settings.yaml). Its role is to isolate the monitoring environment from the local host, ensuring consistency and easy teardown. VirtualBox acts as the hypervisor, running the VM with exposed ports for external access (e.g., 80 and 443 for web, 10050–10052 for Zabbix).
Ansible
Ansible is a YAML-based automation tool that handles all configuration after the VM is provisioned. In the lab, it installs Docker, configures the Swarm, deploys the stacks from Docker Compose files, and generates self-signed TLS certificates. Its tasks are idempotent: only the necessary changes are applied, which makes deployments reliable and repeatable. Playbooks such as deploy-full-zabbix.yml orchestrate tasks in parallel, reducing setup time.
Docker Swarm
Swarm orchestrates multiple containers in a lightweight cluster. Here, Swarm manages the Zabbix stack as replicable services, with internal load balancing and automatic service discovery. This allows components (e.g., proxies) to scale dynamically, simulating a distributed environment. Persistent volumes (/data/zabbix-prod) ensure data survives restarts.
Zabbix (Server, Frontend, Proxies)
The core of the monitoring stack. The server collects and processes metrics via agents or proxies; the frontend provides a web interface for configuration and visualization; proxies distribute load and monitor remote networks (configurable via proxy_count).
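One quick way to confirm that the frontend is answering after a deployment is to ask the Zabbix JSON-RPC API for its version, since the `apiinfo.version` method needs no authentication. The sketch below uses only the Python standard library. The function names are illustrative, and the URL in the usage note is an assumption (derive the real one from your zabbix_domain). TLS verification is disabled by default only because the lab uses self-signed certificates.

```python
import json
import ssl
import urllib.request


def build_version_request(request_id: int = 1) -> dict:
    """Build the JSON-RPC payload for apiinfo.version (no auth token needed)."""
    return {
        "jsonrpc": "2.0",
        "method": "apiinfo.version",
        "params": {},
        "id": request_id,
    }


def fetch_zabbix_version(base_url: str, verify_tls: bool = False) -> str:
    """POST the payload to the frontend and return the version it reports."""
    context = ssl.create_default_context()
    if not verify_tls:
        # Self-signed lab certificates: skip verification (never do this in production).
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    request = urllib.request.Request(
        base_url.rstrip("/") + "/api_jsonrpc.php",
        data=json.dumps(build_version_request()).encode("utf-8"),
        headers={"Content-Type": "application/json-rpc"},
    )
    with urllib.request.urlopen(request, context=context, timeout=10) as response:
        return json.loads(response.read())["result"]
```

Calling `fetch_zabbix_version("https://zabbix.lab.local")` (a hypothetical hostname) should return a version string if Traefik and the frontend are both up.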
PostgreSQL + TimescaleDB
A relational database with a time-series extension. PostgreSQL stores Zabbix configuration and metadata, while TimescaleDB optimizes inserts and queries for historical metrics, compressing older data for efficiency.
Traefik
A modern reverse proxy with automatic service discovery via Docker labels. In the lab, it routes traffic to the Zabbix Frontend, Grafana, and APIs, applying TLS with self-signed certificates generated by Ansible. This simulates a secure ingress gateway with HTTPS redirection and load balancing.
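To make the label-based discovery more concrete, here is a small sketch that builds the set of Traefik v2 labels a routed service typically carries. It is not taken from the repository: the service name, hostname, container port, and the `websecure` entry point name are all assumptions for illustration. In Swarm mode, Traefik reads these labels from the service definition (the `deploy.labels` section of the Compose file) rather than from the container.

```python
def traefik_labels(service: str, host: str, port: int,
                   entrypoint: str = "websecure") -> dict:
    """Return Traefik v2 labels that route `host` to `service` over TLS."""
    router = f"traefik.http.routers.{service}"
    return {
        # Opt the service in to Traefik's discovery.
        "traefik.enable": "true",
        # Match requests by Host header.
        f"{router}.rule": f"Host(`{host}`)",
        # Entry point name is an assumption; it must match Traefik's static config.
        f"{router}.entrypoints": entrypoint,
        # Terminate TLS on this router.
        f"{router}.tls": "true",
        # Swarm services must declare the container port explicitly.
        f"traefik.http.services.{service}.loadbalancer.server.port": str(port),
    }


# Hypothetical values for the Zabbix frontend.
for key, value in traefik_labels("zabbix-frontend", "zabbix.lab.local", 8080).items():
    print(f"{key}={value}")
```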
Grafana
An open-source visualization platform integrated with Zabbix via a plugin. It creates interactive dashboards with historical metrics, enabling advanced analysis such as alerts and custom panels. In the lab, it complements Zabbix by providing a rich presentation layer for the collected data.
This combination creates a cohesive ecosystem: Vagrant and Ansible provision the foundation, Docker Swarm orchestrates services, Zabbix handles monitoring, PostgreSQL/TimescaleDB stores data, Traefik secures and routes traffic, and Grafana delivers visualization.
How to Use (Quick Guide):
Deploy the environment:
make deploy
Stop the VM (preserving its state):
make stop
Destroy everything (remove VM and data):
make destroy
Main Settings
Customize the lab by editing main.yml:
- zabbix_image_version and grafana_image_version: Control versions for compatibility (e.g., latest for bleeding-edge).
- zabbix_domain: Defines URLs (e.g., lab.local), impacting Traefik routing and certificates.
- zabbix_stack_name: Swarm stack name, useful for isolation in multi-stack hosts.
- timescaledb_image: Optimized image for time-series data (e.g., timescale/timescaledb:latest-pg16).
- proxy_count, proxy_base_port, proxy_hostname_prefix: Proxy scalability to simulate distributed environments.
- db_max_connections: Limits database connections to avoid overload during heavy testing.
- firewall_zone and docker_min_version: Ensure security and compatibility.
The VM IP is configured in settings.yaml via network.control_ip (default: 10.0.2.15).
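To illustrate how the three proxy settings could work together, here is a minimal sketch that expands them into one hostname and port per proxy. The naming scheme (`prefix-N`) and the sequential port allocation are assumptions for illustration; the actual template in the repository may derive them differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProxySpec:
    hostname: str
    port: int


def plan_proxies(count: int, base_port: int, prefix: str) -> list:
    """Expand proxy_count, proxy_base_port and proxy_hostname_prefix into specs."""
    if count < 0:
        raise ValueError("proxy count must not be negative")
    # Proxy N gets the hostname "<prefix>-N" and the port base_port + (N - 1).
    return [
        ProxySpec(hostname=f"{prefix}-{index}", port=base_port + index - 1)
        for index in range(1, count + 1)
    ]


# Hypothetical values: three proxies starting at port 10061.
for spec in plan_proxies(3, 10061, "zabbix-proxy"):
    print(spec.hostname, spec.port)
```

Whatever scheme is used, each proxy's published port also has to be forwarded by the VM, so raising proxy_count may require checking the exposed port range.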
Important Files
- Vagrantfile: Defines the VM (Ubuntu, resources, exposed ports such as 2377/tcp for Swarm).
- Scripts (deploy.sh, stop.sh, destroy.sh): Encapsulate Vagrant and Ansible commands.
- main.yml: Ansible variables for customization.
- docker-compose.prod.yaml.j2: Jinja2 template for the Zabbix stack, rendered with variables.
- docker-compose.traefik.yaml: Traefik static configuration with labels for service discovery.
Extras
- Automated TLS: Ansible generates certificates via OpenSSL and mounts them in Traefik for secure HTTPS.
- Idempotency and Efficiency: Deployment only recreates stacks when hash changes occur, using force_hash_check.
- Persistence: Docker volumes preserve database data and configurations, simulating a production environment.
- Scalability: Adjust proxy_count to test load balancing.
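The hash-based redeploy check mentioned in the list above can be pictured with a short sketch. This is not the playbook's actual logic: it assumes the rendered Compose file is hashed with SHA-256 and compared against a hash stored on disk from the previous run, and the function names and file location are hypothetical.

```python
import hashlib
from pathlib import Path


def content_hash(text: str) -> str:
    """Return the SHA-256 hex digest of the rendered file content."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def needs_redeploy(rendered: str, hash_file: Path) -> bool:
    """Return True (and record the new hash) only when the content changed."""
    new_hash = content_hash(rendered)
    old_hash = hash_file.read_text().strip() if hash_file.exists() else None
    if new_hash == old_hash:
        return False  # Nothing changed: skip the stack deploy.
    hash_file.write_text(new_hash)
    return True
```

The first run always reports a change because no hash has been stored yet. Later runs redeploy only when a variable or the template itself alters the rendered output.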
Common Problems and Solutions
- Outdated Docker version: Update it via Ansible or set docker_min_version.
- Port conflicts: Check with netstat on the host and adjust docker_swarm_tcp_ports.
- Firewall blocking traffic: Run firewall-cmd --zone=public --add-port=... manually.
- Database connection failure: Validate dbzbx_prod.env for correct credentials and Swarm network settings.
- Traefik not resolving: Confirm the domain in zabbix_domain and regenerate certificates.
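For the port-conflict case, a small script can stand in for a manual netstat check before deploying. The sketch below is an illustration, not part of the repository: it tries to connect to each TCP port on the host and reports the ones that already have a listener. The port list in the usage note is only an example based on the ports mentioned earlier in this post.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1", timeout: float = 0.5) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success instead of raising an exception.
        return sock.connect_ex((host, port)) == 0


def find_conflicts(ports, host: str = "127.0.0.1") -> list:
    """Return the subset of ports that already have a listener."""
    return [port for port in ports if port_in_use(port, host)]
```

Running `find_conflicts([80, 443, 2377, 10050, 10051, 10052])` on the host before `make deploy` lists the ports that would clash. It only detects TCP listeners reachable on the given address, so a service bound to a different interface can be missed.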
Final Considerations
This lab is a powerful tool for deepening your knowledge in DevOps, monitoring, and containerization. Explore the code on GitHub, contribute, and share your experiences! If you have any questions, feel free to leave a comment.
Repository
Access the repository: Zabbix LAB
This material is for educational purposes only and should not be used in production environments without the necessary adjustments and best practices.
References
- Zabbix Official Documentation
- Docker Swarm Guide
- Vagrant Documentation
- Ansible User Guide
- Traefik Documentation
- TimescaleDB Documentation
- Grafana Documentation
💡 Who am I?
I'm Gabriel Carmo, passionate about technology (especially Open Source). I have experience in Cloud, Kubernetes, OpenShift, Zabbix, Dynatrace and much more! Always exploring new technologies and sharing knowledge. 🚀
📬 Let's connect?
🔗 LinkedIn
🐙 GitHub
🦊 GitLab
🏅 Credly
📧 Contact: contato@gabrielandre.com.br