Grafana Architect - Multi-Cloud & On-Prem Observability and Monitoring

Basking Ridge, New Jersey

Highbrow Technology Inc

Job Expired

Job Title: Grafana Architect - Multi-Cloud & On-Prem Observability and Monitoring

Location: Basking Ridge, NJ (Onsite)

Employment Type: Contract-to-Hire (C2H)

Job Summary

We are seeking a seasoned Grafana Architect with strong expertise in designing and implementing observability and monitoring solutions across multi-cloud (AWS, Azure, GCP) and on-premises environments. The ideal candidate will have deep hands-on experience with Grafana, Prometheus, Loki, and Tempo, and with integrating a variety of telemetry sources. You will own the end-to-end observability strategy, architectural governance, and implementation, and will promote best practices across teams.

Key Responsibilities

Architect and implement scalable observability solutions across hybrid/multi-cloud and on-premise environments using Grafana OSS/Enterprise.

Define monitoring strategies, SLOs/SLIs, dashboards, alerts, and reporting mechanisms for infrastructure, applications, and services.

Integrate Grafana with Prometheus, Loki, Tempo, InfluxDB, Elasticsearch, cloud-native tools (e.g., AWS CloudWatch, Azure Monitor, GCP Operations Suite), and on-prem systems.

Lead design and implementation of custom plugins, data sources, and dashboards for cross-platform observability.

Build and standardize templates, alerting rules, and RBAC models within Grafana Enterprise.

Collaborate with DevOps, SRE, Cloud, and App teams to define observability needs and onboard them into the platform.

Define and implement monitoring as code (MaC) practices using Terraform/Ansible for observability infrastructure.

Govern and optimize telemetry collection (logs, metrics, traces) for performance, cost, and usability.

Lead capacity planning, HA/DR design, performance tuning, and upgrades for the Grafana stack.

Provide thought leadership on OpenTelemetry, distributed tracing, log aggregation, and AIOps capabilities.

Conduct training, documentation, and internal community engagement around observability tools.

Required Skills & Experience

5+ years of hands-on experience with Grafana, including dashboard design, plugin development, and user management.

Strong expertise with Prometheus, Loki, Tempo, Alertmanager, and OpenTelemetry.

Proven experience designing multi-cloud (AWS, Azure, GCP) observability frameworks.

Experience integrating with on-premise systems (e.g., vSphere, bare-metal monitoring, SNMP, legacy tools).

Hands-on experience with Terraform, Helm, Ansible, and GitOps practices for monitoring infrastructure.

Strong scripting and automation skills (Python, Bash, etc.).

In-depth knowledge of monitoring standards and telemetry formats (Prometheus metrics, OTLP, JSON logs).

Proficient in SRE principles (SLOs, SLIs, error budgets, alerting strategy).

Experience with RBAC, LDAP/SAML integration, and Grafana Enterprise features.

Strong troubleshooting skills in distributed systems and observability pipelines.

Excellent communication, stakeholder management, and leadership skills.

Nice to Have

Experience with AIOps/ML-based anomaly detection in observability.

Knowledge of security and compliance considerations in monitoring (e.g., SOC2, PCI).

Exposure to SIEM tools like Splunk, Chronicle, or Elastic Security.

Experience with Kafka, Fluent Bit, Vector, or similar log forwarding pipelines.

Certifications (Preferred)

Grafana Certified Observability Professional

AWS/GCP/Azure Solutions Architect (Associate or Professional)

Certified Kubernetes Administrator (CKA)

Interested? Please share your resume to

Date Posted: 27 May 2025
