DevOps/Site Reliability Engineer

Birmingham, Alabama

Saxon Global
Must-Have Technical Skills:
  • OpenShift or GKE (Google Kubernetes Engine)
  • Expertise in SRE principles and knowledge of how to apply them to infrastructure (a bridge between infrastructure and dev)
  • SRE focus: reactive work, handling optimizations and issues once the applications are running
  • Prometheus

Job Description

SUMMARY
The Site Reliability Engineer (SRE) is responsible for improving system reliability and
resilience. This role focuses on building automation to reduce manual effort and prevent
service-impacting incidents. The SRE combines software and systems engineering to
build and support large-scale, distributed, fault-tolerant systems. This role ensures that
critical platforms are available, reliable, and able to support a fast rate of improvement.
This role relies on monitoring platforms and continually takes a holistic view of system
health and performance. The SRE will enhance and support cloud-based
transformations, and is focused on pushing capabilities forward, staying ahead of
customer needs and innovating for continuous improvement. The SRE provides
operational support and engineering for multiple large-scale distributed software
applications.

JOB DUTIES

• Gathers and analyzes metrics from monitoring platforms to assist in performance tuning
and fault tolerance.

• Partners with development teams to improve services through testing and release
procedures.

• Participates in system design, platform management and capacity planning.

• Balances feature development speed and reliability with service-level objectives.

• Works closely with the incident response team to restore service to normal operation.

• Understands debugging and applies troubleshooting skills.

• Investigates, blocks and rate-limits unwanted traffic.

• Utilizes monitoring systems and dashboards for proactive changes and alerting.

• Establishes continuous process improvement cycles where the process, performance,
and supporting technologies are reviewed and enhanced where applicable.

• Performs other duties as assigned.

EDUCATION & EXPERIENCE
Typically requires a bachelor's degree and five (5) to seven (7) years of experience in a
technology and/or software engineering role, or an equivalent combination of education and experience.

KNOWLEDGE, SKILLS, ABILITIES

• Understanding of Kubernetes, containers, clusters and elastic scalability.

• Expertise in SRE principles.

• Mindset of continually finding ways to drive scalability, stability, and performance.

• Cloud Services experience with Google Cloud Platform (GCP).

• Experience with API, service-based or microservice-based architecture.

• Proficiency in infrastructure, network, database, operating systems or security
troubleshooting and remediation.

• Architecture-level knowledge of Windows, Linux and infrastructure systems.

• Experience with production deployment, monitoring and operational support for enterprise-class applications (Dynatrace a plus).

• Experience working with Continuous Integration/Continuous Deployment (CI/CD) tools.

• Experience in performance diagnostics, capacity planning, performance architecture
design, performance tuning and performance monitoring.

• A strong mix of software engineering and operational support skills.

• Knowledge of web technologies: HTTP, proxies, Java, etc.

• Experience with Azure DevOps (ADO), Dynatrace, Prometheus, Terraform and Grafana.
Date Posted: 07 April 2025