Site Reliability Engineer

Los Angeles, California

Diverse Lynx

Job Expired

Site Reliability Engineer
Remote
Full-time Opportunity

Job Description

Must Have Technical/Functional Skills

• Experience with cloud platforms (AWS, Azure, Google Cloud) and hybrid environments.

• Proficiency in container technologies (Docker, containerd, Podman).

• Strong knowledge of Linux administration and networking concepts.

• Experience with Infrastructure as Code (IaC) tools like Terraform, Ansible, Helm, or Pulumi.

• Monitoring and logging expertise using Prometheus, Grafana, ELK, Datadog, or Splunk.

• Hands-on experience with CI/CD pipelines and DevOps tools (Jenkins, GitHub Actions, GitLab CI, ArgoCD).

• Proficiency in scripting/programming (Python, Bash, Go) for automation.

• Strong troubleshooting and incident management skills.

Roles & Responsibilities
We are seeking a highly skilled Site Reliability Engineer (SRE) to manage, optimize, and ensure the reliability of infrastructure. The ideal candidate will have deep expertise in ELK, Dynatrace, PagerDuty, PowerShell, container orchestration, cloud infrastructure, and automation, along with a strong focus on reliability, scalability, and performance. Knowledge of LogicMonitor and Python is good to have.

• Reliability & Performance: Implement best practices to ensure high availability, scalability, and performance of containerized applications.

• Monitoring & Incident Response: Set up monitoring (Prometheus, Grafana, ELK, Dynatrace, PagerDuty, PowerShell, etc.), troubleshoot issues, and lead incident resolution.

• Automation & Infrastructure as Code (IaC): Develop and maintain Terraform, Helm charts, and Kubernetes manifests for automation.

• CI/CD & DevOps Integration: Work with DevOps teams to optimize CI/CD pipelines for Kubernetes deployments (Jenkins, ArgoCD, FluxCD, etc.).

• Security & Compliance: Implement security best practices for containerized workloads, RBAC, network policies, and vulnerability scanning.

• Capacity Planning & Optimization: Analyze resource usage and optimize infrastructure costs and performance.

• Disaster Recovery & Backup: Implement backup and disaster recovery strategies for Kubernetes workloads.

Diverse Lynx LLC is an Equal Employment Opportunity employer. All qualified applicants will receive due consideration for employment without any discrimination. All applicants will be evaluated solely on the basis of their ability, competence and their proven capability to perform the functions outlined in the corresponding role. We promote and support a diverse workforce across all levels in the company.

Date Posted: 11 May 2025
