Site Reliability Engineer

Irving, Texas

Mindlance
Job Description:

In this contingent resource assignment, you may:

• Consult on complex initiatives with broad impact and large-scale planning for Systems Operations Engineering.

• Review and analyze complex multi-faceted, larger scale or longer-term Systems Operations Engineering challenges that require in-depth evaluation of multiple factors including intangibles or unprecedented factors.

• Contribute to the resolution of complex and multi-faceted situations requiring solid understanding of the function, policies, procedures, and compliance requirements that meet deliverables.

• Strategically collaborate and consult with client personnel.

Required Qualifications:

• 5+ years of Systems Engineering or Technology Architecture experience, or equivalent demonstrated through one or a combination of the following: work or consulting experience, training, military experience, education.

We are looking for a highly skilled Site Reliability and Operations Engineer (SRE) with extensive experience in Kubernetes-based distributed caching and compute grid solutions. This role requires a strong foundation in software development, infrastructure automation, and reliability engineering. You will be responsible for designing, implementing, and maintaining high-performance distributed systems, ensuring reliability, scalability, and efficiency.

Development & Implementation:

• Design, develop, and optimize distributed caching and compute grid solutions on Kubernetes/OpenShift.

• Understanding of microservices and containerized workloads using Kubernetes, Docker, and Helm.

• Implement high-throughput compute grid solutions using IBM Spectrum Symphony, TIBCO GridServer, or similar technologies.

• Optimize application performance by leveraging parallel compute strategies, load balancing, and efficient data distribution.

Site Reliability Engineering (SRE):

• Ensure high availability, scalability, and reliability of distributed systems.

• Implement observability, logging, and monitoring using tools like Prometheus, Grafana, ELK, or OpenTelemetry.

• Automate infrastructure provisioning and deployments using Ansible and Helm charts.

• Understanding of CI/CD pipelines for seamless software deployment.

• Troubleshoot and resolve incidents related to the platform, infrastructure, and distributed compute systems, ensuring minimal downtime.

Required Skills & Qualifications:

• Strong experience in Kubernetes (OpenShift and on-prem/cloud clusters).

• Understanding of programming languages like Java, Go, or Python.

• Experience with containerization and packaging technologies (Docker, Helm, etc.).

• Strong knowledge of CI/CD pipelines (Jenkins, ArgoCD, GitHub Actions).

• Hands-on experience with observability tools (Prometheus, Grafana, Loki, Jaeger).

• Understanding of networking, service meshes (Istio/Linkerd), and security best practices in Kubernetes.

• Experience with multi-cluster and hybrid cloud Kubernetes deployments.

EEO:

"Mindlance is an Equal Opportunity Employer and does not discriminate in employment on the basis of - Minority/Gender/Disability/Religion/LGBTQI/Age/Veterans."

Date Posted: 21 April 2025