Job Description:
In this contingent resource assignment, you may:
• Consult on complex initiatives with broad impact and large-scale planning for Systems Operations Engineering.
• Review and analyze complex, multi-faceted, larger-scale, or longer-term Systems Operations Engineering challenges that require in-depth evaluation of multiple factors, including intangible or unprecedented ones.
• Contribute to resolving complex, multi-faceted situations that require a solid understanding of the function's policies, procedures, and compliance requirements in order to meet deliverables.
• Collaborate and consult strategically with client personnel.
Required Qualifications:
• 5+ years of Systems Engineering or Technology Architecture experience, or equivalent, demonstrated through one or a combination of the following: work or consulting experience, training, military experience, or education.
We are looking for a highly skilled Site Reliability and Operations Engineer (SRE) with extensive experience in Kubernetes-based distributed caching and compute grid solutions. This role requires a strong foundation in software development, infrastructure automation, and reliability engineering. You will be responsible for designing, implementing, and maintaining high-performance distributed systems, ensuring their reliability, scalability, and efficiency.
Development & Implementation:
• Design, develop, and optimize distributed caching and compute grid solutions on Kubernetes/OpenShift.
• Apply a working understanding of microservices and containerized workloads using Kubernetes, Docker, and Helm.
• Implement high-throughput compute grid solutions using IBM Spectrum Symphony, TIBCO GridServer, or similar technologies.
• Optimize application performance by leveraging parallel compute strategies, load balancing, and efficient data distribution.
Site Reliability Engineering (SRE):
• Ensure high availability, scalability, and reliability of distributed systems.
• Implement observability, logging, and monitoring using tools like Prometheus, Grafana, ELK, or OpenTelemetry.
• Automate infrastructure provisioning and deployments using Ansible and Helm charts.
• Apply an understanding of CI/CD pipelines to deliver seamless software deployments.
• Troubleshoot and resolve incidents affecting the platform, infrastructure, and distributed compute systems, keeping downtime to a minimum.
Required Skills & Qualifications:
• Strong experience in Kubernetes (OpenShift and on-prem/cloud clusters).
• Understanding of programming languages such as Java, Go, or Python.
• Experience with containerization technologies (Docker, Helm, etc.).
• Strong knowledge of CI/CD pipelines (Jenkins, ArgoCD, GitHub Actions).
• Hands-on experience with observability tools (Prometheus, Grafana, Loki, Jaeger).
• Understanding of networking, service meshes (Istio/Linkerd), and security best practices in Kubernetes.
• Experience with multi-cluster and hybrid cloud Kubernetes deployments.
EEO:
"Mindlance is an Equal Opportunity Employer and does not discriminate in employment on the basis of Minority/Gender/Disability/Religion/LGBTQI/Age/Veterans."
Date Posted: 21 April 2025