Job Description:
Job Title - Software Engineer (Observability Platform Engineer)
Location - Pleasanton, CA. 94588 (Hybrid - onsite 50% of the time)
Duration - 6-month contract with possibility of extension
About the role:
The team deploys and operates Observability cloud infrastructure (Kubernetes, CI/CD tooling, networking, security) for the observability teams. Additional engineers are needed to handle the transition to full availability of all Observability services and the now-growing scalability and performance challenges.
The team is a modern full-service DevOps team responsible for the development, creation, and operation of Client's Observability Services (Metrics, Logs, Traces, Stats, and other services). As a team member and specialist in Cloud Platform Engineering, you will play a key role in developing the agile automation that underpins and enables our next-generation observability platform. As the common cloud platform infrastructure team, we own everything from virtualized compute and storage to critical infrastructure observability, governance, and CI/CD.
Responsibilities:
- Collaborate with the security and networking teams to configure and deploy defense-in-depth security controls and highly performant networking infrastructure.
- Develop architecture, configuration standards, and automation to enable low-risk agile service deployment by all DPOE teams.
- Create infrastructure monitoring and tooling for platform performance tuning and debugging.
- Drive problems impacting critical systems to resolution and implement automation to prevent recurrence.
- Participate in the on-call rotation to support DPOE critical systems.
- Research, evaluate, and develop new open-source and cloud-native tools and technologies as needed to meet new requirements.
Basic Qualifications:
- 3+ years of software engineering experience using one or more of the following: Java, Python, Golang.
- 3+ years of AWS public cloud engineering DevOps experience with extensive expertise in networking and security architecture, configuration, and deployment.
- 3+ years of experience designing, building, and maintaining large-scale Kubernetes clusters.
- 3+ years of experience with container orchestration platforms such as EKS, ECS, or GKE.
- 3+ years of hands-on experience with Terraform, Ansible, Pulumi, or other IaC technologies.
- 3+ years of experience with cloud-native open-source tools.
- 3+ years of hands-on experience with ArgoCD for continuous deployment and orchestration.
Other Qualifications:
- Expertise in TCP/IP protocol debugging, IP routing configuration, and firewall/proxy/load balancer configuration.
- Deep knowledge of the configuration of security services such as key management systems, AWS IAM policies, Kerberos, and LDAP/AD integration.
- Experience with public cloud provider technology stacks at scale, especially VM provisioning, Kubernetes, data storage, and stream processing services.
- Experience with distributed system performance analysis and optimization.
- Development experience with a wide range of programming languages, such as Java, Python, and Golang.
- Experience learning complex open-source service internals via code inspection.
- Experience with modern software development tools including CI/CD and methodologies like Agile.
- Experience with Linux system internals and tuning.
- Strong written and oral communication skills and the ability to explain esoteric technical details clearly to engineers without a similar background.