Lead DevOps Engineer - Shape the Future of Our Platform
Chicago - On-site
What You'll Do:
- Architect and Scale: Design and implement a forward-thinking infrastructure strategy that seamlessly scales to meet the evolving demands of our web services.
- Cloud Mastery: Architect, manage, and maintain our cloud infrastructure (primarily AWS, including services like Lambda, EC2, EKS, DynamoDB, and Aurora) across all environments, with a laser focus on reliability, availability, and scalability.
- Empower Engineering: Collaborate closely with software engineering teams to understand their needs, enabling rapid iteration, efficient testing, and seamless deployments.
- Automate Everything: Develop and refine automation tools and workflows to streamline deployment and operational processes, boosting efficiency and reducing manual effort.
- Champion Reliability (SRE): Implement and evangelize Site Reliability Engineering (SRE) principles, defining and tracking SLAs, SLOs, and SLIs to ensure optimal service performance and reliability.
- Drive Observability: Define and maintain comprehensive observability standards through robust monitoring, alerting, logging, and distributed tracing systems.
- Build Developer-Friendly Tools: Create intuitive internal platforms and tooling that empower developers with increased automation, productivity, and deployment confidence.
- Enable Innovation: Keep our infrastructure and tools adaptable so they support the adoption of new technologies and features as we grow.
- Secure and Compliant: Develop and implement robust security and IT standards to effectively manage risks and ensure adherence to company policies.
- Lead and Resolve: Take ownership of incident response efforts, driving thorough blameless postmortems and implementing systemic fixes to prevent recurrence.
- Translate Vision into Reality: Turn business requirements into scalable, resilient, and secure technical solutions while maintaining high availability and disaster recovery capabilities.
- Build and Mentor: Lead and grow a team of talented DevOps professionals, scaling the organization's reliability operations in line with our expanding needs.
What You'll Bring:
- Minimum of 5 years of hands-on experience in infrastructure roles, with significant expertise in AWS cloud environments (Lambda, EC2, EKS, DynamoDB, Aurora, etc.).
- Bachelor's or Master's degree in Computer Science, Information Systems, or a related field.
- Proven experience managing high-scale, high-throughput environments built on distributed microservices architectures.
- Deep proficiency in Infrastructure as Code (IaC) using tools like Terraform.
- Expertise in networking, virtualization, and container orchestration technologies, particularly Docker and Kubernetes.
- Solid understanding of CI/CD tools and modern observability solutions.
- Exceptional problem-solving, strategic thinking, and communication skills, coupled with a strong passion for system design and operational excellence.