Site Reliability Engineering Manager

Redmond, Washington

Denali Advanced Integration
Job Expired
Summary of Position:
As the Site Reliability Engineering (SRE) Manager, you will lead a team of Cloud Operations, Site Reliability Engineering, Cloud Security, and Cloud Architecture experts working in AWS and supporting technologies. This position will ensure that the team delivers value to internal development teams, cloud security, and operations, and manages operations to maximize uptime, availability, and system stability.

Essential Functions:
- Lead a team involved with Cloud Operations, Site Reliability Engineering, Cloud Infrastructure, Cloud Security, and Cloud Architecture using AWS and other supporting technologies.
- Provide thought leadership on SRE topics, including definition of metrics, production system management, change management, continuous deployment management, and incident response.
- Define and implement best practices for Site Reliability, including Observability, Resiliency, Automation, and Security.
- Provide infrastructure as code and operational support for full-stack software applications.
- Collaborate with development staff to create, monitor, and troubleshoot the system infrastructure.
- Increase system resilience and serve larger customer volumes with expert-level infrastructure as code, bulletproof release, and change management skills.
- Deliver on commitments on time, with high quality and a sense of ownership and accountability.
- Ensure that the team delivers value to delivery teams, cloud security, and operations, and manages operations to maximize uptime, availability, and system stability.
- Manage AWS and on-premises infrastructure and services, including EC2, CloudWatch, S3, Redshift, RDS, ECS, EKS, on-premises VMware, CloudFormation, Terraform, Ansible, Lambda, CloudWatch Alarms, Alerts, and Automation, VPC management, network security groups, Network Access Control Lists, VerneMQ, Kafka, OpenStack, Kubernetes, Docker, Java, and Ruby on Rails.
- Work closely with the team and client stakeholders to prevent incidents by delivering robust solutions, and to resolve priority incidents and other production issues.
- Ensure the resolution of issues using professionalism, technical skills, common sense, and leadership.
- Reduce the cycle time between problem or incident identification and resolution, following policies and processes set by our team, the client, and industry best practices.
- Create, maintain, publish, and deliver the technology roadmap.
- Develop, maintain, and communicate detailed engineering resource plans and schedules.
- Consult with business decision-makers and senior engineering staff to inform discussions related to technical decisions.
- Serve as the primary point of contact for reviewing user and/or technical requirements, estimating scope, identifying tasks, and assigning and coordinating technical resources.
- Proactively identify project dependencies, and develop and implement strategies to mitigate risks.
- Facilitate strong communication among team members and cross-functional leads.

Competencies:
- Ensures Accountability
- Tech Savvy
- Communicates Effectively
- Values Differences
- Customer Focus
- Resourcefulness
- Drives Results
- Plans and Prioritizes
- Decision Quality
- Self-Development

Work Environment:
This position requires occasional onsite work at the client's site. This job operates in a professional office environment. This role routinely uses standard office equipment such as computers, phones, photocopiers, filing cabinets, and fax machines.

Physical Demands:
The physical demands described here are representative of those that must be met by an employee to successfully perform the essential functions of this job.
- While performing the duties of this job, the employee frequently communicates with co-workers and clients who have inquiries about the various projects and other needs, and must be able to exchange accurate information in these situations.
- The employee must be able to remain in a stationary position 75% of the time.
- The employee needs to occasionally move about inside the office to access file cabinets, office machinery, etc.
- The employee constantly operates a computer and office machinery such as a calculator, keyboard, copy machine, and printer.
- The employee frequently moves boxes with equipment weighing up to 25 lbs across the building and/or to other offsite buildings for various project needs.

Required Education and Experience:
- Bachelor's degree in computer science or a related field
- 10-15 years of experience

Qualifications:
- At least 8 years of experience in managing AWS cloud infrastructure and services
- Experience with DevOps and Site Reliability Engineering practices
- Strong leadership and management skills, with experience leading and managing a 24x7 team
- Experience in Incident, Problem, and Change Management
- Experience in managing cloud security and architecture
- Strong problem-solving skills and a quality focus
- Strong communication and interpersonal skills
- Experience working in a fast-paced, dynamic environment
- Relevant industry certifications such as AWS Certified Solutions Architect, AWS Certified DevOps Engineer, or similar

AAP/EEO Statement:
3MD Inc. is an equal opportunity employer and does not discriminate based on gender, sex, age, race and color, religion, marital status, national origin, disability, sexual orientation, gender identity or expression, veteran status, or any other category that is protected by applicable law.

Other Duties:
Please note this job description is not designed to cover or contain a comprehensive listing of activities, duties, or responsibilities that are required of the employee for this job. Duties, responsibilities, and activities may change at any time with or without notice.
Date Posted: 18 May 2024