Title: Senior BizOps Engineer (Site Reliability Engineer)
Location: O'Fallon, Missouri (Main Campus) - Local candidates HIGHLY PREFERRED; however, open to relocation candidates from DAY 1.
- As always, local will get first preference.
Duration: 24 months
Glider: BizOps Engineering Expert
Notes
- This is a Senior SRE position
- Must have strong communication skills
- Cloud experience would be a plus
- CI/CD
- Advanced Bash Scripting skills
- Python programming skills
- Experience with monitoring tools such as Splunk, Dynatrace
- Automation with APIs
- Must have Java experience to handle some Java applications
- We would like to see someone with some ITSM best practices (change management, incident management, etc.)
- We are looking for someone who can do some production support with on-call
- The on-call shift is 1 week in every 5-week rotation
- In the past, some candidates could not talk about what was on their resume and could not answer simple questions; these types of candidates will be automatically disqualified
- Candidates who can speak to what is on their resume and communicate their answers in an effective manner are the ones we have hired in the past
- What are your top 3 required technical skills?
- DevOps CI/CD, Site Reliability Engineering
- Linux operating system, Bash scripting, Python programming skills
- Monitoring and Alerting Dashboard - API integration/Automation
- What soft skills would you like to see in a candidate?
Any CI/CD tool, Site Reliability Engineering experience
Job Description Summary
Overview:
The Client BizOps team is looking for a Site Reliability Engineer who can help us solve problems, build our pipelines and lead Client in automation and best practices.
- Are you a born problem solver who loves to figure out how something works?
- Are you a geek who loves all things automation?
- Do you have a low tolerance for manual work and look to automate everything you can?
Business Operations is leading the DevOps transformation at Client through our tooling and by being an advocate for change & standards throughout the development, quality, release, and product organizations. We need team members with an appetite for change and pushing the boundaries of what can be done with automation. Experience in working across development, operations, and product teams to prioritize needs and to build relationships is a must.
Mission
The role of Business Operations is to be the production readiness steward for the platform. This is accomplished by closely partnering with developers to design, build, implement, and support technology services. A Business Operations engineer will ensure operational criteria like system availability, capacity, performance, monitoring, self-healing, and deployment automation are implemented throughout the delivery process. Business Operations plays a key role in leading the DevOps transformation at Client through our tooling and by being an advocate for change and standards throughout the development, quality, release, and product organizations.
We accomplish this transformation by supporting daily operations with a hyper-focus on triage and then root cause, and by understanding the business impact of our products. The goal of every BizOps team is to shift left to be more proactive and upfront in the development process, and to proactively manage production and change activities to maximize customer experience and increase the overall value of supported applications. BizOps teams also focus on risk management by tying all our activities together with an overarching responsibility for compliance and risk mitigation across all our environments. BizOps also focuses on streamlining and standardizing traditional application-specific support activities and centralizing points of interaction for both internal and external partners by communicating effectively with all key stakeholders.
Ultimately, the role of BizOps is to align product and customer-focused priorities with operational needs. We regularly review our run state not only from an internal perspective, but also by understanding and providing a feedback loop to our development partners on how we can improve the customer experience of our applications.
Responsibilities
• Engage in and improve the whole lifecycle of services, from inception and design through deployment, operation, and refinement.
• Analyze ITSM activities of the platform and provide a feedback loop to development teams on operational gaps or resiliency concerns.
• Support services before they go live through activities such as system design consulting, capacity planning and launch reviews.
• Maintain services once they are live by measuring and monitoring availability, latency, and overall system health with automated alerts.
• Scale systems sustainably through mechanisms like automation and evolve systems by pushing for changes that improve reliability and velocity.
• Practice sustainable incident response and detailed postmortems.
• Take a holistic approach to problem solving by connecting the dots during a production event across the various technology stacks that make up the platform, to optimize mean time to recover.
• Work with a global team spread across tech hubs in multiple geographies and time zones.
• Share knowledge and mentor junior resources.
Qualifications
• BS degree in Computer Science or related technical field involving coding (e.g., physics or mathematics), or equivalent practical experience.
• Experience with algorithms, data structures, scripting, pipeline management, and software design.
• Systematic problem-solving approach, coupled with strong communication skills and a sense of ownership and drive.
• Ability to help debug, optimize code, and automate routine tasks.
• We support many different stakeholders. Experience in dealing with difficult situations and making decisions with a sense of urgency is needed.
• Experience in one or more of the following is preferred: Python, Go, Bash Scripting.
• Interest in designing, analyzing, and troubleshooting large-scale distributed systems.
• We need team members with an appetite for change and pushing the boundaries of what can be done with automation.
• Experience in working across development, operations, and product teams to prioritize needs and to build relationships is a must.
• For work on our ops team, we need an engineer with experience in industry-standard CI/CD tools like Git/Bitbucket, Jenkins, and Chef. Experience designing and implementing an effective and efficient CI/CD flow that gets code from dev to prod with high quality and minimal manual effort is required.
• Bash and Python scripting experience is required.
All About You
• Must be high-energy, detail-oriented, and proactive
• Must have the ability to function under pressure in an independent environment
• Must have a high degree of initiative and self-motivation to drive results
• Excellent interpersonal and problem-solving skills
• Excellent written and verbal communication skills
• Expert knowledge of some of the following technologies: Directory Services, Authentication/Authorization, Access Provisioning, Public Key Infrastructure (PKI), Controls and compliance
• Knowledge of SOAP and REST web services.
- Duration of assignment? Is there an anticipated opportunity for conversion or extension? 2 years
- What is the name of your group? How does it fit into the overall Client organization? Program: Employee Access BizOps; we manage the internal employees' access platforms.
- What program will this person be supporting? Will this person be a part of a Guild? If so, which one and how will they be contributing? Program: Employee Access Management; the hired candidate will be supporting ongoing privileged access management and employee access projects in site reliability operations.
- What is your team's main responsibility? BizOps, SRE (Site Reliability Engineering) - Create/enhance Monitoring Alerts, product/platform maintenance operations, ITSM-CRQS, Workorders, On-Call support.
- What will a typical work day look like for this contractor? What are their expected hours? Regular BizOps work with product/platform maintenance operations, ITSM-CRQS, Workorders, On-Call support; 8 am to 5 pm CST
- Will there be any travel involved? No
- What kind of accesses will this person need? Network & Badge or just Network? Network & Badge