Role: SRE Engineer - Ops Support
Location: Bellevue, WA
Duration: Long Term
Detailed JD (Roles and Responsibilities)
Skills
• SRE mindset in production support: Proactively identifies issues through observability, with hands-on skill across a range of monitoring and observability tools for tracking system performance.
• Incident commander: Able to diagnose complex issues and actively drive incident calls, working with technical and product SMEs and Tier 2 SREs.
• Communication: Excellent communicator, able to interact with Directors, Senior Directors, and above.
Technical expertise
• Splunk (including Splunk APM and Splunk O11y), AppDynamics, Grafana, RedMetrics, ThousandEyes
• Knowledge of VMs, load balancers, firewalls, API gateways, databases, networking, and Linux/Unix
• Knowledge of containerization (Docker, Kubernetes) and cloud platforms (AWS, PCF, GCP)
• ServiceNow (including AIOps, self-healing tools, and automated playbooks)
• Use of APM tools, NMON, and Wireshark, including analysis of their output
• Experience with UEM and synthetic monitoring tools
Responsibilities
• Perform production support, including proactive identification of issues using observability tools, with the aim of reducing MTTD and MTTR
• Coordinate all activities required to lead incident triage in compliance with SLAs and OLAs, correlating inputs from various dashboards and tools to drive resolution
• Flexibility to work in a 24x7 environment