API Production Support

Ashburn, Virginia

System One

(Several Openings. 24x7 Production Support Team)

Location: REMOTE

Pay Rate: Open to W2 and C2C options

Position Type: Multiyear Contract

Requirements
  • Work as part of a 24x7, Level 2 API support and incident response service team
  • Expertise in MuleSoft API troubleshooting and support
  • Experience using monitoring tools for API management like Azure Monitor, Splunk and Dynatrace
  • Familiarity with ServiceNow tools for incident tracking and documentation
  • Ability to use enterprise runbooks and wiki documentation for issue resolution
  • Ability to collaborate with multiple internal and external stakeholders, including the Tier 3 team and Support Lead
  • Java background preferred, to read stack traces and logs and pinpoint root causes
  • Experience with SOAP/REST APIs with Spring Boot and Java microservices
  • Experience with MuleSoft Anypoint Platform, including Exchange and monitoring
  • Use Azure, Splunk and Dynatrace-based dashboards for monitoring and resolution
  • Conduct root cause analysis, escalate issues to internal Tier 3 team as necessary, and engage multiple vendors for resolution when required
  • Use enterprise runbooks and wiki documentation, and collaborate with the Tier 3 team or Support Lead
  • Provide 24x7 on-call support as a primary or secondary contact (rotation basis)
  • Serve as API support on at least one major incident call per day, averaging 2 hours
  • Handle API-related incidents through ServiceNow, based on Moogsoft tickets
  • Troubleshoot and resolve issues within L2 incident criteria
  • Ensure timely response and resolution of API-related incidents per agreed SLAs
  • Perform initial triage, log analysis, and impact assessment
  • Ensure monitoring and alerts are accurate, current, and functional
  • Utilize enterprise runbooks and wiki documentation for troubleshooting and resolution
  • Participate in Problem and Knowledge Management process as requested
  • Provide observability support for incident management to proactively identify, diagnose, and resolve issues
  • Conduct detailed RCA (Root Cause Analysis) for recurring or high-impact incidents
  • Provide RCA reports with contributing factors, corrective actions, and long-term recommendations
  • Work with internal teams to implement preventative measures
  • Collaborate with the Tier 3 team or support lead when necessary to resolve complex issues
  • Maintain documentation of escalations, including logs, timestamps and resolution progress
  • After RCA, determine and contact relevant vendors required for issue resolution
  • Provide necessary logs, issue descriptions, and troubleshooting details to vendors
  • Track vendor resolution progress, coordinate efforts, and update stakeholders (Critical, Non-Critical)

Ref: (ALTA IT)

System One, and its subsidiaries including Joulé, ALTA IT Services, CM Access, TPGS, and MOUNTAIN, LTD., are leaders in delivering workforce solutions and integrated services across North America. We help clients get work done more efficiently and economically, without compromising quality. System One not only serves as a valued partner for our clients, but we offer eligible full-time employees health and welfare benefits coverage options including medical, dental, vision, spending accounts, life insurance, voluntary plans, as well as participation in a 401(k) plan.

System One is an Equal Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex (including pregnancy, childbirth, or related medical conditions), sexual orientation, gender identity, age, national origin, disability, family care or medical leave status, genetic information, veteran status, marital status, or any other characteristic protected by applicable federal, state, or local law.

Date Posted: 22 March 2025