Site Reliability Engineer

Rancho Cordova, California

TEKsystems

C2C OR VISA SPONSORSHIP IS NOT AVAILABLE FOR THIS POSITION
Skills needed:

Bachelor's Degree plus 4 years of related functional experience (or 8 total years if no degree)
Experience with both Windows Administration and Linux, as well as containerization software products
Functional with continuous integration and continuous delivery
Experience with automation and orchestration using Chef, Puppet, Ansible and containers
Coding skills beyond simple scripts and knowledge of application architecture
Ability to program (structured and OO) with one or more high-level languages, such as Python, Java, C/C++/C#, Ruby, and JavaScript
Understanding of distributed storage technologies like NFS, HDFS, Ceph, and S3, as well as dynamic resource management frameworks (OpenShift, Kubernetes, YARN)
Skilled in spotting problems and identifying performance bottlenecks, leading to problem and root cause analysis and risk mitigation
Capacity monitoring and performance planning experience with cloud solutions like AWS, using applications such as Dynatrace, New Relic, and AppDynamics
Preferred Skills:

MS Windows Administration
MS Active Directory
MS PowerShell
Chef/Puppet/Ansible, Shell Scripting
.NET Common Language Runtime (CLR)
Red Hat Enterprise Linux
Amazon Web Services (AWS)
Python, Java, C/C++/C#, Ruby, and JavaScript
OpenShift, Kubernetes, YARN
Linux Containers
Dynatrace, Splunk, New Relic, AppDynamics
"Lift and Shift" experience in taking legacy workloads and migrating them to modern solutions.
Description
Use engineering design concepts to recommend design or test methods for attaining or improving operational reliability in support of business objectives. Develop and implement high-reliability tools, systems, and services using engineering methodologies and tools. Determine reliability requirements and deliver insights from massive scale data in real time. Propose changes in design or formulation to improve system and/or process reliability. Utilize best practices and work with cross-functional teams to provide solutions and a positive user experience.

Improve reliability, quality, and time-to-market for a suite of software solutions through effective hosting, monitoring, operations, and automation
Develop proprietary tools to improve system reliability and mitigate weaknesses in incident management or software delivery
Collaborate with team members to troubleshoot and fix issues, using knowledge of the problems to route support escalations to the appropriate teams
Add automation for improved real-time collaborative response; update documentation, runbook tools, and modules to prepare teams for incidents
Support optimizing the software development life cycle to boost service reliability, based on post-incident reviews
Support system cost modeling for all hosted systems
Measure and optimize system performance, with an eye toward pushing our capabilities forward, getting ahead of customer needs, and innovating to continually improve
Deliver primary operational support and engineering for distributed software applications
Implement guidelines and plans for automated systems delivery while maintaining system and data security
Assist with impact analysis regarding enterprise-wide technology
Perform capacity monitoring with various monitoring tools (Splunk, Dynatrace, etc.) and make recommendations
Gather and analyze metrics from both operating systems and applications to assist in performance tuning, fault finding, and corrective action planning
Support system integration, software, and hardware at enterprise level for optimum performance
Partner with development teams to improve services through rigorous automated testing and release procedures
Contribute to system architecture planning, and policies and procedures surrounding enterprise-wide technology
Participate in system design consulting, platform management, and capacity planning
Stay abreast of new technologies; introduce applicable technology that aligns with business goals and enables creative solutions
Pay and Benefits
The pay range for this position is $47.00 - $65.00/hr.
Eligibility requirements apply to some benefits and may depend on your job classification and length of employment. Benefits are subject to change and may be subject to specific elections, plan, or program terms. If eligible, the benefits available for this temporary role may include the following:

Medical, dental & vision
Critical Illness, Accident, and Hospital
401(k) Retirement Plan - Pre-tax and Roth post-tax contributions available
Life Insurance (Voluntary Life & AD&D for the employee and dependents)
Short and long-term disability
Health Spending Account (HSA)
Transportation benefits
Employee Assistance Program
Time Off/Leave (PTO, Vacation or Sick Leave)
Workplace Type
This is a fully remote position.
Application Deadline
This position is anticipated to close on Apr 30, 2025.

About TEKsystems:

We're partners in transformation. We help clients activate ideas and solutions to take advantage of a new world of opportunity. We are a team of 80,000 strong, working with over 6,000 clients, including 80% of the Fortune 500, across North America, Europe and Asia. As an industry leader in Full-Stack Technology Services, Talent Services, and real-world application, we work with progressive leaders to drive change. That's the power of true partnership. TEKsystems is an Allegis Group company.

The company is an equal opportunity employer and will consider all applications without regard to race, sex, age, color, religion, national origin, veteran status, disability, sexual orientation, gender identity, genetic information or any characteristic protected by law.

Date Posted: 23 April 2025
