A startup based in Arlington, VA is seeking a Site Reliability Engineer to support the deployment, performance, and resilience of a cutting-edge platform within a secure, high-stakes environment. This role is ideal for someone who thrives in operationally sensitive settings, is deeply technical, and wants to make a real-world impact supporting national security missions.
What You'll Do:
- Maintain and scale production infrastructure within a secure on-prem environment
- Automate deployments, monitoring, and maintenance to support high availability and performance
- Debug complex infrastructure and application issues under tight operational constraints
- Collaborate with software engineers to improve reliability, observability, and platform performance
- Monitor system health, develop runbooks, and ensure disaster recovery and backup processes are in place
- Work hands-on with classified systems, ensuring compliance with all security requirements
Who You Are:
- Experienced SRE, DevOps Engineer, or Infrastructure Engineer with hands-on systems and operations experience
- Strong background with Linux systems, containers (Docker, Kubernetes), and scripting (Python, Bash, etc.)
- Familiarity with on-prem deployments and air-gapped environments
- Skilled in monitoring, logging, and alerting tools (e.g., Prometheus, Grafana, ELK)
- Active TS/SCI clearance
- Comfortable working full-time, on-site at Fort Meade
Preferred Qualifications:
- Familiarity with CI/CD pipelines, infrastructure-as-code (Terraform, Ansible), and security hardening
- Understanding of network protocols and secure system architecture
Posted by: Patrick Fuller
Specialization:
- Cloud Security
- Site Reliability Engineer
- DevOps
- Cloud Engineer