Location: India (Remote - 24/7 Shift)
Full-Time, Immediate Start Preferred
We're building the infrastructure that powers cutting-edge AI workloads at scale - and we're looking for a Senior Linux Administrator to help lead the charge. If you live and breathe Linux, love optimizing bare metal with containers and virtualization, and can scale infrastructure through automation, this role is for you.
What You'll Do
- Build Scalable AI Infrastructure: Architect and manage high-performance environments combining bare metal, LXC, KVM/QEMU, and GPU networking.
- Orchestrate Deployments: Use MAAS, Terraform, and Chef to automate the lifecycle of compute infrastructure.
- Enable Containerized Compute: Design LXC environments on Linux with fine-grained control over resource allocation and isolation.
- Virtualization & Storage: Work with libvirt/KVM/QEMU and optimize disk and block storage using CEPH.
- Infra as Code: Build and maintain reproducible environments using Git and GitHub Actions.
- Monitor & Maintain: Integrate VictoriaMetrics and Elasticsearch to monitor system performance and to troubleshoot alerts and anomalies.
- Network Engineering (Desirable): Leverage VyOS and GPU network fabric topologies to support scalable AI training workloads.
- Cross-Functional Collaboration: Partner with infra, platform, and ML teams via Slack, Jira, and GSuite to deliver production-ready systems.
Your Toolbelt
- Languages & Scripting: Python, Bash
- Operating System: Advanced Linux (Ubuntu/HWE kernel familiarity a plus)
- Containers & Virtualization: LXC, KVM, QEMU, libvirt
- Storage Systems: CEPH (block/object), RAID, partitions
- Infra Automation: MAAS, Terraform, Chef
- Monitoring & Metrics: VictoriaMetrics, Elasticsearch
- Networking: VyOS, GPU network fabric (rail-optimized, E/W & N/S topologies)
- DevOps & Collaboration: Git, GitHub Actions, Jira, Slack, GSuite
What Sets You Apart
- Experience building large-scale compute platforms for AI/ML workloads
- Deep knowledge of virtualization, containers, and cloud-init provisioning
- Passion for observability, reliability, and clean automation
- Strong documentation practices and cross-functional communication