Founding Cloud Infrastructure Engineer

San Mateo, California

zaimler
About Us

We are on a mission to bridge the gap between enterprise business knowledge and data, democratizing data discovery and curation to prepare organizations for the era of generative AI. Today's data tools are overly complex, poorly integrated, and siloed, forcing AI practitioners and data scientists alike to spend more time wrestling with tools, relying on tribal knowledge, and navigating data lakes than doing meaningful data science work. The current landscape of data tools and processes is heavily manual and has not kept pace with the vast amount of data generated daily. With the advent of generative AI and multi-modality, this challenge has only grown more complex.

Backed by top VC funds, we are committed to making enterprise data AI-ready faster, more reliably, and with a stronger foundation of factual semantic knowledge. This leads to more accurate models, superior outcomes, and better business results. Our team of seasoned data infrastructure and machine learning experts (from LinkedIn, Visa, Truera, Hive, and Branch) has spent the past two decades building bespoke systems to solve these very challenges.

Join our growing team of ML research and data infrastructure experts. We're committed to empowering AI practitioners and data scientists to seamlessly integrate semantic learning with generative AI. Be part of our journey to shape the future of enterprise AI.

About the Job

We are seeking a Senior Cloud Infrastructure Engineer to join our team. The ideal candidate will have expertise in Kubernetes, Docker, Terraform, and Ansible, as well as experience with Ray, Kafka, and GPU operations. In this role, you will play a critical part in architecting, deploying, and managing the cloud infrastructure that underpins our advanced data and AI platform. We're looking for someone with a strong background in cloud infrastructure and a proven track record of handling complex, large-scale deployments.

What You Will Be Doing
    • Design and Develop: Architect and implement scalable, fault-tolerant cloud infrastructure solutions optimized for performance and resource efficiency.
    • Containerization and Orchestration: Utilize Kubernetes and Docker to manage containerized applications, ensuring seamless deployment and scalability.
    • Infrastructure as Code: Employ Terraform and Ansible to automate infrastructure provisioning and configuration management.
    • Distributed Systems Management: Oversee and optimize distributed systems, including technologies like Ray and Kafka, to handle concurrency, scalability, and reliability.
    • GPU Resource Management: Manage and optimize GPU resources to support high-performance computing tasks.
    • Collaborate: Work closely with product, data science, and engineering teams to align technical solutions with business needs.
    • Stay Current: Research and implement the latest advancements in cloud infrastructure and distributed systems.
Prior Experience
    • 7-10+ years of experience in cloud infrastructure, DevOps, or SRE roles.
    • Deep expertise in Kubernetes, Docker, Terraform, and Ansible.
    • Strong experience with GPU operations and distributed computing frameworks (Ray, Dask, or similar).
    • Hands-on experience managing Kafka or other message queue systems in production.
    • Proficiency in cloud environments (AWS, GCP, or Azure).
    • Strong understanding of cloud-native networking, security best practices, and observability tools.
    • Passion for solving complex infrastructure challenges and improving developer experience.
    • Strong scripting skills (Python, Bash, or Golang preferred).
Nice to Have
    • Experience in building AI/ML infrastructure and ML production systems at scale.
    • Hands-on experience with Linux and other containerization technologies.
    • Prior experience at an early-stage startup, developing systems and processes from scratch.

Why Join Us?

We're a fast-moving, well-funded startup based in San Mateo, working onsite with flexible hours because the best ideas happen when smart people collaborate in person. We take ownership of our work, move with urgency while maintaining quality, and focus on delivering real results, not just effort. We offer competitive compensation, equity, full benefits (Medical, Dental, Vision, 401k), and a workspace built for collaboration, transparency, and deep technical problem-solving.

We sponsor H-1B visas and assist with immigration processes to bring the best minds together.
Date Posted: 10 March 2025