Get My SaaS
Tech Lead - DevOps Engineer & Architect
Position Type: Full-Time
Location: Remote
Reports To: Founder & Tech Advisors
1. Role Summary
This role is pivotal in architecting, planning, and executing the backend infrastructure of an AI-powered SaaS marketplace. The scope includes both SaaS operations and AI/ML pipelines. The candidate must possess hands-on expertise in Infrastructure as Code (IaC), cloud-native architectures, DevOps automation, and cross-functional team leadership. Ownership, innovation, and delivery excellence in a dynamic startup environment are non-negotiable.
2. Engagement Scope
2.1 Type of Engagement
- Full-time role with flexible hours.
- Remote-first and async collaboration culture.
- Performance will be evaluated through weekly deliverables and progress checkpoints.
- Requires structured autonomy and tight coordination with engineering and AI/ML teams.
2.2 Work Nature
- Deliverable-based execution (not hourly tracking).
- Requires proactive ownership, clear communication, and occasional contributions outside core responsibilities due to startup constraints.
3. Key Responsibilities
3.1 Initial Phase: Planning & Architecture (2 Weeks)
- Gather technical requirements across AI/ML, data, and backend teams.
- Finalize cloud architecture (AWS-centric) with cost-efficient, open-source tooling where applicable.
- Design infrastructure encompassing:
  - SaaS backend services
  - ML training/inference pipelines
  - Monitoring and observability stack
  - CI/CD systems
  - Security/compliance (IAM, VPC, WAF)
- Develop an infrastructure roadmap built on Terraform or another IaC tool (Pulumi, CDK).
- Evaluate modern ISE platforms (e.g., System Initiative, Cortex, Morpheus Data) for integration potential.
3.2 Development Phase (Weeks 4-8)
- Provision AWS resources using Terraform and integrate GitOps pipelines.
- Set up AI/ML infrastructure with AWS SageMaker, Bedrock, and open-source alternatives.
- Implement DevOps automation:
  - CI/CD pipelines (CodePipeline, GitOps)
  - Serverless APIs with Lambda/API Gateway
  - Containerized deployments (Fargate, ECS, EKS)
- Deploy observability tools:
  - Logging (OpenTelemetry)
  - Monitoring (Prometheus, Grafana)
  - Alerting and incident management
- Ensure production-grade reliability with autoscaling, failover, and DR mechanisms.
3.3 Cross-Functional Leadership (Ongoing)
- Lead infrastructure planning and execution with:
  - AI/ML Engineers (for model training/inference)
  - Data Engineers (ETL, pipelines)
  - Full-stack Developers (backend APIs, DB infrastructure)
- Drive hiring/resource planning for backend and DevOps functions.
- Own infrastructure-level decisions and performance/cost optimizations.
3.4 Continuous Responsibilities (Weekly)
- Lead weekly sprints and 1:1 check-ins.
- Maintain technical documentation (architecture, configs, deployment workflows).
- Monitor system health and proactively resolve issues.
- Improve DevOps maturity via internal tooling, workflow automation, and incident response practices.
- Stay current with trends in MLOps, AIOps, and serverless Ops; recommend new tools or upgrades.
4. Required Technical Competencies
- Cloud Infrastructure: Expert-level AWS (EC2, Lambda, EKS, SageMaker, IAM, VPC).
- IaC & GitOps: Deep experience with Terraform, Pulumi, AWS CDK, and GitOps-based automation.
- AI/ML Infrastructure: Familiarity with ML pipelines, GPU workloads, AutoML, inference serving, and model versioning.
- DevOps/MLOps: Strong in Docker, Kubernetes, CI/CD best practices, and AIOps tooling.
- Observability: Experience deploying logging, metrics, and alerting systems, with a focus on reliability.
- Tool Evaluation: Ability to assess and implement cost-effective, open-source, enterprise-grade solutions.
5. Required Soft Skills & Attributes
- Proven leadership in managing technical teams.
- Ability to convert business goals into execution-ready tech plans.
- Strong communication and collaboration skills.
- High ownership and bias toward action.
- Precision in planning, execution, and documentation.
- Startup mindset: adaptable, proactive, outcome-driven.
6. Collaboration & Reporting
- Interfaces with:
  - AI/ML Engineers
  - Data Engineers
  - Backend Developers
  - Project Manager (Founder)
- Tools:
  - Communication, documentation, and project management: a shared project and work-scope tracking and communication tool
- Cadence:
  - Weekly sprint planning & retrospectives
  - Asynchronous daily updates
7. Deliverables & KPIs
- Deliverables and KPIs will be finalized after the initial 2-week planning phase.
- The development roadmap must incorporate input from the technical advisors and will be used to track performance.
- Milestones will align with sprint-based execution and weekly progress reviews.
8. Special Notes
- Supported by a core team including:
  - AI/ML Engineer
  - Backend and frontend developers
  - Data Engineer
- The company is bootstrapped. Support is available, but initiative and adaptability are essential.
- This role is foundational to establishing scalable, secure, and reliable infrastructure.