Responsibilities: We are seeking an AWS ML Cloud Engineer to design, deploy, and optimize cloud-native machine-learning systems that power our next-generation predictive-automation platform. You will blend deep ML expertise with hands-on AWS engineering, turning data into low-latency, high-impact insights. The ideal candidate has a strong command of statistics, coding, and DevOps, and thrives on shipping secure, cost-efficient solutions at scale.
Objectives of this role:
- Design and productionize cloud ML pipelines (SageMaker, Step Functions, EKS) that advance our predictive-automation roadmap
- Integrate foundation models via Bedrock and Anthropic LLM APIs to unlock generative-AI capabilities
- Optimize and extend existing ML libraries / frameworks for multi-region, multi-tenant workloads
- Partner cross-functionally with data scientists, data engineers, architects, and security teams to deliver end-to-end value
- Detect and mitigate data-distribution drift to preserve model accuracy in real-world traffic
- Stay current on AWS, MLOps, and generative-AI innovations; drive continuous improvement
Responsibilities:
- Transform data-science prototypes into secure, highly available AWS services; choose and tune the appropriate algorithms, container images, and instance types
- Run automated ML tests/experiments; document metrics, cost, and latency outcomes
- Train, retrain, and monitor models with SageMaker Pipelines, Model Registry, and CloudWatch alarms
- Build and maintain optimized data pipelines (Glue, Kinesis, Athena, Iceberg) feeding online/offline inference
- Collaborate with product managers to refine ML objectives and success criteria; present results to executive stakeholders
- Extend or contribute to internal ML libraries, SDKs, and infrastructure-as-code modules (CDK / Terraform)
Skills and qualifications:
Primary technical skills:
- AWS SDK, SageMaker, Lambda, Step Functions
- Machine-learning theory and practice (supervised / deep learning)
- DevOps & CI/CD (Docker, GitHub Actions, Terraform/CDK)
- Cloud security (IAM, KMS, VPC, GuardDuty)
- Networking fundamentals
- Java, Spring Boot, JavaScript/TypeScript & API design (REST, GraphQL)
- Linux administration and scripting
- Bedrock & Anthropic LLM integration
Secondary / tool skills:
- Advanced debugging and profiling
- Hybrid-cloud management strategies
- Large-scale data migration
- Impeccable analytical and problem-solving ability; strong grasp of probability, statistics, and algorithms
- Familiarity with modern ML frameworks (PyTorch, TensorFlow, Keras)
- Solid understanding of data structures, modeling, and software architecture
- Excellent time-management, organizational, and documentation skills
- Growth mindset and passion for continuous learning
Preferred qualifications:
- 10+ years of software engineering experience
- 3+ years in an ML-engineering or cloud-ML role (AWS focus)
- Proficient in Python (core), with working knowledge of Java or R
- Outstanding communication and collaboration skills; able to explain complex topics to non-technical peers
- Proven record of shipping production ML systems or contributing to OSS ML projects
- Bachelor's (or higher) in Computer Science, Data Engineering, Mathematics, or a related field
- AWS Certified Machine Learning - Specialty and/or AWS Certified Solutions Architect - Associate certification is a strong plus