About the Role
We are looking for a Staff Backend Platform Engineer with deep expertise in distributed systems, microservices architecture, and cloud-native backend engineering. The role requires a strong operational focus: ensuring system reliability, scalability, and observability while driving best practices in incident management, performance tuning, and automation.
Key Responsibilities
System Architecture & Scalability
- Design & Build: Architect and develop scalable, fault-tolerant backend systems.
- Microservices Development: Implement microservices using Go, Java, or Python, ensuring high availability and resilience.
- Cloud & Kubernetes: Deploy and manage applications on AWS, GCP, or Azure with Kubernetes (EKS, GKE, AKS).
- Event-Driven Architectures: Work with Kafka, Pulsar, or RabbitMQ for distributed messaging and streaming workloads.
Operational Excellence & Incident Management
- Reliability & Resilience: Implement best practices for graceful degradation, retries, circuit breakers, and auto-scaling.
- Incident Response & On-Call Management: Define SLAs/SLIs/SLOs, and set up robust alerting and escalation processes for incident handling.
- Postmortems & RCA (Root Cause Analysis): Lead post-incident analysis, drive corrective actions, and improve system reliability.
- Observability & Monitoring: Define and implement logging, monitoring, and distributed tracing using Prometheus, OpenTelemetry, Grafana, and Datadog.
Performance Optimization & Security
- Performance Tuning: Diagnose and optimize latency, throughput, and memory utilization for large-scale distributed systems.
- Multithreading & Concurrency: Design and implement highly concurrent, multithreaded backend services for parallel processing.
- Database & Storage Optimization: Improve performance of SQL (PostgreSQL, MySQL) and NoSQL (Cassandra, DynamoDB, Redis, MongoDB) solutions.
- Security & Compliance: Implement API security, authentication, and authorization, and ensure compliance with SOC 2, ISO 27001, and PCI DSS.
Leadership & Collaboration
- Mentorship & Code Reviews: Guide engineers in best practices for platform engineering, microservices, and distributed systems.
- Cross-Team Collaboration: Work with cloud engineering, security, and product engineering teams to align platform capabilities with business needs.
Key Qualifications
- 10+ years of experience in backend platform engineering, distributed systems, and microservices.
- Strong programming expertise in Go, Java, or Python, with a focus on multithreading and concurrency.
- Expertise in Kubernetes, service meshes (Istio, Linkerd), and cloud infrastructure.
- Deep understanding of gRPC, REST APIs, GraphQL, and API performance tuning.
- Hands-on experience with CI/CD and infrastructure automation (Terraform, Pulumi).
- Proven ability to manage production incidents and apply operational excellence practices.
- Excellent debugging and problem-solving skills in complex, distributed environments.