Data Architect
Georgia IT Inc
Location: Palo Alto, CA (Hybrid)
Duration: Long Term
Rate: DOE


Key Responsibilities:

Data Orchestration:
  • Design, implement, and manage data workflows using Airflow to automate and orchestrate data processing tasks.
  • Optimize Airflow DAGs (Directed Acyclic Graphs) for performance and scalability.
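For illustration, a minimal DAG of the kind this role would own could look like the sketch below (assuming Airflow 2.4+ and the TaskFlow API; the pipeline and task names are hypothetical, not from this posting):

    # A minimal sketch, assuming Airflow 2.4+ with the TaskFlow API.
    # orders_pipeline and its task names are illustrative only.
    from datetime import datetime

    from airflow.decorators import dag, task

    @dag(schedule="@daily", start_date=datetime(2025, 1, 1), catchup=False)
    def orders_pipeline():
        @task
        def extract() -> list[dict]:
            # Pull raw records from an upstream source (stubbed here).
            return [{"order_id": 1, "amount": 42.0}]

        @task
        def transform(records: list[dict]) -> list[dict]:
            # Normalize values; real transformation logic would live here.
            return [{**r, "amount": round(r["amount"], 2)} for r in records]

        @task
        def load(records: list[dict]) -> None:
            # Write to the target store (stubbed).
            print(f"loaded {len(records)} records")

        load(transform(extract()))

    orders_pipeline()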
Task Management:
  • Develop and maintain distributed task processing using Celery and ensure robust task queue management with Redis or RabbitMQ.
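A representative Celery setup for this kind of work might be sketched as follows (assuming Celery 5.x with Redis as both broker and result backend; the app and task names are hypothetical):

    # A minimal sketch: Celery 5.x with Redis as broker and result backend.
    from celery import Celery

    app = Celery(
        "ingest",
        broker="redis://localhost:6379/0",
        backend="redis://localhost:6379/1",
    )

    @app.task(bind=True, max_retries=3)
    def process_record(self, record_id: int) -> str:
        try:
            # Real work (DB writes, API calls) would go here.
            return f"processed {record_id}"
        except Exception as exc:
            # Retry transient failures with exponential backoff.
            raise self.retry(exc=exc, countdown=2 ** self.request.retries)

Assuming the module is saved as ingest.py, a worker would be started with "celery -A ingest worker" and tasks enqueued with process_record.delay(42).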
Database Management:
  • Design and manage databases using Cosmos DB, MongoDB, and PostgreSQL.
  • Develop and maintain efficient data models and ensure data consistency and integrity.
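On the PostgreSQL side, relational modeling might be sketched as below (SQLAlchemy 2.x is an assumption, not named in the posting; Cosmos DB and MongoDB would use their own SDKs such as azure-cosmos and pymongo; the table and columns are hypothetical):

    # A minimal sketch, assuming SQLAlchemy 2.x over PostgreSQL.
    # The patients table and its columns are illustrative only.
    from sqlalchemy import String, create_engine
    from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column

    class Base(DeclarativeBase):
        pass

    class Patient(Base):
        __tablename__ = "patients"

        id: Mapped[int] = mapped_column(primary_key=True)
        mrn: Mapped[str] = mapped_column(String(32), unique=True)  # uniqueness guards integrity
        name: Mapped[str] = mapped_column(String(128))

    engine = create_engine("postgresql+psycopg2://user:pass@localhost/clinical")
    Base.metadata.create_all(engine)  # create tables if they do not exist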
API and Webhooks:
  • Implement and manage FastAPI webhooks to handle data ingestion and integration tasks.
  • Develop and maintain Azure Functions to support webhook operations and integrate with cloud services.
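A minimal webhook endpoint of this shape might look like the following (a sketch: the route, payload model, and shared-secret header are hypothetical, and in production the secret would come from configuration such as Azure Key Vault):

    # A minimal sketch of a FastAPI webhook receiver with a shared-secret check.
    from fastapi import FastAPI, Header, HTTPException
    from pydantic import BaseModel

    app = FastAPI()
    WEBHOOK_SECRET = "change-me"  # illustrative; load from config in practice

    class Event(BaseModel):
        event_type: str
        payload: dict

    @app.post("/webhooks/ingest", status_code=202)
    async def ingest(event: Event, x_webhook_secret: str = Header(...)) -> dict:
        # FastAPI maps the X-Webhook-Secret header to this parameter.
        if x_webhook_secret != WEBHOOK_SECRET:
            raise HTTPException(status_code=401, detail="invalid secret")
        # Hand off to the pipeline, e.g. enqueue a Celery task or publish to Kafka.
        return {"accepted": event.event_type}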
Streaming Data:
  • Implement and manage Kafka Streams to handle real-time data processing and streaming requirements.
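Kafka Streams itself is a JVM library; a comparable Python consume-transform loop (using confluent-kafka, an assumed stand-in rather than a stack requirement from the posting; topic and group names are hypothetical) might be sketched as:

    # A minimal sketch of a real-time consume-transform loop with confluent-kafka.
    import json

    from confluent_kafka import Consumer

    consumer = Consumer({
        "bootstrap.servers": "localhost:9092",
        "group.id": "realtime-etl",
        "auto.offset.reset": "earliest",
    })
    consumer.subscribe(["events.raw"])

    try:
        while True:
            msg = consumer.poll(timeout=1.0)
            if msg is None or msg.error():
                continue
            event = json.loads(msg.value())
            # Transform the event here before writing it downstream.
            print(event)
    finally:
        consumer.close()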
Data Lake Management:
  • Work with Iceberg to manage and optimize large-scale data lake storage and querying.
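A small sketch of a pushed-down Iceberg query (PyIceberg is an assumed client, since the posting names only Iceberg; the catalog name and table are hypothetical):

    # A minimal sketch, assuming PyIceberg with a catalog named "default"
    # configured in ~/.pyiceberg.yaml; analytics.events is illustrative.
    from pyiceberg.catalog import load_catalog

    catalog = load_catalog("default")
    table = catalog.load_table("analytics.events")

    # Push the filter down so Iceberg prunes data files before the scan.
    scan = table.scan(row_filter="event_type = 'order'", limit=100)
    print(scan.to_arrow().to_pandas().head())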
Collaboration and Communication:
  • Collaborate with data scientists, engineers, and business analysts to understand data requirements and provide technical solutions.
  • Document processes, architectures, and configurations to ensure knowledge sharing and compliance with best practices.
Required Skills and Qualifications:

Experience and Knowledge:
  • Proven experience with Airflow for data orchestration and workflow management.
  • Hands-on experience with Celery for task management and Redis or RabbitMQ for messaging.
  • Proficiency with Cosmos DB, MongoDB, and PostgreSQL for data storage and management.
  • Experience developing and managing webhooks using FastAPI and integrating with Azure Functions.
  • Knowledge of Kafka Streams for real-time data processing.
  • Familiarity with Iceberg for data lake management and optimization.
  • Healthcare domain experience is a plus.
Technical Skills:
  • Strong understanding of data pipelines, ETL processes, and data integration.
  • Proficient in Python, with experience in building and maintaining data-oriented applications.
  • Ability to work with large datasets and optimize performance across distributed systems.
Soft Skills:
  • Excellent problem-solving and analytical skills.
  • Strong communication and collaboration skills.
  • Ability to work independently and manage multiple priorities in a fast-paced environment.
Date Posted: 14 May 2025