GCP Data Engineer

Raritan, New Jersey

Tekfortune Inc
Role - GCP Data Engineer
Location - USA Remote


The GCP Data Engineer will be responsible for designing and developing large-scale cloud data processing systems on the Google Cloud Platform (GCP). The role involves curating a comprehensive dataset covering users, groups, and their permissions across various datasets. The engineer will redesign and implement a scalable data pipeline to ensure timely updates and transparency in data access.

REQUIRED SKILLS:
5+ years of experience in an engineering role using Python, Java, Spark, and SQL.
5+ years of experience working as a Data Engineer in GCP.
Demonstrated proficiency with Google's Identity and Access Management (IAM) API.
Demonstrated proficiency with Airflow.
Key Responsibilities:

• Design, develop, and implement scalable, high-performance data solutions on GCP.

• Ensure that changes to data access permissions are reflected in the Tableau dashboard within 24 hours.

• Collaborate with technical and business users to share and manage data sets across multiple projects.

• Utilize GCP tools and technologies to optimize data processing and storage.

• Re-architect the data pipeline that builds the BigQuery dataset used for GCP IAM dashboards to make it more scalable (see the illustrative sketch after this list).

• Run and customize DLP scans.

• Build bidirectional integrations between GCP and Collibra.

• Explore and potentially implement Dataplex and custom format-preserving encryption for de-identifying data for developers in lower environments.
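
For illustration only (not part of the posting): a minimal sketch of the kind of pipeline described above, assuming Airflow 2.4+ with the google-cloud-resource-manager and google-cloud-bigquery client libraries. The DAG name, project, dataset, and table are hypothetical placeholders. It snapshots a project's IAM policy bindings and appends them to a BigQuery table that could back an access-transparency dashboard.

from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator
from google.cloud import bigquery, resourcemanager_v3


def export_iam_bindings(project_id: str, table_id: str) -> None:
    # Fetch the project's IAM policy and flatten role/member bindings into rows.
    projects_client = resourcemanager_v3.ProjectsClient()
    policy = projects_client.get_iam_policy(resource=f"projects/{project_id}")

    captured_at = datetime.utcnow().isoformat()
    rows = [
        {"role": binding.role, "member": member, "captured_at": captured_at}
        for binding in policy.bindings
        for member in binding.members
    ]

    # Append the snapshot to the BigQuery table that feeds the dashboard.
    bq_client = bigquery.Client(project=project_id)
    errors = bq_client.insert_rows_json(table_id, rows)
    if errors:
        raise RuntimeError(f"BigQuery insert failed: {errors}")


with DAG(
    dag_id="iam_access_snapshot",      # hypothetical DAG name
    schedule="@daily",                 # daily refresh to meet the 24-hour target
    start_date=datetime(2025, 1, 1),
    catchup=False,
    default_args={"retries": 2, "retry_delay": timedelta(minutes=10)},
) as dag:
    PythonOperator(
        task_id="export_iam_bindings",
        python_callable=export_iam_bindings,
        op_kwargs={
            "project_id": "example-project",                   # placeholder
            "table_id": "example-project.iam_audit.bindings",  # placeholder
        },
    )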
Qualifications - Required:

• Bachelor's degree in Computer Engineering or a related field.

• 5+ years of experience in an engineering role using Python, Java, Spark, and SQL.

• 5+ years of experience working as a Data Engineer in GCP.

• Proficiency with Google's Identity and Access Management (IAM) API.

• Strong Linux/Unix background and hands-on knowledge.

• Experience with big data technologies such as HDFS, Spark, Impala, and Hive.

• Experience with shell scripting and Bash.

• Experience with version control platforms like GitHub.

• Experience with unit testing code.

• Experience with development ecosystems including Jenkins, Artifactory, CI/CD, and Terraform.

• Demonstrated proficiency with Airflow.

• Ability to advise management on approaches to optimize for data platform success.

• Ability to effectively communicate highly technical information to various audiences, including management, the user community, and less-experienced staff.

• Proficiency in multiple programming languages, frameworks, domains, and tools.

• Coding skills in Scala.

• Experience with GCP platform development tools such as Pub/Sub, Cloud Storage, Bigtable, BigQuery, Dataflow, Dataproc, and Composer.

• Knowledge of Hadoop, cloud platforms, and their surrounding ecosystems.

• Experience with web services and APIs (RESTful and SOAP).

• Ability to document designs and concepts.

• API orchestration and choreography for consumer apps.

• Well-rounded technical expertise in Apache packages and hybrid cloud architectures.

• Pipeline creation and automation for data acquisition.

• Design and creation of metadata extraction pipelines between raw and transformed datasets.

• Collection of quality-control metrics on data acquisition pipelines.

• Experience contributing to and leveraging Jira and Confluence.

• Strong experience working with real-time streaming applications and batch-style large-scale distributed computing applications using tools like Spark, Kafka, Flume, Pub/Sub, and Airflow.

• Ability to work with different file formats such as Avro, Parquet, and JSON.

• Hands-on experience in the analysis, design, coding, and testing phases of the Software Development Life Cycle (SDLC).
Date Posted: 13 April 2025