Sr Hadoop Lead

Minnetonka, Minnesota

C4 Technical Services
Senior Hadoop Lead
Location: no location requirement; CST and EST time zones preferred (Chicago area is a plus)

Key Areas of Responsibility:
  • Experience architecting MapR and ODP Apache Hadoop clusters with Ambari, with a deep understanding of the following components:
  • HDFS, MapReduce, Tez, Hive, YARN, Sqoop, Oozie, HBase, Scala, Spark, and Kafka in the Hadoop ecosystem
  • Ambari and MCS (MapR Control System) for monitoring and administering multi-node Hadoop clusters
  • ODP Apache Hadoop clusters with Ambari, disaster recovery (DR), and MapR.
  • MapR Certified Administration Skills
  • Overview of MapR components and their architecture.
  • Understanding and working knowledge of the MapR Control System, MapR volumes, snapshots, and mirrors.
  • Planning a cluster in context of MapR.
  • Comparison of MapR with other distributions and Apache Hadoop.
  • MapR installation, cluster deployment, upgrade, and data migration skills
  • Unix administration skills to manage the cluster as root, and basic knowledge of core Linux file systems
  • Configure Cluster Resources
  • Providing data access and data protection
  • Disk and Node Maintenance
  • Monitoring, Managing, and Troubleshooting the Cluster
  • Managing services, nodes, snapshots, mirror volumes and remote clusters.
  • Understanding and managing Nodes.
  • Understanding of Hadoop components and installing them alongside MapR services.
  • Accessing data on the cluster, including via NFS.
  • Managing data using volumes, managing users and groups, and assigning roles to nodes; commissioning and decommissioning nodes.
  • Cluster administration and performance monitoring; configuring, analyzing, and monitoring metrics to track performance.
  • Configuring and administering MapR security.
  • Cluster configuration and tuning for optimum performance.
  • In-depth knowledge of Hive components
  • In-depth knowledge of Spark components
  • Python and Java skills
  • Troubleshooting and debugging skills for job failures and reported cluster slowness.
  • Deployment of Hadoop Clusters
  • Monitoring multiple Hadoop cluster environments using Ambari Metrics, MCS, and Grafana
  • Configuring NameNode, DataNode, ResourceManager, and HiveServer2 high availability, and cluster service coordination using ZooKeeper
  • Configuring Kerberos for Authentication
  • Performing minor and major upgrades
  • Integrating Hadoop cluster components with LDAP and Active Directory, and enabling SSL for Hadoop cluster components
  • Monitoring Kerberized Hadoop clusters using Grafana
  • Working knowledge of Python and Java
  • Nice to have: experience with Docker-based Hadoop clusters
  • Nice to have: experience with data science leveraging Hadoop/Spark
  • Nice to have: experience with cloud-based (Azure, GCP, OCI, AWS) Hadoop implementations
  • Experience with large scale and high concurrency environments
  • Strong working knowledge of queries in SQL and Hive Query Language (HiveQL); see the PySpark/HiveQL sketch after this list.
  • Knowledge of Hadoop application frameworks such as Hive and Spark.
  • Experience with System Integration, Capacity Planning, Performance Tuning, System Monitoring, System Security and Load Balancing.
  • Architecting Backup, Disaster Recovery, Root Cause Analysis for Hadoop, Troubleshooting Hadoop Cluster issues
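
For illustration only, a minimal sketch of the kind of HiveQL work described above, using PySpark with Hive support; the application name and the web_logs table are hypothetical placeholders:

    from pyspark.sql import SparkSession

    # Start a Spark session with Hive support so HiveQL runs against the Hive metastore.
    spark = (
        SparkSession.builder
        .appName("hiveql-example")   # hypothetical application name
        .enableHiveSupport()
        .getOrCreate()
    )

    # HiveQL aggregation against a hypothetical 'web_logs' Hive table.
    daily_hits = spark.sql("""
        SELECT log_date, COUNT(*) AS hits
        FROM web_logs
        GROUP BY log_date
        ORDER BY log_date
    """)

    daily_hits.show(10)
    spark.stop()
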
Key Desired Skills, Experience, & Knowledge:
  • 10+ years of Hadoop experience
  • Demonstrated success in communication, collaboration and motivation of cross-functional departments to achieve exceptional service.
  • Strategic thinking, complex problem solving and analytical capabilities.
  • Excellent communication, diagnostic, and issue-resolution skills
  • Ability to see the big-picture and take a holistic approach to problem-solving.
  • Ability to take complex information and communicate in a manner that is understandable to all audiences.
  • Demonstrated understanding of the evolving landscape of technology.
  • Ability to effectively network with colleagues to share knowledge and gain new perspectives.
  • Ability to manage global teams
  • Hadoop Distribution: MapR, Apache Hadoop (ODP), Cloudera/Hortonworks.
  • Hadoop Ecosystem: HDFS, MapReduce, YARN, Hive, Sqoop, Zookeeper, HBase, Spark.
  • Configuration Management: Ambari, MapR, and Cloudera Manager.
  • Security: LDAP, AD, SSL, and Kerberos.
  • Languages: Shell scripting (bash), Spark SQL, HiveQL, Python, Java.
  • Monitoring Tools: Ambari, Grafana, Kibana.
  • Package Management: RPM, YUM.
  • Networking and Protocols: LAN, WAN, TCP/IP, NFS, LDAP, DNS.
  • Operating Systems: RHEL, Oracle Enterprise Linux, CentOS, Ubuntu.
  • Job Scheduling Tools: Crontab, Airflow, Oozie.
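
As a small, non-authoritative illustration of the job-scheduling tools listed above, a sketch of an Airflow DAG (assuming Airflow 2.x) that runs a nightly Hive job; the DAG name, JDBC URL, and script path are hypothetical:

    from datetime import datetime
    from airflow import DAG
    from airflow.operators.bash import BashOperator

    # Hypothetical nightly DAG that submits a HiveQL script through beeline.
    with DAG(
        dag_id="nightly_hive_aggregation",   # hypothetical DAG name
        start_date=datetime(2025, 1, 1),
        schedule_interval="0 2 * * *",       # every day at 02:00
        catchup=False,
    ) as dag:
        run_hive_job = BashOperator(
            task_id="run_hive_job",
            bash_command="beeline -u jdbc:hive2://hiveserver:10000 -f /opt/jobs/daily_agg.hql",
        )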

Date Posted: 23 April 2025
Apply for this Job