Senior Hadoop Lead
Location: No location requirement; CST and EST time zones preferred (Chicago would be nice)
Key Areas of Responsibility:
- Experience architecting MapR and ODP Apache Hadoop clusters with Ambari, with a deep understanding of the following components:
- HDFS, MapReduce, Tez, Hive, YARN, Sqoop, Oozie, HBase, Scala, Spark, and Kafka in the Hadoop ecosystem
- Ambari and MCS (MapR) for monitoring and administering multi-node Hadoop clusters
- ODP Apache Hadoop Cluster with Ambari, DR and MapR.
- MapR Certified Administration Skills
- Overview of MapR components and their architecture.
- Understanding and working knowledge of the MapR Control System, MapR volumes, snapshots, and mirrors.
- Planning a cluster in the context of MapR.
- Comparison of MapR with other distributions and Apache Hadoop.
- MapR installation, cluster deployment, upgrade, and data migration skills
- Unix administration skills to manage the cluster as root, and basic knowledge of core Linux file systems
- Configure Cluster Resources
- Providing data access and data protection
- Disk and Node Maintenance
- Monitoring, Managing, and Troubleshooting the Cluster
- Managing services, nodes, snapshots, mirror volumes and remote clusters.
- Understanding and managing Nodes.
- Understanding of Hadoop components and installing them alongside MapR services.
- Accessing data on the cluster, including via NFS; managing services and nodes.
- Managing data using volumes; managing users and groups; managing and assigning roles to nodes; commissioning and decommissioning nodes; cluster administration and performance monitoring; configuring, analyzing, and monitoring performance metrics; configuring and administering MapR security.
- Cluster configuration and tuning for optimum performance.
- Inside-out knowledge of Hive components
- Inside-out knowledge of Spark components
- Python and Java skills
- Troubleshooting and debugging skills for job failures and reported cluster slowness.
- Deployment of Hadoop Clusters
- Monitoring multiple Hadoop cluster environments using Ambari Metrics, MCS, and Grafana
- Configuring NameNode, DataNode, ResourceManager, and HiveServer2 high availability, and cluster service coordination using ZooKeeper
- Configuring Kerberos for Authentication
- Performing minor and major upgrades
- Integrating Hadoop cluster components with LDAP and Active Directory, and enabling SSL for Hadoop cluster components
- Monitoring Kerberized Hadoop clusters using Grafana
- Working knowledge of Python and Java
- Nice to have: experience with Docker-based Hadoop clusters
- Nice to have: experience with data science leveraging Hadoop/Spark
- Nice to have: experience with cloud-based (Azure, GCP, OCI, AWS) Hadoop implementations
- Experience with large-scale, high-concurrency environments
- Strong working knowledge of SQL and Hive Query Language (HiveQL) queries.
- Knowledge of Hadoop application frameworks such as Hive and Spark.
- Experience with System Integration, Capacity Planning, Performance Tuning, System Monitoring, System Security and Load Balancing.
- Architecting backup and disaster recovery, performing root cause analysis for Hadoop, and troubleshooting Hadoop cluster issues
Key Desired Skills, Experience, & Knowledge:
- 10+ years of Hadoop experience
- Demonstrated success in communication, collaboration and motivation of cross-functional departments to achieve exceptional service.
- Strategic thinking, complex problem solving and analytical capabilities.
- Excellent communication, diagnostic, and issue resolution skills
- Ability to see the big picture and take a holistic approach to problem-solving.
- Ability to take complex information and communicate in a manner that is understandable to all audiences.
- Demonstrated understanding of the evolving landscape of technology.
- Ability to effectively network with colleagues to share knowledge and gain new perspectives.
- Ability to manage global teams
- Hadoop Distribution: MapR, Apache Hadoop (ODP), Cloudera/Hortonworks.
- Hadoop Ecosystem: HDFS, MapReduce, YARN, Hive, Sqoop, Zookeeper, HBase, Spark.
- Configuration Management: Ambari, MapR, and Cloudera Manager.
- Security: LDAP, AD, SSL, and Kerberos.
- Languages: Shell scripting, Bash, Spark SQL, HiveQL, Python, Java.
- Monitoring Tools: Ambari, Grafana, Kibana.
- Package Management: RPM, YUM.
- Networking and Protocols: LAN, WAN, TCP/IP, NFS, LDAP, DNS.
- Operating Systems: RHEL, Oracle Enterprise Linux, CentOS, Ubuntu.
- Job Scheduling Tools: Crontab, Airflow, Oozie.