Hadoop Admin
Location: On-Site, Houston, TX

Job Description:

Required Skills/Responsibilities:

Expertise and knowledge: Cloudera Data Platform, Oozie, Hive, Spark, Spark Streaming, and Presto

Data Pipeline Development:
 

  • Design, develop, and implement scalable data pipelines using Cloudera tools like Hadoop, Spark, Hive, Impala, and HDFS.
     
  • Write and optimize ETL processes to extract, transform, and load data into data lakes or warehouses.
     

Big Data Application Development:
 

  • Develop applications to process large datasets efficiently using frameworks such as Apache Spark and MapReduce.
     
  • Build solutions for batch and real-time data processing.
     

Cluster Management:
 

  • Work with Cloudera Manager for cluster setup, configuration, monitoring, and performance optimization.
     
  • Ensure high availability and scalability of Cloudera clusters.
     
  • System sizing (compute, storage, and network resources).

  • System reconfiguration when hardware is extended or replaced.

  • OS and Cloudera software upgrades.

  • Cloudera software vulnerability and patch management.

  • Access and permission management.

  • Installation of other Cloudera applications as needed.
     

Data Storage and Management:
 

  • Design and implement data storage strategies using HDFS, HBase, and other Cloudera-supported tools.
     
  • Optimize data storage and retrieval processes to improve performance.
     

Performance Tuning:
 

  • Monitor and optimize the performance of Hadoop and Spark jobs.
     
  • Troubleshoot and resolve performance bottlenecks in data pipelines.
     
  • Assist in designing scalable architectures for high-volume data.

  • Ensure end-to-end pipeline stability for existing and future use cases.

  • Tune the performance of Spark workflows.
     

Integration and Collaboration:
 

  • Integrate Cloudera solutions with external systems, databases, and APIs.
     
  • Collaborate with data scientists, analysts, and other teams to understand requirements and deliver data solutions.
     



Key Skills:

  • Cloudera Data Platform, Oozie, Hive, Spark, Spark Streaming, and Presto