We are seeking a highly skilled and motivated Data Engineer specializing in Production Support for AWS EMR (Elastic MapReduce), with knowledge of Spark, Scala, and Talend or another ETL tool, to join our dynamic team. The ideal candidate will ensure the smooth operation, performance, and stability of large-scale distributed data processing pipelines and applications deployed on AWS EMR. This role requires a mix of strong technical expertise, problem-solving skills, and operational excellence.
Job Responsibilities:
Monitor, troubleshoot, and resolve issues in real-time for AWS EMR clusters and associated data pipelines
Investigate and debug data processing failures, latency issues, and performance bottlenecks
Provide support for mission-critical production systems as part of an on-call rotation
Manage AWS EMR cluster lifecycle, including creation, scaling, termination, and optimization
Ensure effective resource utilization and cost optimization of clusters
Apply patches and upgrades to EMR clusters and software components as needed
Maintain and support ETL/ELT pipelines built on tools such as Apache Spark, Hive, or Presto running on EMR
Ensure data quality, consistency, and availability across pipelines and storage systems like S3, Redshift, MySQL, or Snowflake
Implement and monitor automated workflows using AWS tools like Step Functions, Lambda, and CloudWatch
Analyze and optimize EMR job performance by tuning Spark/Hive configurations and improving query efficiency
Identify and address inefficiencies in data storage and access patterns
Set up and manage monitoring tools (e.g., AWS CloudWatch, Datadog, or Prometheus) to track system health and performance
Develop alerting mechanisms and dashboards for proactive issue identification
Provide daily or weekly monitoring reports on job status, and raise alerts for any long-running or resource-intensive jobs
Collaborate with software developers, data scientists, and DevOps teams to resolve issues and optimize workflows
Maintain comprehensive documentation for troubleshooting guides, operational workflows, and best practices
Requirements:
Proficiency in managing AWS services, particularly EMR, S3, Lambda, Step Functions, and CloudWatch
Hands-on experience with distributed data processing frameworks like Apache Spark, Hive, or Presto
Experience with Kafka, NiFi, Amazon Web Services (AWS), Maven, Ambari-TEZ, Stash, and Bamboo
Familiarity with data loading tools like Talend or Sqoop
Familiarity with cloud databases like AWS Redshift, Aurora MySQL, and PostgreSQL
Knowledge of workflow schedulers like Oozie or Apache Airflow
Strong knowledge of shell scripting, Python, or Java for scripting and automation
Familiarity with SQL and query optimization techniques
Experience in production support for large-scale distributed systems or data platforms
Ability to analyze logs, diagnose issues, and implement fixes in high-pressure scenarios
Ability to apply data modelling concepts and methodologies to optimize data warehouse solutions
Ability to maintain detailed Standard Operating Procedures (SOPs) using flow diagrams, source-to-target mappings, system architecture diagrams, and use cases
Strong analytical skills to debug complex systems and resolve performance bottlenecks
Effective communication skills to coordinate with cross-functional teams
A proactive and customer-focused attitude to provide excellent production support
Bachelor’s degree in Computer Science, Engineering, or a related field
10+ years of experience in data engineering, production support, or a similar role, including at least 3-5 years on the AWS Cloud platform
Nice to have:
Experience with CI/CD tools like Jenkins or GitLab for pipeline deployments
Familiarity with containerization and orchestration tools (e.g., Docker, Kubernetes)
Knowledge of data governance, security, and compliance in cloud environments
AWS certifications (e.g., AWS Certified Big Data - Specialty or AWS Certified Solutions Architect)