
Data Engineer (Production Support) for AWS EMR


NTT DATA


Location:
China, Shanghai


Contract Type:
Not provided


Salary:
Not provided

Job Description:

We are seeking a highly skilled and motivated Data Engineer specializing in Production Support for AWS EMR (Elastic MapReduce), with knowledge of Spark, Scala, and Talend or another ETL tool, to join our dynamic team. The ideal candidate will ensure the smooth operation, performance, and stability of large-scale distributed data processing pipelines and applications deployed on AWS EMR. This role requires a mix of strong technical expertise, problem-solving skills, and operational excellence.

Job Responsibility:

  • Monitor, troubleshoot, and resolve issues in real-time for AWS EMR clusters and associated data pipelines
  • Investigate and debug data processing failures, latency issues, and performance bottlenecks
  • Provide support for mission-critical production systems as part of an on-call rotation
  • Manage AWS EMR cluster lifecycle, including creation, scaling, termination, and optimization
  • Ensure effective resource utilization and cost optimization of clusters
  • Apply patches and upgrades to EMR clusters and software components as needed
  • Maintain and support ETL/ELT pipelines built on tools such as Apache Spark, Hive, or Presto running on EMR
  • Ensure data quality, consistency, and availability across pipelines and storage systems such as S3, Redshift, MySQL, or Snowflake
  • Implement and monitor automated workflows using AWS tools like Step Functions, Lambda, and CloudWatch
  • Analyze and optimize EMR job performance by tuning Spark/Hive configurations and improving query efficiency
  • Identify and address inefficiencies in data storage and access patterns
  • Set up and manage monitoring tools (e.g., AWS CloudWatch, Datadog, or Prometheus) to track system health and performance; a minimal monitoring sketch follows this list
  • Develop alerting mechanisms and dashboards for proactive issue identification
  • Provide daily/weekly monitoring reports on job status and alert on any long-running or resource-consuming issues
  • Collaborate with software developers, data scientists, and DevOps teams to resolve issues and optimize workflows
  • Maintain comprehensive documentation for troubleshooting guides, operational workflows, and best practices
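
To make the monitoring duties above concrete, here is a minimal Python sketch, assuming boto3 and configured AWS credentials, that checks an EMR cluster's state and its recent YARN memory headroom and raises an SNS alert. The cluster ID, topic ARN, and threshold are hypothetical placeholders, not values from this posting.

```python
# Minimal EMR health-check sketch. Assumptions: boto3 installed, AWS
# credentials configured; cluster ID, topic ARN, and threshold are
# hypothetical placeholders.
from datetime import datetime, timedelta, timezone

import boto3

CLUSTER_ID = "j-XXXXXXXXXXXXX"  # hypothetical EMR cluster ID
ALERT_TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:emr-alerts"  # hypothetical
MEMORY_FLOOR_PCT = 15.0  # hypothetical alert threshold

emr = boto3.client("emr")
cloudwatch = boto3.client("cloudwatch")
sns = boto3.client("sns")


def cluster_state(cluster_id):
    """Return the EMR cluster's current state, e.g. RUNNING or WAITING."""
    resp = emr.describe_cluster(ClusterId=cluster_id)
    return resp["Cluster"]["Status"]["State"]


def yarn_memory_available_pct(cluster_id):
    """Latest 5-minute average of YARNMemoryAvailablePercentage, or None."""
    now = datetime.now(timezone.utc)
    resp = cloudwatch.get_metric_statistics(
        Namespace="AWS/ElasticMapReduce",
        MetricName="YARNMemoryAvailablePercentage",
        Dimensions=[{"Name": "JobFlowId", "Value": cluster_id}],
        StartTime=now - timedelta(minutes=15),
        EndTime=now,
        Period=300,
        Statistics=["Average"],
    )
    points = resp["Datapoints"]
    if not points:
        return None
    # Datapoints are unordered; take the most recent one.
    return max(points, key=lambda p: p["Timestamp"])["Average"]


if __name__ == "__main__":
    state = cluster_state(CLUSTER_ID)
    mem = yarn_memory_available_pct(CLUSTER_ID)
    unhealthy = state not in ("RUNNING", "WAITING")
    starved = mem is not None and mem < MEMORY_FLOOR_PCT
    if unhealthy or starved:
        sns.publish(
            TopicArn=ALERT_TOPIC_ARN,
            Subject=f"EMR alert for {CLUSTER_ID}",
            Message=f"state={state}, yarn_memory_available_pct={mem}",
        )
```

In practice a check like this would run on a schedule, for example from a Lambda function triggered by an EventBridge or CloudWatch rule, in line with the AWS tooling named in the responsibilities above.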

Requirements:

  • Proficiency in managing AWS services, particularly EMR, S3, Lambda, Step Functions, and CloudWatch
  • Hands-on experience with distributed data processing frameworks like Apache Spark, Hive, or Presto
  • Experience with Kafka, NiFi, Amazon Web Services (AWS), Maven, Ambari/Tez, Stash, and Bamboo
  • Familiarity with data loading tools such as Talend and Sqoop
  • Familiarity with cloud databases such as AWS Redshift, Aurora MySQL, and PostgreSQL
  • Knowledge of workflow schedulers such as Oozie or Apache Airflow
  • Strong knowledge of shell scripting, Python, or Java for scripting and automation; a job-submission sketch follows this list
  • Familiarity with SQL and query optimization techniques
  • Experience in production support for large-scale distributed systems or data platforms
  • Ability to analyze logs, diagnose issues, and implement fixes in high-pressure scenarios
  • Ability to apply data modeling concepts and methodologies to optimize data warehouse solutions
  • Ability to maintain detailed Standard Operating Procedures (SOPs) using flow diagrams, source-to-target mappings, system architecture diagrams, and use cases
  • Strong analytical skills to debug complex systems and resolve performance bottlenecks
  • Effective communication skills to coordinate with cross-functional teams
  • A proactive and customer-focused attitude to provide excellent production support
  • Bachelor’s degree in Computer Science, Engineering, or a related field
  • 10+ years of experience in data engineering, production support, or a similar role, with at least 3-5 years on the AWS cloud platform
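
As an illustration of the scripting-and-automation requirement, the following Python sketch submits a Spark step with a few common tuning settings to a running EMR cluster via boto3. The cluster ID, S3 script path, and configuration values are assumptions for the example, not part of the posting.

```python
# Hypothetical example: submit a tuned Spark step to an existing EMR cluster.
# The cluster ID, S3 path, and tuning values are illustrative assumptions.
import boto3

CLUSTER_ID = "j-XXXXXXXXXXXXX"  # hypothetical cluster ID
JOB_SCRIPT = "s3://example-bucket/jobs/daily_load.py"  # hypothetical PySpark script

emr = boto3.client("emr")

# command-runner.jar is the standard EMR wrapper for invoking spark-submit.
step = {
    "Name": "daily-load",
    "ActionOnFailure": "CONTINUE",  # keep the cluster alive if this step fails
    "HadoopJarStep": {
        "Jar": "command-runner.jar",
        "Args": [
            "spark-submit",
            "--deploy-mode", "cluster",
            # Illustrative tuning knobs of the kind this role would adjust:
            "--conf", "spark.dynamicAllocation.enabled=true",
            "--conf", "spark.executor.memory=8g",
            "--conf", "spark.executor.cores=4",
            "--conf", "spark.sql.shuffle.partitions=400",
            JOB_SCRIPT,
        ],
    },
}

resp = emr.add_job_flow_steps(JobFlowId=CLUSTER_ID, Steps=[step])
print("Submitted step IDs:", resp["StepIds"])
```

The specific values here are placeholders; real tuning would be driven by profiling the job in the Spark UI and the CloudWatch metrics mentioned earlier.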

Nice to have:

  • Experience with CI/CD tools like Jenkins or GitLab for pipeline deployments
  • Familiarity with container orchestration tools (e.g., Kubernetes, Docker)
  • Knowledge of data governance, security, and compliance in cloud environments
  • Certifications in AWS (e.g., AWS Certified Big Data Specialty or AWS Certified Solutions Architect)

Additional Information:

Job Posted:
February 14, 2026

Work Type:
Remote work


Similar Jobs for Data Engineer (Production Support) for AWS EMR

Software Engineer (Data Engineering)

We are seeking a Software Engineer (Data Engineering) who can seamlessly integra...
Location:
India, Hyderabad
Salary:
Not provided
NStarX
Expiration Date:
Until further notice
Requirements:
  • 4+ years in Data Engineering and AI/ML roles
  • Bachelor’s or Master’s degree in Computer Science, Data Science, or a related field
  • Python, SQL, Bash, PySpark, Spark SQL, boto3, pandas
  • Apache Spark on EMR (driver/executor model, sizing, dynamic allocation)
  • Amazon S3 (Parquet) with lifecycle management to Glacier
  • AWS Glue Catalog and Crawlers
  • AWS Step Functions, AWS Lambda, Amazon EventBridge
  • CloudWatch Logs and Metrics, Kinesis Data Firehose (or Kafka/MSK)
  • Amazon Redshift and Redshift Spectrum
  • IAM (least privilege), Secrets Manager, SSM
Job Responsibility:
  • Design, build, and maintain scalable ETL and ELT pipelines for large-scale data processing
  • Develop and optimize data architectures supporting analytics and ML workflows
  • Ensure data integrity, security, and compliance with organizational and industry standards
  • Collaborate with DevOps teams to deploy and monitor data pipelines in production environments
  • Build predictive and prescriptive models leveraging AI and ML techniques
  • Develop and deploy machine learning and deep learning models using TensorFlow, PyTorch, or Scikit-learn
  • Perform feature engineering, statistical analysis, and data preprocessing
  • Continuously monitor and optimize models for accuracy and scalability
  • Integrate AI-driven insights into business processes and strategies
  • Serve as the technical liaison between NStarX and client teams
What we offer:
  • Competitive salary and performance-based incentives
  • Opportunity to work on cutting-edge AI and ML projects
  • Exposure to global clients and international project delivery
  • Continuous learning and professional development opportunities
  • Competitive base + commission
  • Fast growth into leadership roles
  • Full-time

Senior Data Engineer

Senior Data Engineer position at Checkr, building the data platform to power saf...
Location:
United States, San Francisco
Salary:
162000.00 - 190000.00 USD / Year
Checkr
Expiration Date:
Until further notice
Requirements:
  • 7+ years of development experience in the field of data engineering
  • 5+ years writing PySpark
  • Experience building large-scale (100s of Terabytes and Petabytes) data processing pipelines - batch and stream
  • Experience with ETL/ELT, stream and batch processing of data at scale
  • Strong proficiency in PySpark and Python
  • Expertise in database systems, data modeling, relational databases, and NoSQL (such as MongoDB)
  • Experience with big data technologies such as Kafka, Spark, Iceberg, Datalake and AWS stack (EKS, EMR, Serverless, Glue, Athena, S3, etc.)
  • Knowledge of security best practices and data privacy concerns
  • Strong problem-solving skills and attention to detail
Job Responsibility:
  • Create and maintain data pipelines and foundational datasets to support product/business needs
  • Design and build database architectures with massive and complex data, balancing with computational load and cost
  • Develop audits for data quality at scale, implementing alerting as necessary
  • Create scalable dashboards and reports to support business objectives and enable data-driven decision-making
  • Troubleshoot and resolve complex issues in production environments
  • Work closely with product managers and other stakeholders to define and implement new features
What we offer:
  • Learning and development reimbursement allowance
  • Competitive compensation and opportunity for professional and personal advancement
  • 100% medical, dental, and vision coverage for employees and dependents
  • Additional vacation benefits of 5 extra days and flexibility to take time off
  • Reimbursement for work from home equipment
  • Lunch four times a week
  • Commuter stipend
  • Abundance of snacks and beverages
  • Full-time

Staff Data Engineer

Checkr is hiring an experienced Staff Data Engineer to join their Data Platform ...
Location:
United States, San Francisco; Denver
Salary:
166000.00 - 230000.00 USD / Year
Checkr
Expiration Date:
Until further notice
Requirements:
  • 10+ years of designing, implementing and delivering highly scalable and performant data platform
  • Experience building large-scale (100s of Terabytes and Petabytes) data processing pipelines - batch and stream
  • Experience with ETL/ELT, stream and batch processing of data at scale
  • Expert-level proficiency in PySpark, Python, and SQL
  • Expertise in data modeling, relational databases, NoSQL (such as MongoDB) data stores
  • Experience with big data technologies such as Kafka, Spark, Iceberg, Datalake, and AWS stack (EKS, EMR, Serverless, Glue, Athena, S3, etc.)
  • An understanding of Graph and Vector data stores (preferred)
  • Knowledge of security best practices and data privacy concerns
  • Strong problem-solving skills and attention to detail
  • Experience with or knowledge of data processing platforms such as Databricks or Snowflake
Job Responsibility:
  • Architect, design, lead and build end-to-end performant, reliable, scalable data platform
  • Monitor, investigate, triage, and resolve production issues as they arise for services owned by the team
  • Mentor, guide, and work with junior engineers to deliver complex and next-generation features
  • Partner with engineering, product, design, and other stakeholders in designing and architecting new features
  • Create and maintain data pipelines and foundational datasets to support product/business needs
  • Experiment with rapid MVPs and encourage validation of customer needs
  • Design and build database architectures with massive and complex data
  • Develop audits for data quality at scale
  • Create scalable dashboards and reports to support business objectives and enable data-driven decision-making
  • Troubleshoot and resolve complex issues in production environments
What we offer:
  • A fast-paced and collaborative environment
  • Learning and development allowance
  • Competitive cash and equity compensation and opportunity for advancement
  • 100% medical, dental, and vision coverage
  • Up to $25K reimbursement for fertility, adoption, and parental planning services
  • Flexible PTO policy
  • Monthly wellness stipend
  • Home office stipend
  • In-office perks such as lunch four times a week, a commuter stipend, and an abundance of snacks and beverages
  • Full-time

Senior Principal Data Platform Software Engineer

We’re looking for a Sr Principal Data Platform Software Engineer (P70) to be a k...
Location:
Not provided
Salary:
239400.00 - 312550.00 USD / Year
Atlassian
Expiration Date:
Until further notice
Requirements:
  • 15+ years in Data Engineering, Software Engineering, or related roles, with substantial exposure to big data ecosystems
  • Demonstrated experience building and operating data platforms or large‑scale data services in production
  • Proven track record of building services from the ground up (requirements → design → implementation → deployment → ongoing ownership)
  • Hands‑on experience with AWS, GCP (e.g., compute, storage, data, and streaming services) and cloud‑native architectures
  • Practical experience with big data technologies, such as Databricks, Apache Spark, AWS EMR, Apache Flink, or StarRocks
  • Strong programming skills in one or more of: Kotlin, Scala, Java, Python
  • Experience leading cross‑team technical initiatives and influencing senior stakeholders
  • Experience mentoring Staff/Principal engineers and lifting the technical bar for a team or org
  • Bachelor’s or Master’s degree in Computer Science, Engineering, or a related technical field, or equivalent practical experience
Job Responsibility:
  • Design, develop and own delivery of high quality big data and analytical platform solutions aiming to solve Atlassian’s needs to support millions of users with optimal cost, minimal latency and maximum reliability
  • Improve and operate large‑scale distributed data systems in the cloud (primarily AWS, with increasing integration with GCP and Kubernetes‑based microservices)
  • Drive the evolution of our high-performance analytical databases and their integrations with products, cloud infrastructures (AWS and GCP), and isolated cloud environments
  • Help define and uplift engineering and operational standards for petabyte scale data platforms, with sub‑second analytic queries and multi‑region availability (coding guidelines, code review practices, observability, incident response, SLIs/SLOs)
  • Partner across multiple product and platform teams (including Analytics, Marketplace/Ecosystem, Core Data Platform, ML Platform, Search, and Oasis/FedRAMP) to deliver company‑wide initiatives that depend on reliable, high‑quality data
  • Act as a technical mentor and multiplier, raising the bar on design quality, code quality, and operational excellence across the broader team
  • Design and implement self‑healing, resilient data platforms with strong observability, fault tolerance, and recovery characteristics
  • Own the long‑term architecture and technical direction of Atlassian’s product data platform with projects that are directly tied to Atlassian’s company-level OKRs
  • Be accountable for the reliability, cost efficiency, and strategic direction of Atlassian’s product analytical data platform
  • Partner with executives and influence senior leaders to align engineering efforts with Atlassian’s long-term business objectives
What we offer:
  • Health and wellbeing resources
  • Paid volunteer days
  • Full-time

Senior Data Engineer

We are looking for a Senior Data Engineer (SDE 3) to build scalable, high-perfor...
Location:
India, Mumbai
Salary:
Not provided
Cogoport
Expiration Date:
Until further notice
Requirements:
  • 6+ years of experience in data engineering, working with large-scale distributed systems
  • Strong proficiency in Python, Java, or Scala for data processing
  • Expertise in SQL and NoSQL databases (PostgreSQL, Cassandra, Snowflake, Apache Hive, Redshift)
  • Experience with big data processing frameworks (Apache Spark, Flink, Hadoop)
  • Hands-on experience with real-time data streaming (Kafka, Kinesis, Pulsar) for logistics use cases
  • Deep knowledge of AWS/GCP/Azure cloud data services like S3, Glue, EMR, Databricks, or equivalent
  • Familiarity with Airflow, Prefect, or Dagster for workflow orchestration
  • Strong understanding of logistics and supply chain data structures, including freight pricing models, carrier APIs, and shipment tracking systems
Job Responsibility:
  • Design and develop real-time and batch ETL/ELT pipelines for structured and unstructured logistics data (freight rates, shipping schedules, tracking events, etc.)
  • Optimize data ingestion, transformation, and storage for high availability and cost efficiency
  • Ensure seamless integration of data from global trade platforms, carrier APIs, and operational databases
  • Architect scalable, cloud-native data platforms using AWS (S3, Glue, EMR, Redshift), GCP (BigQuery, Dataflow), or Azure
  • Build and manage data lakes, warehouses, and real-time processing frameworks to support analytics, machine learning, and reporting needs
  • Optimize distributed databases (Snowflake, Redshift, BigQuery, Apache Hive) for logistics analytics
  • Develop streaming data solutions using Apache Kafka, Pulsar, or Kinesis to power real-time shipment tracking, anomaly detection, and dynamic pricing
  • Enable AI-driven freight rate predictions, demand forecasting, and shipment delay analytics
  • Improve customer experience by providing real-time visibility into supply chain disruptions and delivery timelines
  • Ensure high availability, fault tolerance, and data security compliance (GDPR, CCPA) across the platform
What we offer:
  • Work with some of the brightest minds in the industry
  • Entrepreneurial culture fostering innovation, impact, and career growth
  • Opportunity to work on real-world logistics challenges
  • Collaborate with cross-functional teams across data science, engineering, and product
  • Be part of a fast-growing company scaling next-gen logistics platforms using advanced data engineering and AI
  • Full-time

Senior Engineering Manager, Big Data

Checkr is looking for a Senior Engineering Manager to lead the Criminal Data tea...
Location:
United States, San Francisco
Salary:
238000.00 - 280000.00 USD / Year
Checkr
Expiration Date:
Until further notice
Requirements:
  • 6+ years as an engineering manager
  • 8+ years as an engineer
  • Exceptional verbal and written communication skills
  • Unparalleled bar for quality (data quality metrics, QC gates, data governance, automated regression test suites, data validations, etc)
  • Experience working on data products at scale and understanding the legal, human impact, and technical nuances of supporting a highly regulated product
  • Experience designing and maintaining: Real-time & batch processing data pipelines serving up billions of data points
  • Normalizing and cleansing data across a medallion lakehouse architecture
  • Systems that rely on high-volume, low-latency messaging infrastructure (e.g. Kafka or similar)
  • Highly tolerant production systems with streamlined operations (data lineage, logging, telemetry, alerting, etc.)
  • Familiarity with AWS Glue, OpenSearch, EMR, etc
Job Responsibility:
  • Drive a motivating technical vision for the team
  • Partner closely with product management to solve business problems
  • Work with the team to build a world-class architecture that can scale into the next phase of Checkr’s growth
  • Hire the best talent and continue to raise the bar for the team
  • Represent the team in planning and product meetings
  • Optimize engineering processes and policies to drive velocity and quality
What we offer:
  • A fast-paced and collaborative environment
  • Learning and development allowance
  • Competitive compensation and opportunity for advancement
  • 100% medical, dental, and vision coverage
  • Up to $25K reimbursement for fertility, adoption, and parental planning services
  • Flexible PTO policy
  • Monthly wellness stipend, home office stipend
  • Full-time

Senior Engineering Manager, Big Data

Checkr is looking for a Senior Engineering Manager to lead the Criminal Data tea...
Location:
United States, Denver
Salary:
201000.00 - 237000.00 USD / Year
Checkr
Expiration Date:
Until further notice
Requirements:
  • 6+ years as an engineering manager
  • 8+ years as an engineer
  • Exceptional verbal and written communication skills
  • Unparalleled bar for quality (data quality metrics, QC gates, data governance, automated regression test suites, data validations, etc)
  • Experience working on data products at scale and understanding the legal, human impact, and technical nuances of supporting a highly regulated product
  • Experience designing and maintaining: Real-time & batch processing data pipelines serving up billions of data points
  • Normalizing and cleansing data across a medallion lakehouse architecture
  • Systems that rely on high-volume, low-latency messaging infrastructure (e.g. Kafka or similar)
  • Highly tolerant production systems with streamlined operations (data lineage, logging, telemetry, alerting, etc.)
  • Familiarity with AWS Glue, OpenSearch, EMR, etc
Job Responsibility:
  • Drive a motivating technical vision for the team
  • Partner closely with product management to solve business problems
  • Work with the team to build a world-class architecture that can scale into the next phase of Checkr’s growth
  • Hire the best talent and continue to raise the bar for the team
  • Represent the team in planning and product meetings
  • Optimize engineering processes and policies to drive velocity and quality
What we offer:
  • A fast-paced and collaborative environment
  • Learning and development allowance
  • Competitive compensation and opportunity for advancement
  • 100% medical, dental, and vision coverage
  • Up to $25K reimbursement for fertility, adoption, and parental planning services
  • Flexible PTO policy
  • Monthly wellness stipend, home office stipend
  • Full-time

Senior Data Engineer

Location:
Not provided
Salary:
Not provided
Kloud9
Expiration Date:
Until further notice
Requirements:
  • 5+ years of experience in developing scalable Big Data applications or solutions on distributed platforms
  • 4+ years of experience working with distributed technology tools, including Spark, Python, Scala
  • Working knowledge of data warehousing, data modeling, governance, and data architecture
  • Proficient in working with Amazon Web Services (AWS), mainly S3, Managed Airflow, EMR/EC2, IAM, etc.
  • Experience working in Agile and Scrum development process
  • 3+ years of experience with Amazon Web Services (AWS), mainly S3, Managed Airflow, EMR/EC2, IAM, etc.
  • Experience architecting data product in Streaming, Serverless and Microservices Architecture and platform
  • 3+ years of experience working with Data platforms, including EMR, Airflow, Databricks (Data Engineering & Delta)
  • Experience with creating/configuring Jenkins pipeline for smooth CI/CD process for Managed Spark jobs, build Docker images, etc.
  • Working knowledge of reporting and analytical tools such as Tableau, QuickSight, etc.
Job Responsibility:
  • Design and develop scalable Big Data applications on distributed platforms to support large-scale data processing and analytics needs
  • Partner with others in solving complex problems by taking a broad perspective to identify innovative solutions
  • Build positive relationships across Product and Engineering
  • Influence and communicate effectively, both verbally and written, with team members and business stakeholders
  • Quickly pick up new programming languages, technologies, and frameworks
  • Collaborate effectively in a high-speed, results-driven work environment to meet project deadlines and business goals
  • Utilize Data Warehousing tools such as SQL databases, Presto, and Snowflake for efficient data storage, querying, and analysis
  • Demonstrate experience in learning new technologies and skills
What we offer:
  • Kloud9 provides a robust compensation package and a forward-looking opportunity for growth in emerging fields.