As a Lead PySpark Engineer, you will design, develop, and fix complex data processing solutions using PySpark on AWS. You will work hands-on with code, modernising legacy data workflows and supporting large-scale SAS-to-PySpark migrations. The role requires strong engineering discipline, deep data understanding, and the ability to deliver production-ready data pipelines in a financial services environment.
Job Responsibilities:
Design, develop, and fix complex data processing solutions using PySpark on AWS
Work hands-on with code, modernising legacy data workflows and supporting large-scale SAS-to-PySpark migrations (see the migration sketch after this list)
Deliver production-ready data pipelines in a financial services environment
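To give a rough, illustrative sense of the migration work described above, the sketch below shows a simple SAS PROC SQL-style aggregation re-expressed in PySpark. The dataset, column names, and S3 paths are hypothetical placeholders, not part of the role description.

```python
# Minimal sketch (hypothetical names): a SAS PROC SQL aggregation re-expressed in PySpark.
# Rough SAS equivalent:
#   proc sql;
#     create table tx_summary as
#     select customer_id, sum(amount) as total_amount, count(*) as tx_count
#     from transactions
#     group by customer_id;
#   quit;

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("sas_migration_sketch").getOrCreate()

# Hypothetical input location.
transactions = spark.read.parquet("s3://example-bucket/transactions/")

tx_summary = (
    transactions
    .groupBy("customer_id")
    .agg(
        F.sum("amount").alias("total_amount"),
        F.count("*").alias("tx_count"),
    )
)

# Hypothetical output location.
tx_summary.write.mode("overwrite").parquet("s3://example-bucket/tx_summary/")
```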
Requirements:
Minimum of 5 years of hands-on PySpark experience
SAS-to-PySpark migration experience
Proven ability to write production-ready PySpark code
Strong understanding of data and data warehousing concepts, including ETL/ELT, data models, dimensions and facts, data marts, and slowly changing dimensions (SCDs)
Strong knowledge of Spark execution concepts, including partitioning, optimisation, and performance tuning
Experience troubleshooting and improving distributed data processing pipelines
Strong Python coding skills with the ability to refactor, optimise, and stabilise existing codebases
Experience implementing parameterisation, configuration, logging, exception handling, and modular design (see the pipeline sketch after this list)
Strong foundation in SAS (Base SAS, SAS Macros, SAS DI Studio)
Experience understanding, debugging, and modernising legacy SAS code
Ability to understand end-to-end data flows, integrations, orchestration, and change data capture (CDC)
Experience writing and executing data and ETL test cases
Ability to build unit tests, run comparative tests, and validate data pipelines (see the test sketch after this list)
Proficiency in Git-based workflows, branching strategies, pull requests, and code reviews
Ability to document code, data flows, and technical decisions clearly
Exposure to CI/CD pipelines for data engineering workloads
Strong understanding of core AWS services, including S3, EMR/Glue, Workflows, Athena, and IAM
Experience building and operating data pipelines on AWS
Experience with big data processing on cloud platforms
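As a rough illustration of what production-ready PySpark means in this context (parameterisation, logging, exception handling, modular design, and explicit control over output partitioning), the skeleton below is a minimal sketch. All argument names, paths, column names, and the default partition count are hypothetical.

```python
# Minimal sketch of a parameterised, logged PySpark job with modular design.
# All paths, column names, and parameter values are hypothetical placeholders.

import argparse
import logging
import sys

from pyspark.sql import SparkSession, DataFrame
from pyspark.sql import functions as F

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
logger = logging.getLogger("example_pipeline")


def parse_args(argv):
    parser = argparse.ArgumentParser(description="Example PySpark pipeline")
    parser.add_argument("--input-path", required=True)
    parser.add_argument("--output-path", required=True)
    parser.add_argument("--run-date", required=True)
    parser.add_argument("--output-partitions", type=int, default=64)
    return parser.parse_args(argv)


def transform(df: DataFrame, run_date: str) -> DataFrame:
    # Keep business logic in small, testable functions.
    return (
        df.filter(F.col("event_date") == run_date)  # hypothetical column
          .withColumn("amount", F.col("amount").cast("decimal(18,2)"))
    )


def main(argv):
    args = parse_args(argv)
    spark = SparkSession.builder.appName("example_pipeline").getOrCreate()
    try:
        logger.info("Reading input from %s", args.input_path)
        df = spark.read.parquet(args.input_path)

        result = transform(df, args.run_date)

        # Control output partitioning explicitly rather than relying on defaults.
        logger.info("Writing %s partitions to %s", args.output_partitions, args.output_path)
        result.repartition(args.output_partitions).write.mode("overwrite").parquet(args.output_path)
    except Exception:
        logger.exception("Pipeline failed")
        raise
    finally:
        spark.stop()


if __name__ == "__main__":
    main(sys.argv[1:])
```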
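Similarly, the pytest-style sketch below illustrates the kind of unit and comparative testing expected for migrated pipelines: a small PySpark transform is checked against an expected result built in memory. The transform, column names, and values are hypothetical.

```python
# Minimal pytest-style sketch: validating a PySpark transform against expected output.
# The transform, column names, and values are hypothetical.

import pytest
from pyspark.sql import SparkSession
from pyspark.sql import functions as F


@pytest.fixture(scope="session")
def spark():
    return SparkSession.builder.master("local[2]").appName("pipeline_tests").getOrCreate()


def summarise(df):
    # Example transform under test: total amount per customer.
    return df.groupBy("customer_id").agg(F.sum("amount").alias("total_amount"))


def test_summarise_matches_expected(spark):
    input_df = spark.createDataFrame(
        [("c1", 10.0), ("c1", 5.0), ("c2", 7.5)],
        ["customer_id", "amount"],
    )
    expected = {("c1", 15.0), ("c2", 7.5)}

    actual = {
        (row["customer_id"], row["total_amount"])
        for row in summarise(input_df).collect()
    }

    # Comparative check: migrated output should reproduce the legacy result exactly.
    assert actual == expected
```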
Nice to have:
Experience in banking or financial services
Experience working on SAS modernisation or cloud migration programmes
Familiarity with DevOps practices and tools
Experience working in Agile/Scrum delivery environments