Help BlackRock enhance its retail sales distribution capabilities and services suite by creating, expanding, and optimizing our data and data pipeline architecture. You will create and operationalize data pipelines that enable squads to deliver high-quality, data-driven products, and you will be accountable for managing high-quality datasets exposed for internal and external consumption by downstream users and applications.
Job Responsibilities:
Lead the creation and maintenance of optimized data pipeline architectures for large and complex data sets
Assemble large, complex data sets that meet business requirements
Act as the lead in identifying, designing, and implementing internal process improvements, and relay them to the relevant technology organization
Work with stakeholders to assist with data-related technical issues and support their data infrastructure needs
Automate manual ingest processes and optimize data delivery subject to service-level agreements (a minimal pipeline sketch follows this list)
Work with the infrastructure team on re-designs for greater scalability
Keep data separated and segregated according to relevant data policies
Join a complex global team, collaborate cross-functionally (with data scientists, platform engineers, and business stakeholders), take ownership of major components of the data platform ecosystem, and develop data-ready tools to support those partners
Stay up to date with the latest technology trends in the big-data space and recommend them as needed
Identify, investigate, and resolve data discrepancies by finding the root cause of issues
Work with partners across various cross-functional teams to prevent future occurrences
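For illustration, a minimal PySpark sketch of the kind of ingest pipeline described above: it replaces a manual CSV drop with an automated read, applies basic cleansing, and writes date-partitioned Parquet for downstream consumers. The paths, column names, and partitioning scheme are hypothetical assumptions, not details from this role.

```python
from pyspark.sql import SparkSession, functions as F

# Hypothetical paths and columns; adjust to the actual ingest source and data policies.
RAW_PATH = "s3://example-bucket/raw/trades/"          # assumed raw landing zone
CURATED_PATH = "s3://example-bucket/curated/trades/"  # assumed curated output

spark = SparkSession.builder.appName("trades-ingest").getOrCreate()

# Ingest raw CSV drops that were previously loaded manually.
raw = spark.read.option("header", True).csv(RAW_PATH)

# Basic cleansing: drop exact duplicates and rows missing a business key.
curated = (
    raw.dropDuplicates()
       .filter(F.col("trade_id").isNotNull())
       .withColumn("trade_date", F.to_date("trade_date"))
)

# Write partitioned Parquet so downstream consumers can prune by date.
(curated.write
        .mode("overwrite")
        .partitionBy("trade_date")
        .parquet(CURATED_PATH))

spark.stop()
```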
Requirements:
4+ years of overall hands-on experience in computer/software engineering, with the majority in big data engineering
4+ years of strong Python or Scala programming skills (core Python and PySpark), including hands-on experience creating and supporting UDFs and working with modules such as pytest (see the UDF sketch after this list)
4+ years of experience with building and optimizing ‘big data’ pipelines, architectures, and data sets. Familiarity with data pipeline and workflow management tools (e.g., Airflow, DBT, Kafka)
4+ years of hands-on experience developing on Spark in a production environment. Expertise in parallel execution, resource allocation, and the different modes of executing jobs is required
4+ years of experience using Hive (on Spark), YARN (logs, DAG flow diagrams), and Sqoop. Proficiency in bucketing, partitioning, tuning, and handling different file formats (ORC, Parquet, and Avro); a brief layout sketch follows this list
4+ years of experience using SQL (e.g., Transact-SQL/MS SQL Server, MySQL), NoSQL, and GraphQL
Strong experience implementing solutions on Snowflake
Experience with data quality and validation frameworks, especially Great Expectations for automated testing (a validation sketch follows this list)
Strong understanding and use of Swagger/OpenAPI for designing, documenting, and testing RESTful APIs
Experience with deployment, maintenance, and administration tasks related to cloud platforms (AWS, Azure preferred), OpenStack, Docker, Kafka, and Kubernetes. Familiarity with CI/CD pipelines for data pipeline automation and deployment (Jenkins, GitLab CI, Azure DevOps)
Experience with data governance, metadata management, and data lineage using tools like Axon and Unity Catalog. Expertise in managing business glossaries, data access control, auditing, and ensuring centralized governance across data assets in both cloud and hybrid environments
Hands-on experience with Databricks, including notebooks, workflows, and ML integrations
Experience working with global teams across different time zones
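As an illustration of the UDF and pytest experience asked for above, a minimal sketch of a PySpark UDF and a unit test for it; the function, column, and file names are hypothetical.

```python
# udfs.py -- a plain Python function wrapped as a PySpark UDF (names are hypothetical)
from pyspark.sql import functions as F, types as T

def normalize_ticker(raw):
    """Strip whitespace and upper-case a ticker symbol; None-safe."""
    return raw.strip().upper() if raw else None

normalize_ticker_udf = F.udf(normalize_ticker, T.StringType())
```

```python
# test_udfs.py -- pytest exercising both the pure function and the Spark UDF
import pytest
from pyspark.sql import SparkSession
from udfs import normalize_ticker, normalize_ticker_udf

@pytest.fixture(scope="session")
def spark():
    return SparkSession.builder.master("local[1]").appName("udf-tests").getOrCreate()

def test_normalize_ticker_pure():
    assert normalize_ticker("  blk ") == "BLK"
    assert normalize_ticker(None) is None

def test_normalize_ticker_udf(spark):
    df = spark.createDataFrame([("  blk ",)], ["ticker"])
    row = df.select(normalize_ticker_udf("ticker").alias("ticker")).first()
    assert row["ticker"] == "BLK"
```

Testing the plain function separately from the Spark wrapper keeps most of the coverage fast, with a single session-scoped local SparkSession reserved for the integration-style check.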
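Likewise, a brief sketch of the partitioning, bucketing, and file-format handling named in the Hive/Spark requirement, using the standard Spark DataFrame writer. The paths, table, and column names are hypothetical, and reading Avro assumes the external spark-avro package is available.

```python
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("layout-demo")
         .enableHiveSupport()          # assumes a Hive metastore is available
         .getOrCreate())

# Avro input; requires the external spark-avro package (e.g., via --packages).
orders = spark.read.format("avro").load("s3://example-bucket/landing/orders/")

# Partition by date for pruning, bucket by customer_id to help joins;
# bucketBy only works when writing a managed table via saveAsTable.
(orders.write
       .mode("overwrite")
       .format("orc")                  # ORC output; "parquet" works the same way
       .partitionBy("order_date")
       .bucketBy(16, "customer_id")
       .sortBy("customer_id")
       .saveAsTable("analytics.orders_bucketed"))
```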
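Finally, a hedged sketch of the Great Expectations usage referenced above. The library's API has changed substantially across major versions; this uses the classic pandas-backed dataset interface from pre-1.0 releases, and the file and column names are hypothetical.

```python
import great_expectations as ge

# Load a small extract with the classic pandas-backed dataset interface (pre-1.0 API).
batch = ge.read_csv("curated/trades_sample.csv")   # hypothetical extract

# Declare the expectations downstream consumers depend on.
batch.expect_column_values_to_not_be_null("trade_id")
batch.expect_column_values_to_be_unique("trade_id")
batch.expect_column_values_to_be_between("notional", min_value=0)

# Run the suite; in a pipeline this result would gate promotion of the dataset.
result = batch.validate()
print(result.success)
```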
Nice to have:
Experience with Machine Learning and Artificial Intelligence
Experience with Generative Artificial Intelligence