We are looking for an experienced Lead Data Engineer to oversee the design, implementation, and management of advanced data infrastructure in Houston, Texas. This role requires expertise in architecting scalable solutions, optimizing data pipelines, and ensuring data quality to support analytics, machine learning, and real-time processing. The ideal candidate will have a deep understanding of Lakehouse architecture and Medallion design principles to deliver robust and governed data solutions.
Job Responsibilities:
Develop and implement scalable data pipelines to ingest, process, and store large datasets using tools such as Apache Spark, Hadoop, and Kafka
Utilize cloud platforms like AWS or Azure to manage data storage and processing, leveraging services such as S3, Lambda, and Azure Data Lake
Design and operationalize data architecture following Medallion patterns to ensure data usability and quality across Bronze, Silver, and Gold layers
Build and optimize data models and storage solutions, including Databricks Lakehouses, to support analytical and operational needs
Automate data workflows using tools like Apache Airflow and Fivetran to streamline integration and improve efficiency
Lead initiatives to establish best practices in data management, facilitating knowledge sharing and collaboration across technical and business teams
Collaborate with data scientists to provide infrastructure and tools for complex analytical models, using programming languages like Python or R
Implement and enforce data governance policies, including encryption, masking, and access controls, within cloud environments
Monitor and troubleshoot data pipelines for performance issues, applying tuning techniques to enhance throughput and reliability
Stay updated with emerging technologies in data engineering and advocate for improvements to the organization's data systems
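The Bronze/Silver/Gold layering mentioned above can be sketched in plain Python. This is an illustrative sketch only: in a Databricks Lakehouse these layers would be Delta tables processed with Spark, and all record fields, keys, and source names below are hypothetical.

```python
# Minimal sketch of a Medallion-style Bronze -> Silver -> Gold flow.
# Plain lists/dicts stand in for Delta tables; field names are hypothetical.

def bronze_ingest(raw_records):
    """Bronze: land raw data as-is, tagging each record with lineage metadata."""
    return [{**r, "_source": "orders_api"} for r in raw_records]

def silver_clean(bronze_records):
    """Silver: validate, deduplicate on the business key, and conform types."""
    seen, cleaned = set(), []
    for r in bronze_records:
        if r.get("order_id") is None:   # drop records failing validation
            continue
        if r["order_id"] in seen:       # drop duplicates
            continue
        seen.add(r["order_id"])
        cleaned.append({"order_id": r["order_id"],
                        "amount": float(r.get("amount", 0))})
    return cleaned

def gold_aggregate(silver_records):
    """Gold: business-level aggregate ready for analytics consumers."""
    return {"order_count": len(silver_records),
            "total_amount": sum(r["amount"] for r in silver_records)}

raw = [{"order_id": 1, "amount": "10.5"},
       {"order_id": 1, "amount": "10.5"},    # duplicate
       {"order_id": None, "amount": "3.0"},  # fails validation
       {"order_id": 2, "amount": "4.5"}]
report = gold_aggregate(silver_clean(bronze_ingest(raw)))
# report == {"order_count": 2, "total_amount": 15.0}
```

The point of the layering is that each stage only tightens guarantees: Bronze preserves everything for replay, Silver enforces quality rules, and Gold exposes governed, business-ready aggregates.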
Requirements:
Bachelor’s degree in Computer Science, Engineering, or a related field with 10+ years of experience in data engineering, or a Master’s degree with 5+ years of relevant experience
Proven expertise in designing and implementing Medallion Architecture within a Databricks Lakehouse environment
Proficiency in big data technologies such as Apache Spark, Hadoop, and Kafka
Extensive experience with cloud platforms like AWS and Azure, including integration of storage and compute services
Strong programming skills in Python, Java, or Scala, with hands-on experience in data modeling and stored procedures
Knowledge of tools and platforms like Apache Airflow, Databricks, and Dataiku
Familiarity with ETL processes and machine learning model deployment
Excellent problem-solving skills and ability to optimize data systems for performance and scalability
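The ETL familiarity and governance controls called for above can be illustrated with a minimal extract-transform-load pass that includes a PII-masking step. This is a hedged sketch, not a production pattern: the function and field names are invented for illustration, and real masking would typically be enforced at the warehouse or catalog layer rather than in application code.

```python
# Minimal ETL sketch with a column-masking transform.
# All names (extract/transform/load, the email field) are hypothetical.

def extract(rows):
    """Extract: pull raw rows (here, from an in-memory source)."""
    return list(rows)

def mask_email(email):
    """Mask PII: keep the domain, hide all but the first character of the local part."""
    local, _, domain = email.partition("@")
    return local[0] + "***@" + domain

def transform(rows):
    """Transform: normalize casing and mask sensitive columns."""
    return [{"name": r["name"].title(), "email": mask_email(r["email"])}
            for r in rows]

def load(rows, sink):
    """Load: append transformed rows to the target sink."""
    sink.extend(rows)
    return sink

warehouse = []
load(transform(extract([{"name": "ada lovelace", "email": "ada@example.com"}])),
     warehouse)
# warehouse == [{"name": "Ada Lovelace", "email": "a***@example.com"}]
```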