Bioptimus is building the first universal AI foundation model for biology to fuel breakthrough discoveries and accelerate innovation in biomedicine. We are looking for a meticulous, detail-oriented Biology Data Quality Engineer to ensure the integrity and usability of the varied and complex datasets that are central to our mission. In this critical role, you'll leverage your expertise in biology, data science, and machine learning to ensure the quality and consistency of the biological data used to train and evaluate our foundation models.
Job Responsibilities:
Data Validation Pipeline Development: Develop and implement comprehensive data validation protocols for diverse biological datasets (histology, omics, clinical)
Ensure data integrity, consistency, and accuracy through rigorous quality checks
Design and implement automated data quality pipelines
Data Curation & Standardization: Establish and enforce data standardization practices
Curate datasets to enhance their usability for machine learning
Collaboration & Communication: Work closely with the R&D team to understand data requirements and address data quality concerns
Communicate data quality findings and recommendations effectively
Communicate and synchronize with external data providers
Documentation & Reporting: Maintain detailed documentation of data-quality assessment procedures, validation results, and data specifications
Generate regular reports on data quality metrics and trends
Data Source Evaluation: Evaluate and validate external public data sources
Continuous Improvement: Stay up to date with the latest data quality best practices and tools
Propose and implement improvements to our data-quality assessment processes and pipelines
Requirements:
MSc in Biology, Computational Biology, or Bioinformatics
Deep understanding of transcriptomics data types (bulk, single-cell, spatial) and their specific quality considerations
Good knowledge of genomics and proteomics data
Proven experience in implementing data quality control procedures and pipelines
Familiarity with data validation tools and techniques
Strong analytical and problem-solving skills
Proficiency in Python
Good knowledge of data visualization libraries (e.g. matplotlib)
Excellent written and verbal communication skills
Nice to have:
Computational Pathology Data Expertise: Experience in machine learning analysis of histology images
Cloud expertise: Experience working with AWS
Data Annotation Experience: Experience with developing and implementing data annotation guidelines and processes
Experience with data ontologies
Proven experience building or contributing to large-scale data collections (e.g. Human Cell Atlas)
Spatial alignment of multimodal datasets (e.g. alignment between different imaging modalities)
What we offer:
Competitive salary and equity package
Flexible work arrangements, including remote options
Opportunities for professional growth and leadership development