As a Staff Machine Learning Engineer, you will be the overall tech lead of a single AI/Machine Learning team, responsible for the tech design and tech health of the team. You will build and architect scalable and reliable AIML solutions that align with the company's tech paved path and stakeholder requirements. This role requires a minimum of 6 years of relevant experience.
Job Responsibilities:
Architect scalable and reliable AIML solutions that align with the company's tech paved path and stakeholder requirements
Develop and implement Software Development Lifecycle (SDLC) best practices for machine learning projects, ensuring scalable, secure, and reliable systems from model development to production deployment
Define the product roadmap for machine learning solutions and establish feature backlogs
Prioritize key ML features in collaboration with product managers, aligning them with business objectives and technical feasibility
Debug and troubleshoot model performance issues, track key metrics, and continuously enhance model reliability, speed, and efficiency in production environments
Own the complete lifecycle of ML models, including monitoring, retraining, fine-tuning, and version management, to ensure they continue to meet business needs over time
Guide and mentor machine learning engineers, and promote best practices in software engineering, model development, and deployment
Lead technical decision-making processes and foster collaboration within the team
Requirements:
Bachelor’s degree in Machine Learning, Computer Science, Statistics, Mathematics, or a related field; an advanced degree (master’s or Ph.D.) is highly desirable
At least 6 years of hands-on experience in machine learning and software engineering
Deep proficiency in programming languages such as Python, Java, or similar, with a strong emphasis on coding excellence
Proficiency in AIML frameworks such as TensorFlow, PyTorch, scikit-learn, LangChain, LangGraph, etc.
Experience with SQL, Spark, and scripting languages such as Python for data processing and model development
Expertise in cloud platforms (AWS, Azure, GCP) and containerization technologies such as Docker, as well as orchestration tools like Kubernetes
Proven experience in deploying machine learning systems in a production environment, ensuring scalability, reliability, and high availability
Extensive experience with object-oriented design (OOD), design patterns, and writing clean, maintainable code
Solid understanding of distributed systems and the challenges associated with scaling machine learning models in production
Expertise in implementing MLOps practices, including setting up continuous integration (CI), continuous delivery (CD), automated testing, and deployment pipelines for machine learning models
Strong understanding of system architecture, performance optimization, and the ability to design fault-tolerant systems that handle large-scale data and high-volume requests
Experience designing, building, and maintaining ETL pipelines, streamlining data collection, transformation, and storage for model development
Proficient in containerizing applications using Docker and managing deployment and scaling using Kubernetes or similar orchestrators
Experience setting up monitoring and logging systems for tracking model performance in production environments and ensuring efficient resource utilization
Nice to have:
3 years of experience interfacing directly with internal business stakeholders and/or external stakeholders on AIML initiatives
Working experience with cloud provider solutions such as Azure and AWS
Experience using both open-source (e.g., Llama, Qwen, Mistral) and proprietary (e.g., GPT, Claude) LLMs for appropriate tasks
Experience with tools that power LLM-based AI agents: eval frameworks, agent tooling, RAG pipelines, prompt engineering, etc.
Experience building LLM-based AI agent workflows via both no code/low code and traditional high-code development environments
Experience in ideating, integrating, and designing applications and frontends using React or similar
What we offer:
Comprehensive Total Rewards program that offers personalized coverage tailor-made for you and your family’s overall well-being
Financial benefits including market-competitive compensation; a 401(k) savings plan vested from day one that offers a 6% match; performance and recognition-based incentives; and tuition assistance
Access to additional benefits like mental healthcare as well as fertility and adoption assistance
Workplace flexibility, including our GEICO Flex program, which offers the ability to work from anywhere in the US for up to four weeks per year