AIOps Automation Engineering Lead

Citi

Location:
India, Chennai

Contract Type:
Employment contract

Salary:

Not provided

Job Description:

The Engineering Lead Analyst is a senior level position responsible for leading a variety of engineering activities including the design, acquisition and deployment of hardware, software and network infrastructure in coordination with the Technology team. The position is within the Production Management AIOps Organization that is at the forefront of transforming production management and operations through cutting-edge technologies. The incumbent will lead the efforts to automate the routine production tasks, enhance predictive capabilities, reduce manual intervention and ensure integration of AI into existing operational workflows.

Job Responsibility:

  • Serve as a technology subject matter expert for internal and external stakeholders and provide direction for all firm mandated controls and compliance initiatives, all projects within the group and in creating a technology domain roadmap
  • ensure that all integration of functions meets business goals
  • define necessary system enhancements to deploy new products and process enhancements
  • recommend product customization for system integration
  • identify problem causality, business impact and root causes
  • exhibit knowledge of how own specialty area contributes to the business and apply knowledge of competitors, products and services
  • advise or mentor junior team members
  • impact the engineering function by influencing decisions through advice, counsel or facilitating services
  • drive and implement rigorous quality standards for all aspects of the automation delivery from initial concept to final implementation
  • continually evolve the working practices within and services provided by Production Management (regionally and globally) to improve efficiency and productivity
  • maintain continuous forward compatibility and build competency in automation, Artificial Intelligence, Robotic Process Automation, predictive analytics, etc.
  • apply decision analytics and technology platforms to deliver immediate results and long-term business impact
  • develop predictive models that will form the basis of information-driven strategies executed with respect to services provided by Production Management

Requirements:

  • 10+ years of relevant experience in an Engineering role
  • experience working in Financial Services or a large complex and/or global environment
  • project management experience
  • J2EE/microservices development experience, including running applications in cloud-native environments (Google Cloud, AWS, API Gateway technologies)
  • strong proficiency in JavaScript, including experience with ReactJS and NodeJS
  • experience with MongoDB or other NoSQL databases
  • solid understanding of Python and experience with relevant libraries
  • experience with version control systems like Git
  • knowledge of CI/CD pipelines and DevOps practices is a plus
  • consistently demonstrates clear and concise written and verbal communication
  • comprehensive knowledge of design metrics, analytics tools, benchmarking activities and related reporting to identify best practices
  • demonstrated analytic/diagnostic skills
  • ability to work in a matrix environment and partner with virtual teams
  • ability to work independently, multi-task, and take ownership of various parts of a project or initiative
  • ability to work under pressure and manage to tight deadlines or unexpected changes in expectations or requirements
  • proven track record of operational process change and improvement

Nice to have:

  • knowledge of CI/CD pipelines and DevOps practices
  • project management experience

What we offer:

  • Equal opportunity employer
  • consideration without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, status as a protected veteran, or any other characteristic protected by law

Additional Information:

Job Posted:
May 03, 2025

Employment Type:
Fulltime
Work Type:
On-site work
Similar Jobs for AIOps Automation Engineering Lead

MTS, Systems Architecture Engineering

The System Architecture Engineer's role is to develop and evolve technical netwo...
Location:
United States, Bellevue; Overland Park; Frisco
Salary:
142800.00 - 257600.00 USD / Year
T-Mobile
Expiration Date:
Until further notice
Requirements:
  • Master’s/Advanced degree in Computer Science, Engineering, or related field. Equivalent experience considered
  • 7–10 years in system, network, or reliability engineering roles
  • Deep expertise in network infrastructure (Cisco, Juniper, Check Point, F5, A10, Infoblox, BIND, DNS)
  • Hands-on experience with observability tools: Dynatrace, ThousandEyes, SevOne, Splunk, ServiceNow AIOps, OTEL
  • Proficiency with automation tools (Terraform, Ansible, Chef, Puppet) and cloud deployments (AWS preferred)
  • Programming/scripting in Python, Go, or Shell
  • Experience with CI/CD pipelines, Kubernetes, and containerized environments
  • Communication
  • Technical Writing
  • Analytics
Job Responsibility:
  • Develop and evolve technical network and service architectures and design strategies
  • Improve and protect the software, infrastructure, and network systems that power T-Mobile’s IT and customer-facing services
  • Ensure scalability, availability, performance, security, and reliability across applications and networks
  • Proactively identify and prevent network issues before they impact customers
  • Play a critical role in outage bridges, leveraging KPIs, telemetry, and AI-driven analytics to pinpoint problems
  • Create new designs, architectures, and standards for delivering software and network services
  • Improve scalability, latency, and efficiency of T-Mobile’s applications and network services
  • Contribute to cloud enablement, containerization, and microservices reliability
  • Manage improvement work, PoCs, and future automation projects
  • Diagnose and resolve complex issues in routers, firewalls, load balancers, DNS, and global traffic managers
What we offer:
  • Competitive base salary and compensation package
  • Annual stock grant
  • Employee stock purchase plan
  • 401(k)
  • Access to free, year-round money coaches
  • Annual bonus or periodic sales incentive or bonus
  • Medical, dental and vision insurance
  • Flexible spending account
  • Paid time off and up to 12 paid holidays
  • Paid parental and family leave
  • Fulltime

Principal Customer Success Manager

The Customer Success Architect position is a technical champion within the Custo...
Location:
United States, New York
Salary:
115500.00 - 266000.00 USD / Year
Hewlett Packard Enterprise
Expiration Date:
Until further notice
Requirements:
  • 10-15 years experience, preferably in the IT management (ITOM)/APM fields
  • At least 5+ years experience in senior customer-facing positions as an Implementation Architect, Service Delivery Architect, or Lead Solution Architect
  • In-depth knowledge and hands-on experience in one or more of the following: Observability, Process Automation, Patching, AIOps
  • An in-depth understanding of infrastructure management and intelligent automation is preferred
  • Familiarity with cloud-native design patterns, microservices, and modern web-scale architectures
  • Excellent written and oral communication skills, analytical, self-motivated, and quick on-the-job learning skills
  • Effectively multitask between initiatives with minimal oversight and provide a positive customer service attitude.
Job Responsibility:
  • Being the trusted partner for the customer on use-case and product functionality
  • Lead customers in the application of OpsRamp products and services offerings to meet their Business Outcomes
  • Develop a deep understanding of OpsRamp IT Operations Platform, architecture, and its capabilities through training and hands-on experience
  • Build on the technical design and architecture developed during the implementation phase to maintain a point-in-time architecture for each customer
  • Serve as an important source for information regarding the customer’s technical needs and provide customer feedback
  • Perform and own the health checks during the customer success engagement lifecycle in a client environment
  • Understand and document client use cases and build best practice enablement and content packs for the various use cases
  • Track support and feature requirements and interface with the Product and Engineering team where required
  • Establish technical authority quickly with executive technical customer stakeholders
  • Invest time in documenting best practices, capturing and disseminating knowledge, and other initiatives.
What we offer:
  • Flexibility to manage work and personal needs
  • Health and emotional wellbeing support
  • Personal and professional development programs
  • Unconditional inclusion
  • Career growth and skill application programs.
  • Fulltime

Staff Site Reliability Engineer

Our Site Reliability Engineering team is growing, and we are looking for a highl...
Location:
United States
Salary:
150000.00 - 225000.00 USD / Year
AlphaSense
Expiration Date:
Until further notice
Requirements:
  • 8+ years of experience in Site Reliability Engineering, DevOps, or a similar role
  • At least 3+ years in a Senior+ SRE position
  • Strong background in running production SaaS systems at scale
  • Proficiency in at least one programming/scripting language (Python, Go, or similar)
  • Hands-on expertise with cloud platforms (AWS, GCP, or Azure) and Kubernetes
  • Deep understanding of networking fundamentals (TCP/IP, DNS, HTTP/S, load balancing)
  • Experience with monitoring & alerting (Prometheus, Grafana, Datadog, ELK)
  • Familiarity with advanced observability (OTEL, continuous profiling)
  • Proven incident management experience, including leading high-severity incidents and postmortems
  • Strong troubleshooting skills across the full stack
Job Responsibility:
  • Architect Reliability Paved Paths: Build frameworks and self-service tooling that let teams own the reliability of their services
  • Lead AI-Driven Reliability: Drive our AIOps strategy — automating diagnostics, remediation, and proactive failure prevention
  • Champion Reliability Culture: Embed SRE practices across engineering via design reviews, production readiness, and operational standards
  • Incident Leadership: Act as Incident Commander during critical events, modeling operational excellence, and ensuring blameless postmortems lead to lasting improvements
  • Advance Observability: Deliver end-to-end monitoring, tracing, and profiling (Prometheus, Grafana, OTEL, Continuous Profiling) to optimize performance proactively
  • Mentor & Multiply: Elevate engineers across SRE and product teams through mentorship, technical guidance, and knowledge sharing
What we offer:
  • Equity
  • Generous benefits program
  • Fulltime

Director, Product Management (AIOps)

PagerDuty is looking for a Director of Product Management, AIOps, to lead the ch...
Location:
Canada
Salary:
156000.00 - 237000.00 CAD / Year
PagerDuty
Expiration Date:
Until further notice
Requirements:
  • Minimum of 10 years in product management, with a proven track record of success in the software industry
  • Experience applying data science and AI to deliver intelligent applications
  • Understanding of monitoring and observability, or event intelligence products and markets, including customer needs, key trends, and the competitive landscape
  • Industry experience with DevOps engineers, SREs, platform engineers and ITOps teams
  • Proven ability to build and scale product ecosystems and integrations with internal platforms and 3rd party systems
  • Self-motivated user of AI tools for personal productivity and the craft of product, a bias towards amplifying team output with AI over adding headcount to scale
  • Experience managing teams of product managers: hiring for intelligence, curiosity, and collaboration and focusing on coaching, empowerment, and accountability
  • Exceptional analytical, strategic thinking, and problem-solving abilities
Job Responsibility:
  • Develop unique, market-leading AIOps capabilities that leverage PagerDuty’s enriched event streams from a robust and growing set of integrations
  • Ensure AIOps integrates seamlessly with AI & Automation and Incident Management to drive customer outcomes
  • Drive the strategy for expanding the ecosystem of integrations, ensuring seamless ingestion, enrichment, and actionability of signals from a diverse set of sources
  • Drive product decisions balancing customer value (measured through product engagement and customer feedback) with financial impact (measured by win rates, retention and ARR)
  • Lead and grow a world-class team of Product Managers, with diverse backgrounds and perspectives, developing talent through both formal training and hands-on experience
  • Guide the cultural transformation to usage-based pricing by setting product engagement targets to measure product success and iterating on packaging and pricing models
  • Maintain a balanced roadmap that optimizes for functional, architectural, interoperability, customer value, and revenue goals and ensures effective allocation of resources
  • Clearly articulate the vision and product plans and drive alignment between product development, go-to-market teams, executives, and the rest of the company
  • Partner closely with solutions consulting, product marketing and sales enablement to ensure effective training and messaging about features and their value is delivered at scale
  • Drive the success of the product in terms of customer satisfaction and business results while monitoring the market for shifts in customer needs, technology, and competitive dynamics
What we offer:
  • Competitive salary
  • Comprehensive benefits package from day one
  • Flexible work arrangements
  • Company equity
  • ESPP (Employee Stock Purchase Program)
  • Retirement or pension plan
  • Generous paid vacation time
  • Paid holidays and sick leave
  • Dutonian Wellness Days & HibernationDuty - companywide paid days off in addition to PTO
  • Paid parental leave: 22 weeks for pregnant parent, 12 weeks for non-pregnant parent (some countries have longer leave standards and we comply with local laws)
  • Fulltime

Technology Outbound Product Manager

Join the innovators of OpsRamp as its technology product management leader, resp...
Location:
India, Bangalore
Salary:
Not provided
Hewlett Packard Enterprise
Expiration Date:
Until further notice
Requirements:
  • Bachelor's degree in marketing, engineering, computer science, or a related field
  • MBA or advanced technical degree preferred
  • 4+ years of experience in technical marketing, product marketing, or product management, or pre-sales in observability, ITOM, log management, SaaS and enterprise software, or IT infrastructure industries
  • Knowledge/experience with SaaS software preferred
  • Public cloud experience is a plus
  • Knowledge of application modernization (e.g., Kubernetes), automation (python, pipelines, PowerShell, etc.) is a plus
  • Proven track record of developing and executing successful GTM strategies and campaigns that drive awareness, demand generation, and market leadership
  • Excellent written and verbal communication skills, with the ability to distill complex technical concepts into clear, concise, and compelling messaging and content
  • Strong analytical skills and experience conducting market and competitive analysis to identify key trends, insights, and opportunities
  • Ability to work effectively in a fast-paced, dynamic environment with cross-functional teams and multiple stakeholders
Job Responsibility:
  • Develop and execute technical evangelizing strategies to drive awareness, demand generation, and market leadership for OpsRamp solutions
  • Collaborate with product management and engineering teams to deeply understand product features, capabilities, and roadmaps, and translate them into compelling value propositions, messaging, and content
  • Create and maintain a wide range of technical collateral, including whitepapers, solution briefs, presentations, videos, demos, and blog posts
  • Drive the creation and delivery of technical enablement materials to support technical sales, partners, and customers, including training presentations, FAQs, and technical guides
  • Conduct market and competitive analysis to identify key trends, insights, and opportunities to differentiate OpsRamp in the ITOM market
  • Serve as a technical evangelist and spokesperson for OpsRamp at industry events, conferences, webinars, and customer meetings
  • Collaborate with product marketing and corporate marketing teams to develop technical content that drives engagement, leads, and pipeline
  • Gather key customer and target audience insights to inform product positioning and messaging as well as the product roadmap
  • Contribute to GTM strategy and messaging, and help maintain technical accuracy of marketing messages.
What we offer:
  • Health & Wellbeing
  • Personal & Professional Development
  • Unconditional Inclusion
  • Fulltime

Intermediate Software Engineer SRE – AI

At PointClickCare our mission is simple: to help providers deliver exceptional c...
Location:
Canada, Mississauga
Salary:
115000.00 - 128000.00 CAD / Year
PointClickCare
Expiration Date:
Until further notice
Requirements:
  • 5+ years' experience in software engineering
  • Experience with SRE principles
  • Experience with AI/ML in production environments
  • A passion for automation, intelligent systems, and operational excellence
  • Strong debugging, problem-solving, and system design skills
  • Languages: Python, Java, Bash, Terraform
  • Platforms: Azure, Kubernetes, Docker
  • Tools: Datadog, Prometheus, AppDynamics, ELK, GitHub Actions
  • ML/AI: MCP framework, AI agents, Vector store, Agent orchestration (LangChain), RAG
  • CI/CD: Jenkins, ArgoCD, Spinnaker
Job Responsibility:
  • Build ML-based anomaly detection and pattern recognition systems
  • Enhance telemetry with smart tagging and metadata for better AI insights
  • Develop event-driven workflows and self-healing systems using AI triggers
  • Automate incident response with generative AI and custom AI agent orchestration
  • Use time-series forecasting and predictive modelling to anticipate failures
  • Optimise infrastructure with AI-powered autoscaling and cost-aware resource allocation
  • Build scalable, fault-tolerant systems in a cloud-native environment
  • Participate in on-call rotations and lead incident response for critical systems
  • Skilled in API integration for streamlined data exchange and system connectivity
  • Run internal AIOps workshops and help teams adopt AI maturity models
What we offer:
  • Benefits starting from Day 1
  • Retirement Plan Matching
  • Flexible Paid Time Off
  • Wellness Support Programs and Resources
  • Parental & Caregiver Leaves
  • Fertility & Adoption Support
  • Continuous Development Support Program
  • Employee Assistance Program
  • Allyship and Inclusion Communities
  • Employee Recognition … and more
  • Fulltime

Principal Engineer

The Principal AI/ML Operations Engineer leads the architecture, automation, and ...
Location:
United States, Pleasanton, California
Salary:
251000.00 - 314500.00 USD / Year
BlackLine
Expiration Date:
Until further notice
Requirements:
  • Bachelor’s or Master’s degree in Computer Science, Machine Learning, Data Science, or a related field
  • 10+ years in ML infrastructure, DevOps, and software system architecture
  • 4+ years in leading MLOps or AI Ops platforms
  • Strong programming skills in languages such as Python, Java, or Scala
  • Expertise in ML frameworks (TensorFlow, PyTorch, scikit-learn) and orchestration tools (Airflow, Kubeflow, Vertex AI, MLflow)
  • Proven experience operating production pipelines for ML and LLM-based systems across cloud ecosystems (GCP, AWS, Azure)
  • Deep familiarity with LangChain, LangGraph, ADK or similar agentic system runtime management
  • Strong competencies in CI/CD, IaC, and DevSecOps pipelines integrating testing, compliance, and deployment automation
  • Hands-on with observability stacks (Prometheus, Grafana, New Relic) for model and agent performance tracking
  • Understanding of governance frameworks for Responsible AI, auditability, and cost metering across training and inference workloads
Job Responsibility:
  • Define enterprise-level standards and reference architectures for ML-Ops and AIOps systems
  • Partner with data science, security, and product teams to set evaluation and governance standards (Guardrails, Bias, Drift, Latency SLAs)
  • Mentor senior engineers and drive design reviews for ML pipelines, model registries, and agentic runtime environments
  • Lead incident response and reliability strategies for ML/AI systems
  • Lead the deployment of AI models and systems in various environments
  • Collaborate with development teams to integrate AI solutions into existing workflows and applications
  • Ensure seamless integration with different platforms and technologies
  • Define and manage MCP Registry for agentic component onboarding, lifecycle versioning, and dependency governance
  • Build CI/CD pipelines automating LLM agent deployment, policy validation, and prompt evaluation of workflows
  • Develop and operationalize experimentation frameworks for agent evaluations, scenario regression, and performance analytics
What we offer:
  • short-term and long-term incentive programs
  • robust offering of benefit and wellness plans
  • Fulltime

Principal Site Reliability Engineer

Groupon is modernizing its global platform — and reliability is at the center of...
Location:
Colombia
Salary:
Not provided
Groupon
Expiration Date:
Until further notice
Requirements:
  • 10+ years in software/systems engineering
  • 5+ years in SRE or platform reliability
  • Strong experience with GCP (preferred) or AWS, Kubernetes, and Terraform
  • Proficiency in Python or Go for automation and tooling
  • Deep understanding of observability stacks (Prometheus, Grafana, OpenTelemetry) and service meshes (Istio, Envoy)
  • Hands-on AIOps experience: anomaly detection, predictive analytics, ML-assisted operations
  • Strong communication and influencing skills — data over hierarchy
Job Responsibility:
  • Architect and maintain self-healing systems with 99.9%+ availability targets
  • Use AI/ML to automate infrastructure governance and detect configuration or IaC anti-patterns
  • Implement adaptive SLIs/SLOs that evolve automatically from real-time data
  • Build AIOps-based observability and auto-remediation pipelines
  • Apply predictive modeling to forecast failures before they impact users
  • Lead chaos, performance, and resilience testing programs
  • Map platform and service behavior to revenue impact and drive improved revenue resilience through better infrastructure performance
  • Mentor engineers and drive reliability standards across teams
  • Partner with platform, data, and product teams to ensure stability aligns with business goals
  • Support major incident response, incident review, and participate in on-call rotations
What we offer:
  • The opportunity to work with cutting-edge technologies in a transformative environment
  • Professional growth and leadership development pathways tailored to your aspirations
  • A chance to leave a lasting impact by shaping the future of reliable and scalable systems