AIOps Automation Engineering Lead

Citi

Location:
India, Chennai

Contract Type:
Employment contract

Salary:
Not provided

Job Description:

The Engineering Lead Analyst is a senior-level position responsible for leading a variety of engineering activities, including the design, acquisition and deployment of hardware, software and network infrastructure in coordination with the Technology team. The position sits within the Production Management AIOps Organization, which is at the forefront of transforming production management and operations through cutting-edge technologies. The incumbent will lead efforts to automate routine production tasks, enhance predictive capabilities, reduce manual intervention and ensure that AI is integrated into existing operational workflows.

Job Responsibility:

  • Serve as a technology subject matter expert for internal and external stakeholders and provide direction for all firm mandated controls and compliance initiatives, all projects within the group and in creating a technology domain roadmap
  • ensure that all integration of functions meet business goals
  • define necessary system enhancements to deploy new products and process enhancements
  • recommend product customization for system integration
  • identify problem causality, business impact and root causes
  • exhibit knowledge of how own specialty area contributes to the business and apply knowledge of competitors, products and services
  • advise or mentor junior team members
  • impact the engineering function by influencing decisions through advice, counsel or facilitating services
  • drive and implement rigorous quality standards for all aspects of the automation delivery from initial concept to final implementation
  • continually evolve the working practices within and services provided by Production Management (regionally and globally) to improve efficiency and productivity
  • maintain forward compatibility and build competency in automation, Artificial Intelligence, Robotic Process Automation, predictive analytics, etc.
  • use decision analytics and technology platforms to deliver immediate results and long-term business impact
  • develop predictive models that will form the basis of information-driven strategies executed with respect to services provided by Production Management

Requirements:

  • 10+ years of relevant experience in an Engineering role
  • experience working in Financial Services or a large complex and/or global environment
  • project management experience
  • J2EE/microservices development experience, including running applications in cloud-native environments (Google Cloud, AWS, API Gateway technologies)
  • strong proficiency in JavaScript, including experience with ReactJS and NodeJS
  • experience with MongoDB or other NoSQL databases
  • solid understanding of Python and experience with relevant libraries
  • experience with version control systems like Git
  • knowledge of CI/CD pipelines and DevOps practices is a plus
  • consistently demonstrates clear and concise written and verbal communication
  • comprehensive knowledge of design metrics, analytics tools, benchmarking activities and related reporting to identify best practices
  • demonstrated analytic/diagnostic skills
  • ability to work in a matrix environment and partner with virtual teams
  • ability to work independently, multi-task, and take ownership of various parts of a project or initiative
  • ability to work under pressure and manage to tight deadlines or unexpected changes in expectations or requirements
  • proven track record of operational process change and improvement

Nice to have:

  • knowledge of CI/CD pipelines and DevOps practices
  • project management experience

What we offer:

  • Equal opportunity employer
  • consideration without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, status as a protected veteran, or any other characteristic protected by law

Additional Information:

Job Posted:
May 03, 2025

Employment Type:
Fulltime
Work Type:
On-site work

Similar Jobs for AIOps Automation Engineering Lead

MTS, Systems Architecture Engineering

The System Architecture Engineer's role is to develop and evolve technical netwo...
Location:
United States, Bellevue; Overland Park; Frisco
Salary:
142800.00 - 257600.00 USD / Year
T-Mobile
Expiration Date:
Until further notice
Requirements:
  • Master’s/Advanced degree in Computer Science, Engineering, or related field. Equivalent experience considered
  • 7–10 years in system, network, or reliability engineering roles
  • Deep expertise in network infrastructure (Cisco, Juniper, Check Point, F5, A10, Infoblox, BIND, DNS)
  • Hands-on experience with observability tools: Dynatrace, ThousandEyes, SevOne, Splunk, ServiceNow AIOps, OTEL
  • Proficiency with automation tools (Terraform, Ansible, Chef, Puppet) and cloud deployments (AWS preferred)
  • Programming/scripting in Python, Go, or Shell
  • Experience with CI/CD pipelines, Kubernetes, and containerized environments
  • Communication
  • Technical Writing
  • Analytics
Job Responsibility:
  • Develop and evolve technical network and service architectures and design strategies
  • Improve and protect the software, infrastructure, and network systems that power T-Mobile’s IT and customer-facing services
  • Ensure scalability, availability, performance, security, and reliability across applications and networks
  • Proactively identify and prevent network issues before they impact customers
  • Play a critical role in outage bridges, leveraging KPIs, telemetry, and AI-driven analytics to pinpoint problems
  • Create new designs, architectures, and standards for delivering software and network services
  • Improve scalability, latency, and efficiency of T-Mobile’s applications and network services
  • Contribute to cloud enablement, containerization, and microservices reliability
  • Manage improvement work, PoCs, and future automation projects
  • Diagnose and resolve complex issues in routers, firewalls, load balancers, DNS, and global traffic managers
What we offer:
  • Competitive base salary and compensation package
  • Annual stock grant
  • Employee stock purchase plan
  • 401(k)
  • Access to free, year-round money coaches
  • Annual bonus or periodic sales incentive or bonus
  • Medical, dental and vision insurance
  • Flexible spending account
  • Paid time off and up to 12 paid holidays
  • Paid parental and family leave
  • Fulltime

Lead Platform Engineer

After the launch of its flagship product, a fast-growing scale-up is expanding i...
Location:
United Kingdom, Bradford and Leeds
Salary:
85000.00 - 95000.00 GBP / Year
Lawrence Harvey
Expiration Date:
Until further notice
Requirements:
  • Proven experience leading high-performing Platform/DevOps teams, including hybrid/offshore or partner resource models
  • Over 3 years of hands-on experience with Google Cloud Platform
  • Strong expertise in CI/CD design and build using GitHub, Terraform or similar
  • Experience supporting microservices / API-driven architectures
  • Comfortable working in fast-paced, product-led organisations with multiple stakeholders
Job Responsibility:
  • Take ownership of its cloud platform and enable rapid, secure product delivery at scale
  • Lead a multi-disciplinary platform function across CI/CD, networking, security, AIOps, and observability
  • Build a robust self-service platform that empowers engineering squads
  • Shape platform strategy
  • Champion automation
  • Play an integral role in the development of future products
What we offer:
  • 15% Bonus
  • Fulltime

Principal Customer Success Manager

The Customer Success Architect position is a technical champion within the Custo...
Location:
United States, New York
Salary:
115500.00 - 266000.00 USD / Year
Hewlett Packard Enterprise
Expiration Date:
Until further notice
Requirements:
  • 10-15 years of experience, preferably in the IT operations management (ITOM)/APM fields
  • At least 5 years of experience in senior customer-facing positions as an Implementation Architect, Service Delivery Architect, or Lead Solution Architect
  • In-depth knowledge and hands-on experience in one or more of the following: Observability, Process Automation, Patching, AIOps
  • An in-depth understanding of infrastructure management and intelligent automation is preferred
  • Familiarity with cloud-native design patterns, microservices, and modern web-scale architectures
  • Excellent written and oral communication skills; analytical, self-motivated, and quick to learn on the job
  • Ability to multitask effectively between initiatives with minimal oversight and maintain a positive customer service attitude
Job Responsibility:
  • Being the trusted partner for the customer on use-case and product functionality
  • Lead customers in the application of OpsRamp products and services offerings to meet their Business Outcomes
  • Develop a deep understanding of OpsRamp IT Operations Platform, architecture, and its capabilities through training and hands-on experience
  • Build on the technical design and architecture developed during the implementation phase to maintain a point-in-time architecture for each customer
  • Serve as an important source for information regarding the customer’s technical needs and provide customer feedback
  • Perform and own the health checks during the customer success engagement lifecycle in a client environment
  • Understand and document client use cases and build best practice enablement and content packs for the various use cases
  • Track support and feature requirements and interface with the Product and Engineering team where required
  • Establish technical authority quickly with executive technical customer stakeholders
  • Invest time in documenting best practices, capturing and disseminating knowledge, and other initiatives.
What we offer:
  • Flexibility to manage work and personal needs
  • Health and emotional wellbeing support
  • Personal and professional development programs
  • Unconditional inclusion
  • Career growth and skill application programs.
  • Fulltime

Staff Site Reliability Engineer

Our Site Reliability Engineering team is growing, and we are looking for a highl...
Location:
United States
Salary:
150000.00 - 225000.00 USD / Year
AlphaSense
Expiration Date:
Until further notice
Requirements:
  • 8+ years of experience in Site Reliability Engineering, DevOps, or a similar role
  • At least 3 years in a Senior+ SRE position
  • Strong background in running production SaaS systems at scale
  • Proficiency in at least one programming/scripting language (Python, Go, or similar)
  • Hands-on expertise with cloud platforms (AWS, GCP, or Azure) and Kubernetes
  • Deep understanding of networking fundamentals (TCP/IP, DNS, HTTP/S, load balancing)
  • Experience with monitoring & alerting (Prometheus, Grafana, Datadog, ELK)
  • Familiarity with advanced observability (OTEL, continuous profiling)
  • Proven incident management experience, including leading high-severity incidents and postmortems
  • Strong troubleshooting skills across the full stack
Job Responsibility:
  • Architect Reliability Paved Paths: Build frameworks and self-service tooling that let teams own the reliability of their services
  • Lead AI-Driven Reliability: Drive our AIOps strategy — automating diagnostics, remediation, and proactive failure prevention
  • Champion Reliability Culture: Embed SRE practices across engineering via design reviews, production readiness, and operational standards
  • Incident Leadership: Act as Incident Commander during critical events, modeling operational excellence, and ensuring blameless postmortems lead to lasting improvements
  • Advance Observability: Deliver end-to-end monitoring, tracing, and profiling (Prometheus, Grafana, OTEL, Continuous Profiling) to optimize performance proactively
  • Mentor & Multiply: Elevate engineers across SRE and product teams through mentorship, technical guidance, and knowledge sharing
What we offer:
  • Equity
  • Generous benefits program
  • Fulltime

Staff Site Reliability Engineer

Our Site Reliability Engineering team is growing, and we are looking for a highl...
Location:
Finland, Helsinki
Salary:
Not provided
AlphaSense
Expiration Date:
Until further notice
Requirements:
  • 8+ years of experience in Site Reliability Engineering, DevOps, or a similar role
  • At least 3 of those years operating in a Senior+ SRE position
  • Strong background in running production SaaS systems at scale
  • Proficiency in at least one programming/scripting language (Python, Go, or similar)
  • Hands-on expertise with cloud platforms (AWS, GCP, or Azure) and Kubernetes
  • Deep understanding of networking fundamentals (TCP/IP, DNS, HTTP/S, load balancing)
  • Experience with monitoring & alerting (Prometheus, Grafana, Datadog, ELK)
  • Familiarity with advanced observability (OTEL, continuous profiling)
  • Proven incident management experience, including leading high-severity incidents and postmortems
  • Strong troubleshooting skills across the full stack
Job Responsibility:
  • Architect Reliability Paved Paths: Build frameworks and self-service tooling that let teams own the reliability of their services
  • Lead AI-Driven Reliability: Drive our AIOps strategy — automating diagnostics, remediation, and proactive failure prevention
  • Champion Reliability Culture: Embed SRE practices across engineering via design reviews, production readiness, and operational standards
  • Incident Leadership: Act as Incident Commander during critical events, modeling operational excellence, and ensuring blameless postmortems lead to lasting improvements
  • Advance Observability: Deliver end-to-end monitoring, tracing, and profiling (Prometheus, Grafana, OTEL, Continuous Profiling) to optimize performance proactively
  • Mentor & Multiply: Elevate engineers across SRE and product teams through mentorship, technical guidance, and knowledge sharing

Staff Site Reliability Engineer

Our Site Reliability Engineering team is growing, and we are looking for a highl...
Location:
India, Bengaluru
Salary:
Not provided
AlphaSense
Expiration Date:
Until further notice
Requirements:
  • 8+ years of experience in Site Reliability Engineering, DevOps, or a similar role
  • At least 3 of those years operating in a Senior+ SRE position
  • Strong background in running production SaaS systems at scale
  • Proficiency in at least one programming/scripting language (Python, Go, or similar)
  • Hands-on expertise with cloud platforms (AWS, GCP, or Azure) and Kubernetes
  • Deep understanding of networking fundamentals (TCP/IP, DNS, HTTP/S, load balancing)
  • Experience with monitoring & alerting (Prometheus, Grafana, Datadog, ELK)
  • Familiarity with advanced observability (OTEL, continuous profiling)
  • Proven incident management experience, including leading high-severity incidents and postmortems
  • Strong troubleshooting skills across the full stack
Job Responsibility:
  • Architect Reliability Paved Paths: Build frameworks and self-service tooling that let teams own the reliability of their services in a “You Build It, You Run It” culture
  • Lead AI-Driven Reliability: Drive our AIOps strategy — automating diagnostics, remediation, and proactive failure prevention
  • Champion Reliability Culture: Embed SRE practices across engineering via design reviews, production readiness, and operational standards
  • Incident Leadership: Act as Incident Commander during critical events, modeling operational excellence, and ensuring blameless postmortems lead to lasting improvements
  • Advance Observability: Deliver end-to-end monitoring, tracing, and profiling (Prometheus, Grafana, OTEL, Continuous Profiling) to optimize performance proactively
  • Mentor & Multiply: Elevate engineers across SRE and product teams through mentorship, technical guidance, and knowledge sharing

Staff Site Reliability Engineer

Our Site Reliability Engineering team is growing, and we are looking for a highl...
Location:
India, Delhi
Salary:
Not provided
AlphaSense
Expiration Date:
Until further notice
Requirements:
  • 8+ years of experience in Site Reliability Engineering, DevOps, or a similar role
  • at least 3 of those years operating in a Senior+ SRE position
  • strong background in running production SaaS systems at scale
  • proficiency in at least one programming/scripting language (Python, Go, or similar)
  • hands-on expertise with cloud platforms (AWS, GCP, or Azure) and Kubernetes
  • deep understanding of networking fundamentals (TCP/IP, DNS, HTTP/S, load balancing)
  • experience with monitoring & alerting (Prometheus, Grafana, Datadog, ELK)
  • familiarity with advanced observability (OTEL, continuous profiling)
  • proven incident management experience, including leading high-severity incidents and postmortems
  • strong troubleshooting skills across the full stack
Job Responsibility:
  • Architect Reliability Paved Paths: Build frameworks and self-service tooling that let teams own the reliability of their services in a “You Build It, You Run It” culture
  • Lead AI-Driven Reliability: Drive our AIOps strategy — automating diagnostics, remediation, and proactive failure prevention
  • Champion Reliability Culture: Embed SRE practices across engineering via design reviews, production readiness, and operational standards
  • Incident Leadership: Act as Incident Commander during critical events, modeling operational excellence, and ensuring blameless postmortems lead to lasting improvements
  • Advance Observability: Deliver end-to-end monitoring, tracing, and profiling (Prometheus, Grafana, OTEL, Continuous Profiling) to optimize performance proactively
  • Mentor & Multiply: Elevate engineers across SRE and product teams through mentorship, technical guidance, and knowledge sharing

Staff Site Reliability Engineer

Our Site Reliability Engineering team is growing, and we are looking for a highl...
Location:
India, Pune
Salary:
Not provided
AlphaSense
Expiration Date:
Until further notice
Requirements:
  • 8+ years of experience in Site Reliability Engineering, DevOps, or a similar role
  • at least 3 of those years operating in a Senior+ SRE position
  • strong background in running production SaaS systems at scale
  • proficiency in at least one programming/scripting language (Python, Go, or similar)
  • hands-on expertise with cloud platforms (AWS, GCP, or Azure) and Kubernetes
  • deep understanding of networking fundamentals (TCP/IP, DNS, HTTP/S, load balancing)
  • experience with monitoring & alerting (Prometheus, Grafana, Datadog, ELK)
  • familiarity with advanced observability (OTEL, continuous profiling)
  • proven incident management experience, including leading high-severity incidents and postmortems
  • strong troubleshooting skills across the full stack
Job Responsibility:
  • Architect Reliability Paved Paths: Build frameworks and self-service tooling that let teams own the reliability of their services in a “You Build It, You Run It” culture
  • Lead AI-Driven Reliability: Drive our AIOps strategy — automating diagnostics, remediation, and proactive failure prevention
  • Champion Reliability Culture: Embed SRE practices across engineering via design reviews, production readiness, and operational standards
  • Incident Leadership: Act as Incident Commander during critical events, modeling operational excellence, and ensuring blameless postmortems lead to lasting improvements
  • Advance Observability: Deliver end-to-end monitoring, tracing, and profiling (Prometheus, Grafana, OTEL, Continuous Profiling) to optimize performance proactively
  • Mentor & Multiply: Elevate engineers across SRE and product teams through mentorship, technical guidance, and knowledge sharing