API Production Support Lead (SRE)

Citi

Location:
Canada, Mississauga, Ontario

Contract Type:
Not provided

Salary:

94300.00 - 141500.00 USD / Year

Job Description:

At Citi, we’re passionate about building and maintaining highly reliable APIs that solve critical customer problems. We support mission-critical systems, empowering our customers with a rich feature set, high availability, and stellar performance levels to pursue their financial transactions. As we continue to expand our API scope and capabilities, we are seeking an experienced and dedicated API Support Lead to ensure the operational excellence and continuous improvement of our API ecosystems. This role requires an individual who brings fresh ideas, demonstrates a unique and informed viewpoint on API reliability, and enjoys collaborating with cross-functional teams to develop real-world solutions and ensure positive user experiences at every interaction. Our ultimate goal is to build proactive and predictive operational strategies, including leveraging intelligent automation, to avoid customer impacts.

Job Responsibility:

  • Champion stability initiatives to enable high availability and resilience for our API applications
  • Exhibit calm and analytical leadership when faced with major incidents on critical API systems
  • Lead the proactive monitoring and management of production API environments
  • Drive the definition, analysis, and reporting of SLIs and SLOs for all supported APIs and clients
  • Contribute to the development and implementation of tools and systems designed to enhance API operational management
  • Measure and optimize API system performance
  • Provide leadership and expert operational support for critical, large-scale distributed API ecosystems
  • Lead the gathering and analysis of performance metrics from API platforms and underlying infrastructure
  • Partner closely with API development teams to improve services through rigorous operational feedback loops, testing, and release procedures
  • Drive the creation of sustainable API operational systems and services through automation and continuous uplifts
  • Conduct thorough post-incident reviews for API-related issues
  • Actively participate in and lead high-priority API production support activities

Requirements:

  • Extensive experience supporting Java and J2EE based applications and tooling
  • Deep technical knowledge and hands-on experience supporting and troubleshooting environments including AWS, ECS, Oracle DB, and MongoDB
  • A strong understanding and practical application of SRE concepts, particularly in defining and measuring SLIs, SLOs and Error Budgets
  • Demonstrated experience in building and utilizing comprehensive monitoring solutions such as AppDynamics, Splunk, and Kibana to proactively alert on production API-related issues and ensure system health
  • In-depth knowledge and hands-on experience with API Gateway technologies, specifically APIGEE, and CDN solutions like Akamai
  • Proven ability to proactively identify and address problems, areas for improvement, and performance bottlenecks within complex API ecosystems using software-based solutions
  • Strong coding experience beyond simple scripts, preferably in Java or Python, for automation and internal tool development
  • Bachelor’s/University degree in Computer Science, Engineering, or a related field

Nice to have:

  • Prior experience or awareness of agentic or AI-based solutioning within the API Support domain, particularly for proactive issue detection and resolution
  • Exposure to ITRS monitoring tools and experience in configuring ITRS gateways
  • Effective in supporting Payments applications, preferably in corporate banking environments
  • Strong knowledge of ITIL practices in Incident, Release, Change, and Problem management
  • Master’s degree preferred

Additional Information:

Job Posted:
December 31, 2025

Employment Type:
Fulltime
Work Type:
Hybrid work

Similar Jobs for API Production Support Lead (SRE)

Digital Citi Connect API Support Lead

The Apps Support Sr Analyst is a seasoned professional role. Applies in-depth di...
Location:
India, Chennai
Salary:
Not provided
Citi
Expiration Date:
Until further notice
Requirements:
  • 6-10 years experience in an Application Support role
  • Expert experience in supporting API based high throughput low latency applications, SRE
  • Java, J2EE support is a must
  • Experience installing, configuring or supporting business applications
  • Experience with some programming languages and willingness/ability to learn
  • Advanced execution capabilities and ability to adjust quickly to changes and re-prioritization
  • Effective written and verbal communications including ability to explain technical issues in simple terms that non-IT staff can understand
  • Demonstrated analytical skills
  • Issue tracking and reporting using tools
  • Knowledge/experience of problem management tools
Job Responsibility:
  • Provides technical and business support for users of Citi Applications
  • Maintains application systems that have completed the development stage and are running in the daily operations of the firm
  • Manages, maintains and supports applications and their operating environments, focusing on stability, quality and functionality against service level expectations
  • Start of day checks, continuous monitoring, and regional handover
  • Perform same day risk reconciliations
  • Develop and maintain technical support documentation
  • Identifies ways to maximize the potential of the applications used
  • Assess risk and impact of production issues and escalate to business and technology management in a timely manner
  • Ensures that storage and archiving procedures are in place and functioning correctly
  • Formulates and defines scope and objectives for complex application enhancements and problem resolution
Employment Type:
Fulltime

Application Support Intermediate Analyst

The Applications Support Intermediate Analyst is a developing professional role....
Location:
Canada, Mississauga
Salary:
79320.00 - 110680.00 USD / Year
Citi
Expiration Date:
Until further notice
Requirements:
  • Extensive experience supporting Java and J2EE based applications and tooling
  • Deep technical knowledge and hands-on experience supporting and troubleshooting environments including AWS, ECS, Oracle DB, and MongoDB
  • A strong understanding and practical application of SRE concepts, particularly in defining and measuring SLIs, SLOs and Error Budgets
  • Demonstrated experience in building and utilizing comprehensive monitoring solutions such as AppDynamics, Splunk, and Kibana to proactively alert on production API-related issues and ensure system health
  • Mandatory: In-depth knowledge and hands-on experience with API Gateway technologies, specifically APIGEE, and CDN solutions like Akamai
  • Proven ability to proactively identify and address problems, areas for improvement, and performance bottlenecks within complex API ecosystems using software-based solutions
  • Strong coding experience beyond simple scripts, preferably in Java or Python, for automation and internal tool development
  • Bachelor’s/University degree in Computer Science, Engineering, or a related field
  • Master’s degree preferred
Job Responsibility:
  • Champion stability initiatives to enable high availability and resilience for our API applications, including enhancing monitoring, failover mechanisms, and overall system health
  • Exhibit calm and analytical leadership when faced with major incidents on critical API systems, ensuring effective incident, problem, and change management at a global enterprise level
  • Lead the proactive monitoring and management of production API environments, taking a holistic view of system health and performance
  • Drive the definition, analysis, and reporting of SLIs and SLOs for all supported APIs and clients, ensuring clear performance benchmarks
  • Contribute to the development and implementation of tools and systems designed to enhance API operational management and the client experience
  • Measure and optimize API system performance, always pushing capabilities forward, anticipating customer needs, and innovating for continuous improvement
  • Provide leadership and expert operational support for critical, large-scale distributed API ecosystems
  • Lead the gathering and analysis of performance metrics from API platforms and underlying infrastructure to assist in performance tuning, fault finding, and capacity planning
  • Partner closely with API development teams to improve services through rigorous operational feedback loops, testing, and release procedures
  • Drive the creation of sustainable API operational systems and services through automation and continuous uplifts, including developing, testing, and debugging automated tasks
Employment Type:
Fulltime

SRE Lead Design & Support Engineer

This is a critical enabler achieving a high resiliency during operations and als...
Location:
Mexico, Miguel Hidalgo
Salary:
Not provided
Pepsico
Expiration Date:
Until further notice
Requirements:
  • 8+ years of work experience evolving to an SRE engineer
  • 3-5 years of experience in continuously improving and transforming IT operations ways of working
  • Bachelor’s degree in Computer Science, Information Technology or a related field
  • Proven experience as an SRE in designing event diagnostics, performance measures, and alerting solutions to meet SLAs/SLOs/SLIs
  • Highly quantitative, with great judgment, able to connect dots across ecosystems and work efficiently across cross-functional teams
  • Strong expertise in SRE (Site Reliability Engineering) and IT Service Management (ITSM) processes
  • Hands-on experience in Python, SQL/NoSQL (MySQL, MongoDB, Cassandra, Postgres), AppDynamics, ELK Stack, Grafana, Splunk, Dynatrace, Kafka, and any SRE Ops toolsets
  • A firm understanding of cloud architecture for distributed environments
  • Front-end technologies: HTML, CSS, JavaScript, and frameworks like React, Angular, or Vue.js
  • Back-end technologies: Server-side languages (Java, Spring Boot, and related technologies that build the server-side logic, APIs, and database interaction with MySQL, MongoDB, Cassandra, Couchbase)
Job Responsibility:
  • Drive new shift-left activities critical to applying Site Reliability Engineering (SRE) and quality assurance principles within the application design / project roadmap to enable resilient outcomes
  • Apply a pre-emptive approach in production to minimize business impact, using SRE-driven orchestration to connect all components of the ecosystem, diagnose anomalies before users are affected, and remediate through automation
  • Ensure ecosystem availability and performance in production environments, proactively preventing P1, P2, and potential P3 incidents
  • Engage and influence product and engineering teams during the design and development phases to embed reliability and operability into new services, defining and enforcing event, logging, monitoring, and observability standards across applications
  • Accountable for ensuring non-functional requirements (NFRs), including SLA/SLO/SLI and error budgets, are embedded early in the product’s offerings as part of the engineering solution
  • Lead the team in diagnosing anomalies before they reach users and driving the necessary remediations across the teams involved in end-to-end ecosystem availability, performance, and consumption of the cloud-architected application ecosystem, leveraging SRE orchestration solutions
  • Collaborates with Engineering & support teams, including participation in escalations, and blameless postmortems
  • Work closely with customer-facing support teams to empower them with SRE insights and tooling
  • Observe, diagnose, and improve the end-to-end ecosystem performance of the modern-architected application portfolio, i.e. a technical “understanding of interactions” of a full-stack application, alongside peer SRE team members
  • Continuously optimize the L2/support operations work via SRE workflow automation
What we offer:
  • Opportunities to learn and develop every day through a wide range of programs
  • Internal digital platforms that promote self-learning
  • Development programs according to Leadership skills
  • Specialized training according to the role
  • Learning experiences with internal and external providers
  • Recognition programs for seniority, behavior, leadership, moments of life, among others
  • Financial wellness programs that will help you reach your goals in all stages of life
  • A flexibility program that will allow you to balance your personal and work life, adapting your working day to your lifestyle
  • Wellness Line, thousands of Agreements and Discounts, Scholarship programs for your children, Aid Plans for different moments of life

Platform Engineering Director

We are seeking an experienced Platform Engineering Director to manage and lead o...
Location:
France, Paris
Salary:
Not provided
Ledger
Expiration Date:
Until further notice
Requirements:
  • 10+ years’ experience in software and platform/infrastructure engineering, including senior technical leadership
  • Bachelor’s degree in Computer Science, Engineering, or equivalent experience
  • Proven experience leading platform or infrastructure engineering teams, including managers and senior ICs, in distributed and matrixed environments
  • Deep expertise in AWS, Kubernetes, containerisation, infrastructure as code, and CI/CD/deployment automation
  • Experience designing, building, and operating developer platforms or internal PaaS
  • Knowledge of platform engineering patterns (e.g. golden paths, paved roads), service mesh, and API management
  • Strong background in production system architecture, capacity planning, performance optimisation, and incident leadership
  • Excellent communication skills with senior and executive stakeholders
Job Responsibility:
  • Define and execute the platform engineering strategy aligned with business objectives and Infrastructure & Operations goals
  • Lead and manage the Platform Engineering team to high performance
  • Establish and govern best practices, standards, and architecture for platform services and production cloud infrastructure
  • Partner with engineering leadership and Infrastructure & Operations teams to define platform and production system requirements
  • Act as the final escalation point for complex platform engineering issues beyond Level 2 operations support
  • Oversee the design, development, and maintenance of developer platforms enabling application delivery, CI/CD, and deployment automation
  • Build and maintain platform infrastructure, cloud environments, and tooling in line with architectural standards and requirements
  • Design and operate scalable, highly available, and reliable production systems and services
  • Lead initiatives for infrastructure as code (IaC), observability, and monitoring across development and production environments
  • Provide expert-level technical leadership during complex platform, infrastructure, and production incidents
What we offer:
  • Flexible work options - Our hybrid policy allows employees to work from home up to 3 times per week
  • Health & Wellness support - Health and Life Insurance
  • Financial growth opportunities - Employees can become shareholders in Ledger as well as other financial benefits depending on your country of work
  • Commuter allowance - Ledger offers a commuter allowance to contribute to your preferred means of transportation
  • Learning & Development - A comprehensive suite of training solutions providing a personalised learning experience for every employee
Employment Type:
Fulltime

Engineering Manager, Wikidata Platform

The Wikimedia Foundation is seeking an Engineering Manager to lead the Wikidata ...
Salary:
132439.00 - 208378.00 USD / Year
Wikimedia Foundation
Expiration Date:
Until further notice
Requirements:
  • 5+ years of engineering management experience leading teams building API-driven or platform-level data services
  • Experience collaborating closely with product and tech leads on software development teams that ship products with community input
  • Experience building and operating large-scale, high-throughput products, with strong foundations in observability, incident response, runbook quality, and overall operational excellence
  • Experience guiding software systems through their full lifecycle
  • Strong people management skills including hiring, coaching, and performance management
  • Experience working with data streams and data-intensive applications
  • Experience navigating challenges related to privacy-sensitive data
  • Ability to influence and drive results across multiple teams in a distributed organization
  • Commitment to Wikimedia’s mission and values
  • Comfort with ambiguity, incomplete information, and navigating complex environments
Job Responsibility:
  • Lead timely, high-quality engineering delivery for WDQS and related query platform services, including large-scale platform or data migrations involving multiple teams and stakeholders
  • Ensure reliability, performance, and sustainability of existing and future query infrastructure
  • Oversee planning activities including estimation, resource allocation, and work break-down, and balance roadmap work with maintenance needs
  • Triage incoming issues, bugs, and operational incidents
  • Develop and drive long-term engineering strategy for WDQS, including lifecycle management, architectural tradeoffs, and future planning
  • Partner with SRE and other Foundation teams to ensure operational excellence and alignment across the data ecosystem
  • Safeguard privacy, security, and data integrity across query services
  • Provide technical input on system design, complexity, estimates, and feasibility
  • Hire, onboard, mentor, and support the professional growth and performance of engineers on the Wikidata Platform team
  • Foster a collaborative, inclusive, and psychologically safe culture
Employment Type:
Fulltime

Experienced Software Engineer

Join State Farm's Digital Experience team as part of the Mobile Product Suite! W...
Location:
United States, Bloomington, IL or Richardson, TX
Salary:
110000.00 - 135000.00 USD / Year
Information Technology Senior Management Forum
Expiration Date:
Until further notice
Requirements:
  • 5+ years of experience with Java, Spring Boot, and Spring Framework
  • Strong API development experience - REST, Swagger/OpenAPI
  • Experience with API security including OAuth2 and JSON Web Tokens (JWT)
  • Test Automation (Karate/Cucumber) Framework knowledge/experience
Job Responsibility:
  • Applies skills, tools, security processes, applications, environments and programming language(s) to complete complex assignments
  • Applies advanced engineering practices to design full-stack applications using industry-adopted languages and frameworks
  • Diagnoses and resolves complex problems/issues
  • Maintains advanced understanding in software engineering topics, including classes, functions, security, containers, version control, CI/CD, and unit tests
  • Maintains advanced understanding in programming (e.g. Java), and database functionality
  • Maintains advanced understanding in compute environments, including but not limited to Linux, Hadoop, Mainframe, Public Cloud, and containers
  • Applies advanced understanding regarding technology trends/changes, best practices, and processes to complete assignments and influence the direction of product solutions
  • Applies advanced understanding of product design, data design and movement and test to ensure quality outcomes
  • Provides mentorship, technical guidance, training, and may delegate work to others
  • Understands, supports, and helps define the vision and direction for the product development
What we offer:
  • Potential yearly incentive pay up to 10% of base salary
  • healthcare premium mostly paid by employer
  • multiple healthcare plan options
  • 100% coverage for in-network preventative care
  • vision, dental, telemedicine, 24/7 mental health professionals
  • educational benefits
  • industry leading training programs
  • tuition assistance
  • employee resource groups
  • mentoring
Employment Type:
Fulltime

Software Engineering Specialist

The role is accountable for ensuring that our technical deliveries realise Busin...
Location:
India, Bengaluru
Salary:
Not provided
Plusnet
Expiration Date:
Until further notice
Requirements:
  • Deep knowledge of the networking domain, along with a solid understanding of the telecom OSS stack, including planning, monitoring, and assurance
  • A strong grasp of TMF standards, with API-based solutions and event-based architecture patterns
  • Strong foundation in ODA architecture patterns
  • Experience in design and solutioning from an engineering point of view with TMF-compliant and ODA architecture
  • Skilled in life cycle management of OSS tools/solutions including requirements analysis, platform selection, technical architecture design, application design & development, testing and deployment
  • Knowledge of various industry standards such as TMF, Open API
  • Lead and execute engineering initiatives to ensure the network cloud platform is easily consumable by products and solutions that are built on top of the platform and, at the same time, is compliant with information security standards
  • Implement governance and controls to monitor and manage consumption and compliance with security and other standards
  • Implement and publish APIs for clients to consume platform services in a consistent way
Job Responsibility:
  • Role implements the defined architectural roadmap for the Assurance Area for the following: Fault Management, Resource Management, Incident Management, Change Management
  • Role involves defining and implementing the roadmap for Transformation of IT, DataCenter and Network Cloud applications in Service and Problem management
  • Manage, Engineer, Architect, Develop and Maintain applications in Network Management, OSS and FCAPS space
Employment Type:
Fulltime

Lead Systems Operations Engineer

Wells Fargo is seeking a Lead Systems Operations Engineer
Location:
India, Bengaluru
Salary:
Not provided
Wells Fargo
Expiration Date:
February 18, 2026
Requirements:
  • 5+ years of Systems Engineering, Technology Architecture experience, or equivalent demonstrated through one or a combination of the following: work experience, training, military experience, education
  • 5+ years in large-scale distributed systems
  • minimum 5+ years hands-on experience in SRE, DevOps, or Platform Engineering
  • Cloud: Expertise in one or more: AWS, Azure, GCP (cloud certifications preferred)
  • IaC & Automation: Terraform, Ansible/Chef; strong Git and GitOps practices
  • Observability: Hands-on experience with Prometheus, Grafana, OpenTelemetry, ThousandEyes, AppDynamics, Aternity
  • CI/CD: Azure DevOps, GitHub Actions, Jenkins, or GitLab CI; strong understanding of artifact management & environment promotion workflows
  • Programming: Proficiency in Python/Go/Java for scripting, automation, and API integrations
Job Responsibility:
  • Lead complex, broad impact initiatives including provision of high level systems consultation for the technology teams
  • Work as key participant in large scale planning of computer systems and network infrastructure for Systems Operations functional area
  • Review and analyze complex technical challenges, as well as escalated support issues related to core business solutions that require in depth evaluation of multiple factors, such as alternatives, enhancements, periodic systems reviews, or improvements to existing systems
  • Make decisions on technical changes and enhancements
  • Consult with engineering team on change design requiring solid understanding of technical process controls or standards that influence and drive new initiatives
  • Collaborate and consult with technical peers, colleagues, and mid to more experienced level managers to resolve systems support issues and achieve goals
  • Lead the transformation of traditional platform operations into a modern Site Reliability Engineering (SRE) model—driving reliability by design, elevating SLIs/SLOs, automating operational toil, strengthening observability, and maturing incident & problem management
  • Be hands-on while mentoring Ops and Engineering teams to adopt SRE practices at scale across the platform ecosystem
  • Define and implement SLIs/SLOs and error budgets for critical platform services; drive SLO adoption across product and operations teams
Employment Type:
Fulltime