Site Reliability Engineering Manager Job at Hewlett Packard Enterprise (Bangalore)

Senior Site Reliability Engineer

Baxter International is seeking a skilled Senior Principal Site Reliability Engi...

Location

United States, Deerfield

Salary:

96000.00 - 132000.00 USD / Year

Baxter

Expiration Date

Until further notice

Requirements

Bachelor's degree in computer science, IT, or related field (or equivalent experience)
Prior experience in Site Reliability Engineering and cloud-based infrastructure management
Experience in enterprise engineering, including 24x7 uptime, regulated environments, and planning/operations
Azure administration and operations experience, with certifications a plus
Knowledge of related technologies, including cloud, encryption, and security protocols
Systems administration experience in Windows and Linux environments
Proven problem-solving skills and experience with scripting and automation tools
Ability to create accurate documentation and reports, with excellent communication skills

Job Responsibility

Drive strategies to ensure 24x7 availability of services and business continuity for customer-facing healthcare software applications and platforms hosted on Microsoft Azure cloud
Manage and administer Azure resources, including virtual machines, databases, and networking components
Define and document operating procedures to ensure required security, privacy, and other compliance standards are maintained for digital solutions deployed in the cloud
Manage process, planning, and execution for Disaster Recovery (DR) and Business Continuity Planning (BCP)
Define and refine Operations SLAs to maintain a high level of customer satisfaction
Establish non-functional requirements to meet SLAs
Establish infrastructure and application monitoring dashboards and workflow for automatic routing of notifications
Define key performance indicators that can be monitored, measured, and used to identify improvement opportunities
Standardize site metrics for stakeholders, reporting on various KPIs including SLAs, availability, capacity utilization, service metrics and cost utilization
Work closely with DevOps Engineers to automate infrastructure provisioning and deployment processes

What we offer

Healthcare benefits
Employee Stock Purchase Plan (ESPP)
401(k) Retirement Savings Plan
Flexible Spending Accounts
Educational assistance programs
Paid holidays
Paid time off
Paid parental leave
Commuting benefits
Employee Discount Program

Full-time

Senior Site Reliability Engineer

This is a role at Baxter where your work impacts saving and sustaining lives thr...

Location

United States, Deerfield

Salary:

96000.00 - 132000.00 USD / Year

Baxter

Expiration Date

Until further notice

Requirements

Bachelor's degree in computer science, IT, or related field (or equivalent experience)
Prior experience in Site Reliability Engineering and cloud-based infrastructure management
Experience in enterprise engineering, including 24x7 uptime, regulated environments, and planning/operations
Azure administration and operations experience, with certifications a plus
Knowledge of related technologies, including cloud, encryption, and security protocols
Systems administration experience in Windows and Linux environments
Proven problem-solving skills and experience with scripting and automation tools
Ability to create accurate documentation and reports, with excellent communication skills
Applicants must be authorized to work for any employer in the U.S.
Unable to sponsor or take over sponsorship of an employment visa at this time.

Job Responsibility

Drive strategies to ensure 24x7 availability of services and business continuity for customer-facing healthcare software applications and platforms hosted on Microsoft Azure cloud
Manage and administer Azure resources, including virtual machines, databases, and networking components
Define and document operating procedures to ensure required security, privacy, and other compliance standards are maintained for digital solutions deployed in the cloud
Manage process, planning, and execution for Disaster Recovery (DR) and Business Continuity Planning (BCP)
Define and refine Operations SLAs to maintain a high level of customer satisfaction
Establish non-functional requirements to meet SLAs
Establish infrastructure and application monitoring dashboards and workflow for automatic routing of notifications
Define key performance indicators that can be monitored, measured, and used to identify improvement opportunities
Standardize site metrics for stakeholders, reporting on various KPIs including SLAs, availability, capacity utilization, service metrics and cost utilization
Work closely with DevOps Engineers to automate infrastructure provisioning and deployment processes

What we offer

Support for Parents
Continuing Education/Professional Development
Employee Health & Well-Being Benefits
Paid Time Off
2 Days a Year to Volunteer
Medical and dental coverage starting day one
Insurance coverage for basic life, accident, short-term and long-term disability
Business travel accident insurance
Employee Stock Purchase Plan (ESPP)
401(k) Retirement Savings Plan

Full-time

Cloud Security Site Reliability Engineer

This role sits within the Cloud Security team responsible for Private and Public...

Location

Singapore, Singapore

Salary:

Not provided

Citi

Expiration Date

Until further notice

Requirements

Bachelor’s degree or equivalent work experience
6+ years of relevant work experience
Highly motivated self-starter with excellent interpersonal and communication skills
Certification or formal training in site reliability engineering concepts and practices
Prior experience working towards SLIs, SLOs and observability capabilities at a large scale
4+ years of experience in Python (preferred) or Java on large-scale systems, alongside Linux-based scripting languages
Experience working on observability, logging and metrics toolsets
Experience with Kubernetes (k8s) and container technologies such as Docker, OpenShift, and EKS
Experience with public cloud technologies such as AWS, GCP or Azure
Experience with Secrets products such as HashiCorp Vault or CyberArk

Job Responsibility

Working across Container products and Secrets products, across Public and Private Cloud, as well as cloud-native products
Architecting and building tools and platforms that provide capabilities for SRE
Collaboration with multiple stakeholders and partners across Engineering and Operations as well as partner teams within the wider Citi organisation
Actively owning production-level incidents until resolution

What we offer

Equal opportunity employer
Accessibility support for persons with disabilities

Full-time

Senior Software Engineer, Site Reliability

Babylist is looking for a Senior Software Engineer, Site Reliability to join our...

Location

United States; Canada

Salary:

186818.00 - 224183.00 USD; CAD / Year

Babylist

Expiration Date

Until further notice

Requirements

8+ years of experience as a Site Reliability Engineer or similar role
Experience supporting high-traffic consumer-facing websites
Proficiency with Terraform
Strong experience working with AWS cloud-based infrastructure and services
Proficiency with Docker and Kubernetes
Solid understanding of cloud-native systems design
Troubleshooting and debugging skills
Experience designing and supporting CI systems
Familiar with monitoring and alerting best practices
Proven experience in on-call management best practices

Job Responsibility

Manage and build our AWS infrastructure using Infrastructure as Code (IaC) tools like Terraform
Improve the speed and reliability of our Continuous Integration (CI) systems
Provide support to developers in troubleshooting issues
Establish, communicate, and support best practices for monitoring and alerting

What we offer

Company-paid medical, dental, and vision insurance
Retirement savings plan with company matching and flexible spending accounts
Generous paid parental leave and PTO
Remote work stipend
Perks for physical, mental, and emotional health, parenting, childcare, and financial planning

Full-time

Principal Site Reliability Engineer

Location

United States, Ft. Meade

Salary:

Not provided

CipherLogix

Expiration Date

Until further notice

Requirements

Fourteen (14) years experience in software development/engineering, including requirements analysis, software development, installation, integration, evaluation, enhancement, maintenance, testing, and problem diagnosis/resolution
Ten (10) years experience in system engineering/architecture
Ten (10) years experience working with products that support highly distributed, massively parallel computation needs such as HBase, Hadoop, CloudBase/Accumulo, Bigtable, Cassandra, Scality, etc.
At least ten (10) years experience writing software scripts using scripting languages such as Perl, Python, or Ruby for software automation
At least four (4) years experience managing and monitoring large Cloud System (>200 nodes). Cloud Systems Administrator or Developer Certification
Experience in performing and providing technical direction for the development, engineering, interfacing, integration, and testing of complete hardware/software systems to include monitoring technical health of a system, improving organizational processes, implementation of postmortem (failure) analysis and incident management
Ten (10) years experience in the cleared environment
Ten (10) years demonstrated experience developing software for one of the following: Windows, UNIX, or Linux OS
Knowledge and experience with developing distributed storage routing and querying algorithms
Experience in developing documentation required to support a program’s technical issues and training situations

Full-time

Staff Site Reliability Engineer

We are looking for a Site Reliability Engineer to own our internal systems infra...

Location

United States, Sunnyvale

Salary:

175000.00 - 250000.00 USD / Year

Figure

Expiration Date

Until further notice

Requirements

Strong experience with Linux/Unix systems administration
Proficiency in programming/scripting
Extensive experience with cloud platforms (Azure, AWS, GCP) and on-prem hardware architectures
Experience designing, deploying, and operating high-availability, fault-tolerant, and distributed systems
Mastery of infrastructure as code (Terraform, CloudFormation, Ansible…)
Familiarity with monitoring, logging, and alerting tools (Prometheus, Grafana, Datadog…)
Solid understanding of networking fundamentals (TCP/IP, DNS, HTTP, load balancers, firewalls)
Experience defining Service Level Objectives (SLO), developing runbooks/incident response plans, facilitating post-mortems and managing systems assets
Ability to work in cross-functional teams with developers, infra, and product teams
Excellent verbal and written communication skills

Job Responsibility

Be the go-to person for mission-critical infrastructure enabling critical operations such as Source Configuration Management, CI/CD systems, software distribution, supplier portals, manufacturing and more
Migrate SaaS to self-hosted solutions to enhance security and reliability
Implement monitoring and alerting systems, and define incident response plans and runbooks
Reduce manual workload by automating deployment and scaling
Establish strong relationships with stakeholders to identify infrastructure needs and establish Service Level Objectives
Use a data-driven approach to demonstrate service robustness and track optimization work
Partner with the security team to ensure that security remediations and updates are applied in a timely manner

Full-time