ISS Site Reliability Engineer

Category: Engineering
Position: Full-Time
Work Type: Hybrid

Company Overview

BE PART OF BUILDING THE FUTURE.

What do NASA and emerging space companies have in common with COVID vaccine R&D teams or with Roblox and the Metaverse?

The answer is data: all fast-moving, fast-growing industries rely on data for a competitive edge. And the most advanced companies are realizing the full data advantage by partnering with Pure Storage. Pure’s vision is to redefine the storage experience and empower innovators by simplifying how people consume and interact with data. With 11,000+ customers, including 58% of the Fortune 500, we’ve only scratched the surface of our ambitions.

Pure is blazing trails and setting records:

For ten straight years, Gartner has named Pure a leader in the Magic Quadrant
Our customer-first culture and unwavering commitment to innovation have earned us a certified Net Promoter Score in the top 1% of B2B companies globally
Industry analysts and press applaud Pure’s leadership across these dimensions
And our 5,000+ employees are emboldened to make Pure a faster, stronger, smarter company as we go

If you, like us, say “bring it on” to exciting challenges that change the world, we have endless opportunities where you can make your mark.

Position Overview

Data is the new oil. If you run a business and want to do anything with your data, the first thing you need is the infrastructure to store and query that data. This is where Pure Storage comes into play, with a variety of hardware and software solutions to get the most out of your data.

ISS (Infrastructure Shared Service) is an international organization within Pure, responsible for all of Pure Storage's engineering infrastructure, development environments, and production services. We work with all internal engineering teams to provide reliable services that are used to develop new products and features in many different environments, from our multiple data centers to various public clouds.

As a Reliability Engineer in ISS, you will work to improve the reliability and performance of Pure Storage's critical infrastructure applications by owning their development and operation. This means setting and owning SLO goals for uptime and latency, as well as helping colleagues leverage the features and workflows available to them. The focus throughout is on keeping the backend web servers, load balancers, and database servers healthy and running smoothly.

We are looking for engineers who have a mix of software and systems skills, are passionate about reliability, performance, and efficiency, and have experience building tools, services, and automation to manage and improve production services.

Responsibilities

Design, operate, maintain, and troubleshoot enterprise systems such as databases, message queues, APIs, and distributed applications through the use of data and metrics such as SLOs and error budgets.
Establish and practice sustainable incident response and blameless postmortems to prevent problem recurrence.
Support services before they go live through activities such as system design, developing software platforms and frameworks, capacity planning, and launch reviews.
Scale systems sustainably through mechanisms like scripting and automation; evolve systems by pushing changes that improve their operational management, reliability, and velocity.
Collaborate with team members, across business units, and across multiple time zones to create high-quality customer outcomes.

Minimum Qualifications

Demonstrated coding ability in one of the following languages, or a similar one: C, C++, Java, Python, or Go;
Demonstrated experience in design, implementation, delivery, and maintenance of software systems;
Able to work in a 24x7 on-call rotation using a follow-the-sun model (i.e., pager duty from 8am to 8pm local time, approximately 1 week every 2-3 months);
Systematic problem-solving approach, strong communication skills, and a sense of ownership and drive;
Experience analyzing the performance of, and debugging, enterprise systems.

Preferred Qualifications

5+ years as a Site Reliability Engineer, DevOps Engineer, or Infrastructure Engineer;
Understanding of Unix/Linux and, optionally, Windows operating systems;
Experience working with Infrastructure as Code / Automation tools (Ansible, Terraform, CloudFormation);
Well organized, with the ability to prioritize tasks independently, set goals, and see them through to completion;
Experience with containers and container orchestration systems such as Docker and/or Kubernetes;
Expertise with hybrid (bare metal/public cloud - AWS preferred) cloud environments.
