Senior Site Reliability Engineer Job at XperiencOps Inc, Pleasanton, CA

  • XperiencOps Inc
  • Pleasanton, CA

Job Description

The Senior Site Reliability Engineer (SRE) plays a vital role in ensuring the reliability, scalability, and performance of our enterprise software platform. This senior-level position requires deep technical expertise, strong problem-solving skills, and the ability to collaborate effectively in a fast-paced, demanding environment. Our customers, among the largest enterprises in the world, expect 24/7 platform availability and top-tier performance.

The ideal candidate has strong expertise in AWS cloud technologies, a deep understanding of serverless architectures (AWS Lambda), and a passion for building resilient systems that enhance the customer experience.

Platform Reliability:

  • Design, implement, and manage highly available and scalable systems to meet customer expectations for 24/7 uptime.
  • Monitor, troubleshoot, and resolve platform incidents using tools such as Sentry, New Relic, and custom monitoring frameworks.
  • Lead post-incident reviews to identify root causes and ensure preventive measures are in place.

Automation and Optimization:

  • Develop and maintain automation for infrastructure management, monitoring, and incident response.
  • Optimize platform performance and scalability, proactively identifying and addressing bottlenecks.
  • Contribute to the development of CI/CD pipelines to improve deployment reliability and speed.

Collaboration:

  • Partner with L2 engineers to resolve complex customer issues, providing guidance and technical expertise as needed.
  • Work closely with product engineering to ensure platform improvements align with customer needs.
  • Actively contribute to the documentation and sharing of best practices to improve team performance and customer outcomes.

Leadership:

  • Mentor junior engineers and provide technical leadership in reliability engineering.

  • Drive cross-functional initiatives to improve platform stability and customer satisfaction.

Requirements

  • Bachelor's degree in Computer Science or related discipline.

  • 8+ years in a Site Reliability Engineering or DevOps role, with experience supporting enterprise-grade software platforms.

  • 3+ years of experience with cloud services, particularly AWS.

  • Experience building observability systems on New Relic, CloudWatch, or similar tools.

  • Experience implementing rate-limiting, API gateways, and load balancing for highly available systems.

  • Exposure to security best practices and compliance frameworks (e.g., SOC 2, ISO 27001).

  • Proficiency in infrastructure as code (IaC) using tools such as Terraform or CloudFormation.

  • Hands-on experience with scripting and programming languages like Python, Go, or Bash.

  • Strong troubleshooting and debugging skills.

  • Excellent communication and collaboration skills.

  • Experience with incident management and post-mortem practices.

  • Soft Skills:

    • Exceptional problem-solving and critical thinking abilities.
    • Strong verbal and written communication skills, with the ability to navigate ambiguity and provide clarity.
    • Ability to work collaboratively in cross-functional teams under pressure.

Key Attributes:

  • Reliability-Driven: Strong commitment to platform reliability and performance.
  • Leadership and Mentorship: Willingness to guide and mentor less experienced team members.
  • Customer-Focused: Dedication to meeting and exceeding customer expectations in a high-pressure environment.

Expectations:

  • Availability to participate in a 24/7 on-call rotation.

  • Ability to work in a fast-paced, ambiguous environment with rapidly changing priorities.

  • Proactive approach to identifying and mitigating risks before they impact customers.

  • Strong sense of accountability and ownership for platform stability and customer satisfaction.

Benefits

  • Opportunity to work on cutting-edge products and make a real impact.
  • Collaborative and fast-paced work environment.
  • Chance to be part of a rapidly growing startup.
  • Competitive salary and benefits package (health, dental, and vision insurance, paid time off, etc.).

Similar Jobs

Adidev Technologies Inc

React Native Hybrid Developer Onsite, Fast Ramp-Up Job at Adidev Technologies Inc

 ...software consulting firm is urgently seeking talented Hybrid Developers to join their dynamic team. This role offers the opportunity...  ...large-scale applications for well-known clients. With a focus on React Native, candidates will need to demonstrate their ability to develop... 

Encadria Staffing Solutions LLC

Manufacturing Packer- Temp-to-Hire (Jonestown) Job at Encadria Staffing Solutions LLC

 ...Encadria Staffing Solutions is the internal staffing agency for Georgia-Pacific and other Koch companies across the country. We proudly support Georgia-Pacific locations by hiring dependable, safety-minded employees who embody our core principles of integrity, respect,... 

Greenfield Milling

Miller Job at Greenfield Milling

GF Milling is seeking a skilled Miller to join our dynamic team, dedicated to maintaining the highest standards in grain and flour production. Utilizing state-of-the-art milling technologies, you will play a pivotal role in producing materials that meet the specific needs...

St. David's Georgetown Medical Center - TeamHealth

Hospitalist - Physician Job at St. David's Georgetown Medical Center - TeamHealth

 ...- Physician at St. David's Georgetown Medical Center - TeamHealth summary: The Hospitalist position at St. David's Georgetown Hospital involves providing inpatient care in a closed ICU setting on a traditional 7-on, 7-off day shift schedule. The role includes managing... 

ASK Consulting

Retail Channel Specialist (272702) Job at ASK Consulting

 ...and quickly changing facts to Clients retail leadership team in a succinct, easy to understand manner. Qualifications: Bachelors degree plus an additional 3 years of relevant retail experience required Demonstrated experience working with multiple accounts...