Principal Site Reliability Engineer

Synopsys
Sunnyvale, CA

We Are:

At Synopsys, we drive the innovations that shape the way we live and connect. Our technology is central to the Era of Pervasive Intelligence, from self-driving cars to learning machines. We lead in chip design, verification, and IP integration, empowering the creation of high-performance silicon chips and software content. Join us to transform the future through continuous technological innovation.


You Are:

You are a person looking to work in an intercultural and global team. You thrive on solving challenges in a large-scale HPC environment, right at the heart of technology. You are passionate about creating scalable processes and enjoy working collaboratively across time zones. Your excellent problem-solving skills and ability to work through issues and challenges make you a valuable team member. You are excited about joining an innovative team that values continuous learning, great leadership, and being part of a growing organization.

What You’ll Be Doing:

• Applying SRE practices to identify, monitor, communicate, and resolve issues in the environment, while also collaborating with internal teams and customers on post-mortem analysis to deliver root cause insights.

• Following up on reported issues and developing procedures to prevent similar occurrences.

• Reviewing current processes and transforming them into scalable solutions.

• Debugging OS and engineering issues within our provided Linux environment.

• Collaborating on internal projects across different time zones and teams.

• Following up with customers and handing over tasks and issues to team members in other time zones to use global coverage efficiently.

The Impact You Will Have:

• Enhancing the reliability and performance of our engineering environment.

• Streamlining processes to ensure scalability and efficiency.

• Resolving complex OS and engineering issues, contributing to smoother operations.

• Driving successful project outcomes through effective collaboration across time zones.

• Improving customer satisfaction by addressing and resolving issues promptly.

• Fostering SRE practices within multifunctional teams and identifying gaps for resolution.

What You’ll Need:

• 10+ years of experience with SRE processes and related skills.

• Ability to understand complex engineering implementations and their interdependencies for troubleshooting.

• Deep knowledge of Linux distributions (CentOS, Red Hat, Ubuntu, SUSE).

• Deep knowledge of virtualization and containerization technologies.

• Extensive knowledge of storage solutions, including network storage and associated protocols.

• Good experience with network technologies.

• Good experience with load sharing facilities such as LSF and Slurm, and with other workload scheduling technologies.

• Good interpersonal, communication, and leadership skills.

Who You Are:

• Part of a global team supporting one of the largest-scale environments, which includes multiple HPC clusters, high-performance storage, a large-scale private cloud implementation, and GPU clusters for HPC/GenAI workloads.

• Challenging yourself to work with the latest state-of-the-art technologies.

• Being part of one of the biggest private clouds in the world.

• Someone who embraces and implements SRE best practices.

• An individual who monitors and comprehends complex environments.

• Able to break down complex issues into relevant areas and independently coordinate follow-ups with internal teams. A good communicator with strong interpersonal skills.

• A proactive problem solver with a keen eye for detail.

• A collaborative team player who thrives in a global, intercultural environment.

• Adept at multitasking and managing multiple priorities effectively.

• Self-motivated and capable of working independently.

• Passionate about continuous learning and professional development.

The Team You’ll Be a Part Of:

You will be part of the Platform Team at Synopsys, a dynamic group dedicated to driving technological innovation. Our team works on cutting-edge projects, ensuring the reliability and performance of our engineering environment. We collaborate across time zones and cultures, leveraging diverse perspectives to achieve our goals.

