The OCI AI Infrastructure team plays a critical role in building next-generation data centers, automating hardware lifecycle management, and integrating servers into new OCI regions. This work directly impacts the reliability, scalability, and performance of Oracle Cloud Infrastructure.
About You
- You work backward, starting from the customer. You care about creating usable, useful software that solves real problems and brings delight to users.
- You are a strong communicator, able to simplify complex technical concepts for diverse audiences.
- You collaborate effectively across engineering, product, and design teams.
- You are comfortable with ambiguity and can drive initiatives from concept to completion.
- You bring a hands-on mindset and are comfortable working across the full technology stack.
- You are passionate about AI and its application to improve systems, teams, and outcomes.
As a Senior Manager, you will lead a team of operators and developers, providing strategic direction and hands-on support. You will bring proven leadership, people management, and communication skills, along with strong analytical abilities and a deep understanding of large-scale distributed systems.
You should be a distributed systems generalist, capable of comprehending complex system interactions and comfortable diving deep into any part of the stack to support your team as needed. You value simplicity and scalability, thrive in collaborative and agile environments, and are passionate about continuous learning. You will establish, develop, and provide ongoing direction for your team, while also collaborating with geographically distributed teams to drive organizational success.
Qualifications
- BS or MS degree in a relevant field, or equivalent experience
- 5+ years of experience in software engineering, operations, or a related domain
- 5+ years of people management and/or technical leadership experience
- Demonstrated experience building teams, including recruiting, hiring, and performance management
- Strong organizational and planning abilities, including scheduling and resource management
- Strong operational experience, including service team reporting on metrics for availability, operator/engineer performance, and ticket resolution analytics
- Proficiency with scripting languages such as Python and Bash
- Solid knowledge of distributed systems fundamentals
- Working familiarity with networking protocols (e.g., TCP/IP, HTTP) and standard network architectures