Job Title: Cloud Operations Engineer (Mid-Level)
Location: Irvine, CA (Hybrid)
Experience: 3–5 Years
Industry: Food & Beverage
About the Role
We are looking for a Cloud Operations Engineer to ensure the reliability and performance of critical systems. This role focuses on production support, incident management, and system uptime in a fast-paced, high-availability environment.
Key Responsibilities
- Monitor and manage cloud infrastructure (AWS/Azure/GCP) to meet SLA and uptime targets
- Troubleshoot and resolve production incidents and system outages in real time
- Participate in the on-call rotation and handle escalations as needed
- Perform root cause analysis and implement preventive measures
- Build and enhance monitoring, alerting, and logging systems
- Automate operational tasks to improve efficiency and reduce manual effort
- Develop and maintain runbooks, SOPs, and technical documentation
- Collaborate with engineering teams to improve system reliability and performance
Required Skills & Experience
- 3+ years of experience in Cloud Operations / Production Support / SRE
- Hands-on experience with AWS, Azure, or GCP
- Strong experience with monitoring & alerting tools (CloudWatch, Datadog, Prometheus, etc.)
- Proven experience handling production incidents and on-call support
- Solid understanding of Linux/Unix systems
- Basic scripting skills (Python, Bash, or similar)
Nice to Have
- Experience with Infrastructure as Code (Terraform, CloudFormation)
- Exposure to CI/CD tools (Jenkins, GitHub Actions)
- Familiarity with logging tools (Splunk, ELK stack)
- Experience with containerization (Docker, Kubernetes)
What We’re Looking For
- Strong troubleshooting and problem-solving skills
- Ability to work in a fast-paced production environment
- Ownership mindset with a focus on reliability and uptime
- Good communication and collaboration skills
Why Join
- Work on large-scale, global cloud infrastructure
- Be part of a high-impact operations team supporting mission-critical systems
- Opportunity to grow into senior Cloud, SRE, or DevOps roles