Job Title: SRE – Application / Platform SRE
Location: Bloomfield, CT (Onsite)
Job Type: 12+ Months Contract
Job Description:
Requires experience in monitoring, troubleshooting, performance tuning, capacity planning, and automation, along with strong exposure to distributed data processing frameworks such as Spark, Flink, and Kafka.
Hadoop Cluster Administration & Operations
• Ensure 24x7 system reliability, incident response, and operational readiness for global applications.
• Lead troubleshooting efforts during outages/performance incidents; perform root cause analysis (RCA) and implement preventive actions.
• Define and maintain operational metrics and reliability goals (availability, latency, throughput, resource utilization).
• Improve system stability via proactive monitoring, alerting, and capacity planning.
Big Data & Streaming Support
• Support deployments and operations across AWS Cloud, Kubernetes, and containerized environments.
• Implement and maintain cluster reliability in Kubernetes environments: resource quotas, access control, permissions, and namespace management.
Thanks & Regards
Himanshu Verma | Recruiter – US Staffing
Email: Himanshu.v@ampstek.com | Desk: (609)-527-8914
Ampstek LLC – Global IT Partner | www.ampstek.com