Role: SRE – Application/Platform SRE
Location: Bloomfield, CT (Onsite)
Job Type: 12+ Month Contract
Job Description:
• Experience in monitoring, troubleshooting, performance tuning, capacity planning, and automation, along with strong exposure to distributed data processing frameworks like Spark, Flink, and Kafka.
• Hadoop Cluster Administration & Operations
o Ensure 24x7 system reliability, incident response, and operational readiness for global applications.
o Lead troubleshooting efforts during outages/performance incidents; perform root cause analysis (RCA) and implement preventive actions.
o Define and maintain operational metrics and reliability goals (availability, latency, throughput, resource utilization).
o Improve system stability via proactive monitoring, alerting, and capacity planning.
• Big Data & Streaming Support
o Support deployments and operations across AWS Cloud, Kubernetes, and containerized environments.
o Implement and maintain cluster reliability in Kubernetes environments, including resource quotas, access control, permissions, and namespace management.
Thanks and regards,
Deepa Maurya | Technical Recruiter - US Staffing
Email: deepa.m@ampstek.com | Desk: (609) 527-8971
Ampstek LLC – Global IT Partner | www.ampstek.com