Role: SRE - Application/Platform
Location: Bloomfield, CT (Onsite)
Duration: Long Term Contract
Experience in monitoring, troubleshooting, performance tuning, capacity planning, and automation, along with strong exposure to distributed data processing frameworks like Spark, Flink, and Kafka.
Hadoop Cluster Administration & Operations
• Ensure 24x7 system reliability, incident response, and operational readiness for global applications.
• Lead troubleshooting efforts during outages/performance incidents; perform root cause analysis (RCA) and implement preventive actions.
• Define and maintain operational metrics and reliability goals (availability, latency, throughput, resource utilization).
• Improve system stability via proactive monitoring, alerting, and capacity planning.
Big Data & Streaming Support
• Support deployments and operations across AWS Cloud, Kubernetes, and containerized environments.
• Implement and maintain cluster reliability in Kubernetes environments: resource quotas, access control, permissions, and namespace management.
Thanks,
Rakesh Pathak | Lead Recruiter
Phone: 609-360-2642
Rakesh.pathak@ampstek.com | www.ampstek.com
https://www.linkedin.com/in/rakesh-kumar-pathak-00b039167/