SRE-Application/Platform (GC and USC Only)

Ampstek

Bloomfield, CT

Role: SRE-Application/Platform

Location: Bloomfield, CT (Onsite)

Duration: Long-Term Contract

Candidates should have experience in monitoring, troubleshooting, performance tuning, capacity planning, and automation, along with strong exposure to distributed data processing frameworks such as Spark, Flink, and Kafka.

Hadoop Cluster Administration & Operations

• Ensure 24x7 system reliability, incident response, and operational readiness for global applications.

• Lead troubleshooting efforts during outages/performance incidents; perform root cause analysis (RCA) and implement preventive actions.

• Define and maintain operational metrics and reliability goals (availability, latency, throughput, resource utilization).

• Improve system stability via proactive monitoring, alerting, and capacity planning.

Big Data & Streaming Support

• Support deployments and operations across AWS Cloud, Kubernetes, and containerized environments.

• Implement and maintain cluster reliability in Kubernetes environments, including resource quotas, access control, permissions, and namespace management.

Thanks

Rakesh Pathak | Lead Recruiter

Phone: 609-360-2642

Rakesh.pathak@ampstek.com | www.ampstek.com

https://www.linkedin.com/in/rakesh-kumar-pathak-00b039167/
