Spark Scala Developer

Cognizant
Bentonville, AR

**Unfortunately, no visa sponsorship (now or in the future) or contracting**


Job Title: Spark Scala Developer (GCP Batch Processing)

Location: Hybrid - Bentonville, AR


We are looking for a skilled Spark Scala Developer with strong experience in Google Cloud Platform (GCP) to support and enhance our batch data processing pipelines. The ideal candidate will be hands-on, detail-oriented, and comfortable working across the full development lifecycle in a production environment.


Required Skills & Qualifications

· Strong programming experience in Scala with hands-on expertise in Apache Spark.

· Experience working with GCP services such as Dataproc, BigQuery, Cloud Storage, and Cloud Composer (Airflow).

· Solid understanding of batch data processing frameworks and distributed systems.

· Knowledge of data validation techniques, data quality checks, and debugging production data issues.

· Ability to analyze logs, identify root causes, and resolve production incidents efficiently.

· Familiarity with Unix/Linux environments and scripting.

Soft Skills

· Strong problem-solving and analytical skills.

· Ownership mindset with attention to detail and quality.

Role Expectations (Day-to-Day Activities)

· Develop and enhance Spark Scala batch jobs (a brief illustrative sketch follows this list).

· Test and validate data pipelines in staging and production.

· Deploy code to production and ensure successful job execution.

· Investigate and resolve data or pipeline issues.

· Support ongoing operations and ensure pipeline reliability.
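
For illustration only, and not part of the formal requirements: a minimal sketch of the kind of Spark Scala batch job this role develops. It reads a raw extract from Cloud Storage, applies a basic cleansing step, and writes curated Parquet back out. The bucket names, paths, and the "order_id" column are hypothetical placeholders, not details from this posting.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

// Minimal batch-job sketch. Bucket names, paths, and the
// "order_id" column are hypothetical placeholders.
object DailyOrdersJob {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("daily-orders-batch") // Dataproc supplies master/cluster config
      .getOrCreate()

    // Read the day's raw extract from Cloud Storage.
    val raw = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("gs://example-raw-bucket/orders/2024-01-01/")

    // Basic data-quality step: drop rows missing the primary key.
    val cleaned = raw.filter(col("order_id").isNotNull)

    // Write the curated output back to Cloud Storage as Parquet.
    cleaned.write
      .mode("overwrite")
      .parquet("gs://example-curated-bucket/orders/2024-01-01/")

    spark.stop()
  }
}
```

A job like this would typically be packaged as a JAR, submitted via gcloud dataproc jobs submit spark, and scheduled by Airflow.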

Key Responsibilities

· Design, develop, and maintain batch data processing jobs using Apache Spark (Scala) on GCP.

· Perform end-to-end development activities including coding, unit testing, and integration testing.

· Manage and execute production deployments following established release processes.

· Conduct data validation and reconciliation across production and staging environments to ensure data accuracy and consistency (see the reconciliation sketch after this list).

· Monitor batch jobs, troubleshoot failures, and provide timely support for production issues.

· Optimize Spark jobs for performance, scalability, and cost efficiency on GCP.

· Maintain documentation for pipelines, processes, and operational procedures.
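
To make the validation and reconciliation responsibility concrete, here is a sketch along these lines. The paths and the "amount" column are hypothetical, and a real check would be configuration-driven; the idea is to compare row counts and a column total between staging and production and exit non-zero on a mismatch so the orchestrator can alert.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.sum

// Reconciliation sketch: paths and the "amount" column are
// hypothetical placeholders, not details from this posting.
object ReconcileStagingVsProd {
  // Row count plus a simple column total as a cheap consistency signature.
  private def summarize(df: DataFrame): (Long, BigDecimal) = {
    val total = Option(df.agg(sum("amount")).head().get(0))
      .map(v => BigDecimal(v.toString))
      .getOrElse(BigDecimal(0))
    (df.count(), total)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("reconcile-check").getOrCreate()

    val staging = spark.read.parquet("gs://example-staging-bucket/orders/2024-01-01/")
    val prod    = spark.read.parquet("gs://example-prod-bucket/orders/2024-01-01/")

    val (stgRows, stgTotal) = summarize(staging)
    val (prdRows, prdTotal) = summarize(prod)

    // Exit non-zero on mismatch so the scheduler (e.g. Airflow) can alert.
    if (stgRows != prdRows || stgTotal != prdTotal) {
      println(s"MISMATCH: staging=($stgRows, $stgTotal) prod=($prdRows, $prdTotal)")
      spark.stop()
      sys.exit(1)
    }
    println(s"OK: $stgRows rows, total $stgTotal in both environments")
    spark.stop()
  }
}
```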
