Role: Lead Engineer – Java
Location: Chicago, IL
Employment Type: Contract
Mandatory Skills – Java, Spark, Data Engineering, and Cloud (AWS / Azure / GCP; GCP preferred)
Job Description
- 8–12 years of experience in production-grade software engineering and data engineering, with a strong foundation in Java-based application development.
- Demonstrated progression from hands-on Java development roles into data engineering and platform-level responsibilities.
- Extensive experience designing, building, and operating Spark-based batch data processing systems using Java in cloud or distributed environments.
- Proven experience working on shared data platforms that support multiple downstream analytics use cases, reporting systems, and business functions.
- Strong exposure to enterprise data processing workloads, including large-scale structured and semi-structured data handling with performance and reliability considerations.
Key Expertise
1. Technical Skills
- Deep hands-on experience with Java as the primary programming language, including building scalable and maintainable applications for data processing and backend systems.
- Strong working knowledge of Apache Spark using the Java API, with the ability to design and implement robust batch processing pipelines.
- Experience working with cloud-based data platforms (GCP preferred), including services such as BigQuery and Cloud Storage, or equivalent services in other cloud environments.
- Strong understanding of data storage formats and access patterns, including Parquet, Avro, and JSON, with a focus on optimizing data layout for analytical workloads.
- Experience implementing CI/CD practices for data engineering solutions, including source control strategies, automated deployments, and environment promotion across development, testing, and production.
- Solid understanding of data security fundamentals, including secure data access patterns, credential management, and compliance-aware data handling.
2. Architecture & Design
- Ownership of solution and platform-level architecture for batch data processing systems built on Java and Spark.
- Strong foundation in data modeling principles, including normalization, denormalization, and analytics-oriented schema design based on consumption patterns.
- Proven experience designing and enforcing layered data architectures, including clear separation of raw, processed, and curated data layers.
- Ability to define and document architecture standards, design guidelines, and reusable frameworks for ingestion, transformation, and consumption layers.
- Experience reviewing technical designs across teams to ensure alignment with scalability, performance, and maintainability requirements.
- Strong understanding of integration patterns across upstream source systems and downstream consumers such as BI tools and reporting platforms.
3. Big Data & Analytics
- Deep understanding of OLTP and OLAP concepts, and the implications of analytical workloads on storage layout, compute sizing, and query performance.
- Proven experience designing and optimizing ETL / ELT frameworks capable of handling large volumes of structured and semi-structured data with predictable performance and reliability.
- Strong expertise in Spark performance tuning techniques, including partitioning strategies, join optimizations, caching decisions, and query execution analysis.
- Experience supporting enterprise analytics use cases by delivering high-quality, well-modeled datasets suitable for consumption by BI and reporting tools.
- Ability to diagnose and resolve complex data issues related to:
  - Latency
  - Data correctness
  - Schema drift
  - Pipeline failures in production environments
4. GenAI Adoption & Automation
- Practical experience evaluating and adopting AI-assisted development tools to improve developer productivity, code quality, and delivery velocity within data engineering teams.
- Understanding of how AI-driven techniques can be applied to data engineering use cases, such as anomaly detection, data quality monitoring, and operational insights.
- Ability to assess emerging GenAI capabilities pragmatically and integrate them into the platform in a controlled, value-driven manner without compromising stability or governance.
5. Observability & Performance Optimization (Good to Have)
- Experience defining observability practices for data platforms, including monitoring of pipeline health, job execution metrics, and operational alerts.
- Strong hands-on ability to troubleshoot distributed Spark workloads, identify performance bottlenecks, and drive corrective optimizations.
- Exposure to data lineage, metadata management, or operational dashboards to improve platform transparency and operational maturity.
Responsibilities
- Own and evolve the solution architecture for Java- and Spark-based batch data platforms supporting multiple enterprise use cases.
- Act as a technical authority for data engineering design decisions, ensuring consistency, scalability, and long-term maintainability of the platform.
- Guide Technical Leads and Senior Engineers on architecture, design patterns, and implementation best practices through design reviews and hands-on collaboration.
- Ensure platform implementations meet defined non-functional requirements, including performance, reliability, security, and cost efficiency.
- Collaborate closely with enterprise architecture, cloud, and security teams to align platform design with organizational standards and constraints.
- Support delivery planning, technical estimation, and risk assessment for complex data engineering initiatives.
- Continuously assess platform gaps and drive improvements in architecture, tooling, and engineering practices.
Skills & Competencies
- Strong architectural judgment with the ability to balance immediate delivery needs against long-term platform sustainability.
- Excellent communication skills to articulate complex technical concepts to both engineering teams and senior stakeholders.
- Ability to operate effectively in ambiguous environments and make well-reasoned technical decisions.
- Proven capability to mentor and elevate the technical maturity of data engineering teams.