Location: Hershey, PA or Dallas, TX
Summary
The Platform Engineer, Data & Analytics Platforms runs and continuously improves the enterprise data and analytics platforms that power Hershey’s data products. This role focuses on platform operations and enablement—standardizing environments, automating delivery, improving reliability and observability, and reducing time-to-value for Data Product teams across development, test, and production.
The Platform Engineer partners with Senior Data Engineers, Solution Architects, the Cloud COE, and Security to define guardrails and operating standards, deliver self-service tooling, and keep the platform cost-effective, secure, observable, reliable, and scalable.
What We Are Building for Hershey
This role supports Hershey’s enterprise data strategy by operating and enabling a trusted, governed data platform at scale. The platform team turns one-off solutions into reusable templates, guardrails, and automated workflows—improving reliability, cost transparency, and developer experience so Data Product teams can deliver high-quality data products faster.
Major Duties & Responsibilities
1. Data Platform Components
- Own and operate core data platform components (with emphasis on Databricks and supporting Azure services) across development, test, and production.
- Build and maintain CI/CD and environment standardization using Azure DevOps and infrastructure-as-code (e.g., Terraform) to improve consistency, security, and delivery speed.
- Implement observability (logging, monitoring, alerting, dashboards) and maintain operational runbooks to enable proactive detection and faster recovery.
- Implement identity/access controls, secrets management, and configuration standards in partnership with the Cloud COE and Security.
- Plan and execute platform releases and upgrades (libraries, runtimes, clusters/pools) and coordinate change communications to minimize disruption for Data Product teams.
2. Machine Learning Operations (MLOps)
- Enable MLOps capabilities (e.g., MLflow standards, deployment patterns, automation) in partnership with Data Science and engineering teams.
3. Governance, Quality & Operations
- Implement governance, security, and compliance standards through platform guardrails (policies, templates, controls) and clear documentation.
- Support FinOps by monitoring usage, identifying optimization opportunities (clusters, jobs, storage), and improving cost transparency (e.g., tagging and showback/chargeback inputs).
- Monitor platform health, resolve incidents, perform root-cause analysis, and drive problem management to improve stability and meet agreed service levels.
- Define, track, and report operational KPIs (availability, performance, deployment frequency, MTTR) and drive continuous improvement through automation and standardization.
- Provide operational support during standard business hours, with planned maintenance windows and documented support processes (no on-call rotation).
4. Collaboration Across Domains
- Enable Data Product teams with self-service tooling, reusable patterns/templates, and onboarding/training; manage a clear intake and prioritization process; and partner on platform performance and operational readiness.
Minimum Knowledge, Skills, and Abilities
- Cloud & Platforms: Hands-on administration and operations for Databricks and Azure data platform services. Strong understanding of environment provisioning, secrets management, identity/access, and networking patterns. Infrastructure-as-code experience (e.g., Terraform) is strongly preferred.
- Programming & Development: Proficient in Python and SQL for automation and troubleshooting; experience with modular coding, APIs, scripting, and source control (Git).
- DevOps: Experience implementing CI/CD in Azure DevOps, managing releases, and improving deployment safety through testing, approvals, and consistent branching/versioning practices.
- MLOps (nice to have): Familiarity with MLflow and operational patterns for model lifecycle management.
- Operations & Observability: Experience implementing monitoring/alerting and using operational metrics to drive reliability improvements; familiarity with incident management and root-cause analysis.
- Collaboration & Communication: Ability to communicate best practices and technical solutions clearly to both technical and non-technical teams.
Experience & Education
- Bachelor’s degree in Computer Science, Engineering, Information Systems, Data Science, or a related field.
- 2–5 years in platform engineering roles.
The Hershey Company is an Equal Opportunity Employer. The policy of The Hershey Company is to extend opportunities to qualified applicants and employees on an equal basis regardless of an individual's race, color, gender, age, national origin, religion, citizenship status, marital status, sexual orientation, gender identity, transgender status, physical or mental disability, protected veteran status, genetic information, pregnancy, or any other categories protected by applicable federal, state or local laws.
The Hershey Company is an Equal Opportunity Employer - Minority/Female/Disabled/Protected Veterans.
You may request a reasonable accommodation if you are unable or limited in your ability to use or access our online application process as a result of a disability.
You can request an accommodation via phone or email.
To request an accommodation via phone, please call +1 877-804-1794 and leave a voicemail with your contact information. You may also email a request for accommodation to ApplicationHelp@hersheys.com. Please be sure to include “Accommodation Needed” in the subject line. This will ensure that your email is routed to the appropriate contact who will handle your request.