Sr. DevOps Platform Engineer

Berkley
Wilmington, DE

Company Details

Company URL: https://www.berkleytechnologyservices.com/

Berkley Technology Services (BTS) is the dynamic technology solution for W. R. Berkley Corporation, a Fortune 500 Commercial Lines Insurance Company. With key locations in Urbandale, IA and Wilmington, DE, BTS provides innovative and customer-focused IT solutions to the majority of WRBC’s 60+ operating units across the globe. BTS’s wide reach ensures that ideas and opinions are considered at every level of the organization to guarantee we find the best solutions possible.

Driven by a commitment to collaboration, BTS acts as consultants to our customers and Operating Units by providing comprehensive solutions that not only address the challenge at hand, but proactively plan for the “What’s Next” in our industry and beyond.

With a culture centered on innovation and entrepreneurial spirit, BTS stands as a community of technology leaders with eyes toward the future -- leaders who genuinely care about growing not only their team members but also themselves, and who take pride in employees who shine. BTS offers endless ways to get involved and the chance to grow your career into a wide range of roles you never knew existed. Come join us as we push forward into the future of the industry’s leading technological solutions.

Berkley Technology Services: Right Team, Right Technology, Simple and Secure.

Responsibilities

As a Senior DevOps Platform Engineer, you will play a critical role in ensuring the reliability, scalability, security, and performance of Berkley’s software systems. You will collaborate closely with product engineering, infrastructure, and architecture teams to build, mature, and operate an enterprise DevOps platform that enables teams to deliver software safely, efficiently, and at scale.

This role blends DevOps platform engineering and SRE practices, with a focus on CI/CD, observability, automation, and reliability across both cloud and on‑premises environments.

  • Maintain a strong understanding of the entire technology stack (networking, storage, OS, virtualization, databases, development frameworks, and applications) to design, observe, troubleshoot, and automate systems across the Berkley environment.
  • Design, build, and mature enterprise CI/CD pipelines and shared DevOps platform services, enabling secure, reliable, and scalable software delivery for multiple teams.
  • Define, implement, and track reliability and observability OKRs, including SLIs and SLOs, to guide reliability engineering, deployment practices, and operational decision‑making.
  • Implement and evolve monitoring, alerting, and observability solutions, including AIOps capabilities, to proactively assess system health, detect anomalies, enable self‑healing, and support rapid incident response.
  • Drive automation initiatives to eliminate operational toil, streamline platform and pipeline workflows, reduce manual intervention, and improve efficiency for product engineering and SRE teams.
  • Identify and address performance, scalability, and reliability bottlenecks across applications, infrastructure, and delivery pipelines to improve system efficiency and user experience.
  • Partner with incident management and operations teams to respond to, resolve, and prevent system outages or degradation, minimizing downtime and customer impact.
  • Collaborate actively with development, operations, and platform teams to embed resiliency, observability, security, and reliability requirements into system design, CI/CD pipelines, and runtime environments.
  • Lead cross‑functional coordination with product, development, infrastructure, and architecture teams to perform capacity planning, anticipate growth, and ensure systems scale reliably with business demand.
  • Continuously improve platform resilience by identifying and closing gaps in architecture, tooling, processes, and operational practices.
  • Modernize and strengthen disaster recovery capabilities for both on‑premises and cloud‑based Berkley solutions, ensuring recoverability, resilience, and compliance with enterprise standards.

Qualifications

  • 5+ years of experience in DevOps and Site Reliability Engineering, with hands‑on ownership of infrastructure, CI/CD platforms, and software delivery in enterprise environments.
  • Strong software engineering and automation skills, including proficiency in Python, Go, Bash, or JavaScript, and experience building production‑grade automation.
  • Proven expertise in enterprise CI/CD, GitOps, and containerized platforms, including Kubernetes, Helm, and cloud‑native delivery patterns.
  • Deep experience with reliability and observability, including monitoring, alerting, logging, and tracing platforms (e.g., Dynatrace, Datadog, ELK), and defining SLIs, SLOs, and reliability metrics.
  • Strong understanding of cloud, on‑prem, and hybrid architectures, including high availability, disaster recovery, capacity planning, and scalability.
  • Hands‑on experience with infrastructure as code and configuration management (e.g., Terraform, Ansible, GitHub Actions) to reduce operational toil and enable self‑service.
  • Solid knowledge of security and networking fundamentals, including applying industry‑standard security frameworks in enterprise environments.
  • Demonstrated ability to lead technical initiatives, influence system design decisions, mentor engineers, and collaborate effectively across product, engineering, infrastructure, and security teams.
  • Bachelor’s degree with an emphasis in a related field, or equivalent experience.

Behavioral Core Competencies

  • Strategic
  • Influential
  • Organizational Navigation
  • Balanced Approach
  • Commandership Skills
  • Composure

The Company is an equal employment opportunity employer.

Mid-Senior Level