As a Lead DevSecOps Engineer, you will play a key role in driving the delivery and operations of DTCC’s Web3 Platform, with a primary focus on leading infrastructure development and operational excellence for the Canton/DAML-based distributed ledger environment. You will work closely with engineering teams to establish and maintain DevSecOps best practices, including CI/CD automation, infrastructure reliability, observability, and secure deployment patterns across both cloud and DLT components.
In this role, you will serve as the Canton/DAML infrastructure and operations subject matter expert (SME)—guiding platform design decisions, defining production-grade deployment patterns, and ensuring that Canton/DAML services are scalable, resilient, secure, and compliant. You will collaborate with in-house engineering teams and external vendor partners to integrate platform components and deliver production-ready Web3 capabilities, supporting globally distributed teams through hands-on troubleshooting, environment improvements, tooling enhancements, and automation initiatives.
Primary Responsibilities
- Lead the design, build, and operationalization of Canton/DAML infrastructure (e.g., participant nodes and supporting services), ensuring production readiness, resilience, scalability, and security.
- Own Canton/DAML environment strategy across development, test, staging, and production—standardizing environment configurations, release processes, and operational runbooks.
- Develop and maintain Infrastructure-as-Code (IaC) for blockchain/Web3 platform deployments, including network topology, identity and access controls, encryption/key management integrations, compute, storage, and secrets management.
- Build and evolve CI/CD pipelines for blockchain and associated application workloads, including automated validation/testing, security scanning, artifact promotion, and controlled releases with rollback strategies.
- Define and implement observability standards across platform services—metrics, logs, traces, dashboards, and alerting—supporting SLOs/SLAs and rapid incident response.
- Establish high availability (HA) and disaster recovery (DR) patterns for platform infrastructure (multi-zone/region design where applicable), including backup/restore, upgrade strategies, and operational readiness.
- Partner with architecture, risk, and security to ensure platform deployments align with enterprise security and compliance controls, including certificate management, IAM integration, least-privilege access, and auditability.
- Provide hands-on L3/L4 support for production platform operations, including performance tuning, incident triage, root cause analysis, and continuous improvement initiatives.
- Coordinate with internal teams and vendor partners to support platform upgrades, configuration changes, and operational improvements, ensuring minimal disruption and strong change management practices.
Talents Needed for Success
- Hands-on experience operating and scaling Canton/DAML in production or production-like environments (infrastructure, deployments, upgrades, monitoring, and incident response).
- Demonstrated ability to act as a technical lead / SME for DLT infrastructure, defining deployment standards, operational processes, and reliability practices.
- Experience building secure CI/CD and IaC patterns for complex distributed systems (DLT, microservices, event-driven platforms).
Technical Requirements
- Canton/DAML: Experience with Canton and DAML application/platform lifecycle, including deployment architecture, environment management, security controls, and observability.
- Azure DLT / Web3: Experience supporting Azure-based DLT stacks (e.g., Canton, Besu) and integrating with enterprise cloud services.
- Security and Identity: Strong understanding of IAM, certificate-based authentication, secrets management, encryption/key vault patterns, and network segmentation for secure DLT operations.
- Reliability Engineering: Proven experience implementing HA/DR, SRE practices, and performance optimization for distributed platforms.
Nice to Have
- Experience with DAML application build/release workflows and supporting developer enablement (tooling, templates, automation).
- Experience with distributed ledger operational concerns (latency, throughput, node lifecycle management, certificate rotation, topology changes).
- Background in SRE practices, including error budgets, capacity planning, and resilience testing (chaos testing, failure injection).