Job Title: L3 – Senior Iceberg DBA / Lakehouse Operations Engineer
Location: Remote work accepted from anywhere in the US
Duration: 9-12+ month contract
Teams Meeting Interview
Job Description:
• 10+ years of experience in Big Data / Data Engineering / DBA / Data Operations roles
• Minimum 2+ years of hands-on experience with Apache Iceberg in production environments
• 6+ years of experience working with Cloudera ecosystem (CDP Ecosystem)
• Strong expertise i[...]ues
• Proven experience handl[...]berg)
• Lead enforcement of data modeling and Lakehouse standards across applications
• Guide tea[...]ormance
• Review and resolve complex data modeling and performance trade-offs
• Ensure consistency of data structures across domains and workloads
• Mentor and guide L2 resources in operational best practices and troubleshooting
Required Skills
• Strong hands-on experience with Apache Iceberg and/or Hive-based data lakes
• Understanding of data modeling concepts (normal forms) and modern Lakehouse patterns (Medallion architecture)
• Expertise in:
  o Table-level optimization and performance tuning
  o Large-scale data management (TB/PB scale)
• Experience with:
  o Spark SQL, Hive, Impala, NiFi, Trino
• Strong understanding of:
  o Partitioning strategies
  o File formats (Parquet/ORC)
  o Distributed query processing
Preferred Skills
• Exper[...]thon, Shell)
Job Summary
We are seeking a highly skilled Iceberg DBA / Lakehouse Operations Engineer to own the reliability, performance, and operational integrity of the Iceberg data layer powering enterprise analytics and business-critical applications.
This role operates in a large-scale, multi-engine Lakehouse environment, supporting workloads across Spark, Hive, and Impala, and plays a key role in enterprise data modernization initiatives (Hive and Teradata → Iceberg).
The ideal candidate brings deep expertise in Iceberg table operations, metadata management, and query performance optimization, ensuring consistent, high-performance data access across platforms in a cloud-based environment.
This role is critical to ensuring data accuracy and performance; any degradation directly impacts downstream reporting, analytics, and business-critical decision-making.
Key Responsibilities:
Iceberg Data Layer Ownership & Operations
• Own day-to-day operations of Apache Iceberg tables supporting multiple enterprise applications
• Ensure data reliability, consistency, and availability across all Lakehouse workloads
• Maintain operational integrity for datasets at multi-terabyte to petabyte scale
Advanced Table Management & Optimization
• Execute advanced Iceberg table maintenance and optimization strategies:
  o Compaction (minor/major) and small file mitigation
  o Snapshot expiration and metadata compaction to control metadata growth
  o Orphan file cleanup (vacuum) to maintain storage efficiency
• Optimize data layout and performance through:
  o File size tuning and distribution strategies
  o Partition evolution and pruning optimization
  o Clustering and ordering techniques (e.g., Z-ordering or similar patterns)
Data Modeling Standards & Lakehouse Design Alignment
• Support and enforce data modeling best practices aligned with:
  o Normalized data structures (3NF) for source-aligned datasets
  o Medallion architecture (Bronze / Silver / Gold layers) for curated data flows
• Ensure Iceberg table design aligns with:
  o Data ingestion patterns (raw vs curated layers)
  o Downstream consumption and performance requirements
• Assist in structuring datasets to balance:
  o Data integrity and normalization
  o Query performance and analytical efficiency
• Work with data engineering teams to ensure consistent implementation of layered data architecture across multiple applications
Multi-Engine Query Performance & Consistency
• Ensure consistent and performant query behavior across:
  o Spark (CDE)
  o Hive / Impala (CDW)
• Troubleshoot and resolve:
  o Query performance bottlenecks
  o Metadata inconsistencies across engines
  o Inefficient execution plans and scan patterns
Hive & Teradata Modernization Support
• Play a key role in enterprise data platform modernization (Hive and Teradata → Iceberg)
• Support:
  o Schema alignment and data type mapping
  o Data validation and reconciliation
• Troubleshoot migration-related issues and ensure post-migration stability and performance
Metadata & Data Lifecycle Management
• Manage Iceberg metadata to ensure:
  o Efficient scaling and performance
  o Consistent table state across engines
• Execute lifecycle operations:
  o Data retention and archival policies
  o Snapshot lifecycle management and cleanup
  o Time-travel optimization and maintenance
Production Support, Inc