Senior Iceberg DBA / Lakehouse Operations Engineer

United Software Group Inc

Dallas, TX

Job Title: L3 – Senior Iceberg DBA / Lakehouse Operations Engineer

Location: Remote work accepted from anywhere in the US

Duration: 9-12+ Months Contract

Teams Meeting Interview

Job Description:

• 10+ years of experience in Big Data / Data Engineering / DBA / Data Operations roles
• Minimum 2+ years of hands-on experience with Apache Iceberg in production environments
• 6+ years of experience working with Cloudera ecosystem (CDP Ecosystem)
• Strong expertise in:
  o Iceberg table optimization (compaction, metadata management, partition evolution)
  o Multi-engine performance tuning (Spark, Hive, Impala)
  o Troubleshooting complex data and query performance issues
• Proven experience handling:
  o P1/P2 production incidents
  o Large-scale environments (TB/PB scale)
  o Data migration initiatives (Hive/Teradata → Iceberg)
• Lead enforcement of data modeling and Lakehouse standards across applications
• Guide teams on:
  o Medallion architecture implementation
  o Balancing normalization vs performance
• Review and resolve complex data modeling and performance trade-offs
• Ensure consistency of data structures across domains and workloads
• Mentor and guide L2 resources in operational best practices and troubleshooting

Required Skills

• Strong hands-on experience with Apache Iceberg and/or Hive-based data lakes
• Understanding of data modeling concepts (normal forms) and modern Lakehouse patterns (Medallion architecture)
• Expertise in:
  o Table-level optimization and performance tuning
  o Large-scale data management (TB/PB scale)
• Experience with:
  o Spark SQL, Hive, Impala, NiFi, Trino
• Strong understanding of:
  o Partitioning strategies
  o File formats (Parquet/ORC)
  o Distributed query processing

Preferred Skills

• Experience with:
  o Hive-to-Iceberg or Teradata-to-Iceberg migration
  o Cloudera CDP (CDE/CDW)
• Familiarity with:
  o Cloud platforms (AWS, Azure)
  o Scripting/automation (Python, Shell)

Job Summary

We are seeking a highly skilled Iceberg DBA / Lakehouse Operations Engineer to own the reliability, performance, and operational integrity of the Iceberg data layer powering enterprise analytics and business-critical applications.

This role operates in a large-scale, multi-engine Lakehouse environment, supporting workloads across Spark, Hive, and Impala, and plays a key role in enterprise data modernization initiatives (Hive and Teradata → Iceberg).

The ideal candidate brings deep expertise in Iceberg table operations, metadata management, and query performance optimization, ensuring consistent, high-performance data access across platforms in a cloud-based environment.

This role is critical to ensuring data accuracy and performance; any degradation directly impacts downstream reporting, analytics, and business-critical decision-making.

Key Responsibilities:

Iceberg Data Layer Ownership & Operations
• Own day-to-day operations of Apache Iceberg tables supporting multiple enterprise applications
• Ensure data reliability, consistency, and availability across all Lakehouse workloads
• Maintain operational integrity for datasets at multi-terabyte to petabyte scale

Advanced Table Management & Optimization
• Execute advanced Iceberg table maintenance and optimization strategies:
  o Compaction (minor/major) and small file mitigation
  o Snapshot expiration and metadata compaction to control metadata growth
  o Orphan file cleanup (vacuum) to maintain storage efficiency
• Optimize data layout and performance through:
  o File size tuning and distribution strategies
  o Partition evolution and pruning optimization
  o Clustering and ordering techniques (e.g., Z-ordering or similar patterns)

Data Modeling Standards & Lakehouse Design Alignment
• Support and enforce data modeling best practices aligned with:
  o Normalized data structures (3NF) for source-aligned datasets
  o Medallion architecture (Bronze / Silver / Gold layers) for curated data flows
• Ensure Iceberg table design aligns with:
  o Data ingestion patterns (raw vs curated layers)
  o Downstream consumption and performance requirements
• Assist in structuring datasets to balance:
  o Data integrity and normalization
  o Query performance and analytical efficiency
• Work with data engineering teams to ensure consistent implementation of layered data architecture across multiple applications

Multi-Engine Query Performance & Consistency
• Ensure consistent and performant query behavior across:
  o Spark (CDE)
  o Hive / Impala (CDW)
• Troubleshoot and resolve:
  o Query performance bottlenecks
  o Metadata inconsistencies across engines
  o Inefficient execution plans and scan patterns

Hive & Teradata Modernization Support
• Play a key role in enterprise data platform modernization (Hive and Teradata → Iceberg)
• Support:
  o Schema alignment and data type mapping
  o Data validation and reconciliation
• Troubleshoot migration-related issues and ensure post-migration stability and performance

Metadata & Data Lifecycle Management
• Manage Iceberg metadata to ensure:
  o Efficient scaling and performance
  o Consistent table state across engines
• Execute lifecycle operations:
  o Data retention and archival policies
  o Snapshot lifecycle management and cleanup
  o Time-travel optimization and maintenance

Production Support, Incident Resolution & On-Call
• Provide L2/L3 support for data-related production issues across Iceberg-based Lakehouse workloads
• Participate in on-call rotation to support critical data platforms and ensure timely response to incidents
• Respond to and resolve P1/P2 production incidents within defined SLAs, minimizing impact to downstream applications and reporting
• Troubleshoot:
  o Data inconsistencies and reporting discrepancies
  o Query failures and performance degradation
• Perform root cause analysis (RCA) and implement preventive measures to avoid recurring issues
• Collaborate with platform and application teams during incident triage and resolution

Security & Data Governance Support
• Support fine-grained access control using:
  o Ranger policies and RBAC
• Own and ensure data validation, reconciliation, and accuracy between source and Iceberg datasets
• Ensure secure and compliant access to data across applications.