Senior Iceberg DBA / Lakehouse Operations Engineer

United Software Group Inc
Dallas, TX

Job Title: L3 – Senior Iceberg DBA / Lakehouse Operations Engineer

Location: Remote work accepted from anywhere in the US

Duration: 9-12+ month contract

Teams Meeting Interview

Job Description:

• 10+ years of experience in Big Data / Data Engineering / DBA / Data Operations roles
• Minimum 2+ years of hands-on experience with Apache Iceberg in production environments
• 6+ years of experience working with Cloudera ecosystem (CDP Ecosystem)
• Strong expertise in:
  o Iceberg table optimization (compaction, metadata management, partition evolution)
  o Multi-engine performance tuning (Spark, Hive, Impala)
  o Troubleshooting complex data and query performance issues
• Proven experience handling:
  o P1/P2 production incidents
  o Large-scale environments (TB/PB scale)
  o Data migration initiatives (Hive/Teradata → Iceberg)
• Lead enforcement of data modeling and Lakehouse standards across applications
• Guide teams on:
  o Medallion architecture implementation
  o Balancing normalization vs performance
• Review and resolve complex data modeling and performance trade-offs
• Ensure consistency of data structures across domains and workloads
• Mentor and guide L2 resources in operational best practices and troubleshooting

Required Skills
• Strong hands-on experience with Apache Iceberg and/or Hive-based data lakes
• Understanding of data modeling concepts (normal forms) and modern Lakehouse patterns (Medallion architecture)
• Expertise in:
  o Table-level optimization and performance tuning
  o Large-scale data management (TB/PB scale)
• Experience with:
  o Spark SQL, Hive, Impala, NiFi, Trino
• Strong understanding of:
  o Partitioning strategies
  o File formats (Parquet/ORC)
  o Distributed query processing

Preferred Skills
• Experience with:
  o Hive-to-Iceberg or Teradata-to-Iceberg migration
  o Cloudera CDP (CDE/CDW)
• Familiarity with:
  o Cloud platforms (AWS, Azure)
  o Scripting/automation (Python, Shell)

Job Summary
We are seeking a highly skilled Iceberg DBA / Lakehouse Operations Engineer to own the reliability, performance, and operational integrity of the Iceberg data layer powering enterprise analytics and business-critical applications.
This role operates in a large-scale, multi-engine Lakehouse environment, supporting workloads across Spark, Hive, and Impala, and plays a key role in enterprise data modernization initiatives (Hive and Teradata → Iceberg).
The ideal candidate brings deep expertise in Iceberg table operations, metadata management, and query performance optimization, ensuring consistent, high-performance data access across platforms in a cloud-based environment.
This role is critical to ensuring data accuracy and performance: any degradation directly impacts downstream reporting, analytics, and business-critical decision-making.

Key Responsibilities:

Iceberg Data Layer Ownership & Operations
• Own day-to-day operations of Apache Iceberg tables supporting multiple enterprise applications
• Ensure data reliability, consistency, and availability across all Lakehouse workloads
• Maintain operational integrity for datasets at multi-terabyte to petabyte scale

Advanced Table Management & Optimization
• Execute advanced Iceberg table maintenance and optimization strategies:
  o Compaction (minor/major) and small file mitigation
  o Snapshot expiration and metadata compaction to control metadata growth
  o Orphan file cleanup (vacuum) to maintain storage efficiency
• Optimize data layout and performance through:
  o File size tuning and distribution strategies
  o Partition evolution and pruning optimization
  o Clustering and ordering techniques (e.g., Z-ordering or similar patterns)

Data Modeling Standards & Lakehouse Design Alignment
• Support and enforce data modeling best practices aligned with:
  o Normalized data structures (3NF) for source-aligned datasets
  o Medallion architecture (Bronze / Silver / Gold layers) for curated data flows
• Ensure Iceberg table design aligns with:
  o Data ingestion patterns (raw vs curated layers)
  o Downstream consumption and performance requirements
• Assist in structuring datasets to balance:
  o Data integrity and normalization
  o Query performance and analytical efficiency
• Work with data engineering teams to ensure consistent implementation of layered data architecture across multiple applications

Multi-Engine Query Performance & Consistency
• Ensure consistent and performant query behavior across:
  o Spark (CDE)
  o Hive / Impala (CDW)
• Troubleshoot and resolve:
  o Query performance bottlenecks
  o Metadata inconsistencies across engines
  o Inefficient execution plans and scan patterns

Hive & Teradata Modernization Support
• Play a key role in enterprise data platform modernization (Hive and Teradata → Iceberg)
• Support:
  o Schema alignment and data type mapping
  o Data validation and reconciliation
• Troubleshoot migration-related issues and ensure post-migration stability and performance

Metadata & Data Lifecycle Management
• Manage Iceberg metadata to ensure:
  o Efficient scaling and performance
  o Consistent table state across engines
• Execute lifecycle operations:
  o Data retention and archival policies
  o Snapshot lifecycle management and cleanup
  o Time-travel optimization and maintenance

Production Support, Incident Resolution & On-Call
• Provide L2/L3 support for data-related production issues across Iceberg-based Lakehouse workloads
• Participate in on-call rotation to support critical data platforms and ensure timely response to incidents
• Respond to and resolve P1/P2 production incidents within defined SLAs, minimizing impact to downstream applications and reporting
• Troubleshoot:
  o Data inconsistencies and reporting discrepancies
  o Query failures and performance degradation
• Perform root cause analysis (RCA) and implement preventive measures to avoid recurring issues
• Collaborate with platform and application teams during incident triage and resolution

Security & Data Governance Support
• Support fine-grained access control using:
  o Ranger policies and RBAC
• Own and ensure data validation, reconciliation, and accuracy between source and Iceberg datasets
• Ensure secure and compliant access to data across applications.