GPU Software Engineer (AMD)

Triune Infomatics Inc
San Jose, CA

12+ months



Hybrid will work

Position Overview

Triune Infomatics is seeking an experienced GPU Software Engineer for a 12-month milestone-based engagement supporting a cutting-edge GPU software integration project. The consultant will work on AMD GPU platforms, drive AI stack development, contribute to open-source projects, and deliver performance benchmarking and integration reports across a structured set of monthly deliverables.

This is a highly technical, hands-on role requiring deep expertise in GPU software stacks, ROCm, AI frameworks, and systems-level integration.

Position Details

  • Project Title: GPU SW Integration for Samsung Cognos
  • Engagement Type: Contract / Milestone-Based (12 Months)
  • Client Environment: AMD MI210 GPU, CXL Memory, NVMe Gen6, ROCm Stack
  • Delivery Tools: Confluence, Jira, GitHub/GitLab (client-provided)

Key Responsibilities

  • Design and develop GPU software modules aligned with project milestones.
  • Perform systems integration and end-to-end testing of AI stack SW modules.
  • Validate AMD Infinity Bridge and AIS on MI210 GPU hardware.
  • Conduct functional and performance benchmarking (pSLC Firmware, CXL, ROCm).
  • Implement and validate SGLang changes for L3 to L1 memory transfer optimization.
  • Develop and contribute CaMa module changes to the ROCm software stack.
  • Collaborate with the SGLang open-source community and contribute code to their public GitHub repo.
  • Develop CaMa module for ROCm over Infinity Fabric/Ethernet.
  • Perform E2E performance benchmarking and publish formal benchmarking reports.
  • Integrate CaMa changes into the Cognos AI stack and publish integration documentation.
  • Scope UALink support for CaMa and publish an investigation/feasibility document.
  • Maintain all documentation, code, and status updates in Confluence, Jira, and GitHub/GitLab.

Required Skills and Qualifications

GPU Software and Hardware

  • Hands-on experience with AMD GPU platforms, specifically MI210.
  • Proficiency with AMD ROCm software stack including kernel libraries and drivers.
  • Experience with AMD Infinity Bridge / Infinity Fabric architecture.
  • Familiarity with CXL (Compute Express Link) memory integration.
  • Experience with NVMe storage and GPU Direct Storage (GDS).

AI Frameworks and Software Stack

  • Experience with SGLang or similar LLM inference frameworks.
  • Familiarity with AI stack installation and end-to-end workload benchmarking.
  • Knowledge of GPU memory hierarchy (HBM, L1/L3 cache) and data transfer optimization.
  • Proficiency in GPU kernel programming and library management (e.g., GDS, CaMa).

Programming and Tools

  • Strong proficiency in C/C++ and Python for GPU/systems-level development.
  • Experience with open-source contribution workflows (GitHub, pull requests, code reviews).
  • Familiarity with Jira and Confluence for project management and documentation.
  • Experience with pSLC firmware validation and performance benchmarking methodologies.

Soft Skills

  • Ability to work independently and deliver against defined monthly milestones.
  • Strong written communication skills for publishing technical reports and documentation.
  • Collaborative mindset; ability to work with third-party teams (AMD, SGLang community).

Preferred Qualifications

  • Prior experience with Samsung Cognos AI stack or similar enterprise AI platforms.
  • Familiarity with UALink protocol and its GPU interconnect applications.
  • Prior open-source contributions to ROCm, SGLang, or similar GPU frameworks.
  • Experience presenting benchmarking results to semiconductor partners (AMD, NVIDIA, etc.).