GPU Software Engineer (AMD)
12+ months
San Jose, CA
Hybrid work
Position Overview
Triune Infomatics is seeking an experienced GPU Software Engineer for a 12-month, milestone-based engagement supporting a cutting-edge GPU software integration project. The consultant will work on AMD GPU platforms, drive AI stack development, contribute to open-source projects, and deliver performance benchmarking and integration reports across a structured set of monthly deliverables.
This is a highly technical, hands-on role requiring deep expertise in GPU software stacks, ROCm, AI frameworks, and systems-level integration.
Position Details
- Project Title: GPU SW Integration for Samsung Cognos
- Engagement Type: Contract / Milestone-Based (12 Months)
- Client Environment: AMD MI210 GPU, CXL Memory, NVMe Gen6, ROCm Stack
- Delivery Tools: Confluence, Jira, GitHub/GitLab (client-provided)
Key Responsibilities
- Design and develop GPU software modules aligned with project milestones.
- Perform systems integration and end-to-end testing of AI stack SW modules.
- Validate AMD Infinity Bridge and AIS on MI210 GPU hardware.
- Conduct functional and performance benchmarking (pSLC Firmware, CXL, ROCm).
- Implement and validate SGLang changes for L3 to L1 memory transfer optimization.
- Develop and contribute CaMa module changes to the ROCm software stack.
- Collaborate with the SGLang open-source community and contribute code to their public GitHub repo.
- Develop the CaMa module for ROCm over Infinity Fabric/Ethernet.
- Perform E2E performance benchmarking and publish formal benchmarking reports.
- Integrate CaMa changes into the Cognos AI stack and publish integration documentation.
- Scope UALink support for CaMa and publish an investigation/feasibility document.
- Maintain all documentation, code, and status updates in Confluence, Jira, and GitHub/GitLab.
Required Skills and Qualifications
GPU Software and Hardware
- Hands-on experience with AMD GPU platforms, specifically MI210.
- Proficiency with the AMD ROCm software stack, including kernel libraries and drivers.
- Experience with AMD Infinity Bridge / Infinity Fabric architecture.
- Familiarity with CXL (Compute Express Link) memory integration.
- Experience with NVMe storage and GPU Direct Storage (GDS).
AI Frameworks and Software Stack
- Experience with SGLang or similar LLM inference frameworks.
- Familiarity with AI stack installation and end-to-end workload benchmarking.
- Knowledge of GPU memory hierarchy (HBM, L1/L3 cache) and data transfer optimization.
- Proficiency in GPU kernel programming and library management (e.g., GDS, CaMa).
Programming and Tools
- Strong proficiency in C/C++ and Python for GPU/systems-level development.
- Experience with open-source contribution workflows (GitHub, pull requests, code reviews).
- Familiarity with Jira and Confluence for project management and documentation.
- Experience with pSLC firmware validation and performance benchmarking methodologies.
Soft Skills
- Ability to work independently and deliver against defined monthly milestones.
- Strong written communication skills for publishing technical reports and documentation.
- Collaborative mindset; ability to work with third-party teams (AMD, SGLang community).
Preferred Qualifications
- Prior experience with the Samsung Cognos AI stack or similar enterprise AI platforms.
- Familiarity with the UALink protocol and its GPU interconnect applications.
- Prior open-source contributions to ROCm, SGLang, or similar GPU frameworks.
- Experience presenting benchmarking results to semiconductor partners (AMD, NVIDIA, etc.).