You own the full model lifecycle, from post-training to evaluation to production serving. Vecna's models execute multi-step task chains across complex environments, route to the right tools and reasoning strategies, and improve from real-world trajectories.

The hard part of training agents isn't fitting a curve to a benchmark; it's training models that stay coherent across hundreds of tool calls in environments they've never seen, reason over structured context that doesn't fit in any prompt, and recover when their plan breaks at step 173. Static benchmarks don't measure this. Static datasets don't teach it. We post-train on real trajectories, reward at the trajectory level, and evaluate against environments that fight back.

Your work sets the capability ceiling of the platform. You'll work directly with the founders on the methodology that turns base models into Virtual Workers, and your training pipelines, evaluation harnesses, and serving infrastructure become the loop every other team relies on to ship.

What You'll Own

- Post-training and alignment pipeline: SFT, DPO, GRPO, and trajectory-level rewards over real agent runs, built on a distributed training stack designed for long-horizon, tool-using behavior
- Agentic online RL: models that learn from live tool interactions, environment feedback, and trajectory outcomes rather than static datasets, with reward shaping that captures partial progress, recovery quality, and strategic coherence
- Context graphs as model substrate: designing how models read from, write to, and reason over persistent graphs of entities, relationships, and environmental state, including subgraph retrieval at inference time, graph-aware prompt construction, and post-training signals that reward grounded, graph-consistent reasoning
- Graph-based reasoning over relational environments: path traversal, link prediction, and next-hop selection across complex topologies, including how models learn to plan multi-step trajectories through graph-structured state and update their world model as new information lands
- Evaluation infrastructure: offline benchmarks, LLM-as-judge frameworks, trajectory scoring, red-teaming, and automated regression testing that catches capability drift before it ships
- Inference optimization and serving: quantization, speculative decoding, KV-cache management, and production deployment tuned for agentic workloads with bursty, long-context usage patterns

You Might Be a Fit If You

- Have a PhD or equivalent research depth in ML, with peer-reviewed publications in alignment, post-training, reinforcement learning, or agentic systems
- Have owned a post-training pipeline end-to-end at a startup or research lab, from data curation through RLHF/DPO/GRPO training to evaluation and deployment
- Have built evaluation infrastructure for LLMs, including automated benchmarks, human eval pipelines, and statistical analysis of model behavior under distribution shift
- Have production experience with inference optimization and model serving at scale, including quantization, batching, and KV-cache strategies for long-context workloads
- Are proficient with PyTorch and modern distributed training and serving frameworks, and understand the systems-level tradeoffs that determine whether a research idea actually scales
- Have experience training or fine-tuning models against graph-structured context: retrieval over knowledge or memory graphs, GNN architectures, or hybrid LLM-graph systems
- Have experience with graph algorithms, including path traversal, link prediction, and subgraph reasoning
- Thrive in early-stage environments, ship research into production, and want to work where ML research meets real-world impact