Senior Developer, AI Evaluation & Cloud Infrastructure | Just Horizons Alliance
Join us to build the technical foundation for AI accountability.
The Role
Just Horizons Alliance is an 18-year-old applied research lab focused on ethics and technology. Our current focus is the AI Ethics Index, a measurement framework for evaluating AI systems on ethics, safety, and societal impact.
We need a senior engineer to own the technical infrastructure end-to-end: learn what exists, close critical gaps, and build something that lasts.
The evaluation methodology is validated and in use. We're now at the stage where the systems need to mature alongside the research. This is the first dedicated infrastructure hire for this work, and you'll shape how it scales.
What You’ll Do
Months 1–3: Learn the System
Map the current architecture with Sophia Zitman (AIEI Team Lead). Understand the evaluation methodology, the data flows, and the infrastructure that supports them. Identify what needs to evolve for multi-domain benchmarking—reproducibility, security posture, test coverage, deployment pipeline. Begin implementing the highest-priority improvements.
Months 4–6: Build for Scale
Architect the infrastructure to support the next phase of the Index. CI/CD that maintains stability as the system grows. IAM and secret management built for a production environment. Experiment tracking that makes every evaluation run auditable. Documentation that enables the research team to work independently.
Months 7–12: Expand
Multi-domain benchmarking across education, healthcare, finance, and other sectors. Reproducibility standards that meet external scientific scrutiny. A system the research team can extend without engineering support for every change. At this point, the infrastructure should be stable enough that you're focused on capability, not maintenance.
Why This Role Is Difficult
This is infrastructure for a scientific standard, not a product feature.
Correctness and delivery both matter. A bug in the evaluation engine doesn't break a feature; it invalidates a measurement. A flawed pipeline doesn't just slow things down; it compromises the credibility of the research. At the same time, methodology that never runs in production has no impact. The role requires both rigor and momentum.
You're translating between disciplines. Your stakeholders are researchers, ethicists, and governance specialists. You'll need to take concepts like "operationalizing an ethical construct" and turn them into data models and pipelines. This is a translation problem as much as an engineering problem.
The work is novel. There's no existing system to reference. The AI Ethics Index is defining what rigorous AI evaluation looks like. You'll be making architectural decisions in areas where best practices don't yet exist.
You'll have full ownership. This is not a role where you're executing someone else's technical vision. You're setting the direction. That means autonomy, but it also means accountability.
You're probably the right person if
✅ You've built evaluation systems or data pipelines that other people depended on for correctness, not just uptime
✅ You're comfortable with GCP and have deployed containerized workloads in a real production context
✅ You've worked with LLM APIs and understand their reliability and reproducibility characteristics
✅ You can read a paper about measurement methodology and turn it into a working data structure
✅ You build for durability. Your code is still running 18 months later because you thought about the next person
✅ You've worked at organizations of somewhere between 5 and 50 people, and you're comfortable being the person who figures things out without a playbook
✅ You find working on AI ethics infrastructure more interesting than building another e-commerce checkout flow
You're probably not the right fit if
❌ Enterprise environments make up most of your experience. This is not a large-team context
❌ You need clearly defined requirements before you can start. The requirements here evolve through conversation with ethicists
❌ You're based on the West Coast US or expect West Coast US working hours
❌ You mainly build user-facing APIs and features — this is systems and infrastructure work
❌ You're looking for a high-growth startup where shipping speed is everything. This is a scientific organization. Correctness is everything.
Hard Skills
These are the technical capabilities you need going in — or need to be able to build up fast using an AI coding agent. We're not looking for someone who ticks every box. We're looking for someone who closes gaps quickly and knows how to learn.
What you get
The role: You'll work directly with Sophia Zitman (AIEI Team Lead) as the technical backbone of the AI Ethics Index. Full engineering ownership of the evaluation engine.
The comp: Base salary $110,000.
The team: Small, split between ethicists and engineers. You will interview with Janet Kang (Executive Director) and Sophia Zitman (AIEI Team Lead).
The environment: Boston-based non-profit (501(c)(3)). East Coast US or Western Europe time zones. Collaborative but autonomous — Sophia won't micromanage, but she will hold you to a high standard of systems thinking.
The upside: You'll have built the technical foundation of what may become the globally referenced standard for AI system evaluation. That's a meaningful line on any CV — and a genuinely hard thing to have done.