Software Development Engineer

SPECTRAFORCE
Lake County, IL

Role: Software Development Engineer

Location: Lake County, IL (Hybrid – 3 days/week onsite)

Duration: 5+ months (possibility of extension)


Job Description:

We are looking for a Software Development Engineer to build and scale an AI-powered document parsing platform that extracts structured data from complex PDFs (pharmaceutical batch records, certificates, regulatory documents) using OCR, LLMs, and RAG. You will work across the full stack — backend AI pipelines, frontend chat interface, and cloud infrastructure.


Roles & Responsibilities:

• Design and develop production-grade RAG (Retrieval-Augmented Generation) pipelines for domain-specific document querying with hybrid search, reranking, and multi-agent answer synthesis

• Build and optimize document processing pipelines using AWS Textract for OCR extraction from tables, handwritten content, and structured forms

• Integrate and orchestrate multiple LLMs (Claude, Gemini) for intent classification, data extraction, validation, and conversational AI

• Develop and maintain the FastAPI backend — REST APIs, streaming endpoints (SSE), authentication, and background task processing

• Build responsive frontend features using Next.js, React, and TypeScript — chat interface, PDF viewer with highlights, real-time progress tracking

• Manage cloud infrastructure on AWS — EC2 deployment, S3 storage, RDS (PostgreSQL), and IAM configuration

• Work with vector databases (Weaviate) and graph databases (Neo4j) for semantic search and structural document querying

• Implement chunking strategies, embedding generation, cross-encoder reranking, and semantic caching for accurate document retrieval

• Deploy and monitor AI models and services in production — model fallback chains, retry mechanisms, error handling

• Write clean, maintainable code with proper logging, error handling, and documentation


Required Skills:

• Python (FastAPI, async programming, pandas)

• TypeScript / React (Next.js)

• RAG systems — vector search, embeddings, chunking, reranking (production-grade)

• LLM integration — prompt engineering, structured output, multi-model orchestration

• AWS — EC2, S3, Textract, RDS

• PostgreSQL

• REST API design with streaming (SSE)

• Git, basic CI/CD, Linux server management


Good to Have:

• Weaviate, Neo4j, or similar vector/graph databases

• Gemini Vision or GPT-4V for document image analysis

• LangChain / LangGraph

• Docker, nginx

• Pharmaceutical/regulated document experience


Experience:

• 3–6 years
