Position Title: Data Scientist (Remote)
Location: Lexington, MA, USA, 02421
Duration: 5-month contract on W2 (possible extension)
*** NO C2C ***
Note: Candidates must have an active clearance (Secret, Top Secret, etc.).
US citizens only.
Position Description:
- Designs, develops, and implements methods, processes, and systems to consolidate and analyze diverse data sets, including structured and unstructured data.
- Develops software programs, algorithms, dashboards, information tools, and queries to clean, model, integrate and evaluate datasets. Keeps abreast of new analytic methodologies and technologies.
- Collaborates with functional business units to drive business solutions and direction.
Key responsibilities include, but are not limited to:
- Design, implement, and maintain enterprise-scale search solutions using Apache Solr
- Develop and optimize semantic search capabilities using vector embeddings and neural search models
- Build custom indexers and indexing pipelines that support vector embeddings alongside traditional text fields
- Implement and tune Approximate Nearest Neighbor (ANN) algorithms for efficient similarity search at scale
- Design and optimize similarity functions (cosine, dot product, Euclidean) for various search use cases
- Build hybrid search systems that combine traditional keyword-based search with vector-based semantic search
- Perform traditional relevancy engineering including query analysis, field weighting, boosting strategies, and result tuning
- Conduct relevancy analysis using quantitative metrics and qualitative evaluation methods
- Monitor search performance metrics and implement continuous improvements
- Work cross-functionally with product, engineering, and data teams to define search requirements
Required Qualifications:
- 5+ years of hands-on experience with Apache Solr or Lucene in production environments
- Strong expertise in traditional relevancy engineering including query parsing, field boosting, function queries, and relevance tuning
- Proven experience conducting relevancy analysis using both automated metrics and manual evaluation techniques
- Strong expertise in vector embeddings and their application to semantic search
- Proven experience building hybrid search systems that combine keyword and vector-based approaches
- Knowledge of search relevance metrics (NDCG, MRR, precision/recall)
- Excellent problem-solving and analytical skills
- Strong communication skills and ability to work in collaborative environments
Nice to Have:
- Databases and Data Engineering for Big Data
- Elasticsearch
- Statistical Methods
Clearance:
Candidates should have an active clearance (secret/top secret, etc.) in order to be considered for this position due to the nature of the work being done.
Interview Process:
- The first-round interview will be a Zoom call with the hiring manager.
- The second-round interview will be a Zoom call with additional team members, as needed.