Senior Data Engineer

Greenfield Source Advisors
Boston, MA

Paving the future of cancer research requires creative and driven individuals who think differently about solutions to the fundamental challenges blocking progress for cancer patients. Join us at Break Through Cancer (BTC) for a rare opportunity with the potential to directly impact many lives. A distinctly new type of foundation, Break Through Cancer empowers outstanding researchers and physicians to intercept, and find cures for, the deadliest cancers by stimulating radical collaboration. It also represents a first-of-its-kind partnership among many of the world's top cancer research centers, including Dana-Farber Cancer Institute, Johns Hopkins Sidney Kimmel Comprehensive Cancer Center, MIT’s Koch Institute for Integrative Cancer Research, The University of Texas MD Anderson Cancer Center, and Memorial Sloan Kettering Cancer Center.

 

Position Summary

 

Alongside many of the finest minds in cancer research, you will directly contribute to groundbreaking innovation and help tackle some of the hardest problems in areas of greatest unmet need. Part of what makes Break Through Cancer compelling is our belief that computational methods and data are an integral part of the entire research, development, and treatment process, not merely a quantitative afterthought. This ethos is codified in our first-of-its-kind Data Science Hub. The Hub aims to rapidly unify the most advanced patient and molecular data with the analysis methods and visual tools needed to explore them. It also aims to make all of it accessible in as timely, frictionless, and simple a manner as possible.

 

In this senior-level role you will work on-site at least 3 days per week. You will provide data engineering and software development support for interdisciplinary projects, ranging from basic research & discovery to clinical sample processing and iterative biological analysis. The ideal candidate is motivated by helping to decipher fundamental biological processes that directly impact patient outcomes. They bring a deep passion for cancer science and, through it, will:

 

·        help guide project execution and mentor junior engineers in best practices

·        help coordinate with scientists, bioinformatic developers, and clinicians across the BTC network

·        support cross-functional collaborations with partner cancer centers and industry consortia

·        work with the CDO, CSO, and Programs Team to align data & technical strategy with BTC objectives

·        integrate, co-develop, and/or maintain analysis pipelines, software, databases, and UIs/portals

·        take ownership of collecting, annotating & shepherding access to internal & external datasets

·        prioritize data standardization & accuracy to enable the extraction of actionable scientific insight

·        continuously optimize scalability, software & data quality, and internal SOPs

·        help to harden, benchmark, and deploy cutting-edge bioinformatic methods

·        interface and iterate with stakeholders on data collection and requirements gathering

·        identify & fill gaps in data, software, and documentation

·        demonstrate strong drive, flexibility, resilience, and positive attitude in the face of challenges

 

The Ideal Candidate Would Possess Most of These Experiences & Skills

 

·        Ph.D.

·        5+ years of data engineering experience, preferably in an academic setting, with evidence of leadership

·        Strong proficiency in Python, the UNIX shell and command line, and AWS cloud infrastructure

·        Proficiency in SQL and NoSQL databases (Postgres and MongoDB preferred)

·        Effective communicator with very strong oral & written skills and attention to detail

·        Highly motivated thinker who wants to put their own stamp on projects & responsibilities

·        Capable of working from incomplete information without micromanagement

·        Able to systematically prioritize deliverables across multiple projects

·        Proficiency in using APIs to drive systems that collect, process, and compute on data

·        Experience with public datasets used in biomedical research

·        Experience building portals for data visualization & dashboarding with Jupyter, pandas, Streamlit, or R/Shiny

 

Proficiency In or Exposure to the Following Would Be Strong Pluses

 

·        Clinical trial research or operations

·        Nextflow, WDL, or CWL workflow orchestration languages

·        Cirro, Seven Bridges, Terra, Synapse, Code Ocean, Foundry, or DNAnexus analysis platforms

·        HTML, JavaScript, and Java tools / programming languages

·        Integration of external software tools, data repositories or research publications into concrete deliverables aligned with organizational goals

·        Performing root-cause failure analysis on data & processes to answer specific research questions or identify opportunities for improvement

·        Storage, pipelined analysis, and interpretation of large-scale:

             o   DNA and RNA sequence data (bulk and single-cell)

             o   spatial profiling and pathology image data

             o   clinical data elements

·        Multi-omic bioinformatics, statistics, data cleansing, integration, and analysis

·        Biomarker discovery, clinical trial sample processing and analysis

·        Cloud / big data toolchains: Docker, Kubernetes, Spark, Kafka, Parquet, HDF5

·        Oncology, immunology, and cancer immunotherapies
