Bioinformatics Analyst III, ITEB, CGR
Job ID: req4285
Employee Type: exempt full-time
Division: Clinical Research Program
Facility: Rockville: 9615 MedCtrDr
Location: 9615 Medical Center Drive, Rockville, MD 20850 USA
The Frederick National Laboratory is operated by Leidos Biomedical Research, Inc. The lab addresses some of the most urgent and intractable problems in the biomedical sciences, including cancer and AIDS, drug development and first-in-human clinical trials, applications of nanotechnology in medicine, and rapid response to emerging infectious disease threats.
Accountability, Compassion, Collaboration, Dedication, Integrity and Versatility; it's the FNL way.
PROGRAM DESCRIPTION
We are seeking a skilled and motivated bioinformatics professional to join the Cancer Genomics Research Laboratory (CGR), located at the National Cancer Institute (NCI) Shady Grove campus in Rockville, MD. CGR is operated by Leidos Biomedical Research, Inc., and collaborates with the NCI's Division of Cancer Epidemiology and Genetics (DCEG), the world's leading cancer epidemiology research group. Our scientific team leverages cutting-edge technologies to investigate the genetic, epigenetic, transcriptomic, proteomic, and molecular factors that drive cancer susceptibility and outcomes. We are deeply committed to the mission of discovering the causes of cancer and advancing new prevention strategies through our contributions to DCEG's pioneering research.
Our team of CGR bioinformaticians supports DCEG's multidisciplinary family- and population-based studies by working closely with epidemiologists, biostatisticians, and basic research scientists in DCEG's intramural research program. We provide end-to-end bioinformatics support for genome-wide association studies (GWAS), methylation, targeted, whole-exome, whole-transcriptome, and whole-genome sequencing, along with viral and metagenomic studies from both short- and long-read sequencing platforms. This includes the analysis of germline and somatic variants, structural variations, copy number variations, gene and isoform expression, base modifications, viral and bacterial genomics, and more. Additionally, we advance cancer research by integrating the latest technologies, such as single-cell, multiomics, spatial transcriptomics, and proteomics, in collaboration with the Functional and Molecular and Digital Pathology Laboratory groups within CGR. We extensively analyze large population databases such as All of Us, UK Biobank, gnomAD, and 1000 Genomes to inform and validate GWAS signals, to study the associations between genetic variation and gene expression, protein levels, and metabolites, and to develop polygenic risk scores across multiple populations.
Our bioinformatics team develops and implements sophisticated, cloud-enabled pipelines and data analysis methodologies, blending traditional bioinformatics and statistical approaches with cutting-edge techniques like machine learning, deep learning, and generative AI models. We prioritize reproducibility through the use of containerization, workflow management tools, thorough benchmarking, and detailed workflow documentation. Our infrastructure and data management team works closely with researchers and bioinformaticians to maintain and optimize a high-performance computing (HPC) cluster, provision cloud environments, and curate and share large datasets.
The successful candidate will provide dedicated analytical support to the Integrative Tumor Epidemiology Branch (ITEB) and contribute to cancer research in areas such as GWAS, germline and somatic variant analysis, single-cell RNA sequencing, and proteomic expression analysis. The bioinformatics analyst will support the installation, troubleshooting, and execution of analytical pipelines using open-source scientific software on Unix/Linux and cloud-based platforms. They will leverage publicly available bioinformatics and genomic databases, as well as analysis pipelines, to process various data types, including genome-wide genotyping arrays, long-read DNA sequencing, gene expression, proteomic profiling, and methylation profiling across diverse tissues and cancer types.
While working closely with DCEG investigators and CGR bioinformaticians and scientists, the analyst will operate with a high degree of independence. This role involves handling large-scale sequencing data, developing robust pipelines, and collaborating with interdisciplinary teams to derive meaningful biological insights.
KEY ROLES/RESPONSIBILITIES
The candidate will be expected to:
- Develop, implement, and optimize analytical pipelines for germline and somatic variant analysis from short- and long-read whole-genome sequencing (WGS).
- Run and interpret variant-calling results, including SNP/indel, microsatellite, and structural variant analyses, following the latest community standards.
- Conduct association analyses of large GWAS datasets using widely used software such as PLINK and Genome-wide Complex Trait Analysis (GCTA).
- Apply statistical approaches to interpret diverse genetic and genomic datasets and integrate findings with clinical and multi-omics data.
- Collaborate with a multidisciplinary team to develop reproducible, standardized workflows for single-cell and proteomics studies and use them to analyze data, combining the latest research developments with strong programming skills.
- Review, QC, and integrate single-cell and proteomic datasets, performing downstream statistical analysis using phenotypic and clinical metadata.
- Demonstrate strong teamwork and communication skills, with the ability to learn and apply new bioinformatics techniques and resources effectively.
- Maintain and document bioinformatics software and scripts to ensure reproducibility and scalability.
- Participate in group meetings, present findings, and contribute to publications resulting from research projects.
BASIC QUALIFICATIONS
To be considered for this position, you must minimally meet the knowledge, skills, and abilities listed below:
- Possession of a bachelor's degree in bioinformatics, computer science, computational biology, or a related field from an accredited college or university according to the Council for Higher Education Accreditation (CHEA). Foreign degrees must be evaluated for U.S. equivalency.
- In addition to educational requirements, a minimum of six (6) years of related analytical or bioinformatics pipeline development experience.
- The ability to construct practical computational pipelines for data parsing, quality control, and analysis of large-scale genetic or genomic datasets.
- Strong programming skills in at least two of R, Python, and C++, with experience in RStudio and Jupyter Notebooks.
- Strong experience analyzing high-throughput sequencing data, including whole-genome sequencing and bulk and single-cell RNA sequencing.
- Experience with standard genetic association analysis software such as PLINK, SAIGE, regenie, and GCTA.
- Demonstrable shell scripting skills (e.g., bash, awk, sed).
- Experience working in a Linux environment (especially an HPC environment or the cloud).
- Ability to obtain and maintain a security clearance.
PREFERRED QUALIFICATIONS
Candidates with these desired skills will be given preferential consideration:
- Strong proficiency in programming (R, Python, and Bash) and in using GitHub.
- Experience supporting the analysis of genomic data from epidemiological studies, including but not limited to data manipulation and integrated genomic analyses, and preparing reports and presentations that detail the results.
- Proficiency with core statistical and bioinformatics methods (e.g., linear regression, logistic regression, eQTL analysis, LD score regression, fine-mapping, credible-set analysis, and colocalization analysis).
- Experience or familiarity with processing single-cell data using the latest bioinformatics tools, such as Cell Ranger, Seurat, Scanpy, Squidpy, and Cell2location.
- Familiarity with, and working knowledge of, tools for querying and investigating cancer genomics data from publicly available sources (such as dbGaP, TCGA, ENCODE, 1000 Genomes, GTEx, gnomAD, cBioPortal, and TCPA).
- Experience working in Linux-based environments and using HPC (high-performance computing) clusters.
- Strong experience with large-scale multi-omics data integration (e.g., genomics, genetics, transcriptomics, proteomics).
- Good understanding of algorithmic efficiency and experience using high-performance computing clusters to support large and diverse datasets.
- Experience with environment and dependency management tools (e.g., pip, venv, conda, renv) and workflow management systems such as Snakemake or Nextflow.
- Knowledge of containerization with Docker/Singularity, and of JIRA and GitHub for project management.
- Understanding of software and workflow development best practices, such as source control, test-driven development, and continuous integration/deployment.
- Strong analytical and problem-solving skills with attention to detail.
- Strong communication skills and the ability to work both independently and collaboratively as part of a team.
Commitment to Non-Discrimination
All qualified applicants will receive consideration for employment without regard to sex, race, ethnicity, color, age, national origin, citizenship, religion, physical or mental disability, medical condition, genetic information, pregnancy, family structure, marital status, ancestry, domestic partner status, sexual orientation, gender identity or expression, veteran or military status, or any other basis prohibited by law.