Databases that we use in pipelines

Massive Bioinformatics


Clinical Genomic Database (CGD)

The Clinical Genomic Database (CGD) is a manually curated database of conditions with known genetic causes, focusing on medically significant genetic data with available interventions. All conditions with identified genetic causes are included in the CGD. For each entry, the database includes the gene symbol, condition(s), allelic conditions, inheritance, the age group (pediatric or adult) at which interventions are indicated, clinical categorization, and a general description of interventions and their rationale.

We use this database to obtain the inheritance pattern of gene-associated conditions, such as X-linked or autosomal dominant. This information is used in the pathogenicity classification of variants.
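A minimal sketch of turning a CGD download into a gene-to-inheritance lookup. It assumes a tab-separated file with `#GENE` and `INHERITANCE` columns, inheritance codes such as `AD`, `AR`, and `XL`, and `/` separating multiple modes; check these assumptions against the CGD release you actually use. The gene names in the example are placeholders.

```python
import csv
import io

# Assumed meanings of CGD inheritance codes; unknown codes are passed through unchanged.
INHERITANCE_LABELS = {
    "AD": "autosomal dominant",
    "AR": "autosomal recessive",
    "XL": "X-linked",
}


def load_cgd_inheritance(handle):
    """Map gene symbol -> list of inheritance modes from a CGD-style TSV (assumed layout)."""
    reader = csv.DictReader(handle, delimiter="\t")
    gene_to_modes = {}
    for row in reader:
        gene = row["#GENE"].strip()
        # Assumed: multiple modes are written like "AD/AR".
        codes = [c.strip() for c in row["INHERITANCE"].split("/") if c.strip()]
        gene_to_modes[gene] = [INHERITANCE_LABELS.get(c, c) for c in codes]
    return gene_to_modes


# Toy input with placeholder genes, not real CGD content.
example = (
    "#GENE\tCONDITION\tINHERITANCE\n"
    "GENE_A\tExample condition\tAD/AR\n"
    "GENE_B\tExample condition\tXL\n"
)
modes = load_cgd_inheritance(io.StringIO(example))
print(modes["GENE_A"])  # ['autosomal dominant', 'autosomal recessive']
```

Downstream classification code can then ask, for example, whether a gene is associated with X-linked inheritance before applying zygosity-dependent rules.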

The Clinical Genome Resource (ClinGen)

ClinGen is a National Institutes of Health (NIH)-funded resource dedicated to building an authoritative central resource that defines the clinical relevance of genes and variants for use in precision medicine and research.

In Massive Analyser, we use ClinGen to assign ACMG criteria such as PS3 and BS3. These criteria evaluate evidence from functional studies reported in the literature: PS3 applies when well-established functional studies show a damaging effect, and BS3 applies when they show no damaging effect. Using ClinGen's curated evidence for these criteria makes our classifications more reliable.
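A minimal sketch of extracting PS3/BS3 from curated evidence. It assumes each curated variant comes with a comma-separated list of met evidence codes (for example, from an export of the ClinGen Evidence Repository; the exact export format is an assumption) and that strength modifiers are written as suffixes such as `PS3_Moderate`. The function name `functional_criteria` is hypothetical.

```python
import re

# Matches "PS3", "BS3", and suffixed forms such as "PS3_Moderate" or "BS3_Supporting".
FUNCTIONAL_CODE = re.compile(r"^(PS3|BS3)(?:_(\w+))?$")


def functional_criteria(met_codes):
    """Return {criterion: strength} for PS3/BS3 found in a comma-separated code list.

    Unsuffixed PS3 and BS3 are both "Strong" by default in the ACMG framework.
    """
    result = {}
    for code in met_codes.split(","):
        match = FUNCTIONAL_CODE.match(code.strip())
        if match:
            criterion, modifier = match.group(1), match.group(2)
            result[criterion] = modifier or "Strong"
    return result


print(functional_criteria("PS3_Moderate, PM2, PP3"))  # {'PS3': 'Moderate'}
```

Keeping the strength modifier matters because a downgraded PS3 contributes less evidence when the criteria are combined into a final classification.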

You can access the paper describing ClinGen here.

Catalogue Of Somatic Mutations In Cancer (COSMIC)

COSMIC, the Catalogue Of Somatic Mutations In Cancer, is the world’s largest and most comprehensive resource for exploring the impact of somatic mutations in human cancer. COSMIC is organized into several projects, such as the Cancer Mutation Census (CMC) and drug resistance data. From COSMIC we obtain gene fusion, drug resistance, and tissue information.
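A minimal sketch of indexing drug-resistance and tissue information by gene and protein change. The record keys (`gene`, `aa_change`, `drug`, `tissue`) are hypothetical stand-ins; real COSMIC exports use their own column headers, and gene fusions come from a separate export that this sketch does not cover. All records below are toy placeholders.

```python
from collections import defaultdict


def index_cosmic(records):
    """Group drugs and tissues by (gene, protein change) from COSMIC-style records."""
    index = defaultdict(lambda: {"drugs": set(), "tissues": set()})
    for rec in records:
        entry = index[(rec["gene"], rec["aa_change"])]
        if rec.get("drug"):
            entry["drugs"].add(rec["drug"])
        if rec.get("tissue"):
            entry["tissues"].add(rec["tissue"])
    return dict(index)


# Toy records with placeholder names, not real COSMIC data.
records = [
    {"gene": "GENE_A", "aa_change": "p.X1Y", "drug": "drug_1", "tissue": "lung"},
    {"gene": "GENE_A", "aa_change": "p.X1Y", "drug": "drug_2", "tissue": "lung"},
    {"gene": "GENE_B", "aa_change": "p.Z2W", "drug": "", "tissue": "breast"},
]
idx = index_cosmic(records)
print(sorted(idx[("GENE_A", "p.X1Y")]["drugs"]))  # ['drug_1', 'drug_2']
```

Indexing by (gene, protein change) lets a somatic variant be matched in a single dictionary lookup once its protein-level annotation is known.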


ClinVar

ClinVar is a freely accessible, public archive of reports of the relationships between human variations and phenotypes, with supporting evidence. ClinVar thus facilitates access to, and communication about, the asserted relationships between human variation and observed health status, together with the history of each interpretation. We use ClinVar in several parts of our pipelines, including pathogenicity classification and variant annotation, because it is one of the most widely used variant annotation databases in the field.
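ClinVar is distributed, among other formats, as VCF files whose INFO column carries fields such as `CLNSIG` (clinical significance), `CLNREVSTAT` (review status), and `CLNDN` (disease name). A minimal sketch of reading those fields from one record follows; the record itself is a toy example, and the helper names are hypothetical.

```python
def parse_info(info):
    """Parse a VCF INFO column into a dict; flag fields (no '=') map to True."""
    fields = {}
    for item in info.split(";"):
        if not item:
            continue
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


def clinvar_significance(vcf_line):
    """Extract the ClinVar ID, significance, review status, and disease from one VCF record."""
    cols = vcf_line.rstrip("\n").split("\t")
    info = parse_info(cols[7])
    return {
        "id": cols[2],
        "significance": info.get("CLNSIG"),
        "review_status": info.get("CLNREVSTAT"),
        "disease": info.get("CLNDN"),
    }


# Toy record with placeholder values.
line = (
    "1\t12345\t99999\tA\tG\t.\t.\t"
    "CLNSIG=Pathogenic;CLNREVSTAT=criteria_provided,_single_submitter;"
    "CLNDN=Example_disease;GENEINFO=GENE_A:1"
)
print(clinvar_significance(line)["significance"])  # Pathogenic
```

Keeping the review status next to the significance is useful because a single-submitter assertion usually deserves less weight than an expert-panel review.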


dbNSFP

dbNSFP is a database developed for functional prediction and annotation of all potential non-synonymous single-nucleotide variants (nsSNVs) in the human genome. dbNSFP includes many kinds of variant annotation, such as in silico pathogenicity prediction scores, population allele frequencies, and splice-site predictions. dbNSFP therefore forms the core of the variant annotation in our pipelines.
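A minimal sketch of reading selected score columns from a dbNSFP-style tab-separated file. It assumes the key columns are named `#chr`, `pos(1-based)`, `ref`, and `alt`, that `.` marks a missing value, and that per-transcript values are joined with `;`; column names vary between dbNSFP versions, so treat them as assumptions. The data row is a toy example.

```python
import csv
import io

MISSING = "."


def parse_scores(value):
    """Split a ';'-separated dbNSFP field into floats, skipping missing ('.') entries."""
    return [float(v) for v in value.split(";") if v and v != MISSING]


def annotate(handle, columns):
    """Yield ((chrom, pos, ref, alt), {column: [values]}) for each data row."""
    reader = csv.DictReader(handle, delimiter="\t")
    for row in reader:
        key = (row["#chr"], int(row["pos(1-based)"]), row["ref"], row["alt"])
        yield key, {col: parse_scores(row[col]) for col in columns}


# Toy data: two per-transcript REVEL values and a missing allele frequency.
example = (
    "#chr\tpos(1-based)\tref\talt\tREVEL_score\tgnomAD_exomes_AF\n"
    "1\t12345\tA\tG\t0.2;0.7\t.\n"
)
records = dict(annotate(io.StringIO(example), ["REVEL_score", "gnomAD_exomes_AF"]))
print(records[("1", 12345, "A", "G")])
```

The sketch returns all per-transcript values rather than a single number because the right aggregation depends on the score: a higher REVEL score suggests a more damaging variant, while a lower SIFT score does.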


PharmGKB

PharmGKB is a pharmacogenomics knowledge resource that encompasses clinical guidelines and drug labels, potentially clinically actionable gene-drug associations, and genotype-phenotype relationships. PharmGKB collects, curates, and disseminates knowledge about the impact of human genetic variation on drug responses.


The Monarch Initiative

The Monarch Initiative is an integrative data and analytic platform connecting phenotypes to genotypes across species, bridging basic and applied research with semantics-based analysis. It also leads the development of the Human Phenotype Ontology, which is used worldwide for genomic diagnostics in genetic disease and other areas. Its phenotype annotations are split into two parts: gene-phenotype relationships and variant-phenotype relationships. In our pipelines, the gene-phenotype relationships are used in the germline exome pipeline, and the variant-phenotype relationships are used in the somatic exome pipeline.
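A minimal sketch of the routing described above: germline lookups are keyed by gene and somatic lookups by variant. The function name, table layout, and identifiers are hypothetical placeholders, not the pipelines' actual code or real Monarch annotations.

```python
def phenotypes_for(pipeline, gene, variant_id, gene_table, variant_table):
    """Return phenotype terms from the gene-level or variant-level table, by pipeline type."""
    if pipeline == "germline":
        return gene_table.get(gene, set())
    if pipeline == "somatic":
        return variant_table.get(variant_id, set())
    raise ValueError(f"unknown pipeline: {pipeline!r}")


# Toy tables with placeholder IDs.
gene_table = {"GENE_A": {"HP:EXAMPLE1", "HP:EXAMPLE2"}}
variant_table = {"VAR_1": {"HP:EXAMPLE3"}}

print(sorted(phenotypes_for("germline", "GENE_A", "VAR_1", gene_table, variant_table)))
# ['HP:EXAMPLE1', 'HP:EXAMPLE2']
```

Raising on an unknown pipeline name, rather than silently returning an empty set, makes a misconfigured run fail loudly instead of producing unannotated output.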


UCSC Genome Browser

The UCSC Genome Browser is an online, downloadable genome browser hosted by the University of California, Santa Cruz (UCSC). It is an interactive website offering access to genome sequence data from a variety of vertebrate and invertebrate species and major model organisms, integrated with a large collection of aligned annotations. UCSC also provides its own downloadable annotation databases. We use the simpleRepeat table, which lists tandem repeats found by Tandem Repeats Finder, to detect whether a variant lies in a repetitive region of the genome and to assign the related ACMG criteria.
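A minimal sketch of a repeat-region lookup over simpleRepeat rows. It assumes the standard UCSC table dump, where columns 2 to 4 are `chrom`, `chromStart`, and `chromEnd` in 0-based, half-open coordinates, and it converts a 1-based VCF position before searching. Overlapping repeats are merged so each query is a single binary search. Which ACMG criteria consume this flag (for example, PM4 and BP3, which distinguish in-frame indels in non-repeat and repetitive regions) is an assumption here; the rows are toy data.

```python
import bisect


def load_repeats(lines):
    """Build per-chromosome sorted, merged (starts, ends) lists from simpleRepeat rows."""
    raw = {}
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        chrom, start, end = cols[1], int(cols[2]), int(cols[3])
        raw.setdefault(chrom, []).append((start, end))
    merged = {}
    for chrom, intervals in raw.items():
        intervals.sort()
        out = []
        for start, end in intervals:
            if out and start <= out[-1][1]:
                # Overlapping or touching half-open intervals: extend the previous one.
                out[-1] = (out[-1][0], max(out[-1][1], end))
            else:
                out.append((start, end))
        merged[chrom] = ([s for s, _ in out], [e for _, e in out])
    return merged


def in_repeat(merged, chrom, pos_1based):
    """True if a 1-based position falls inside any merged repeat interval."""
    if chrom not in merged:
        return False
    starts, ends = merged[chrom]
    p = pos_1based - 1  # convert to 0-based
    i = bisect.bisect_right(starts, p) - 1  # last interval starting at or before p
    return i >= 0 and p < ends[i]


# Toy rows: bin, chrom, chromStart, chromEnd, name (remaining columns omitted).
rows = ["0\tchr1\t100\t200\ttrf", "0\tchr1\t150\t300\ttrf", "0\tchr1\t500\t600\ttrf"]
repeats = load_repeats(rows)
print(in_repeat(repeats, "chr1", 101))  # True
```

For indels, a fuller check would test whether the affected span overlaps a repeat rather than only its first base.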