Data Curation
Is the data in OncoKB™ manually curated?
Yes. All data in OncoKB is manually curated by members of the Scientific Content Management Team (SCMT) and data curators, both under the leadership of the OncoKB Lead Scientist. Manually curated data includes:
Gene assignment as an oncogene and/or tumor suppressor
Gene Background
Variant Oncogenic Effect
Variant Biological Effect
Variant Drug Sensitivity and Resistance (utilizing the OncoKB Levels of Evidence)
For information about the primary data sources we use to identify and curate cancer variants and their biological and therapeutic implications, please refer to Section I.C of the OncoKB Curation Standard Operating Procedure v4.
Does OncoKB™ use any automated methods to predict variant effect or drug sensitivity?
All variants in OncoKB are manually researched by a member of the Scientific Content Management Team (SCMT) to determine their oncogenic and biological effect, as well as drug sensitivity (if any). The only automated method of prediction that OncoKB uses relates to assigning the oncogenic effect of variants from cancerhotspots.org. Even so, each variant identified as a hotspot on this website is researched and reviewed by an SCMT member. Per Chapter 1: Sub-Protocol 2.5: Assertion of the oncogenic effect of a VPS of the OncoKB Standard Operating Procedure v4, variants with supporting scientific literature are classified as “Oncogenic”, and variants that are considered hotspots based purely on statistical recurrence per Chang et al., 2018 are classified as “Likely Oncogenic”. The Cancer Hotspots website has a static list of variants based on the 2018 publication, and OncoKB’s curation of cancer hotspots is based on this list.
Who curates OncoKB™ data and what is their educational background?
The OncoKB staff consists of the following:
The OncoKB Lead Scientist (Ph.D)
The Lead Scientist, Knowledge Systems (Ph.D)
The Scientific Content Management Team (SCMT), which includes two senior scientists (Ph.Ds) and three scientific writers/editors (ranging from Bachelor's-level to Ph.D-level scientists)
The Lead Software Engineer (MS)
Software Engineers
The Data and Software Liaison (MS)
For more information about the OncoKB Staff, please refer to Section I.B of the OncoKB Curation Standard Operating Procedure v4.
What is the source of the variant annotations (e.g., an in-house database or automatic aggregation of public domain databases), and which databases are included?
Four primary data sources are used to identify and curate cancer variants and their biological and therapeutic implications:
Public cancer variant databases of alterations identified in tumor sequencing studies, e.g., cBioPortal
Statistically significant and recurrent variants identified based on 24,592 sequenced tumors using methods described in Chang et al., 2018
Disease-specific treatment guidelines such as those provided by the National Comprehensive Cancer Network (NCCN) and proceedings of major scientific and/or clinical conferences such as those of the American Society of Clinical Oncology (ASCO) and the American Association for Cancer Research (AACR)
General scientific literature, accessed through PubMed
For more information about the external databases we use as references for curation, please refer to Section I.C of the OncoKB Curation Standard Operating Procedure v4.
What are the criteria for defining a gene as an oncogene or tumor suppressor?
Please refer to Chapter 1: Table 1.3: Assertion of the function of a cancer gene of the OncoKB Standard Operating Procedure v4 for a detailed protocol on the criteria we use to categorize a gene as an oncogene and/or tumor suppressor.
What are the criteria for defining the oncogenic effect (oncogenic, likely oncogenic, likely neutral, inconclusive) of a variant?
Please refer to Chapter 1: Sub-Protocol 2.5: Assertion of the oncogenic effect of a VPS of the OncoKB Standard Operating Procedure v4 for a detailed protocol on the criteria we use to define the oncogenic effect of a variant.
What are the criteria for defining the biological effect (gain/loss/switch of function, likely gain/loss/switch of function, neutral, inconclusive) of a variant?
Please refer to Chapter 1: Sub-Protocol 2.4: Assertion of the biological effect of a VPS of the OncoKB Standard Operating Procedure v4 for a detailed protocol on the criteria we use to define the biological effect of a variant.
Where can I find the Gene ID / RefSeq for all genes in OncoKB™?
The OncoKB Cancer Genes Page contains a downloadable file (Cancer Gene List) that includes the Entrez Gene ID and RefSeq for all genes included in OncoKB. Additionally, at the top of every Gene Page, the Entrez Gene ID and RefSeq for that gene are displayed.
What are the rules for mutation syntax in OncoKB™?
OncoKB uses standardized syntax for naming different mutation types, such as missense mutations (mis), duplications (dup) and deletions (del). For more information about OncoKB mutation syntax, please refer to Chapter 6: Table 3.1: OncoKB alteration nomenclature, style and formatting of the OncoKB Standard Operating Procedure v4.
What cancer type ontology is used in OncoKB™?
We use OncoTree as our ontology. OncoTree provides mapping to NCI Thesaurus and UMLS, and UMLS includes SNOMED CT as one of its sources. For more information, please refer to Chapter 1: Protocol 3: Tumor type assignment of the OncoKB Standard Operating Procedure v4.
Do “delins” alterations map to deletions or insertions?
“delins” alterations are in-frame alterations that map to either “insertions” or “deletions” depending on whether the alteration adds or removes amino acids overall. For example, V600_K601delinsE replaces two residues with one and is interpreted as an in-frame deletion, while R435_K436delinsKKR replaces two residues with three and is interpreted as an in-frame insertion.
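The mapping described above can be sketched in a few lines of Python. This is an illustrative sketch only, not OncoKB's implementation: the function name `classify_delins` and the regular expression are hypothetical, and the handling of a delins that leaves the length unchanged is an assumption, since this FAQ does not state how that case is treated.

```python
import re

# Hypothetical pattern for protein-level delins strings such as
# "V600_K601delinsE" (range) or "A10delinsGT" (single residue).
_DELINS = re.compile(r"^([A-Z])(\d+)(?:_([A-Z])(\d+))?delins([A-Z]+)$")


def classify_delins(alteration: str) -> str:
    """Classify a delins by comparing residues removed with residues inserted."""
    match = _DELINS.match(alteration)
    if match is None:
        raise ValueError(f"not a recognized delins alteration: {alteration!r}")

    start = int(match.group(2))
    # A single-residue delins has no end position, so the range is one residue.
    end = int(match.group(4)) if match.group(4) else start
    if end < start:
        raise ValueError(f"end position precedes start in {alteration!r}")

    removed = end - start + 1
    inserted = len(match.group(5))

    if inserted < removed:
        return "in-frame deletion"
    if inserted > removed:
        return "in-frame insertion"
    # Assumption: equal lengths are not covered by the FAQ text.
    return "undetermined"


print(classify_delins("V600_K601delinsE"))    # two residues replaced by one
print(classify_delins("R435_K436delinsKKR"))  # two residues replaced by three
```

Both examples from the answer above fall out of the length comparison: the first removes more residues than it inserts, and the second inserts more than it removes.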
Some genes have an alteration called “Oncogenic Mutations.” What does this mean?
“Oncogenic Mutations” is used when there is tumor-specific information (e.g., a therapeutic implication) that applies to ALL functional (oncogenic/likely oncogenic) alterations of a gene. Please note that if a gene has “Amplification” curated as “Oncogenic” or “Likely Oncogenic”, this alteration will NOT be associated with the tumor-type specific information under “Oncogenic Mutations.”
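As a rough illustration of the rule above, the following Python sketch checks whether a curated alteration would inherit tumor-type specific information attached to “Oncogenic Mutations”. The function name and the string labels are hypothetical and chosen for this example; only the two conditions (functional effect, and the “Amplification” exclusion) come from the answer above.

```python
# Effects that count as "functional" according to the answer above.
FUNCTIONAL_EFFECTS = {"Oncogenic", "Likely Oncogenic"}


def inherits_oncogenic_mutations(alteration: str, oncogenic_effect: str) -> bool:
    """Return True if the alteration is covered by "Oncogenic Mutations"."""
    # Amplification is explicitly excluded, even when curated as
    # Oncogenic or Likely Oncogenic.
    if alteration == "Amplification":
        return False
    return oncogenic_effect in FUNCTIONAL_EFFECTS


print(inherits_oncogenic_mutations("V600E", "Oncogenic"))
print(inherits_oncogenic_mutations("Amplification", "Likely Oncogenic"))
```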
How does OncoKB™ handle atypical variants such as EGFR vIII?
Alterations that do not follow the pre-specified OncoKB nomenclature are curated and hard-coded in the system so that their annotations can be retrieved correctly. Please refer to Chapter 6: Table 3.1: OncoKB alteration nomenclature, style and formatting of the OncoKB Standard Operating Procedure v4. Examples of such alterations include:
FLT3: internal tandem duplication
EGFR: vIII
EGFR: Kinase domain duplication
EGFR: C-terminal domain
What does “Switch-of-function” mean?
Mutations that are classified as “switch-of-function” have evidence-based data demonstrating that the alteration causes the protein to acquire a new function, such as the neomorphic ability of the IDH1 R132H-mutant protein to produce D-2-hydroxyglutarate.
How can I distinguish between a VUS and a variant that was not reviewed by the OncoKB™ team?
A VUS is a variant that has been investigated by a member of the OncoKB team and for which no known data was identified. These variants will be represented with a hollow, dark grey circular icon and will include the sentence: “As of [date], there was no available functional data about the [variant] mutation”. Variants that have not been investigated by the OncoKB team will be represented by a hollow, light grey circular icon and will include the sentence: “The [variant] has not specifically been reviewed by the OncoKB team, and its oncogenic function is considered unknown.”
Mutations in the RAS genes (HRAS, KRAS, NRAS) are curated as “gain-of-function” when other knowledgebases classify them as “loss-of-function”. Why?
Most oncogenic RAS mutations do indeed cause the RAS proteins to lose their GTPase catalytic activity. However, these mutations lock HRAS, KRAS and NRAS in their constitutively active form, which is associated with increased downstream pathway activity. For this reason, they are classified in OncoKB as gain-of-function mutations.
Do all therapeutics propagate as Level 3B in other indications?
No. While most associations in a specified cancer type that are OncoKB Levels 1, 2 or 3A will propagate as Level 3B in other cancer types, there are several exceptions to this rule. These include but are not limited to:
1. Therapeutic levels 1, 2 or 3A and associated drugs in solid cancer types do not propagate to hematologic cancers, or vice versa.
2. Levels for resistance (R1 and R2) and associated drugs do not propagate to other cancer types.
3. There are several therapeutic regimens in our system that are tissue-specific and would not be an appropriate recommendation in other cancer types (e.g. Selumetinib + Iodine I 131-6-Beta-Iodomethyl-19-Norcholesterol in NRAS mutant thyroid cancer).
4. Level 4 alterations do not propagate to other indications.
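The four exceptions listed above can be expressed as a small decision function. This is a simplified sketch under stated assumptions, not OncoKB's actual logic: the function name, its parameters and the boolean `tissue_specific` flag are hypothetical, and because the list above is described as non-exhaustive, a real system would have additional cases.

```python
from typing import Optional

# Levels that can propagate as Level 3B to other cancer types.
PROPAGATING_LEVELS = {"1", "2", "3A"}


def propagated_level(
    level: str,
    source_is_heme: bool,
    target_is_heme: bool,
    tissue_specific: bool = False,
) -> Optional[str]:
    """Return "3B" if the association propagates to the target cancer type, else None."""
    # Exceptions 2 and 4: resistance levels (R1, R2) and Level 4 never propagate.
    if level not in PROPAGATING_LEVELS:
        return None
    # Exception 1: no propagation between solid and hematologic cancers.
    if source_is_heme != target_is_heme:
        return None
    # Exception 3: tissue-specific regimens stay in their original cancer type.
    if tissue_specific:
        return None
    return "3B"


print(propagated_level("1", source_is_heme=False, target_is_heme=False))
print(propagated_level("R1", source_is_heme=False, target_is_heme=False))
```

Modeling tissue specificity as a flag set by a curator reflects that exception 3 is a judgment about the regimen rather than something derivable from the level alone.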
ClinVar calls a specific variant benign, but you call it likely oncogenic. Why?
ClinVar interprets the pathogenicity of an alteration in the germline context, while OncoKB currently only contains information for variants in the somatic context. While a variant might not predispose someone to familial cancer when found in the germline (thus being called benign, inconclusive or VUS in ClinVar), it is possible that it may be oncogenic in the somatic context.
Why is this specific hotspot a VUS but called likely oncogenic?
Mutations that occur at hotspots as per Chang et al. are considered likely oncogenic based on the statistical significance of their recurrence in cancer. Functional characterization may not exist for such a variant, which would otherwise make it a variant of unknown significance (VUS); however, because it occurs at a statistically significant hotspot, it is annotated as likely oncogenic. Functional characterization always supersedes this designation, so some hotspot mutations may be called likely neutral or inconclusive when functional studies demonstrate that they are not oncogenic in vitro or in vivo.
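The precedence described above (functional characterization first, hotspot status second) can be sketched as follows. This is a minimal illustration, not the OncoKB curation protocol: the function name and arguments are hypothetical, and the "Unknown" fallback for a variant with neither functional data nor hotspot status is an assumption made for this example.

```python
from typing import Optional


def assign_oncogenic_effect(functional_call: Optional[str], is_hotspot: bool) -> str:
    """Pick an oncogenic effect, letting functional data override hotspot status."""
    # Functional characterization always supersedes the hotspot designation.
    if functional_call is not None:
        return functional_call
    # No functional data: statistical recurrence alone gives "Likely Oncogenic".
    if is_hotspot:
        return "Likely Oncogenic"
    # Assumption: neither functional data nor hotspot status is available.
    return "Unknown"


print(assign_oncogenic_effect(None, is_hotspot=True))
print(assign_oncogenic_effect("Likely Neutral", is_hotspot=True))
```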