Data Curation
Yes. All data in OncoKB is manually curated by members of the Scientific Content Management Team (SCMT) and data curators, both under the leadership of the OncoKB Lead Scientist. Manually curated data includes:
Gene assignment as an oncogene and/or tumor suppressor
Gene Background
Variant Oncogenic Effect
Variant Biological Effect
Variant Drug Sensitivity and Resistance (utilizing the )
For information about the primary data sources we use to identify and curate cancer variants and their biological and therapeutic implications, please refer to .
All variants in OncoKB are manually researched by a member of the Scientific Content Management Team (SCMT) to determine their oncogenic and biological effects, as well as any drug sensitivity. The only automated prediction method that OncoKB uses is the assignment of oncogenic effect to variants from . Each variant identified as a hotspot on this website is researched and reviewed by an SCMT member. Per , variants with supporting scientific literature are classified as “Oncogenic”, and variants that are considered hotspots based purely on statistical recurrence per are classified as “Likely Oncogenic”. The Cancer Hotspots website has a static list of variants based on the 2018 publication, and OncoKB’s curation of cancer hotspots is based on this list.
The OncoKB staff consists of the following:
The OncoKB Lead Scientist (Ph.D.)
The Lead Scientist, Knowledge Systems (Ph.D.)
The Scientific Content Management Team (SCMT), which includes two senior scientists (Ph.D.s) and three scientific writers/editors (ranging from Bachelor's-level to Ph.D.-level scientists)
The Lead Software Engineer (MS)
Software Engineers
The Data and Software Liaison (MS)
Four primary data sources are used to identify and curate cancer variants and their biological and therapeutic implications:
Disease-specific treatment guidelines, such as those provided by the National Comprehensive Cancer Network (NCCN), and proceedings of major scientific and/or clinical conferences, such as those of the American Society of Clinical Oncology (ASCO) and the American Association for Cancer Research (AACR)
General scientific literature, accessed through PubMed
“delins” alterations are in-frame alterations that are mapped to either “insertions” or “deletions” depending on whether more or fewer amino acids are inserted than are deleted. For example, V600_K601delinsE (two residues deleted, one inserted) is interpreted as an in-frame deletion, while R435_K436delinsKKR (two residues deleted, three inserted) is interpreted as an in-frame insertion.
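The mapping rule above can be sketched as a comparison of deleted versus inserted residue counts. This is a minimal illustration, not OncoKB's own code: the regular expression, the function name classify_delins, and the "unresolved" result for equal counts (a case the rule above does not address) are all assumptions.

```python
import re

# Matches single-letter protein notation such as "V600_K601delinsE" or
# "V600delinsYM" (no "p." prefix). Illustrative assumption, not OncoKB's parser.
DELINS_RE = re.compile(r"([A-Z])(\d+)(?:_([A-Z])(\d+))?delins([A-Z]+)")


def classify_delins(alteration: str) -> str:
    """Map a delins alteration to an in-frame insertion or deletion by
    comparing how many residues are deleted with how many are inserted."""
    match = DELINS_RE.fullmatch(alteration)
    if match is None:
        raise ValueError(f"not a delins alteration: {alteration!r}")
    start = int(match.group(2))
    # A single-residue delins has no end position, so start == end.
    end = int(match.group(4)) if match.group(4) else start
    deleted = end - start + 1
    inserted = len(match.group(5))
    if inserted < deleted:
        return "in-frame deletion"
    if inserted > deleted:
        return "in-frame insertion"
    # Equal counts are not covered by the rule in the text; left unresolved here.
    return "unresolved"


print(classify_delins("V600_K601delinsE"))    # in-frame deletion
print(classify_delins("R435_K436delinsKKR"))  # in-frame insertion
```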
“Oncogenic Mutations” is used when there is tumor type-specific information (e.g., a therapeutic implication) that applies to ALL functional (oncogenic or likely oncogenic) alterations of a gene. Note that if a gene has “Amplification” curated as “Oncogenic” or “Likely Oncogenic”, this alteration will NOT be associated with the tumor type-specific information listed under “Oncogenic Mutations.”
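As a rough sketch of the rule above, the check below decides whether an alteration would inherit information curated under “Oncogenic Mutations”. The function name and the string labels are hypothetical; only the two conditions (a functional oncogenicity call, and the Amplification exclusion) come from the text.

```python
# Oncogenicity calls treated as "functional" for the "Oncogenic Mutations" grouping.
FUNCTIONAL_CALLS = {"Oncogenic", "Likely Oncogenic"}


def covered_by_oncogenic_mutations(alteration: str, oncogenicity: str) -> bool:
    """Return True if tumor type-specific information curated under
    "Oncogenic Mutations" would apply to this alteration."""
    if alteration == "Amplification":
        # Excluded even when Amplification itself is curated as (likely) oncogenic.
        return False
    return oncogenicity in FUNCTIONAL_CALLS


print(covered_by_oncogenic_mutations("V600E", "Oncogenic"))          # True
print(covered_by_oncogenic_mutations("Amplification", "Oncogenic"))  # False
```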
FLT3: internal tandem duplication
EGFR: vIII
EGFR: Kinase domain duplication
EGFR: C-terminal domain
Mutations that are classified as “switch-of-function” have evidence-based data demonstrating that the alteration causes the protein to acquire a new function, such as the neomorphic ability of the IDH1 R132H-mutant protein to produce D-2-hydroxyglutarate.
A VUS is a variant that has been investigated by a member of the OncoKB team and for which no functional data were identified. These variants are represented with a hollow, dark grey circular icon and include the sentence: “As of [date], there was no available functional data about the [variant] mutation”. Variants that have not been investigated by the OncoKB team are represented by a hollow, light grey circular icon and include the sentence: “The [variant] has not specifically been reviewed by the OncoKB team, and its oncogenic function is considered unknown.”
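The two display cases above can be summarized as a small lookup keyed on whether the variant was reviewed. This is a hypothetical sketch: the function name, the icon descriptions as plain strings, and the ISO date format are assumptions, since the actual date formatting used on the site is not stated here.

```python
from datetime import date
from typing import Optional, Tuple


def unknown_effect_annotation(variant: str, reviewed_on: Optional[date]) -> Tuple[str, str]:
    """Return (icon, summary sentence) for a variant with no known oncogenic effect."""
    if reviewed_on is not None:
        # Investigated by the OncoKB team, but no functional data found: a VUS.
        return (
            "hollow dark grey circle",
            f"As of {reviewed_on.isoformat()}, there was no available functional "
            f"data about the {variant} mutation",
        )
    # Never investigated by the OncoKB team.
    return (
        "hollow light grey circle",
        f"The {variant} has not specifically been reviewed by the OncoKB team, "
        "and its oncogenic function is considered unknown.",
    )


# "GENE X123Y" is a placeholder variant name.
print(unknown_effect_annotation("GENE X123Y", None)[0])  # hollow light grey circle
```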
Most oncogenic RAS mutations do indeed cause the RAS proteins to lose their GTPase catalytic activity. However, because these mutations lock HRAS, KRAS and NRAS in their constitutively active, GTP-bound form, which increases downstream pathway activity, OncoKB classifies them as gain-of-function mutations.
No. While most associations in a specified cancer type that are OncoKB Levels 1, 2 or 3A will propagate as Level 3B in other cancer types, there are several exceptions to this rule. These include but are not limited to:
1. Therapeutic levels 1, 2 or 3A and their associated drugs in solid cancer types do not propagate to hematologic cancers, and vice versa.
2. Levels for resistance (R1 and R2) and associated drugs do not propagate to other cancer types.
3. There are several therapeutic regimens in our system that are tissue-specific and would not be an appropriate recommendation in other cancer types (e.g., Selumetinib + Iodine I 131-6-Beta-Iodomethyl-19-Norcholesterol in NRAS-mutant thyroid cancer).
4. Level 4 alterations do not propagate to other indications.
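The propagation rule and the four listed exceptions can be sketched as a single function. Because the text says the exceptions "include but are not limited to" these four, this sketch is not exhaustive. The Association fields, including the tissue_specific flag standing in for exception 3, are hypothetical, and treating levels not mentioned in the text (such as 3B) as non-propagating is an assumption.

```python
from dataclasses import dataclass
from typing import Optional

# Levels that propagate (as Level 3B) to other cancer types, per the rule above.
PROPAGATING_LEVELS = {"1", "2", "3A"}


@dataclass
class Association:
    level: str             # e.g. "1", "2", "3A", "4", "R1", "R2"
    is_hematologic: bool   # whether the curated cancer type is hematologic
    tissue_specific: bool  # hypothetical flag for regimens like exception 3


def propagated_level(assoc: Association, target_is_hematologic: bool) -> Optional[str]:
    """Return the level an association would carry in another cancer type,
    or None if it does not propagate."""
    if assoc.level not in PROPAGATING_LEVELS:
        # R1/R2 (exception 2), Level 4 (exception 4) and any other level
        # are not propagated in this sketch.
        return None
    if assoc.is_hematologic != target_is_hematologic:
        # Exception 1: no propagation between solid and hematologic cancers.
        return None
    if assoc.tissue_specific:
        # Exception 3: tissue-specific regimens stay in their curated cancer type.
        return None
    return "3B"


print(propagated_level(Association("1", False, False), False))  # 3B
print(propagated_level(Association("1", False, False), True))   # None
```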
ClinVar interprets the pathogenicity of an alteration in the germline context, while OncoKB currently contains information only for variants in the somatic context. A variant that does not predispose someone to familial cancer when found in the germline (and is therefore called benign, inconclusive or VUS in ClinVar) may still be oncogenic in the somatic context.
For more information about the OncoKB Staff, please refer to.
Public cancer variant databases of alterations identified in tumor sequencing studies, e.g.,
Statistically significant and recurrent variants identified based on 24,592 sequenced tumors using methods described in
For more information about the external databases we use as references for curation, please refer to .
Please refer to for a detailed protocol on the criteria we use to categorize a gene as an oncogene and/or tumor suppressor.
Please refer to for a detailed protocol on the criteria we use to define the oncogenic effect of a variant.
Please refer to for a detailed protocol on the criteria we use to define the biological effect of a variant.
OncoKB provides a downloadable file (Cancer Gene List) that includes the Entrez Gene ID and RefSeq for all genes in OncoKB. In addition, the Entrez Gene ID and RefSeq for each gene are displayed at the top of its Gene Page.
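A minimal sketch of looking up identifiers in a tab-separated copy of the Cancer Gene List is shown below. The column names ("Hugo Symbol", "Entrez Gene ID", "RefSeq") and the two sample rows are assumptions for illustration only; check the header row of the actual download before relying on them.

```python
import csv
import io
from typing import Dict, Tuple

# Stand-in for the downloaded file; column names are assumed, not confirmed.
SAMPLE_TSV = (
    "Hugo Symbol\tEntrez Gene ID\tRefSeq\n"
    "BRAF\t673\tNM_004333\n"
    "EGFR\t1956\tNM_005228\n"
)


def load_gene_ids(tsv_text: str) -> Dict[str, Tuple[str, str]]:
    """Map each Hugo symbol to its (Entrez Gene ID, RefSeq) pair."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return {row["Hugo Symbol"]: (row["Entrez Gene ID"], row["RefSeq"]) for row in reader}


genes = load_gene_ids(SAMPLE_TSV)
print(genes["BRAF"])  # ('673', 'NM_004333')
```

For a real download, the same function can be applied to the file's text after confirming the column names.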
OncoKB uses standardized syntax for naming different mutation types, including missense mutations (mis), duplications (dup), deletions (del) etc. For more information about OncoKB mutation syntax, please refer to .
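To illustrate how such syntax lets an alteration string be categorized, the sketch below matches simplified patterns for single-letter protein changes without a "p." prefix. Only the mis, dup and del suffixes are named above; the insertion, delins, frameshift ("fs") and nonsense ("*") patterns follow general HGVS-style conventions and are assumptions, not a description of OncoKB's full syntax.

```python
import re

# Simplified, hypothetical patterns for protein-change strings.
PATTERNS = [
    ("delins", re.compile(r"[A-Z]\d+(_[A-Z]\d+)?delins[A-Z]+")),
    ("duplication", re.compile(r"[A-Z]\d+(_[A-Z]\d+)?dup")),
    ("deletion", re.compile(r"[A-Z]\d+(_[A-Z]\d+)?del")),
    ("insertion", re.compile(r"[A-Z]\d+_[A-Z]\d+ins[A-Z]+")),
    ("frameshift", re.compile(r"[A-Z]\d+[A-Z]?fs(\*\d+)?")),
    ("nonsense", re.compile(r"[A-Z]\d+\*")),
    ("missense", re.compile(r"[A-Z]\d+[A-Z]")),
]


def mutation_type(alteration: str) -> str:
    """Return the first category whose pattern matches the whole string."""
    for label, pattern in PATTERNS:
        if pattern.fullmatch(alteration):
            return label
    # Named alterations (e.g. "Amplification") fall outside this syntax.
    return "unrecognized"


print(mutation_type("V600E"))         # missense
print(mutation_type("E746_A750del"))  # deletion
```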
We use as our ontology. OncoTree provides mapping to NCI Thesaurus and . UMLS includes as its source. Please refer to .
Alterations that do not follow the pre-specified OncoKB nomenclature are curated and hard-coded in the system so that they are annotated correctly. Please refer to . Examples of such alterations include:
Mutations that occur at hotspots as per are considered likely oncogenic based on the statistical significance of their recurrence in cancer. Consequently, a variant that lacks functional characterization, and would otherwise be a variant of unknown significance (VUS), is annotated as likely oncogenic if it occurs at a statistically significant hotspot. However, functional characterization always supersedes this designation, so some hotspot mutations may be classified as likely neutral or inconclusive when functional studies in vitro or in vivo do not support an oncogenic role.
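The order of precedence described above can be sketched as follows. This is a simplified, hypothetical model: functional_call stands in for whatever classification functional evidence supports (for example "Oncogenic", "Likely Neutral" or "Inconclusive"), and the fallback to "VUS" assumes the variant has already been investigated by the OncoKB team.

```python
from typing import Optional


def classify_variant(functional_call: Optional[str], at_hotspot: bool) -> str:
    """Apply the precedence: functional evidence first, then hotspot status."""
    if functional_call is not None:
        # Functional characterization always supersedes the hotspot designation.
        return functional_call
    if at_hotspot:
        # Statistical recurrence alone supports "Likely Oncogenic".
        return "Likely Oncogenic"
    # Investigated, no functional data, and not at a hotspot.
    return "VUS"


print(classify_variant(None, True))              # Likely Oncogenic
print(classify_variant("Likely Neutral", True))  # Likely Neutral
```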