- Method
- Open access
- Published:
Curating genomic disease-gene relationships with Gene2Phenotype (G2P)
Genome Medicine volume 16, Article number: 127 (2024)
Abstract
Genetically determined disorders are highly heterogenous in clinical presentation and underlying molecular mechanism. The evidence underpinning these conditions in the peer-reviewed literature requires robust critical evaluation for diagnostic use. Here, we present a structured curation process for Gene2Phenotype (G2P). This draws on multiple lines of clinical, bioinformatic and functional evidence. The process utilises and extends existing terminologies, allows for precise definition of the molecular basis of disease, and confidence levels to be attributed to a given gene-disease assertion. In-depth disease curation using this process will prove useful in applications including in diagnostics, research and development of targeted therapeutics. G2P: www.ebi.ac.uk/gene2phenotype.
Background
Genomic sequencing is widely used in the diagnosis of rare genetically determined disorders. It is relatively straightforward and cost-effective to generate large volumes of data from genome or exome sequencing. However, sampling of a large genomic footprint inevitably results in a number of candidate disease-associated variants passing initial filtering steps. It is therefore important to develop automated strategies to reduce false negative and false positive results, in order to optimise use of clinician and scientist time, and focus attention on variants with the highest likelihood of being clinically significant. The Gene2Phenotype (G2P) project [1] was developed to enable high-throughput filtering of variant calls and prioritisation of likely clinically relevant variants [2]. G2P has been successfully used in a number of diagnostic clinical and research applications, for example, the DDD (Deciphering Developmental Disorders) Study [3], assessment of disorders of the eye [4] and inherited cardiac disorders [5].
G2P defines monogenic gene-disease associations through Locus-Genotype-Mechanism-Disease-Evidence (LGMDE) threads [2]. This allows for precise definition of the clinical phenotype and molecular basis of a given condition. G2P was developed in 2012, primarily as a database of all known loci associated with Developmental Disorders (DDG2P). DDG2P has more than tripled in size over the last decade, now covering over 2500 loci [6]. The system was designed to be generalizable across disease domains and has now expanded to include cancer, cardiac, eye, skeletal and skin disorders (Fig. 1). Each panel is freely downloadable [1], comprehensive and actively curated by expert curators. G2P entries can be present in multiple panels for convenience.
The number of gene-disease association LGMDE threads in G2P, plotted by confidence level and panel [1]. Developmental disorders panel: 1477 definitive entries, 727 strong entries, 158 moderate entries, 375 limited entries. Downloaded 14th October 2024
Each G2P entry is curated by clinical and scientific experts through a detailed assessment of publications from the peer-reviewed literature. For newly described gene-disease associations, manuscripts are identified through a monthly manual search of relevant journals. For DDG2P, these include the American Journal of Human Genetics, Nature Genetics, Journal of Medical Genetics, Genetics in Medicine, Human Mutation, European Journal of Human Genetics, Clinical Genetics and the American Journal of Medical Genetics Part A. All abstracts per issue are reviewed. Case reports/case series containing detailed human phenotypic data are prioritised. This is an important process given the significant variation in evidence available for different gene-disease assertions. A curated confidence level for this assertion is assigned, to enable prioritisation of clinically relevant diagnostic variants.
Here, we present a newly defined G2P curation process with associated data structures. We aim to update the methods used to create and update gene-disease associations as defined within the database. This is to include more granular mechanistic data, as well as standardise the approach to curation.
Methods
The previous curation system for G2P relied on experienced clinical geneticists and scientists making expert judgements to form LGMDE threads. This was based on careful review of the primary literature and manual population of the data fields contained within G2P [2]. The process was not standardised. An expansion of the DDG2P curation team prompted a comprehensive review of this system. A standardised template was introduced to ensure curators followed the same process. We also sought to optimise interoperability with other international resources, for example by use of terminology compatible with the Gene Curation Coalition (GenCC) standards [7]. Finally, the depth of granular molecular and mechanistic data recorded per disease was increased, to allow more precise disease definition. This reflects a greater understanding of the molecular basis of genetic disorders since G2P was developed and is aimed at improving the utility of this resource.
We also present a formalised curation template (Additional file 1) with accompanying SOP (standard operating procedure) (Additional file 2) as a resource for the genomics community.
Input was then sought at a 1-day workshop where a wide range of expertise was represented, including academic experts in clinical genetics and genomics, biomedical informatics, computational protein biology, and genome annotation. Attendees, in addition to the authors, are listed in the acknowledgements section below. The workshop consisted of round table discussions on topics including standardising terminology, defining molecular disease mechanism and disease naming. Edge cases and curation examples were included, for example how to address pleiotropy, multiple allelic requirements per disorder and variants in regulatory regions. An online version of the curation template was updated live during the workshop, and attendees also added comments afterwards. All comments were added to the template for active curation where possible. The remainder were aimed at future-proofing the curation process, for example, by utilising automated data pulls from online resources such as PubMed and MaveDB [8, 9].
This curation process results in a focused, in-depth assessment of the evidence underlying the association of a disease with a particular genomic locus. Phenotypic and molecular data is analysed in a consistent, sequential manner following the curation template. This populates the LGMDE threads as well as allowing for a confidence assertion to be made.
Results
LGMDE threads
The LGMDE thread is described here in detail, including definitions, terminology, data sources, and utilisation in the curation process.
Locus
The locus is typically a gene or can be a genomic interval (defined as chromosome:genomic coordinates for a given reference genome assembly). For genes, the HUGO Gene Nomenclature Committee (HGNC) [10] symbol is used for ease of reference. The symbol is mapped to the relevant stable numerical HGNC ID.
Genotype
Allelic requirement
Standardised Human Phenotype Ontology (HPO) [11] allelic requirement terms are used. These have corresponding Mendelian inheritance terms, for example, monoallelic_autosomal—Autosomal dominant—HP:0000006. In general, genes which have been associated with multiple allelic requirements for a given disease require separate G2P entries. This would apply, for example, to monoallelic_autosomal and biallelic_autosomal disorders. X-linked conditions which are usually penetrant in males and recessive in females may be recorded as monoallelic_X_hemizygous. X-linked disease where heterozygous females and hemizygous males have similar phenotypes, for example in relation to SHOX and SMC1A variants, are recorded as monoallelic_X_heterozygous. However, it is recognized that this distinction may be difficult in practice, and separate entries can be used after discussion by the curation group.
Cross-cutting modifier
Optional cross-cutting modifiers give extra information for a gene-disease relationship. The terms used largely correspond to children of “Inheritance qualifier” (HP:0034335), for example, “Typically de novo” (HP:0025352). A subset of these HPO terms is included in the curation template. These have been chosen to focus curation on collection of data most relevant to diagnostic filtering.
Two additional cross-cutting modifiers not representing inheritance information are defined. The first is for alerting the user to potential secondary findings (including American College of Medical Genetics Secondary Findings [12] and/or late onset conditions). The second modifier is for a “Restricted Variant Set”. This may include, for example, where diseases are associated with a single recurrent variant, or variants only found in a particular protein domain. Further information regarding the specific variant set is recorded elsewhere in the curation data.
Types of variants reported
A comprehensive list of Sequence Ontology (SO) [13, 14] terms is used to record the types of variants reported in association with a disease entity. For example, “frameshift_variant” (SO:0001589) (13). Additional information is recorded for each variant type, including if it is reported as de novo and/or inherited and/or of unknown inheritance, whether it is predicted to escape or trigger nonsense-mediated decay (NMD) and gene domain/genomic region. This is important for filtering genomic data, as well as in defining the mechanism of disease.
Protein view
A snapshot of the DECIPHER [15] protein view for the relevant gene is included (Fig. 2). DECIPHER is a web platform developed to enable the annotation and sharing of anonymised phenotype-linked variants. It integrates essential genomics resources and provides visualisations and interactive tools to facilitate variant interpretation.
DECIPHER protein view for the gene REST [15]. Red (likely loss of function) filled variants in the ClinVar variants track, on the left side of the image, are annotated as pathogenic (squares)/likely pathogenic (triangles). These are associated with predisposition to Wilms tumour [16]. The Predicted NMD Escape track is not filled in this region, indicating these variants are likely to result in NMD. Red/orange areas show areas of Regional Missense Constraint in the corresponding track. Missense variants in the DNA binding domain—shown by the green oval—in this region have also been reported in association with Wilms tumour [16]. The filled red triangles/squares on the right of the image in the ClinVar track are associated with gingival fibromatosis [17]. These are in the final exon and hence predicted not to result in NMD as shown in the Predicted NMD Escape track. This is therefore likely to be a different disease mechanism than for Wilms tumour. There are relatively few gnomAD loss of function variants and none of these correspond to the likely pathogenic/pathogenic ClinVar variants. The DECIPHER protein view allows for all these sources of information to be visualised at once. Further data is available through interaction with the web-based interface at www.deciphergenomics.org, which is regularly updated. Data is pulled from multiple bioinformatic resources. NMD, Nonsense Mediated Decay
The DECIPHER protein view for a given gene is an information-dense graphical representation including, for example, exon boundaries, predicted NMD escape regions, conservation levels, missense constraint and protein domains. Annotation includes gnomAD [18] variants from global populations, as well as DECIPHER and ClinVar [19] variants from diagnostic testing (including reported pathogenicity).
The use of the DECIPHER protein view allows curators to easily put individual-level gene-disease information from publications into a wider context, encompassing molecular/gene-level data, disease-associated/diagnostic variation and population-level normal variation (Fig. 2).
The function of the gene as defined by UniProt [20] is also shown in DECIPHER [15, 20]. This is used in the curation process to determine if the proposed gene-disease relationship is likely to be biologically plausible.
DECIPHER [15] is a constantly updated resource. The live website interface is often used in the curation process to interrogate the above data further. However, a snapshot of the protein view used at the time of curation is recorded for reference, to enable comparison with the most recent data at the time of re-curation.
Variant consequence per allele
For a given allelic requirement, SO terms for the variant consequence are used [13, 14]. This may be recorded as inferred or evidence based. Inferred includes, for example, when a variant is predicted to result in nonsense-mediated decay. Evidence based is usually used for biological functional studies such as demonstration of absent protein expression. Computational evidence may also be taken into account, such as modelling of protein structure. Terms for both altered protein (for protein-coding genes) or altered RNA level (for non-protein coding genes) are included. For example, a missense variant may be recorded as altered_gene_product structure (SO:0002318).
As SO terms are part of an ontology, higher level terms may be inferred for a recorded variant consequence. For example, absent_gene_product (SO:0002317) is a child of decreased_gene_product_level (SO:0002316), which itself is a child of altered_gene_product_level (SO:0002314).
Mechanism
Enabling a precise definition of the mechanism of disease to be captured is a crucial function of G2P. This follows the definitions laid out by Backwell and Marsh [21]. The mechanism is initially recorded in broad categories depending on the protein level consequence of variants reported. These include ‘Loss of function’, ‘Dominant negative’ and ‘Gain of function’. There are also categories for ‘Undetermined’ and ‘Undetermined non-loss-of-function’, reflecting the fact that there are often cases where a mechanism is clearly not a loss of function, but where is difficult to distinguish between dominant-negative and gain-of-function effects from available evidence. It is recorded whether these are inferred or from functional evidence. The level of functional evidence is not currently directly quantified e.g. by number of publications. The ‘evidence’ assertion is made by consensus agreement within the curation group, taking into account the quality and quantity of functional data. For example, are the analyses reported biologically relevant to the gene, disease and putative mechanism? Are the results replicated by separate groups?
Synopsis of mechanism
As per Backwell and Marsh [21], the complex effects of disease-associated variants are not fully captured using broad terms such as ‘loss of function’ alone. Therefore, G2P records a more detailed synopsis of the inferred or evidence-based molecular mechanism. For example, ‘Destabilising loss of function’ or ‘Interaction-disrupting loss of function’. There is often insufficient evidence for newly described disorders to record this information. This may be completed when functional studies become available for a particular condition/variant(s).
The process of determining the likely mechanism for a given disorder is complex. Multiple lines of evidence are reviewed, where available. These may include, for example, observations of variant clustering and in vitro/in vivo functional assays. Tools predicting likely mechanism may be used for guiding assessments, although these should not be relied upon alone [22]. MaveDB is interrogated for relevant multiplexed assays of variant effect (MAVEs) [9]. Free text fields are currently used to record analysis of this evidence during the curation process. For example, it is critical to evaluate if a MAVE assay is relevant to the disorder being curated, especially whether the multiplexed assay truly reflects the mechanism of disease in vivo. Hence, we have prioritised curation of mechanism in G2P. Curation discussion may include information such as the functional domains assayed, how well it replicates the mechanism of disease for the stated gene-disease pair, and which tissue/cell-line is relevant.
Disease entity
Clinical phenotype
The reported clinical phenotype is recorded per publication. This includes the number of families/individuals reported, including information on consanguinity and/or ethnicity if relevant. G2P records phenotype data in the form of HPO terms [11], which are standardised and machine readable. However, it is also useful for curation purposes to record descriptive free text regarding the phenotype. For example, the proportion of individuals reported with a given phenotypic feature (i.e. variable expressivity), whether the phenotype is clinically distinctive and/or consistent, and if there is evidence for incomplete penetrance. The clinical phenotype is of crucial importance in determining the confidence level for a given gene-disease association. For example, whilst there is a phenotypic spectrum for any genetic condition, it is usually the case that core features would be shared across virtually all individuals for a newly described disorder. Reported incomplete penetrance is reviewed critically; there may be an alternate genetic or other explanation for the described phenotype. On the other hand, if all reported individuals share a rare phenotypic feature, this puts more weight towards a true gene-disease association. At present in G2P, this is a subjective judgement relying on clinical experience, given the difficulties in quantifying and comparing phenotypes across persons and diseases using standardised terminology [23].
Disease name
Genetic disease naming is a complex topic, in part reflecting the evolution in knowledge from clinical descriptions to molecular diagnosis. Conditions which have been well-defined clinically in the past may be known by an eponymous name such as Noonan syndrome [24]. However, this naming system does not reflect the molecular basis of disease. This is especially true for conditions such as Noonan, where a similar phenotype is now known to result from variants in multiple genes. The dyadic naming system suggested by Biesecker et al. [25] aims to address these issues by including a gene symbol with a phenotypic descriptor—for example, PTPN11-related Noonan syndrome. Ideally, a precise clinically relevant phenotypic descriptor is used. For example, AMOTL1-related orofacial clefting, cardiac anomalies, and tall stature.
G2P records disease names following this dyadic approach. However, the process of naming a condition is not straightforward. We recognise that an international collaborative approach is needed to address this topic, as other curation resources may define diseases differently, or use an alternate naming convention [7]. If a disorder has been named—in a compatible format—by another group, G2P aims to use this name, to enhance standardisation across resources. In some cases, the gene symbol may be added to an existing disease name to maintain the dyadic structure. Mapping to other resources is added, where available, for example to OMIM (Online Mendelian Inheritance in Man) Morbid IDs and Mondo IDs [26, 27]. Disease synonyms from these and other sources are also included.
G2P is updated at least monthly with information from the latest research publications. Many newly defined conditions do not have a recognised disease name. In this case, the curation group agrees on a dyadic name reflecting the most pertinent phenotypic features and plans to submit these for Mondo accessioning to enable reuse.
Agreed confidence category
A confidence attribute is assigned to each G2P entry to indicate the likelihood that the gene-disease association is true. G2P now uses the gene-disease validity terms developed by GenCC [7]. Gene-disease associations curated as ‘Definitive’, ‘Confirmed’ and ‘Moderate’ are used by several groups in clinical reporting, for example in the DDD study [3]. Assertions with the confidence term ‘Limited’ are excluded from clinical reporting; however, variants found in this group may be useful in the research setting and they may be promoted to a higher confidence level as further evidence becomes available. ‘Disputed’ and ‘Refuted’ are also used in G2P to indicate previously reported gene-disease links that should now be excluded from research or clinical use.
Panel
G2P is grouped into broad panels, which each focus on a disease grouping or defined category of clinical presentation of relevance to the clinical diagnosis of Mendelian disease. The curation structure outlined here is presently used by the DDG2P curation group, although it is anticipated other G2P panels will adopt it in future.
Evidence
Links to the original peer-reviewed publications analysed during the curation process are recorded, generally in the form of PubMed ID [8] and title for ease of reference. Manuscripts from non-peer-reviewed sources such as MedRxiv are generally not included, except in exceptional circumstances.
Implementation
The template and process described here is used in regular DDG2P curation meetings. Disorders are assessed by individual curators and the template populated. This is then brought to the curation meeting for discussion by the group, which always includes at least one clinical geneticist. Input from outside experts on particular conditions may also be sought, for example where the molecular mechanism is particularly complex.
Discussion
The structured curation process described here uses LGMDE threads to define genetically determined disorders from the molecular level through to clinical phenotype.
This creates a focus on disease mechanism and differentiates G2P from other initiatives recording gene-disease associations. For example, PanelApp [28] records an allelic requirement and phenotype terms in association with a gene to define a disease. The granularity of data in the G2P curation template should allow for more accurate interpretation of genomic results, especially where the disease mechanism is complex or multiple disorders are associated with a given gene. Coupling disease-mechanism and inheritance-based variant filtering can significantly reduce the noise associated with large, highly variable genes, by focussing interpretation time only on specific variant types that are consistent with the disease mechanism.
G2P covers a wide spectrum of genetic disease through a limited number of large domain-specific panels. Other sources, such as PanelApp [28], may define many more panels in association with more specific phenotypes/diseases. These often contain small numbers of genes, to decrease the number of variants of uncertain clinical significance returned. This requires curation regarding which panels are applied to which patients, as well as curation of the individual gene-disease associations. Furthermore, application of small panels is likely to have a higher specificity (fewer false positives) but lower sensitivity (fewer true positives) than larger panels. In contrast, G2P allows for scalable high-throughput filtering of large numbers of variants across more genes through precise disease definition. This is particularly relevant for disease domains where there is high locus and allelic heterogeneity, such as developmental disorders.
The curation process outlined here demonstrates a transparent, standardised and comprehensive method for collating and critically assessing data relevant to each G2P entry. We also include a more detailed analysis of disease mechanism, particularly building on the work of Backwell and Marsh [21]. Moving from broad categorisations, such as loss of function, towards precise definitions of the molecular effects of disease-associated variants is increasingly important in the context of targeted therapeutics. These include gene therapy, targeting of antisense oligonucleotides and identification of other drug targets.
However, despite its importance, there is often insufficient evidence on first curation of a newly described disorder to precisely determine the disease mechanism. Relevant data fields need to be retrospectively completed when further functional studies become available. This emphasises the importance of regular review of disease entries, and re-curation with new evidence. The curation template used here, and the G2P database structure, allow for iterative re-curation in this manner.
The updated curation template described here is in active use and the online G2P database is now undergoing further development to reflect these changes. Future planned updates include the provision of data downloads both of gene records and full panel summaries, implementation of a web-based curation tool including standardised functional data, and incorporation of automated manuscript search and annotation.
Conclusions
In summary, we present a structured process for the curation of gene-disease assertions, with precise definitions of the underlying molecular mechanism. This is compatible with international initiatives aimed at harmonising gene curation [7]. The process defines rigorous, evidence-based gene-disease associations, which will prove useful in a range of applications, including clinical diagnostics and targeted therapeutics.
Data availability
The G2P data curation template is available as a supplementary file (Additional File 1) and on the G2P FTP site (http://ftp.ebi.ac.uk/pub/databases/gene2phenotype/curation_documentation). The datasets generated through the process described in the current study are available in the G2P repository, www.ebi.ac.uk/gene2phenotype/downloads using the original LGMDE threads. A new database architecture incorporating all the information contained in the curation template is in progress and due to go live towards the end of 2024, using the same URL www.ebi.ac.uk/gene2phenotype.
DECIPHER is found at https://deciphergenomics.org.
Abbreviations
- G2P:
-
Gene2Phenotype
- DDD:
-
Deciphering Developmental Disorders
- LGMDE:
-
Locus-Genotype-Mechanism-Disease-Evidence
- DDG2P:
-
Developmental Disorders Gene2Phenotype
- SOP:
-
Standard operating procedure
- GenCC:
-
Gene Curation Coalition
- HGNC:
-
HUGO Gene Nomenclature Committee
- HPO:
-
Human Phenotype Ontology
- SO:
-
Sequence Ontology
- NMD:
-
Nonsense Mediated Decay
- OMIM:
-
Online Mendelian Inheritance in Man
- MAVE:
-
Multiplex Assays of Variant Effects
References
Gene2Phenotype. Available from: https://www.ebi.ac.uk/gene2phenotype. Cited 2024 Jan 24.
Thormann A, Halachev M, McLaren W, Moore DJ, Svinti V, Campbell A, et al. Flexible and scalable diagnostic filtering of genomic variants using G2P with Ensembl VEP. Nat Commun. 2019;10(1):2373.
Wright CF, Fitzgerald TW, Jones WD, Clayton S, McRae JF, van Kogelenberg M, et al. Genetic diagnosis of developmental disorders in the DDD study: a scalable analysis of genome-wide research data. Lancet Lond Engl. 2015;385(9975):1305–14.
Lenassi E, Carvalho A, Thormann A, Abrahams L, Arno G, Fletcher T, et al. EyeG2P: an automated variant filtering approach improves efficiency of diagnostic genomic testing for inherited ophthalmic disorders. J Med Genet. 2023;60(8):810–8.
Josephs KS, Roberts AM, Theotokis P, Walsh R, Ostrowski PJ, Edwards M, et al. Beyond gene-disease validity: capturing structured data on inheritance, allelic requirement, disease-relevant variant classes, and disease mechanism for inherited cardiac conditions. Genome Med. 2023;15(1):86.
Wright CF, Campbell P, Eberhardt RY, Aitken S, Perrett D, Brent S, et al. Genomic diagnosis of rare pediatric disease in the United Kingdom and Ireland. N Engl J Med. 2023;388(17):1559–71.
DiStefano MT, Goehringer S, Babb L, Alkuraya FS, Amberger J, Amin M, et al. The gene curation coalition: a global effort to harmonize gene–disease evidence resources. Genet Med. 2022;24(8):1732–42.
Sayers EW, Bolton EE, Brister JR, Canese K, Chan J, Comeau DC, et al. Database resources of the national center for biotechnology information. Nucleic Acids Res. 2022;50(D1):D20–D26.
Esposito D, Weile J, Shendure J, Starita LM, Papenfuss AT, Roth FP, et al. MaveDB: an open-source platform to distribute and interpret data from multiplexed assays of variant effect. Genome Biol. 2019;20(1):223.
Seal RL, Braschi B, Gray K, Jones TEM, Tweedie S, Haim-Vilmovsky L, et al. Genenames.org: the HGNC resources in 2023. Nucleic Acids Res. 2023;51(D1):D1003–9.
Köhler S, Gargano M, Matentzoglu N, Carmody LC, Lewis-Smith D, Vasilevsky NA, et al. The Human Phenotype Ontology in 2021. Nucleic Acids Res. 2021;49(D1):D1207–17.
Richards S, Aziz N, Bale S, Bick D, Das S, Gastier-Foster J, et al. Standards and guidelines for the interpretation of sequence variants: a joint consensus recommendation of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology. Genet Med Off J Am Coll Med Genet. 2015;17(5):405–24.
Eilbeck K, Lewis SE, Mungall CJ, Yandell M, Stein L, Durbin R, et al. The sequence ontology: a tool for the unification of genome annotations. Genome Biol. 2005;6(5):R44.
Roberts AM, DiStefano MT, Riggs ER, Josephs KS, Alkuraya FS, Amberger J, et al. Towards robust clinical genome interpretation: developing a consistent terminology to characterize Mendelian disease-gene relationships - allelic requirement, inheritance modes and disease mechanisms. Genet Med. 2024;26(2):101029.
Firth HV, Richards SM, Bevan AP, Clayton S, Corpas M, Rajan D, et al. DECIPHER: database of chromosomal imbalance and phenotype in humans using ensembl resources. Am J Hum Genet. 2009;84(4):524–33.
Mahamdallie SS, Hanks S, Karlin KL, Zachariou A, Perdeaux ER, Ruark E, et al. Mutations in the transcriptional repressor REST predispose to Wilms tumor. Nat Genet. 2015;47(12):1471–4.
Bayram Y, White JJ, Elcioglu N, Cho MT, Zadeh N, Gedikbasi A, et al. REST final-exon-truncating mutations cause hereditary gingival fibromatosis. Am J Hum Genet. 2017;101(1):149–56.
Karczewski KJ, Francioli LC, Tiao G, Cummings BB, Alföldi J, Wang Q, et al. The mutational constraint spectrum quantified from variation in 141,456 humans. Nature. 2020;581(7809):434–43.
Landrum MJ, Lee JM, Benson M, Brown GR, Chao C, Chitipiralla S, et al. ClinVar: improving access to variant interpretations and supporting evidence. Nucleic Acids Res. 2018;46(D1):D1062–7.
Bateman A, Martin MJ, Orchard S, Magrane M, Ahmad S, The UniProt Consortium, et al. UniProt: the universal protein knowledgebase in 2023. Nucleic Acids Res. 2023;51(D1):D523–31.
Backwell L, Marsh JA. Diverse molecular mechanisms underlying pathogenic protein mutations: beyond the loss-of-function paradigm. Annu Rev Genomics Hum Genet. 2022;23(1):475–98.
Badonyi M, Marsh JA. Proteome-scale prediction of molecular mechanisms underlying dominant genetic diseases. Genetics; 2023. Available from: https://doiorg.publicaciones.saludcastillayleon.es/10.1101/2023.09.08.556798. Cited 2024 Jan 2.
Jacobsen JOB, Kelly C, Cipriani V, Robinson PN, Smedley D. Evaluation of phenotype-driven gene prioritization methods for Mendelian diseases. Brief Bioinform. 2022;23(5):bbac188.
Tartaglia M, Mehler EL, Goldberg R, Zampino G, Brunner HG, Kremer H, et al. Mutations in PTPN11, encoding the protein tyrosine phosphatase SHP-2, cause Noonan syndrome. Nat Genet. 2001;29(4):465–8.
Biesecker LG, Adam MP, Alkuraya FS, Amemiya AR, Bamshad MJ, Beck AE, et al. A dyadic approach to the delineation of diagnostic entities in clinical genomics. Am J Hum Genet. 2021;108(1):8–15.
Amberger JS, Bocchini CA, Scott AF, Hamosh A. OMIM.org: leveraging knowledge across phenotype–gene relationships. Nucleic Acids Res. 2019;47(D1):D1038–43.
Shefchek KA, Harris NL, Gargano M, Matentzoglu N, Unni D, Brush M, et al. The Monarch Initiative in 2019: an integrative data and analytic platform connecting phenotypes to genotypes across species. Nucleic Acids Res. 2020;48(D1):D704–15.
Stark Z, Foulger RE, Williams E, Thompson BA, Patel C, Lunke S, et al. Scaling national and international improvement in virtual gene panel curation via a collaborative approach to discordance resolution. Am J Hum Genet. 2021;108(9):1551–7.
Acknowledgements
We thank our collaborators who attended of the PARADIGM G2P Gene Curation Workshop and reviewed the curation template: James Ware, Professor of Cardiovascular and Genomic Medicine, Imperial College London, Ian Simpson, Professor of Biomedical Informatics & Director UKRI CDT in Biomedical Artificial Intelligence, University of Edinburgh, Andy Yates, Team Leader, Genomics Technology Infrastructure, EMBL-EBI, Sarah Wynn, Chief Executive Officer, Unique, Rare Chromosome Disorder Support Group, London, Adam Frankish, Manual Genome Annotation Coordinator, EMBL-EBI, Tariro Chatiza, PhD student, University of Edinburgh, Richard Chin, Professor of Paediatric Neurology and Clinical Epidemiology, Clinical Director of the Muir Maxwell Epilepsy Centre, University of Edinburgh, Diana Lemos, Senior Bioinformatics Developer, EMBL-EBI, Ola Austine, Bioinformatics Software Developer, EMBL-EBI, Nicky Whiffin, Associate Professor and Group Leader at the Big Data Institute and the Wellcome Centre for Human Genetics, University of Oxford, Alison Meynert, Senior Research Fellow, IGC Bioinformatics Analysis Core Manager, University of Edinburgh, Angharad Roberts, Clinical Senior Lecturer in Cardiovascular Genetics, Imperial College London, Jane Loveland, Annotation Project Leader, EMBL-EBI, Katharine Fitzpatrick, PARADIGM Project Manager, University of Exeter, Kat Josephs, Specialist registrar in Clinical Genetics and a Clinical Research Fellow in the Cardiovascular Genetics and Genomics group, Imperial College London, Massimo Mangino, Imperial College London, Julia Foreman, DECIPHER Project Leader, EMBL-EBI, We thank the clinical genomics community for ongoing curation suggestions. We thank Professor David FitzPatrick for initially setting up and further developing G2P.
Funding
Open Access funding enabled and organized by Projekt DEAL. This research was funded in whole or in part by the Wellcome Trust [226083/Z/22/Z] for PARADIGM (Primary Annotated Resources to Aid Diagnosis In Genomic Medicine). SEH also received funding from the European Molecular Biology Laboratory. HVF was also supported by the NIHR Cambridge Biomedical Research Centre (NIHR203312). The views expressed are those of the authors and not necessarily those of the NIHR or the Department of Health and Social Care. For the purpose of open access, the author has applied a CC-BY public copyright licence to any author-accepted manuscript version arising from this submission.
Author information
Authors and Affiliations
Contributions
TMY, MA, LT, SEH, ECU, RJH, JAM, CFW and HVF made substantial contributions to the conception and design of the curation template. MA and LT iteratively drafted the curation template document. TMY drafted the initial manuscript. All authors read, revised and approved the final manuscript.
Corresponding authors
Ethics declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Yates, T.M., Ansari, M., Thompson, L. et al. Curating genomic disease-gene relationships with Gene2Phenotype (G2P). Genome Med 16, 127 (2024). https://doiorg.publicaciones.saludcastillayleon.es/10.1186/s13073-024-01398-1
Received:
Accepted:
Published:
DOI: https://doiorg.publicaciones.saludcastillayleon.es/10.1186/s13073-024-01398-1