Change - Aging Variant Database

Created on June 4, 2013, 7:19 p.m. by Hevok & updated on July 1, 2013, 8:16 p.m. by Hevok

====================== ¶
Aging Variant Database ¶
====================== ¶
¶
NAR instructions for July 1 ¶
=========================== ¶
On July 1, we have to submit a database description, highlighting the particular features that we offer, and explaining how our db differs from existing ones (especially other NAR dbs). ¶
¶
Here are the instructions from the NAR website: ¶
¶
Pre-submission inquiries for the description of NEW databases should include the URL of a fully functional database and a brief description of the database. If NAR Molecular Biology Database Collection includes any similar databases, the authors should explain how the new database is different from (and/or better than) the existing ones. The proposal messages should use plain text or HTML format and avoid any attachments. Authors of the databases dedicated to relatively narrow topics (e.g., organism-specific genome databases, plant databases or databases of specific protein families) should be able to explain how their database would benefit the wide readership of NAR. ¶
¶
Draft NAR inquiry ¶
================= ¶
Hello Dr. Galperin, ¶
¶
We would like to submit a new database, LongeviGene (http://denigma.de/lifespan/variants/) to the NAR database issue. LongeviGene is the first resource bringing together data on all known genetic variants in humans that have been studied for their relation to exceptional longevity. We have systematically curated all GWAS, candidate gene, and candidate region studies conducted on long-lived human populations from the scientific literature. We have included both those studies linking new variants to human longevity, as well as those refuting putative longevity variants identified in previous works. In total our database comprises X distinct variants curated from Y publications. Variants are annotated with several sources of relevant data such as … . LongeviGene can be searched by …, and the entire database can be downloaded as …. . We anticipate that LongeviGene will complement existing databases as a valuable tool for unraveling the genetics of healthy aging. ¶
¶
Though there are several valuable databases that collect data on variants relevant to specific diseases (e.g. AlzGene), no resource of this type exists for aging. Existing aging-focused databases (HAGR, Lifespan Observations Database, etc.) contain extensive data on genetic and small-molecule interventions, mostly in model organisms, but none catalogs genetic variants associated with exceptional human longevity. ¶
¶
Sincerely, ¶
... ¶
¶
Example successful NAR inquiries ¶
================================ ¶
¶
EXAMPLE 1: NETWORX ¶
¶
Hello Dr. Galperin, ¶
¶
We would like to submit a new database, NetwoRx (http://ophid.utoronto.ca/networx/), that integrates large-scale chemogenomic experiments in yeast to connect drug response to biological pathways, phenotypes, and networks. NetwoRx is the first resource to store data from these extremely valuable yeast chemogenomics experiments. ¶
¶
In total, NetwoRx stores data on 5924 genes and 466 drugs. We applied data-mining approaches to identify yeast pathways, functions, and phenotypes that are targeted by particular drugs, compute measures of drug-drug similarity, and construct drug-phenotype networks. These data are all available to search or download through the database. Users can search NetwoRx by drug name, gene name, or gene set identifier (Gene Ontology, KEGG, YEASTRACT, or SGD phenotype). Drug networks can also be downloaded as Navigator 2 xml files for network visualization. We also set up automated analysis routines in NetwoRx; users can query new gene sets against the entire collection of drug profiles and NetwoRx will retrieve the drugs that target them. ¶
¶
NetwoRx and similar tools that facilitate the computational interrogation of high-throughput drug-response experiments can help provide crucial clues to the global cellular response to drugs. This database should interest anyone concerned with the bioinformatics and systems biology of drugs, drug mode-of-action, and drug repurposing, or with genome-scale screening in S. cerevisiae. ¶
¶
Sincerely, ¶
... ¶
¶
EXAMPLE 2: SCRIPDB ¶
¶
Hello Dr. Galperin, ¶
¶
We would like to submit a new database supporting the Search for ¶
Chemicals and Reactions In Patents to the NAR Database issue. It is ¶
available at http://dcv.uhnres.utoronto.ca/SCRIPDB/search/ ¶
¶
Databases such as PubChem and ChEBI contain chemical structures ¶
disclosed by patents. However, patents are a rich source of metadata ¶
about bioactive molecules, such as disease class, mechanism of action, ¶
homologous experimental series, or the synthetic pathways to produce ¶
molecules of interest. Unfortunately, this metadata is lost when ¶
chemical structures are deposited separately in databases. To ¶
facilitate the use of such metadata, we provide SCRIPDB, a ¶
patent-oriented chemical structure database. In addition to the MOL ¶
files common to structure-focused databases, SCRIPDB provides ¶
downloads of the full original patent text and the other chemical ¶
structures claimed within any individual patent. Furthermore, we ¶
provide the original ChemDraw (CDX) and image (TIF) files from which ¶
can be derived additional relationships and information such as ¶
chemical reactions or OCR error recognition. SCRIPDB may be searched ¶
by exact chemical structure, substructure, or Tanimoto similarity ¶
(>0.8) and the results may be optionally restricted to patents that ¶
describe synthetic routes. ¶
¶
Sincerely, ¶
... ¶
¶
¶
:Abstract: We are constructing the most comprehensive database on genetic variants associated with longevity in humans. This project is a collaborative, crowdsourced endeavour that will create a valuable resource for deciphering the genetics of human aging. ¶
¶
:Journal: Nucleic Acids Research ¶
:Deadline: July 1, 2013 ¶
¶
Introduction ¶
============ ¶
Aging has a genetic component, whose contribution is estimated to be over 25%. Different individuals age at different paces even when they have similar lifestyles. The reason is that genetic differences at specific loci alter the aging program or render an individual more resistant to damage and age-related diseases. ¶
¶
As the cost of genome sequencing falls, the genetic makeup underlying human longevity is being unraveled, and there is an urgent need to make use of the knowledge of genetic variants that influence the pace of aging. ¶
¶
The simplest form of such variants are Single-Nucleotide Polymorphisms (SNPs). Associations with longevity can be identified either via high-throughput genome-wide association studies (GWAS) or via focused candidate gene approaches. ¶
¶
Information on the functional modifications that a SNP causes at the protein level makes such associations more relevant. ¶
¶
It is important to understand why certain SNPs have a longevity effect, although this cannot yet be done for all SNPs. SNPs sometimes change promoter activity or protein function, which can be exploited by functional network algorithms. Sometimes they are merely traces, and the real culprit gene lies nearby, carrying other variants that are not detectable by SNP genotyping. ¶
¶
It is important to know how a SNP affects protein function and activity, as this makes the association data more meaningful. Functional assays are crucial for figuring out what longevity SNPs do and why they matter. ¶
¶
Methods ¶
======= ¶
Study Identifications ¶
--------------------- ¶
We designed a Boolean expression for paper identification. We use an inclusive policy, which means that positive as well as negative studies are included. ¶
¶
Articles were flagged as "curate" if they have valuable information for the database, "discard" if they are irrelevant, and "review" if they are relevant reviews. Reviews are not curated in any detail, but they are ranked on a subjective scale from 1 to 5, where 1 means highly relevant. ¶
¶
Database Design ¶
--------------- ¶
Each genetic variant investigated in a study is a data record with a defined polymorphism and a genomic location. If the polymorphism is within or next to a gene, it is linked to the respective genetic factor. Furthermore, a polymorphism record has an informal description as well as association-specific information such as the ethnicity, the mean age of cases, and the shorter- and longer-lived alleles. Statistical data such as the number of cases and controls in the initial and replication studies, the odds ratio, the p-value, and whether the association is significant are also included, if available. The utilized technology (e.g. PCR) and the type of study (GWAS, candidate gene) are recorded as well. We strictly provide references to the primary information. ¶
¶
Where information is known about the functional impact of a longevity variant (e.g. APOE, CETP), this is also represented in the database. ¶
¶
Discussion ¶
========== ¶
For several longevity SNPs, no functional assays have been conducted yet; hopefully our database will help to identify the most well-replicated mystery SNPs and let us prioritize them for functional follow-up.
###################### ¶
Aging Variant Database ¶
###################### ¶
:Journal: Nucleic Acids Research ¶
:Deadline: July 1, 2013 ¶
:Abstract: We are constructing the most comprehensive database on genetic variants associated with longevity in humans. This project is a collaborative, crowdsourced endeavour that will create a valuable resource for deciphering the genetics of human aging. This database will allow for a greater understanding of aging so that we can solve the problems of aging. ¶
¶
Introduction ¶
============ ¶
Aging is the single biggest underlying cause of disease. Cancer, diabetes, Alzheimer's, stroke, and heart conditions are mostly diseases of aging. ¶
¶
Aging has a genetic component, whose contribution is estimated to be over 25%. Different individuals age at different paces even when they have similar lifestyles. The reason is that genetic differences at specific loci alter the aging program or render an individual more resistant to damage and age-related diseases. ¶
¶
As the cost of genome sequencing falls, the genetic makeup underlying human longevity is being unraveled, and there is an urgent need to make use of the knowledge of genetic variants that influence the pace of aging. ¶
¶
The simplest form of variants are Single-Nucleotide Polymorphisms (SNPs). A SNP is like a single letter in the 23-volume manual of human life, which contains approximately 3 billion letters. It is important to understand why certain SNPs affect longevity. Furthermore, SNPs sometimes change, or operate in networks, and work in other ways we are only beginning to understand. ¶
¶
Associations with longevity can be identified either via high-throughput genome-wide association studies (GWAS) or via focused candidate gene approaches. ¶
¶
Information on the functional modifications that a SNP causes at the protein level makes such associations more relevant. ¶
¶
It is important to understand why certain SNPs have a longevity effect, although this cannot yet be done for all SNPs. SNPs sometimes change promoter activity or protein function, which can be exploited by functional network algorithms. Sometimes they are merely traces, and the real culprit gene lies nearby, carrying other variants that are not detectable by SNP genotyping. ¶
¶
It is important to know how a SNP affects protein function and activity, as this makes the association data more meaningful. Functional assays are crucial for figuring out what longevity SNPs do and why they matter. ¶
¶
On the order of thousands of genetic variants have been associated with longevity. ¶
¶
* Polymorphism ¶
* Multiple Gene effects [20834067; 20569235]. ¶
* Haplotypes [17445222; 23382853]. ¶
* Copy number variations ¶
¶
Polymorph: number of evolution ? ¶
¶
intergenic, exons, ¶
:Ensembl variants. ¶
¶
There are three types of studies: candidate region approaches, candidate gene approaches, and genome-wide association studies. ¶
¶
Look at current genetic databases and borrow ideas. ¶
Region centric search. Most of the SNPs are not associated. Annotate. ¶
¶
In this study, [X number] papers were curated or evaluated for inclusion in our database. In these papers, longevity-determining genes were identified via high-throughput genome-wide association studies (GWAS), focused candidate gene approaches, and other methods. ¶
¶
This database will allow researchers to better answer questions, solve problems, and make smart decisions for allocating future resources. ¶
¶
¶
Methods ¶
======= ¶
Study Identifications ¶
--------------------- ¶
We designed a Boolean expression for paper identification. We use an inclusive policy, which means that positive as well as negative studies are included. ¶
¶
Articles were flagged as "curate" if they have valuable information for the database, "discard" if they are irrelevant, and "review" if they are relevant reviews. Reviews are not curated in any detail, but they are ranked on a subjective scale from 1 to 5, where 1 means highly relevant. ¶
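¶
The triage scheme above could be captured in a small data structure. The following is only a minimal sketch under stated assumptions, not the project's actual code: the names ``Flag``, ``TriagedArticle``, ``reviews_by_relevance`` and the ``pmid`` field are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Flag(Enum):
    CURATE = "curate"    # has valuable information for the database
    DISCARD = "discard"  # irrelevant
    REVIEW = "review"    # relevant review: ranked, but not curated in detail


@dataclass
class TriagedArticle:
    pmid: int                   # hypothetical identifier field
    flag: Flag
    rank: Optional[int] = None  # reviews only: 1 (highly relevant) to 5

    def __post_init__(self):
        if self.flag is Flag.REVIEW:
            if self.rank is None or not 1 <= self.rank <= 5:
                raise ValueError("a review needs a rank from 1 to 5")
        elif self.rank is not None:
            raise ValueError("only reviews are ranked")


def reviews_by_relevance(articles):
    """Return the flagged reviews, most relevant (rank 1) first."""
    reviews = [a for a in articles if a.flag is Flag.REVIEW]
    return sorted(reviews, key=lambda a: a.rank)
```

Validating the rank at construction time keeps the subjective 1 to 5 scale from leaking onto articles that are not reviews.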
¶
.. In Methods, just a little more detail should be offered as to how the papers were analyzed to get them into the database format, and how it will result in turning the info into a uniform format that can be used. ¶
¶
Database Design ¶
--------------- ¶
Each genetic variant investigated in a study is a data record with a defined polymorphism and a genomic location. If the polymorphism is within or next to a gene, it is linked to the respective genetic factor. Furthermore, a polymorphism record has an informal description (notes) as well as association-specific information such as the ethnicity, the mean age of cases, and the shorter- and longer-lived alleles. Statistical data such as the number of cases and controls in the initial and replication studies, the odds ratio, the p-value, and whether the association is significant are also included, if available. The utilized technology (e.g. PCR) and the type of study (GWAS, candidate gene) are recorded as well. We strictly provide references to the primary information. ¶
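¶
As an illustration of the record just described, the fields could be laid out as follows. This is a sketch, assuming a flat record; the class name ``VariantRecord``, all field names, and the helper ``missing_fields`` are hypothetical and do not reflect the actual database schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VariantRecord:
    # Always required: the polymorphism, where it lies, and the primary source.
    polymorphism: str
    location: str
    reference: str
    # Linked genetic factor, if the polymorphism is within or next to a gene.
    gene: Optional[str] = None
    notes: str = ""  # informal description
    # Association-specific information.
    ethnicity: Optional[str] = None
    mean_age_of_cases: Optional[float] = None
    longer_lived_allele: Optional[str] = None
    shorter_lived_allele: Optional[str] = None
    # Statistical data, included only if available.
    initial_cases: Optional[int] = None
    initial_controls: Optional[int] = None
    replication_cases: Optional[int] = None
    replication_controls: Optional[int] = None
    odds_ratio: Optional[float] = None
    p_value: Optional[float] = None
    significant: Optional[bool] = None
    # Study metadata.
    technology: Optional[str] = None  # e.g. "PCR"
    study_type: Optional[str] = None  # e.g. "GWAS" or "candidate gene"

    def missing_fields(self):
        """Names of optional fields that the source paper did not provide."""
        return [name for name, value in vars(self).items() if value is None]
```

A curator could call ``missing_fields()`` on a record to see which optional values a paper left unreported; the identifiers passed in would come from the curated publication, not from this sketch.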
¶
To be more precise, each row represents an association. So if a study looked into the same variant in different populations, we want a separate entry for each population. ¶
¶
An association of a genetic variant with longevity in a specific population can be either significant or not. If it is a significant association it can be a positive or a negative association (i.e. it is overrepresented or underrepresented in long-lived individuals, respectively). ¶
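¶
The three possible outcomes described above can be written as a small function. This is a minimal sketch; the function name ``classify_association`` and its parameters are hypothetical.

```python
from typing import Optional


def classify_association(significant: bool,
                         overrepresented_in_long_lived: Optional[bool] = None) -> str:
    """Label an association of a variant with longevity in one population."""
    if not significant:
        # Direction is not meaningful without a significant association.
        return "not significant"
    if overrepresented_in_long_lived is None:
        raise ValueError("a significant association needs a direction")
    # Positive: overrepresented in long-lived individuals; negative: underrepresented.
    return "positive" if overrepresented_in_long_lived else "negative"
```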
¶
Where information is known about the functional impact of a longevity variant (e.g. APOE, CETP), this is also represented in the database. ¶
¶
User Interface ¶
-------------- ¶
For the UI we need: ¶
¶
* Filters ¶
* Queries ¶
* Additions/edits/deletions ¶
* Mouseover for definitions ¶
¶
We will have an index view and a table view of studies implicated in longevity (and whether they are curated or not), as well as table views for associations and variants. We need a detail view for both associations and variants. Possible other views are a population list and table, study type, and technology. Populations need to support a hierarchy. ¶
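¶
One way the population hierarchy might be supported is a child-to-parent mapping, so that filtering on a population also matches its sub-populations. This is only a sketch of one possible design; the mapping ``PARENT``, the function ``descendants``, and the population labels are made up for illustration.

```python
from collections import defaultdict

# Hypothetical child -> parent relations between population labels.
PARENT = {
    "Europe": "World",
    "Asia": "World",
    "Italy": "Europe",
    "Germany": "Europe",
    "Japan": "Asia",
}


def descendants(population, parent=PARENT):
    """Return the population itself plus every population nested below it."""
    children = defaultdict(list)
    for child, par in parent.items():
        children[par].append(child)
    found, stack = {population}, [population]
    while stack:
        for child in children[stack.pop()]:
            if child not in found:
                found.add(child)
                stack.append(child)
    return found
```

A population filter in the table view could then select all associations whose population is in ``descendants("Europe")`` rather than matching the label exactly.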
¶
Discussion ¶
========== ¶
For several longevity SNPs, no functional assays have been conducted yet; hopefully our database will help to identify the most well-replicated mystery SNPs and let us prioritize them for functional follow-up. ¶
¶
A "disease SNP" is not necessarily a "true longevity SNP". Since age-related diseases clearly do shorten the lifespan, "disease SNPs" and "true longevity SNPs" are like two overlapping Olympic rings: a Venn diagram with an area where the two circles overlap [James P. Watson, personal communication].


Comment: Updated entry
