SNP Disease Association

A single nucleotide polymorphism (SNP) results from the mutation of a single nucleotide base pair in the genome. SNPs account for much of the phenotypic variation among humans, and occasionally such mutations result in disease. The mechanism disrupted depends on the location of the mutation. For example, a mutation in a coding region may cause an amino acid substitution in the protein product; this is known as a non-synonymous SNP or a single amino acid polymorphism (SAP). Although coding regions make up only a small part of the genome, SAPs account for roughly 50% of the genetic diseases caused by point mutations. A number of large-scale, high-throughput experiments have accumulated considerable SAP-related data, but they do not characterize SAPs in terms of disease, and the underlying causes remain poorly understood.
New Representation
Our work has focused on developing a new representation for a protein with a single amino acid polymorphism. First, we consider a new representation for solvent accessibility, based on the beta carbon, that does not require a crystal structure. Second, we hypothesize that the microenvironment around a SAP may determine whether it is deleterious to the protein's function. To this end, we consider the residue types within a predetermined radius of the SAP. Third, we consider nearby functional sites that could be disrupted by a SAP. Along with these new attributes, we also benchmark a large number of previously proposed attributes [7].
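
As an illustration of the microenvironment idea, the following minimal sketch counts the residue types found within a fixed radius of the SAP position. It is not the implementation used in this work. The function name microenvironment_counts, the input layout (one residue name and one beta-carbon coordinate per residue), and the 8 angstrom default radius are all assumptions made for the example.

```python
from collections import Counter
from math import dist


def microenvironment_counts(residues, sap_index, radius=8.0):
    """Count residue types near the SAP position.

    residues:  list of (residue_name, (x, y, z)) tuples, where the
               coordinate is assumed to be the residue's beta carbon.
    sap_index: index of the mutated residue in `residues`.
    radius:    cutoff in angstroms (8.0 is an illustrative value only).
    """
    _, center = residues[sap_index]
    counts = Counter()
    for i, (name, coord) in enumerate(residues):
        if i == sap_index:
            continue  # skip the SAP residue itself
        if dist(center, coord) <= radius:
            counts[name] += 1
    return counts


# Toy coordinates, invented for the example (not real structural data).
toy = [
    ("ALA", (0.0, 0.0, 0.0)),   # SAP position
    ("LEU", (3.0, 4.0, 0.0)),   # 5.0 angstroms away
    ("LEU", (0.0, 6.0, 0.0)),   # 6.0 angstroms away
    ("ASP", (10.0, 0.0, 0.0)),  # 10.0 angstroms away, outside the cutoff
    ("GLY", (0.0, 0.0, 7.5)),   # 7.5 angstroms away
]
env = microenvironment_counts(toy, sap_index=0)
print(dict(env))
```

The resulting counts could serve as one feature vector per SAP. The same distance test could, in principle, be applied to annotated functional-site residues to flag sites close to the mutation, although the source does not specify how either attribute is actually computed.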