Leveraging existing data sets to generate new insights into Alzheimer’s disease biology in specific patient subsets

September 23, 2015 | July 17th, 2020 | Publications

To generate new insights into the biology of Alzheimer’s disease (AD), we developed methods to combine and reuse a wide variety of existing data sets in new ways. We first identified genes consistently associated with AD in each of four separate expression studies, and confirmed this result using a fifth study. We next developed algorithms to search hundreds of thousands of Gene Expression Omnibus (GEO) data sets, identifying a link between an AD-associated gene (NEUROD6) and gender. We therefore stratified patients by gender and APOE4 status, and analyzed multiple SNP data sets to identify variants associated with AD. SNPs in the regions of NEUROD6 and SNAP25 were significantly associated with AD in APOE4+ females and APOE4+ males, respectively. We also developed algorithms to search Connectivity Map (CMAP) data for medicines that modulate AD-associated genes, identifying hypotheses that warrant further investigation for treating specific AD patient subsets. In contrast to other methods, this approach integrates multiple gene expression data sets across platforms to obtain a robust intersection of disease-affected genes, and then combines these results with genetic studies to prioritize potential genes for targeted therapy.
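As an illustration only, the cross-study intersection step described in the abstract could be sketched as follows. This is not the authors' implementation: the function name `consistent_genes`, the `min_abs_change` threshold, the use of per-gene log fold changes as input, and the gene labels and values in `toy_studies` are all hypothetical placeholders chosen for the example. The sketch keeps a gene only if it was measured in every study and changed in the same direction in all of them.

```python
from typing import Dict


def consistent_genes(
    studies: Dict[str, Dict[str, float]],
    min_abs_change: float = 0.0,
) -> Dict[str, str]:
    """Return genes that change in the same direction in every study.

    `studies` maps a study name to {gene: log fold change}. Both the input
    shape and the threshold are assumptions made for this sketch.
    """
    # Only genes measured in all studies can be compared across them.
    shared = set.intersection(*(set(s) for s in studies.values()))

    result: Dict[str, str] = {}
    for gene in sorted(shared):
        changes = [s[gene] for s in studies.values()]
        if all(c > min_abs_change for c in changes):
            result[gene] = "up"
        elif all(c < -min_abs_change for c in changes):
            result[gene] = "down"
        # Genes with mixed or sub-threshold changes are dropped.
    return result


# Hypothetical toy values, not data from the publication.
toy_studies = {
    "study_1": {"GENE_A": -1.2, "GENE_B": 0.8, "GENE_C": 0.1},
    "study_2": {"GENE_A": -0.9, "GENE_B": 0.5, "GENE_C": -0.3},
    "study_3": {"GENE_A": -1.5, "GENE_B": 0.7, "GENE_D": 0.4},
    "study_4": {"GENE_A": -0.7, "GENE_B": -0.2, "GENE_C": 0.2},
}

hits = consistent_genes(toy_studies)
print(hits)
```

Requiring agreement in every study is deliberately strict: it trades sensitivity for robustness across platforms, which matches the stated goal of a robust intersection. A held-out study, like the fifth study mentioned in the abstract, could then be checked against `hits` as a separate confirmation step.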