Integrity, standards, and QC-related issues with big-data in pre-clinical drug discovery

The tremendous expansion of data analytics and public and private big datasets presents an important opportunity for pre-clinical drug discovery and development. In the field of life sciences, the growth of genetic, genomic, transcriptomic and proteomic data is partly driven by a rapid decline in experimental costs as biotechnology improves throughput, scalability, and speed. Yet far too many researchers tend to underestimate the challenges and consequences involving data integrity and quality standards. Given the effect of data integrity on scientific interpretation, these issues have significant implications during preclinical drug development. We describe standardized approaches for maximizing the utility of publicly available or privately generated biological data and address some of the common pitfalls. We also discuss the increasing interest to integrate and interpret cross-platform data. Principles outlined here should serve as a useful broad guide for existing analytical practices and pipelines and as a tool for developing additional insights into therapeutics using big data.