If you walked into the Oval Office of the White House today and looked down at the floor, you would see at its centre the Seal of the President of the United States. It depicts an eagle with a bundle of arrows in one claw, an olive branch in the other and, in its beak, a banner reading: E Pluribus Unum; Out of Many, One.
While preparing a presentation for the ‘IOSpress 30 years in scientific publishing’ symposium, I realised how much this motto reflects the main challenge of ‘Big Data’ analytics in the life sciences. Are we not all searching for this particular ‘one’ drug target, biomarker or disease indication ‘out of many’ possible candidates?
Take, for example, the 175+ life sciences data sources that have been integrated in the Euretos Knowledge Platform. It is, of course, great to have all this data together in a single view. On the other hand, it also makes painfully clear just how much data a researcher has to assess:
In the above example, the 94 genes associated with presenile dementia share close to 3,300 relations among themselves. And this only covers the genetic interactions!
These ‘ridiculograms’ are clearly not human-interpretable, so we need smart ways to deal with the data overload. This overload has been a major challenge for us, and to address it we use the analogy of the hourglass:
Basically, you first need approaches that bring the number of potential candidates down to a relevant shortlist: go from a high volume of candidates to just a few. Then you can expand the detail again by adding the key ‘multi-omics’ players that interact with the shortlist. This way you move from high volume to high detail in a controlled manner.
An example of how we enable the researcher to manage the volume of results is the intersecting function in Euretos Analytics. With it, the user can create and overlay sets of concepts (genes, metabolites, pathways, etc.), each representing specific criteria, as shown in the sarcoidosis example below:
In addition to this approach, other strategies are possible, such as ranking, sorting and scoring algorithms. Having reached a much more manageable set of concepts, the researcher can start to expand the analysis and add further detail by identifying key players such as pathways, cell and tissue functions, interacting genes, proteins and small molecules.
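To make the hourglass idea a little more concrete, here is a minimal Python sketch of the three steps described above: intersect sets of concepts, rank what is left, then expand the shortlist again. It is an illustration only and not how Euretos Analytics is implemented. The criteria names, the placeholder identifiers (GENE_A, PROTEIN_X and so on), the relation list and the simple "count the relations" score are all assumptions made up for this example.

```python
from collections import Counter

# Hypothetical concept sets, one per criterion. The identifiers are
# placeholders for illustration, not real findings.
criteria = {
    "associated_with_disease": {"GENE_A", "GENE_B", "GENE_C", "GENE_D"},
    "expressed_in_tissue": {"GENE_B", "GENE_C", "GENE_D", "GENE_E"},
    "in_pathway_of_interest": {"GENE_C", "GENE_D", "GENE_F"},
}

# Hypothetical relations as (concept, interacting concept) pairs.
relations = [
    ("GENE_C", "PROTEIN_X"),
    ("GENE_C", "PATHWAY_Y"),
    ("GENE_D", "PROTEIN_X"),
    ("GENE_D", "MOLECULE_Z"),
    ("GENE_A", "PROTEIN_W"),
]


def narrow(sets):
    """Top of the hourglass: keep only concepts that meet every criterion."""
    return set.intersection(*sets.values())


def rank(candidates, edges):
    """Order candidates by number of relations (an assumed, very simple score)."""
    degree = Counter(src for src, _ in edges if src in candidates)
    # Ties are broken alphabetically so the order is deterministic.
    return sorted(candidates, key=lambda c: (-degree[c], c))


def expand(shortlist, edges):
    """Bottom of the hourglass: add the concepts interacting with each candidate."""
    return {c: sorted(dst for src, dst in edges if src == c) for c in shortlist}


shortlist = rank(narrow(criteria), relations)
detail = expand(shortlist, relations)

print(shortlist)
for concept, neighbours in detail.items():
    print(concept, "->", neighbours)
```

With real data, the score in `rank` would be replaced by whatever ranking or scoring algorithm suits the question; the shape of the workflow (many candidates, a few, then detail around those few) stays the same.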
E Pluribus Unum; Out of Many, One. That is the quest in which all life scientists are ultimately involved. It is not an easy quest, because the ‘pluribus’ of options is vast. But with the appropriate tools to navigate high volume as well as high complexity, it is a quest that can, in the end, reach its destination.