19 Dec

Using SPLOMs for Exploratory Analysis

Over the last two weeks I’ve been working with cancer researchers from UC Davis and NIH to help them use the DataExplorer we’ve been building at Emory. My mentor Ashish suggested using SPLOMs, and in this post I’ll discuss why SPLOMs are crucial for exploratory analysis. Most of the ideas in this post emerged from meetings and discussions with Ashish.

In exploratory analysis the goal is to use visual methods to find hidden patterns in the data and to help analysts come up with hypotheses. The DataExplorer’s goals are just that. I won’t say much about the specifics of our work; instead I’ll focus on using SPLOMs effectively and on why we’re incorporating them into the DataExplorer.

Bostock's D3 SPLOM example

A SPLOM is a scatter plot matrix. It lets you compare data attributes pairwise, which makes it a really effective way of identifying relationships in a dataset. Here are some insights we gathered while using SPLOMs with TCGA datasets.

  1. Quick summary of correlations between multiple attributes. Choosing which attributes to show on the SPLOM is crucial. For our purposes we relied on domain expertise from the researchers at UC Davis. When domain expertise is not available, statistical techniques such as PCA can help identify which attributes to show on the SPLOM.
  2. Allowing drill-down and working with limited datasets. The DataExplorer lets you filter data points on the SPLOM, which is crucial because it lets you drill down and explore relationships within related subsets of the data.
  3. What’s hidden can also be useful! Visualization systems often hide filtered data points. We chose to “gray out” filtered data points instead, so that you can still see what gets hidden; that is often useful in formulating hypotheses and in analysis.
  4. Handling messy/missing data. Biomedical data is messy! Any real-world dataset is messy. Handling it properly is crucial to a useful visual experience. For our purposes we snapped missing data to the negative axes.
  5. Continuous variables. The SPLOM works best when the attributes are continuous. For discrete variables you could use bubble charts on the SPLOM.
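
To make a few of these points concrete, here is a minimal sketch in plain Python (not the dc.js code the DataExplorer uses). It shows how the panels of a SPLOM can be enumerated, one way missing values might be snapped to a negative position, and how filtered-out points can be labeled gray instead of being dropped. The sample records, the attribute names, the function names and the rule for placing missing values (a fixed fraction of the largest observed value, below zero) are all assumptions made for illustration, and the sketch assumes observed values are non-negative.

```python
from itertools import product

# Hypothetical sample records; None marks a missing measurement.
records = [
    {"age": 61.0, "tumor_size": 2.4, "survival_months": 30.0},
    {"age": 48.0, "tumor_size": None, "survival_months": 55.0},
    {"age": 70.0, "tumor_size": 4.1, "survival_months": None},
    {"age": 55.0, "tumor_size": 1.2, "survival_months": 62.0},
]
attributes = ["age", "tumor_size", "survival_months"]


def splom_panels(attributes):
    """Return one (row, column) attribute pair per SPLOM cell."""
    return list(product(attributes, repeat=2))


def snap_missing(records, attributes, fraction=0.1):
    """Replace missing values with a negative per-attribute position.

    Assumes observed values are non-negative, so anything drawn below
    zero can only be a missing measurement. Returns new records (with
    only the listed attributes) and the position used per attribute.
    """
    sentinels = {}
    for attr in attributes:
        observed = [r[attr] for r in records if r[attr] is not None]
        largest = max(observed) if observed else 1.0
        if largest <= 0:
            largest = 1.0  # fall back so the sentinel stays negative
        sentinels[attr] = -fraction * largest
    snapped = [
        {a: (sentinels[a] if r[a] is None else r[a]) for a in attributes}
        for r in records
    ]
    return snapped, sentinels


def style_points(records, keep):
    """Label every record rather than dropping those that fail the filter."""
    return ["highlight" if keep(r) else "gray" for r in records]


panels = splom_panels(attributes)
snapped, sentinels = snap_missing(records, attributes)
styles = style_points(records, lambda r: r["age"] >= 60)

for record, style in zip(snapped, styles):
    print(style, record)
```

Every record is still drawn in every panel: records that fail the filter are labeled gray, and a missing value sits at a negative position on its axis where it cannot be mistaken for a real measurement.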

We use a fork of dc.js. I’ll be posting a guide to creating SPLOMs with dc.js.