26 Dec

Exploring relationships and drug abuse in young adults.

I am doing a MOOC on data visualization offered by Wesleyan University. As part of that course I need to pick a research question to work on throughout the course. The dataset I’ve chosen is the National Longitudinal Study of Adolescent Health (AddHealth) dataset. My initial work is going to be exploratory, and I will be working towards interesting hypotheses. For week 1 I’ll be focusing on:

  1. Exploring alcohol usage vis-a-vis relationship history: how does alcohol usage affect the relationships of adolescents?
  2. Later on I plan to work with other dimensions in the dataset for an explanatory presentation.

Literature Survey

  1. Fleming CB, White HR, Catalano RF. Romantic Relationships and Substance Use in Early Adulthood: An Examination of the Influences of Relationship Type, Partner Substance Use, and Relationship Quality. Journal of Health and Social Behavior. 2010;51(2):153-167.
  2. Goodman E, Huang B. Socioeconomic Status, Depressive Symptoms, and Adolescent Substance Use. Arch Pediatr Adolesc Med. 2002;156(5):448-453. doi:10.1001/archpedi.156.5.448.

The studies found a strong protective effect of marriage on substance use and abuse. Their research indicated that single young adults had higher rates of substance abuse. They also studied the effect of relationship quality and observed that higher-quality relationships had a negative association with smoking.

19 Dec

Using SPLOMs for exploratory analysis

Over the last 2 weeks I’ve been working with cancer researchers from UC Davis and NIH to help them use the DataExplorer we’ve been building at Emory. My mentor Ashish suggested using SPLOMs, and in this post I’ll discuss why SPLOMs are crucial for exploratory analyses. Most of the ideas in this post emerged from meetings and discussions with Ashish.

In exploratory analysis the goal is to use visual methods to find hidden patterns in the data and to help analysts come up with hypotheses. The DataExplorer’s goals are just that. I won’t be talking much about the specifics of the work we are doing; instead I will focus on using SPLOMs effectively and the reasons we’re incorporating them in the DataExplorer.

Bostock's D3 SPLOM example

A SPLOM is a scatter plot matrix. It lets you compare data attributes pairwise, which is a really effective way of identifying relationships in a dataset. The following are some insights we gathered while using SPLOMs with TCGA datasets.

  1. Quick summary of correlations between multiple attributes. Choosing which attributes to represent on the SPLOM is crucial. For our purposes we relied on domain expertise provided by the experts from UC Davis. When domain expertise is not available, statistical techniques such as PCA can be used to identify the attributes to show on the SPLOM.
  2. Allowing drill-down and working with limited datasets. The DataExplorer lets you filter data points on the SPLOM, which is crucial because it lets you drill down and explore relations within related subsets of the data.
  3. What’s hidden can also be useful! Visualization systems often hide filtered data points. We chose to “gray out” filtered data points instead, so that we can still see what gets hidden, as that is often useful in formulating hypotheses and in analysis.
  4. Handling messy/missing data. Biomedical data is messy! Any real-world dataset is messy. Handling it properly is crucial to a useful visual experience. For our purposes we snapped the missing data onto the negative axes.
  5. Continuous variables. The SPLOM works best when the attributes are continuous. For discrete variables you could use bubble charts on the SPLOM.
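
To make a few of these points concrete, here is a minimal Python sketch of the data preparation behind a SPLOM. It is not the DataExplorer's code (that is JavaScript, built on dc.js). The records, attribute names, colors, and the sentinel value of -1 are all made up for illustration, and the sketch assumes real measurements are non-negative so that a negative sentinel cannot collide with actual data.

```python
from itertools import combinations
from math import sqrt

# Hypothetical records (not TCGA data); None marks a missing measurement.
records = [
    {"age": 61.0, "tumor_size": 2.4, "survival_months": 30.0},
    {"age": 48.0, "tumor_size": None, "survival_months": 55.0},
    {"age": 70.0, "tumor_size": 3.9, "survival_months": None},
    {"age": 55.0, "tumor_size": 1.8, "survival_months": 62.0},
]
attributes = ["age", "tumor_size", "survival_months"]


def splom_pairs(attrs):
    """Return every unordered attribute pair (one per off-diagonal cell)."""
    return list(combinations(attrs, 2))


def snap_missing(value, sentinel=-1.0):
    """Snap a missing value to a negative sentinel so the point stays visible.

    Assumes real values are non-negative.
    """
    return sentinel if value is None else value


def cell_points(rows, x_attr, y_attr, keep):
    """Build the points for one cell; filtered-out rows are grayed, not dropped."""
    return [
        {
            "x": snap_missing(row[x_attr]),
            "y": snap_missing(row[y_attr]),
            "color": "steelblue" if keep(row) else "lightgray",
        }
        for row in rows
    ]


def pearson(rows, x_attr, y_attr):
    """Pearson correlation over rows where both values are present."""
    pairs = [
        (r[x_attr], r[y_attr])
        for r in rows
        if r[x_attr] is not None and r[y_attr] is not None
    ]
    n = len(pairs)
    if n < 2:
        return None
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    if var_x == 0 or var_y == 0:
        return None  # correlation is undefined for a constant attribute
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    # Example filter: keep patients aged 55 or older, gray out the rest.
    for x_attr, y_attr in splom_pairs(attributes):
        points = cell_points(records, x_attr, y_attr, lambda r: r["age"] >= 55)
        print(x_attr, y_attr, pearson(records, x_attr, y_attr), points)
```

Graying out rather than dropping means every cell always holds one point per record, so the filtered subset can be compared against the full dataset at a glance. The correlation helper skips rows with missing values instead of using the snapped sentinel, since the sentinel is only a drawing position and would distort the statistic.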

We use a fork of dc.js. I’ll be posting a guide about using dc.js for creating SPLOMs.