Here are a few of the design elements:
- Small multiples: place several comparisons within a single eye-span
- Use the alpha channel (transparency) to convey an indication of the underlying probability density: the more data points overlap, the darker the region appears
- Use transparency to lighten the "heavy grid prison" of the scatter plot axes. The dark parts of the figure are the density lines and the tightly clustered data points themselves.
- Display less redundant information than the default matrix-style multiple scatter plot, and spill less ink on the 'bureaucratic' parts of the figure (more on this below)
- Use a bullet point list to describe the graphic ; - )
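
As a rough illustration of the two transparency ideas in the list above, here is a minimal base R sketch. The data, the variable names (`x`, `y`, `pt_col`, `grid_col`), and the alpha values are made up for the example; they are not taken from the figures in this post.

```r
set.seed(42)

# Hypothetical data: a broad cloud plus a tight cluster
x <- c(rnorm(2000), rnorm(500, mean = 3, sd = 0.3))
y <- c(rnorm(2000), rnorm(500, mean = 2, sd = 0.3))

# Semi-transparent black: where points overlap, the ink accumulates
# and the region appears darker (alpha value is an arbitrary choice)
pt_col <- rgb(0, 0, 0, alpha = 0.1)

# Semi-transparent grey for the grid and axis lines
grid_col <- rgb(0.5, 0.5, 0.5, alpha = 0.3)

# Set up an empty plot region, then draw the light grid first
plot(x, y, type = "n", axes = FALSE, xlab = "x", ylab = "y")
grid(col = grid_col, lty = 1)

# Data points go on top of the grid
points(x, y, pch = 16, col = pt_col)

# Light axes instead of the default black box
axis(1, col = grid_col, col.axis = "grey40")
axis(2, col = grid_col, col.axis = "grey40")
```

With a low alpha, the tight cluster should read as a dark blob while isolated points stay faint, and the grid recedes behind the data. The right alpha depends on how many points overlap, so it usually takes a little trial and error.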
Default Data Frame Scatter Plot
The default approach plots every variable in the data frame against every other. The upper triangular portion of the matrix is simply the transpose of the lower triangular portion, so only half of the plot conveys unique information. For my purposes, plotting the inputs against the outputs is more useful. With this tabular approach, each panel conveys a unique relationship (though I'm plotting only 6 relationships, rather than the 10 unique pairs of the default approach).
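
To make the contrast concrete, here is a minimal base R sketch of both layouts. It is not the script used for the figures in this post: the data frame `df`, its column names, and the split into three inputs and two outputs are assumptions, chosen only so that the counts match the 6 versus 10 mentioned above.

```r
set.seed(1)
n <- 500

# Hypothetical data frame: three inputs (x1..x3) and two outputs (y1, y2)
df <- data.frame(x1 = rnorm(n), x2 = runif(n), x3 = rexp(n))
df$y1 <- df$x1 + 0.5 * df$x2 + rnorm(n, sd = 0.3)
df$y2 <- df$x3 * df$x1 + rnorm(n, sd = 0.3)

# Default approach: plotting a numeric data frame draws the full
# scatter plot matrix, with each pair shown twice (once transposed)
plot(df)

inputs  <- c("x1", "x2", "x3")
outputs <- c("y1", "y2")

# Tabular approach: one row per output, one column per input,
# so 2 x 3 = 6 panels and no pair is repeated
op <- par(mfrow = c(length(outputs), length(inputs)),
          mar = c(4, 4, 1, 1))
for (out in outputs) {
  for (inp in inputs) {
    plot(df[[inp]], df[[out]],
         xlab = inp, ylab = out,
         pch = 16,
         col = rgb(0, 0, 0, alpha = 0.15),  # semi-transparent points
         axes = FALSE)
    # Light grey axes and frame keep the emphasis on the data
    axis(1, col = "grey80", col.axis = "grey40")
    axis(2, col = "grey80", col.axis = "grey40")
    box(col = "grey80")
  }
}
par(op)  # restore the previous graphics settings
```

With five variables, the default matrix contains `choose(5, 2)`, that is 10, unique pairs; the inputs-versus-outputs grid drops the input-input and output-output pairs and keeps the 6 that relate an input to an output.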
The R script to generate the multi-scatter plot graphic is shown below.
I just found the seaborn Python visualization library, which has a pairplot function that produces some pretty nice pairwise scatter plots with the univariate densities down the diagonal of the plot grid. I like this one for quick and easy visualization to get familiar with a data set.