UMAP clustering of NSRR data

Visualizing data is almost always useful, but this can be difficult when you have a lot of complex data. This is where dimension reduction techniques can play an important role. As described here, we applied one such technique to sleep EEG spectra from over 16,000 individuals in the NSRR, to get a feel for some of the sources of individual differences (both physiological and artefactual) in these data.

Uniform Manifold Approximation and Projection (or UMAP) is a relatively new dimension reduction technique that can be used to visualize patterns of clustering in high-dimensional data. Unlike PCA (but similar to other approaches such as t-SNE), it focuses on local structure: whilst "similar" observations should be grouped together, it does not attempt to preserve the exact global structure between all observations. In many contexts, this property seems to make for more intuitive (and visually interesting) representations of data. (Think of sorting Lego blocks into piles based on their shape and size: you'd likely only care that similar blocks end up in the same pile, and not so much about where the different piles sit relative to each other.)
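
To make the Lego analogy a little more concrete, here is a minimal sketch in Python (standard library only). It is not UMAP, and not the analysis described in this post: it only illustrates the "local" idea, that each observation's nearest neighbours are what matter, by brute-force neighbour search on two made-up piles of 2-D points. The function names, the pile sizes and the offsets are all invented for illustration. UMAP itself goes much further, building a weighted neighbour graph and then optimizing a low-dimensional layout of it.

```python
import math
import random


def make_piles(offset, n=20, seed=0):
    """Two toy 'piles' of 2-D points; the second pile is shifted right by `offset`."""
    rng = random.Random(seed)
    pile_a = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]
    pile_b = [(rng.gauss(0, 1) + offset, rng.gauss(0, 1)) for _ in range(n)]
    labels = ["A"] * n + ["B"] * n
    return pile_a + pile_b, labels


def nearest_neighbours(points, k):
    """For each point, the indices of its k nearest other points (brute force)."""
    result = []
    for i, p in enumerate(points):
        ranked = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        result.append([j for _, j in ranked[:k]])
    return result


def same_pile_fraction(offset, k=5):
    """Fraction of nearest-neighbour links that stay within a single pile."""
    points, labels = make_piles(offset)
    links = [
        labels[i] == labels[j]
        for i, neighbours in enumerate(nearest_neighbours(points, k))
        for j in neighbours
    ]
    return sum(links) / len(links)


# Same two piles, placed moderately far apart and then very far apart.
for offset in (50.0, 5000.0):
    print(offset, same_pile_fraction(offset))
```

Whether the piles sit 50 or 5,000 units apart, every point's nearest neighbours come from its own pile, so a method that works from neighbourhoods like these sees much the same picture in both cases. That is the sense in which the relative positions of the piles carry little weight.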

This approach has been used widely on single-cell RNA sequencing data to delineate cell types, as well as on population genetic data to reveal fine-scale ethnic and geographical structure in human populations. And, conveniently, UMAP is very computationally efficient. Sounds great -- so what does the NSRR look like through the lens of UMAP?

Purely to explore some NSRR data and to generate a few images for their own sake, we applied UMAP (as implemented in the umap R package, with no effort made to use anything other than its default settings) to EEG power spectra from over 10 million epochs of sleep. To read about this analysis, see this vignette on the Luna website.

Shaun Purcell, smpurcell@bwh.harvard.edu

URL: http://zzz.bwh.harvard.edu/luna/vignettes/nsrr-polarity/

By shaunpurcell on May 28, 2019 in Data Notes