9 Practice on Other Datasets

In this section, we will apply what we have learned in previous chapters by analyzing single-cell gene expression datasets from 10x Genomics. First we will work through two datasets together, one with the SingleCellExperiment workflow and one with the Seurat workflow, to figure out how to modify our earlier analysis to handle the new data. Then you may choose another dataset to attempt on your own (or in groups if you prefer).

9.1 Downloading data

Several of the 10x Genomics gene expression datasets already live on the NCGR server, so there is no need to download them. We are interested in the Feature / cell matrix (filtered) data, which contain compressed barcodes.tsv, features.tsv, and matrix.mtx files, as in our earlier dataset.

The Clustering analysis data do not contain known cell types. They contain unlabeled clusters found by graph-based or k-means clustering.

9.2 Determining cell types

When we used SingleR in the Dimensionality Reduction chapter, we used our own dataset as the reference. As mentioned there, if you work with human or mouse single-cell data, the Bioconductor packages celldex and scRNAseq provide several public cell type databases for annotating it. Here we will learn how to use one of these as our reference dataset.
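As a rough sketch of how a celldex reference could take the place of our own dataset, consider the helper below. The function name annotate_with_reference is hypothetical, and HumanPrimaryCellAtlasData is only one of the references celldex offers, picked here for illustration. The sketch assumes sce is a SingleCellExperiment holding raw counts, with Ensembl gene IDs as row names, which is why the reference is requested with ensembl = TRUE.

```r
# Hypothetical helper: annotate cells in `sce` against a public reference.
annotate_with_reference <- function(sce, ref, label_column = "label.main") {
  # SingleR reads the "logcounts" assay by default, so normalize if it is missing.
  if (!"logcounts" %in% SummarizedExperiment::assayNames(sce)) {
    sce <- scuttle::logNormCounts(sce)
  }
  # Reference labels live in the colData of the celldex object.
  ref_labels <- SummarizedExperiment::colData(ref)[[label_column]]
  SingleR::SingleR(test = sce, ref = ref, labels = ref_labels)
}

# Usage, run only if an `sce` object already exists in the session.
# ASSUMPTION: the row names of `sce` are Ensembl IDs, matching ensembl = TRUE.
if (exists("sce")) {
  ref <- celldex::HumanPrimaryCellAtlasData(ensembl = TRUE)
  pred <- annotate_with_reference(sce, ref)
  print(table(pred$labels))
}
```

SingleR only compares genes present in both objects, so the gene identifiers of the test data and the reference must be of the same type. For mouse data, a mouse reference such as celldex::MouseRNAseqData() would be the analogous starting point.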

9.3 Ready?

The 10x Genomics datasets live in their own directory, so let’s make a symbolic link to it.

ln -s /home/data/single-cell-workshop/data-10x-genomics data-10x-genomics

It is also worth creating a separate pdf subdirectory for each dataset, to keep their charts separate. For example,

if (!dir.exists("pdf-human-pbmc")) dir.create("pdf-human-pbmc")
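To show how such a subdirectory might be used, here is a minimal sketch of a helper that writes one chart per PDF file into the dataset's own directory. The name save_chart and its arguments are hypothetical and not part of the workshop code.

```r
# Hypothetical helper: draw a chart into pdf-<dataset>/<name>.pdf.
# `plot_fun` is a function with no arguments that does the drawing.
save_chart <- function(plot_fun, dataset, name, width = 7, height = 7) {
  out_dir <- paste0("pdf-", dataset)
  if (!dir.exists(out_dir)) dir.create(out_dir)
  out_file <- file.path(out_dir, paste0(name, ".pdf"))
  pdf(out_file, width = width, height = height)
  on.exit(dev.off())  # close the PDF device even if plotting fails
  plot_fun()
  invisible(out_file)
}
```

For example, save_chart(function() plot(1:10), "human-pbmc", "example") would write pdf-human-pbmc/example.pdf. A ggplot object needs an explicit print() inside plot_fun, because it is not drawn automatically inside a function.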

9.3.1 Human PBMCs data, using SingleCellExperiment

5k Human PBMCs, 3’ v3.1, Chromium Controller
data-10x-genomics/SC3pv3_GEX_Human_PBMC/
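
A possible starting point for loading this dataset into a SingleCellExperiment is sketched below. The helper name load_10x_sce is hypothetical, and the filtered_feature_bc_matrix subdirectory name is an assumption based on the usual 10x Genomics layout. Check the actual directory contents with list.files() before relying on it.

```r
# Hypothetical helper: read a 10x feature/cell matrix directory
# (barcodes.tsv, features.tsv, matrix.mtx) into a SingleCellExperiment.
load_10x_sce <- function(matrix_dir) {
  stopifnot(dir.exists(matrix_dir))
  # col.names = TRUE uses the cell barcodes as column names.
  DropletUtils::read10xCounts(matrix_dir, col.names = TRUE)
}

# ASSUMPTION: the filtered matrix sits in a "filtered_feature_bc_matrix"
# subdirectory; adjust the path to match what is actually on the server.
pbmc_dir <- file.path("data-10x-genomics", "SC3pv3_GEX_Human_PBMC",
                      "filtered_feature_bc_matrix")

if (dir.exists(pbmc_dir)) {
  sce <- load_10x_sce(pbmc_dir)
  print(dim(sce))                                  # genes x cells
  print(head(SummarizedExperiment::rowData(sce)))  # gene IDs and symbols
} else {
  message("Directory not found: ", pbmc_dir)
}
```

By default read10xCounts uses the gene IDs from the features file as row names, while the gene symbols are stored in rowData. Keep this in mind when matching against a reference dataset.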

9.3.2 Mouse brain data, using Seurat

5k Adult Mouse Brain Nuclei Isolated with Chromium Nuclei Isolation Kit
data-10x-genomics/5k_mouse_brain_CNIK_3pv3/
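
For the Seurat workflow, loading could look roughly like the sketch below. The helper name load_10x_seurat and the project label are hypothetical, and the filtered_feature_bc_matrix subdirectory name is again an assumption about the layout on the server.

```r
# Hypothetical helper: read a 10x feature/cell matrix directory into a Seurat object.
load_10x_seurat <- function(matrix_dir, project) {
  stopifnot(dir.exists(matrix_dir))
  counts <- Seurat::Read10X(data.dir = matrix_dir)
  # Read10X returns a list when the data contain several feature types;
  # in that case keep only the gene expression matrix.
  if (is.list(counts)) counts <- counts[["Gene Expression"]]
  Seurat::CreateSeuratObject(counts = counts, project = project)
}

# ASSUMPTION: the filtered matrix sits in a "filtered_feature_bc_matrix"
# subdirectory; adjust the path to match what is actually on the server.
brain_dir <- file.path("data-10x-genomics", "5k_mouse_brain_CNIK_3pv3",
                       "filtered_feature_bc_matrix")

if (dir.exists(brain_dir)) {
  brain <- load_10x_seurat(brain_dir, project = "mouse-brain")
  print(brain)
} else {
  message("Directory not found: ", brain_dir)
}
```

Unlike the SingleCellExperiment route, Read10X uses gene symbols as row names by default, so a reference dataset keyed by gene symbol is the more natural match here.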

9.3.3 Now you try it

Choose either SingleCellExperiment or Seurat, another 10x Genomics dataset, and (if you want to predict cell types using SingleR) a reference dataset that seems appropriate, and repeat the process on your own. Other datasets on the NCGR server include:

9.4 Further resources

10x Genomics. Other data types, such as spatial, are also available; click the Products filter for a full list. Registration required.

T. Mou, W. Deng, F. Gu, Y. Pawitan, and T.N. Vu, Reproducibility of Methods to Detect Differentially Expressed Genes from Single-Cell RNA Sequencing, Frontiers in Genetics 10:1331 (2019). (Datasets)

Single Cell Portal (Broad Institute). Registration required.