7 Seurat

Seurat is another R package for single cell analysis, developed by the Satija Lab. In this module, we will repeat many of the same analyses we did with SingleCellExperiment, while noting differences between them.


Important note: In this workshop, we use Seurat v4 (4.4.0). The Seurat package is currently transitioning to v5, and some of the code is not compatible between versions. If you end up working with Seurat v5 in the near future, the following shows how to adapt the workshop code.

Seurat v3, v4 Seurat v5
Accessing counts seurat[["RNA"]]@counts seurat[["RNA"]]$counts
Accessing data seurat[["RNA"]]@data seurat[["RNA"]]$data
Initial value of data Identical to counts Does not exist until you normalize counts
Creating another assay, like CPM CreateAssayObject() CreateAssay5Object()
Integration As in the Integration chapter Requires JoinLayers() before looking at differential expression between integrated datasets
Refer to the Seurat documentation for more details.

First we load the packages we will need.

library(Seurat)
library(ggplot2)
library(gridExtra)

7.1 Build a fake Seurat object from scratch

First we generate the same small matrix of counts as we did for SingleCellExperiment. Note that instead of creating row and column names as metadata, we should attach them to the counts matrix.

set.seed(42)
num_genes <- 12
num_cells <- 8
raw_counts <- as.integer(rexp(num_genes*num_cells, rate = 0.5))
raw_counts <- matrix(raw_counts, nrow = num_genes, ncol = num_cells)

rownames(raw_counts) <- paste("Gene", 1:num_genes, sep = "-")
colnames(raw_counts) <- paste("Cell", 1:num_cells, sep = "-")

And finally we create and inspect a Seurat object.

fake_seurat <- CreateSeuratObject(counts = raw_counts, project = "Fake Seurat")
fake_seurat
## An object of class Seurat 
## 12 features across 8 samples within 1 assay 
## Active assay: RNA (12 features, 0 variable features)
##  2 layers present: counts, data

The default assay is called “RNA” and is accessible from fake_seurat@assays$RNA or just fake_seurat[["RNA"]]. This stores the raw counts in its counts slot, while its data slot is initially equal to counts but reserved for normalized data.

Assays(fake_seurat)
## [1] "RNA"
fake_seurat[["RNA"]]@counts
## 12 x 8 sparse Matrix of class "dgCMatrix"
##         Cell-1 Cell-2 Cell-3 Cell-4 Cell-5 Cell-6 Cell-7 Cell-8
## Gene-1       .      .      2      .      .      .      1      2
## Gene-2       1      .      1      .     13      2      1      .
## Gene-3       .      2      2      .      .      .      2      .
## Gene-4       .      .      .      3      3      .     13      .
## Gene-5       .      .      .      1      2      .      2      .
## Gene-6       2      1      2      1      .      .      .      1
## Gene-7       .      2      1      .      1      3      1      5
## Gene-8       .      .      6      2      .      1      2      3
## Gene-9       2      9      .      3      2      .      2      4
## Gene-10      1      1      .      .      1      .      .      2
## Gene-11      2      9      3      8      .     11      .      1
## Gene-12      4      .      1      2     16      .      .      3
all(fake_seurat[["RNA"]]@data == fake_seurat[["RNA"]]@counts)
## [1] TRUE

Note that counts is a sparse matrix by default.

7.1.1 Metadata

The meta.data slot is created with three columns by default.

head(fake_seurat@meta.data)
##         orig.ident nCount_RNA nFeature_RNA
## Cell-1 Fake Seurat         12            6
## Cell-2 Fake Seurat         24            6
## Cell-3 Fake Seurat         18            8
## Cell-4 Fake Seurat         20            7
## Cell-5 Fake Seurat         38            7
## Cell-6 Fake Seurat         17            4

orig.ident is the project name we assigned this Seurat object, nCount_RNA is the total counts for each cell (like sum in SingleCellExperiment), and nFeature_RNA is the number of genes with nonzero counts for each cell (like detected).

7.1.2 Normalization and multiple assays

The Seurat normalization functions work slightly differently than in SingleCellExperiment, where multiple assays like logcounts, normcounts, and cpm naturally coexist. Instead, Seurat expects you to explicitly create a new assay for each (non-default) one, starting from the same counts. The transformed data are assigned to the new assay’s data slot, as in this example for counts per million. To compute CPM, use the relative counts (“RC”) method.

fake_seurat[["CPM"]] <- CreateAssayObject(counts = fake_seurat[["RNA"]]@counts)
fake_seurat <- NormalizeData(fake_seurat, assay = "CPM", normalization.method = "RC", scale.factor = 1000000)
fake_seurat[["CPM"]]@data
## 12 x 8 sparse Matrix of class "dgCMatrix"
##            Cell-1    Cell-2    Cell-3 Cell-4    Cell-5    Cell-6    Cell-7
## Gene-1       .         .    111111.11      .      .         .     41666.67
## Gene-2   83333.33      .     55555.56      . 342105.26 117647.06  41666.67
## Gene-3       .     83333.33 111111.11      .      .         .     83333.33
## Gene-4       .         .         .    150000  78947.37      .    541666.67
## Gene-5       .         .         .     50000  52631.58      .     83333.33
## Gene-6  166666.67  41666.67 111111.11  50000      .         .         .   
## Gene-7       .     83333.33  55555.56      .  26315.79 176470.59  41666.67
## Gene-8       .         .    333333.33 100000      .     58823.53  83333.33
## Gene-9  166666.67 375000.00      .    150000  52631.58      .     83333.33
## Gene-10  83333.33  41666.67      .         .  26315.79      .         .   
## Gene-11 166666.67 375000.00 166666.67 400000      .    647058.82      .   
## Gene-12 333333.33      .     55555.56 100000 421052.63      .         .   
##            Cell-8
## Gene-1   95238.10
## Gene-2       .   
## Gene-3       .   
## Gene-4       .   
## Gene-5       .   
## Gene-6   47619.05
## Gene-7  238095.24
## Gene-8  142857.14
## Gene-9  190476.19
## Gene-10  95238.10
## Gene-11  47619.05
## Gene-12 142857.14

To compute the log-normalized counts, use the (default) “LogNormalize” method. Note that the log-normalized counts use a different scale factor than in SingleCellExperiment, but this does not affect downstream computations.

fake_seurat <- NormalizeData(fake_seurat) # normalization.method = "LogNormalize"
fake_seurat[["RNA"]]@data
## 12 x 8 sparse Matrix of class "dgCMatrix"
##           Cell-1   Cell-2   Cell-3   Cell-4   Cell-5   Cell-6   Cell-7   Cell-8
## Gene-1  .        .        7.014015 .        .        .        6.034684 6.860015
## Gene-2  6.726633 .        6.321767 .        8.137996 7.071124 6.034684 .       
## Gene-3  .        6.726633 7.014015 .        .        .        6.726633 .       
## Gene-4  .        .        .        7.313887 6.672632 .        8.597420 .       
## Gene-5  .        .        .        6.216606 6.267800 .        6.726633 .       
## Gene-6  7.419181 6.034684 7.014015 6.216606 .        .        .        6.167916
## Gene-7  .        6.726633 6.321767 .        5.576547 7.476306 6.034684 7.775676
## Gene-8  .        .        8.112028 6.908755 .        6.378826 6.726633 7.265130
## Gene-9  7.419181 8.229778 .        7.313887 6.267800 .        6.726633 7.552637
## Gene-10 6.726633 6.034684 .        .        5.576547 .        .        6.860015
## Gene-11 7.419181 8.229778 7.419181 8.294300 .        8.775177 .        6.167916
## Gene-12 8.112028 .        6.321767 6.908755 8.345580 .        .        7.265130
Assays(fake_seurat)
## [1] "RNA" "CPM"
fake_seurat@active.assay
## [1] "RNA"

Now we have two assays, “RNA” and “CPM”. Both have the same counts but different data, and “RNA” is the active (default) one which (in this case) contains the log-normalized data.

7.2 Import a real one from our dataset

First we import the same data as before, using the Read10X() function.

my_counts <- Read10X("data/filtered_feature_bc_matrix/")
dim(my_counts)
## [1] 34218  5780
my_seurat <- CreateSeuratObject(counts = my_counts, project = "Arabidopsis")
my_seurat
## An object of class Seurat 
## 34218 features across 5780 samples within 1 assay 
## Active assay: RNA (34218 features, 0 variable features)
##  2 layers present: counts, data
# The following are equivalent, as you can see
# dim(my_seurat@assays$RNA@counts)
# dim(my_seurat[["RNA"]]@counts)
# dim(my_seurat[["RNA"]])
dim(my_seurat)
## [1] 34218  5780
dim(my_seurat@meta.data)
## [1] 5780    3
head(my_seurat@meta.data)
##                     orig.ident nCount_RNA nFeature_RNA
## AAACCCACACGCGCAT-1 Arabidopsis       2285         1382
## AAACCCACAGAAGTTA-1 Arabidopsis       2609         1525
## AAACCCACAGACAAAT-1 Arabidopsis       4712         2606
## AAACCCACATTGACAC-1 Arabidopsis       6133         3434
## AAACCCAGTAGCTTTG-1 Arabidopsis       1243          998
## AAACCCAGTCCAGTTA-1 Arabidopsis       1095          914

The metadata at this point are orig.ident (the project name), nCount_RNA (sum from my_SCE), and nFeature_RNA (detected from my_SCE).

We can also add the predetermined clusters.

df.clusters <- readRDS("data/clusters_5.rds")
df.clusters$cell_barcode <- substring(df.clusters$cell_barcode, 1, 18)
my_seurat@meta.data <- merge(my_seurat@meta.data, df.clusters, by.x = "row.names", by.y = "cell_barcode", all.x = TRUE)
# It converted row.names(my_seurat@meta.data) to a spurious column called Row.names,
# so clean up
row.names(my_seurat@meta.data) <- my_seurat@meta.data$Row.names
my_seurat@meta.data$Row.names <- NULL
head(my_seurat@meta.data)
##                     orig.ident nCount_RNA nFeature_RNA cluster_id
## AAACCCACACGCGCAT-1 Arabidopsis       2285         1382         15
## AAACCCACAGAAGTTA-1 Arabidopsis       2609         1525         15
## AAACCCACAGACAAAT-1 Arabidopsis       4712         2606          4
## AAACCCACATTGACAC-1 Arabidopsis       6133         3434         14
## AAACCCAGTAGCTTTG-1 Arabidopsis       1243          998         17
## AAACCCAGTCCAGTTA-1 Arabidopsis       1095          914          9
##                          supercluster
## AAACCCACACGCGCAT-1   Endodermal cells
## AAACCCACAGAAGTTA-1   Endodermal cells
## AAACCCACAGACAAAT-1      Atrichoblasts
## AAACCCACATTGACAC-1   Endodermal cells
## AAACCCAGTAGCTTTG-1        Stele cells
## AAACCCAGTCCAGTTA-1 Meristematic cells
# Assign display colors for the 21 clusters and 6 superclusters
# (close to those used in Farmer et al.)
cluster_colors <- c(
  "cornflowerblue", "darkblue", "cyan",
  "springgreen", "olivedrab", "chartreuse", "darkgreen",
  "gray", "black", "saddlebrown",
  "slateblue", "purple", "pink", "lightpink3", "hotpink", "darkred",
  "lightsalmon", "orange", "tan", "firebrick1", "indianred"
)
supercluster_colors <- c("blue", "green", "black", "magenta", "red", "orange")

7.2.1 Creating other metadata columns

Here we will create artificial metadata columns corresponding to the Arabidopsis thaliana gene name prefixes (AT1-AT5, ATC, ATM, and ENS). We use Seurat’s PercentageFeatureSet() function to compute the percentage of cell counts belonging to each category.

gene_prefix <- substring(rownames(my_seurat), 1, 3)
feature_categories <- sort(unique(gene_prefix))
for (fc in feature_categories) {
  fc_name <- paste0("pct_", fc)
  my_seurat[[fc_name]] <- PercentageFeatureSet(my_seurat, pattern = paste0("^", fc))
}
head(my_seurat@meta.data)
##                     orig.ident nCount_RNA nFeature_RNA cluster_id
## AAACCCACACGCGCAT-1 Arabidopsis       2285         1382         15
## AAACCCACAGAAGTTA-1 Arabidopsis       2609         1525         15
## AAACCCACAGACAAAT-1 Arabidopsis       4712         2606          4
## AAACCCACATTGACAC-1 Arabidopsis       6133         3434         14
## AAACCCAGTAGCTTTG-1 Arabidopsis       1243          998         17
## AAACCCAGTCCAGTTA-1 Arabidopsis       1095          914          9
##                          supercluster  pct_AT1  pct_AT2  pct_AT3  pct_AT4
## AAACCCACACGCGCAT-1   Endodermal cells 24.11379 16.10503 22.27571 17.41794
## AAACCCACAGAAGTTA-1   Endodermal cells 26.94519 15.82982 20.96589 14.25834
## AAACCCACAGACAAAT-1      Atrichoblasts 25.65789 16.02292 20.47963 16.08659
## AAACCCACATTGACAC-1   Endodermal cells 27.19713 15.32692 18.21295 16.24001
## AAACCCAGTAGCTTTG-1        Stele cells 24.29606 16.41191 21.31939 16.00965
## AAACCCAGTCCAGTTA-1 Meristematic cells 25.20548 13.78995 20.45662 15.89041
##                     pct_AT5    pct_ATC    pct_ATM pct_ENS
## AAACCCACACGCGCAT-1 19.69365 0.04376368 0.35010941       0
## AAACCCACAGAAGTTA-1 21.08087 0.00000000 0.91989268       0
## AAACCCACAGACAAAT-1 21.73175 0.00000000 0.02122241       0
## AAACCCACATTGACAC-1 20.80548 0.35871515 1.85879667       0
## AAACCCAGTAGCTTTG-1 21.64119 0.08045052 0.24135157       0
## AAACCCAGTCCAGTTA-1 23.74429 0.18264840 0.73059361       0
pdf("pdf/at1-5.pdf")
VlnPlot(my_seurat, features = paste0("pct_", "AT", 1:5), pt.size = 0, ncol = 5, fill.by = "feature")  
dev.off()
pdf("pdf/atc-atm-ens.pdf")
VlnPlot(my_seurat, features = paste0("pct_", c("ATC", "ATM", "ENS")), pt.size = 0, ncol = 3, fill.by = "feature")
dev.off()
Violin plots of AT1-AT5 counts

Figure 7.1: Violin plots of AT1-AT5 counts

Violin plots of ATC, ATM, ENSRNA counts

Figure 7.2: Violin plots of ATC, ATM, ENSRNA counts

The chloroplast, mitochondrial, and non-protein coding genes do not contribute much, so we are justified in eliminating them.

7.3 Quality control

We can decide which cells to discard using fixed thresholds. Before we filter out the discards, let’s illustrate FeatureScatter() which creates scatter plots for visualizing feature-feature relationships. It can be used for anything calculated by the object, i.e. columns in object metadata, PC scores etc.

my_seurat$discard <- (my_seurat$nCount_RNA < 1000 | my_seurat$nFeature_RNA < 750 | is.na(my_seurat$cluster_id))
pdf("pdf/seurat-fixed-threshold-qc.pdf")
FeatureScatter(my_seurat, feature1 = "nCount_RNA", feature2 = "nFeature_RNA", group.by = "discard", cols = c("blue", "tan")) +
  labs(color = "discard")
dev.off()
Cell counts v. number of expressed genes, showing outliers from fixed-threshold QC to discard

Figure 7.3: Cell counts v. number of expressed genes, showing outliers from fixed-threshold QC to discard

Now we can discard those cells and the undesired genes (“ATC”, “ATM”, “ENS”, and those with no cell counts).

genes_to_keep <- (rowSums(my_seurat[["RNA"]]@counts) > 0)
genes_to_keep <- genes_to_keep & gene_prefix %in% paste0("AT", 1:5)
my_seurat <- my_seurat[genes_to_keep, !my_seurat$discard]
dim(my_seurat)
## [1] 23848  5151

7.4 Plots

VlnPlot() shows the variation in individual features. By default, this overlays the violin plot with the jittered count values, so it is difficult to see. To remove them, we add the argument pt.size = 0.

pdf("pdf/nCount-nFeature-by-cluster.pdf")
plot1 <- VlnPlot(my_seurat, features = "nCount_RNA", cols = cluster_colors, pt.size = 0, group.by = "cluster_id") +
  xlab("cluster_id") +
  NoLegend()
plot2 <- VlnPlot(my_seurat, features = "nFeature_RNA", cols = cluster_colors, pt.size = 0, group.by = "cluster_id") +
  xlab("cluster_id") +
  NoLegend()
grid.arrange(plot1, plot2, ncol = 1)
dev.off()
Cell counts and number of expressed genes, by cluster

Figure 7.4: Cell counts and number of expressed genes, by cluster

The group.by argument determines which categories of cell to give their own violin plot. For example, to group by supercluster instead of cluster_id,

pdf("pdf/nCount-nFeature-by-supercluster.pdf")
plot1 <- VlnPlot(my_seurat, features = "nCount_RNA", cols = supercluster_colors, pt.size = 0, group.by = "supercluster") +
  xlab("supercluster") +
  NoLegend()
plot2 <- VlnPlot(my_seurat, features = "nFeature_RNA", cols = supercluster_colors, pt.size = 0, group.by = "supercluster") +
  xlab("supercluster") +
  NoLegend()
grid.arrange(plot1, plot2, ncol = 2)
dev.off()
Cell counts and number of expressed genes, by supercluster

Figure 7.5: Cell counts and number of expressed genes, by supercluster

We can also visualize the same data as a ridgeline plot.

pdf("pdf/nCount-nFeature-ridgeplots.pdf")
plot1 <- RidgePlot(my_seurat, features = "nCount_RNA", cols = supercluster_colors, group.by = "supercluster") +
  ylab(NULL) +
  NoLegend()
plot2 <- RidgePlot(my_seurat, features = "nFeature_RNA", cols = supercluster_colors, group.by = "supercluster") +
  ylab(NULL) +
  NoLegend()
grid.arrange(plot1, plot2, ncol = 1)
dev.off()
Ridgeline plots of cell counts and number of expressed genes, by supercluster

Figure 7.6: Ridgeline plots of cell counts and number of expressed genes, by supercluster

To combine all data into a single category, group by orig.ident.

7.5 Normalization

Again, we can log-normalize our data just by running NormalizeData().

my_seurat <- NormalizeData(my_seurat)
my_seurat[["RNA"]]@counts[1:6, 1:3]
## 6 x 3 sparse Matrix of class "dgCMatrix"
##           AAACCCACACGCGCAT-1 AAACCCACAGAAGTTA-1 AAACCCACAGACAAAT-1
## AT1G01010                  .                  1                  .
## AT1G01020                  .                  .                  1
## AT1G01030                  .                  .                  .
## AT1G01040                  .                  .                  2
## AT1G03993                  .                  .                  .
## AT1G01050                  1                  .                  .
my_seurat[["RNA"]]@data[1:6, 1:3]
## 6 x 3 sparse Matrix of class "dgCMatrix"
##           AAACCCACACGCGCAT-1 AAACCCACAGAAGTTA-1 AAACCCACAGACAAAT-1
## AT1G01010           .                   1.58278           .       
## AT1G01020           .                   .                 1.138695
## AT1G01030           .                   .                 .       
## AT1G01040           .                   .                 1.657348
## AT1G03993           .                   .                 .       
## AT1G01050           1.685227            .                 .

7.6 Identification of highly variable features

Another method of quality control is to determine the most variable genes and use these for dimensionality reduction. Here we use FindVariableFeatures() to identify the 2000 most variable genes (with the top 10 labeled), and VariableFeaturePlot() to visually compare them to the less variable ones.

my_seurat <- FindVariableFeatures(my_seurat) # nfeatures = 2000 by default, unlike 500 as in SingleCellExperiment
top_10_genes <- head(VariableFeatures(my_seurat), 10)
pdf("pdf/variable-features.pdf")
vf_plot <- VariableFeaturePlot(my_seurat)
LabelPoints(plot = vf_plot, points = top_10_genes, repel = TRUE, xnudge = 0, ynudge = 0)
dev.off()
Variable feature plot

Figure 7.7: Variable feature plot

7.7 Dimensionality reduction

7.7.1 PCA

In Seurat, before computing principal components we must scale the (log-normalized) data by subtracting the mean and dividing by the standard deviation of each row. The features argument allows your choice of genes to scale, but the default is to only scale the rows corresponding to the most variable genes.

my_seurat <- ScaleData(my_seurat, verbose = FALSE)
my_seurat <- RunPCA(my_seurat, verbose = FALSE)

We can view the individual PC values for each cell as follows.

# Just the first 6 rows and 5 columns
my_seurat[["pca"]]@cell.embeddings[1:6, 1:5]
##                          PC_1       PC_2       PC_3       PC_4       PC_5
## AAACCCACACGCGCAT-1 -3.5730765  -0.720932  1.5214319 -3.1399856  4.6600356
## AAACCCACAGAAGTTA-1 -4.6125194 -21.638604 -5.1986060 -0.9866993  3.4633961
## AAACCCACAGACAAAT-1  1.8597980   2.741490  1.7616183 10.2241086  1.9462996
## AAACCCACATTGACAC-1 -0.4922986  -8.080446 -0.8845399  0.8801926  0.2192683
## AAACCCAGTAGCTTTG-1 -3.5552763   1.851916  4.3438592 -3.5047218 -2.2177276
## AAACCCAGTCCAGTTA-1 -4.0690844   3.278787 -0.1907693 -1.4210729 -3.8602152

The VizDimLoadings() function shows which genes are most strongly associated with each principal component.

pdf("pdf/vizdimloadings.pdf")
VizDimLoadings(my_seurat, dims = 1:4, nfeatures = 10, reduction = "pca")
dev.off()
Dimensional loadings plot for the first 4 principal components

Figure 7.8: Dimensional loadings plot for the first 4 principal components

DimHeatmap() sorts both genes and cells by principal component values. Genes with the most negative PC values are on top, and those with the most positive values are on the bottom, while cells are arranged from left to right. In this default color scheme, magenta represents positive correlations and yellow represents negative ones, with no correlation in black.

pdf("pdf/dimheatmaps.pdf")
DimHeatmap(my_seurat, dims = 1:4, nfeatures = 10, ncol = 2, cells = 500)
dev.off()
Heatmaps for the first 4 principal components, top 10 genes x 500 cells

Figure 7.9: Heatmaps for the first 4 principal components, top 10 genes x 500 cells

Exercise: compare the genes shown in the heatmaps to those from VizDimLoadings(). Do they agree?
Click for answer Yes, remember to count positive-valued genes from the bottom and negative-valued genes from the top of each heatmap.

The ElbowPlot() function shows the amount of variance in each principal component, giving us a sense of how many PCs we should retain in subsequent computations.

pdf("pdf/elbow-plot-seurat.pdf")
ElbowPlot(my_seurat, ndims = 50)
dev.off()
Elbow plot for 50 PCs

Figure 7.10: Elbow plot for 50 PCs

It looks like we would not gain much by retaining more than 20 PCs in our dimensionality reduction computations.

Finally, we can create our dimensionality reduction plots with DimPlot().

pdf("pdf/pca-by-cluster-seurat.pdf")
DimPlot(my_seurat, reduction = "pca", cols = cluster_colors, group.by = "cluster_id", label.size = 4, repel = TRUE, label = TRUE)
dev.off()
pdf("pdf/pca-by-supercluster-seurat.pdf")
DimPlot(my_seurat, reduction = "pca", cols = supercluster_colors, group.by = "supercluster", label.size = 4, repel = TRUE, label = TRUE) + NoLegend()
dev.off()
PCA plot by cluster

Figure 7.11: PCA plot by cluster

PCA plot by supercluster

Figure 7.12: PCA plot by supercluster

We do not really need the reduction = "pca" argument in this initial example, as we have only run PCA so far. Seurat looks for existing reduced dimensions in order: first UMAP, then t-SNE, then PCA. (Therefore, we would do it this way if we wanted to visualize PCA later after having computed UMAP, for example.)

7.7.2 t-SNE

We repeat the same analysis for t-SNE, using 20 principal components.

set.seed(42)
my_seurat <- RunTSNE(my_seurat, dims = 1:20, verbose = FALSE)
my_seurat[["tsne"]]@cell.embeddings[1:6, ]
##                         tSNE_1     tSNE_2
## AAACCCACACGCGCAT-1   9.1858828 -22.510883
## AAACCCACAGAAGTTA-1   0.3305121 -43.819644
## AAACCCACAGACAAAT-1  28.1141137   1.241850
## AAACCCACATTGACAC-1 -21.5127426 -31.453517
## AAACCCAGTAGCTTTG-1 -12.3916993  -9.794599
## AAACCCAGTCCAGTTA-1 -13.2210028  15.448777
pdf("pdf/tsne-by-cluster-seurat.pdf")
DimPlot(my_seurat, cols = cluster_colors, group.by = "cluster_id", label.size = 4, repel = TRUE, label = TRUE)
dev.off()
pdf("pdf/tsne-by-supercluster-seurat.pdf")
DimPlot(my_seurat, cols = supercluster_colors, group.by = "supercluster", label.size = 4, repel = TRUE, label = TRUE) + NoLegend()
dev.off()
t-SNE plot by cluster

Figure 7.13: t-SNE plot by cluster

t-SNE plot by supercluster

Figure 7.14: t-SNE plot by supercluster

7.7.3 UMAP

And the same for UMAP,

set.seed(42)
my_seurat <- RunUMAP(my_seurat, dims = 1:20, verbose = FALSE)
my_seurat[["umap"]]@cell.embeddings[1:6, ]
##                          UMAP_1      UMAP_2
## AAACCCACACGCGCAT-1  -5.46506841  9.20502553
## AAACCCACAGAAGTTA-1 -10.09535081  7.70838723
## AAACCCACAGACAAAT-1   8.11869615  0.06980795
## AAACCCACATTGACAC-1 -13.14318997  0.89037666
## AAACCCAGTAGCTTTG-1  -6.06361253 -5.82680860
## AAACCCAGTCCAGTTA-1   0.01529878 -4.98024049
pdf("pdf/umap-by-cluster-seurat.pdf")
DimPlot(my_seurat, cols = cluster_colors, group.by = "cluster_id", label.size = 4, repel = TRUE, label = TRUE)
dev.off()
pdf("pdf/umap-by-supercluster-seurat.pdf")
DimPlot(my_seurat, cols = supercluster_colors, group.by = "supercluster", label.size = 4, repel = TRUE, label = TRUE) + NoLegend()
dev.off()
UMAP plot by cluster

Figure 7.15: UMAP plot by cluster

UMAP plot by supercluster

Figure 7.16: UMAP plot by supercluster

7.8 Differential gene expression

Feature plots reveal differential expression of marker genes in different cell types. Here we create feature plots for the most strongly associated gene from the first 4 principal components (from the VizDimLoadings() example above).

Idents(my_seurat) <- my_seurat$cluster_id
my_genes <- c("AT5G17390", "AT1G68850", "AT5G22310", "AT3G16340")
pdf("pdf/featureplots-seurat.pdf")
FeaturePlot(my_seurat, features = my_genes, label.size = 4, repel = TRUE, label = TRUE)
dev.off()
## Warning: ggrepel: 1 unlabeled data points (too many overlaps). Consider increasing max.overlaps
## ggrepel: 1 unlabeled data points (too many overlaps). Consider increasing max.overlaps
## ggrepel: 1 unlabeled data points (too many overlaps). Consider increasing max.overlaps
## ggrepel: 1 unlabeled data points (too many overlaps). Consider increasing max.overlaps
Feature plots for selected genes

Figure 7.17: Feature plots for selected genes

An alternative is to create violin plots for genes of interest.

pdf("pdf/violin-featureplots-seurat.pdf")
VlnPlot(my_seurat, features = my_genes, cols = cluster_colors, pt.size = 0, group.by = "cluster_id", log = TRUE, ncol = 2) +
  xlab("cluster_id") +
  NoLegend()
dev.off()
Violin plots for selected genes

Figure 7.18: Violin plots for selected genes

Another is to make dot plots of gene expression.

pdf("pdf/dotplot-seurat.pdf")
DotPlot(my_seurat, features = top_10_genes, cols = c("darkblue", "red"), cluster.idents = TRUE, group.by = "supercluster") +
  ylab("supercluster") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
dev.off()
Dot plot for selected genes

Figure 7.19: Dot plot for selected genes

Seurat also has a more direct statistical way to automatically suggest which genes go with which clusters (or any category you specify using Idents()), FindAllMarkers(). The following code identifies the top 3 genes for each cluster. First we must load the dplyr package for its data frame manipulation functions.

library(dplyr)
Idents(my_seurat) <- my_seurat$cluster_id

all_markers <- FindAllMarkers(my_seurat, only.pos = TRUE, min.pct = 0.5, logfc.threshold = 0.5, verbose = FALSE)
head(all_markers)
##                  p_val avg_log2FC pct.1 pct.2    p_val_adj cluster      gene
## AT2G20520 5.283152e-85   3.077935 0.750 0.037 1.259926e-80       1 AT2G20520
## AT4G08400 2.328373e-81   4.675131 0.821 0.048 5.552703e-77       1 AT4G08400
## AT3G49960 1.064398e-80   4.071788 0.929 0.063 2.538376e-76       1 AT3G49960
## AT5G04960 4.239402e-79   4.015624 0.964 0.071 1.011013e-74       1 AT5G04960
## AT5G06630 1.586581e-78   5.431388 0.929 0.067 3.783679e-74       1 AT5G06630
## AT1G12040 2.055730e-78   3.985362 0.893 0.060 4.902504e-74       1 AT1G12040
top_3_markers <- as.data.frame(top_n(group_by(all_markers, cluster), n = 3, wt = avg_log2FC))
top_3_markers
##            p_val avg_log2FC pct.1 pct.2     p_val_adj cluster      gene
## 1   4.805192e-55   5.432073 1.000 0.122  1.145942e-50       1 AT3G54590
## 2   2.837409e-50   5.997661 0.929 0.114  6.766654e-46       1 AT2G24980
## 3   3.182976e-42   5.437437 0.964 0.156  7.590760e-38       1 AT3G54580
## 4   0.000000e+00   4.957604 0.975 0.054  0.000000e+00       2 AT1G12560
## 5   0.000000e+00   3.866195 0.944 0.037  0.000000e+00       2 AT1G50930
## 6  2.869877e-270   3.852518 0.975 0.104 6.844082e-266       2 AT4G25820
## 7   0.000000e+00   2.871506 0.655 0.009  0.000000e+00       3 AT1G27140
## 8  3.860169e-127   4.049835 0.905 0.151 9.205730e-123       3 AT1G66270
## 9   2.229071e-93   3.141852 0.672 0.099  5.315887e-89       3 AT1G04040
## 10 2.041199e-183   3.446692 0.605 0.103 4.867850e-179       4 AT4G18550
## 11 1.004522e-135   3.698562 0.562 0.118 2.395585e-131       4 AT3G48340
## 12 5.911867e-106   3.583753 0.568 0.154 1.409862e-101       4 AT3G45010
## 13  0.000000e+00   2.774994 0.926 0.290  0.000000e+00       5 AT1G66800
## 14 1.111775e-269   2.668145 0.842 0.264 2.651361e-265       5 AT2G37130
## 15 4.451567e-196   2.459206 0.827 0.328 1.061610e-191       5 AT5G42600
## 16 2.080539e-252   3.791566 0.733 0.038 4.961670e-248       6 AT1G53830
## 17 3.200221e-230   4.671823 0.600 0.026 7.631887e-226       6 AT4G33790
## 18 5.994578e-169   4.446089 0.658 0.051 1.429587e-164       6 AT4G15290
## 19  7.799618e-93   2.334195 0.620 0.121  1.860053e-88       7 AT1G28290
## 20  1.277339e-30   1.648956 0.647 0.327  3.046197e-26       7 AT1G29418
## 21  1.228976e-15   1.893333 0.599 0.408  2.930861e-11       7 AT1G66280
## 22 8.331900e-115   2.330073 0.541 0.072 1.986992e-110       8 AT5G44530
## 23  2.777944e-55   2.329416 0.519 0.132  6.624841e-51       8 AT1G62480
## 24  1.646121e-38   1.748039 0.616 0.250  3.925669e-34       8 AT4G17390
## 25 6.391708e-175   2.579441 0.547 0.058 1.524295e-170       9 AT1G09200
## 26 2.550174e-171   2.500638 0.773 0.143 6.081655e-167       9 AT1G56110
## 27 3.131177e-136   2.873593 0.883 0.311 7.467230e-132       9 AT1G29418
## 28  0.000000e+00   4.604532 0.941 0.027  0.000000e+00      10 AT4G23800
## 29 1.244835e-232   4.449378 0.542 0.020 2.968683e-228      10 AT4G33270
## 30 6.606529e-183   4.896985 0.551 0.029 1.575525e-178      10 AT4G33260
## 31 1.435710e-297   2.385858 0.561 0.057 3.423882e-293      11 AT4G15393
## 32 2.925117e-263   2.579904 0.727 0.150 6.975819e-259      11 AT4G02520
## 33 1.708945e-260   2.455158 0.816 0.216 4.075492e-256      11 AT4G30270
## 34 8.658822e-184   3.241874 0.588 0.061 2.064956e-179      12 AT5G49360
## 35 1.947868e-109   2.793009 0.570 0.104 4.645277e-105      12 AT3G23470
## 36  7.482203e-84   2.653856 0.842 0.364  1.784356e-79      12 AT4G30170
## 37  0.000000e+00   3.988574 0.658 0.022  0.000000e+00      13 AT4G11290
## 38 3.143627e-259   3.355122 0.663 0.043 7.496923e-255      13 AT5G24580
## 39 2.772143e-153   3.390540 0.842 0.163 6.611006e-149      13 AT1G05260
## 40 1.651738e-291   5.907585 0.694 0.056 3.939064e-287      14 AT1G45201
## 41 3.694241e-209   5.466461 0.532 0.045 8.810027e-205      14 AT5G37690
## 42 1.391073e-198   5.244063 0.560 0.056 3.317431e-194      14 AT2G23540
## 43  0.000000e+00   4.118525 0.992 0.287  0.000000e+00      15 AT3G32980
## 44  0.000000e+00   3.940808 0.859 0.056  0.000000e+00      15 AT2G28470
## 45  0.000000e+00   3.527007 0.912 0.112  0.000000e+00      15 AT3G22600
## 46  0.000000e+00   5.761647 0.982 0.042  0.000000e+00      16 AT2G36100
## 47  0.000000e+00   5.667331 0.901 0.028  0.000000e+00      16 AT5G65530
## 48  0.000000e+00   5.389908 0.973 0.029  0.000000e+00      16 AT5G42180
## 49 3.959335e-197   3.154383 0.751 0.102 9.442222e-193      17 AT1G75500
## 50 1.905619e-146   2.733056 0.631 0.096 4.544521e-142      17 AT2G02100
## 51 6.921911e-141   2.925161 0.524 0.065 1.650737e-136      17 AT3G53100
## 52  0.000000e+00   3.824625 0.779 0.038  0.000000e+00      18 AT3G62040
## 53  0.000000e+00   3.717356 0.831 0.044  0.000000e+00      18 AT1G31050
## 54 3.359451e-282   4.306949 0.563 0.063 8.011618e-278      18 AT1G32450
## 55  0.000000e+00   5.671221 0.837 0.017  0.000000e+00      19 AT4G00670
## 56  0.000000e+00   5.656709 0.977 0.023  0.000000e+00      19 AT2G16740
## 57 4.953494e-253   5.684909 0.942 0.054 1.181309e-248      19 AT5G09220
## 58 3.674197e-137   4.604391 0.929 0.052 8.762225e-133      20 AT2G31900
## 59 4.528058e-128   4.367504 0.738 0.033 1.079851e-123      20 AT1G12080
## 60 4.792214e-118   4.217155 0.952 0.067 1.142847e-113      20 AT1G77690
## 61 3.591989e-214   3.077428 0.610 0.021 8.566176e-210      21 AT1G47410
## 62 1.747501e-150   3.292408 0.732 0.051 4.167440e-146      21 AT4G32880
## 63 1.691360e-116   3.564900 0.512 0.031 4.033555e-112      21 AT5G19530
Exercise: Find the top 10 marker genes for each supercluster. How well do they overlap with the top 3 for each cluster?
Click for answer
Idents(my_seurat) <- my_seurat$supercluster
all_markers <- FindAllMarkers(my_seurat, only.pos = TRUE, min.pct = 0.5, logfc.threshold = 0.5, verbose = FALSE)
top_10_markers <- as.data.frame(top_n(group_by(all_markers, cluster), n = 10, wt = avg_log2FC))
top_10_markers
##            p_val avg_log2FC pct.1 pct.2     p_val_adj            cluster
## 1   0.000000e+00   4.713490 0.683 0.045  0.000000e+00       Trichoblasts
## 2   0.000000e+00   3.518780 0.644 0.044  0.000000e+00       Trichoblasts
## 3   0.000000e+00   3.392073 0.706 0.044  0.000000e+00       Trichoblasts
## 4  4.696692e-252   3.378538 0.781 0.116 1.120067e-247       Trichoblasts
## 5  3.449318e-248   3.483590 0.712 0.090 8.225933e-244       Trichoblasts
## 6  4.581437e-243   3.890838 0.846 0.158 1.092581e-238       Trichoblasts
## 7  1.735074e-238   3.485318 0.703 0.096 4.137804e-234       Trichoblasts
## 8  5.904813e-227   3.463225 0.761 0.123 1.408180e-222       Trichoblasts
## 9  7.477153e-209   3.630900 0.699 0.112 1.783151e-204       Trichoblasts
## 10 3.987865e-164   3.499831 0.585 0.089 9.510261e-160       Trichoblasts
## 11  0.000000e+00   2.329371 0.673 0.124  0.000000e+00      Atrichoblasts
## 12  0.000000e+00   2.286634 0.774 0.251  0.000000e+00      Atrichoblasts
## 13  0.000000e+00   2.274846 0.704 0.180  0.000000e+00      Atrichoblasts
## 14  0.000000e+00   2.251108 0.814 0.290  0.000000e+00      Atrichoblasts
## 15  0.000000e+00   2.207429 0.614 0.124  0.000000e+00      Atrichoblasts
## 16 1.919420e-282   2.124907 0.750 0.273 4.577434e-278      Atrichoblasts
## 17 2.933857e-270   2.474596 0.704 0.255 6.996661e-266      Atrichoblasts
## 18 8.836606e-268   2.619090 0.676 0.219 2.107354e-263      Atrichoblasts
## 19 6.477013e-249   2.162646 0.632 0.190 1.544638e-244      Atrichoblasts
## 20 1.674176e-240   2.349853 0.680 0.229 3.992575e-236      Atrichoblasts
## 21 1.428019e-211   2.316819 0.609 0.121 3.405540e-207 Meristematic cells
## 22 9.043039e-209   2.077853 1.000 0.977 2.156584e-204 Meristematic cells
## 23 4.040242e-201   2.224239 0.725 0.208 9.635169e-197 Meristematic cells
## 24 4.390477e-178   2.114723 0.711 0.221 1.047041e-173 Meristematic cells
## 25 2.750088e-172   2.637504 0.745 0.290 6.558409e-168 Meristematic cells
## 26 9.064828e-166   1.918239 0.655 0.182 2.161780e-161 Meristematic cells
## 27 4.234139e-162   1.881389 0.616 0.161 1.009757e-157 Meristematic cells
## 28 1.610003e-157   1.953758 0.691 0.223 3.839536e-153 Meristematic cells
## 29 1.588954e-156   1.911681 0.711 0.245 3.789338e-152 Meristematic cells
## 30 1.278088e-131   1.966833 0.591 0.187 3.047985e-127 Meristematic cells
## 31  0.000000e+00   1.996938 0.803 0.200  0.000000e+00     Cortical cells
## 32  0.000000e+00   1.852046 0.592 0.036  0.000000e+00     Cortical cells
## 33 3.268771e-246   1.898365 0.727 0.190 7.795364e-242     Cortical cells
## 34 2.324000e-244   1.962013 0.639 0.138 5.542274e-240     Cortical cells
## 35 2.504481e-241   2.126654 0.973 0.692 5.972686e-237     Cortical cells
## 36 1.599894e-240   2.388098 0.817 0.301 3.815427e-236     Cortical cells
## 37 2.878450e-228   1.829322 0.830 0.304 6.864527e-224     Cortical cells
## 38 1.792085e-199   1.805577 0.749 0.275 4.273765e-195     Cortical cells
## 39 3.937027e-192   2.108964 0.668 0.213 9.389021e-188     Cortical cells
## 40 2.001277e-179   2.182676 0.569 0.150 4.772645e-175     Cortical cells
## 41  0.000000e+00   4.542391 0.911 0.213  0.000000e+00   Endodermal cells
## 42  0.000000e+00   3.846652 0.582 0.085  0.000000e+00   Endodermal cells
## 43  0.000000e+00   3.689689 0.633 0.058  0.000000e+00   Endodermal cells
## 44  0.000000e+00   3.567181 0.525 0.044  0.000000e+00   Endodermal cells
## 45  0.000000e+00   3.482247 0.645 0.080  0.000000e+00   Endodermal cells
## 46  0.000000e+00   3.272092 0.512 0.046  0.000000e+00   Endodermal cells
## 47  0.000000e+00   3.084530 0.512 0.032  0.000000e+00   Endodermal cells
## 48  0.000000e+00   3.083063 0.588 0.086  0.000000e+00   Endodermal cells
## 49  0.000000e+00   2.975978 0.554 0.041  0.000000e+00   Endodermal cells
## 50 1.270629e-280   2.864026 0.671 0.218 3.030195e-276   Endodermal cells
## 51  0.000000e+00   3.418204 0.804 0.082  0.000000e+00        Stele cells
## 52  0.000000e+00   3.254163 0.569 0.053  0.000000e+00        Stele cells
## 53  0.000000e+00   3.057052 0.507 0.037  0.000000e+00        Stele cells
## 54 5.860327e-291   2.675344 0.599 0.108 1.397571e-286        Stele cells
## 55 4.894555e-266   3.329565 0.501 0.075 1.167253e-261        Stele cells
## 56 3.483802e-212   2.189974 0.937 0.656 8.308171e-208        Stele cells
## 57 1.972447e-192   2.131032 0.598 0.183 4.703893e-188        Stele cells
## 58 5.662993e-179   2.879122 0.535 0.163 1.350511e-174        Stele cells
## 59 3.720749e-147   2.524200 0.645 0.308 8.873243e-143        Stele cells
## 60 1.243051e-104   1.938841 0.504 0.224 2.964428e-100        Stele cells
##         gene
## 1  AT1G12560
## 2  AT5G22410
## 3  AT1G48930
## 4  AT5G44020
## 5  AT3G54590
## 6  AT3G28550
## 7  AT4G25820
## 8  AT3G54580
## 9  AT3G62680
## 10 AT2G24980
## 11 AT3G12977
## 12 AT1G51830
## 13 AT3G09220
## 14 AT4G37070
## 15 AT1G73300
## 16 AT3G29250
## 17 AT1G66800
## 18 AT2G37130
## 19 AT1G14960
## 20 AT4G15230
## 21 AT1G56110
## 22 AT3G06355
## 23 AT4G17390
## 24 AT3G60245
## 25 AT1G29418
## 26 AT2G18020
## 27 AT1G52300
## 28 AT2G34480
## 29 AT4G16720
## 30 AT1G48920
## 31 AT5G42250
## 32 AT3G46500
## 33 AT5G25460
## 34 AT3G51860
## 35 AT1G21310
## 36 AT4G30170
## 37 AT5G42590
## 38 AT3G26470
## 39 AT4G30270
## 40 AT4G02520
## 41 AT3G32980
## 42 AT1G04220
## 43 AT1G05260
## 44 AT2G48130
## 45 AT3G22600
## 46 AT2G28470
## 47 AT1G08670
## 48 AT1G61590
## 49 AT5G41040
## 50 AT2G16750
## 51 AT4G34600
## 52 AT2G40900
## 53 AT1G31050
## 54 AT2G43910
## 55 AT4G01450
## 56 AT2G21660
## 57 AT3G14990
## 58 AT5G23050
## 59 AT5G59780
## 60 AT1G69850
matching_markers <- intersect(top_3_markers$gene, top_10_markers$gene)
matching_markers
##  [1] "AT3G54590" "AT2G24980" "AT3G54580" "AT1G12560" "AT4G25820" "AT1G66800"
##  [7] "AT2G37130" "AT1G29418" "AT4G17390" "AT1G56110" "AT4G02520" "AT4G30270"
## [13] "AT4G30170" "AT1G05260" "AT3G32980" "AT2G28470" "AT3G22600" "AT1G31050"
18 marker genes appear in both sets, out of ~60 in each set.

7.9 Clustering

We can use SingleR to predict cell types for a Seurat object, as we did for a SingleCellExperiment. However, instead of passing a Seurat object as the test or ref argument, we must pass its normalized counts matrix (data).

library(SingleR)
another_seurat <- readRDS("data/another_seurat.rds")
predicted_clusters <- SingleR(test = another_seurat[["RNA"]]@data, ref = my_seurat[["RNA"]]@data, labels = my_seurat$cluster_id)

Seurat can also assign clusters for you. Its FindClusters() function does not use k-means as in the previous chapter, but a shared nearest neighbor clustering algorithm. We need not specify a number of desired clusters.

my_seurat <- FindNeighbors(my_seurat, dims = 1:10, verbose = FALSE)
my_seurat <- FindClusters(my_seurat, resolution = 0.5, verbose = FALSE)
table(my_seurat$seurat_clusters)
## 
##   0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15 
## 650 540 521 511 483 467 418 347 248 215 192 134 132 116  91  86
pdf("pdf/umap-by-computed-cluster-seurat.pdf")
DimPlot(my_seurat, label.size = 4, repel = TRUE, label = TRUE)
dev.off()
UMAP plot by computed cluster

Figure 7.20: UMAP plot by computed cluster

This method groups our cells into 16 clusters instead of 21. Note that they start from 0 and go in the seurat_clusters column of the metadata.

7.10 SCTransform normalization

The SCTransform() function (in package Seurat) provides an alternative to log-normalization, based on regularized negative binomial regression. It is designed to better capture the variation due to biology over spurious variation.

my_seurat <- SCTransform(my_seurat, ncells = ncol(my_seurat), verbose = FALSE)
my_seurat
## An object of class Seurat 
## 45005 features across 5151 samples within 2 assays 
## Active assay: SCT (21157 features, 3000 variable features)
##  3 layers present: counts, data, scale.data
##  1 other assay present: RNA
##  3 dimensional reductions calculated: pca, tsne, umap

We have created a new assay called “SCT” and made it the active (default) one. Now we can recompute the reduced dimensions based on SCTransform.

my_seurat <- RunPCA(my_seurat, verbose = FALSE)
my_seurat <- RunUMAP(my_seurat, dims = 1:20, verbose = FALSE)
pdf("pdf/umap-by-cluster-sct-seurat.pdf")
DimPlot(my_seurat, cols = cluster_colors, group.by = "cluster_id", label.size = 4, repel = TRUE, label = TRUE)
dev.off()
pdf("pdf/umap-by-supercluster-sct-seurat.pdf")
DimPlot(my_seurat, cols = supercluster_colors, group.by = "supercluster", label.size = 4, repel = TRUE, label = TRUE) + NoLegend()
dev.off()
UMAP plot by cluster, using SCTransform normalization

Figure 7.21: UMAP plot by cluster, using SCTransform normalization

UMAP plot by supercluster, using SCTransform normalization

Figure 7.22: UMAP plot by supercluster, using SCTransform normalization

Exercise: The authors of SCTransform hold that after normalizing this way, we should use more principal components when running UMAP. Try it with 40 PCs (dimensions) instead of 20. What do you think?
Click for answer

Rerun UMAP with 40 PCs, then make the dimensionality reduction charts.

my_seurat <- RunUMAP(my_seurat, dims = 1:40, verbose = FALSE)
pdf("pdf/umap-by-cluster-sct-seurat-40-pcs.pdf")
DimPlot(my_seurat, cols = cluster_colors, group.by = "cluster_id", label.size = 4, repel = TRUE, label = TRUE)
dev.off()
pdf("pdf/umap-by-supercluster-sct-seurat-40-pcs.pdf")
DimPlot(my_seurat, cols = supercluster_colors, group.by = "supercluster", label.size = 4, repel = TRUE, label = TRUE) + NoLegend()
dev.off()
UMAP plot by cluster, using SCTransform normalization with 40 PCs

Figure 7.23: UMAP plot by cluster, using SCTransform normalization with 40 PCs

UMAP plot by supercluster, using SCTransform normalization with 40 PCs

Figure 7.24: UMAP plot by supercluster, using SCTransform normalization with 40 PCs

Note that the PCA and UMAP dimensionality reductions now live in the “SCT” assay instead of the original “RNA” assay, leaving only t-SNE in “RNA”. If you wanted to recreate our initial PCA or UMAP plots, you would have to change the active assay back to “RNA” and recompute them there (or reload my_seurat from a previously saved state).

my_seurat@reductions
## $pca
## A dimensional reduction object with key PC_ 
##  Number of dimensions: 50 
##  Number of cells: 5151 
##  Projected dimensional reduction calculated:  FALSE 
##  Jackstraw run: FALSE 
##  Computed using assay: SCT 
## 
## $tsne
## A dimensional reduction object with key tSNE_ 
##  Number of dimensions: 2 
##  Number of cells: 5151 
##  Projected dimensional reduction calculated:  FALSE 
##  Jackstraw run: FALSE 
##  Computed using assay: RNA 
## 
## $umap
## A dimensional reduction object with key UMAP_ 
##  Number of dimensions: 2 
##  Number of cells: 5151 
##  Projected dimensional reduction calculated:  FALSE 
##  Jackstraw run: FALSE 
##  Computed using assay: SCT

7.11 Conversion between Seurat and SingleCellExperiment

After trying both SingleCellExperiment and Seurat, you may develop a preference for one of them and end up using it in your work. In case a collaborator shares a dataset in the other format, you need to be able to convert it to yours. You may also want to use functionality that exists in only one format.

The as.SingleCellExperiment() function (from package Seurat) provides a quick way to convert an existing Seurat object to SingleCellExperiment. (We will of course need to reload the SingleCellExperiment package.)

library(SingleCellExperiment)
fake_SCE_2 <- as.SingleCellExperiment(fake_seurat)
counts(fake_SCE_2)
## 12 x 8 sparse Matrix of class "dgCMatrix"
##         Cell-1 Cell-2 Cell-3 Cell-4 Cell-5 Cell-6 Cell-7 Cell-8
## Gene-1       .      .      2      .      .      .      1      2
## Gene-2       1      .      1      .     13      2      1      .
## Gene-3       .      2      2      .      .      .      2      .
## Gene-4       .      .      .      3      3      .     13      .
## Gene-5       .      .      .      1      2      .      2      .
## Gene-6       2      1      2      1      .      .      .      1
## Gene-7       .      2      1      .      1      3      1      5
## Gene-8       .      .      6      2      .      1      2      3
## Gene-9       2      9      .      3      2      .      2      4
## Gene-10      1      1      .      .      1      .      .      2
## Gene-11      2      9      3      8      .     11      .      1
## Gene-12      4      .      1      2     16      .      .      3
logcounts(fake_SCE_2)
## 12 x 8 sparse Matrix of class "dgCMatrix"
##           Cell-1   Cell-2   Cell-3   Cell-4   Cell-5   Cell-6   Cell-7   Cell-8
## Gene-1  .        .        7.014015 .        .        .        6.034684 6.860015
## Gene-2  6.726633 .        6.321767 .        8.137996 7.071124 6.034684 .       
## Gene-3  .        6.726633 7.014015 .        .        .        6.726633 .       
## Gene-4  .        .        .        7.313887 6.672632 .        8.597420 .       
## Gene-5  .        .        .        6.216606 6.267800 .        6.726633 .       
## Gene-6  7.419181 6.034684 7.014015 6.216606 .        .        .        6.167916
## Gene-7  .        6.726633 6.321767 .        5.576547 7.476306 6.034684 7.775676
## Gene-8  .        .        8.112028 6.908755 .        6.378826 6.726633 7.265130
## Gene-9  7.419181 8.229778 .        7.313887 6.267800 .        6.726633 7.552637
## Gene-10 6.726633 6.034684 .        .        5.576547 .        .        6.860015
## Gene-11 7.419181 8.229778 7.419181 8.294300 .        8.775177 .        6.167916
## Gene-12 8.112028 .        6.321767 6.908755 8.345580 .        .        7.265130
colData(fake_SCE_2)
## DataFrame with 8 rows and 6 columns
##         orig.ident nCount_RNA nFeature_RNA nCount_CPM nFeature_CPM       ident
##           <factor>  <numeric>    <integer>  <numeric>    <integer>    <factor>
## Cell-1 Fake Seurat         12            6         12            6 Fake Seurat
## Cell-2 Fake Seurat         24            6         24            6 Fake Seurat
## Cell-3 Fake Seurat         18            8         18            8 Fake Seurat
## Cell-4 Fake Seurat         20            7         20            7 Fake Seurat
## Cell-5 Fake Seurat         38            7         38            7 Fake Seurat
## Cell-6 Fake Seurat         17            4         17            4 Fake Seurat
## Cell-7 Fake Seurat         24            8         24            8 Fake Seurat
## Cell-8 Fake Seurat         21            8         21            8 Fake Seurat

Similarly, the as.Seurat() function (from package SeuratObject) converts a SingleCellExperiment object to Seurat.

fake_seurat_2 <- as.Seurat(fake_SCE_2)
fake_seurat_2[["RNA"]]@counts
## 12 x 8 sparse Matrix of class "dgCMatrix"
##         Cell-1 Cell-2 Cell-3 Cell-4 Cell-5 Cell-6 Cell-7 Cell-8
## Gene-1       .      .      2      .      .      .      1      2
## Gene-2       1      .      1      .     13      2      1      .
## Gene-3       .      2      2      .      .      .      2      .
## Gene-4       .      .      .      3      3      .     13      .
## Gene-5       .      .      .      1      2      .      2      .
## Gene-6       2      1      2      1      .      .      .      1
## Gene-7       .      2      1      .      1      3      1      5
## Gene-8       .      .      6      2      .      1      2      3
## Gene-9       2      9      .      3      2      .      2      4
## Gene-10      1      1      .      .      1      .      .      2
## Gene-11      2      9      3      8      .     11      .      1
## Gene-12      4      .      1      2     16      .      .      3
fake_seurat_2[["RNA"]]@data
## 12 x 8 sparse Matrix of class "dgCMatrix"
##           Cell-1   Cell-2   Cell-3   Cell-4   Cell-5   Cell-6   Cell-7   Cell-8
## Gene-1  .        .        7.014015 .        .        .        6.034684 6.860015
## Gene-2  6.726633 .        6.321767 .        8.137996 7.071124 6.034684 .       
## Gene-3  .        6.726633 7.014015 .        .        .        6.726633 .       
## Gene-4  .        .        .        7.313887 6.672632 .        8.597420 .       
## Gene-5  .        .        .        6.216606 6.267800 .        6.726633 .       
## Gene-6  7.419181 6.034684 7.014015 6.216606 .        .        .        6.167916
## Gene-7  .        6.726633 6.321767 .        5.576547 7.476306 6.034684 7.775676
## Gene-8  .        .        8.112028 6.908755 .        6.378826 6.726633 7.265130
## Gene-9  7.419181 8.229778 .        7.313887 6.267800 .        6.726633 7.552637
## Gene-10 6.726633 6.034684 .        .        5.576547 .        .        6.860015
## Gene-11 7.419181 8.229778 7.419181 8.294300 .        8.775177 .        6.167916
## Gene-12 8.112028 .        6.321767 6.908755 8.345580 .        .        7.265130
fake_seurat_2@meta.data
##         orig.ident nCount_RNA nFeature_RNA nCount_CPM nFeature_CPM       ident
## Cell-1 Fake Seurat         12            6         12            6 Fake Seurat
## Cell-2 Fake Seurat         24            6         24            6 Fake Seurat
## Cell-3 Fake Seurat         18            8         18            8 Fake Seurat
## Cell-4 Fake Seurat         20            7         20            7 Fake Seurat
## Cell-5 Fake Seurat         38            7         38            7 Fake Seurat
## Cell-6 Fake Seurat         17            4         17            4 Fake Seurat
## Cell-7 Fake Seurat         24            8         24            8 Fake Seurat
## Cell-8 Fake Seurat         21            8         21            8 Fake Seurat

Be sure to examine the converted object to make sure that all of the data you expect actually exist.

7.12 Further resources

Seurat

Seurat cheat sheet

SCTransform