I have to say that this is the coolest scRNA resource so far in 2019! Dynverse had a suite of tools that helps researcher to select, run and evaluate all these TI methods in R. I admire the ambition of the authors to provide such a user friendly platform to test and use all these 59 differential trajectory inference tool with minimal input and uniform output.
And the other cool stuff is the Docker-R interface they have constructed. So that we can call up these different methods without installing them individually along with different versions of dependencies. That saves a lot of time and frustrations in software installation! Through this project, I had a chance to truely appreciate the power of Docker, that can build light-weight virtual environment to accommodate different software.
So as I have mentioned in the previous post, I had explored a few options to clean up the fibroblast reprogrammed cell data. This time, we filter the data according to the methods they mentioned in the dynverse paper (select features with more than 5% expressing cells (40 cells), at least 3 at the dispersion level, and mean expression at least 0.02). There were 2000 Highly variable genes.
‘’’
library(dyno)
data("fibroblast_reprogramming_treutlein")
fib<-CreateSeuratObject(counts=t(fibroblast_reprogramming_treutlein$counts), assay="RNA",
min.features = 20)
fib<-NormalizeData(fib, normalization.method = "LogNormalize")
fib<-FindVariableFeatures(object = fib, assay = 'RNA',
dispersion.cutoff=c(3, Inf),
mean.cutoff = c(0.02, 10),
selection.method = "vst",
binning.method = "equal_frequency")
### 2000 highly variable features ###
fib<-ScaleData(fib)
fib<-RunPCA(fib, features = fib@assays$RNA@var.features, dims = 50)
PCAPlot(fib, group.by="orig.ident")
fib<-JackStraw(fib, reduction = "pca")
fib<-ScoreJackStraw(fib, dim=1:20)
JackStrawPlot(fib, dim=1:20)
PCASigGenes(fib, pcs.use=1:1)
fib<-FindNeighbors(fib, reduction = "pca", dims = 1:50)
fib<-FindClusters(fib, resolution=0.8)
fib<-RunUMAP(fib, dims = 1:10, reduction="pca")
UMAPPlot(fib, group.by="orig.ident")+ggtitle("fibroblast_reprogramming_treutlein")
‘’’
I used the scaled data of these 2000 genes from Seurat as my input for dyno. I also compared the difference of the TI inference between using the filtered dataset vs using all genes. Note the UMAP from Seurat had a very different clustering than the TI methods.
I tested the dataset using the simplist Comp1 (PC1 from PCA), FateID, and DPT model.
Comp1 did not need any prior information and of course performed most poorly, whereas FateID and DPT has prior information of start and end cell population, which sounds to be quite like cheating. I liked that I can colour the cells with the cell info and see if the inferred trajectory really makes sense.
‘’’
library(tidyverse)
library(dynwrap)
library(dyno)
### be careful of matrix layout, row=cells, col=genes ###
counts_df<-t(as.matrix(fib@assays$RNA@counts[fib@assays$RNA@var.features, ]))
exp_df<-t(as.matrix(fib@assays$RNA@scale.data[fib@assays$RNA@var.features, ]))
dataset <- wrap_expression(id="fib",
counts = counts_df,
expression = exp_df
)
dataset<-add_grouping(dataset, data.frame(group_id=fib@meta.data$orig.ident, cell_id=rownames(fib@meta.data)))
dataset <- dataset %>% add_prior_information(groups_id= data.frame(group_id=fib@meta.data$orig.ident, cell_id=rownames(fib@meta.data)),
start_id = rownames(fib@meta.data)[fib@meta.data$orig.ident==c("MEF")],
end_id = rownames(fib@meta.data)[fib@meta.data$orig.ident%in%c("Neuron", "Myocyte")])
### model that did not require piror ###
model <- infer_trajectory(dataset, ti_comp1())
plot_dimred(model, color_cells = c("grouping"), grouping=grouping_id)+labs(title="Comp1")
grouping_id<-data.frame(group_id=fib@meta.data$orig.ident, cell_id=rownames(fib@meta.data))
fateID<-infer_trajectory(dataset, ti_fateid())
plot_dimred(fateID, color_cells = c("grouping"), grouping=grouping_id)+labs(title="FateID")
##Trying to compute distinct() for variables not found in the data:
## - `divergence_id`, `milestone_id`
paga<-infer_trajectory(dataset, ti_paga())
###PAGA did not compute any trajectory ###
dpt<-infer_trajectory(dataset, ti_dpt())
plot_dimred(dpt, color_cells = c("grouping"), grouping=grouping_id)+labs(title="DPT")
plot_heatmap(dpt, expression_source=exp_df)
‘’’
We can clearly see that the local grouping is more compact with the filtered data, but the overall trajectories are similar. Dyno has a set of function to identify important genes at a branch, a differentiation milestone, and for the overall structure.
‘’’
overall_feature_importances <- dynfeature::calculate_overall_feature_importance(dpt, expression_source=exp_df)
features <- overall_feature_importances %>%
top_n(40, importance) %>%
pull(feature_id)
‘’’
Although the top features did not cover all the milestone genes in the original paper Treutlein B. paper. We found nonetheless a few of them including Hmga2 and Ascl1. I have a feeling that if I could have get the normalised expression dataset from f-scLVM, maybe I can derive a more sensible list of significant genes.
Anyhow, I have to say that the dyno package is just really really awesome and I look forward to see the platform grow and expand. It would be cool that it will finally allow users to input their own parameters to make the package even more flexible.
Comments