Clustering methods are heavily employed in single-cell RNA analysis. In this post, I would like to learn more about the underlying processes before the deconvolution step, such as K-nearest neighbor and hierarchical clustering.
1. Build matrix
This is probably the most important step in the whole process, next to correct normalisation. As discussed in an earlier post, the choice of cell-cell dissimilarity matrix can greatly influence the clustering results. As I am following this paper (Bautushansky et al. 2016), I focus on Pearson correlation matrices and their application to various biological problems, e.g. weighted gene co-expression network analysis (WGCNA). The primary aim of WGCNA is to find genes that have similar dynamics using a network-based approach.
In addition to building the correlation matrix, the paper also suggests building a p-matrix (a p-value for each pairwise correlation). We then decide whether each correlation is significant based on these p-values. In my toy example from the pbmc data, I built a pairwise co-expression correlation matrix for the 501 highly variable genes identified.
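To make the correlation-matrix step concrete, here is a minimal sketch in plain Python rather than the R workflow used in this post. The gene names and expression values are made up for illustration, and the helper names `pearson` and `correlation_matrix` are my own. It only covers the correlation matrix, not the p-matrix.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def correlation_matrix(expr):
    """expr maps gene name -> expression values (one value per cell)."""
    genes = list(expr)
    matrix = [[pearson(expr[g], expr[h]) for h in genes] for g in genes]
    return genes, matrix

# Hypothetical toy data: 3 genes measured in 5 cells.
toy_expr = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [2.0, 4.0, 6.0, 8.0, 10.0],
    "geneC": [5.0, 3.0, 4.0, 1.0, 2.0],
}
genes, corr = correlation_matrix(toy_expr)
```

The result is a symmetric gene-by-gene matrix with 1 on the diagonal, which is the shape of input the thresholding step needs.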
I then use corr.test() to produce the correlation matrix, and r.test() to determine which t-value corresponds to a correlation of 0.5, given a sample size of 2700. I then build the adjacency matrix from this t-value threshold: every correlation with a t-value > 29.99 is set to 1 (building an edge) and the rest to 0 (no edge).
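The threshold can be checked by hand with the standard t-statistic for a Pearson correlation, t = r * sqrt(n - 2) / sqrt(1 - r^2). The sketch below is plain Python, not the R functions used above. The 3x3 correlation matrix is invented for illustration, and `t_from_r` and `adjacency_from_corr` are hypothetical helper names.

```python
import math

def t_from_r(r, n):
    """t-statistic for a Pearson correlation r observed on n samples."""
    if abs(r) >= 1.0:
        # A perfect correlation gives an infinite t-statistic.
        return math.copysign(math.inf, r)
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

def adjacency_from_corr(corr, n, t_cutoff):
    """Set entry (i, j) to 1 when the t-value of corr[i][j] exceeds t_cutoff."""
    size = len(corr)
    adj = [[0] * size for _ in range(size)]
    for i in range(size):
        for j in range(size):
            if i != j and t_from_r(corr[i][j], n) > t_cutoff:
                adj[i][j] = 1
    return adj

n_samples = 2700                       # sample size quoted in the post
t_cutoff = t_from_r(0.5, n_samples)    # about 29.99

# Hypothetical correlation matrix for three genes.
toy_corr = [
    [1.00, 0.62, 0.10],
    [0.62, 1.00, 0.48],
    [0.10, 0.48, 1.00],
]
adj = adjacency_from_corr(toy_corr, n_samples, t_cutoff)
```

Because the cutoff is applied to the signed t-value, only positive correlations above 0.5 become edges. Strong negative correlations are left out, which may or may not be what you want.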
2. Construct Network
Once I had cleaned up the adjacency matrix (only showing genes that had at least 2 neighbors), I used igraph to plot the network.
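The clean-up step can be sketched as a simple degree filter. This is plain Python standing in for the igraph workflow. The edge list and the gene labels `g1` to `g4` are hypothetical, and `filter_by_degree` is a helper name of my own.

```python
def filter_by_degree(edges, min_neighbors=2):
    """Keep only genes with at least min_neighbors neighbors in the full network."""
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    keep = {g for g, nb in neighbors.items() if len(nb) >= min_neighbors}
    # Drop every edge that touches a removed gene.
    kept_edges = [(a, b) for a, b in edges if a in keep and b in keep]
    return keep, kept_edges

# Hypothetical edge list: g4 has a single neighbor and is filtered out.
toy_edges = [("g1", "g2"), ("g2", "g3"), ("g1", "g3"), ("g3", "g4")]
keep, kept_edges = filter_by_degree(toy_edges)
```

Note that this is a single pass: after removal, a remaining gene can end up with fewer than 2 neighbors in the filtered network. Repeating the filter until nothing changes would give a stricter result.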
The analysis resulted in two networks. One of them is highly connected (each gene has at least 5 neighbors), while the other is less connected, with BIRC5 as the central hub connecting two smaller modules.
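Separate networks and hub genes can also be found programmatically instead of by eye. The sketch below uses a breadth-first search in plain Python. The edge list is invented to mimic the shape described above (one tight cluster, plus two small modules joined by a hub), the node labels are placeholders rather than real genes, and picking the hub by highest degree is a simplifying assumption.

```python
from collections import deque

def connected_components(edges):
    """Return the connected components and the neighbor map of an undirected graph."""
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    seen = set()
    components = []
    for start in neighbors:
        if start in seen:
            continue
        # Breadth-first search from each unvisited node.
        queue = deque([start])
        seen.add(start)
        component = set()
        while queue:
            node = queue.popleft()
            component.add(node)
            for nb in neighbors[node]:
                if nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        components.append(component)
    return components, neighbors

# Hypothetical network: a triangle, plus two triangles sharing the node "hub".
toy_edges = [
    ("a", "b"), ("b", "c"), ("a", "c"),
    ("d", "hub"), ("e", "hub"), ("d", "e"),
    ("f", "hub"), ("g", "hub"), ("f", "g"),
]
components, neighbors = connected_components(toy_edges)
hub = max(neighbors, key=lambda g: len(neighbors[g]))
```

Degree is only one way to define a hub. A node that bridges two modules may be better captured by a betweenness-style measure, which this sketch does not compute.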
Is the network biologically valid? It is hard to say, as I don't really know what these genes do. I will have to look harder into it to decide whether there are functional relationships there. At least I now have the basics of network construction, so I can move on to learn about hierarchical clustering next.