Code by TomMakesThings

About

This notebook is part of the research project "Clustering and Topological Data Analysis of Single-Cell RNA Sequencing Data". For more information, visit: https://tommakesthings.github.io/Clustering-and-TDA-of-scRNA-seq-Data/.

In this notebook, a new simulated dataset of scRNA-seq gene counts and group labels is created using the Splat simulator, which is part of the R package Splatter. The simulation is seeded to mimic gene expression from sc_10x, a dataset of human lung adenocarcinoma cells by Tian et al. The simulated data is then saved as CSV files with R and downloaded with Python.

Imports

Citation for Splatter and the original paper.

Get Dataset to Replicate

Use R to download the sc_10x dataset from Luyi Tian's GitHub and load it as dataset.

Get the gene counts from dataset and view an extract.

Create Simulated Datasets

Splat Simulation

Simulate a single population of cells that replicates the variation of sc_10x.

Simulate 4 groups of cells seeded by sc_10x.

Get the group labels of each cell.
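
To give a feel for what a grouped simulation produces, here is a minimal sketch of a gamma-Poisson count model with group-specific expression factors, written in plain Python. It is not Splatter and does not use parameters estimated from sc_10x: the function names, the distribution parameters and the "Group1" style labels are all illustrative assumptions, and the matrix is laid out with cells as rows for simplicity.

```python
import math
import random


def sample_poisson(rng, lam):
    """Draw one Poisson(lam) sample with Knuth's method (only suitable for small lam)."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_groups(n_genes=50, n_cells=40, n_groups=4, de_prob=0.2, seed=0):
    """Return (counts, labels): counts is a list of cells, each a list of gene counts."""
    rng = random.Random(seed)
    # Baseline mean expression per gene (illustrative gamma shape and scale).
    base_means = [rng.gammavariate(0.6, 2.0) for _ in range(n_genes)]
    # Each group rescales a random subset of genes; all other genes keep factor 1.
    group_factors = [
        [rng.lognormvariate(0.0, 0.5) if rng.random() < de_prob else 1.0
         for _ in range(n_genes)]
        for _ in range(n_groups)
    ]
    # Assign every cell to a group, then draw its counts gene by gene.
    groups = [rng.randrange(n_groups) for _ in range(n_cells)]
    counts = [
        [sample_poisson(rng, base_means[j] * group_factors[g][j]) for j in range(n_genes)]
        for g in groups
    ]
    labels = [f"Group{g + 1}" for g in groups]
    return counts, labels


counts, labels = simulate_groups()
```

The labels list lines up with the rows of counts, which mirrors the idea of extracting one group label per simulated cell.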

Compare Simulations

Plot a graph comparing the simulated data to the original dataset based on the distribution of mean expression. This demonstrates how well Splatter was able to imitate the original data.
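
As a rough numeric counterpart to the plot, the sketch below computes the log2 mean expression of each gene in two count matrices and summarises each distribution by its quartiles. The helper names and the toy matrices are hypothetical; they stand in for the original and simulated counts and assume cells as rows and genes as columns.

```python
import math
import statistics


def log_gene_means(counts):
    """Return log2(mean count + 1) for each gene (column) of a cells-by-genes matrix."""
    n_cells = len(counts)
    return [math.log2(sum(column) / n_cells + 1) for column in zip(*counts)]


def summarise(values):
    """Summarise a distribution by its quartiles."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    return {"q1": q1, "median": median, "q3": q3}


# Toy stand-ins for the original and simulated count matrices.
real = [[0, 4, 10], [2, 6, 8], [1, 5, 12]]
simulated = [[1, 3, 9], [1, 7, 11], [1, 5, 10]]

real_summary = summarise(log_gene_means(real))
simulated_summary = summarise(log_gene_means(simulated))
```

Similar quartiles in the two summaries suggest that the simulation reproduces the spread of mean expression, although a plot shows the full distribution more clearly.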

Save Data to File

Using R, save the simulated group data and the matching labels as CSV files.

Now download the files using Python.

View the simulated group data and the labels of each cell as DataFrames.
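
The save-and-reload round trip can be sketched with Python's standard csv module alone. The file names, column headers and tiny example matrix below are assumptions made for illustration, and the files are written to a temporary folder rather than the notebook's real output location. To view the data as DataFrames, pandas.read_csv would be the usual choice.

```python
import csv
import os
import tempfile


def write_csv(path, header, rows):
    """Write a header line followed by the data rows."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        writer.writerows(rows)


def read_csv(path):
    """Return (header, rows), with every value read back as a string."""
    with open(path, newline="") as handle:
        reader = csv.reader(handle)
        header = next(reader)
        return header, [row for row in reader]


# Hypothetical example data: two cells, three genes, one label per cell.
cell_ids = ["Cell1", "Cell2"]
counts = [[0, 3, 1], [2, 0, 5]]
labels = ["Group1", "Group2"]

with tempfile.TemporaryDirectory() as folder:
    counts_path = os.path.join(folder, "counts.csv")
    labels_path = os.path.join(folder, "labels.csv")
    write_csv(counts_path, ["cell", "Gene1", "Gene2", "Gene3"],
              [[cell] + row for cell, row in zip(cell_ids, counts)])
    write_csv(labels_path, ["cell", "group"], list(zip(cell_ids, labels)))
    counts_header, counts_rows = read_csv(counts_path)
    labels_header, labels_rows = read_csv(labels_path)

# CSV stores text, so convert the counts back to integers after loading.
restored_counts = [[int(value) for value in row[1:]] for row in counts_rows]
```

Keeping the cell identifier as the first column of both files makes it easy to check that each row of counts still matches its label after reloading.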