This article introduces the available options in scMultiSim.
The following flow chart shows the workflow of scMultiSim and each parameter’s role in the simulation.
Options: General
rand.seed
integer (default: 0)
The random seed. With the same seed, scMultiSim should produce the same result if all other parameters are the same.
Options: Genes
GRN
A data frame with 3 columns as below. Supply NA to disable the GRN effect. (required)
Column | Value |
---|---|
1 | target gene ID: integer or character |
2 | regulator gene ID: integer or character |
3 | effect: number |
If num.genes is present, the gene IDs should not exceed this number. The gene IDs should start from 1 and should not skip any intermediate numbers.
Two sample datasets, GRN_params_100 and GRN_params_1000, from Dibaeinia, P., & Sinha, S. (2020) are provided for testing and inspection.
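As a minimal sketch of the expected shape, the following base R snippet builds a toy GRN with the three columns described above and checks that its integer gene IDs run from 1 upward without gaps. The gene IDs, effect values, and column names are illustrative assumptions; only the column order (target, regulator, effect) comes from the table above.

```r
# Hypothetical toy GRN. Column order follows the table above:
# target gene ID, regulator gene ID, effect.
grn <- data.frame(
  target    = c(1, 2, 3, 4),
  regulator = c(5, 5, 6, 6),
  effect    = c(2.0, 1.5, 1.0, 3.0)
)

# Collect every gene ID that appears as a target or a regulator.
ids <- sort(unique(c(grn$target, grn$regulator)))

# TRUE when the IDs start at 1 and skip no intermediate number.
ids_ok <- identical(as.numeric(ids), as.numeric(seq_len(max(ids))))
ids_ok
```

A data frame of this shape would be supplied as the GRN option; the bundled GRN_params_100 dataset can be inspected for a realistic example.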
num.genes
integer (default: NULL)
If a GRN is supplied, this option overrides the total number of genes. It should be larger than the largest gene ID in the GRN. If it is not set, the number of genes is determined by N_genes * (1 + r_u), where N_genes is the number of genes in the GRN and r_u is unregulated.gene.ratio.
If GRN is disabled, this option specifies the total number of genes.
unregulated.gene.ratio
number > 0 (default: 0.1)
Ratio of unregulated to regulated genes. When a GRN is supplied with N genes, scMultiSim will simulate N * r_u extra (unregulated) genes.
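The arithmetic can be checked by hand. The sketch below uses an illustrative GRN size of 100 genes with the default ratio; how scMultiSim rounds a non-integer result is not stated here, so the numbers are chosen to avoid that case.

```r
n_grn_genes <- 100  # N: number of genes in the GRN (illustrative value)
r_u <- 0.1          # unregulated.gene.ratio (documented default)

# Extra unregulated genes: N * r_u
n_unregulated <- n_grn_genes * r_u

# Total genes when num.genes is not set: N * (1 + r_u)
n_total <- n_grn_genes * (1 + r_u)

c(unregulated = n_unregulated, total = n_total)
```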
giv.mean, giv.sd, giv.prob
(default: 0, 1, 0.3)
The parameters used to sample the GIV matrix. With probability giv.prob, the value is sampled from N(giv.mean, giv.sd). Otherwise the value is 0.
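The sampling rule above can be sketched in base R as follows. This is an illustration of the described distribution, not scMultiSim's internal code; the matrix dimensions are assumptions chosen for the example.

```r
set.seed(0)

n_genes <- 200  # illustrative matrix size, not a package default
n_cifs  <- 50
giv.mean <- 0
giv.sd   <- 1
giv.prob <- 0.3

n_entries <- n_genes * n_cifs

# Each entry is non-zero with probability giv.prob ...
is_nonzero <- rbinom(n_entries, size = 1, prob = giv.prob)
# ... and, when non-zero, is drawn from N(giv.mean, giv.sd).
values <- rnorm(n_entries, mean = giv.mean, sd = giv.sd)

giv <- matrix(is_nonzero * values, nrow = n_genes, ncol = n_cifs)

# Fraction of non-zero entries; should be close to giv.prob.
mean(giv != 0)
```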
dynamic.GRN
list (default: NULL)
Enables dynamic (cell-specific) GRN. Run scmultisim_help("dynamic.GRN") to see more explanation.
hge.prop, hge.mean, hge.sd
(default: 0, 5, 1)
Treat some random genes as highly-expressed (house-keeping) genes. A proportion of hge.prop genes will have expression scaled by a multiplier sampled from N(hge.mean, hge.sd).
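To make the effect of these three options concrete, here is a small sketch that picks a random subset of genes and draws a multiplier for each. The gene count, the non-default hge.prop value, and the rounding of the subset size are assumptions made for illustration; scMultiSim's own selection logic may differ.

```r
set.seed(1)

n_genes  <- 100  # illustrative
hge.prop <- 0.1  # illustrative; the documented default is 0
hge.mean <- 5
hge.sd   <- 1

# Every gene starts with a multiplier of 1 (no scaling).
multiplier <- rep(1, n_genes)

# Pick a random subset of genes to treat as highly expressed.
hge_idx <- sample(n_genes, size = round(hge.prop * n_genes))

# Draw their multipliers from N(hge.mean, hge.sd).
multiplier[hge_idx] <- rnorm(length(hge_idx), mean = hge.mean, sd = hge.sd)

summary(multiplier[hge_idx])
```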
Options: Cells
tree
phylo (default: Phyla5())
The cell differential tree, which will be used to generate cell trajectories (if discrete.cif = F) or clusters (if discrete.cif = T). In discrete population mode, only the tree tips will be used. Three demo trees, Phyla5(), Phyla3() and Phyla1(), are provided.
discrete.cif
logical (default: FALSE)
Whether the cell population is discrete (continuous otherwise).
Options: CIF
num.cifs
integer (default: 50)
Total number of differential and non-differential CIFs, which can be viewed as a latent representation of the cells.
Options: Simulation - ATAC
atac.effect
number ∈ [0, 1] (default: 0.5)
The influence of chromatin accessibility data on gene expression.
region.distrib
vector of length 3, should sum to 1 (default: c(0.1, 0.5, 0.4))
The probabilities that a gene is regulated by 0, 1, or 2 consecutive regions, respectively.
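Read as a categorical distribution, region.distrib can be illustrated with a short sampling sketch. The number of genes is an assumption, and the snippet only mirrors the description above rather than the package's internal procedure.

```r
set.seed(2)

region.distrib <- c(0.1, 0.5, 0.4)  # P(0 regions), P(1 region), P(2 regions)

# The three probabilities must sum to 1.
stopifnot(isTRUE(all.equal(sum(region.distrib), 1)))

n_genes <- 1000  # illustrative

# For each gene, draw how many consecutive regions regulate it.
n_regions <- sample(0:2, size = n_genes, replace = TRUE, prob = region.distrib)

# Empirical frequencies; these should be close to region.distrib.
table(n_regions) / n_genes
```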
Customization
mod.cif.giv
function (default: NULL)
Modify the generated CIF and GIV. The function takes four arguments: the kinetic parameter index (1=kon, 2=koff, 3=s), the current CIF matrix, the GIV matrix, and the cell metadata data frame. It should return a list of two elements: the modified CIF matrix and the modified GIV matrix.
sim_true_counts(list(
  # ...
  mod.cif.giv = function(i, cif, giv, meta) {
    # modify cif and giv
    return(list(cif, giv))
  }
))
ext.cif.giv
function (default: NULL)
Add extra CIF and GIV. The function takes one argument, the kinetic parameter index (1=kon, 2=koff, 3=s). It should return a list of two elements: the extra CIF matrix (n_extra_cif x n_cells) and the GIV matrix (n_genes x n_extra_cif). Return NULL for no extra CIF and GIV.
sim_true_counts(list(
  # ...
  ext.cif.giv = function(i) {
    # add extra cif and giv
    return(list(extra_cif, extra_giv))
  }
))
Options: Simulation
vary
character (default: "s")
Can be "all", "kon", "koff", "s", "except_kon", "except_koff", "except_s".
It specifies which kinetic parameters to vary across cells, i.e. which kinetic parameters have differential CIFs sampled from the tree.
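One way to read the seven accepted values is as a selection over the three kinetic parameters. The helper below is hypothetical (it is not part of scMultiSim) and assumes that an "except_x" value means "vary everything but x".

```r
# Hypothetical helper, not a scMultiSim function: maps a `vary` value
# to the kinetic parameters that would vary across cells.
varying_params <- function(vary) {
  all_params <- c("kon", "koff", "s")
  if (vary == "all") {
    return(all_params)
  }
  if (startsWith(vary, "except_")) {
    # Assumed meaning: every parameter except the named one varies.
    excluded <- sub("^except_", "", vary)
    stopifnot(excluded %in% all_params)
    return(setdiff(all_params, excluded))
  }
  stopifnot(vary %in% all_params)
  vary
}

varying_params("s")           # the default: only s varies
varying_params("except_kon")  # koff and s vary
```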
bimod
number (default: 0)
A number between 0 and 1, which adjusts the bimodality of the gene expression distribution.
Options: Simulation - RNA Velocity
do.velocity
logical (default: FALSE)
When set to TRUE, simulate using the full kinetic model and generate RNA velocity data. Otherwise, the Beta-Poisson model will be used.
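For readers unfamiliar with the Beta-Poisson model, the sketch below shows its textbook form for a single gene: the fraction of time the promoter is active is Beta-distributed with shape parameters kon and koff, and the observed count is Poisson with rate s times that fraction. The parameter values are arbitrary, and this is the generic formulation, not necessarily the exact computation scMultiSim performs.

```r
set.seed(3)

n_cells <- 500  # illustrative
kon  <- 2       # promoter activation rate (illustrative)
koff <- 4       # promoter deactivation rate (illustrative)
s    <- 50      # synthesis rate (illustrative)

# Fraction of time the promoter is "on" in each cell.
p_on <- rbeta(n_cells, shape1 = kon, shape2 = koff)

# Observed counts: Poisson with rate s * p_on.
counts <- rpois(n_cells, lambda = s * p_on)

# The theoretical mean is s * kon / (kon + koff).
c(empirical = mean(counts), theoretical = s * kon / (kon + koff))
```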
Options: Simulation - Spatial Cell-Cell Interaction
The simulation of cell-cell interaction can be enabled by passing a list as the cci option. In this list, you can specify the following options:
layout
"enhanced", "layers", "islands", or a function (default: "enhanced")
Specify the layout of the cell types. scMultiSim provides three built-in layouts: "enhanced", "layers", and "islands".
If set to "islands", you can specify which cell types are the islands, e.g. "islands:1,2".
If using a custom function, it should take two arguments: function (grid_size, cell_types)
- grid_size: (integer) The width and height of the grid.
- cell_types: (integer vector) Each cell’s cell type.
It should return an n_cell x 2 matrix, where each row is the x and y coordinates of a cell.
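A custom layout function matching the signature above could look like the following sketch. The function name and the row-by-row placement strategy are assumptions made for illustration, as is the use of 1-based integer coordinates; only the argument list and the n_cell x 2 return shape come from the description above.

```r
# Hypothetical layout: fill the grid row by row, visiting cells in order
# of cell type so that cells of the same type end up next to each other.
row_major_layout <- function(grid_size, cell_types) {
  n_cells <- length(cell_types)
  stopifnot(n_cells <= grid_size^2)

  # Grid slots 0, 1, 2, ... converted to 1-based (x, y) coordinates.
  slots <- seq_len(n_cells) - 1
  coords <- cbind(slots %% grid_size + 1, slots %/% grid_size + 1)

  # Hand out slots in cell-type order; row i of the result is cell i.
  out <- matrix(0, nrow = n_cells, ncol = 2)
  out[order(cell_types), ] <- coords
  out
}

layout_xy <- row_major_layout(4, c(1, 2, 1, 3, 2, 1))
layout_xy
```

Such a function would be supplied as the layout entry of the cci list in place of one of the built-in layout names.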
step.size
number
If using a continuous population, use this step size to further divide the cell types on the tree. For example, if the tree only has one branch a -> b and the branch length is 1 while the step size is 0.34, there will be three cell types in total: a_b_1, a_b_2, a_b_3.
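The example above is consistent with rounding the ratio of branch length to step size upward. The sketch below reproduces the three names under that assumption; the use of ceiling() is inferred from the example and is not confirmed as the package's rule.

```r
branch_length <- 1
step.size <- 0.34

# Assumed rule: number of sub-types is the ratio rounded up.
n_steps <- ceiling(branch_length / step.size)

# Names follow the pattern in the example: <from>_<to>_<index>.
cell_type_names <- paste("a", "b", seq_len(n_steps), sep = "_")
cell_type_names
```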
params
data.frame
The spatial effect between a ligand and a receptor gene. It should be a data frame similar to the GRN parameter, i.e. with columns receptor, ligand, and effect.
Example:
cci = list(
  params = data.frame(
    target = c(2, 6, 10, 8, 20, 30),              # receptor genes (first column)
    regulator = c(101, 102, 103, 104, 105, 106),  # ligand genes (second column)
    effect = 20
  )
)
cell.type.interaction
"random" or a matrix
Specify which cell types can communicate using which ligand-receptor pair. It should be a 3D n_cell_types x n_cell_types x n_ligand_pair numeric array.
The value at (i, j, k) is 1 if CCI of LR pair k exists between cell type i and cell type j.
This matrix can be generated using the cci_cell_type_params() function. It can fill the matrix randomly, or return an empty matrix for you to fill manually. If you want to fill it randomly, you can simply supply "random" for this option.
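To show the shape being described, the following sketch builds such an array by hand in base R instead of calling cci_cell_type_params(). The numbers of cell types and LR pairs, and the entries that are switched on, are illustrative assumptions.

```r
n_cell_types <- 3  # illustrative
n_lr_pairs   <- 6  # illustrative

# Start from an all-zero array: no interaction anywhere.
cti <- array(0, dim = c(n_cell_types, n_cell_types, n_lr_pairs))

# Enable LR pairs 1 and 4 between cell type 1 and cell type 2.
cti[1, 2, 1] <- 1
cti[1, 2, 4] <- 1

# Enable LR pair 2 between cell type 2 and cell type 3.
cti[2, 3, 2] <- 1

dim(cti)
# LR pairs enabled between cell types 1 and 2:
which(cti[1, 2, ] == 1)
```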
cell.type.lr.pairs
integer vector
If cell.type.interaction is "random", specify how many LR pairs should be enabled between each cell type pair. Should be a range, e.g. 4:6. The actual number of LR pairs will be uniformly sampled from this range.
max.neighbors
integer
The number of interacting cells for each cell. If the cell’s available neighbor count is not large enough, the actual number of interacting cells may be smaller than this value.
radius
number (default: 1), or "gaussian:sigma"
Which cells should be considered as neighbors. The interacting cells are chosen from these neighbors.
When it is a number, it controls the maximum distance between two cells for them to interact.
When it is a string, it should be in the format gaussian:sigma, for example, gaussian:1.2. In this case, the probability of two cells interacting is proportional to the value of a Gaussian kernel applied to their distance.
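The Gaussian variant can be illustrated with a short sketch that parses the string form and turns a few distances into relative interaction probabilities. The kernel formula exp(-d^2 / (2 * sigma^2)) is the usual Gaussian kernel and is assumed here; the exact form and normalization used by scMultiSim are not stated above.

```r
radius <- "gaussian:1.2"

# Extract sigma from the "gaussian:sigma" string.
sigma <- as.numeric(sub("^gaussian:", "", radius))

# Distances from one cell to four candidate neighbors (illustrative).
d <- c(1, sqrt(2), 2, 3)

# Assumed kernel: the weight decays with squared distance.
weight <- exp(-d^2 / (2 * sigma^2))

# Normalize so the weights can be read as relative probabilities.
prob <- weight / sum(weight)
round(prob, 3)
```

Closer cells receive larger weights, so they are more likely to be chosen as interaction partners.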
start.layer
integer
The layer (time step) from which the simulation should start. If set to 1, the simulation will start with one cell in the grid and add one more cell in each following layer. If set to num_cells, the simulation will start with all cells already in the grid and continue for only a few static layers, which will greatly speed up the simulation.
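Putting the options of this section together, a cci list might be assembled as in the sketch below. Only the option names come from this article; every value is an illustrative assumption, and the list is built on its own here without running a simulation.

```r
# Illustrative values only; the option names are those documented above.
cci_options <- list(
  params = data.frame(
    target    = c(2, 6, 10),        # receptor genes
    regulator = c(101, 102, 103),   # ligand genes
    effect    = 20
  ),
  layout = "layers",                 # one of the built-in layouts
  step.size = 0.5,                   # only relevant for continuous populations
  cell.type.interaction = "random",  # let scMultiSim fill the array
  cell.type.lr.pairs = 2:3,          # range to sample the LR pair count from
  max.neighbors = 4,
  radius = 1,
  start.layer = 1                    # grow the grid one cell per layer
)

str(cci_options, max.level = 1)
```

The resulting list would then be passed as the cci entry of the options list given to sim_true_counts().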