DigitalDLSorter
object from single-cell RNA-seq
dataR/loadData.R
loadSCProfiles.Rd
Create a DigitalDLSorter
object from single-cell RNA-seq
data from files (formats allowed: tsv, tsv.gz, mtx (sparse matrix) and hdf5)
or a SingleCellExperiment
object. The data will be
stored in single.cell.real
slot. The data provided should consist of
three pieces of information:
Single-cell counts: genes as rows and cells as columns.
Cells metadata: annotations (columns) for each cell (rows).
Genes metadata: annotations (columns) for each gene (rows).
If the data is provided from files, single.cell.real
argument must be a vector of three elements ordered so that the first file
corresponds to the count matrix, the second to the cells metadata and the
last to the genes metadata. On the other hand, if the data is provided as a
SingleCellExperiment
oject, it must contain single-cell
counts in the assay
slot, cells metadata in the colData
slot
and genes metadata in the rowData
. The data must be provided without
any transformation (e.g. log-transformation) and raw counts are preferred.
loadSCProfiles(
single.cell.data,
cell.ID.column,
gene.ID.column,
name.dataset.h5,
min.counts = 0,
min.cells = 0,
file.backend = NULL,
name.dataset.backend = NULL,
compression.level = NULL,
chunk.dims = NULL,
block.processing = FALSE,
verbose = TRUE,
project = "DigitalDLSorterProject"
)
If data is provided from files,
single.cell.real
must be a vector of three elements: single-cell
counts, cells metadata and genes metadata. If data is provided from a
SingleCellExperiment
object, single-cell counts must
be present in the assay
slot, cells metadata in the colData
slot and genes metadata in the rowData
slot.
Name or number of the column in the cells metadata corresponding to cell names in expression matrix.
Name or number of the column in the genes metadata corresponding to the names used for features/genes.
Name of the data set if HDF5 file is provided.
Minimum gene counts to filter (0 by default).
Minimum of cells with more than min.counts
(0 by
default).
Valid file path where to store the loaded data as HDF5
file. If provided, data is stored in HDF5 files as back-end using
DelayedArray and HDF5Array packages instead of being loaded
into RAM. This is suitable for situations where you have large amounts of
data that cannot be stored in memory. Note that operations on these data
will be performed by blocks (i.e subsets of determined size), which may
result in longer execution times. NULL
by default.
Name of the dataset of the HDF5 file to be used.
Note that it cannot exist. If NULL
(by default), a random dataset
name will be used.
The compression level used if file.backend
is
provided. It is an integer value between 0 (no compression) and 9 (highest
and slowest compression). See
?getHDF5DumpCompressionLevel
from the
HDF5Array package for more information.
Specifies dimensions that HDF5 chunk will have. If
NULL
, the default value is a vector of two items: the number of
genes considered by DigitalDLSorter
object during the
simulation, and only one sample in order to increase read times in the
following steps. A larger number of columns written in each chunk may lead
to longer read times.
Boolean indicating whether data should be treated as
blocks (only if data is provided as HDF5 file). FALSE
by default.
Note that using this functionality is suitable for cases where is not
possible to load the data into RAM and therefore execution times will be
longer.
Show informative messages during the execution (TRUE
by
default).
Name of the project for DigitalDLSorter
object.
A DigitalDLSorter
object with the single-cell
RNA-seq data provided loaded into the single.cell.real
slot as a
SingleCellExperiment
object.
This data can be used to simulate new single-cell profiles using the
ZINB-WaVE framework with the estimateZinbwaveParams
function.
In this way, it is possible to increase the signal of cell types that are
underrepresented in the original dataset. If this step is not necessary,
these profiles will be used directly to simulate pseudo-bulk RNA-seq samples
with known cell composition.
set.seed(123) # reproducibility
sce <- SingleCellExperiment::SingleCellExperiment(
assays = list(
counts = matrix(
rpois(100, lambda = 5), nrow = 40, ncol = 30,
dimnames = list(paste0("Gene", seq(40)), paste0("RHC", seq(30)))
)
),
colData = data.frame(
Cell_ID = paste0("RHC", seq(30)),
Cell_Type = sample(x = paste0("CellType", seq(4)), size = 30,
replace = TRUE)
),
rowData = data.frame(
Gene_ID = paste0("Gene", seq(40))
)
)
DDLS <- loadSCProfiles(
single.cell.data = sce,
cell.ID.column = "Cell_ID",
gene.ID.column = "Gene_ID",
min.cells = 0,
min.counts = 0,
project = "Simul_example"
)