Create a SpatialDDLS
object by providing single-cell
RNA-seq data. Additionally, spatial transcriptomics data contained in
SpatialDDLS
objects can also be provided. It is
recommended to provide both types of data to only use genes shared between
both experiments.
Usage
createSpatialDDLSobject(
sc.data,
sc.cell.ID.column,
sc.cell.type.column,
sc.gene.ID.column,
st.data,
st.spot.ID.column,
st.gene.ID.column,
filter.mt.genes = "^mt-",
sc.filt.genes.cluster = TRUE,
sc.min.mean.counts = 1,
sc.n.genes.per.cluster = 300,
top.n.genes = 2000,
sc.log.FC = TRUE,
sc.min.counts = 1,
sc.min.cells = 1,
st.min.counts = 1,
st.min.spots = 1,
st.n.slides = 3,
shared.genes = TRUE,
sc.name.dataset.h5 = NULL,
sc.file.backend = NULL,
sc.name.dataset.backend = NULL,
sc.compression.level = NULL,
sc.chunk.dims = NULL,
sc.block.processing = FALSE,
verbose = TRUE,
project = "SpatialDDLS-Proj"
)
Arguments
- sc.data
Single-cell RNA-seq profiles to be used as reference. If data are provided from files,
single.cell.real
must be a vector of three elements: single-cell counts, cells metadata and genes metadata. On the other hand, If data are provided from aSingleCellExperiment
object, single-cell counts must be present in theassay
slot, cells metadata in thecolData
slot, and genes metadata in therowData
slot.- sc.cell.ID.column
Name or number of the column in cells metadata corresponding to cell names in expression matrix (single-cell RNA-seq data).
- sc.cell.type.column
Name or column number corresponding to cell types in cells metadata.
- sc.gene.ID.column
Name or number of the column in genes metadata corresponding to the names used for features/genes (single-cell RNA-seq data).
- st.data
Spatial transcriptomics datasets to be deconvoluted. It can be a single
SpatialExperiment
object or a list of them.- st.spot.ID.column
Name or number of the column in spots metadata corresponding to spot names in expression matrix (spatial transcriptomics data).
- st.gene.ID.column
Name or number of the column in the genes metadata corresponding to the names used for features/genes (spatial transcriptomics data).
- filter.mt.genes
Regular expression matching mitochondrial genes to be ruled out (
^mt-
by default). IfNULL
, no filtering is performed.- sc.filt.genes.cluster
Whether to filter single-cell RNA-seq genes according to a minimum threshold of non-zero average counts per cell type (
sc.min.mean.counts
).TRUE
by default.- sc.min.mean.counts
Minimum non-zero average counts per cluster to filter genes. 1 by default.
- sc.n.genes.per.cluster
Top n genes with the highest logFC per cluster (300 by default). See Details section for more details.
- top.n.genes
Maximum number of genes used for downstream steps (2000 by default). In case the number of genes after filtering is greater than
top.n.genes
, these genes will be set according to variability across the whole single-cell dataset.- sc.log.FC
Whether to filter genes with a logFC less than 0.5 when
sc.filt.genes.cluster = TRUE
(TRUE
by default).- sc.min.counts
Minimum gene counts to filter (1 by default; single-cell RNA-seq data).
- sc.min.cells
Minimum of cells with more than
min.counts
(1 by default; single-cell RNA-seq data).- st.min.counts
Minimum gene counts to filter (1 by default; spatial transcriptomics data).
- st.min.spots
Minimum of cells with more than
min.counts
(1 by default; spatial transcriptomics data).- st.n.slides
Minimum number of slides (
SpatialExperiment
objects) in which a gene has to be expressed in order to keep it. This parameter is applicable only when multipleSpatialExperiment
objects are provided. Genes not present in at leastst.n.slides
will be discarded. If no filtering is desired, setst.n.slides = 1
.- shared.genes
If set to
TRUE
, only genes present in both the single-cell and spatial transcriptomics data will be retained for further processing (TRUE
by default).- sc.name.dataset.h5
Name of the data set if HDF5 file is provided for single-cell RNA-seq data.
- sc.file.backend
Valid file path where to store the loaded for single-cell RNA-seq data as HDF5 file. If provided, data are stored in a HDF5 file as back-end using the DelayedArray and HDF5Array packages instead of being loaded into RAM. This is suitable for situations where you have large amounts of data that cannot be stored in memory. Note that operations on these data will be performed by blocks (i.e subsets of determined size), which may result in longer execution times.
NULL
by default.- sc.name.dataset.backend
Name of the HDF5 file dataset to be used. Note that it cannot exist. If
NULL
(by default), a random dataset name will be generated.- sc.compression.level
The compression level used if
sc.file.backend
is provided. It is an integer value between 0 (no compression) and 9 (highest and slowest compression). See?getHDF5DumpCompressionLevel
from the HDF5Array package for more information.- sc.chunk.dims
Specifies dimensions that HDF5 chunk will have. If
NULL
, the default value is a vector of two items: the number of genes considered bySpatialDDLS
object during the simulation, and only one sample in order to increase read times in the following steps. A larger number of columns written in each chunk may lead to longer read times.- sc.block.processing
Boolean indicating whether single-cell RNA-seq data should be treated as blocks (only if data are provided as HDF5 file).
FALSE
by default. Note that using this functionality is suitable for cases where it is not possible to load data into RAM and therefore execution times will be longer.- verbose
Show informative messages during the execution (
TRUE
by default).- project
Name of the project for
SpatialDDLS
object.
Value
A SpatialDDLS
object with the single-cell
RNA-seq data provided loaded into the single.cell.real
slot as a
SingleCellExperiment
object. If spatial
transcriptomics data are provided, they will be loaded into the
spatial.experiments
slot.
Details
Filtering genes
In order to reduce the number of dimensions used for subsequent steps,
createSpatialDDLSobject
implements different strategies aimed at
removing useless genes for deconvolution:
Filtering at the cell level: genes less expressed than a determined cutoff in N cells are removed. See
sc.min.cells
/st.min.cells
andsc.min.counts
/st.min.cells
parameters.Filtering at the cluster level (only for scRNA-seq data): if
sc.filt.genes.cluster == TRUE
,createSpatialDDLSobject
sets a cutoff of non-zero average counts per cluster (sc.min.mean.counts
parameter) and take only thesc.n.genes.per.cluster
genes with the highest logFC per cluster. LogFCs are calculated using normalized logCPM of each cluster with respect to the average in the whole dataset). Finally, if the number of remaining genes is greater thantop.n.genes
, genes are ranked based on variance and thetop.n.genes
most variable genes are used for downstream analyses.
Single-cell RNA-seq data
Single-cell RNA-seq data can be provided from files (formats allowed: tsv,
tsv.gz, mtx (sparse matrix) and hdf5) or a
SingleCellExperiment
object. Data will be stored in the
single.cell.real
slot, and must consist of three pieces of
information:
Single-cell counts: genes as rows and cells as columns.
Cells metadata: annotations (columns) for each cell (rows).
Genes metadata: annotations (columns) for each gene (rows).
If data
are provided from files, single.cell.real
argument must be a vector of
three elements ordered so that the first file corresponds to the count
matrix, the second to the cells metadata, and the last to the genes metadata.
On the other hand, if data are provided as a
SingleCellExperiment
object, it must contain single-cell
counts in assay
, cells metadata in colData
, and genes metadata
in rowData
. Data must be provided without any transformation (e.g.
log-transformation), raw counts are preferred.
Spatial transcriptomics data
It must be a SpatialExperiment
object (or a list of them
if more than one slide is going to be deconvoluted) containing the same
information as the single-cell RNA-seq data: the count matrix, spots
metadata, and genes metadata. Please, make sure the gene identifiers used the
spatial and single-cell transcriptomics data are consistent.
Examples
set.seed(123)
sce <- SingleCellExperiment::SingleCellExperiment(
assays = list(
counts = matrix(
rpois(100, lambda = 5), nrow = 40, ncol = 30,
dimnames = list(paste0("Gene", seq(40)), paste0("RHC", seq(30)))
)
),
colData = data.frame(
Cell_ID = paste0("RHC", seq(30)),
Cell_Type = sample(x = paste0("CellType", seq(4)), size = 30,
replace = TRUE)
),
rowData = data.frame(
Gene_ID = paste0("Gene", seq(40))
)
)
counts <- matrix(
rpois(30, lambda = 5), ncol = 6,
dimnames = list(paste0("Gene", 1:5), paste0("Spot", 1:6))
)
coordinates <- matrix(
c(1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3), ncol = 2
)
ste <- SpatialExperiment::SpatialExperiment(
assays = list(counts = as.matrix(counts)),
rowData = data.frame(Gene_ID = paste0("Gene", 1:5)),
colData = data.frame(Cell_ID = paste0("Spot", 1:6)),
spatialCoords = coordinates
)
SDDLS <- createSpatialDDLSobject(
sc.data = sce,
sc.cell.ID.column = "Cell_ID",
sc.gene.ID.column = "Gene_ID",
st.data = ste,
st.spot.ID.column = "Cell_ID",
st.gene.ID.column = "Gene_ID",
project = "Simul_example",
sc.filt.genes.cluster = FALSE
)
#> === 1 SpatialExperiment objects provided
#> === Processing spatial transcriptomics data
#> - Filtering features:
#> - Selected features: 5
#> - Discarded features: 0
#>
#> === Processing single-cell data
#> - Filtering features:
#> - Selected features: 40
#> - Discarded features: 0
#>
#> === Number of shared genes between single-cell and spatial transcriptomics datasets: 5
#> - Original # genes in single-cell data: 40
#> - Original # genes in ST data (object with the greatest # genes): 5
#>
#> === No mitochondrial genes were found by using ^mt- as regrex
#>
#> === Final number of dimensions for further analyses: 5