Deconvolute spatial transcriptomics data using trained model
Source: R/dnnModel.R
deconvSpatialDDLS.Rd
Deconvolute spatial transcriptomics data using the trained model in the SpatialDDLS object. The trained model is used to predict cell proportions of two mirrored transcriptional profiles:
- 'Intrinsic' profiles: transcriptional profiles of each spot in the ST dataset.
- 'Extrinsic' profiles: profiles simulated from the surrounding spots of each spot.
After prediction, cell proportions from the intrinsic profiles (intrinsic cell proportions) are regularized based on the similarity between intrinsic and extrinsic profiles in order to maintain spatial consistency. This approach leverages both transcriptional and spatial information. For more details, see Mañanes et al., 2023 and the Details section.
Usage
deconvSpatialDDLS(
  object,
  index.st,
  normalize = TRUE,
  scaling = "standardize",
  k.spots = 4,
  pca.space = TRUE,
  fast.pca = TRUE,
  pcs.num = 50,
  pca.var = 0.8,
  metric = "euclidean",
  alpha.cutoff = "mean",
  alpha.quantile = 0.5,
  simplify.set = NULL,
  simplify.majority = NULL,
  use.generator = FALSE,
  batch.size = 64,
  verbose = TRUE
)
Arguments
- object
SpatialDDLS object with trained.model and spatial.experiments slots.
- index.st
Name or index of the dataset/slide stored in the SpatialDDLS object (spatial.experiments slot) to be deconvoluted. If missing, all datasets will be deconvoluted.
- normalize
Normalize data (logCPM) before deconvolution (TRUE by default).
- scaling
How to scale data before training. Options include "standardize" (values are centered around the mean with a unit standard deviation) or "rescale" (values are shifted and rescaled so that they end up ranging between 0 and 1). If normalize = FALSE, data are not scaled.
- k.spots
Number of nearest spots considered for each spot during regularization and simulation of extrinsic transcriptional profiles. The greater the value, the smoother the regularization will be (4 by default).
- pca.space
Whether to use PCA space to calculate distances between intrinsic and extrinsic transcriptional profiles (TRUE by default).
- fast.pca
Whether to use the irlba implementation of PCA. If TRUE, the number of PCs used is defined by the pcs.num parameter. If FALSE, the PCA implementation from the stats R package is used instead (TRUE by default).
- pcs.num
Number of PCs used to calculate distances if fast.pca == TRUE (50 by default).
- pca.var
Threshold of explained variance (between 0.2 and 1) used to choose the number of PCs used if pca.space == TRUE and fast.pca == FALSE (0.8 by default).
- metric
Metric used to measure distance/similarity between intrinsic and extrinsic transcriptional profiles. It may be 'euclidean', 'cosine' or 'pearson' ('euclidean' by default).
- alpha.cutoff
Distance cutoff for regularization. It may be 'mean' (spots with transcriptional distances shorter than the mean distance of the dataset will be modified) or 'quantile' (spots with transcriptional distances shorter than the alpha.quantile quantile will be modified). 'mean' by default.
- alpha.quantile
Quantile used if alpha.cutoff == 'quantile' (0.5 by default).
- simplify.set
List specifying which cell types should be compressed into a new label with the name of the list item. See examples for details. If provided, results are stored in a list with 'raw' and 'simpli.set' elements.
- simplify.majority
List specifying which cell types should be compressed into the cell type with the highest proportion in each spot. Unlike simplify.set, no new labels are created. If provided, results are stored in a list with 'raw' and 'simpli.majority' elements.
- use.generator
Boolean indicating whether to use generators for prediction (FALSE by default).
- batch.size
Number of samples per batch. Only used when use.generator = TRUE.
- verbose
Show informative messages during the execution.
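As a rough illustration of what the normalize and scaling options describe, the following base R sketch applies a logCPM transform followed by either scaling option to a toy count matrix. It is not the package's internal code: the helper names log_cpm, standardize and rescale are hypothetical, a log2 transform with a pseudocount of 1 is assumed for logCPM, and scaling is assumed to be applied gene by gene, which may differ from what SpatialDDLS actually does.

```r
# Hypothetical helpers (not part of SpatialDDLS) illustrating the options above.
# Input: a genes x spots matrix of raw counts.

# logCPM: counts per million per spot, then log2 with a pseudocount of 1 (assumed)
log_cpm <- function(counts) {
  cpm <- t(t(counts) / colSums(counts)) * 1e6
  log2(cpm + 1)
}

# "standardize": center each gene at 0 with unit standard deviation
standardize <- function(x) {
  t(scale(t(x), center = TRUE, scale = TRUE))
}

# "rescale": shift and rescale each gene so that it ranges between 0 and 1
rescale <- function(x) {
  t(apply(x, 1, function(g) (g - min(g)) / (max(g) - min(g))))
}

# Toy data: 5 genes x 8 spots
set.seed(1)
counts <- matrix(
  rpois(5 * 8, lambda = 5) + 1, nrow = 5,
  dimnames = list(paste0("Gene", 1:5), paste0("Spot", 1:8))
)
norm.counts <- log_cpm(counts)
std.counts <- standardize(norm.counts)
res.counts <- rescale(norm.counts)
```

Both helpers assume that no gene is constant across spots; a constant gene would produce a division by zero.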
Value
SpatialDDLS object with a deconv.spots slot. The output is a list containing 'Regularized', 'Intrinsic' and 'Extrinsic' deconvoluted cell proportions, 'Distances' between intrinsic and extrinsic transcriptional profiles, and 'Weight.factors' with the final weights used to regularize intrinsic cell proportions. If simplify.set and/or simplify.majority are provided, the deconv.spots slot will contain a list with raw and simplified results.
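The difference between the two simplification modes can be sketched on a mock proportions matrix. The helpers simplify_set and simplify_majority below are hypothetical and written only from the argument descriptions; the package's own implementation may handle ties or column order differently.

```r
# Mock regularized proportions: 3 spots x 4 cell types (rows sum to 1)
props <- matrix(
  c(0.5, 0.2, 0.2, 0.1,
    0.1, 0.6, 0.1, 0.2,
    0.3, 0.3, 0.1, 0.3),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("Spot", 1:3), paste0("CellType", 1:4))
)
groups <- list(CellGroup1 = c("CellType1", "CellType2"))

# simplify.set-like behaviour: sum grouped cell types into a new label
simplify_set <- function(props, groups) {
  for (label in names(groups)) {
    members <- groups[[label]]
    merged <- rowSums(props[, members, drop = FALSE])
    props <- cbind(props[, !colnames(props) %in% members, drop = FALSE], merged)
    colnames(props)[ncol(props)] <- label
  }
  props
}

# simplify.majority-like behaviour: move the summed proportion of each group
# to its most abundant member in every spot; no new labels are created
simplify_majority <- function(props, groups) {
  for (members in groups) {
    sub <- props[, members, drop = FALSE]
    winner <- max.col(sub, ties.method = "first")  # assumption: first wins ties
    total <- rowSums(sub)
    sub[] <- 0
    sub[cbind(seq_len(nrow(sub)), winner)] <- total
    props[, members] <- sub
  }
  props
}

set.props <- simplify_set(props, groups)
maj.props <- simplify_majority(props, groups)
```

In this sketch set.props replaces CellType1 and CellType2 with a single CellGroup1 column, whereas maj.props keeps the original columns and assigns the pooled proportion to whichever of the two is larger in each spot.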
Details
The deconvolution process involves two main steps: predicting cell proportions from the transcriptome using the trained neural network model, and regularizing those proportions based on the spatial location of each spot. In the regularization step, a mirrored version of each spot is simulated from its N nearest spots (k.spots). We refer to these profiles as 'extrinsic' profiles, whereas the transcriptional profiles of each spot are called 'intrinsic' profiles. Extrinsic profiles are used to regularize predictions based on intrinsic profiles. The rationale is that spots surrounded by transcriptionally similar spots should have similar cell compositions, and therefore predicted proportions can be smoothed to preserve their spatial consistency. On the other hand, spots surrounded by dissimilar spots cannot be predicted from their neighbors, likely because they have very specific cell compositions, and thus they are predicted only from their own transcriptional profiles.
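A minimal base R sketch of how an extrinsic profile could be simulated is shown here. The helper sim_extrinsic is hypothetical; it assumes Euclidean distances on the spatial coordinates, excludes the spot itself from its own neighborhood, and breaks ties by spot order. The package may make different choices.

```r
# Hypothetical helper: sum the counts of the k nearest spots of each spot.
# counts: genes x spots matrix; coords: spots x 2 matrix of spatial coordinates.
sim_extrinsic <- function(counts, coords, k = 4) {
  d <- as.matrix(dist(coords))   # pairwise Euclidean distances between spots
  diag(d) <- Inf                 # assumption: a spot is not its own neighbor
  extrinsic <- sapply(seq_len(ncol(counts)), function(i) {
    nn <- order(d[i, ])[seq_len(k)]        # indices of the k nearest spots
    rowSums(counts[, nn, drop = FALSE])    # summed transcriptome
  })
  dimnames(extrinsic) <- dimnames(counts)
  extrinsic
}

# Toy data: 5 spots on a line, 3 genes
coords <- cbind(x = 1:5, y = rep(0, 5))
counts <- matrix(
  1:15, nrow = 3,
  dimnames = list(paste0("Gene", 1:3), paste0("Spot", 1:5))
)
ext <- sim_extrinsic(counts, coords, k = 2)
```

Here k plays the role of k.spots: larger values pool more neighbors into each extrinsic profile.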
Regarding the inner workings of SpatialDDLS: first, extrinsic profiles are simulated for each spot by summing the transcriptomes of its N nearest spots. Distances between the extrinsic and intrinsic profiles of each spot are then calculated so that similar/dissimilar spots are identified. These two sets of transcriptional profiles are used as input for the trained neural network model, and according to the calculated distances, a weighted mean between the predicted proportions for each spot is calculated. Spots with distances between intrinsic and extrinsic profiles greater than alpha.cutoff are not regularized, whereas spots with distances less than alpha.cutoff contribute to the weighted mean. Weights are calculated by rescaling distances less than alpha.cutoff between 0 and 0.5, so that the maximum extent to which an extrinsic profile can modify the predictions based on intrinsic profiles is 0.5 (a regular mean). For more details, see Mañanes et al., 2023.
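The weighting step can be sketched in base R as follows. The helper calc_weights is hypothetical, and the direction of the rescaling is an assumption: the smallest distance receives a weight of 0.5, the weight decreases linearly to 0 at the cutoff, and spots beyond the cutoff receive 0. The exact formula used by the package is described in Mañanes et al., 2023 and may differ.

```r
# Hypothetical helper: turn intrinsic-extrinsic distances into weights in [0, 0.5].
# Assumption: closer profiles get larger weights; spots at or beyond the
# cutoff get a weight of 0 and are therefore not regularized.
calc_weights <- function(distances, alpha.cutoff = "mean", alpha.quantile = 0.5) {
  cutoff <- if (alpha.cutoff == "mean") {
    mean(distances)
  } else {
    quantile(distances, probs = alpha.quantile, names = FALSE)
  }
  w <- numeric(length(distances))
  below <- distances < cutoff
  d.min <- min(distances)
  w[below] <- 0.5 * (cutoff - distances[below]) / (cutoff - d.min)
  w
}

# Mock predictions for 4 spots and 2 cell types (rows sum to 1)
intrinsic <- matrix(c(0.9, 0.1, 0.6, 0.4, 0.2, 0.8, 0.5, 0.5), ncol = 2, byrow = TRUE)
extrinsic <- matrix(c(0.7, 0.3, 0.5, 0.5, 0.6, 0.4, 0.1, 0.9), ncol = 2, byrow = TRUE)
distances <- c(1, 2, 6, 7)   # mean = 4, so only the first two spots are regularized

w <- calc_weights(distances)
# Weighted mean of intrinsic and extrinsic proportions, spot by spot
regularized <- (1 - w) * intrinsic + w * extrinsic
```

With a weight of 0.5 the regularized proportions are the plain average of the intrinsic and extrinsic predictions, which matches the upper bound described above; with a weight of 0 the intrinsic prediction is returned unchanged.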
This function requires a SpatialDDLS object with a trained deep neural network model (trained.model slot) and the spatial transcriptomics datasets to be deconvoluted in the spatial.experiments slot. See ?createSpatialDDLSobject or ?loadSTProfiles for more details.
References
Mañanes, D., Rivero-García, I., Jimenez-Carretero, D., Torres, M., Sancho, D., Torroja, C., Sánchez-Cabo, F. (2023). SpatialDDLS: An R package to deconvolute spatial transcriptomics data using neural networks. bioRxiv. doi:10.1101/2023.08.31.555677.
Examples
# \donttest{
set.seed(123)
sce <- SingleCellExperiment::SingleCellExperiment(
  assays = list(
    counts = matrix(
      # the 30 simulated values are recycled to fill the 15 x 20 matrix
      rpois(30, lambda = 5), nrow = 15, ncol = 20,
      dimnames = list(paste0("Gene", seq(15)), paste0("RHC", seq(20)))
    )
  ),
  colData = data.frame(
    Cell_ID = paste0("RHC", seq(20)),
    Cell_Type = sample(x = paste0("CellType", seq(6)), size = 20,
                       replace = TRUE)
  ),
  rowData = data.frame(
    Gene_ID = paste0("Gene", seq(15))
  )
)
SDDLS <- createSpatialDDLSobject(
  sc.data = sce,
  sc.cell.ID.column = "Cell_ID",
  sc.gene.ID.column = "Gene_ID",
  sc.filt.genes.cluster = FALSE
)
#> === Spatial transcriptomics data not provided
#> === Processing single-cell data
#> - Filtering features:
#> - Selected features: 15
#> - Discarded features: 0
#>
#> === No mitochondrial genes were found by using ^mt- as regrex
#>
#> === Final number of dimensions for further analyses: 15
SDDLS <- genMixedCellProp(
  object = SDDLS,
  cell.ID.column = "Cell_ID",
  cell.type.column = "Cell_Type",
  num.sim.spots = 50,
  train.freq.cells = 2/3,
  train.freq.spots = 2/3,
  verbose = TRUE
)
#>
#> === The number of mixed profiles that will be generated is equal to 50
#>
#> === Training set cells by type:
#> - CellType1: 3
#> - CellType2: 1
#> - CellType3: 3
#> - CellType4: 2
#> - CellType5: 3
#> - CellType6: 2
#> === Test set cells by type:
#> - CellType1: 1
#> - CellType2: 1
#> - CellType3: 1
#> - CellType4: 1
#> - CellType5: 1
#> - CellType6: 1
#> === Probability matrix for training data:
#> - Mixed spots: 34
#> - Cell types: 6
#> === Probability matrix for test data:
#> - Mixed spots: 16
#> - Cell types: 6
#> DONE
SDDLS <- simMixedProfiles(SDDLS)
#> === Setting parallel environment to 1 thread(s)
#>
#> === Generating train mixed profiles:
#>
#> === Generating test mixed profiles:
#>
#> DONE
# training of SDDLS model
SDDLS <- trainDeconvModel(
  object = SDDLS,
  batch.size = 15,
  num.epochs = 5
)
#> === Training and test from stored data
#> Using only simulated mixed samples
#> Using only simulated mixed samples
#> Model: "SpatialDDLS"
#> _____________________________________________________________________
#> Layer (type)                        Output Shape            Param #
#> =====================================================================
#> Dense1 (Dense)                      (None, 200)             3200
#> _____________________________________________________________________
#> BatchNormalization1 (BatchNorm      (None, 200)             800
#> _____________________________________________________________________
#> Activation1 (Activation)            (None, 200)             0
#> _____________________________________________________________________
#> Dropout1 (Dropout)                  (None, 200)             0
#> _____________________________________________________________________
#> Dense2 (Dense)                      (None, 200)             40200
#> _____________________________________________________________________
#> BatchNormalization2 (BatchNorm      (None, 200)             800
#> _____________________________________________________________________
#> Activation2 (Activation)            (None, 200)             0
#> _____________________________________________________________________
#> Dropout2 (Dropout)                  (None, 200)             0
#> _____________________________________________________________________
#> Dense3 (Dense)                      (None, 6)               1206
#> _____________________________________________________________________
#> BatchNormalization3 (BatchNorm      (None, 6)               24
#> _____________________________________________________________________
#> ActivationSoftmax (Activation)      (None, 6)               0
#> =====================================================================
#> Total params: 46,230
#> Trainable params: 45,418
#> Non-trainable params: 812
#> _____________________________________________________________________
#>
#> === Training DNN with 34 samples:
#>
#> === Evaluating DNN in test data (16 samples)
#> - loss: 1.6916
#> - accuracy: 0.125
#> - mean_absolute_error: 0.2476
#> - categorical_accuracy: 0.125
#>
#> === Generating prediction results using test data
#> DONE
# simulating spatial data
ngenes <- sample(3:40, size = 1)
ncells <- sample(10:40, size = 1)
counts <- matrix(
  rpois(ngenes * ncells, lambda = 5), ncol = ncells,
  dimnames = list(paste0("Gene", seq(ngenes)), paste0("Spot", seq(ncells)))
)
coordinates <- matrix(
  rep(c(1, 2), ncells), ncol = 2
)
st <- SpatialExperiment::SpatialExperiment(
  assays = list(counts = as.matrix(counts)),
  rowData = data.frame(Gene_ID = paste0("Gene", seq(ngenes))),
  colData = data.frame(Cell_ID = paste0("Spot", seq(ncells))),
  spatialCoords = coordinates
)
SDDLS <- loadSTProfiles(
  object = SDDLS,
  st.data = st,
  st.spot.ID.column = "Cell_ID",
  st.gene.ID.column = "Gene_ID"
)
#> === 1 SpatialExperiment objects provided
#> === Processing spatial transcriptomics data
#> - Filtering features:
#> - Selected features: 27
#> - Discarded features: 0
#>
# simplify arguments
simplify <- list(
  CellGroup1 = c("CellType1", "CellType2", "CellType4"),
  CellGroup2 = c("CellType3", "CellType5")
)
SDDLS <- deconvSpatialDDLS(
  object = SDDLS,
  index.st = 1,
  simplify.set = simplify,
  simplify.majority = simplify
)
#> === Filtering out 12 features in data that are not present in trained model
#> === Normalizing data (LogCPM)
#> === Predicting cell type proportions
#>
#> === Calculating distances in PCA space
#>
#> === Calculating 50 PCs
#> Warning: You're computing too large a percentage of total singular values, use a standard svd instead.
#> === Calculating alpha factors based on distances
#> === Note that only regularized proportions will be simplified
#> DONE
# }