Deconvolute spatial transcriptomics data using trained model

Deconvolute spatial transcriptomics data using the trained model in the SpatialDDLS object. The trained model is used to predict cell proportions of two mirrored transcriptional profiles:

'Intrinsic' profiles: transcriptional profiles of each spot in the ST dataset.
'Extrinsic' profiles: profiles simulated from the surrounding spots of each spot.

After prediction, cell proportions from the intrinsic profiles (intrinsic cell proportions) are regularized based on the similarity between intrinsic and extrinsic profiles in order to maintain spatial consistency. This approach leverages both transcriptional and spatial information. For more details, see Mañanes et al., 2023 and the Details section.

Usage

deconvSpatialDDLS(
  object,
  index.st,
  normalize = TRUE,
  scaling = "standardize",
  k.spots = 4,
  pca.space = TRUE,
  fast.pca = TRUE,
  pcs.num = 50,
  pca.var = 0.8,
  metric = "euclidean",
  alpha.cutoff = "mean",
  alpha.quantile = 0.5,
  simplify.set = NULL,
  simplify.majority = NULL,
  use.generator = FALSE,
  batch.size = 64,
  verbose = TRUE
)

Arguments

object: SpatialDDLS object with trained.model and spatial.experiments slots.
index.st: Name or index of the dataset/slide stored in the SpatialDDLS object (spatial.experiments slot) to be deconvoluted. If missing, all datasets will be deconvoluted.
normalize: Normalize data (logCPM) before deconvolution (TRUE by default).
scaling: How to scale data before training. Options include "standardize" (values are centered around the mean with a unit standard deviation) or "rescale" (values are shifted and rescaled so that they end up ranging between 0 and 1). If normalize = FALSE, data are not scaled.
k.spots: Number of nearest spots considered for each spot during regularization and simulation of extrinsic transcriptional profiles. The greater the value, the smoother the regularization will be (4 by default).
pca.space: Whether to use PCA space to calculate distances between intrinsic and extrinsic transcriptional profiles (TRUE by default).
fast.pca: Whether to use the irlba implementation. If TRUE, the number of PCs used is defined by the pcs.num parameter. If FALSE, the PCA implementation from the stats R package is used instead (TRUE by default).
pcs.num: Number of PCs used to calculate distances if fast.pca == TRUE (50 by default).
pca.var: Threshold of explained variance (between 0.2 and 1) used to choose the number of PCs used if pca.space == TRUE and fast.pca == FALSE (0.8 by default).
metric: Metric used to measure distance/similarity between intrinsic and extrinsic transcriptional profiles. It may be 'euclidean', 'cosine' or 'pearson' ('euclidean' by default).
alpha.cutoff: Minimum distance for regularization. It may be 'mean' (spots with transcriptional distances shorter than the mean distance of the dataset will be modified) or 'quantile' (spots with transcriptional distances shorter than the alpha.quantile quantile are used). 'mean' by default.
alpha.quantile: Quantile used if alpha.cutoff == 'quantile'. 0.5 by default.
simplify.set: List specifying which cell types should be compressed into a new label with the name of the list item. See examples for details. If provided, results are stored in a list with 'raw' and 'simpli.set' elements.
simplify.majority: List specifying which cell types should be compressed into the cell type with the highest proportion in each spot. Unlike simplify.set, no new labels are created. If provided, results are stored in a list with 'raw' and 'simpli.majority' elements.
use.generator: Boolean indicating whether to use generators for prediction (FALSE by default).
batch.size: Number of samples per batch. Only used when use.generator = TRUE.
verbose: Show informative messages during the execution.
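
The difference between simplify.set and simplify.majority can be illustrated with a small base R sketch. This is not the package's internal code: the helper names simplify_set and simplify_majority and the toy matrix are hypothetical, and the behavior is only one reading of the argument descriptions above (summing grouped cell types into a new label versus assigning the group total to the most abundant cell type of the group in each spot).

```r
# Toy matrix of predicted proportions: 3 spots x 4 cell types (rows sum to 1).
props <- matrix(
  c(0.50, 0.20, 0.20, 0.10,
    0.10, 0.60, 0.10, 0.20,
    0.25, 0.25, 0.40, 0.10),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("Spot", 1:3), paste0("CellType", 1:4))
)
groups <- list(CellGroup1 = c("CellType1", "CellType2"))

# Hypothetical helper: grouped cell types are summed into a new label
# named after the list item (assumed simplify.set-like behavior).
simplify_set <- function(props, groups) {
  grouped <- unlist(groups, use.names = FALSE)
  kept <- props[, !colnames(props) %in% grouped, drop = FALSE]
  merged <- do.call(
    cbind,
    lapply(groups, function(types) rowSums(props[, types, drop = FALSE]))
  )
  cbind(kept, merged)
}

# Hypothetical helper: within each group, the total proportion is assigned
# to the cell type with the highest proportion in each spot; no new label
# is created (assumed simplify.majority-like behavior).
simplify_majority <- function(props, groups) {
  for (types in groups) {
    sub <- props[, types, drop = FALSE]
    total <- rowSums(sub)
    winner <- max.col(sub, ties.method = "first")
    sub[] <- 0
    sub[cbind(seq_len(nrow(sub)), winner)] <- total
    props[, types] <- sub
  }
  props
}

set_res <- simplify_set(props, groups)
maj_res <- simplify_majority(props, groups)
```

In both cases the proportions of each spot still sum to 1; only the labels that carry them change.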

Value

SpatialDDLS object with a deconv.spots slot. The output is a list containing 'Regularized', 'Intrinsic' and 'Extrinsic' deconvoluted cell proportions, 'Distances' between intrinsic and extrinsic transcriptional profiles, and 'Weight.factors' with the final weights used to regularize intrinsic cell proportions. If simplify.set and/or simplify.majority are provided, the deconv.spots slot will contain a list with raw and simplified results.

Details

The deconvolution process involves two main steps: predicting cell proportions from the transcriptome using the trained neural network model, and regularizing those proportions based on the spatial location of each spot. In the regularization step, a mirrored version of each spot is simulated based on its N-nearest spots. We refer to these profiles as 'extrinsic' profiles, whereas the transcriptional profiles of each spot are called 'intrinsic' profiles. Extrinsic profiles are used to regularize predictions based on intrinsic profiles. The rationale is that spots surrounded by transcriptionally similar spots should have similar cell compositions, and therefore predicted proportions can be smoothed to preserve their spatial consistency. On the other hand, spots surrounded by dissimilar spots cannot be predicted from their neighbors, likely because they have very specific cell compositions, and thus they are predicted from their own transcriptional profiles only.
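
As a rough illustration of how an extrinsic profile could be built, the following base R sketch sums the transcriptomes of the k nearest spots of each spot on a toy 3 x 3 grid. The function name simulate_extrinsic, the toy data, and the choice to exclude the spot itself from its own neighborhood are assumptions made for this example, not the package's actual implementation.

```r
set.seed(1)
n_genes <- 5
n_spots <- 9

# Toy count matrix (genes x spots) and spot coordinates on a 3 x 3 grid.
counts <- matrix(
  rpois(n_genes * n_spots, lambda = 5), nrow = n_genes,
  dimnames = list(paste0("Gene", seq(n_genes)), paste0("Spot", seq(n_spots)))
)
coords <- as.matrix(expand.grid(x = 1:3, y = 1:3))
rownames(coords) <- colnames(counts)

# Hypothetical helper: for each spot, sum the counts of its k nearest spots
# (Euclidean distance on coordinates). Assumption: the spot itself is not
# counted among its own neighbors.
simulate_extrinsic <- function(counts, coords, k = 4) {
  d <- as.matrix(dist(coords))
  diag(d) <- Inf
  extrinsic <- vapply(
    seq_len(ncol(counts)),
    function(i) {
      nn <- order(d[i, ])[seq_len(k)]
      rowSums(counts[, nn, drop = FALSE])
    },
    numeric(nrow(counts))
  )
  dimnames(extrinsic) <- dimnames(counts)
  extrinsic
}

extrinsic <- simulate_extrinsic(counts, coords, k = 4)
```

The result has the same dimensions as the intrinsic count matrix, so both sets of profiles can be normalized and passed to the same model.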

Regarding how SpatialDDLS works: first, extrinsic profiles are simulated based on the N-nearest spots for each spot by summing their transcriptomes. Distances between the extrinsic and intrinsic profiles of each spot are calculated so that similar/dissimilar spots are identified. These two sets of transcriptional profiles are used as input for the trained neural network model, and according to the calculated distances, a weighted mean between the predicted proportions for each spot is calculated. Spots with distances between intrinsic and extrinsic profiles greater than alpha.cutoff are not regularized, whereas spots with distances less than alpha.cutoff contribute to the weighted mean. Weights are calculated by rescaling distances less than alpha.cutoff between 0 and 0.5, so that the maximum extent to which an extrinsic profile can modify the predictions based on intrinsic profiles is 0.5 (a regular mean). For more details, see Mañanes et al., 2023.
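
The weighting scheme can be sketched in base R as follows. The exact rescaling used by the package is not given here, so this example assumes a linear rescaling in which the closest spot below the cutoff receives a weight of 0.5 and spots at or above the cutoff receive 0. The function name regularize and the toy inputs are hypothetical.

```r
# Hypothetical helper: combine intrinsic and extrinsic predictions
# (spots x cell types) using distance-based weights in [0, 0.5].
regularize <- function(intrinsic, extrinsic, distances,
                       alpha.cutoff = c("mean", "quantile"),
                       alpha.quantile = 0.5) {
  alpha.cutoff <- match.arg(alpha.cutoff)
  cutoff <- if (alpha.cutoff == "mean") {
    mean(distances)
  } else {
    quantile(distances, probs = alpha.quantile, names = FALSE)
  }
  w <- numeric(length(distances))
  below <- distances < cutoff
  if (any(below)) {
    # Assumed linear rescaling: smallest distance -> 0.5, cutoff -> 0.
    span <- cutoff - min(distances[below])
    w[below] <- 0.5 * (cutoff - distances[below]) / span
  }
  # w is recycled along rows, so each spot gets its own weight.
  regularized <- (1 - w) * intrinsic + w * extrinsic
  list(Regularized = regularized, Weight.factors = w)
}

# Toy predictions for 4 spots and 2 cell types.
intrinsic <- matrix(c(0.8, 0.2,  0.6, 0.4,  0.3, 0.7,  0.1, 0.9),
                    ncol = 2, byrow = TRUE)
extrinsic <- matrix(c(0.4, 0.6,  0.2, 0.8,  0.9, 0.1,  0.5, 0.5),
                    ncol = 2, byrow = TRUE)
distances <- c(1, 2, 3, 6)  # mean = 3, so only the first two spots are regularized

res <- regularize(intrinsic, extrinsic, distances, alpha.cutoff = "mean")
```

With a weight of 0.5 the regularized proportions are the plain mean of the intrinsic and extrinsic predictions, and with a weight of 0 the intrinsic predictions are returned unchanged.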

This function requires a SpatialDDLS object with a trained deep neural network model (trained.model slot) and the spatial transcriptomics datasets to be deconvoluted in the spatial.experiments slot. See ?createSpatialDDLSobject or ?loadSTProfiles for more details.

References

Mañanes, D., Rivero-García, I., Jimenez-Carretero, D., Torres, M., Sancho, D., Torroja, C., Sánchez-Cabo, F. (2023). SpatialDDLS: An R package to deconvolute spatial transcriptomics data using neural networks. bioRxiv. doi: 10.1101/2023.08.31.555677.

Examples

# \donttest{
set.seed(123)
sce <- SingleCellExperiment::SingleCellExperiment(
  assays = list(
    counts = matrix(
      rpois(30, lambda = 5), nrow = 15, ncol = 20,
      dimnames = list(paste0("Gene", seq(15)), paste0("RHC", seq(20)))
    )
  ),
  colData = data.frame(
    Cell_ID = paste0("RHC", seq(20)),
    Cell_Type = sample(x = paste0("CellType", seq(6)), size = 20,
                       replace = TRUE)
  ),
  rowData = data.frame(
    Gene_ID = paste0("Gene", seq(15))
  )
)
SDDLS <- createSpatialDDLSobject(
  sc.data = sce,
  sc.cell.ID.column = "Cell_ID",
  sc.gene.ID.column = "Gene_ID",
  sc.filt.genes.cluster = FALSE
)
#> === Spatial transcriptomics data not provided
#> === Processing single-cell data
#>       - Filtering features:
#>          - Selected features: 15
#>          - Discarded features: 0
#> 
#> === No mitochondrial genes were found by using ^mt- as regrex
#> 
#> === Final number of dimensions for further analyses: 15
SDDLS <- genMixedCellProp(
  object = SDDLS,
  cell.ID.column = "Cell_ID",
  cell.type.column = "Cell_Type",
  num.sim.spots = 50,
  train.freq.cells = 2/3,
  train.freq.spots = 2/3,
  verbose = TRUE
) 
#> 
#> === The number of mixed profiles that will be generated is equal to 50
#> 
#> === Training set cells by type:
#>     - CellType1: 3
#>     - CellType2: 1
#>     - CellType3: 3
#>     - CellType4: 2
#>     - CellType5: 3
#>     - CellType6: 2
#> === Test set cells by type:
#>     - CellType1: 1
#>     - CellType2: 1
#>     - CellType3: 1
#>     - CellType4: 1
#>     - CellType5: 1
#>     - CellType6: 1
#> === Probability matrix for training data:
#>     - Mixed spots: 34
#>     - Cell types: 6
#> === Probability matrix for test data:
#>     - Mixed spots: 16
#>     - Cell types: 6
#> DONE
SDDLS <- simMixedProfiles(SDDLS)
#> === Setting parallel environment to 1 thread(s)
#> 
#> === Generating train mixed profiles:
#> 
#> === Generating test mixed profiles:
#> 
#> DONE
# training of SDDLS model
SDDLS <- trainDeconvModel(
  object = SDDLS,
  batch.size = 15,
  num.epochs = 5
)
#> === Training and test from stored data
#>     Using only simulated mixed samples
#>     Using only simulated mixed samples
#> Model: "SpatialDDLS"
#> _____________________________________________________________________
#> Layer (type)                   Output Shape               Param #    
#> =====================================================================
#> Dense1 (Dense)                 (None, 200)                3200       
#> _____________________________________________________________________
#> BatchNormalization1 (BatchNorm (None, 200)                800        
#> _____________________________________________________________________
#> Activation1 (Activation)       (None, 200)                0          
#> _____________________________________________________________________
#> Dropout1 (Dropout)             (None, 200)                0          
#> _____________________________________________________________________
#> Dense2 (Dense)                 (None, 200)                40200      
#> _____________________________________________________________________
#> BatchNormalization2 (BatchNorm (None, 200)                800        
#> _____________________________________________________________________
#> Activation2 (Activation)       (None, 200)                0          
#> _____________________________________________________________________
#> Dropout2 (Dropout)             (None, 200)                0          
#> _____________________________________________________________________
#> Dense3 (Dense)                 (None, 6)                  1206       
#> _____________________________________________________________________
#> BatchNormalization3 (BatchNorm (None, 6)                  24         
#> _____________________________________________________________________
#> ActivationSoftmax (Activation) (None, 6)                  0          
#> =====================================================================
#> Total params: 46,230
#> Trainable params: 45,418
#> Non-trainable params: 812
#> _____________________________________________________________________
#> 
#> === Training DNN with 34 samples:
#> 
#> === Evaluating DNN in test data (16 samples)
#>    - loss: 1.6916
#>    - accuracy: 0.125
#>    - mean_absolute_error: 0.2476
#>    - categorical_accuracy: 0.125
#> 
#> === Generating prediction results using test data
#> DONE
# simulating spatial data
ngenes <- sample(3:40, size = 1)
ncells <- sample(10:40, size = 1)
counts <- matrix(
  rpois(ngenes * ncells, lambda = 5), ncol = ncells,
  dimnames = list(paste0("Gene", seq(ngenes)), paste0("Spot", seq(ncells)))
)
coordinates <- matrix(
  rep(c(1, 2), ncells), ncol = 2
)
st <- SpatialExperiment::SpatialExperiment(
  assays = list(counts = as.matrix(counts)),
  rowData = data.frame(Gene_ID = paste0("Gene", seq(ngenes))),
  colData = data.frame(Cell_ID = paste0("Spot", seq(ncells))),
  spatialCoords = coordinates
)
SDDLS <- loadSTProfiles(
  object = SDDLS,
  st.data = st,
  st.spot.ID.column = "Cell_ID",
  st.gene.ID.column = "Gene_ID"
)
#> === 1 SpatialExperiment objects provided
#>    === Processing spatial transcriptomics data
#>       - Filtering features:
#>          - Selected features: 27
#>          - Discarded features: 0
#> 
# simplify arguments
simplify <- list(CellGroup1 = c("CellType1", "CellType2", "CellType4"),
                 CellGroup2 = c("CellType3", "CellType5"))
SDDLS <- deconvSpatialDDLS(
  object = SDDLS,
  index.st = 1,
  simplify.set = simplify, 
  simplify.majority = simplify
)
#> === Filtering out 12 features in data that are not present in trained model
#> === Normalizing data (LogCPM)
#> === Predicting cell type proportions
#> 
#> === Calculating distances in PCA space
#> 
#> === Calculating 50 PCs
#> Warning: You're computing too large a percentage of total singular values, use a standard svd instead.
#> === Calculating alpha factors based on distances
#> === Note that only regularized proportions will be simplified
#> DONE
# }