interGradientsDL

Source: R/interGradientsDL.R
This function enables users to gain insights into the interpretability of the deconvolution model. It calculates the gradients of classes/loss function with respect to the input features used in training. These values are calculated per gene and cell type using pure mixed transcriptional profiles, and indicate the extent to which each feature influences the model's predicted cell proportions for each cell type.
Usage

interGradientsDL(
  object,
  method = "class",
  normalize = TRUE,
  scaling = "standardize",
  verbose = TRUE
)
Arguments

object
DigitalDLSorter object containing a trained deconvolution model (trained.model slot) and pure mixed transcriptional profiles (bulk.simul slot).
method
Method used to calculate gradients with respect to inputs. It can be 'class' (gradients of predicted classes w.r.t. inputs), 'loss' (gradients of the loss function w.r.t. inputs), or 'both'.
normalize
Whether to normalize data using logCPM (TRUE by default). This parameter is only considered when the method used to simulate the mixed transcriptional profiles (simMixedProfiles function) was "AddRawCount". Otherwise, data were already normalized. This parameter should be set according to the transformation used to train the model.
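For reference, the logCPM transformation applied when normalize = TRUE can be sketched in base R as below. This is an illustration of the concept only, not the package's internal implementation (which may, for instance, use a different pseudocount or library-size adjustment):

```r
# Minimal base R sketch of a logCPM transformation (illustrative only)
counts <- matrix(
  c(10, 0, 5, 20, 2, 3), nrow = 3,
  dimnames = list(paste0("Gene", 1:3), paste0("Sample", 1:2))
)

# counts per million: scale each column (sample) to a library size of 1e6
cpm <- t(t(counts) / colSums(counts)) * 1e6

# log-transform with a pseudocount so zero counts map to zero
log.cpm <- log2(cpm + 1)
```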
scaling
How to scale data. It can be: "standardize" (values are centered around the mean with unit standard deviation, the default), "rescale" (values are shifted and rescaled so that they range between 0 and 1), or "none" (no scaling is performed). This parameter should be set according to the transformation used to train the model.
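The two scaling options correspond to standard transformations. A base R sketch of both, applied to a single vector for illustration rather than the gene-by-sample training matrix:

```r
# Illustration of the two scaling options in base R
x <- c(2, 4, 6, 8, 10)

# "standardize": center to mean 0 and scale to unit standard deviation
standardized <- (x - mean(x)) / sd(x)

# "rescale": shift and rescale so values range between 0 and 1
rescaled <- (x - min(x)) / (max(x) - min(x))

rescaled
#> [1] 0.00 0.25 0.50 0.75 1.00
```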
verbose
Show informative messages during the execution (TRUE by default).
Value

Object containing gradients in the interpret.gradients slot of the DigitalDLSorterDNN object (trained.model slot).
Details

Gradients of the classes / loss function with respect to the input features are calculated exclusively using pure mixed transcriptional profiles composed of a single cell type. Consequently, these gradients can be interpreted as the extent to which each feature is used to predict each cell type proportion. Gradients are calculated at the sample level for each gene, but only mean gradients by cell type are reported. For additional details, see Mañanes et al., 2024.
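As a conceptual sketch of that reporting step: given a hypothetical matrix of per-sample gradients (samples x genes) and the cell type of each pure mixed profile, mean gradients by cell type could be aggregated as below. This is illustrative base R with made-up data, not the package's internal code:

```r
# Hypothetical per-sample gradients: 6 pure mixed profiles x 3 genes
set.seed(123)
grads <- matrix(
  rnorm(18), nrow = 6,
  dimnames = list(NULL, paste0("Gene", 1:3))
)
cell.type <- rep(c("CellType1", "CellType2"), each = 3)

# aggregate: mean gradient per gene within each cell type
mean.grads <- apply(grads, 2, function(g) tapply(g, cell.type, mean))
dim(mean.grads)
#> [1] 2 3
```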
Examples

# \donttest{
set.seed(123)
sce <- SingleCellExperiment::SingleCellExperiment(
  assays = list(
    counts = matrix(
      rpois(30, lambda = 5), nrow = 15, ncol = 10,
      dimnames = list(paste0("Gene", seq(15)), paste0("RHC", seq(10)))
    )
  ),
  colData = data.frame(
    Cell_ID = paste0("RHC", seq(10)),
    Cell_Type = sample(x = paste0("CellType", seq(2)), size = 10,
                       replace = TRUE)
  ),
  rowData = data.frame(
    Gene_ID = paste0("Gene", seq(15))
  )
)
DDLS <- createDDLSobject(
  sc.data = sce,
  sc.cell.ID.column = "Cell_ID",
  sc.gene.ID.column = "Gene_ID",
  sc.filt.genes.cluster = FALSE
)
#> === Bulk RNA-seq data not provided
#> === Processing single-cell data
#> - Filtering features:
#> - Selected features: 15
#> - Discarded features: 0
#>
#> === No mitochondrial genes were found by using ^mt- as regrex
#>
#> === Final number of dimensions for further analyses: 15
prop.design <- data.frame(
  Cell_Type = paste0("CellType", seq(2)),
  from = c(1, 30),
  to = c(15, 70)
)
DDLS <- generateBulkCellMatrix(
  object = DDLS,
  cell.ID.column = "Cell_ID",
  cell.type.column = "Cell_Type",
  prob.design = prop.design,
  num.bulk.samples = 50,
  verbose = TRUE
)
#>
#> === The number of bulk RNA-Seq samples that will be generated is equal to 50
#>
#> === Training set cells by type:
#> - CellType1: 4
#> - CellType2: 3
#> === Test set cells by type:
#> - CellType1: 2
#> - CellType2: 1
#> === Probability matrix for training data:
#> - Bulk RNA-Seq samples: 38
#> - Cell types: 2
#> === Probability matrix for test data:
#> - Bulk RNA-Seq samples: 12
#> - Cell types: 2
#> DONE
DDLS <- simBulkProfiles(DDLS)
#> === Setting parallel environment to 1 thread(s)
#>
#> === Generating train bulk samples:
#>
#> === Generating test bulk samples:
#>
#> DONE
DDLS <- trainDDLSModel(
  object = DDLS,
  batch.size = 12,
  num.epochs = 5
)
#> === Training and test from stored data
#> Using only simulated bulk samples
#> Using only simulated bulk samples
#> Model: "DigitalDLSorter"
#> _____________________________________________________________________
#> Layer (type) Output Shape Param #
#> =====================================================================
#> Dense1 (Dense) (None, 200) 3200
#> _____________________________________________________________________
#> BatchNormalization1 (BatchNorm (None, 200) 800
#> _____________________________________________________________________
#> Activation1 (Activation) (None, 200) 0
#> _____________________________________________________________________
#> Dropout1 (Dropout) (None, 200) 0
#> _____________________________________________________________________
#> Dense2 (Dense) (None, 200) 40200
#> _____________________________________________________________________
#> BatchNormalization2 (BatchNorm (None, 200) 800
#> _____________________________________________________________________
#> Activation2 (Activation) (None, 200) 0
#> _____________________________________________________________________
#> Dropout2 (Dropout) (None, 200) 0
#> _____________________________________________________________________
#> Dense3 (Dense) (None, 2) 402
#> _____________________________________________________________________
#> BatchNormalization3 (BatchNorm (None, 2) 8
#> _____________________________________________________________________
#> ActivationSoftmax (Activation) (None, 2) 0
#> =====================================================================
#> Total params: 45,410
#> Trainable params: 44,606
#> Non-trainable params: 804
#> _____________________________________________________________________
#>
#> === Training DNN with 38 samples:
#>
#> === Evaluating DNN in test data (12 samples)
#> - loss: NaN
#> - accuracy: 0.1667
#> - mean_absolute_error: NaN
#> - categorical_accuracy: 0.1667
#>
#> === Generating prediction results using test data
#> DONE
## calculate gradients
DDLS <- interGradientsDL(DDLS)
# }