interGradientsDL

Source: R/interGradientsDL.R
This function enables users to gain insights into the interpretability of the deconvolution model. It calculates the gradients of classes/loss function with respect to the input features used in training. These values are calculated per gene and cell type using pure mixed transcriptional profiles, and indicate the extent to which each feature influences the model's predicted cell proportions for each cell type.
Usage

interGradientsDL(
  object,
  method = "class",
  normalize = TRUE,
  scaling = "standardize",
  verbose = TRUE
)
Arguments

object
DigitalDLSorter object containing a trained deconvolution model (trained.model slot) and pure mixed transcriptional profiles (bulk.simul slot).
method
Method used to calculate gradients with respect to inputs. It can be 'class' (gradients of predicted classes w.r.t. inputs), 'loss' (gradients of the loss function w.r.t. inputs), or 'both'.
normalize
Whether to normalize data using logCPM (TRUE by default). This parameter is only considered when the method used to simulate the mixed transcriptional profiles (simMixedProfiles function) was "AddRawCount". Otherwise, data were already normalized. This parameter should be set according to the transformation used to train the model.
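For reference, the logCPM transformation applied when normalize = TRUE can be sketched in base R as below. This is an illustration of the concept only, not the package's internal implementation (which may, for instance, use a different pseudocount or library-size adjustment):

```r
# Minimal base R sketch of a logCPM transformation (illustrative only)
counts <- matrix(
  c(10, 0, 5, 20, 2, 3), nrow = 3,
  dimnames = list(paste0("Gene", 1:3), paste0("Sample", 1:2))
)

# counts per million: scale each column (sample) to a library size of 1e6
cpm <- t(t(counts) / colSums(counts)) * 1e6

# log-transform with a pseudocount so zero counts map to zero
log.cpm <- log2(cpm + 1)
```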
scaling
How to scale data. It can be: "standardize" (values are centered around the mean with unit standard deviation, the default), "rescale" (values are shifted and rescaled so that they range between 0 and 1), or "none" (no scaling is performed). This parameter should be set according to the transformation used to train the model.
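The two scaling options correspond to standard transformations. A base R sketch of both, applied to a single vector for illustration rather than the gene-by-sample training matrix:

```r
# Illustration of the two scaling options in base R
x <- c(2, 4, 6, 8, 10)

# "standardize": center to mean 0 and scale to unit standard deviation
standardized <- (x - mean(x)) / sd(x)

# "rescale": shift and rescale so values range between 0 and 1
rescaled <- (x - min(x)) / (max(x) - min(x))

rescaled
#> [1] 0.00 0.25 0.50 0.75 1.00
```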
verbose
Show informative messages during the execution (TRUE by default).
Value

Object containing gradients in the interpret.gradients slot of the DigitalDLSorterDNN object (trained.model slot).
Details

Gradients of the classes / loss function with respect to the input features are calculated exclusively using pure mixed transcriptional profiles composed of a single cell type. Consequently, these gradients can be interpreted as the extent to which each feature is used to predict each cell type proportion. Gradients are calculated at the sample level for each gene, but only mean gradients by cell type are reported. For additional details, see Mañanes et al., 2024.
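As a conceptual sketch of that reporting step: given a hypothetical matrix of per-sample gradients (samples x genes) and the cell type of each pure mixed profile, mean gradients by cell type could be aggregated as below. This is illustrative base R with made-up data, not the package's internal code:

```r
# Hypothetical per-sample gradients: 6 pure mixed profiles x 3 genes
set.seed(123)
grads <- matrix(
  rnorm(18), nrow = 6,
  dimnames = list(NULL, paste0("Gene", 1:3))
)
cell.type <- rep(c("CellType1", "CellType2"), each = 3)

# aggregate: mean gradient per gene within each cell type
mean.grads <- apply(grads, 2, function(g) tapply(g, cell.type, mean))
dim(mean.grads)
#> [1] 2 3
```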
Examples

# \donttest{
set.seed(123)
sce <- SingleCellExperiment::SingleCellExperiment(
  assays = list(
    counts = matrix(
      rpois(30, lambda = 5), nrow = 15, ncol = 10,
      dimnames = list(paste0("Gene", seq(15)), paste0("RHC", seq(10)))
    )
  ),
  colData = data.frame(
    Cell_ID = paste0("RHC", seq(10)),
    Cell_Type = sample(x = paste0("CellType", seq(2)), size = 10,
                       replace = TRUE)
  ),
  rowData = data.frame(
    Gene_ID = paste0("Gene", seq(15))
  )
)
DDLS <- createDDLSobject(
  sc.data = sce,
  sc.cell.ID.column = "Cell_ID",
  sc.gene.ID.column = "Gene_ID",
  sc.filt.genes.cluster = FALSE
)
#> === Bulk RNA-seq data not provided
#> === Processing single-cell data
#> - Filtering features:
#> - Selected features: 15
#> - Discarded features: 0
#>
#> === No mitochondrial genes were found by using ^mt- as regrex
#>
#> === Final number of dimensions for further analyses: 15
prop.design <- data.frame(
  Cell_Type = paste0("CellType", seq(2)),
  from = c(1, 30),
  to = c(15, 70)
)
DDLS <- generateBulkCellMatrix(
  object = DDLS,
  cell.ID.column = "Cell_ID",
  cell.type.column = "Cell_Type",
  prob.design = prop.design,
  num.bulk.samples = 50,
  verbose = TRUE
)
#>
#> === The number of bulk RNA-Seq samples that will be generated is equal to 50
#>
#> === Training set cells by type:
#> - CellType1: 4
#> - CellType2: 3
#> === Test set cells by type:
#> - CellType1: 2
#> - CellType2: 1
#> === Probability matrix for training data:
#> - Bulk RNA-Seq samples: 38
#> - Cell types: 2
#> === Probability matrix for test data:
#> - Bulk RNA-Seq samples: 12
#> - Cell types: 2
#> DONE
DDLS <- simBulkProfiles(DDLS)
#> === Setting parallel environment to 1 thread(s)
#>
#> === Generating train bulk samples:
#>
#> === Generating test bulk samples:
#>
#> DONE
DDLS <- trainDDLSModel(
  object = DDLS,
  batch.size = 12,
  num.epochs = 5
)
#> === Training and test from stored data
#> Using only simulated bulk samples
#> Using only simulated bulk samples
#> Model: "DigitalDLSorter"
#> _____________________________________________________________________
#> Layer (type) Output Shape Param #
#> =====================================================================
#> Dense1 (Dense) (None, 200) 3200
#> _____________________________________________________________________
#> BatchNormalization1 (BatchNorm (None, 200) 800
#> _____________________________________________________________________
#> Activation1 (Activation) (None, 200) 0
#> _____________________________________________________________________
#> Dropout1 (Dropout) (None, 200) 0
#> _____________________________________________________________________
#> Dense2 (Dense) (None, 200) 40200
#> _____________________________________________________________________
#> BatchNormalization2 (BatchNorm (None, 200) 800
#> _____________________________________________________________________
#> Activation2 (Activation) (None, 200) 0
#> _____________________________________________________________________
#> Dropout2 (Dropout) (None, 200) 0
#> _____________________________________________________________________
#> Dense3 (Dense) (None, 2) 402
#> _____________________________________________________________________
#> BatchNormalization3 (BatchNorm (None, 2) 8
#> _____________________________________________________________________
#> ActivationSoftmax (Activation) (None, 2) 0
#> =====================================================================
#> Total params: 45,410
#> Trainable params: 44,606
#> Non-trainable params: 804
#> _____________________________________________________________________
#>
#> === Training DNN with 38 samples:
#>
#> === Evaluating DNN in test data (12 samples)
#> - loss: NaN
#> - accuracy: 0.1667
#> - mean_absolute_error: NaN
#> - categorical_accuracy: 0.1667
#>
#> === Generating prediction results using test data
#> DONE
## calculate gradients
DDLS <- interGradientsDL(DDLS)
# }