Train a deep neural network model using training data from the SpatialDDLS
object. This model will be used to deconvolute spatial transcriptomics data
from the same biological context as the single-cell RNA-seq data used to
train it. In addition, the trained model is evaluated using test data, and
prediction results are obtained to determine its performance (see
?calculateEvalMetrics).
Usage
trainDeconvModel(
  object,
  type.data.train = "mixed",
  type.data.test = "mixed",
  batch.size = 64,
  num.epochs = 60,
  num.hidden.layers = 2,
  num.units = c(200, 200),
  activation.fun = "relu",
  dropout.rate = 0.25,
  loss = "kullback_leibler_divergence",
  metrics = c("accuracy", "mean_absolute_error", "categorical_accuracy"),
  normalize = TRUE,
  scaling = "standardize",
  norm.batch.layers = TRUE,
  custom.model = NULL,
  shuffle = TRUE,
  sc.downsampling = NULL,
  use.generator = FALSE,
  on.the.fly = FALSE,
  agg.function = "AddRawCount",
  threads = 1,
  view.metrics.plot = TRUE,
  verbose = TRUE
)
Arguments
- object
SpatialDDLS object with single.cell.real/single.cell.simul, prob.cell.types, and mixed.profiles slots (the last only if on.the.fly = FALSE).
- type.data.train
Type of profiles to be used for training. It can be 'both', 'single-cell' or 'mixed' ('mixed' by default).
- type.data.test
Type of profiles to be used for evaluation. It can be 'both', 'single-cell' or 'mixed' ('mixed' by default).
- batch.size
Number of samples per gradient update (64 by default).
- num.epochs
Number of epochs to train the model (60 by default).
- num.hidden.layers
Number of hidden layers of the neural network (2 by default). This number must be equal to the length of the num.units argument.
- num.units
Vector indicating the number of neurons per hidden layer (c(200, 200) by default). The length of this vector must be equal to the num.hidden.layers argument.
- activation.fun
Activation function ('relu' by default). See the keras documentation for the available activation functions.
- dropout.rate
Float between 0 and 1 indicating the fraction of input neurons to be dropped in dropout layers (0.25 by default). By default, SpatialDDLS implements 1 dropout layer per hidden layer.
- loss
Character indicating the loss function selected for model training ('kullback_leibler_divergence' by default). See the keras documentation for the available loss functions.
- metrics
Vector of metrics used to assess model performance during training and evaluation (c("accuracy", "mean_absolute_error", "categorical_accuracy") by default). See the keras documentation for the available performance metrics.
- normalize
Whether to normalize data using logCPM (TRUE by default). This parameter is only considered when the method used to simulate mixed transcriptional profiles (simMixedProfiles function) was "AddRawCount". Otherwise, data were already normalized.
- scaling
How to scale data before training. It can be: "standardize" (values are centered around the mean with a unit standard deviation), "rescale" (values are shifted and rescaled so that they end up ranging between 0 and 1) or "none" (no scaling is performed). "standardize" by default.
- norm.batch.layers
Whether to include batch normalization layers between hidden dense layers (TRUE by default).
- custom.model
Allows the use of a custom neural network architecture. It must be a keras.engine.sequential.Sequential object in which the number of input neurons is equal to the number of considered features/genes, and the number of output neurons is equal to the number of cell types considered (NULL by default). If provided, the arguments related to the neural network architecture will be ignored.
- shuffle
Boolean indicating whether data will be shuffled (TRUE by default).
- sc.downsampling
It is only used if type.data.train is equal to 'both' or 'single-cell'. It sets a maximum number of single-cell profiles of a specific cell type for training to avoid an unbalanced representation of classes (NULL by default).
- use.generator
Boolean indicating whether to use generators during training and test. Generators are automatically used when on.the.fly = TRUE or HDF5 files are used, but they can be activated by the user on demand (FALSE by default).
- on.the.fly
Boolean indicating whether simulated data will be generated 'on the fly' during training (FALSE by default).
- agg.function
If on.the.fly == TRUE, function used to build mixed transcriptional profiles. It may be:
"AddRawCount" (by default): single-cell profiles (raw counts) are added up across cells. Then, log-CPMs are calculated.
"MeanCPM": single-cell profiles (raw counts) are transformed into logCPM and cross-cell averages are calculated.
"AddCPM": single-cell profiles (raw counts) are transformed into CPMs and are added up across cells. Then, log-CPMs are calculated.
- threads
Number of threads used during simulation of mixed transcriptional profiles if on.the.fly = TRUE (1 by default).
- view.metrics.plot
Boolean indicating whether to show plots of loss and evaluation metrics during training (TRUE by default). keras for R shows model progression during training if you are working in RStudio.
- verbose
Boolean indicating whether to display model progression during training and model architecture information (TRUE by default).
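The two non-trivial scaling options can be illustrated with base R. This is a minimal sketch on made-up values, not the package's internal code: it assumes statistics are computed per gene (per column), which may differ from what SpatialDDLS actually does, and the helper rescale01 is hypothetical.

```r
# Toy matrix: rows are samples (mixed profiles), columns are genes.
# Values are invented for illustration only.
x <- matrix(
  c(1, 2, 3,
    4, 6, 8),
  nrow = 3, ncol = 2,
  dimnames = list(paste0("Spot", 1:3), c("GeneA", "GeneB"))
)

# "standardize": center each gene on its mean and divide by its
# standard deviation (assumption: statistics are taken per gene).
standardized <- scale(x, center = TRUE, scale = TRUE)

# "rescale": shift and rescale each gene so values range from 0 to 1.
# rescale01 is a hypothetical helper, not part of SpatialDDLS.
rescale01 <- function(v) (v - min(v)) / (max(v) - min(v))
rescaled <- apply(x, 2, rescale01)

# "none": the matrix is passed to the network unchanged.
unscaled <- x
```

After standardization each column has mean 0 and standard deviation 1, whereas rescaling bounds each column between 0 and 1.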
Value
A SpatialDDLS object with trained.model slot containing a DeconvDLModel
object. For more information about the structure of this class, see
?DeconvDLModel.
Details
Simulation of mixed transcriptional profiles 'on the fly'
trainDeconvModel can avoid storing simulated mixed spot profiles by using
the on.the.fly argument. This functionality aims at reducing the memory
usage of the simMixedProfiles function: simulated profiles are built in
each batch during training/evaluation.
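The idea of building profiles per batch, together with the three agg.function options, can be sketched in base R. Everything below is illustrative: the helper names are hypothetical, log-CPM is assumed to be log2(CPM + 1), and cells are sampled uniformly instead of following the cell type proportions stored in prob.cell.types, so this is not the package's implementation.

```r
set.seed(1)
# Hypothetical raw counts: 5 genes x 4 cells.
counts <- matrix(
  rpois(20, lambda = 5), nrow = 5,
  dimnames = list(paste0("Gene", 1:5), paste0("Cell", 1:4))
)

# Counts per million for each column (a cell or a pooled spot).
cpm <- function(m) {
  m <- as.matrix(m)
  sweep(m, 2, colSums(m), "/") * 1e6
}
# Assumption: log-CPM is computed as log2(CPM + 1).
log_cpm <- function(m) log2(cpm(m) + 1)

# "AddRawCount": add raw counts across cells, then compute log-CPM.
agg_add_raw_count <- function(m) log_cpm(rowSums(m))[, 1]
# "MeanCPM": transform each cell into log-CPM, then average across cells.
agg_mean_cpm <- function(m) rowMeans(log_cpm(m))
# "AddCPM": transform each cell into CPM, add across cells, then log-CPM.
agg_add_cpm <- function(m) log_cpm(rowSums(cpm(m)))[, 1]

# Generator-like closure: each call builds a new batch of mixed profiles,
# so no simulated profile has to be stored beforehand.
make_batch_generator <- function(counts, cells.per.spot, batch.size, agg) {
  function() {
    t(replicate(batch.size, {
      picked <- sample(ncol(counts), cells.per.spot, replace = TRUE)
      agg(counts[, picked, drop = FALSE])
    }))
  }
}

next_batch <- make_batch_generator(
  counts, cells.per.spot = 3, batch.size = 2, agg = agg_add_raw_count
)
batch <- next_batch() # matrix with 2 mixed profiles (rows) x 5 genes (columns)
```

The trade-off is extra computation at every batch in exchange for not keeping the whole set of simulated profiles in memory.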
Neural network architecture
It is possible to change the model's architecture: number of hidden layers,
number of neurons for each hidden layer, dropout rate, activation function,
and loss function. For more customized models, it is possible to provide a
pre-built model through the custom.model argument (a
keras.engine.sequential.Sequential object) where it is necessary that the
number of input neurons is equal to the number of considered features/genes,
and the number of output neurons is equal to the number of considered cell
types.
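As a worked check of how these arguments determine the size of the network, the following base R sketch counts parameters for a fully connected architecture. The function count_params is hypothetical and not part of SpatialDDLS. It assumes that every dense layer, including the output layer, is followed by a batch normalization layer when norm.batch.layers = TRUE, as in the model summary printed in the Examples section, and that each batch normalization unit has two trainable and two non-trainable parameters.

```r
# Hypothetical helper: parameter counts for a stack of dense layers.
count_params <- function(n.features, n.cell.types,
                         num.hidden.layers = 2,
                         num.units = c(200, 200),
                         norm.batch.layers = TRUE) {
  # Same constraint as in trainDeconvModel: one entry per hidden layer.
  stopifnot(num.hidden.layers == length(num.units))
  sizes <- c(n.features, num.units, n.cell.types)
  n.in <- sizes[-length(sizes)]
  n.out <- sizes[-1]
  # Dense layer: weights plus biases.
  dense <- n.in * n.out + n.out
  # Batch normalization: scale and shift are trainable,
  # moving mean and variance are not.
  bn.trainable <- if (norm.batch.layers) 2 * n.out else 0 * n.out
  bn.fixed <- bn.trainable
  c(
    total = sum(dense) + sum(bn.trainable) + sum(bn.fixed),
    trainable = sum(dense) + sum(bn.trainable),
    non.trainable = sum(bn.fixed)
  )
}

# 15 genes and 2 cell types, as in the Examples section.
params <- count_params(n.features = 15, n.cell.types = 2)
print(params)
# total 45410, trainable 44606, non.trainable 804
```

These figures agree with the totals reported in the model summary of the Examples section for the default architecture.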
Examples
# \donttest{
set.seed(123)
sce <- SingleCellExperiment::SingleCellExperiment(
  assays = list(
    counts = matrix(
      rpois(30, lambda = 5), nrow = 15, ncol = 10,
      dimnames = list(paste0("Gene", seq(15)), paste0("RHC", seq(10)))
    )
  ),
  colData = data.frame(
    Cell_ID = paste0("RHC", seq(10)),
    Cell_Type = sample(x = paste0("CellType", seq(2)), size = 10,
                       replace = TRUE)
  ),
  rowData = data.frame(
    Gene_ID = paste0("Gene", seq(15))
  )
)
SDDLS <- createSpatialDDLSobject(
  sc.data = sce,
  sc.cell.ID.column = "Cell_ID",
  sc.gene.ID.column = "Gene_ID",
  sc.filt.genes.cluster = FALSE
)
#> === Spatial transcriptomics data not provided
#> === Processing single-cell data
#> - Filtering features:
#> - Selected features: 15
#> - Discarded features: 0
#>
#> === No mitochondrial genes were found by using ^mt- as regrex
#>
#> === Final number of dimensions for further analyses: 15
SDDLS <- genMixedCellProp(
  object = SDDLS,
  cell.ID.column = "Cell_ID",
  cell.type.column = "Cell_Type",
  num.sim.spots = 50,
  train.freq.cells = 2/3,
  train.freq.spots = 2/3,
  verbose = TRUE
)
#>
#> === The number of mixed profiles that will be generated is equal to 50
#>
#> === Training set cells by type:
#> - CellType1: 4
#> - CellType2: 3
#> === Test set cells by type:
#> - CellType1: 2
#> - CellType2: 1
#> === Probability matrix for training data:
#> - Mixed spots: 34
#> - Cell types: 2
#> === Probability matrix for test data:
#> - Mixed spots: 16
#> - Cell types: 2
#> DONE
SDDLS <- simMixedProfiles(SDDLS)
#> === Setting parallel environment to 1 thread(s)
#>
#> === Generating train mixed profiles:
#>
#> === Generating test mixed profiles:
#>
#> DONE
SDDLS <- trainDeconvModel(
  object = SDDLS,
  batch.size = 12,
  num.epochs = 5
)
#> === Training and test from stored data
#> Using only simulated mixed samples
#> Using only simulated mixed samples
#> Model: "SpatialDDLS"
#> _____________________________________________________________________
#> Layer (type)                   Output Shape               Param #
#> =====================================================================
#> Dense1 (Dense)                 (None, 200)                3200
#> _____________________________________________________________________
#> BatchNormalization1 (BatchNorm (None, 200)                800
#> _____________________________________________________________________
#> Activation1 (Activation)       (None, 200)                0
#> _____________________________________________________________________
#> Dropout1 (Dropout)             (None, 200)                0
#> _____________________________________________________________________
#> Dense2 (Dense)                 (None, 200)                40200
#> _____________________________________________________________________
#> BatchNormalization2 (BatchNorm (None, 200)                800
#> _____________________________________________________________________
#> Activation2 (Activation)       (None, 200)                0
#> _____________________________________________________________________
#> Dropout2 (Dropout)             (None, 200)                0
#> _____________________________________________________________________
#> Dense3 (Dense)                 (None, 2)                  402
#> _____________________________________________________________________
#> BatchNormalization3 (BatchNorm (None, 2)                  8
#> _____________________________________________________________________
#> ActivationSoftmax (Activation) (None, 2)                  0
#> =====================================================================
#> Total params: 45,410
#> Trainable params: 44,606
#> Non-trainable params: 804
#> _____________________________________________________________________
#>
#> === Training DNN with 34 samples:
#>
#> === Evaluating DNN in test data (16 samples)
#> - loss: NaN
#> - accuracy: 0.5
#> - mean_absolute_error: NaN
#> - categorical_accuracy: 0.5
#>
#> === Generating prediction results using test data
#> DONE
# }