R/dnnModel.R
deconvDDLSPretrained.Rd
Deconvolute bulk gene expression samples (bulk RNA-Seq) to enumerate and
quantify the proportion of cell types present in each sample using Deep
Neural Network models. This function is intended for users who want to use
the pre-trained models integrated in the package. So far, the available models
deconvolute the immune infiltration of breast cancer samples (using data from
Chung et al., 2017) and of colorectal cancer samples (using data from
Li et al., 2017). For breast cancer, two models are available at different
levels of specificity: specific cell types (breast.chung.specific) and generic
cell types (breast.chung.generic). See the breast.chung.generic,
breast.chung.specific, and colorectal.li documentation in the
digitalDLSorteRdata package for more details.
deconvDDLSPretrained(
  data,
  model = NULL,
  normalize = TRUE,
  scaling = "standardize",
  simplify.set = NULL,
  simplify.majority = NULL,
  use.generator = FALSE,
  batch.size = 64,
  verbose = TRUE
)
data
Matrix or data frame with bulk RNA-Seq samples, with genes as rows in SYMBOL notation and samples as columns.
model
Pre-trained DNN model used to deconvolute data. The models currently
available are intended for samples from breast cancer (breast.chung.generic
and breast.chung.specific) and colorectal cancer (colorectal.li). These
pre-trained models are stored in the digitalDLSorteRdata package, so it must
be installed together with digitalDLSorteR to use this function.
normalize
Normalize data before deconvolution (TRUE by default).
scaling
How to scale data before deconvolution. It may be "standardize" (values are
centered around the mean with a unit standard deviation) or "rescale" (values
are shifted and rescaled so that they range between 0 and 1). If
normalize = FALSE, data is not scaled.
simplify.set
List specifying which cell types should be merged into a new label, whose name is the name of the corresponding list item. See examples and vignettes for details.
simplify.majority
List specifying which cell types should be merged into the cell type with the
highest proportion in each sample. Unlike simplify.set, this argument keeps
the original labels in the results while compressing the information, as no
new labels are created.
use.generator
Boolean indicating whether to use generators for prediction (FALSE by
default).
batch.size
Number of samples per batch. Only used when use.generator = TRUE.
verbose
Show informative messages during execution.
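The two scaling options described above can be illustrated with base R. This is a minimal sketch, not the package's implementation: the helper names standardize_cols and rescale_cols are hypothetical, and scaling each column independently is an assumption, since this page does not state which axis the package scales along.

```r
# Hypothetical helpers (not part of digitalDLSorteR). Each column of a numeric
# matrix is scaled independently; the axis choice is an assumption.
standardize_cols <- function(x) {
  # center each column on its mean and divide by its standard deviation
  scale(x, center = TRUE, scale = TRUE)
}

rescale_cols <- function(x) {
  # shift and rescale each column so that its values span [0, 1]
  apply(x, 2, function(v) (v - min(v)) / (max(v) - min(v)))
}

# toy matrix with 5 rows and 3 columns (values invented for illustration)
m <- matrix(
  c(1, 2, 3, 4, 5,
    10, 20, 30, 40, 55,
    0, 3, 3, 8, 9),
  nrow = 5
)

z <- standardize_cols(m)
r <- rescale_cols(m)

colMeans(z)         # close to 0 for every column
apply(z, 2, sd)     # close to 1 for every column
apply(r, 2, range)  # minimum and maximum of every column
```

Note that rescale_cols divides by zero when a column is constant, so a real implementation would need to handle that case.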
A data frame with samples (\(i\)) as rows and cell types (\(j\)) as columns. Each entry represents the predicted proportion of cell type \(j\) in sample \(i\).
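A quick way to inspect a result with this layout is sketched below. The data frame is a mock with invented values, not real output from the function. The expectation that each row sums to about 1 holds only if the model's output layer is a softmax, which is an assumption here.

```r
# Mock output with the documented layout: samples as rows, cell types as
# columns. The values are invented for illustration only.
results <- data.frame(
  CellType1 = c(0.60, 0.10, 0.25),
  CellType2 = c(0.30, 0.70, 0.25),
  CellType3 = c(0.10, 0.20, 0.50),
  row.names = c("Bulk1", "Bulk2", "Bulk3")
)

# total predicted proportion per sample
row_totals <- rowSums(results)

# most abundant cell type in each sample
dominant <- colnames(results)[max.col(as.matrix(results), ties.method = "first")]
names(dominant) <- rownames(results)

row_totals
dominant
```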
This function is intended for users who want to use digitalDLSorteR to
deconvolute their bulk RNA-Seq samples using pre-trained models. For users who
want to build their own models from other scRNA-Seq datasets, see the
createDDLSobject and deconvDDLSObj functions.
Chung, W., Eum, H. H., Lee, H. O., Lee, K. M., Lee, H. B., Kim, K. T., et al. (2017). Single-cell RNA-seq enables comprehensive tumour and immune cell profiling in primary breast cancer. Nat. Commun. 8 (1), 15081. doi:10.1038/ncomms15081.
if (FALSE) { # \dontrun{
  set.seed(123)
  # toy single-cell data: 15 genes x 20 cells (300 counts, so no recycling)
  sce <- SingleCellExperiment::SingleCellExperiment(
    assays = list(
      counts = matrix(
        rpois(300, lambda = 5), nrow = 15, ncol = 20,
        dimnames = list(paste0("Gene", seq(15)), paste0("RHC", seq(20)))
      )
    ),
    colData = data.frame(
      Cell_ID = paste0("RHC", seq(20)),
      Cell_Type = sample(x = paste0("CellType", seq(6)), size = 20,
                         replace = TRUE)
    ),
    rowData = data.frame(
      Gene_ID = paste0("Gene", seq(15))
    )
  )
  DDLS <- createDDLSobject(
    sc.data = sce,
    sc.cell.ID.column = "Cell_ID",
    sc.gene.ID.column = "Gene_ID",
    sc.filt.genes.cluster = FALSE,
    sc.log.FC = FALSE
  )
  probMatrixValid <- data.frame(
    Cell_Type = paste0("CellType", seq(6)),
    from = c(1, 1, 1, 15, 15, 30),
    to = c(15, 15, 30, 50, 50, 70)
  )
  DDLS <- generateBulkCellMatrix(
    object = DDLS,
    cell.ID.column = "Cell_ID",
    cell.type.column = "Cell_Type",
    prob.design = probMatrixValid,
    num.bulk.samples = 50,
    verbose = TRUE
  )
  # training of DDLS model
  tensorflow::tf$compat$v1$disable_eager_execution()
  DDLS <- trainDDLSModel(
    object = DDLS,
    on.the.fly = TRUE,
    batch.size = 15,
    num.epochs = 5
  )
  # simulating bulk RNA-Seq data: 40 genes x 15 samples (600 counts)
  countsBulk <- matrix(
    stats::rpois(600, lambda = sample(seq(4, 10), size = 600, replace = TRUE)),
    nrow = 40, ncol = 15,
    dimnames = list(paste0("Gene", seq(40)), paste0("Bulk", seq(15)))
  )
  # this is only an example. See vignettes to see how to use pre-trained models
  # from the digitalDLSorteRmodels data package
  results1 <- deconvDDLSPretrained(
    data = countsBulk,
    model = trained.model(DDLS),
    normalize = TRUE
  )
  # simplify arguments
  simplify <- list(CellGroup1 = c("CellType1", "CellType2", "CellType4"),
                   CellGroup2 = c("CellType3", "CellType5"))
  # in this case the names of the list will be the new labels
  results2 <- deconvDDLSPretrained(
    countsBulk,
    model = trained.model(DDLS),
    normalize = TRUE,
    simplify.set = simplify
  )
  # in this case the cell type with the highest proportion will be the new label
  results3 <- deconvDDLSPretrained(
    countsBulk,
    model = trained.model(DDLS),
    normalize = TRUE,
    simplify.majority = simplify
  )
} # }
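To make the difference between simplify.set and simplify.majority concrete, the following base R sketch applies both ideas to a small, invented proportions matrix. The functions collapse_set and collapse_majority are hypothetical re-implementations written for illustration. They assume that merged proportions are summed and, for the majority case, that the sum is assigned to the most abundant cell type of the group in each sample. The package's internals may differ.

```r
# Hypothetical helpers, not the package's code.

# Merge each group of cell types into one new column named after the list item.
collapse_set <- function(props, groups) {
  grouped <- unlist(groups, use.names = FALSE)
  kept <- props[, setdiff(colnames(props), grouped), drop = FALSE]
  merged <- do.call(
    cbind,
    lapply(groups, function(types) rowSums(props[, types, drop = FALSE]))
  )
  cbind(kept, merged)
}

# Within each group, move the summed proportion onto the cell type that is
# most abundant in that sample; the other members of the group are set to 0.
collapse_majority <- function(props, groups) {
  out <- props
  for (types in groups) {
    sub <- props[, types, drop = FALSE]
    winner <- types[max.col(sub, ties.method = "first")]
    total <- rowSums(sub)
    out[, types] <- 0
    out[cbind(seq_len(nrow(out)), match(winner, colnames(out)))] <- total
  }
  out
}

# invented proportions: 2 samples x 4 cell types
props <- matrix(
  c(0.5, 0.2, 0.2, 0.1,
    0.1, 0.4, 0.3, 0.2),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("Bulk1", "Bulk2"), paste0("CellType", 1:4))
)
groups <- list(CellGroup1 = c("CellType1", "CellType2"))

by_set <- collapse_set(props, groups)            # new column CellGroup1
by_majority <- collapse_majority(props, groups)  # original labels kept
```

In this sketch by_set has fewer columns than props, while by_majority keeps the original columns, which mirrors the distinction drawn in the argument descriptions.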