Deconvolute bulk gene expression samples (bulk RNA-Seq) to enumerate and quantify the proportion of cell types present in each sample using Deep Neural Network models. This function is intended for users who want to use the pre-trained models integrated in the package. So far, the available models deconvolute the immune infiltration of breast cancer samples (using data from Chung et al., 2017) and of colorectal cancer samples (using data from Li et al., 2017). For breast cancer, two models are available at two levels of specificity: specific cell types (breast.chung.specific) and generic cell types (breast.chung.generic). See breast.chung.generic, breast.chung.specific, and the documentation of the digitalDLSorteRdata package for more details.

deconvDDLSPretrained(
  data,
  model = NULL,
  normalize = TRUE,
  scaling = "standardize",
  simplify.set = NULL,
  simplify.majority = NULL,
  use.generator = FALSE,
  batch.size = 64,
  verbose = TRUE
)



data

Matrix or data frame with bulk RNA-Seq samples, with genes as rows in SYMBOL notation and samples as columns.
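A minimal sketch of the expected input layout, using made-up counts. The gene symbols and sample names are placeholders chosen for illustration; they are not values required by the package.

```r
# Toy bulk RNA-Seq count matrix: genes (SYMBOL notation) as rows, samples as columns.
# Gene symbols and sample names are illustrative placeholders.
set.seed(1)
genes <- c("CD3E", "CD19", "CD14", "EPCAM", "PTPRC")
samples <- paste0("Bulk", seq_len(3))
bulk <- matrix(
  rpois(length(genes) * length(samples), lambda = 20),
  nrow = length(genes), ncol = length(samples),
  dimnames = list(genes, samples)
)
# The same layout stored as a data frame
bulk.df <- as.data.frame(bulk)
```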


model

Pre-trained DNN model to use to deconvolute data. So far, the available models are intended to deconvolute samples from breast cancer (breast.chung.generic and breast.chung.specific) and colorectal cancer. These pre-trained models are stored in the digitalDLSorteRdata package, so it must be installed together with digitalDLSorteR to use this function.


normalize

Normalize data before deconvolution (TRUE by default).


scaling

How to scale data before deconvolution. It may be "standardize" (values are centered on the mean and scaled to unit standard deviation) or "rescale" (values are shifted and rescaled to range between 0 and 1). If normalize = FALSE, data are not scaled.
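The two scaling options can be illustrated on a plain numeric vector. This is a sketch in base R only: the helper names standardize and rescale are hypothetical, and the axis along which the package applies the scaling is not stated here.

```r
# Hypothetical helpers illustrating the two scaling options on a numeric vector.
# Assumption: the axis (per gene or per sample) used by the package is not covered here.
standardize <- function(x) (x - mean(x)) / sd(x)
rescale <- function(x) (x - min(x)) / (max(x) - min(x))

x <- c(2, 4, 6, 8, 10)
z <- standardize(x)  # centered on 0 with unit standard deviation
r <- rescale(x)      # ranges from 0 to 1
```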


simplify.set

List specifying which cell types should be compressed into a new label, whose name will be the name of the corresponding list item. See examples and vignettes for details.
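A rough sketch of what this kind of compression amounts to, using a hypothetical table of proportions. It assumes that grouped cell types are merged by summing their proportions; it does not reproduce the package's internal code.

```r
# Hypothetical predicted proportions: samples as rows, cell types as columns
props <- data.frame(
  CellType1 = c(0.10, 0.30),
  CellType2 = c(0.20, 0.10),
  CellType3 = c(0.30, 0.40),
  CellType4 = c(0.40, 0.20),
  row.names = c("Bulk1", "Bulk2")
)
simplify <- list(CellGroup1 = c("CellType1", "CellType2"))

# Assumption: grouped cell types are merged by summing their proportions
merged <- props[, !colnames(props) %in% unlist(simplify), drop = FALSE]
for (label in names(simplify)) {
  merged[[label]] <- rowSums(props[, simplify[[label]], drop = FALSE])
}
```

Here the list name CellGroup1 becomes a new column that replaces CellType1 and CellType2.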


simplify.majority

List specifying which cell types should be compressed into the cell type with the highest proportion in each sample. Unlike simplify.set, this argument maintains the complexity of the results while compressing the information, as no new labels are created.
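The idea can be sketched on a hypothetical table of proportions. This sketch assumes that, within each sample, the total proportion of the group is assigned to its most abundant member and the remaining members are set to zero; the package's internal code may differ.

```r
# Hypothetical predicted proportions: samples as rows, cell types as columns
props <- data.frame(
  CellType1 = c(0.10, 0.30),
  CellType2 = c(0.20, 0.10),
  CellType3 = c(0.70, 0.60),
  row.names = c("Bulk1", "Bulk2")
)
group <- c("CellType1", "CellType2")

# Assumption: within each sample, the group's total goes to the member with the
# highest proportion and the other members are set to zero
majority <- props
for (i in seq_len(nrow(props))) {
  vals <- unlist(props[i, group])
  winner <- group[which.max(vals)]
  majority[i, group] <- 0
  majority[i, winner] <- sum(vals)
}
```

The column names are unchanged, so no new labels appear in the output.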


use.generator

Boolean indicating whether to use generators for prediction (FALSE by default).


batch.size

Number of samples per batch. Only used when use.generator = TRUE.


verbose

Show informative messages during execution (TRUE by default).


A data frame with samples (\(i\)) as rows and cell types (\(j\)) as columns. Each entry represents the predicted proportion of cell type \(j\) in sample \(i\).
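A short sketch of how a result with this layout can be read in base R. The data frame below is hypothetical (invented cell type names and values), not output produced by the function.

```r
# Hypothetical result: samples as rows, cell types as columns
results <- data.frame(
  Bcell = c(0.6, 0.1),
  Tcell = c(0.3, 0.2),
  Tumour = c(0.1, 0.7),
  row.names = c("Bulk1", "Bulk2")
)

# Predicted proportion of one cell type in one sample
tumour.bulk2 <- results["Bulk2", "Tumour"]

# Most abundant cell type in each sample
dominant <- colnames(results)[max.col(as.matrix(results), ties.method = "first")]
names(dominant) <- rownames(results)
```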


This function is intended for users who want to use digitalDLSorteR to deconvolute their bulk RNA-Seq samples using pre-trained models. For users who want to build their own models from other scRNA-Seq datasets, see the createDDLSobject and deconvDDLSObj functions.


Chung, W., Eum, H. H., Lee, H. O., Lee, K. M., Lee, H. B., Kim, K. T., et al. (2017). Single-cell RNA-seq enables comprehensive tumour and immune cell profiling in primary breast cancer. Nat. Commun. 8 (1), 15081. doi:10.1038/ncomms15081.



if (FALSE) {
# simulated single-cell data
sce <- SingleCellExperiment::SingleCellExperiment(
  assays = list(
    counts = matrix(
      rpois(30, lambda = 5), nrow = 15, ncol = 20,
      dimnames = list(paste0("Gene", seq(15)), paste0("RHC", seq(20)))
    )
  ),
  colData = data.frame(
    Cell_ID = paste0("RHC", seq(20)),
    Cell_Type = sample(x = paste0("CellType", seq(6)), size = 20,
                       replace = TRUE)
  ),
  rowData = data.frame(
    Gene_ID = paste0("Gene", seq(15))
  )
)
DDLS <- createDDLSobject(
  sc.data = sce,
  sc.cell.ID.column = "Cell_ID",
  sc.gene.ID.column = "Gene_ID",
  sc.filt.genes.cluster = FALSE,
  sc.log.FC = FALSE
)
probMatrixValid <- data.frame(
  Cell_Type = paste0("CellType", seq(6)),
  from = c(1, 1, 1, 15, 15, 30),
  to = c(15, 15, 30, 50, 50, 70)
)
DDLS <- generateBulkCellMatrix(
  object = DDLS,
  cell.ID.column = "Cell_ID",
  cell.type.column = "Cell_Type",
  prob.design = probMatrixValid,
  num.bulk.samples = 50,
  verbose = TRUE
)
# training of DDLS model
DDLS <- trainDDLSModel(
  object = DDLS,
  on.the.fly = TRUE,
  batch.size = 15,
  num.epochs = 5
)
# simulating bulk RNA-Seq data
countsBulk <- matrix(
  stats::rpois(100, lambda = sample(seq(4, 10), size = 100, replace = TRUE)),
  nrow = 40, ncol = 15,
  dimnames = list(paste0("Gene", seq(40)), paste0("Bulk", seq(15)))
)
# this is only an example. See vignettes to see how to use pre-trained models
# from the digitalDLSorteRmodels data package
results1 <- deconvDDLSPretrained(
  data = countsBulk,
  model = trained.model(DDLS),
  normalize = TRUE
)
# simplify arguments
simplify <- list(CellGroup1 = c("CellType1", "CellType2", "CellType4"),
                 CellGroup2 = c("CellType3", "CellType5"))
# in this case the names of the list will be the new labels
results2 <- deconvDDLSPretrained(
  data = countsBulk,
  model = trained.model(DDLS),
  normalize = TRUE,
  simplify.set = simplify
)
# in this case the cell type with the highest proportion will be the new label
results3 <- deconvDDLSPretrained(
  data = countsBulk,
  model = trained.model(DDLS),
  normalize = TRUE,
  simplify.majority = simplify
)
}