Deconvolute bulk gene expression samples (bulk RNA-Seq) to enumerate and quantify the proportion of cell types present in a bulk sample using Deep Neural Network models. This function is intended for users who want to use pre-trained models integrated in the package. So far, the available models allow to deconvolute the immune infiltration of breast cancer (using data from Chung et al., 2017) and the immune infiltration of colorectal cancer (using data from Li et al., 2017) samples. For the former, two models are available at two different levels of specificity: specific cell types (breast.chung.specific) and generic cell types (breast.chung.generic). See breast.chung.generic, breast.chung.specific, and colorectal.li documentation from the digitalDLSorteRdata package for more details.

deconvDDLSPretrained(
  data,
  model = NULL,
  normalize = TRUE,
  scaling = "standardize",
  simplify.set = NULL,
  simplify.majority = NULL,
  use.generator = FALSE,
  batch.size = 64,
  verbose = TRUE
)

Arguments

data

Matrix or data frame with bulk RNA-Seq samples with genes as rows in SYMBOL notation and samples as columns.

model

Pre-trained DNN model to use to deconvolute data. Up to now, the available models are intended to deconvolute samples from breast cancer (breast.chung.generic and breast.chung.specific) and colorectal cancer (colorectal.li). These pre-trained models are stored in the digitalDLSorteRdata package, so it must be installed together with digitalDLSorteR to use this function.

normalize

Normalize data before deconvolution (TRUE by default).

scaling

How to scale data before training. It may be: "standardize" (values are centered around the mean with a unit standard deviation) or "rescale" (values are shifted and rescaled so that they end up ranging between 0 and 1). If normalize = FALSE, data is not scaled.

simplify.set

List specifying which cell types should be compressed into a new label whose name will be the list name item. See examples and vignettes for details.

simplify.majority

List specifying which cell types should be compressed into the cell type with the highest proportion in each sample. Unlike simplify.set, this argument allows to maintain the complexity of the results while compressing the information, as no new labels are created.

use.generator

Boolean indicating whether to use generators for prediction (FALSE by default).

batch.size

Number of samples per batch. Only when use.generator = TRUE.

verbose

Show informative messages during execution.

Value

A data frame with samples (\(i\)) as rows and cell types (\(j\)) as columns. Each entry represents the predicted proportion of cell type \(j\) in sample \(i\).

Details

This function is intended for users who want to use digitalDLSorteR to deconvolute their bulk RNA-Seq samples using pre-trained models. For users who want to build their own models from other scRNA-Seq datasets, see the createDDLSobject and deconvDDLSObj functions.

References

Chung, W., Eum, H. H., Lee, H. O., Lee, K. M., Lee, H. B., Kim, K. T., et al. (2017). Single-cell RNA-seq enables comprehensive tumour and immune cell profiling in primary breast cancer. Nat. Commun. 8 (1), 15081. doi: doi:10.1038/ncomms15081 .

See also

Examples

if (FALSE) { # \dontrun{
set.seed(123)
sce <- SingleCellExperiment::SingleCellExperiment(
  assays = list(
    counts = matrix(
      rpois(30, lambda = 5), nrow = 15, ncol = 20,
      dimnames = list(paste0("Gene", seq(15)), paste0("RHC", seq(20)))
    )
  ),
  colData = data.frame(
    Cell_ID = paste0("RHC", seq(20)),
    Cell_Type = sample(x = paste0("CellType", seq(6)), size = 20,
                       replace = TRUE)
  ),
  rowData = data.frame(
    Gene_ID = paste0("Gene", seq(15))
  )
)
DDLS <- createDDLSobject(
  sc.data = sce,
  sc.cell.ID.column = "Cell_ID",
  sc.gene.ID.column = "Gene_ID",
  sc.filt.genes.cluster = FALSE, 
  sc.log.FC = FALSE
)
probMatrixValid <- data.frame(
  Cell_Type = paste0("CellType", seq(6)),
  from = c(1, 1, 1, 15, 15, 30),
  to = c(15, 15, 30, 50, 50, 70)
)
DDLS <- generateBulkCellMatrix(
  object = DDLS,
  cell.ID.column = "Cell_ID",
  cell.type.column = "Cell_Type",
  prob.design = probMatrixValid,
  num.bulk.samples = 50,
  verbose = TRUE
)
# training of DDLS model
tensorflow::tf$compat$v1$disable_eager_execution()
DDLS <- trainDDLSModel(
  object = DDLS,
  on.the.fly = TRUE,
  batch.size = 15,
  num.epochs = 5
)
# simulating bulk RNA-Seq data
countsBulk <- matrix(
  stats::rpois(100, lambda = sample(seq(4, 10), size = 100, replace = TRUE)),
  nrow = 40, ncol = 15,
  dimnames = list(paste0("Gene", seq(40)), paste0("Bulk", seq(15)))
)
# this is only an example. See vignettes to see how to use pre-trained models
# from the digitalDLSorteRmodels data package
results1 <- deconvDDLSPretrained(
  data = countsBulk,
  model = trained.model(DDLS),
  normalize = TRUE
)
# simplify arguments
simplify <- list(CellGroup1 = c("CellType1", "CellType2", "CellType4"),
                 CellGroup2 = c("CellType3", "CellType5"))
# in this case the names of the list will be the new labels
results2 <- deconvDDLSPretrained(
  countsBulk,
  model = trained.model(DDLS),
  normalize = TRUE,
  simplify.set = simplify
)
# in this case the cell type with the highest proportion will be the new label
results3 <- deconvDDLSPretrained(
  countsBulk,
  model = trained.model(DDLS),
  normalize = TRUE,
  simplify.majority = simplify
)
} # }