A function to deconvolute bulk samples into the proportions of their cell types of origin, using data from a reference atlas (e.g. methylation signatures). Results of the model are returned as a list (see Value). A summary of the partial R-squared values of the model (minimum, quartiles, median, mean and maximum) is printed upon completion.

deconvolute(reference, vec = NULL, bulk, model = "nnls")

Arguments

reference

A dataframe containing the signatures of different cell types (e.g. methylation signatures) used to train the model. The first column should contain a unique ID (e.g. target ID) used to match rows of the reference to rows of the bulk; all subsequent columns are cell types. There is one row per unit of the signature (e.g. CpG), and each cell holds the value of that unit for the given cell type (e.g. the methylation value of the CpG). If not given, defaults to the reference atlas included in this package, which comes from Moss et al. (2018).
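To make the expected layout concrete, the sketch below builds a toy reference in base R. The ID column name `IDs`, the cell-type names and all values are hypothetical; only the layout (ID column first, one column per cell type, one row per CpG) follows the description above.

```r
# Toy reference (hypothetical IDs, cell types and values) in the described layout:
# first column = unique ID, remaining columns = cell types, one row per CpG.
toy_reference <- data.frame(
  IDs         = c("cg0001", "cg0002", "cg0003", "cg0004"),
  Neutrophils = c(0.90, 0.10, 0.85, 0.05),
  Hepatocytes = c(0.15, 0.80, 0.20, 0.95),
  B_cells     = c(0.50, 0.40, 0.10, 0.70)
)

# Sanity checks implied by the argument description
stopifnot(
  !anyDuplicated(toy_reference[[1]]),                     # IDs are unique
  all(vapply(toy_reference[-1], is.numeric, logical(1)))  # signature values are numeric
)
str(toy_reference)
```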

vec

An optional vector against which the partial R-squared of the results is calculated. Its length must match the number of rows of the reference and bulk tables after merging them on the ID column and removing rows with NAs. Defaults to the row means of the reference.
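A minimal base-R sketch of what the default `vec` amounts to, assuming the tables are merged on the ID column and rows containing NAs are dropped, as described above. The table names, the ID column name `IDs` and the values are hypothetical; this is not the package's internal code.

```r
# Hypothetical toy inputs; names and values are illustrative only.
reference <- data.frame(
  IDs   = c("cg1", "cg2", "cg3", "cg4"),
  typeA = c(0.9, 0.1, NA, 0.2),
  typeB = c(0.2, 0.8, 0.5, 0.6)
)
bulk <- data.frame(
  IDs     = c("cg2", "cg1", "cg3", "cg5"),
  sample1 = c(0.45, 0.60, 0.40, 0.30)
)

# Match rows of reference and bulk on the ID column, then drop rows with NAs
merged <- na.omit(merge(reference, bulk, by = "IDs"))

# Default-style vec: row means of the reference cell-type columns in the merged table
ref_cols <- setdiff(names(reference), "IDs")
vec <- rowMeans(merged[, ref_cols])

# A user-supplied vec would need exactly this length
length(vec) == nrow(merged)
```

Here only `cg1` and `cg2` survive the merge and NA removal, so a custom `vec` would need two entries.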

bulk

A dataframe containing the signatures of the bulk samples used to test the model. The first column should contain unique IDs and the remaining columns are samples. The IDs do not need to match the IDs of the reference exactly, but should overlap substantially with them, and must not be duplicated. The simulateCellMix function may be used to create this dataframe.
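Before calling the function, it can help to check the two requirements above (no duplicated IDs, substantial overlap with the reference). The sketch below does this in base R on hypothetical toy IDs and values; the overlap measure (share of reference IDs also present in the bulk) is an illustrative choice, not a threshold defined by the package.

```r
# Hypothetical toy data: reference IDs and a bulk table with two samples
reference_ids <- paste0("cg", 1:6)
bulk <- data.frame(
  IDs     = paste0("cg", c(2, 3, 4, 5, 9)),
  sample1 = c(0.2, 0.7, 0.5, 0.9, 0.1),
  sample2 = c(0.3, 0.6, 0.4, 0.8, 0.2)
)

# Bulk IDs must not be duplicated
stopifnot(!anyDuplicated(bulk$IDs))

# IDs need not match exactly, but should overlap substantially with the reference
shared  <- intersect(bulk$IDs, reference_ids)
overlap <- length(shared) / length(reference_ids)  # share of reference rows covered
overlap
```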

model

A string indicating which model is used to deconvolute the samples. Must be one of "nnls" (non-negative least squares), "svr" (support vector regression), "qp" (quadratic programming) or "rlm" (robust linear regression). If not given, defaults to "nnls".
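To illustrate what the default "nnls" model estimates, the sketch below solves a non-negative least-squares problem on a toy reference: find weights w >= 0 minimising the squared distance between the reference combination and the bulk profile. It swaps in base R's `optim(method = "L-BFGS-B")` with a lower bound of 0 instead of a dedicated NNLS algorithm such as Lawson-Hanson, so it is a sketch of the technique, not the solver used by deconvolute. The helper `nnls_sketch` and the toy data are hypothetical.

```r
# Non-negative least squares via box-constrained optimisation (base R only).
nnls_sketch <- function(A, b) {
  loss <- function(w) sum((A %*% w - b)^2)                     # squared residual norm
  grad <- function(w) as.vector(2 * crossprod(A, A %*% w - b)) # gradient of the loss
  fit <- optim(
    par = rep(1 / ncol(A), ncol(A)), fn = loss, gr = grad,
    method = "L-BFGS-B", lower = 0,                            # enforce w >= 0
    control = list(factr = 1e1, maxit = 1000)
  )
  fit$par
}

set.seed(1)
A <- matrix(runif(300), nrow = 100, ncol = 3)  # toy reference: 100 CpGs x 3 cell types
true_p <- c(0.5, 0.3, 0.2)                     # known mixing proportions
b <- as.vector(A %*% true_p)                   # noise-free toy bulk profile

w <- nnls_sketch(A, b)
proportions <- w / sum(w)                      # rescale weights to sum to 1
round(proportions, 3)
```

On noise-free data the estimated proportions should recover `true_p` up to optimiser tolerance; with real bulk data the fit is only approximate.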

Value

A list with two elements: proportions, a dataframe containing the predicted cell-type proportions of the bulk sample profiles in "bulk", and rsq, containing the partial R-squared values of the results, one value per sample.

Details

deconvolute uses partial R-squared to check whether deconvolution brings an advantage on top of the basic bimodal profile. In the case of methylation, the reference matrix usually follows a bimodal distribution, so simply averaging the rows of the methylation matrix may already give a profile quite similar to the bulk methylation profile being deconvoluted. If deconvolution is advantageous, the partial R-squared is expected to be high.
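One standard way to compute a partial R-squared of this kind compares the residual sum of squares of a baseline model (bulk regressed on `vec`) with a model that also includes the profile implied by the estimated proportions: (SSE_reduced - SSE_full) / SSE_reduced. The sketch below assumes that formulation; the helper `partial_rsq`, the proportions and the toy data are hypothetical and not taken from the package.

```r
# Partial R-squared of a deconvolution fit over a baseline profile (base R sketch).
partial_rsq <- function(bulk, vec, predicted) {
  reduced <- lm(bulk ~ vec)              # baseline only (e.g. row means of reference)
  full    <- lm(bulk ~ vec + predicted)  # baseline plus deconvolution prediction
  sse_reduced <- sum(residuals(reduced)^2)
  sse_full    <- sum(residuals(full)^2)
  (sse_reduced - sse_full) / sse_reduced # share of remaining error explained
}

set.seed(42)
reference <- matrix(runif(150), nrow = 50, ncol = 3)  # toy matrix: 50 CpGs x 3 cell types
vec <- rowMeans(reference)                            # baseline: row means of reference
p <- c(0.6, 0.3, 0.1)                                 # assumed estimated proportions
predicted <- as.vector(reference %*% p)               # profile implied by the proportions
bulk <- predicted + rnorm(50, sd = 0.01)              # toy bulk = mixture + small noise

partial_rsq(bulk, vec, predicted)
```

With noise-free simulated bulk data the full model fits almost exactly and the ratio approaches 1, in line with the summaries in the Examples.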

References

Moss, J. et al. (2018). Comprehensive human cell-type methylation atlas reveals origins of circulating cell-free DNA in health and disease. Nature Communications, 9(1), 1-12. https://doi.org/10.1038/s41467-018-07466-6

Examples

data("HumanCellTypeMethAtlas")
bulk_data <- simulateCellMix(10, reference = HumanCellTypeMethAtlas)[[1]]
# Non-negative least squares regression
results_nnls <- deconvolute(
  bulk = bulk_data,
  reference = HumanCellTypeMethAtlas
)
#> DECONVOLUTION WITH NNLS
#> Warning: executing %dopar% sequentially: no parallel backend registered
#> Warning: essentially perfect fit: summary may be unreliable
#> Warning: essentially perfect fit: summary may be unreliable
#> Warning: essentially perfect fit: summary may be unreliable
#> Warning: essentially perfect fit: summary may be unreliable
#> Warning: essentially perfect fit: summary may be unreliable
#> Warning: essentially perfect fit: summary may be unreliable
#> Warning: essentially perfect fit: summary may be unreliable
#> Warning: essentially perfect fit: summary may be unreliable
#> Warning: essentially perfect fit: summary may be unreliable
#> SUMMARY OF PARTIAL R-SQUARED VALUES FOR NNLS: 
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#>       1       1       1       1       1       1 
# Quadratic programming
results_qp <- deconvolute(
  reference = HumanCellTypeMethAtlas,
  bulk = bulk_data, model = "qp"
)
#> DECONVOLUTION WITH QP
#> SUMMARY OF PARTIAL R-SQUARED VALUES FOR QP: 
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#>       1       1       1       1       1       1