A function to construct a signature matrix

findSignatures(
  samples,
  sampleMeta,
  atlas = NULL,
  variation_cutoff = NULL,
  K = 100,
  IDs = NULL,
  tissueSpecCpGs = FALSE,
  tissueSpecDMPs = FALSE
)

Arguments

samples

A dataframe whose first column holds the IDs and whose remaining columns are samples. Each sample column must be named with that sample's accession ID, which must also appear in sampleMeta. Rows are units of signature (e.g. CpGs).

sampleMeta

A dataframe with one row per sample. The first column must hold the accession ID of each sample, and the second column must hold its cell type.
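The relationship between samples and sampleMeta can be sketched with two toy dataframes. The IDs, accession names, values, and the check itself are made up for illustration and are not taken from the package; the sketch only shows the expected shapes and one way to confirm that every sample column is described in sampleMeta before calling findSignatures.

```r
# Hypothetical toy inputs; all IDs and values are invented for illustration.
samples <- data.frame(
  IDs = c("cg0001", "cg0002", "cg0003"),
  sample_A = c(0.10, 0.85, 0.40),
  sample_B = c(0.12, 0.80, 0.45),
  stringsAsFactors = FALSE
)
sampleMeta <- data.frame(
  accession = c("sample_A", "sample_B"),
  cell_type = c("toy_cell_type", "toy_cell_type"),
  stringsAsFactors = FALSE
)

# Every sample column (all but the first) should appear in the
# first column of sampleMeta.
sample_cols <- colnames(samples)[-1]
missing_accessions <- setdiff(sample_cols, sampleMeta[[1]])
all_matched <- length(missing_accessions) == 0
print(all_matched)
```

A non-empty missing_accessions would point at sample columns that have no metadata row.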

atlas

A dataframe holding the reference atlas to which new signatures are added. If not provided, a new reference atlas is created from the sample(s). It should have one column per cell type, with rows as units of signature (e.g. CpGs).

variation_cutoff

Either a number between 0 and 1, or NULL. When multiple samples come from the same cell type, CpGs whose variation within that cell type exceeds variation_cutoff are ignored. Defaults to NULL (i.e. no cutoff).
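As a rough illustration of what such a cutoff does, the sketch below filters rows of a toy replicate table. It assumes that "variation" means the range (maximum minus minimum) across replicates of one cell type; this page does not state which measure findSignatures uses, so treat the measure and all values here as hypothetical.

```r
# ASSUMPTION: "variation" is taken as max - min across replicates.
# The measure actually used by findSignatures() may differ.
replicates <- data.frame(
  IDs = c("cg0001", "cg0002", "cg0003"),
  rep1 = c(0.10, 0.20, 0.90),
  rep2 = c(0.15, 0.80, 0.95),
  stringsAsFactors = FALSE
)
variation_cutoff <- 0.25

# Per-row variation across the replicate columns.
values <- as.matrix(replicates[, -1])
variation <- apply(values, 1, function(x) max(x) - min(x))

# Keep only rows whose variation does not exceed the cutoff.
kept <- replicates[variation <= variation_cutoff, ]
print(kept$IDs)
```

With these toy values, only the second row varies by more than 0.25 between replicates, so it is the one dropped.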

K

Only used when tissueSpecCpGs is TRUE. K is the number of top methylation signatures to extract.

IDs

The name of the column that contains the IDs.

tissueSpecCpGs

If TRUE and atlas is provided, tissue-specific CpGs are extracted.

tissueSpecDMPs

If TRUE and atlas is provided, tissue-specific DMPs are extracted. Note that tissueSpecCpGs and tissueSpecDMPs can't both be TRUE at the same time.
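Because the two flags are mutually exclusive, a caller may want to validate them before a long run. The helper below, checkModeFlags, is a hypothetical name introduced for this sketch; it is not part of the package and does not reflect how findSignatures itself reports the conflict.

```r
# Hypothetical helper (not part of the package): rejects the one
# combination of flags that the documentation rules out.
checkModeFlags <- function(tissueSpecCpGs = FALSE, tissueSpecDMPs = FALSE) {
  if (isTRUE(tissueSpecCpGs) && isTRUE(tissueSpecDMPs)) {
    stop("tissueSpecCpGs and tissueSpecDMPs cannot both be TRUE")
  }
  invisible(TRUE)
}

# A single flag passes the check.
ok <- checkModeFlags(tissueSpecCpGs = TRUE)

# Both flags together raise an error, caught here for demonstration.
both <- tryCatch(
  checkModeFlags(tissueSpecCpGs = TRUE, tissueSpecDMPs = TRUE),
  error = function(e) "error"
)
```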

Value

A dataframe, extendedAtlas, containing all cell types in atlas (if given) plus those in samples, added by cell type. The first column is "IDs" and the remaining columns are cell types; each row starts with the ID (e.g. CpG ID), followed by the signature values (e.g. methylation values). If tissueSpecCpGs is TRUE, a list of lists containing tissue-specific methylation signatures is returned. If tissueSpecDMPs is TRUE, a list containing tissue-specific DMPs is returned.
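The shape of the default return value can be pictured with a small base R sketch. It assumes that samples of the same cell type are collapsed by their row-wise mean before being joined to the atlas on the ID column; this page does not say how findSignatures aggregates samples, so the averaging step, the column names, and the values are illustrative assumptions only.

```r
# Toy atlas with one existing cell type (values invented).
atlas <- data.frame(
  IDs = c("cg0001", "cg0002", "cg0003"),
  Monocytes = c(0.9, 0.1, 0.5),
  stringsAsFactors = FALSE
)
# Two toy samples that belong to the same new cell type.
samples <- data.frame(
  IDs = c("cg0001", "cg0002", "cg0003"),
  s1 = c(0.2, 0.4, 0.6),
  s2 = c(0.4, 0.6, 0.8),
  stringsAsFactors = FALSE
)

# ASSUMPTION: samples of one cell type are collapsed by their mean.
new_col <- data.frame(
  IDs = samples$IDs,
  toy_cell_type = rowMeans(samples[, -1]),
  stringsAsFactors = FALSE
)

# Join on the ID column: IDs first, then the new cell type,
# then the cell types already in the atlas.
extended <- merge(new_col, atlas, by = "IDs")
print(colnames(extended))
```

The column order here mirrors the example output further down, where the newly added cell type is listed before the atlas cell types.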

Examples

library(data.table) # provides data.table(), used below for exampleMeta
data("HumanCellTypeMethAtlas")
exampleSamples <- simulateCellMix(1,
  reference = HumanCellTypeMethAtlas
)$simulated
exampleMeta <- data.table(
  "Experiment_accession" = "example_sample",
  "Biosample_term_name" = "example_cell_type"
)
colnames(exampleSamples) <- c("CpGs", "example_sample")
colnames(HumanCellTypeMethAtlas)[1] <- "CpGs"

signatures <- findSignatures(
  samples = exampleSamples,
  sampleMeta = exampleMeta,
  atlas = HumanCellTypeMethAtlas,
  IDs = "CpGs", tissueSpecCpGs = FALSE
)
#> CELL TYPES IN EXTENDED ATLAS: 
#> example_cell_type 
#> Monocytes_EPIC 
#> B.cells_EPIC 
#> CD4T.cells_EPIC 
#> NK.cells_EPIC 
#> CD8T.cells_EPIC 
#> Neutrophils_EPIC 
#> Erythrocyte_progenitors 
#> Adipocytes 
#> Cortical_neurons 
#> Hepatocytes 
#> Lung_cells 
#> Pancreatic_beta_cells 
#> Pancreatic_acinar_cells 
#> Pancreatic_duct_cells 
#> Vascular_endothelial_cells 
#> Colon_epithelial_cells 
#> Left_atrium 
#> Bladder 
#> Breast 
#> Head_and_neck_larynx 
#> Kidney 
#> Prostate 
#> Thyroid 
#> Upper_GI 
#> Uterus_cervix 
signatures <- findSignatures(
  samples = exampleSamples,
  sampleMeta = exampleMeta,
  atlas = HumanCellTypeMethAtlas,
  IDs = "CpGs", K = 100, tissueSpecCpGs = TRUE
)
#> CELL TYPES IN EXTENDED ATLAS: 
#> example_cell_type 
#> Monocytes_EPIC 
#> B.cells_EPIC 
#> CD4T.cells_EPIC 
#> NK.cells_EPIC 
#> CD8T.cells_EPIC 
#> Neutrophils_EPIC 
#> Erythrocyte_progenitors 
#> Adipocytes 
#> Cortical_neurons 
#> Hepatocytes 
#> Lung_cells 
#> Pancreatic_beta_cells 
#> Pancreatic_acinar_cells 
#> Pancreatic_duct_cells 
#> Vascular_endothelial_cells 
#> Colon_epithelial_cells 
#> Left_atrium 
#> Bladder 
#> Breast 
#> Head_and_neck_larynx 
#> Kidney 
#> Prostate 
#> Thyroid 
#> Upper_GI 
#> Uterus_cervix 
#> Unique used IDs: 0
#> example_cell_type
#> Unique used IDs: 200
#> Monocytes_EPIC
#> Unique used IDs: 400
#> B.cells_EPIC
#> Unique used IDs: 600
#> CD4T.cells_EPIC
#> Unique used IDs: 800
#> NK.cells_EPIC
#> Unique used IDs: 1000
#> CD8T.cells_EPIC
#> Unique used IDs: 1200
#> Neutrophils_EPIC
#> Unique used IDs: 1400
#> Erythrocyte_progenitors
#> Unique used IDs: 1600
#> Adipocytes
#> Unique used IDs: 1800
#> Cortical_neurons
#> Unique used IDs: 2000
#> Hepatocytes
#> Unique used IDs: 2200
#> Lung_cells
#> Unique used IDs: 2400
#> Pancreatic_beta_cells
#> Unique used IDs: 2600
#> Pancreatic_acinar_cells
#> Unique used IDs: 2800
#> Pancreatic_duct_cells
#> Unique used IDs: 3000
#> Vascular_endothelial_cells
#> Unique used IDs: 3200
#> Colon_epithelial_cells
#> Unique used IDs: 3400
#> Left_atrium
#> Unique used IDs: 3600
#> Bladder
#> Unique used IDs: 3800
#> Breast
#> Unique used IDs: 4000
#> Head_and_neck_larynx
#> Unique used IDs: 4200
#> Kidney
#> Unique used IDs: 4400
#> Prostate
#> Unique used IDs: 4600
#> Thyroid
#> Unique used IDs: 4800
#> Upper_GI
#> Unique used IDs: 5000
#> Uterus_cervix