Welcome to genomation
This is an R package that contains a collection of tools for visualizing and analyzing genome-wide data sets. The package works with a variety of genomic interval file types and makes it easy to summarize and annotate high-throughput data sets with given genomic annotations.
Features
- Ability to work with multiple flat file formats such as BED, GFF, BAM, bigWig and generic tabular text files
- Annotation of genomic intervals. For example, you can see what percentage of your genomic intervals overlap with exon/intron/promoter annotation.
- Summary of genomic scores or read coverage over pre-defined regions. The pre-defined regions can be of equal width, such as regions around TSSs, or of varying widths, such as CpG islands. This operation results in a matrix or a set of matrices containing scores for each pre-defined region.
- Visualization of summary matrices as meta-region (meta-gene, meta-promoter, etc.) plots or heatmaps. These functions can also apply k-means clustering to the rows of the summary matrices.
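To make the annotation feature concrete, here is a minimal base-R sketch that computes the percentage of intervals overlapping a set of annotated regions on a single chromosome. The toy coordinates and the helper name `overlaps_any` are invented for illustration. genomation itself works on GRanges objects and takes chromosomes and strands into account, so treat this only as a picture of the idea.

```r
# Toy intervals on one chromosome (1-based, closed coordinates); values are invented.
peaks <- data.frame(start = c(100, 500, 900, 1500),
                    end   = c(150, 560, 950, 1600))
promoters <- data.frame(start = c(120, 880),
                        end   = c(220, 980))

# Hypothetical helper (not part of genomation): TRUE for each query interval
# that overlaps at least one annotation interval. Two closed intervals overlap
# when each one starts no later than the other ends.
overlaps_any <- function(query, annot) {
  vapply(seq_len(nrow(query)), function(i) {
    any(query$start[i] <= annot$end & query$end[i] >= annot$start)
  }, logical(1))
}

hit <- overlaps_any(peaks, promoters)
pct_in_promoters <- 100 * mean(hit)   # share of peaks touching a promoter
```

In this toy set, two of the four intervals touch a promoter, so `pct_in_promoters` is 50.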
Documentation
See the package vignette. Every function is documented with examples, which you can reach via the help() function in R.
See blog posts on genomation at http://zvfak.blogspot.com/search/label/genomation
Citation
Akalin A, Franke V, Vlahoviček K, Mason CE, Schübeler D. genomation: a toolkit to summarize, annotate and visualize genomic intervals. Bioinformatics. 2014 Nov 21. pii: btu775
Installation
Install via Bioconductor
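# biocLite() is retired on current Bioconductor releases; BiocManager replaces it
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("genomation")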
Install the latest version via devtools::install_github
You can install genomation with the install_github() function from the devtools package.
# Install dependencies
install.packages(c("data.table","plyr","reshape2","ggplot2","gridBase","devtools"))
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install(c("GenomicRanges","rtracklayer","impute","Rsamtools"))
# install the packages
library(devtools)
install_github("BIMSBbioinfo/genomation",build_vignettes=FALSE)
# install the data package to be able to run examples in the vignette
install_github("frenkiboy/genomationData",build_vignettes=FALSE)
Data import
Functions such as readBed, gff2GRanges, readTranscriptFeatures and readGeneric read several flat file formats into R as GRanges objects.
# Read a BED12 file and return a GRangesList object with promoters, exons and introns
my.bed12.file = system.file("extdata/chr21.refseq.hg19.bed", package = "genomation")
my.bed12.file
feats = readTranscriptFeatures(my.bed12.file)
# Read a generic tabular text file containing genomic locations
my.file=system.file("extdata","chr21.refseq.hg19.bed",package="genomation")
refseq = readGeneric(my.file,chr=1,start=2,end=3,strand=NULL,
                     meta.cols=list(score=5,name=4),
                     keep.all.metadata=FALSE, zero.based=TRUE)
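The zero.based=TRUE argument matters because BED files store coordinates as 0-based, half-open intervals, whereas GRanges objects use 1-based, closed intervals. The base-R sketch below illustrates that conversion on an invented two-row table. It is not genomation's internal code, only the standard arithmetic behind the two conventions.

```r
# Two BED-style records: 0-based start, end-exclusive (toy values).
bed <- data.frame(chr   = c("chr21", "chr21"),
                  start = c(0L, 999L),
                  end   = c(100L, 1500L))

# Convert to 1-based, closed coordinates: shift the start by one and
# leave the end unchanged.
one_based <- transform(bed, start = start + 1L)

# Interval widths are the same under both conventions.
width_bed <- bed$end - bed$start                    # half-open width
width_one <- one_based$end - one_based$start + 1L   # closed width
```

Forgetting this shift makes every interval one base too wide at its start, which is easy to miss until you compare against another annotation source.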
Summarize GRanges object on defined regions such as promoters
You can summarize GRanges objects that overlap with a set of promoters and return a ScoreMatrix object. The object contains a score for each base in each promoter: rows correspond to promoters and columns correspond to bases.
data(cage)
data(promoters)
scores1=ScoreMatrix(target=cage,windows=promoters,strand.aware=TRUE,
weight.col="tpm")
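To show what such a matrix holds, the following base-R sketch builds one by hand for two toy windows on a single chromosome. The data and the helper name `toy_score_matrix` are invented for illustration, and the loop is deliberately naive. It only mirrors the ideas behind weight.col (each interval contributes its score) and strand.aware (minus-strand rows are reversed), not the package's implementation.

```r
# Toy scored intervals (1-based, closed) on a single chromosome.
signal <- data.frame(start = c(3, 8), end = c(6, 12), score = c(2, 5))

# Two equal-width windows; the second lies on the minus strand.
windows <- data.frame(start = c(1, 6), end = c(10, 15), strand = c("+", "-"))

# Hypothetical helper (not part of genomation): one row per window,
# one column per base, each cell the summed score of covering intervals.
toy_score_matrix <- function(signal, windows, strand.aware = TRUE) {
  width <- unique(windows$end - windows$start + 1)
  stopifnot(length(width) == 1)   # all windows must share one width
  mat <- matrix(0, nrow = nrow(windows), ncol = width)
  for (i in seq_len(nrow(windows))) {
    pos <- windows$start[i]:windows$end[i]
    for (j in seq_len(nrow(signal))) {
      covered <- pos >= signal$start[j] & pos <= signal$end[j]
      mat[i, covered] <- mat[i, covered] + signal$score[j]
    }
    # Reverse minus-strand rows so every row reads 5' to 3'.
    if (strand.aware && windows$strand[i] == "-") mat[i, ] <- rev(mat[i, ])
  }
  mat
}

m <- toy_score_matrix(signal, windows)
```

Here `m` has 2 rows and 10 columns. The second row is reversed relative to genomic order because its window is on the minus strand.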
Summarize BAM files on pre-defined regions
BAM files can also be used in the ScoreMatrix() function.
bam.file = system.file('tests/test.bam', package='genomation')
windows = GRanges(rep(c(1,2),each=2), IRanges(rep(c(1,2), times=2), width=5))
scores3 = ScoreMatrix(target=bam.file,windows=windows, type='bam')
Summarize bigWig files on pre-defined regions
You can also use bigWig files in the ScoreMatrix() function.
bw.file = system.file('tests/test.bw', package='rtracklayer')
windows = GRanges(rep('chr2',each=4), IRanges(start=c(250,350,450,550), width=50))
scores4 = ScoreMatrix(target=bw.file, windows=windows, type='bigWig')
Visualize summary matrices as heatmaps
ScoreMatrix or ScoreMatrixList objects can be visualized with heatMatrix, multiHeatMatrix, plotMeta and heatMeta functions.
data(cage)
data(promoters)
scores1=ScoreMatrix(target=cage,windows=promoters,strand.aware=TRUE)
data(cpgi)
scores2=ScoreMatrix(target=cpgi,windows=promoters,strand.aware=TRUE)
sml=new("ScoreMatrixList",list(a=scores1,b=scores2))
multiHeatMatrix(sml,kmeans=TRUE,k=2,matrix.main=c("cage","CpGi"),cex.axis=0.8)
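The kmeans=TRUE and k=2 arguments group similar rows before drawing. As a rough base-R sketch of that idea, the snippet below clusters the rows of an invented summary matrix with stats::kmeans() and reorders them by cluster. The matrix values are simulated, and nstart is an assumption added here for stability; how the package handles ordering and plotting internally may differ.

```r
set.seed(1)  # kmeans() uses random starting centers

# Toy summary matrix: 6 regions x 5 positions, in two obvious groups.
low  <- matrix(rnorm(15, mean = 0,  sd = 0.1), nrow = 3)
high <- matrix(rnorm(15, mean = 10, sd = 0.1), nrow = 3)
mat  <- rbind(low, high)[c(1, 4, 2, 5, 3, 6), ]   # interleave the two groups

# Cluster rows into k = 2 groups, then reorder rows by cluster label so
# similar regions sit next to each other, as in a clustered heatmap.
cl      <- kmeans(mat, centers = 2, nstart = 10)$cluster
ordered <- mat[order(cl), ]
```

Cluster labels are arbitrary (which group is called 1 or 2 can change between runs), so downstream code should rely on group membership rather than on the label values.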
Visualize summary matrices as meta-region plots
plotMeta(mat=sml,overlay=TRUE,main="my plotowski")
heatMeta(mat=sml,main="my plotowski")
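A meta-region profile is, in essence, a per-position summary across all regions of a summary matrix. The base-R sketch below shows the column-mean version on an invented matrix. The assumption that the mean is the summary used here is mine; check the plotMeta help page for the options the package actually offers.

```r
# Toy summary matrix: rows are regions, columns are positions in the window.
mat <- rbind(c(0, 1, 4, 1, 0),
             c(0, 3, 8, 3, 0),
             c(0, 2, 6, 2, 0))

# The meta-region profile is the column-wise average across regions.
profile <- colMeans(mat)

# Positions relative to the window centre (column 3 in this toy example).
position <- seq_len(ncol(mat)) - 3
```

Plotting `profile` against `position`, for example with `plot(position, profile, type = "l")`, gives a single line that peaks at the window centre.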
Authors and Contributors
Vedran Franke (@frenkiboy) and Altuna Akalin (@al2na) initially authored this package. Other contributors are listed on the project's contributors page. You can contribute by checking out the "development" branch, making changes and submitting a pull request.
Support or Contact
Send an e-mail to genomation@googlegroups.com or use the web interface to post a question at https://groups.google.com/forum/#!forum/genomation