Master’s/Internship projects

The BIMSB Bioinformatics platform has multiple project themes available for Master’s or internship projects. Applicants must have the relevant background for the project they apply to.

Research Software sustainability

Reproducibility of scientific workflows is a general problem across all fields of science, including fields that rely heavily on computation and data analysis (see 1 for reference). For data analysis or computational work, it is desirable to install exactly the same version of the research software that was used in a publication, both to enable reproduction of published data and to allow controlled manipulation or augmentation of the software system. At the MDC, we have been using GNU Guix for more than three years to build scientific software at different versions and variants, and to manage software environments in a reproducible fashion. Some of our team members are main contributors to the GNU Guix project. We are looking for new members who can help improve our workflow. Our goal is to implement a system based on GNU Guix and Cuirass that continuously and automatically builds a wide range of scientific software in a bit-for-bit reproducible fashion and offers the build results to Guix users.

Specific tasks

What will you get out of this?

What do you need to know?

Contact

Please contact ricardo.wurmus@mdc-berlin.de or altuna.akalin@mdc-berlin.de about this project.

Other projects

1) Methods for DNA modification analysis

DNA methylation and other DNA modifications such as hydroxymethylation are implicated in gene regulation, and their misregulation has been shown to cause cancer. With the advent of next-generation sequencing, measuring genome-wide DNA methylation levels became possible. However, this also created a demand for high-quality software for the analysis of large-scale DNA methylation data sets. In this project, the aim is to help develop data processing, machine learning and statistical modeling tools for DNA methylation analysis, to be integrated into our existing software methylKit (https://code.google.com/p/methylkit/).

Multiple sub-projects available

2) Methods for genomics data integration and visualization

Data integration and processing are vital for knowledge discovery in genomics. The number of public datasets is increasing by the day thanks to multiple large consortia producing genomics data sets, such as ENCODE, Roadmap Epigenomics and EU Blueprint. We are building data integration and visualization methods. One example is our genomation package (http://www.bioconductor.org/packages/devel/bioc/html/genomation.html). The aim of the projects in this theme is to further develop genomation or other unpublished packages by adding new methods and increasing their data processing and visualization capabilities.

Multiple sub-projects available

3) Pure data analysis projects

Our lab has a broad interest in gene regulation and epigenomics. We also have projects that are oriented more towards data analysis; these require less method development but more data processing, integration and applied statistics.

Multiple sub-projects available

4) Developing bioinformatics tools and workflows for Galaxy

We are also aiming to integrate BIMSB bioinformatics tools into the Galaxy framework and to develop new ones for it. These projects will include integrating tools with Galaxy and building complete workflows that the user can interact with through a web browser.

Multiple sub-projects available

What do you need to know?

What will you get out of this?


Designed by Altuna Akalin, powered by foundation