Documentation

The project uses the Jupyter software stack (JupyterHub, JupyterLab, and the Jupyter Notebook). The tools hosted by the project are all free and mostly open source; you may already know some of them. The roadmap for future development can be found on the project’s website.

The notebooks used by the Jupyter software provide a convenient way to develop reusable, reproducible, and well-documented code. You can document your code in Markdown in one cell and write code in your preferred language in other cells. See the Environment section for the currently available kernels and software packages.

If you need help with JupyterLab or the Jupyter Notebook, see their detailed documentation at the links below.

  • JupyterLab: https://jupyterlab.readthedocs.io/en/stable/user/interface.html
  • Jupyter Notebook: https://jupyter-notebook.readthedocs.io/en/stable/notebook.html#notebook-user-interface

IMPORTANT NOTICE

Please note that this service is at a very early stage of development, and changes may be made without prior notice. Idle containers are terminated and deleted to free up resources.

Please keep in mind the following:

  • Changes during maintenance may corrupt or delete your work.
  • Use the `private` directory to save your work. This directory resides on the server, outside the container.
  • Your personal container is shut down and deleted when you log out, to free up resources for other users.
  • Use the `public` directory to share your work or files with others. This directory is also persistent and is synchronized with the `shared` directory.
  • The `shared` directory is where you find files shared by others. It is accessible to every user but read-only. If you want to modify these files, copy them into one of your own directories and work there.

General

Every user is provided with a sandboxed environment (a Docker container), and the containers are destroyed during updates and upgrades. Data saved in the `private` and `public` directories is stored outside the container, on the persistent storage of the underlying server, and therefore survives deletion of the container.

The `public` directory is for sharing work with others. Each user has their own instance of this directory, where they can put data or files they wish to share. The contents of this directory are synchronized with the common `shared` folder.

The `shared` folder is accessible to every user and contains the union of all users’ `public` folders. It is mounted read-only, which prevents concurrent modification and accidental overwriting of others’ work. If you need to adjust someone else’s work, please make a copy of it in your own `private` folder and apply your modifications there.
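As an illustration, the copy step can be done from a notebook with the Python standard library. This is a minimal sketch that assumes the `shared` and `private` folders are mounted in your home directory (`~/shared` and `~/private`); check the actual locations in the file browser and pass them in if they differ. The helper `copy_to_private` is hypothetical and not part of the service.

```python
import shutil
from pathlib import Path


def copy_to_private(relative_path, shared_root="~/shared", private_root="~/private"):
    """Copy a file or folder from the read-only shared area into the private area.

    The default roots are assumptions about where the folders are mounted.
    Refuses to overwrite an existing copy and returns the path of the new copy.
    """
    source = Path(shared_root).expanduser() / relative_path
    target = Path(private_root).expanduser() / relative_path
    if target.exists():
        # Never clobber a copy you may already have modified.
        raise FileExistsError(str(target))
    # Create any missing parent folders inside the private area.
    target.parent.mkdir(parents=True, exist_ok=True)
    if source.is_dir():
        shutil.copytree(str(source), str(target))
    else:
        # copy2 also keeps the original timestamps.
        shutil.copy2(str(source), str(target))
    return target
```

For example, `copy_to_private("some-folder/analysis.ipynb")` (a hypothetical path) returns the location of a writable copy that you can then open and edit.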

The `tools` folder holds binaries you can run from a notebook cell with the `!` shell-command prefix, or from the terminal.
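In a notebook cell, a line such as `!ls tools` lists the available binaries. If you prefer plain Python, for example inside a function, the standard-library `subprocess` module does the same job. A minimal sketch, assuming the folder is reachable as `tools` relative to the notebook's working directory; `list_tools` and `run_tool` are hypothetical helpers.

```python
import os
import subprocess
from pathlib import Path


def list_tools(tools_dir="tools"):
    """Return the sorted names of the executable files in the tools folder."""
    folder = Path(tools_dir).expanduser()
    return sorted(
        entry.name
        for entry in folder.iterdir()
        if entry.is_file() and os.access(str(entry), os.X_OK)
    )


def run_tool(name, *args, tools_dir="tools"):
    """Run one binary from the tools folder and return its standard output."""
    binary = Path(tools_dir).expanduser().resolve() / name
    result = subprocess.run(
        [str(binary)] + list(args),
        capture_output=True,  # collect stdout/stderr instead of printing them
        text=True,            # decode the output to str
        check=True,           # raise CalledProcessError on a non-zero exit status
    )
    return result.stdout
```

`list_tools()` shows what is available; whether a binary accepts an option such as `--help` depends on the individual tool.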

The `EMBL` folder contains the Python client files for the EMBL-EBI web services. You can run them from a terminal as regular Python scripts or call them from your own code.
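Because the clients are ordinary Python scripts, one way to use them from code is to run them as a subprocess with the notebook's own interpreter (`sys.executable`). A minimal sketch, assuming the folder is reachable as `EMBL` relative to the working directory; `embl_clients` and `run_embl_client` are hypothetical helpers. Which clients exist and which options each accepts is best checked with `embl_clients()` and by running a client with an option such as `--help`.

```python
import subprocess
import sys
from pathlib import Path


def embl_clients(embl_dir="EMBL"):
    """Return the names (without .py) of the client scripts in the EMBL folder."""
    return sorted(path.stem for path in Path(embl_dir).expanduser().glob("*.py"))


def run_embl_client(client, *args, embl_dir="EMBL"):
    """Run one client script with the notebook's own interpreter.

    Returns (exit_code, stdout) so the caller can decide how to handle errors.
    """
    script = Path(embl_dir).expanduser() / (client + ".py")
    result = subprocess.run(
        [sys.executable, str(script)] + list(args),
        capture_output=True,
        text=True,
    )
    return result.returncode, result.stdout
```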

Environment

Base

  • Python 3.7 kernel
  • Biopython 1.76
  • R 3.6.3 kernel
  • Bioconductor package manager
  • File upload limit: 100 MB
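Files larger than the upload limit have to be brought in some other way, so it can help to check the size before uploading. A minimal sketch; whether "100 MB" means 100 * 1024 * 1024 or 100 * 1000 * 1000 bytes is not stated here, so the constant below is an assumption, and `fits_upload_limit` is a hypothetical helper.

```python
import os

# Assumption: 100 MB is read as 100 * 1024 * 1024 bytes. If the server uses
# decimal megabytes (100 * 1000 * 1000), lower the limit accordingly.
UPLOAD_LIMIT_BYTES = 100 * 1024 * 1024


def fits_upload_limit(path, limit_bytes=UPLOAD_LIMIT_BYTES):
    """Return True if the local file at `path` is within the upload limit."""
    return os.path.getsize(path) <= limit_bytes
```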

Python packages

  • pandas 0.25.1
  • numpy 1.17.2
  • matplotlib 3.1.1
  • scipy 1.3.1
  • scikit-image 0.15.0
  • scikit-learn 0.21.3
  • statsmodels 0.10.1
  • Bokeh 2.0.0
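To confirm that a running kernel matches the versions listed above, you can compare each package's `__version__` attribute. A minimal sketch; the mapping simply restates the lists, and `check_versions` is a hypothetical helper. Note that three packages are imported under a different name than the one listed (Biopython as `Bio`, scikit-image as `skimage`, scikit-learn as `sklearn`).

```python
import importlib

# Import names mapped to the versions listed in the Environment section.
EXPECTED = {
    "Bio": "1.76",
    "pandas": "0.25.1",
    "numpy": "1.17.2",
    "matplotlib": "3.1.1",
    "scipy": "1.3.1",
    "skimage": "0.15.0",
    "sklearn": "0.21.3",
    "statsmodels": "0.10.1",
    "bokeh": "2.0.0",
}


def check_versions(expected=EXPECTED):
    """Compare installed versions with the expected ones.

    Returns {import_name: (expected, installed)}; installed is None if the
    package cannot be imported or does not expose __version__.
    """
    report = {}
    for name, wanted in expected.items():
        try:
            module = importlib.import_module(name)
        except ImportError:
            installed = None
        else:
            installed = getattr(module, "__version__", None)
        report[name] = (wanted, installed)
    return report
```

Calling `check_versions()` in a notebook cell returns `(expected, installed)` pairs; `None` means the package could not be imported or does not report a version.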

Extras

  • RISE 5.6.1 for making interactive presentations
  • pandoc 2.9.2-1 for exporting notebooks to various formats (e.g. PDF)
  • EMBL Python client (from https://github.com/ebi-wp/webservice-clients)
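pandoc 2.9 can read `.ipynb` files directly and chooses the output format from the extension of the `-o` file, so calling it directly is one way to export a notebook. A minimal sketch that builds and runs that command from Python; `pandoc_export_command` and `export_notebook` are hypothetical helpers, and PDF output additionally needs a PDF engine (by default a LaTeX installation), which the list above does not mention.

```python
import subprocess
from pathlib import Path


def pandoc_export_command(notebook, extension):
    """Build a pandoc command that converts `notebook` to the format given by `extension`.

    pandoc reads .ipynb input and infers the output format from the -o file name.
    """
    source = Path(notebook)
    target = source.with_suffix("." + extension)
    return ["pandoc", "--standalone", str(source), "-o", str(target)]


def export_notebook(notebook, extension):
    """Run pandoc and return the path of the exported file."""
    command = pandoc_export_command(notebook, extension)
    # check=True raises CalledProcessError if pandoc reports an error.
    subprocess.run(command, check=True)
    return Path(command[-1])
```

For example, `export_notebook("analysis.ipynb", "docx")` would write `analysis.docx` next to the notebook.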

How to run the standalone version

The core of the project, the jupyter-notebook image, can be run standalone using Docker. If you don’t have Docker yet, see https://docs.docker.com/install/ for installation instructions.

Once you have Docker installed, run the standalone version of the notebook image with the following command:

    docker run --name datahub-bio-notebook -p 8888:8888 datahubproject/datahub:bio-notebook-standalone

Explanation:

  • --name: optional argument that names the container, for easier identification later on
  • -p 8888:8888: publishes the container’s internal port 8888 on port 8888 of the host, so the server is reachable at localhost:8888

After this, your instance of the notebook image, running the JupyterLab interface with all the pre-installed packages, is reachable at http://localhost:8888/.
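The container can take a few seconds to start, so a script that works with the notebook server may want to wait until the address answers. A minimal sketch using only the standard library; `wait_for_server` is a hypothetical helper. Any HTTP response, including a login page or an error status, is treated as "up", since JupyterLab may ask for a token before showing the interface.

```python
import time
import urllib.error
import urllib.request


def wait_for_server(url="http://localhost:8888/", timeout=60.0, interval=1.0):
    """Poll `url` until the server answers or `timeout` seconds pass.

    Returns True as soon as any HTTP response arrives, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(url, timeout=interval):
                return True
        except urllib.error.HTTPError as err:
            # An error status (e.g. 403) still proves the server is listening.
            err.close()
            return True
        except OSError:
            # Connection refused or timed out: the server is not up yet.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```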

Important! Running the image without mounting a local directory gives only limited persistence: files saved only inside the container do not survive removal of the container. To mount your home directory (assuming a Linux operating system), add the following argument to the command line:

    -v {path-to-your-homedir}:/home/biodatahub/data

This mounts the directory you specify as {path-to-your-homedir} into the container under the path /home/biodatahub/data, with read-write access.

The whole command to start the container with the mounted directory is:

    docker run --name datahub-bio-notebook -p 8888:8888 -v {path-to-your-homedir}:/home/biodatahub/data datahubproject/datahub:bio-notebook-standalone
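If you start the container from a script, the same command can be assembled in Python, which also takes care of expanding the home-directory path. A minimal sketch, assuming a Linux host as above; `docker_run_command` is a hypothetical helper, and it uses only the flags shown on this page. Passing an argument list to `subprocess.run` avoids shell-quoting problems with paths that contain spaces.

```python
import os
import shlex

IMAGE = "datahubproject/datahub:bio-notebook-standalone"
CONTAINER_DATA_DIR = "/home/biodatahub/data"


def docker_run_command(name="datahub-bio-notebook", host_port=8888, host_dir=None):
    """Return the docker run command from this page as an argument list.

    host_port is the port on your machine; the notebook listens on 8888 inside
    the container. If host_dir is given, it is mounted read-write at
    CONTAINER_DATA_DIR.
    """
    command = ["docker", "run", "--name", name, "-p", "{0}:8888".format(host_port)]
    if host_dir is not None:
        # Expand "~" and make the path absolute: with -v, a bare name such as
        # "data" would be treated as a named Docker volume, not a host folder.
        host_path = os.path.abspath(os.path.expanduser(host_dir))
        command += ["-v", "{0}:{1}".format(host_path, CONTAINER_DATA_DIR)]
    command.append(IMAGE)
    return command


if __name__ == "__main__":
    # Print a shell-ready version of the command, quoted where necessary.
    print(" ".join(shlex.quote(part) for part in docker_run_command(host_dir="~")))
    # To start the container directly instead, pass the list to
    # subprocess.run(docker_run_command(host_dir="~"), check=True).
```

If you change `host_port`, the interface is then reachable at `http://localhost:<host_port>/` instead of port 8888.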

I wish you productive work!

Best wishes, Dr. Nandor Poka