Practical tools for quick data visualization

When starting a new project, designing new methods for data analysis, or even preparing a new publication, data visualization is essential both to explore new hypotheses and to validate our findings.

With this crash talk, I aim to spread the word about a series of packages that have made my life much easier when it comes to exploratory data analysis.

In case you are reading this from this project's repository, check out the webpage for a nicer experience.

Before starting

Please make sure you have R and RStudio installed, along with the following packages:

  • Data Wrangling

    • here: to automate locating your root directory.

    • tidyverse: to read, wrangle and write tables.

    • Biology-specific

      • clusterProfiler: to perform gene set enrichment analyses.

      • seqinr: to read and write files with biological sequences.

      • ape: to compute distances between aligned sequences.

      • Biostrings: to wrangle biological sequences.

  • Visualization

    • ggplot2: powerful suite for general data visualization in R.

    • ggpubr: ggplot2 metapackage that wraps many functionalities together.

    • ggtree: to visualize phylogenetic trees, such as those built from multiple sequence alignments.

    • ComplexHeatmap: to make heatmaps.

    • gridExtra: to arrange multiple plots not created through ggplot.

    • ggplotify: converts any plot into a ggplot.

    • showtext: edit fonts more easily in R graphs.

    • countrycode: get country code names.

Here's the code snippet in case you need to install some of them.

# install packages from CRAN
# wrangling
install.packages("here", dependencies = TRUE)
install.packages("tidyverse", dependencies = TRUE)
install.packages("seqinr", dependencies = TRUE)
install.packages("ape", dependencies = TRUE)

# visualization
install.packages("ggplot2", dependencies = TRUE)
install.packages("ggpubr", dependencies = TRUE) # requires libcurl4 and libnlopt-dev in ubuntu
install.packages("gridExtra", dependencies = TRUE)
install.packages("ggplotify", dependencies = TRUE)

# extra (optional)
install.packages("showtext", dependencies = TRUE)
install.packages("countrycode", dependencies = TRUE)

# install packages from Bioconductor
if (!requireNamespace("BiocManager", quietly = TRUE)){ install.packages("BiocManager") }
BiocManager::install("clusterProfiler")
BiocManager::install("ggtree")
BiocManager::install("Biostrings")
BiocManager::install("ComplexHeatmap")
BiocManager::install("org.Hs.eg.db")
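
To double-check that everything installed correctly, something along these lines may help. It is only a minimal sketch: `find_missing_packages` is a hypothetical helper written for this page (it is not part of any package), and `required_pkgs` simply repeats the package names from the installation commands above.

```r
# Package names taken from the installation commands above.
required_pkgs <- c(
  "here", "tidyverse", "seqinr", "ape",
  "ggplot2", "ggpubr", "gridExtra", "ggplotify",
  "showtext", "countrycode",
  "clusterProfiler", "ggtree", "Biostrings", "ComplexHeatmap", "org.Hs.eg.db"
)

# Hypothetical helper: returns the entries of `pkgs` that are not found
# among the packages installed in the current R library paths.
find_missing_packages <- function(pkgs) {
  installed <- rownames(installed.packages())
  pkgs[!pkgs %in% installed]
}

missing_pkgs <- find_missing_packages(required_pkgs)
if (length(missing_pkgs) == 0) {
  message("All packages are installed.")
} else {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
}
```

Any name reported as missing can then be installed with the matching `install.packages()` or `BiocManager::install()` call.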

Then, you can also download the notebooks for the talk with:

git clone https://github.com/MiqG/practical_tools_for_quick_data_visualization.git

or just by clicking here.

Reproducibility

In case you'd like to re-create all the outputs from the repository, you will need a GitHub account and to install jupyter-book and ghp-import. Then, you can run:

bash run_all.sh

Enjoy!