Getting the data

RStudio setup

A virtual machine with the required tools has been set up and can be accessed via an RStudio server instance at https://rcourse.linux.crg.es. Log in with your CRG credentials to access the server. If you are not physically at CRG, you also need to be connected to the CRG VPN.

The RStudio web interface has an integrated Linux terminal that can be used to run interactive shell commands when needed. It also has a convenient file browser to access files on the VM. Throughout the hands-on, we assume that students are connected via the RStudio server.

Get the data

All the data needed for the hands-on is stored in a GitHub repository. Clone the repository to get a local copy of the files:

git clone https://github.com/guigolab/rnaseq-course

A new folder named rnaseq-course has just been created. The folder contains the reference gene annotation in GTF format (compressed with gzip), and the gene quantification matrices and metadata in TSV format.
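
Since the annotation is gzip-compressed and the other files are plain TSV, a small shell helper along the following lines can be handy for a first look at either kind of file without unpacking anything. This is only a minimal sketch: `peek` is a hypothetical function name, not part of the course material, and the example paths in the comments assume the repository was cloned into the current directory.

```shell
# peek FILE [N]: print the first N lines of FILE (default 5).
# Files ending in .gz are decompressed on the fly; the file on disk is untouched.
# NOTE: "peek" is a hypothetical helper name introduced for illustration.
peek() {
  file=$1
  n=${2:-5}
  case "$file" in
    *.gz) gzip -dc "$file" | head -n "$n" ;;
    *)    head -n "$n" "$file" ;;
  esac
}

# Example calls (paths assume the clone is in the current directory):
#   peek rnaseq-course/refs/gencode.v29.primary_assembly.annotation_UCSC_names.gtf.gz
#   peek rnaseq-course/quantification/metadata.tsv 3
```

Paste the function into the RStudio terminal once; it then stays available for the rest of that shell session.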

You can list the directory contents in a tree-like format with the tree command:

tree rnaseq-course
.
├── quantification
│   ├── metadata.tsv
│   ├── quantification_data.tar.gz
│   ├── raw_counts_full.tsv
│   ├── raw_counts.tsv
│   └── README.md
├── README.md
└── refs
    ├── gencode.v29.primary_assembly.annotation_UCSC_names.gtf.gz
    ├── gencode.v29.tRNAs.gtf.gz
    └── gene_annotation.tsv

3 directories, 9 files
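
If the tree command is not available in your shell (for example, when working outside the course VM), a flat listing built with find gives similar information. The sketch below is one possible fallback, not part of the course material; `list_files` is a hypothetical function name, and skipping the .git folder is a choice made here to keep the output short.

```shell
# list_files [DIR]: print every regular file under DIR (default: current
# directory), one path per line, sorted, skipping the .git folder.
# NOTE: "list_files" is a hypothetical helper name introduced for illustration.
list_files() {
  dir=${1:-.}
  find "$dir" -path '*/.git' -prune -o -type f -print | LC_ALL=C sort
}

# Example call (assumes the clone is in the current directory):
#   list_files rnaseq-course
```

Unlike tree, this prints full relative paths rather than an indented hierarchy, which is often more convenient for copying a path into another command.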