Getting the data
RStudio setup
A virtual machine with the required tools has been set up and can be accessed through an RStudio Server instance at https://rcourse.linux.crg.es. Log in with your CRG credentials to access the server. If you are not physically at CRG, you must also be connected to the CRG VPN.
The RStudio web interface has an integrated Linux terminal for running interactive shell commands when needed, and a convenient file browser for accessing files on the VM. Throughout the hands-on we assume that students are connected via the RStudio server.
Get the data
All the data needed for the hands-on is stored in a GitHub repository. Clone the repository to get a local copy of the files:
git clone https://github.com/guigolab/rnaseq-course
A new folder named rnaseq-course has just been created. The folder contains the reference gene annotation in GTF format (compressed with gzip), and the gene quantification matrices and metadata in TSV format.
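Because the annotation is gzip-compressed, it can be inspected without writing an uncompressed copy to disk. The following is a minimal sketch, assuming the repository was cloned into the current working directory and that the file name matches one of those under refs/ in the repository. peek_gtf is a hypothetical helper name introduced here for illustration; it is not part of the course material.

```shell
# Print the first N data lines (default 3) of a gzip-compressed GTF,
# skipping any '#' header lines, by streaming the decompressed content.
peek_gtf() {
    gzip -cd "$1" | grep -v '^#' | head -n "${2:-3}"
}

# Assumed path, relative to the directory where the repository was cloned.
gtf=rnaseq-course/refs/gencode.v29.tRNAs.gtf.gz
if [ -f "$gtf" ]; then
    peek_gtf "$gtf"
fi
```

The TSV files are plain text, so head can be applied to them directly, for example head -n 3 rnaseq-course/quantification/metadata.tsv.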
You can list the directory contents in a tree-like format with the tree command:
tree rnaseq-course
.
├── quantification
│   ├── metadata.tsv
│   ├── quantification_data.tar.gz
│   ├── raw_counts_full.tsv
│   ├── raw_counts.tsv
│   └── README.md
├── README.md
└── refs
    ├── gencode.v29.primary_assembly.annotation_UCSC_names.gtf.gz
    ├── gencode.v29.tRNAs.gtf.gz
    └── gene_annotation.tsv
3 directories, 9 files
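If the tree command is not available on your system, a similar flat listing can be obtained with find. This is only a sketch, assuming the repository was cloned into the current working directory; list_files is a hypothetical helper name introduced here. It skips Git's internal .git folder, which tree also leaves out by default because it is hidden.

```shell
# List the regular files under a directory, one path per line, sorted,
# excluding everything inside Git's internal .git folder.
list_files() {
    find "$1" -type f ! -path '*/.git/*' | sort
}

# Assumed location of the cloned repository.
if [ -d rnaseq-course ]; then
    list_files rnaseq-course
fi
```

Piping the result to wc -l gives a file count that can be compared with the summary line printed by tree.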