Contents

  1. Installation
  2. Genome simulation
    1. Reference genome
    2. Sandy example for genome
    3. Sandy example for genome on Docker
  3. Transcriptome simulation
    1. Reference annotation
    2. Sandy example for transcriptome
    3. Sandy example for transcriptome on Docker

We have designed Sandy based on three principles:

  • to be easy to install;
  • to be easy to use;
  • to reproduce the variability found in a real NGS assay.

Installation

Sandy is easy to install on the three most commonly used operating systems (OS): Linux, Apple’s macOS, and Microsoft Windows. For more details, see the Installation section.

Genome simulation

Sandy is easy to use because it requires only an input FASTA file and a single streamlined command to simulate DNA and RNA sequencing for the Illumina, PacBio, and Oxford Nanopore platforms. The user needs to provide only the reference genome (for simulating DNA sequencing) or the reference transcriptome (for simulating RNA sequencing) in FASTA format and run the Sandy command-line interface. For example, simulating whole-genome sequencing of the human genome on an Illumina HiSeq platform takes only the single command shown under "Sandy example for genome" below.

Reference genome

If you don’t have the reference genome, first follow this step:

$ wget https://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/hg38.fa.gz

or

$ curl -O https://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/hg38.fa.gz
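
Before starting a long simulation, it can be worth confirming that the download completed. The following is a minimal sketch, assuming gzip and head are available on the system. The function name check_fasta_gz is hypothetical and is not part of Sandy:

```shell
# Hypothetical helper (not part of Sandy): verify that a gzipped FASTA
# file is intact and that it starts with a FASTA header.
check_fasta_gz() {
    local file="$1" first_char
    # gzip -t tests the compressed stream without writing any output
    if ! gzip -t "$file" 2>/dev/null; then
        echo "corrupt or incomplete: $file" >&2
        return 1
    fi
    # A FASTA file begins with a '>' header line
    first_char=$(gzip -dc "$file" 2>/dev/null | head -c 1) || true
    if [ "$first_char" != ">" ]; then
        echo "not a FASTA file: $file" >&2
        return 1
    fi
    echo "ok: $file"
}
```

For the file above, the call would be check_fasta_gz hg38.fa.gz.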

Sandy example for genome

This example uses the quality profile for Illumina HiSeq reads of 101 bases (hiseq_101) and a coverage of 1x:

$ sandy genome -v -q hiseq_101 -c 1 hg38.fa.gz
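
To get a feel for the scale of such a run, the number of reads implied by a coverage value can be estimated as coverage × genome size / read length. The sketch below is a back-of-the-envelope calculation, not Sandy's internal method. The function name estimate_reads and the approximate hg38 size of 3,100,000,000 bases are assumptions made for illustration:

```shell
# Hypothetical helper: reads ~ coverage * genome_size / read_length
# (integer division; needs 64-bit shell arithmetic, as in bash).
estimate_reads() {
    local genome_size="$1" read_length="$2" coverage="$3"
    echo $(( coverage * genome_size / read_length ))
}

# Assumed ~3.1 Gb genome, 101-base reads, 1x coverage
estimate_reads 3100000000 101 1   # prints 30693069
```

Under these assumptions, 1x coverage corresponds to about 30 million reads of 101 bases.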

Sandy example for genome on Docker

$ docker run \
    --rm \
    -u $(id -u):$(id -g) \
    -v $(pwd -P):/mnt \
    -w /mnt \
    galantelab/sandy genome -v -q hiseq_101 -c 1 hg38.fa.gz

Transcriptome simulation

It is also straightforward to simulate an RNA sequencing (RNAseq) run using Sandy. The example in this section simulates an RNAseq run for the Illumina HiSeq platform with 30 million paired-end reads of 101 bases in length.

Reference annotation

If you don’t have the transcripts fasta file, first follow this step:

$ wget http://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_40/gencode.v40.transcripts.fa.gz

or

$ curl -O http://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_40/gencode.v40.transcripts.fa.gz
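
A quick sanity check on the downloaded file is to count its sequences, since each FASTA record begins with a '>' header line. The following is a minimal sketch, assuming gzip and grep are available. count_fasta_records is a hypothetical helper, not a Sandy command:

```shell
# Hypothetical helper: count the records in a gzipped FASTA file
# by counting the lines that start with '>'.
count_fasta_records() {
    gzip -dc "$1" | grep -c '^>'
}
```

For the file above, the call would be count_fasta_records gencode.v40.transcripts.fa.gz.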

Sandy example for transcriptome

$ sandy transcriptome -v -q hiseq_101 -f liver -n 30000000 gencode.v40.transcripts.fa.gz

Sandy example for transcriptome on Docker

$ docker run \
    --rm \
    -u $(id -u):$(id -g) \
    -v $(pwd -P):/mnt \
    -w /mnt \
    galantelab/sandy transcriptome -v -q hiseq_101 -f liver -n 30000000 gencode.v40.transcripts.fa.gz