JR-Assembler

Last modified: 2014-09-17

How to use JR-Assembler?

0. Confirm that the installation is complete.

1. Copy or link the read files to the corresponding subdirectories of WORKSPACE as follows:

Single-read library to "./WORKSPACE/single"

Overlapping paired-end library to "./WORKSPACE/overlap"

Non-overlapping paired-end library to "./WORKSPACE/paired-end"

Mate-pair library to "./WORKSPACE/mate-pair"

2. Edit the sample "configure.txt" file in the home directory of JR-Assembler (see below)

3. Generate the execution script

> ./JR_script configure.txt MyProject.sh

Note: Users can choose their own names for the configure file (configure.txt) and the script file (MyProject.sh), e.g., "./JR_script E.coli.txt Ecoli.sh".

4. Run the execution script

> sh MyProject.sh

5. The assembly results will be in the "output" subdirectory of WORKSPACE.
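Steps 1, 3, and 4 can also be scripted. The Python sketch below is a hypothetical helper, not part of JR-Assembler: `link_reads` symlinks read files into the WORKSPACE subdirectories named in step 1, and `run_assembly` issues the same two commands shown in steps 3 and 4. It assumes it is run from the JR-Assembler home directory and that WORKSPACE sits there.

```python
import subprocess
from pathlib import Path

# WORKSPACE subdirectories named in step 1.
LIBRARY_DIRS = ("single", "overlap", "paired-end", "mate-pair")


def link_reads(workspace, library_dir, read_files):
    """Symlink read files into WORKSPACE/<library_dir> (hypothetical helper).

    Returns the paths of the links; links that already exist are left untouched.
    """
    if library_dir not in LIBRARY_DIRS:
        raise ValueError(f"unknown library directory: {library_dir}")
    target_dir = Path(workspace) / library_dir
    target_dir.mkdir(parents=True, exist_ok=True)
    links = []
    for read_file in read_files:
        source = Path(read_file).resolve()  # absolute target, valid from any cwd
        link = target_dir / source.name
        if not (link.exists() or link.is_symlink()):
            link.symlink_to(source)
        links.append(link)
    return links


def assembly_commands(config="configure.txt", script="MyProject.sh"):
    """The commands from steps 3 and 4, as argument lists."""
    return [["./JR_script", config, script], ["sh", script]]


def run_assembly(config="configure.txt", script="MyProject.sh", home="."):
    """Generate and run the execution script; raises CalledProcessError on failure."""
    for cmd in assembly_commands(config, script):
        subprocess.run(cmd, cwd=home, check=True)
```

For example, `link_reads("WORKSPACE", "overlap", ["/data/frag_1.fastq", "/data/frag_2.fastq"])` followed by `run_assembly("E.coli.txt", "Ecoli.sh")`; the `/data` paths are placeholders.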

=======================================================================================

How to edit a configure file?

1. We suggest making a copy of "configure.txt" and editing on the copy.

2. Do not change the order of the items in the configure file.

3. Lines starting with "#" are comments and are skipped. Users can add their own comments to the configure file.

4. The configure file has five major parts, (a) to (e) below, each shown with its default values.

(a) Paths to JR-Assembler and third-party executables

# A. Paths to JR-Assembler and third-party executables

JR_Assembler_BIN= ./bin

Project_HOME_DIR= .

SSPACE_PATH= ./public_tools/SSPACE-BASIC-2.0_linux-x86_64/SSPACE_Basic_v2.0.pl

MDUST_PATH= ./public_tools/mdust/mdust

FLASH_PATH= ./public_tools/FLASH-1.2.7/flash

SOAPec_DIR= ./public_tools/SOAPec_v2.01

Note: If the third-party tools are installed at other locations, users should set the paths accordingly.
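The comment and ordering rules above make the configure file easy to read programmatically, for example to check part A before a long run. The following Python sketch is an assumption-based reader, not JR-Assembler's own parser: it treats `Key= value` lines as settings and every other non-comment line as a data record, keeps file order, and reports part-A entries whose paths do not exist. Treating keys ending in `_PATH` as files and keys ending in `_DIR` or `_BIN` as directories is inferred from the sample values.

```python
from pathlib import Path


def parse_configure(text):
    """Return configure-file entries in file order.

    Assumed format, based on the sample file: '#' lines are comments,
    'Key= value' lines are settings, and other non-blank lines are records
    (such as library lines), returned as lists of fields.
    """
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments are skipped
        key, sep, value = line.partition("=")
        if sep:
            entries.append(("setting", key.strip(), value.strip()))
        else:
            entries.append(("record", None, line.split()))
    return entries


def missing_paths(entries, home="."):
    """Return keys of path settings whose target is absent under `home`.

    Assumption: '_PATH' keys name files; '_DIR' and '_BIN' keys name directories.
    """
    missing = []
    for kind, key, value in entries:
        if kind != "setting":
            continue
        path = Path(home) / value  # an absolute value overrides `home`
        if key.endswith("_PATH") and not path.is_file():
            missing.append(key)
        elif key.endswith(("_DIR", "_BIN")) and not path.is_dir():
            missing.append(key)
    return missing
```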

 

(b) Paths to input data files

# B. Raw read information

Project name= test

Number of single-end libraries= 0

# path_to_file read_length

 

# Overlapping paired-end library: number

Number of overlapping paired-end libraries= 1

# path_to_file1 path_to_file2 insert_size read_length

WORKSPACE/overlap/frag_1.fastq WORKSPACE/overlap/frag_2.fastq 180 101

 

# Non-overlapping paired-end library: number

Number of non-overlapping paired-end libraries= 0

# path_to_file1 path_to_file2 insert_size read_length

# WORKSPACE/paired-end/pe_1.fastq WORKSPACE/paired-end/pe_2.fastq 400 101

 

# Mate-pair library: number

Number of mate-pair libraries= 1

# path_to_file1 path_to_file2 insert_size read_length

WORKSPACE/mate-pair/shortjump_1.fastq WORKSPACE/mate-pair/shortjump_2.fastq 3500 101

Note:

* Input raw data can come from single-read, overlapping paired-end, non-overlapping paired-end, and mate-pair libraries, placed in the corresponding folders of WORKSPACE.

* Instead of copying or linking the raw data into the corresponding folders of WORKSPACE, users can set the paths in the configure file to point directly to where the raw data are stored.
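In the sample file, each `Number of ... libraries=` line is followed by that many library lines, so a count mismatch is an easy mistake to make. The Python sketch below is a hypothetical checker rather than part of JR-Assembler: it reads part B and returns one record per library, assuming (as the comment headers above suggest) that single-read lines hold `path read_length` and paired lines hold `path1 path2 insert_size read_length`, and that the declared count is required.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Library:
    kind: str                   # e.g. "overlapping paired-end"
    files: List[str]
    insert_size: Optional[int]  # None for single-read libraries
    read_length: int


def parse_libraries(text):
    """Return the libraries declared in part B, checking each declared count.

    Assumes every 'Number of <kind> libraries= N' line is followed by N
    uncommented library lines. Other settings and comments are ignored,
    so the whole configure file can be passed in.
    """
    libraries = []
    kind, expected = None, 0
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        label, sep, value = line.partition("=")
        label = label.strip()
        if sep:
            if label.startswith("Number of ") and label.endswith(" libraries"):
                if expected:
                    raise ValueError(f"{expected} {kind} library line(s) missing")
                kind = label[len("Number of "):-len(" libraries")]
                expected = int(value)
            continue  # other settings, e.g. 'Project name= test'
        if expected == 0:
            raise ValueError(f"unexpected library line: {line}")
        fields = line.split()
        if len(fields) == 2:    # path read_length
            libraries.append(Library(kind, fields[:1], None, int(fields[1])))
        elif len(fields) == 4:  # path1 path2 insert_size read_length
            libraries.append(Library(kind, fields[:2], int(fields[2]), int(fields[3])))
        else:
            raise ValueError(f"cannot parse library line: {line}")
        expected -= 1
    if expected:
        raise ValueError(f"{expected} {kind} library line(s) missing")
    return libraries
```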

(c) Raw data processing

# C. Raw data processing

# Detect and mark low-complexity regions on reads by mdust

Low-complexity region removal= Yes

 

# Trim low-quality bases at 5' end

Number of bases at 5' end to be trimmed= 0

 

# Trim low-quality bases at 3' end

Assembly Read Length= 76

 

# Correct base by SOAPec

Base correction= Yes

 

# Connect the overlapping paired read by Flash

Overlapping paired read connection= Yes

 

# If Yes, the connected reads will be resampled into

# a set of reads with a fixed length equal to "Assembly Read Length"

Resampling read shift= 5

Note:

* Low-complexity region removal (Yes/No)

This option excludes from assembly any read that contains one or more low-complexity regions detected by mdust.

* Number of bases at 5' end to be trimmed (non-negative integer)

When sequencing errors are expected at the 5' ends of reads, set the number of bases to be trimmed from the 5' end (default=0).

* Assembly Read Length (positive integer)

Reads longer than this length will be trimmed from the 3' end to this length for assembly by JR-Assembler. This also allows users to trim low-quality bases at the 3' ends of reads. Note that reads shorter than this length will be used only for scaffolding.

* Base correction (Yes/No)

With this option, JR-Assembler will call the third-party tool, SOAPec, to correct putative error bases in reads.

* Overlapping paired read connection (Yes/No)

With this option and the presence of an overlapping paired-end library, JR-Assembler calls the third-party tool FLASH to connect overlapping paired-end reads. This produces longer reads of various lengths, which are then resampled into fixed-length reads.

* Resampling read shift (positive integer)

JR-Assembler uses a sliding window to resample connected reads into reads of fixed length. The window size is the "Assembly Read Length", and this value sets the step size, i.e., the number of bases the window moves at each step.
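The two length operations described in these notes can be sketched in Python as follows. This illustrates the described behaviour and is not JR-Assembler's code: `trim_read` applies the 5' trim and then cuts the read from the 3' end to "Assembly Read Length" (assuming the 5' trim happens first, which this page does not state), and `resample` slides a window of that length along a FLASH-connected read, moving "Resampling read shift" bases per step. Whether JR-Assembler also keeps a final window flush with the 3' end of the read is not documented; this sketch simply stops at the last full window.

```python
def trim_read(seq, five_prime_trim=0, assembly_read_length=76):
    """Trim `five_prime_trim` bases from the 5' end, then cut the read from
    the 3' end down to `assembly_read_length`.

    Returns None if the trimmed read is shorter than `assembly_read_length`;
    per the notes above, such reads are used only for scaffolding.
    Assumption: the 5' trim is applied before the length check.
    """
    seq = seq[five_prime_trim:]
    if len(seq) < assembly_read_length:
        return None
    return seq[:assembly_read_length]


def resample(connected_read, assembly_read_length=76, shift=5):
    """Return fixed-length reads taken by a sliding window that starts at the
    5' end and moves `shift` bases per step; a partial last window is dropped."""
    last_start = len(connected_read) - assembly_read_length
    return [connected_read[start:start + assembly_read_length]
            for start in range(0, last_start + 1, shift)]
```

With the defaults (Assembly Read Length= 76, Resampling read shift= 5), a 180-bp connected read yields 21 resampled reads, starting at offsets 0, 5, ..., 100. A smaller shift produces more, more heavily overlapping reads from the same connected read.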

 

(d) JR-Assembler kernel parameters

# D. JR-Assembler kernel parameters

Minimum overlap length= 30

Maximum overlap length= 45

Minimum remapping ratio= 0.2

Minimum contig length= 300

Number of thread= 8

Note:

* Minimum overlap length (positive integer).

For extension, JR-Assembler uses only reads that overlap the extending sequence at its tail by at least this many bases (default=30).

* Maximum overlap length (positive integer).

The maximum overlap between reads considered for extension. In theory, this value can be set to (Assembly Read Length)-1, but the search space and running time then increase dramatically. Based on comprehensive tests, we suggest setting the minimum and maximum overlap lengths to 30 and 45, respectively, which in general gives good performance for reads >100 bp.

* Minimum remapping ratio (a float between 0 and 1).

The default value is 0.2. For high read coverage, e.g., >200X, setting this value to 0.3 speeds up assembly and decreases false seed extensions.

* Minimum contig length (positive integer).

Contigs shorter than this length will not be output.
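To show what the two overlap bounds control, here is a deliberately naive Python sketch that checks whether a read can extend a contig tail. It is not JR-Assembler's extension algorithm, which this page does not describe; the function names are hypothetical, and a practical assembler would use an index instead of comparing every read. In this scan, each read costs one comparison per candidate length from the maximum down to the minimum, so widening the 30-45 window directly increases the work.

```python
def tail_overlap(contig, read, min_overlap=30, max_overlap=45):
    """Return the longest k with min_overlap <= k <= max_overlap such that the
    last k bases of `contig` equal the first k bases of `read`; 0 if none.

    k is also capped at len(read) - 1 so that the read adds at least one base,
    mirroring the theoretical limit of (Assembly Read Length) - 1.
    """
    upper = min(max_overlap, len(contig), len(read) - 1)
    for k in range(upper, min_overlap - 1, -1):  # try the longest overlap first
        if contig.endswith(read[:k]):
            return k
    return 0


def extension_candidates(contig, reads, min_overlap=30, max_overlap=45):
    """Return (read, overlap, new_bases) for each read that can extend the contig."""
    hits = []
    for read in reads:
        k = tail_overlap(contig, read, min_overlap, max_overlap)
        if k:
            hits.append((read, k, read[k:]))
    return hits
```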

 

(e) Contig order construction and gap filling

# E. Contig order construction:

Contig order construction= Yes

Small gap filling= Yes

Note:

* Contig order construction (Yes/No)

If "Yes", JR-Assembler calls the third-party tool SSPACE to determine the order of contigs.

* Small gap filling (Yes/No)

If "Yes", in the final step JR-Assembler fills small gaps in regions supported by few reads.

Questions: jr-assembler@iis.sinica.edu.tw