How to use JR-Assembler?
0. Confirm that the installation is complete.
1. Copy or link the read files to the corresponding subdirectory in WORKSPACE as follows:
Single-read library to "./WORKSPACE/single"
Overlapping paired-end library to "./WORKSPACE/overlap"
Non-overlapping paired-end library to "./WORKSPACE/paired-end"
Mate-pair library to "./WORKSPACE/mate-pair"
2. Edit the sample "configure.txt" file in the home directory of JR-Assembler (see below).
3. Generate the execution script
> ./JR_script configure.txt MyProject.sh
Note: Users can choose their own names for the configure file (configure.txt) and the script file (MyProject.sh), e.g., "./JR_script E.coli.txt Ecoli.sh".
4. Run the execution script
> sh MyProject.sh
5. The assembly results will be in the "output" subdirectory of WORKSPACE.
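Step 1 can be scripted. The POSIX shell sketch below creates the four WORKSPACE subdirectories and symlinks the overlapping paired-end and mate-pair files named in the sample configure.txt. The function name setup_workspace and the raw-data directory passed to it are hypothetical; adapt the file names to your own libraries.

```sh
#!/bin/sh
# setup_workspace WS RAW
#   WS  : the WORKSPACE directory (e.g. ./WORKSPACE)
#   RAW : directory holding the raw FASTQ files (hypothetical; use an
#         absolute path so the symbolic links resolve from inside WS)
setup_workspace() {
  ws=$1; raw=$2
  # One subdirectory per library type, as listed in step 1.
  mkdir -p "$ws/single" "$ws/overlap" "$ws/paired-end" "$ws/mate-pair" || return 1
  # Link (instead of copy) the files named in the sample configure.txt.
  ln -sf "$raw/frag_1.fastq" "$ws/overlap/frag_1.fastq" &&
  ln -sf "$raw/frag_2.fastq" "$ws/overlap/frag_2.fastq" &&
  ln -sf "$raw/shortjump_1.fastq" "$ws/mate-pair/shortjump_1.fastq" &&
  ln -sf "$raw/shortjump_2.fastq" "$ws/mate-pair/shortjump_2.fastq"
}
```

For example, run setup_workspace ./WORKSPACE /data/ecoli (a hypothetical path) from the JR-Assembler home directory, then continue with steps 2 to 4. Symbolic links avoid duplicating large FASTQ files on disk.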
=======================================================================================
How to edit a configure file?
1. We suggest making a copy of "configure.txt" and editing on the copy.
2. Please do not change the order of the items in the configure file.
3. Lines starting with "#" are comments and are skipped. Users can add their own comments to the configure file.
4. There are five major parts in the configure file, shown below with their default values.
(a) Paths to JR-Assembler and third-party executables
# A. Paths to JR-Assembler and third-party executables
JR_Assembler_BIN= ./bin
Project_HOME_DIR= .
SSPACE_PATH= ./public_tools/SSPACE-BASIC-2.0_linux-x86_64/SSPACE_Basic_v2.0.pl
MDUST_PATH= ./public_tools/mdust/mdust
FLASH_PATH= ./public_tools/FLASH-1.2.7/flash
SOAPec_DIR= ./public_tools/SOAPec_v2.01
Note: If the third-party tools are installed in other locations, users should set the paths accordingly.
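Because the configure file is a plain list of "key= value" lines, a short shell sketch can read a value back and warn when a configured third-party path does not exist. conf_get and check_tools are hypothetical helpers, not part of JR-Assembler, and the check assumes it is run from the JR-Assembler home directory so that the relative default paths resolve.

```sh
#!/bin/sh
# conf_get KEY FILE
#   Print the value of the first "KEY= value" line in FILE.
#   index() is a plain-text match, so keys containing spaces or quotes work.
conf_get() {
  awk -v k="$1" '
    index($0, k "=") == 1 {
      v = substr($0, length(k) + 2)   # text after "KEY="
      sub(/^[ \t]+/, "", v)           # drop the space after "="
      sub(/[ \t\r]+$/, "", v)         # drop trailing blanks or CR
      print v
      exit
    }' "$2"
}

# check_tools FILE
#   Report every third-party path in part (a) that does not exist;
#   returns non-zero if any is missing.
check_tools() {
  missing=0
  for key in SSPACE_PATH MDUST_PATH FLASH_PATH SOAPec_DIR; do
    path=$(conf_get "$key" "$1")
    if [ ! -e "$path" ]; then
      echo "missing: $key -> $path" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, check_tools configure.txt prints nothing and succeeds when all four tools are where the configure file says they are.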
(b) Paths to input data files
# B. Raw read information
Project name= test
Number of single-end libraries= 0
# path_to_file read length
# Overlapping paired-end library: number
Number of overlapping paired-end libraries= 1
# path_to_file1 path_to_file2 insert_size read_length
WORKSPACE/overlap/frag_1.fastq WORKSPACE/overlap/frag_2.fastq 180 101
# Non-overlapping paired-end library: number
Number of non-overlapping paired-end libraries= 0
# path_to_file1 path_to_file2 insert_size read_length
# WORKSPACE/paired-end/pe_1.fastq WORKSPACE/paired-end/pe_2.fastq 400 101
# Mate-pair library: number
Number of mate-pair libraries= 1
# path_to_file1 path_to_file2 insert_size read_length
WORKSPACE/mate-pair/shortjump_1.fastq WORKSPACE/mate-pair/shortjump_2.fastq 3500 101
Note:
* Input raw data can be single-read, overlapping or non-overlapping paired-end, and mate-pair libraries, placed in the corresponding folders in WORKSPACE.
* Instead of copying or linking the raw data to the corresponding folders in WORKSPACE, users can set the file paths above to point directly to where the raw data are stored.
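Each paired library line follows the "path_to_file1 path_to_file2 insert_size read_length" layout given in the comments above. The hypothetical helper check_library_line below splits one such line and checks that both files exist, which is useful when the paths point outside WORKSPACE. It is a sketch for paired libraries only; single-end lines have a different layout.

```sh
#!/bin/sh
# check_library_line LINE
#   Split "path_to_file1 path_to_file2 insert_size read_length",
#   verify both read files exist, and echo the two numeric fields.
check_library_line() (
  read -r file1 file2 insert_size read_length <<EOF
$1
EOF
  for f in "$file1" "$file2"; do
    [ -e "$f" ] || { echo "not found: $f" >&2; exit 1; }
  done
  # Both numeric fields must consist of digits only.
  for n in "$insert_size" "$read_length"; do
    case $n in ''|*[!0-9]*) echo "not a number: '$n'" >&2; exit 1 ;; esac
  done
  printf 'insert_size=%s read_length=%s\n' "$insert_size" "$read_length"
)
```

For example, check_library_line "WORKSPACE/overlap/frag_1.fastq WORKSPACE/overlap/frag_2.fastq 180 101" prints insert_size=180 read_length=101 when both files are present.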
(c) Raw data processing
# C. Raw data processing
# Detect and mark low-complexity regions on reads by mdust
Low-complexity region removal= Yes
# Trim low quality bases at 5' end
Number of bases at 5' end to be trimmed= 0
# Trim low quality bases at 3' end
Assembly Read Length= 76
# Correct base by SOAPec
Base correction= Yes
# Connect the overlapping paired read by Flash
Overlapping paired read connection= Yes
# If Yes, the connected reads will be resampled into
# a set of reads with a fixed length equal to "Assembly Read Length"
Resampling read shift= 5
Note:
* Low-complexity region removal (Yes/No)
This removes reads containing one or more low-complexity regions (detected by mdust) from the assembly.
* Number of bases at 5' end to be trimmed (non-negative integer)
When sequencing errors are expected at the 5' ends of reads, set this to the number of bases to be trimmed from the 5' end (default=0).
* Assembly Read Length (positive integer)
Reads longer than this length are trimmed from the 3' end to this length for assembly by JR-Assembler, so this option also lets users trim low-quality bases at the 3' ends of reads. Note that reads shorter than this length are used only for scaffolding.
* Base correction (Yes/No)
With this option, JR-Assembler calls the third-party tool SOAPec to correct putative erroneous bases in reads.
* Overlapping paired read connection (Yes/No)
With this option, and if an overlapping paired-end library is present, JR-Assembler calls the third-party tool FLASH to connect overlapping paired-end reads. This generates longer reads of various lengths, which are then resampled.
* Resampling read shift (positive integer)
For connected reads, JR-Assembler uses a sliding window to resample reads of a fixed length. The window size is the "Assembly Read Length", and this option sets the step size of the sliding window.
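To make the trimming and resampling options concrete, the POSIX shell sketch below applies "Number of bases at 5' end to be trimmed", "Assembly Read Length", and "Resampling read shift" to a single sequence. trim_read and resample are illustrative names, and the sketch assumes only full-length windows are kept; JR-Assembler's own handling of the end of a connected read may differ.

```sh
#!/bin/sh
# trim_read SEQ N5 LEN
#   Drop N5 bases from the 5' end, then cut the read from the 3' end down
#   to LEN bases. Reads that end up shorter than LEN are printed unchanged
#   (per the notes, JR-Assembler uses such reads only for scaffolding).
trim_read() (
  seq=$(printf '%s\n' "$1" | cut -c "$(($2 + 1))-")
  printf '%s\n' "$seq" | cut -c "1-$3"
)

# resample SEQ W S
#   Slide a window of W bases (Assembly Read Length) along a connected read
#   in steps of S bases (Resampling read shift), printing each full window.
#   Assumption: a final partial window is discarded.
resample() (
  seq=$1; w=$2; s=$3; i=0
  while [ $((i + w)) -le ${#seq} ]; do
    printf '%s\n' "$seq" | cut -c "$((i + 1))-$((i + w))"
    i=$((i + s))
  done
)
```

For example, resample ACGTACGTAC 4 3 prints ACGT, TACG, and GTAC on separate lines: windows start at positions 0, 3, and 6, and a window at position 9 would run past the end of the 10-base read. A smaller shift yields more, more heavily overlapping reads.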
(d) JR-Assembler kernel parameters
# D. JR-Assembler kernel parameters
Minimum overlap length= 30
Maximum overlap length= 45
Minimum remapping ratio= 0.2
Minimum contig length= 300
Number of thread= 8
Note:
* Minimum overlap length (positive integer).
For extension, JR-Assembler uses only reads whose overlap with the tail is at least this length (default=30).
* Maximum overlap length (positive integer).
The maximum overlap between reads considered for extension. In theory, this value can be set to (Assembly Read Length)-1, but the search space and running time then increase dramatically. Based on comprehensive tests, we suggest setting the minimum and maximum overlap lengths to 30 and 45, respectively, which generally gives good performance for reads longer than 100 bp.
* Minimum remapping ratio (a float between 0 and 1).
The default value is 0.2. For high read coverage, e.g., >200X, setting this value to 0.3 speeds up assembly and reduces false seed extensions.
* Minimum contig length (positive integer).
Contigs shorter than this length are not output.
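The notes above imply a few constraints between the kernel parameters. The hypothetical function check_kernel_params below tests them before a long run; it is a convenience sketch, not a check that JR-Assembler itself performs, and it treats the remapping-ratio bounds 0 and 1 as inclusive (an assumption).

```sh
#!/bin/sh
# check_kernel_params MIN_OVERLAP MAX_OVERLAP READ_LENGTH RATIO
#   Prints "ok" and succeeds if the values respect the ranges in the notes.
check_kernel_params() (
  min=$1; max=$2; readlen=$3; ratio=$4
  fail() { echo "error: $*" >&2; exit 1; }

  [ "$min" -gt 0 ] || fail "Minimum overlap length must be a positive integer"
  [ "$max" -ge "$min" ] || fail "Maximum overlap length must be >= the minimum"
  # The notes give (Assembly Read Length)-1 as the theoretical upper bound.
  [ "$max" -le $((readlen - 1)) ] ||
    fail "Maximum overlap length must be <= Assembly Read Length - 1"
  # POSIX sh has integer arithmetic only, so awk compares the float.
  awk -v r="$ratio" 'BEGIN { exit !(r >= 0 && r <= 1) }' ||
    fail "Minimum remapping ratio must be between 0 and 1"
  echo ok
)
```

With the default values, check_kernel_params 30 45 76 0.2 prints ok; raising the maximum overlap to 80 with a 76 bp Assembly Read Length fails.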
(e) Contig order construction
# E. Contig order construction:
Contig order construction= Yes
Small gap filling= Yes
Note:
* Contig order construction (Yes/No)
If "Yes", JR-Assembler calls the third-party tool SSPACE to construct the contig order.
* Small gap filling (Yes/No)
If "Yes", JR-Assembler closes regions with little read support in the final step.
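Yes/No switches such as these, or any other item, can also be changed on a copy of the configure file from the command line without disturbing the item order. The helper set_option below is a hypothetical sketch; it rewrites only the line that starts with the given key followed by "=".

```sh
#!/bin/sh
# set_option KEY VALUE FILE
#   Rewrite the line "KEY= ..." in FILE as "KEY= VALUE". Comment lines and
#   all other lines are copied unchanged, so the item order is preserved.
set_option() {
  tmp=$(mktemp) || return 1
  # index() does a plain-text match, so keys such as "5' end" need no escaping.
  awk -v k="$1" -v v="$2" '
    index($0, k "=") == 1 { print k "= " v; next }
    { print }
  ' "$3" > "$tmp" || { rm -f "$tmp"; return 1; }
  # Copy back with cat so FILE keeps its original permissions.
  cat "$tmp" > "$3"
  rm -f "$tmp"
}
```

For example: cp configure.txt MyProject.txt, then set_option "Small gap filling" No MyProject.txt, then ./JR_script MyProject.txt MyProject.sh.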