Detailed Description of JR-Assembler
The complete JR-Assembler pipeline consists of 11 modules (source code in the "src"
directory). Instead of relying on the configure file, users can also build a customized
pipeline directly from the modules. A customized pipeline should include six major steps:
(A) Pre-processing of raw reads (modules 2 to 5),
(B) Building of a read table (module 6),
(C) Assembly of contigs (module 7),
(D) Assembly of scaffolds (using third-party tools),
(E) Pre-processing for gap closing (modules 8 to 10), and
(F) Closing of gaps (module 11).
Each module is described in detail below:
1. JR_script.cpp
(a) Description
JR_script generates a shell script file to execute the whole assembly process in batch.
(b) Usage:
JR_script configure.txt myAssembly.sh
-configure.txt configure file for assembly, which should be edited for each project
-myAssembly.sh the generated shell script file
2. toFasta.cpp
(a) Description
toFasta converts reads in fastq format to fasta format.
(b) Usage:
cat myReads.fq | toFasta > myReads.fa
-myReads.fq input fastq file
-myReads.fa output fasta file
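The conversion performed by toFasta can be illustrated with a minimal Python sketch. This is not the module's actual C++ code; the function name `fastq_to_fasta` is hypothetical, and the sketch assumes that every fastq record occupies exactly four lines (header, sequence, separator, quality).

```python
def fastq_to_fasta(lines):
    """Convert 4-line fastq records into fasta lines (header, sequence)."""
    # Assumption: each record is exactly 4 lines: @header, sequence, +, quality.
    records = [line.rstrip("\n") for line in lines if line.strip()]
    fasta = []
    # Step through complete records only; an incomplete trailing record is ignored.
    for i in range(0, len(records) - 3, 4):
        header, sequence = records[i], records[i + 1]
        if not header.startswith("@"):
            raise ValueError(f"expected '@' header, got: {header!r}")
        fasta.append(">" + header[1:])  # swap the fastq '@' for the fasta '>'
        fasta.append(sequence)          # the quality line is dropped
    return fasta
```

Passing the lines of a fastq file to `fastq_to_fasta` returns the corresponding fasta lines, which can then be written out one per line.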
3. filterN.cpp
(a) Description
filterN filters out reads that contain more than a given number of 'N' bases.
(b) Usage:
cat myReads.fa | filterN numOfN > myReads_filterN.fa
-myReads.fa read fasta file
-numOfN maximum number of N's allowed in a read. When <numOfN> is set to 0, no 'N' is allowed in a read.
-myReads_filterN.fa output fasta file
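The filtering rule can be sketched in a few lines of Python. The function names are hypothetical and the code is only an illustration of the rule described above, not the module's implementation; counting lowercase 'n' as well is an assumption.

```python
def passes_n_filter(sequence, max_n):
    """Return True if the read contains at most max_n 'N' bases."""
    # Assumption: lowercase 'n' is counted too; the original tool may differ.
    return sequence.upper().count("N") <= max_n


def filter_reads(reads, max_n):
    """Keep the (name, sequence) pairs whose sequence passes the N filter."""
    return [(name, seq) for name, seq in reads if passes_n_filter(seq, max_n)]
```

With `max_n` set to 0, any read containing an 'N' is dropped, which matches the behavior described for <numOfN> = 0.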
4. trimReadKmer.cpp
(a) Description
trimReadKmer trims reads from the 5' end and the 3' end. It also generates kmers by
scanning each read with a sliding window; this step can be omitted by setting <Kmer> to
the read length.
(b) Usage:
trimReadKmer 5'len 3'len Kmer shift
-5'len number of bases at the 5' end to be trimmed
-3'len number of bases at the 3' end to be trimmed
-Kmer kmer size
-shift step size of the sliding window for generating kmers
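The two operations of trimReadKmer can be sketched as follows. The function names are hypothetical, and the sketch assumes that the sliding window is applied to the read after trimming, which the description above does not state explicitly.

```python
def trim_read(read, five_len, three_len):
    """Remove five_len bases from the 5' end and three_len bases from the 3' end."""
    # Guard against trimming more bases than the read contains.
    end = max(len(read) - three_len, five_len)
    return read[five_len:end]


def sliding_kmers(read, kmer, shift):
    """Return the kmers obtained by moving a window of size kmer in steps of shift."""
    if kmer <= 0 or shift <= 0:
        raise ValueError("kmer and shift must be positive")
    return [read[i:i + kmer] for i in range(0, len(read) - kmer + 1, shift)]
```

When `kmer` equals the length of the (trimmed) read, `sliding_kmers` returns the read itself as the only kmer, so kmer generation is effectively skipped.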
5. filterPolymer.cpp
(a) Description
filterPolymer filters out low-complexity reads. The module first calculates the ratio of
each nucleotide in a read. If the highest nucleotide ratio is greater than a cutoff, the
read is filtered out.
(b) Usage:
cat myReads.fa | filterPolymer maxRatio > myReads_filterPolymer.fa
-myReads.fa read fasta file
-maxRatio cutoff ratio
-myReads_filterPolymer.fa output fasta file
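The low-complexity test can be illustrated with a short Python sketch. The function names are hypothetical; treating every character (including 'N') as a nucleotide and ignoring case are assumptions, and the original module may handle these differently.

```python
from collections import Counter


def max_nucleotide_ratio(sequence):
    """Return the fraction of the read taken up by its most frequent nucleotide."""
    if not sequence:
        return 0.0
    # Assumption: case is ignored and every symbol (including 'N') is counted.
    counts = Counter(sequence.upper())
    return max(counts.values()) / len(sequence)


def is_low_complexity(sequence, max_ratio):
    """Return True if the read would be filtered out at the given cutoff."""
    return max_nucleotide_ratio(sequence) > max_ratio
```

For example, a 10-base read with eight A's has a highest nucleotide ratio of 0.8 and would be filtered out at a cutoff of 0.7.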
6. buildTable.cpp
(a) Description
buildTable generates a read table from reads in fasta format.
(b) Usage:
cat myReads.fa | buildTable > myReads.his
-myReads.fa input read file
-myReads.his the generated read table file
The first line of the read table gives the total number of reads and the read length.
Read information starts from the second line of the file.
7. JR.cpp
(a) Description
JR is the kernel module for assembly.
(b) Usage:
JR fileName minOverlap maxOverlap ratio threadNum
-fileName read table file, generated by buildTable
-minOverlap minimum overlap length required between reads during assembly
-maxOverlap maximum overlap length between reads during assembly
-ratio minimum remapping ratio required during assembly
-threadNum number of threads used for assembly
8. contigLenFilter.cpp
(a) Description
contigLenFilter filters out contigs by length. This module takes two length parameters,
<minLength> and <maxLength>. Contigs shorter than <minLength> or longer than <maxLength>
are filtered out.
(b) Usage:
contigLenFilter myContigs.fa minLength maxLength > myContigs_filtered.fa
-myContigs.fa contig file
-minLength minimum contig length; shorter contigs are filtered out
-maxLength maximum contig length; longer contigs are filtered out
-myContigs_filtered.fa output fasta file
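The length filter can be sketched in Python as shown below. Both function names are hypothetical, and the fasta parser is a generic helper written for this illustration rather than code taken from the module.

```python
def parse_fasta(lines):
    """Yield (name, sequence) pairs from fasta lines; sequences may span lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def filter_contigs_by_length(contigs, min_length, max_length):
    """Keep contigs whose length lies within [min_length, max_length]."""
    return [(name, seq) for name, seq in contigs
            if min_length <= len(seq) <= max_length]
```

Contigs whose length equals <minLength> or <maxLength> are kept here, following the description that only shorter or longer contigs are filtered out.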
9. trimContigEnd.cpp
(a) Description
trimContigEnd trims contigs from the 5' end and the 3' end.
(b) Usage:
trimContigEnd myContigs.fa trimLength > myContigs_trimmed.fa
-myContigs.fa contig file
-trimLength number of bases to be trimmed at the 5' end and at the 3' end
-myContigs_trimmed.fa output contig fasta file
10. getUnused.cpp
(a) Description
getUnused outputs the reads that are not assembled into the contigs.
(b) Usage:
getUnused myContigs.fa myReadTable.his > myReads_unused.fa
-myContigs.fa contig fasta file
-myReadTable.his read table file, output by buildTable
-myReads_unused.fa output read file
11. JRgapcloser.cpp
(a) Description
JRgapcloser closes gaps between contigs using un-assembled reads or short contigs that
can bridge two adjacent contigs.
(b) Usage:
JRgapcloser myContigs.fa gaps.fa myScaffold.evidence > myContigs_gapcloser.fa
-myContigs.fa contig fasta file
-gaps.fa file of the un-assembled reads and/or short contigs
-myScaffold.evidence evidence file, which contains the order information of the contigs and is output by the third-party tool SSPACE
-myContigs_gapcloser.fa output fasta file