JR-Assembler








Detailed Description of JR-Assembler


The complete JR-Assembler pipeline consists of 11 modules (source code in the "src" directory). Besides using the configure file, users can build a customized pipeline from the individual modules. A customized pipeline should include six major steps:

(A) Pre-processing of raw reads (module 2 to module 5),

(B) Building of a read table (module 6),

(C) Assembly of contigs (module 7),

(D) Assembly of scaffolds (use third-party tools),

(E) Pre-processing of gap closing (module 8 to module 10), and

(F) Closing of gaps (module 11).


Below is the detailed description of each module:

1.     JR_script.cpp

(a) Description

JR_script generates a shell script file to execute the whole assembly process in batch.


(b) Usage:

JR_script configure.txt myAssembly.sh

-configure.txt       configure file for assembly, which should be edited for each project

-myAssembly.sh  the generated shell script file


2.     toFasta.cpp

(a) Description

toFasta converts reads in fastq format to fasta format.


(b) Usage:

cat myReads.fq | toFasta > myReads.fa

-myReads.fq         input fastq file

-myReads.fa         output fasta file
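
To illustrate the conversion toFasta performs, the following is a minimal Python sketch, not the module's actual code. It assumes strict four-line fastq records (header, sequence, "+", quality); the function name fastq_to_fasta is hypothetical.

```python
def fastq_to_fasta(lines):
    """Yield fasta lines from an iterable of fastq lines.

    Assumption: every record is exactly four lines
    (@header, sequence, +, quality); multi-line records are not handled.
    """
    lines = [line.rstrip("\n") for line in lines]
    for i in range(0, len(lines) - 3, 4):
        header, seq = lines[i], lines[i + 1]
        if not header.startswith("@"):
            raise ValueError(f"expected '@' at line {i + 1}")
        # Replace the leading '@' with '>' and drop the quality lines.
        yield ">" + header[1:]
        yield seq


fastq = ["@read1", "ACGT", "+", "IIII", "@read2", "GGCA", "+", "IIII"]
fasta = list(fastq_to_fasta(fastq))
```

Here fasta holds the header and sequence lines of the two reads, with the quality information discarded.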


3.     filterN.cpp

(a) Description

filterN filters out reads that contain more than a specified number of 'N' bases.


(b) Usage:

cat myReads.fa | filterN numOfN > myReads_filterN.fa

-myReads.fa         read fasta file

-numOfN              maximum number of N's allowed in a read

When <numOfN> is set to 0, no 'N' is allowed in a read.

-myReads_filterN.fa        output fasta file
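
The filtering rule can be sketched in Python as follows. This is an illustration only, not the module's source; the function name filter_n and the representation of reads as (header, sequence) pairs are assumptions.

```python
def filter_n(records, max_n):
    """Keep (header, sequence) records with at most max_n 'N' bases.

    Assumption: only uppercase 'N' is counted.
    """
    return [(h, s) for h, s in records if s.count("N") <= max_n]


reads = [("r1", "ACGT"), ("r2", "ACNT"), ("r3", "NNGT")]
kept_strict = filter_n(reads, 0)   # no 'N' allowed
kept_one = filter_n(reads, 1)      # at most one 'N' allowed
```

With a limit of 0 only r1 passes; with a limit of 1, r1 and r2 pass.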


4.     trimReadKmer.cpp

(a) Description

trimReadKmer trims reads from the 5' end and the 3' end. It also generates kmers by scanning each read with a sliding window; this step can be omitted by setting <Kmer> to the read length.


(b) Usage:

trimReadKmer 5'len 3'len Kmer shift

-5'len      number of bases at 5' end to be trimmed

-3'len      number of bases at 3' end to be trimmed

-Kmer     kmer size

-shift       step size of sliding window for generating kmers
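
The trimming and sliding-window steps can be sketched in Python as below. This is a hedged illustration rather than the module's implementation: the function name trim_and_kmers is hypothetical, and the sketch assumes that "read length" refers to the length after trimming and that a trailing window shorter than the kmer size is discarded.

```python
def trim_and_kmers(read, trim5, trim3, k, shift):
    """Trim a read at both ends, then cut it into kmers.

    Assumptions: shift >= 1, and incomplete windows at the
    end of the trimmed read are dropped.
    """
    trimmed = read[trim5:len(read) - trim3]
    # Slide a window of size k along the trimmed read in steps of `shift`.
    return [trimmed[i:i + k] for i in range(0, len(trimmed) - k + 1, shift)]


read = "ACGTACGTAC"
kmers = trim_and_kmers(read, 1, 1, 4, 2)
whole = trim_and_kmers(read, 1, 1, 8, 2)  # k equals the trimmed read length
```

When k equals the trimmed read length, the window fits only once, so the single "kmer" is the trimmed read itself, which corresponds to omitting kmer generation.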


5.     filterPolymer.cpp

(a) Description

filterPolymer filters out low-complexity reads. The module first calculates the ratio of each nucleotide in a read. If the highest nucleotide ratio is greater than a cutoff, the read is filtered out.


(b) Usage:

cat myReads.fa | filterPolymer maxRatio > myReads_filterPolymer.fa

-myReads.fa       read fasta file

-maxRatio          cutoff ratio

-myReads_filterPolymer.fa     output fasta file
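
The low-complexity test described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the module's code: the names max_base_ratio and filter_polymer are hypothetical, the ratio is taken as the count of the most frequent character divided by the read length, and empty reads are dropped.

```python
from collections import Counter


def max_base_ratio(seq):
    """Return the fraction of the read taken by its most frequent base."""
    counts = Counter(seq)
    return max(counts.values()) / len(seq)


def filter_polymer(records, max_ratio):
    """Keep (header, sequence) records whose highest base ratio
    does not exceed max_ratio."""
    return [(h, s) for h, s in records if s and max_base_ratio(s) <= max_ratio]


reads = [("poly", "AAAAAAAT"), ("mixed", "ACGTACGT")]
kept = filter_polymer(reads, 0.8)
```

In this example the read "AAAAAAAT" has a highest ratio of 7/8 and is filtered out at a cutoff of 0.8, while "ACGTACGT" (highest ratio 2/8) is kept.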


6.     buildTable.cpp

(a) Description

buildTable generates a read table from reads in fasta format.


(b) Usage:

cat myReads.fa | buildTable > myReads.his

-myReads.fa       input read file

-myReads.his     the generated read table file

The first line of the read table gives the total number of reads and the read length. Read information starts from the second line of the file.


7.     JR.cpp

(a) Description

JR is the kernel module for assembly.


(b) Usage:

JR fileName minOverlap maxOverlap ratio threadNum

-fileName           read table file, generated by buildTable

-minOverlap      minimum overlap length required between reads during assembly

-maxOverlap      maximum overlap length considered between reads during assembly

-ratio                   minimum remapping ratio required during assembly

-threadNum       number of threads used for assembly


8.     contigLenFilter.cpp

(a) Description

contigLenFilter filters contigs by length using two parameters, <minLength> and <maxLength>. Contigs shorter than <minLength> or longer than <maxLength> are filtered out.


(b) Usage:

contigLenFilter myContigs.fa minLength maxLength > myContigs_filtered.fa

-myContigs.fa    contig file

-minLength        minimum contig length; contigs shorter than this are filtered out

-maxLength       maximum contig length; contigs longer than this are filtered out

-myContigs_filtered.fa   output fasta file
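
The length rule can be sketched in Python as below. This is only an illustration; the function name contig_len_filter and the (header, sequence) representation are assumptions, and contigs whose length equals either bound are kept, following the description above.

```python
def contig_len_filter(contigs, min_length, max_length):
    """Keep (header, sequence) contigs with min_length <= length <= max_length."""
    return [(h, s) for h, s in contigs if min_length <= len(s) <= max_length]


contigs = [("c1", "ACG"), ("c2", "ACGTAC"), ("c3", "ACGTACGTACGT")]
kept = contig_len_filter(contigs, 4, 10)
```

Here c1 (length 3) is too short and c3 (length 12) is too long, so only c2 remains.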


9.     trimContigEnd.cpp

(a) Description

trimContigEnd trims contigs from the 5' end and the 3' end.


(b) Usage:

trimContigEnd myContigs.fa trimLength > myContigs_trimmed.fa

-myContigs.fa    contig file

-trimLength       the number of bases at the 5' end and 3' end to be trimmed

-myContigs_trimmed.fa output fasta file
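
The end trimming can be sketched in Python as follows. This is a minimal illustration, not the module's source: the function name trim_contig_ends is hypothetical, and returning an empty string for contigs no longer than twice the trim length is an assumption, since the handling of such contigs is not described here.

```python
def trim_contig_ends(seq, trim_length):
    """Remove trim_length bases from both the 5' and 3' ends of a contig.

    Assumption: a contig with at most 2 * trim_length bases
    is trimmed away entirely.
    """
    if len(seq) <= 2 * trim_length:
        return ""
    return seq[trim_length:len(seq) - trim_length]


trimmed = trim_contig_ends("AACCGGTT", 2)
too_short = trim_contig_ends("ACGT", 2)
```

The first call removes two bases from each end of "AACCGGTT"; the second shows the assumed behavior for a contig that is too short to survive trimming.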


10.  getUnused.cpp

(a) Description

getUnused outputs the reads that are not assembled in the contigs.


(b) Usage:

getUnused MyContigs.fa MyReadTable.his > Myreads_unused.fa

-MyContigs.fa                 contig fasta file

-MyReadTable.his          read table file, output by buildTable

-Myreads_unused.fa       output read file


11.  JRgapcloser.cpp

(a) Description

JRgapcloser closes gaps between contigs using un-assembled reads or short contigs that can bridge two adjacent contigs.


(b) Usage:

JRgapcloser myContigs.fa gaps.fa myScaffold.evidence > myContigs_gapcloser.fa

-myContigs.fa    contig fasta file

-gaps.fa               file of the un-assembled reads and/or short contigs

-myScaffold.evidence     evidence file containing the order information of contigs, output by the third-party tool SSPACE

-myContigs_gapcloser.fa output fasta file









Questions: jr-assembler@iis.sinica.edu.tw