JR-Assembler
Frequently Asked Questions (FAQs)
Q1. Is JR-Assembler free software?
Yes, it is free for academic use.
Q2. What kind of machine do I need to run JR-Assembler?
JR-Assembler was developed on 64-bit Linux and has been tested on Linux and Mac OS.
Q3. What kinds of reads can I use for JR-Assembler?
JR-Assembler accepts single-end, overlapping and non-overlapping paired-end, and mate-pair reads.
Q4. Can I use Illumina data along with 454 data?
JR-Assembler was designed mainly for Illumina data. One way to incorporate 454 data is to use the module trimReadKmer (see Instructions) to transform the 454 reads into fixed-length k-mers, because JR-Assembler currently handles only reads of the same length.
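To illustrate what "fixed-length k-mers" means here, the following is a minimal sketch that cuts a variable-length read into pieces of one fixed length. It is not the trimReadKmer implementation, whose actual behavior is described in Instructions; the sliding-window approach, the function name, and the step parameter are all assumptions made for illustration.

```python
def fixed_length_kmers(read, k, step=1):
    """Yield every substring of length k from read, moving by step bases.

    Illustration only: trimReadKmer may choose its k-mers differently.
    """
    for start in range(0, len(read) - k + 1, step):
        yield read[start:start + k]


# A 7-base toy read cut into 5-mers; reads shorter than k yield nothing.
kmers = list(fixed_length_kmers("ACGTACG", k=5))
print(kmers)
```

Every piece produced this way has the same length, which is the property JR-Assembler requires of its input.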
Generation of the executable script
Q1. Why can't I use JR_script to generate the executable script?
JR_script first checks whether all the required tools can be accessed before generating an execution script file from the configuration file. If any of the tools cannot be found, the execution script will not be generated.
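Before running JR_script, it can help to check tool availability yourself. The sketch below assumes the tools are expected on your PATH; JR_script may locate them differently (for example, through paths given in the configuration file), and the tool names listed are placeholders rather than the actual list of required tools.

```python
import shutil


def find_missing_tools(tool_names):
    """Return the names in tool_names that cannot be found on PATH."""
    return [name for name in tool_names if shutil.which(name) is None]


# Placeholder list: replace with the tools named in your configuration file.
required = ["trimReadKmer", "JR_script"]
missing = find_missing_tools(required)
if missing:
    print("Not found on PATH:", ", ".join(missing))
else:
    print("All listed tools are accessible.")
```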
Execution problem
Q1. Why does JR-Assembler fail when the reads have various lengths?
The current version of JR-Assembler handles only reads of the same length because the kernel was designed around a fixed-length data structure.
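A quick way to confirm whether an input file mixes read lengths is to tally them before assembly. This is a minimal sketch, assuming a standard four-line-per-record FASTQ layout; the function name and the toy records are illustrative only.

```python
from collections import Counter


def read_length_counts(fastq_lines):
    """Count read lengths in an iterable of FASTQ lines (4 lines per record)."""
    counts = Counter()
    for i, line in enumerate(fastq_lines):
        if i % 4 == 1:  # the sequence is the second line of each record
            counts[len(line.strip())] += 1
    return counts


# Two toy records with different lengths (8 and 6 bases).
example = ["@r1", "ACGTACGT", "+", "IIIIIIII",
           "@r2", "ACGTAC", "+", "IIIIII"]
counts = read_length_counts(example)
if len(counts) > 1:
    print("Mixed read lengths found:", dict(counts))
else:
    print("All reads have the same length.")
```

For a real file, pass an open file handle instead of the list, for example `read_length_counts(open("reads.fastq"))`, where the file name is a placeholder.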
Q2. Why does JR-Assembler produce no assembly, or only a small subset of the assembly?
A possible reason is that the input reads contain too many sequencing errors, so only a few overlaps between reads can be found. There are two solutions in such a case: base correction and 3'-end trimming. If the input reads contain a moderate number of sequencing errors, a base correction tool is suggested because it does not shorten the reads. In contrast, if the sequencing error rate is high, trimming reads at the 3' end is recommended because most errors occur at the 3' end. Users can examine the quality scores to determine the number of bases to trim. Alternatively, one can try several values and pick the assembly with the longest N50 length.
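The trimming and comparison steps can be sketched as follows. This is an illustration under stated assumptions, not part of JR-Assembler: the function names are hypothetical, the N50 definition used is the common one (the contig length at which the cumulative sum of lengths, sorted from longest to shortest, reaches half of the total), and the contig lengths in the usage example are made-up numbers.

```python
def trim_3prime(seq, qual, n_bases):
    """Remove n_bases from the 3' end of a read and of its quality string."""
    if n_bases <= 0:
        return seq, qual
    return seq[:-n_bases], qual[:-n_bases]


def n50(contig_lengths):
    """Return the N50 of a list of contig lengths (0 for an empty list)."""
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0


# Trim 3 bases from the 3' end of a toy read.
print(trim_3prime("ACGTACGT", "IIIIIIII", 3))

# Made-up contig lengths from two trial assemblies, keyed by bases trimmed.
trials = {20: [100, 80, 50, 30, 20], 30: [150, 60, 40, 30]}
best = max(trials, key=lambda trimmed: n50(trials[trimmed]))
print("Trim value with the longest N50:", best)
```

Because all reads must keep the same length, the same number of bases is trimmed from every read.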
We plan to make JR-Assembler output the proportion of seed usage, which is the number of seeds used for either seed or contig extension divided by the total number of seeds. A high ratio indicates that most seeds either find overlapping reads for extension or are included in the assembled contigs, so the sequencing quality of the input reads should be good. In contrast, a low ratio indicates that most seeds cannot find overlapping reads for extension, which implies a high sequencing error rate. In this case, trimming reads at the 3' end is recommended.
Sequencing experiment design
Q1. Is there a suggested sequencing strategy for using JR-Assembler?
Similar to the strategies recommended by ALLPATHS-LG (7), we recommend using an overlapping paired-end library to generate longer connected reads for contig assembly and several mate-pair libraries with various insert lengths for long-distance jumping during scaffolding. The average genome coverage should be at least 100X to 150X. For overlapping paired-end reads, we provide a simple formula to calculate the insert size.
Insert size = (Read length (L) − number of bases to be trimmed) × 2 − maximum overlap length (m).
For example, if the read length is L = 150 bp, the maximum overlap length is m = 40 bp, and the number of bases to be trimmed at the 3' end is 30, then the recommended insert size is (150 − 30) × 2 − 40 = 200 bp.
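The insert-size formula can be written as a small helper for trying different library designs. The function and parameter names below are illustrative and not part of JR-Assembler.

```python
def recommended_insert_size(read_length, trimmed_bases, max_overlap):
    """Insert size = (read length - bases to be trimmed) * 2 - maximum overlap."""
    return (read_length - trimmed_bases) * 2 - max_overlap


# Values from the example: L = 150 bp, 30 bases trimmed, m = 40 bp.
print(recommended_insert_size(150, 30, 40))  # 200
```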
For a small or medium genome (≤ 50 Mb), we suggest using MiSeq to sequence the genome. Because JR-Assembler uses whole reads to assemble contigs and MiSeq produces longer reads than HiSeq2000, the MiSeq reads can jump over repeats more frequently. On the other hand, for a large genome (~1 Gb or larger), we recommend HiSeq2000 because of its much higher data throughput for genome assembly.
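To translate a coverage target into a sequencing yield, a rough estimate is coverage = total sequenced bases / genome size. The sketch below applies that standard relation to paired-end reads; the function name is hypothetical, the 150 bp read length is borrowed from the insert-size example rather than from any instrument specification, and bases lost to 3'-end trimming are not accounted for.

```python
import math


def read_pairs_for_coverage(genome_size_bp, coverage, read_length_bp):
    """Read pairs needed so that (2 * read length * pairs) / genome size >= coverage."""
    return math.ceil(coverage * genome_size_bp / (2 * read_length_bp))


# A 50 Mb genome at 100X coverage with 2 x 150 bp reads.
print(read_pairs_for_coverage(50_000_000, 100, 150))
```

If reads are trimmed before assembly, use the trimmed read length so that the effective coverage still meets the target.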