Sequence-dependent difficulties
Fig. 4: Sequence-dependent difficulties

Many people need to sequence DNA that contains repeat sequences. The
problem with a repeat region lies in how it consumes the labelled dNTP
and in how the sequencing enzyme behaves in the region. In a homopolymer
region the problems are two-fold. One, the relevant labelled nucleotide
is used up very quickly - especially when the repeat region is long or
a lot of target DNA is present - and this may affect incorporation
further on in the sequence run. This often leads to the sequence run
petering out midway through a homopolymer region. Two, the enzyme used
in sequencing (TaqFS+) is subject to a certain amount of slippage -
usually you get ladders of n +/- 1, 2, 3, etc. in your sequence as the
enzyme keeps falling off and rejoining at random places in the
homopolymer region. In the worst case you will just see waves of all
four colours after the repeat has ended that do not relate to anything
at all - they only show that your reaction has failed at this point.
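
To check a known or draft sequence for stretches likely to cause the homopolymer problems described above, a minimal Python sketch along these lines could be used. The function name `homopolymer_runs` and the default threshold of 8 bases are illustrative assumptions, not values from this guide; choose a cut-off that matches your own experience.

```python
import re

def homopolymer_runs(seq, min_len=8):
    """Return (start, base, length) for every single-base run of at least min_len bases.

    min_len=8 is an illustrative threshold, not a value taken from the text.
    Positions are 0-based.
    """
    if min_len < 1:
        raise ValueError("min_len must be at least 1")
    # ([ACGT]) captures one base; \1{k,} requires k or more further copies of it.
    pattern = re.compile(r"([ACGT])\1{%d,}" % (min_len - 1))
    return [(m.start(), m.group(1), len(m.group(0)))
            for m in pattern.finditer(seq.upper())]


if __name__ == "__main__":
    example = "ACGT" + "A" * 10 + "GCT"
    for start, base, length in homopolymer_runs(example):
        print(f"poly({base}) run of {length} bases starting at position {start}")
```

Any run it reports is a candidate for the signal drop-off or slippage ladders described above; it says nothing about whether a particular reaction will actually fail.
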

As for multiple-base repeat sequences, the problems usually depend on the base composition. The worst repeats tend to be GC based, as these have the greatest difficulty incorporating bases during the sequencing reaction, and may even form hairpin loops that kill the sequencing altogether (see below). AT-rich repeat regions, by contrast, tend to sequence better than most other regions.

So, what can you do with sequence data such as this? A good first approach is to insert a number of transposons into the fragment and sequence from those, as this effectively shortens the repeat region without having to cut it physically. If this doesn't work, you might at least want to start sequencing from this region. To do so, you can have poly(N) X, Y, Z primers made (where N is the repeat base and X, Y and Z are the remaining bases) and use these in your sequencing reaction. You will then be able to generate sequence off the end of the homopolymer region, as the last base anchors the primer.

Another issue is the GC content of the template DNA in general. GC-rich regions can be extremely hard to sequence through. The reason is that the dye-labelled terminators are incorporated less efficiently than usual, leading to compressions (especially in ...gcgcgcgcgc... type repeat regions), and where runs of Gs are followed by runs of Cs, hairpin loops can form that will kill a sequencing reaction dead.
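
The anchored poly(N) primers and the GC content issue described above can both be illustrated with a short Python sketch. Everything named here is an assumption made for illustration: the function names, the primer length of 20 repeat bases, and the 50-base window with a 0.7 GC cut-off are not taken from this guide. Whether you order the sequences as written or their reverse complements depends on which strand you are reading, so check them against your own template.

```python
def anchored_primers(repeat_base, length=20):
    """Build the three poly(N)+anchor primers for a homopolymer of repeat_base.

    Each primer is `length` copies of the repeat base followed by one of the
    three other bases. length=20 is an illustrative value only.
    """
    bases = "ACGT"
    repeat_base = repeat_base.upper()
    if len(repeat_base) != 1 or repeat_base not in bases:
        raise ValueError("repeat_base must be one of A, C, G or T")
    return [repeat_base * length + anchor for anchor in bases if anchor != repeat_base]


def gc_rich_windows(seq, window=50, threshold=0.7):
    """Return (start, gc_fraction) for each window whose GC fraction >= threshold.

    window and threshold are illustrative defaults, not recommended settings.
    Positions are 0-based; a sequence shorter than the window is scored whole.
    """
    seq = seq.upper()
    if not seq:
        return []
    hits = []
    for start in range(0, max(len(seq) - window + 1, 1)):
        chunk = seq[start:start + window]
        gc = (chunk.count("G") + chunk.count("C")) / len(chunk)
        if gc >= threshold:
            hits.append((start, gc))
    return hits


if __name__ == "__main__":
    # Three primers for a poly(T) region, each anchored by a different last base.
    for primer in anchored_primers("T", length=20):
        print(primer)
    # Flag windows in a made-up sequence that may be hard to sequence through.
    print(gc_rich_windows("GC" * 30 + "AT" * 30, window=50, threshold=0.7)[:3])
```

A flagged window is only a hint that compressions or hairpins are possible there; this simple count does not model secondary structure.
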