Data Interpretation

To assess the quality of a sequencing reaction, one must view the electropherogram. Several parameters that affect the results can be observed there, and these cannot be determined from the text files. Please keep in mind that the peaks are the actual data; the base calls can be (and indeed often are) wrong. Listed below are factors to consider when interpreting your sequencing results:

Signal strength

As the signal-to-noise ratio decreases, the likelihood of miscalls increases. The most likely reasons for low signal strength include insufficient template or primer, the presence of salt or other contaminants in the reaction mix, absence of the primer's annealing site, and secondary structure. Signal that is too strong can also lead to miscalls, because the instrument's detection system is overwhelmed.
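One simple way to put a number on signal strength is sketched below. It assumes you have exported peak heights and a stretch of baseline readings from the trace; the function name and the definition used here (mean peak height divided by the standard deviation of the baseline) are illustrative assumptions, not the values reported by the sequencer's analysis software.

```python
from statistics import mean, pstdev

def signal_to_noise(peak_heights, baseline_samples):
    """Mean peak height divided by the standard deviation of the baseline.

    This definition is an assumption made for illustration only.
    """
    noise = pstdev(baseline_samples)
    if noise == 0:
        return float("inf")  # perfectly flat baseline
    return mean(peak_heights) / noise

# Hypothetical values read off a single trace
peaks = [820, 760, 905, 790]
baseline = [10, 14, 8, 12, 11, 9]
snr = signal_to_noise(peaks, baseline)
print(round(snr, 1))
```

A ratio that drops sharply from one region of a read to the next is a cue to inspect the peaks in that region by eye.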

Compressions, gaps, and N's during fluorescent automated sequencing

For a given capillary, the analysis software expects a base call at regular time intervals. If the fluorescently labeled fragments migrate more quickly or more slowly than expected, a compression, a gap, or an N can occur. Always examine these regions very carefully to confirm that the software has made the correct call. Remember that your eyes can detect much more than a computer program.
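The idea of peaks arriving earlier or later than expected can be sketched as a spacing check. The example assumes a list of peak positions (in scan units) taken from the trace; the function name and the 40% tolerance are hypothetical, and the output only marks regions worth inspecting by eye.

```python
from statistics import median

def flag_irregular_spacing(peak_positions, tolerance=0.4):
    """Flag intervals between neighboring peaks that deviate from the median.

    Returns (interval_index, label) pairs. The tolerance is an arbitrary
    illustrative value, not an instrument setting.
    """
    if len(peak_positions) < 2:
        return []
    spacings = [b - a for a, b in zip(peak_positions, peak_positions[1:])]
    typical = median(spacings)
    flagged = []
    for i, spacing in enumerate(spacings):
        if spacing < typical * (1 - tolerance):
            flagged.append((i, "compression"))  # peaks crowded together
        elif spacing > typical * (1 + tolerance):
            flagged.append((i, "gap"))  # peak arrived later than expected
    return flagged

# Hypothetical peak positions
positions = [100, 112, 124, 129, 141, 153, 175, 187]
flags = flag_irregular_spacing(positions)
print(flags)
```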

Dye blobs occur due to incomplete removal of unincorporated fluorescently labeled ddNTPs. They are especially a problem in very weak reactions. In most cases the correct sequence can still be read just under the dye blob. Where the signal is too weak, repeating the reaction with a higher template concentration will help.

Top-heavy data, in which the signal is strong at the start of the read and then falls off, can occur for a few reasons. The first explanation is that too much template has been added to the sequencing reaction. In this scenario the signal intensity decreases due to depletion of the fluorescently labeled ddNTPs. The second possibility is the presence of secondary structure. G-rich templates, long repeat regions, or hairpins can lead to a gradual (though sometimes abrupt) decrease in signal. For G-rich templates, dGTP chemistry is helpful. Alternative cycling parameters can also be employed to get through repeats and hairpins.

Multiple products are seen as peaks within peaks. In such a sample the signal strength is good, but more than one peak is present at a given position. If, for plasmid DNA, the multiple products begin after the multiple cloning site, more than likely two clones are present. If the multiple products are present from the beginning of the sequence, then two priming sites probably exist. For PCR products there is also the possibility of an insertion or deletion.
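Because the position where the mixed signal begins helps distinguish these cases, a rough way to locate it is sketched below. It assumes per-position intensities for the four channels are available as dictionaries; the function names, the 0.3 ratio threshold, and the three-position window are hypothetical choices for illustration.

```python
def secondary_ratio(channels):
    """Ratio of the second-highest channel to the highest at one position."""
    ordered = sorted(channels.values(), reverse=True)
    primary, secondary = ordered[0], ordered[1]
    return secondary / primary if primary else 0.0

def first_mixed_position(trace, threshold=0.3, window=3):
    """Index where `window` consecutive positions first show a strong
    secondary peak, or None if no such run exists."""
    run = 0
    for i, channels in enumerate(trace):
        if secondary_ratio(channels) >= threshold:
            run += 1
            if run == window:
                return i - window + 1
        else:
            run = 0
    return None

# Hypothetical trace: four clean positions followed by three mixed ones
clean = {"A": 900, "C": 40, "G": 30, "T": 20}
mixed = {"A": 800, "C": 500, "G": 30, "T": 20}
trace = [clean] * 4 + [mixed] * 3
start = first_mixed_position(trace)
print(start)
```

A result near the start of the read points toward two priming sites, while a result just past the multiple cloning site points toward two clones.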

Slippage, most likely to occur in homopolymeric regions, is proposed to occur because the two DNA strands do not stay paired correctly during polymerization through the homopolymer region. Possible solutions are decreasing the extension temperature in cycle sequencing, sequencing from the opposite direction, or designing an anchored primer (20 T's followed by an A, C, G, or T). Slippage in a PCR product is best remedied by subcloning the fragment.
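As a small illustration, the sketch below locates homopolymer runs in a sequence and writes out the four anchored primers described above. The function names and the minimum run length of 8 are hypothetical; the run length at which slippage becomes a problem is not stated here and will vary.

```python
import re

def homopolymer_runs(sequence, min_length=8):
    """Return (start, base, length) for each single-base run of at least
    min_length. The default of 8 is an arbitrary illustrative cutoff."""
    pattern = re.compile(r"A+|C+|G+|T+")
    return [
        (m.start(), m.group()[0], len(m.group()))
        for m in pattern.finditer(sequence.upper())
        if len(m.group()) >= min_length
    ]

def anchored_primers(t_length=20):
    """A run of T's followed by each possible anchoring base."""
    return ["T" * t_length + base for base in "ACGT"]

# Hypothetical template fragment containing a run of ten T's
runs = homopolymer_runs("ACG" + "T" * 10 + "GCA")
primers = anchored_primers()
print(runs)
print(primers)
```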

Many good websites exist that include information on interpretation and troubleshooting of sequence data. A brief list is included below:

The Qiagen Guide to Template Purification and DNA Sequencing (2nd Edition)
Interpreting ABI 377 Chromatograms
Analysed Data Trouble Shooting
ABRF '97: Techniques at the Genome/Proteome Interface
DNASeq Data Evaluation

Electropherogram Viewers (freeware)

Chromas, for PC
Edit View, for Mac
Conversion Utility - needed to convert electropherogram files to be Mac compatible.

Troubleshooting Guide