Returns SeqRecord objects from an ACE file.

This uses the Bio.Sequencing.Ace module to do the hard work. Note
that by iterating over the file in a single pass, we are forced to ignore
any WA, CT, RT or WR footer tags.

ACE files include a base quality for each position, which is taken
to be a PHRED style score. Just as if you had used Bio.SeqIO to read a
FASTQ or QUAL file containing PHRED scores, these are stored in the
SeqRecord's letter_annotations dictionary under the "phred_quality"
key.

>>> from Bio import SeqIO
>>> with open("Ace/consed_sample.ace") as handle:
...     for record in SeqIO.parse(handle, "ace"):
...         quals = record.letter_annotations["phred_quality"]
...         print(record.id, record.seq[:10] + "...", len(record))
...         print(max(q for q in quals if q is not None))
Contig1 agccccgggc... 1475
90

However, ACE files do not include a base quality for any gaps in the
consensus sequence, and these are represented in Biopython with a quality
of None. Using zero would be misleading as there may be very strong
evidence to support the gap in the consensus.

>>> from Bio import SeqIO
>>> with open("Ace/contig1.ace") as handle:
...     for record in SeqIO.parse(handle, "ace"):
...         quals = record.letter_annotations["phred_quality"]
...         print(record.id, "..." + record.seq[85:95] + "...")
...         print(quals[85:95])
...         # None cannot be compared with int, so skip gap positions.
...         print(max(q for q in quals if q is not None))
Contig1 ...AGAGG-ATGC...
[57, 57, 54, 57, 57, None, 57, 72, 72, 72]
90
Contig2 ...GAATTACTAT...
[68, 68, 68, 68, 68, 68, 68, 68, 68, 68]
90
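
Because gap positions carry None rather than a number, any arithmetic on
the quality list has to skip them first. The following is only a minimal
sketch of one way to do that: mean_quality is a hypothetical helper and not
part of Biopython, and the input list is copied from the Contig1 slice
shown above rather than read from a file.

```python
def mean_quality(qualities):
    """Return the mean PHRED score, ignoring None entries (consensus gaps).

    Hypothetical helper for illustration; not part of Biopython.
    Returns None when no position has a score.
    """
    scored = [q for q in qualities if q is not None]
    if not scored:
        return None
    return sum(scored) / len(scored)


# Quality values copied from the Contig1 slice [85:95] shown above.
window = [57, 57, 54, 57, 57, None, 57, 72, 72, 72]
print(mean_quality(window))
```

Skipping the None entries, rather than substituting zero, keeps a well
supported gap from dragging the average down.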