DNA Library Part 2

The DNA library we made last time had a few disadvantages. In this assignment, we will fix some of them and add a few new features.

As a reminder, you are limited to the following operations on strings:

indexing (x[i]) and slicing (x[i:j])
append (with the += operator)
len, in, not in
equality comparison (with the == operator)

but you may use other structures provided by Python (tuples, dicts, sets, etc.) and any methods/functions associated with them.
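
To give a feel for working inside these limits, here is a small sketch that is not part of the assignment. The helper name distinct_chars is made up for illustration. It uses only indexing, len, and += on the string, and leans on a set for the bookkeeping.

```python
def distinct_chars(text):
    """Return the characters of text in first-seen order, without repeats."""
    seen = set()      # sets (and their methods) are allowed
    result = ""
    for i in range(len(text)):
        ch = text[i]          # indexing is allowed
        if ch not in seen:    # membership test on a set
            seen.add(ch)
            result += ch      # += is the allowed way to build a string
    return result


print(distinct_chars("GATTACA"))  # GATC
```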

Assignment

Your first goal will be to convert the DNA library we made last time into a class-based library. Start by writing the following class:

class DNASeq(object):
    def __init__(self, seq):
        self.sequence = seq

Then convert each of the functions from the previous assignment into methods of this class. The __init__() method should call isValid() and return None if the result is not a valid DNASeq.

All the other methods we implemented last time should be turned into methods that modify the DNASeq object (except for isValid() and countBases()). As a reminder, these are the methods from last time:

isValid()
addBase()
countBases()
extendDNA()
insertBase()
removeBase()

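Remember that every method must take self as its first parameter when you define it. You do not pass self yourself when calling the method; Python fills it in with the object the method was called on.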
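
As a rough illustration of how self works, here is a sketch of two of the methods. The behavior shown for isValid() and addBase() is an assumption, since this handout does not restate what they did last time, and the validity check inside __init__() is left out on purpose.

```python
class DNASeq(object):
    def __init__(self, seq):
        self.sequence = seq

    def isValid(self):
        # Assumed behavior: True when every character is A, C, G, or T.
        for i in range(len(self.sequence)):
            if self.sequence[i] not in "ACGT":
                return False
        return True

    def addBase(self, base):
        # Assumed behavior: append one base to the end of this object's sequence.
        self.sequence += base


dna = DNASeq("ACG")
dna.addBase("T")         # Python passes dna in as self
print(dna.sequence)      # ACGT
print(dna.isValid())     # True
```

Note that addBase() changes dna itself and returns nothing, while isValid() only inspects the sequence.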

Then, add the following methods to the class:

myFind()

myFind takes two arguments: self, and a base (A, C, G, or T). It returns a list of all indices in self.sequence where that base occurs. That is, the following code should never print anything:

for i in dna.myFind('A'):
  if dna.sequence[i] != 'A':
    print("Something went wrong!")
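
One possible shape for myFind, offered as a sketch rather than the required solution: walk the sequence by index and collect every position whose base matches. The class here contains only what the example needs.

```python
class DNASeq(object):
    def __init__(self, seq):
        self.sequence = seq

    def myFind(self, base):
        indices = []
        for i in range(len(self.sequence)):
            if self.sequence[i] == base:   # equality comparison on strings
                indices.append(i)          # list methods are allowed
        return indices


dna = DNASeq("ACGAGCA")
print(dna.myFind("A"))  # [0, 3, 6]
print(dna.myFind("T"))  # []
```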

myReverse()

myReverse takes self as its only argument, and reverses the DNA sequence in-place.
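
A minimal sketch of an in-place reversal that stays within the allowed string operations: build the reversed string with += while indexing from the last position down to 0, then store it back on the object. Other approaches are possible; this is just one.

```python
class DNASeq(object):
    def __init__(self, seq):
        self.sequence = seq

    def myReverse(self):
        reversed_seq = ""
        # Count down from the last index to 0.
        for i in range(len(self.sequence) - 1, -1, -1):
            reversed_seq += self.sequence[i]
        # "In-place" here means the object is updated; nothing is returned.
        self.sequence = reversed_seq


dna = DNASeq("AACG")
dna.myReverse()
print(dna.sequence)  # GCAA
```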

myComplement()

myComplement takes self as its only argument, and does not modify self. Instead, it generates the complement sequence of DNA, puts it in a new DNASeq object, and returns that. The DNA complement sequence is the original sequence subjected to the following rules:

All occurrences of A in the original sequence become T in the complement
All occurrences of T in the original sequence become A in the complement
All occurrences of C in the original sequence become G in the complement
All occurrences of G in the original sequence become C in the complement
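
The four rules above fit naturally into a dict, which you are allowed to use. The sketch below assumes a module-level table named COMPLEMENT (a name invented for this example) and shows that the original object is left unchanged.

```python
# Hypothetical lookup table: each base maps to its complement.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


class DNASeq(object):
    def __init__(self, seq):
        self.sequence = seq

    def myComplement(self):
        result = ""
        for i in range(len(self.sequence)):
            result += COMPLEMENT[self.sequence[i]]
        return DNASeq(result)   # a new object; self is not modified


dna = DNASeq("AACG")
comp = dna.myComplement()
print(comp.sequence)  # TTGC
print(dna.sequence)   # AACG
```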

main()

Finally, write a main() function that does the following actions, in order:

Generates the DNA string "ACGAGCATGGACTACTGACGAGGAACCCTTTT"
Checks that this is a valid string
Appends "A"
Appends "G"
Appends "T"
Prints how many "C"s are in the string
Extends the DNA string with "AGCTAGGAT"
Inserts "C" at index 4
Removes 4 bases from the end of the string
Reverses the sequence
Prints the complement sequence

and call this from the top level of your program.
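
For reference, here is one common way to call main() from the top level. The body is only a placeholder, not the steps listed above, and the __name__ check is a widely used Python idiom rather than a requirement of this assignment; a bare main() call at the bottom of the file also works.

```python
def main():
    # Placeholder body: your version performs the listed steps in order.
    dna_string = "ACGAGCATGGACTACTGACGAGGAACCCTTTT"
    print("Starting with", len(dna_string), "bases")


# Runs main() when the file is executed directly (for example, python DNA2.py).
if __name__ == "__main__":
    main()
```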

Insight

Compare the difficulty of writing the main() function in this assignment versus the last assignment. Did you find it easier or harder? Why? Write one or two sentences summarizing your thoughts.

Submission

Submit a single file named DNA2.py on Canvas. Your file needs to compile and run. It should also have a header with the following information (this goes in your source file, not in the program output):

# File: DNA2.py
# Student: 
# Course: Intro to Programming
# 
# Date:
# Description of Program:
# Was the main() function easier or harder to write? Why?

The description should be a short (1-3 sentence) description of what the program does. Do not describe how it's written!