DNA Library

DNA is the molecule that encodes genetic information for all life on Earth. It is a linear chain built from four bases: Adenine, Cytosine, Guanine, and Thymine.

When doing computations, we usually treat DNA sequences as strings, because a linear chain of bases maps naturally onto a sequence of characters. Each base becomes a letter (A, C, G, T), which lets us manipulate DNA sequences with ordinary string operations.

In this assignment, you will implement a simple library for working with DNA.

Assignment

For this assignment, you may use only a limited set of string operations. The following functions, methods, and operators are allowed on strings:

  • indexing (x[i]) and slicing (x[i:j])
  • append (with the += operator)
  • len, in, not in
  • equality comparison (with the == operator)

You may also use other data structures (e.g. tuples, sets, dicts), and any of their associated methods. You may assume that all letters are always uppercase.
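As a quick illustration, the permitted string operations look like this in Python. The variable names are only placeholders for this sketch and are not part of the assignment.

```python
dna = "ACGT"

first_base = dna[0]       # indexing: the base at position 0
middle = dna[1:3]         # slicing: positions 1 and 2

longer = dna
longer += "A"             # append with +=: builds a new, longer string

length = len(dna)         # number of bases
has_g = "G" in dna        # membership test
no_u = "U" not in dna     # negated membership test
same = dna == "ACGT"      # equality comparison
```

Note that `+=` never changes the original string; `dna` is still `"ACGT"` after `longer` has been extended.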

Implement the following functions on DNA sequences:

isValid()

isValid takes a string and returns whether it is a valid DNA sequence. A string is a valid DNA sequence if it contains only the characters A, C, G, and T.
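One possible sketch of isValid, using only the permitted string operations plus a set. The parameter name `dna` and the constant `VALID_BASES` are assumptions of this example. It also treats the empty string as valid, which the assignment does not specify.

```python
# Assumed helper: the set of characters allowed in a DNA string.
VALID_BASES = {"A", "C", "G", "T"}

def isValid(dna):
    """Return True if every character of dna is A, C, G, or T."""
    for i in range(len(dna)):
        if dna[i] not in VALID_BASES:
            return False
    # Assumption: an empty string has no invalid characters, so it passes.
    return True
```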

addBase()

addBase takes a DNA string and a single base (A, C, G, or T) and returns a new DNA string with the base added to the end.

countBases()

countBases takes a DNA string and a base, and counts how many times that base occurs in the DNA string.
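A minimal sketch of countBases that walks the string by index and compares each position with `==`. The parameter names `dna` and `base` are assumptions of this example.

```python
def countBases(dna, base):
    """Count how many positions of dna hold the given base."""
    count = 0
    for i in range(len(dna)):
        if dna[i] == base:
            count += 1
    return count
```

For instance, `countBases("ACGTAAC", "A")` returns 3.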

extendDNA()

extendDNA takes two DNA strings and returns the second DNA string appended to the first.
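Both addBase and extendDNA can be written with the `+=` operator alone. The sketch below is one way to do it; the parameter names are assumptions, and neither function validates its input.

```python
def addBase(dna, base):
    """Return a new string with a single base added to the end."""
    result = dna
    result += base
    return result

def extendDNA(first, second):
    """Return a new string consisting of first followed by second."""
    result = first
    result += second
    return result
```

Because Python strings are immutable, the caller's original string is left unchanged in both cases.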

insertBase()

insertBase takes three arguments: a DNA string, a base, and an index. It returns a new DNA string with the given base inserted at the given index.
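Insertion can be expressed as two slices with the new base placed between them. This sketch assumes the index is between 0 and `len(dna)` inclusive and does no range checking; the parameter names are assumptions too.

```python
def insertBase(dna, base, index):
    """Return a new string with base placed at position index."""
    result = dna[:index]     # everything before the insertion point
    result += base           # the new base
    result += dna[index:]    # everything from the insertion point on
    return result
```

For instance, `insertBase("ACGT", "C", 2)` returns `"ACCGT"`.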

removeBase()

removeBase takes a single DNA string and returns a new DNA string with the last base removed.
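Since strings cannot be modified in place in Python, a sketch of removeBase returns a slice that stops one character before the end. How an empty string should be handled is not stated in the assignment; this version simply returns an empty string in that case.

```python
def removeBase(dna):
    """Return a new string without the last base of dna."""
    # Slicing up to (but not including) the final position.
    return dna[:-1]
```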

main()

Finally, write a main() function that does the following actions, in order:

  • Generates the DNA string "ACGAGCATGGACTACTGACGAGGAACCCTTTT"
  • Checks that this is a valid string
  • Appends "A"
  • Appends "G"
  • Appends "T"
  • Prints how many "C"s are in the string
  • Extends the DNA string with "AGCTAGGAT"
  • Inserts "C" at index 4
  • Removes 4 bases from the end of the string

Each of these steps should be done with a single function call to one of the functions that you've written (except the last one, which will need 4 function calls).
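The sequence of steps could be sketched as follows. The helper functions are compact stand-ins so that the example runs on its own. Two details are assumptions rather than requirements: stopping early when the string is invalid, and returning the final string from main() so the result can be inspected.

```python
def isValid(dna):
    for i in range(len(dna)):
        if dna[i] not in "ACGT":
            return False
    return True

def addBase(dna, base):
    dna += base
    return dna

def countBases(dna, base):
    count = 0
    for i in range(len(dna)):
        if dna[i] == base:
            count += 1
    return count

def extendDNA(first, second):
    first += second
    return first

def insertBase(dna, base, index):
    result = dna[:index]
    result += base
    result += dna[index:]
    return result

def removeBase(dna):
    return dna[:-1]

def main():
    dna = "ACGAGCATGGACTACTGACGAGGAACCCTTTT"
    if not isValid(dna):
        # Assumption: the assignment does not say what to do here.
        print("Invalid DNA sequence")
        return dna
    dna = addBase(dna, "A")
    dna = addBase(dna, "G")
    dna = addBase(dna, "T")
    print(countBases(dna, "C"))
    dna = extendDNA(dna, "AGCTAGGAT")
    dna = insertBase(dna, "C", 4)
    # Removing 4 bases takes 4 separate calls to removeBase.
    dna = removeBase(dna)
    dna = removeBase(dna)
    dna = removeBase(dna)
    dna = removeBase(dna)
    # Assumption: returning the result is a convenience for checking.
    return dna

if __name__ == "__main__":
    main()
```

Each function returns a new string, so main() has to reassign `dna` after every call; calling a function without storing its result would leave `dna` unchanged.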

Insight

None for this assignment. Keep this code around for the next assignment though!

Submission

Submit a single file named DNA1.py on Canvas. Your file needs to compile and run. It should also have a header with the following information (this goes in your source file, not in the program output):

# File: DNA1.py
# Student: 
# Course: Intro to Programming
# 
# Date:
# Description of Program:

The description should be a short (1-3 sentence) description of what the program does. Do not describe how it's written!