Programming Assignment 4

  1. Create the following variables:

    w = 10.2
    x = 1.3
    y = 2.8
    z = 17.5
    dna1 = 'attattaggaccaca'
    dna2 = 'attattaggaacaca'
    species1 = 'diplodocus'
    species2 = 'tyrannosaurus'
    

    and use them to print whether or not the following statements are True or False:

    1. w is greater than 10
    2. w + x is less than 15
    3. x is greater than y
    4. 2 * x + 0.2 is equal to y
    5. dna1 is the same as dna2
    6. dna1 is not the same as dna2
    7. The number of occurrences of the base t is the same in dna1 and dna2
    8. x times w is between 13.2 and 13.5
    9. species2 comes before species1 alphabetically
    10. w is greater than x, and y is greater than z
    11. dna1 is longer than 5 bases, or z is less than w * x, or both
    12. The combined length of the two dna sequences is greater than or equal to 30
    13. (w + x + y) divided by the logarithm (base 10) of 100 is equal to 7.15
    14. The GC content (which is always a percentage) of dna1 is not the same as the GC content of dna2
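
Statements 4 and 13 test floating-point results for equality, which can give surprising answers because most decimal fractions have no exact binary representation. The sketch below uses unrelated, hypothetical values (not the assignment's variables) to show the effect and one common workaround, `math.isclose` from the standard library. It also shows the string tools that statements 7 and 9 rely on. Whether the assignment expects the exact or the tolerant comparison is not stated.

```python
import math

# Hypothetical values chosen for illustration; not the assignment's variables.
a = 0.1
b = 0.2

# Exact equality on floats can fail even when the arithmetic "looks" right.
exact_match = (a + b == 0.3)

# math.isclose compares within a relative tolerance instead.
close_match = math.isclose(a + b, 0.3)

print(exact_match)   # False
print(close_match)   # True

# Strings: str.count tallies non-overlapping occurrences of a substring,
# and < compares strings lexicographically (alphabetical for lowercase text).
seq = 'gattaca'                # hypothetical sequence
print(seq.count('a'))          # 3
print('apple' < 'banana')      # True
```
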
  2. The following function is intended to check whether two geographic points are close to one another. If they are, it should return True; if they aren’t, it should return False. Two points are considered near each other if the absolute difference in their latitudes is less than one and the absolute difference in their longitudes is less than one. Fill in each _________ in the function to make it work, then use it to check whether the following pairs of points are near each other and print the answers.

    1. Point 1: latitude = 29.65, longitude = -82.33. Point 2: latitude = 41.74, longitude = -111.83.
    2. Point 1: latitude = 29.65, longitude = -82.33. Point 2: latitude = 30.5, longitude = -82.8.
    3. Point 1: latitude = 48.86, longitude = 2.35. Point 2: latitude = 41.89, longitude = 2.5.

    def near(lat1, long1, lat2, long2):
        """Check if two geographic points are near each other""" 
        if (abs(lat1 - lat2) < 1) and (_________):
            near = True
        else:
            near = _________
        return near
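
One way to organize the three checks is to store each pair of values as a tuple and loop over the pairs. The sketch below assumes a hypothetical helper, `within(a, b, tolerance)`, and made-up one-dimensional values. It illustrates the absolute-difference test and the loop pattern only; it is not the completed `near` function.

```python
def within(a, b, tolerance):
    """Return True if a and b differ by less than tolerance (hypothetical helper)."""
    # A comparison already evaluates to True or False, so it can be returned directly.
    return abs(a - b) < tolerance

# Made-up pairs of values for illustration.
pairs = [(10.0, 10.4), (10.0, 12.5), (-3.2, -2.9)]

results = [within(a, b, 1) for a, b in pairs]
for (a, b), result in zip(pairs, results):
    print(a, b, result)
```

Returning the comparison directly avoids the if/else, although the explicit if/else form in the exercise is equally valid.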
    
  3. Write a function, dna_or_rna(sequence), that determines whether a sequence of bases is DNA, RNA, or cannot be identified from the sequence provided (for example, because it contains neither base that distinguishes the two). Because the sequence is all the function knows about the material, the only way to tell DNA from RNA is that RNA has the base uracil (u) instead of the base thymine (t). Have the function return one of three outputs: ‘DNA’, ‘RNA’, or ‘UNKNOWN’. Use the function and a for loop to print the type of each sequence in the following list.

    sequences = ['ttgaatgccttacaactgatcattacacaggcggcatgaagcaaaaatatactgtgaaccaatgcaggcg',
                 'gauuauuccccacaaagggagugggauuaggagcugcaucauuuacaagagcagaauguuucaaaugcau',
                 'gaaagcaagaaaaggcaggcgaggaagggaagaagggggggaaacc',
                 'guuuccuacaguauuugaugagaaugagaguuuacuccuggaagauaauauuagaauguuuacaacugcaccugaucagguggauaaggaagaugaagacu',
                 'gauaaggaagaugaagacuuucaggaaucuaauaaaaugcacuccaugaauggauucauguaugggaaucagccggguc']
    

    Optional: For a little extra challenge, make your function work with both uppercase and lowercase letters, or even strings with mixed capitalization.
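
For the optional part, a common approach is to normalize the case once before testing for a base. A minimal sketch, assuming a hypothetical helper `has_base` and made-up sequences that are not part of the assignment:

```python
def has_base(sequence, base):
    """Return True if base occurs in sequence, ignoring case (hypothetical helper)."""
    # str.lower() returns a new lowercase string; the original is unchanged.
    return base.lower() in sequence.lower()

examples = ['ACGT', 'acgu', 'AcGg']   # made-up sequences
for seq in examples:
    print(seq, has_base(seq, 't'), has_base(seq, 'u'))
```
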

  4. Dr. Granger is interested in studying the factors controlling the size and carbon storage of shrubs. This research is part of a larger effort to understand carbon storage by plants. She has conducted a small preliminary experiment looking at the effect of three different treatments on shrub volume at four different locations. She wants a preliminary analysis of these data to include in a grant proposal, and she would like you to conduct it for her (she might be a world-renowned expert in carbon storage in plants, but she sure doesn’t know much about computers). She has placed a data file on the web for you to download.

    You might be able to do this analysis by hand in Excel, but Dr. Granger seems to always get funded, meaning that you’ll be doing this again soon with a much larger dataset. So, you decide to write a script so that it will be easy to do the analysis again.

    Write a Python script that:

    1. Imports the data using numpy. The file has a header row, so you’ll need to tell numpy.loadtxt() to ignore it by providing the optional argument skiprows=1.
    2. Loops over the rows in the dataset
    3. For each row in the dataset, checks whether the plant is tall (height >= 5), medium (2 <= height < 5), or short (height < 2), and determines the total amount of carbon in the shrub. The total amount of carbon is equal to 1.8 + 2 * log(volume), where volume is the volume of the shrub (i.e., its length times its width times its height).
    4. Stores this information as a table in a nested list (i.e., a list that contains a bunch of lists, with each of these sub-lists holding the results for one shrub), where the first column has the experiment number, the second column contains the string ‘tall’, ‘medium’ or ‘short’ depending on the height of the shrub, and the third column contains the shrub carbon.
    5. Exports this table to a CSV (comma delimited text) file titled shrubs_experiment_results.csv.
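
Steps 3 and 4 can be sketched with two small functions. This is one possible arrangement, not a reference solution: the function names are hypothetical, `log` is assumed to be the natural logarithm (the assignment does not say which base), a height of exactly 5 is treated as tall, and the row layout (experiment, length, width, height) and the data values are invented for illustration. Plain lists stand in for the rows that numpy would provide.

```python
import math

def classify_height(height):
    """Label a shrub by height; exactly 5 is treated as tall (assumption)."""
    if height >= 5:
        return 'tall'
    elif height >= 2:
        return 'medium'
    else:
        return 'short'

def shrub_carbon(length, width, height):
    """Carbon = 1.8 + 2 * log(volume); natural log is an assumption."""
    volume = length * width * height
    return 1.8 + 2 * math.log(volume)

# Invented rows: experiment, length, width, height (column order is an assumption).
rows = [[1, 2.0, 1.0, 0.5],
        [2, 3.0, 2.0, 3.0],
        [3, 4.0, 3.0, 6.0]]

# Build the nested list: one sub-list per shrub.
results = []
for experiment, length, width, height in rows:
    results.append([experiment, classify_height(height),
                    shrub_carbon(length, width, height)])

for row in results:
    print(row)
```
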

    This code should use functions to break the code up into manageable pieces. To help you get started, here is a function for exporting the results to a CSV file. To use it you’ll need to copy and paste it into your code. It uses the csv module, so you’ll need to remember to import it.

    import csv  # required by export_to_csv

    def export_to_csv(data, filename):
        """Export list of lists to comma delimited text file"""
        # newline='' lets the csv module manage line endings (Python 3)
        with open(filename, 'w', newline='') as outputfile:
            datawriter = csv.writer(outputfile)
            datawriter.writerows(data)
    

    Optional: If you’d like to test your skills a little more, try: 1. Adding a header row to your output file; and 2. Determining the average carbon in a shrub for each of the different experiments and printing those values to the screen.
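
Both optional tasks can be sketched with the standard library alone. The example below assumes a results table in the three-column layout described above, with invented values and hypothetical header names. An in-memory `io.StringIO` buffer stands in for a real output file so the sketch can run anywhere.

```python
import csv
import io

# Invented results: experiment, size class, carbon.
results = [[1, 'short', 1.5], [1, 'tall', 2.5], [2, 'medium', 4.0]]

# Group carbon values by experiment number.
carbon_by_experiment = {}
for experiment, _size, carbon in results:
    carbon_by_experiment.setdefault(experiment, []).append(carbon)

# Average the grouped values for each experiment.
averages = {exp: sum(vals) / len(vals)
            for exp, vals in carbon_by_experiment.items()}
for exp in sorted(averages):
    print(exp, averages[exp])

# Write a header row before the data rows.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(['experiment', 'height_class', 'carbon'])  # hypothetical names
writer.writerows(results)
print(buffer.getvalue())
```
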