Programming Assignment 3

  1. The length of an organism is typically strongly correlated with its body mass. This is useful because it allows us to estimate the mass of an organism even if we only know its length. This relationship generally takes the form:

    Mass = a * Length^b

    Where the parameters a and b vary among groups. This allometric approach is regularly used to estimate the mass of dinosaurs since we cannot weigh something that is only preserved as bones.

    The following function estimates the mass of an organism in kg based on its length in meters for a particular set of parameter values, those for Theropoda (where a has been estimated as 0.73 and b has been estimated as 3.63; Seebacher 2001).

    def get_mass_from_length_theropoda(length):
        # Theropoda parameters: a = 0.73, b = 3.63 (Seebacher 2001)
        mass = 0.73 * length ** 3.63
        return mass
    
    1. Use this function to print out the mass of a Spinosaurus that is 16 m long based on its reassembled skeleton. Spinosaurus is a predator that is bigger, and therefore, by definition, cooler, than that stupid Tyrannosaurus that everyone likes so much.
    2. Create a new version of this function called get_mass_from_length() that estimates the mass of an organism in kg based on its length in meters by taking length, a, and b as parameters. To be clear, we want to pass the function all three values that it needs to estimate a mass as parameters. This makes it much easier to reuse for all of the non-theropod species. Use this new function to estimate the mass of a Sauropoda (a = 214.44, b = 1.46) that is 26 m long.
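One way the generalized function from part 2 might look is sketched below. It is a minimal sketch rather than the only acceptable answer; the parameter order (length, a, b) is an assumption, since the assignment only names the three inputs.

```python
def get_mass_from_length(length, a, b):
    """Estimate mass in kg from length in m using Mass = a * Length^b."""
    mass = a * length ** b
    return mass


# Sauropoda parameters from the assignment text (a = 214.44, b = 1.46)
sauropod_mass = get_mass_from_length(26, 214.44, 1.46)
print(sauropod_mass)
```

Passing a and b in as arguments means the same function covers Theropoda too, for example get_mass_from_length(16, 0.73, 3.63).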
  2. Write a function that converts pounds to grams (there are 453.592 grams in one pound). Use that function and a built-in function to print out how many grams there are in 3.75 pounds, rounded to the nearest gram. Don’t do any printing or rounding inside your function. You want each function to do one thing and do it well, and in this case that thing is converting pounds to grams. Have the function do the conversion and then do the rounding and printing outside of the function.
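The separation described above (convert inside the function, round and print outside) could be sketched as follows. The names pounds_to_grams and GRAMS_PER_POUND are hypothetical choices for this illustration, not names the assignment requires.

```python
GRAMS_PER_POUND = 453.592  # conversion factor given in the assignment


def pounds_to_grams(pounds):
    """Convert a weight in pounds to grams; no rounding or printing here."""
    return pounds * GRAMS_PER_POUND


# Rounding and printing happen outside the function.
grams = pounds_to_grams(3.75)
print(round(grams))
```

Keeping the function free of rounding means the unrounded value stays available to any other code that needs it.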

  3. This is a follow-up to Strings 6.

    A colleague has produced a file with one DNA sequence on each line. Download the file and load it into Python using numpy.loadtxt(). You will need to use the optional argument dtype=str to tell loadtxt() that the data is composed of strings.

    Calculate the GC content* of each sequence by writing a function that takes a DNA sequence as input and returns the GC content of that sequence. Print the result for each sequence. Before we knew about functions we had to take each DNA sequence one at a time and then rewrite or copy-paste the same code to analyze each one. Isn’t this better?

    * The GC content is the percentage of bases in a sequence that are either G or C.
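The definition in the footnote translates fairly directly into code. The sketch below is one possible version; the function name get_gc_content, the upper-casing step, and the example sequence are all assumptions made for illustration.

```python
def get_gc_content(sequence):
    """Return the percentage of bases in sequence that are G or C."""
    sequence = sequence.upper()  # treat 'g' and 'G' the same (an assumption)
    gc_count = sequence.count("G") + sequence.count("C")
    return 100 * gc_count / len(sequence)


# Hypothetical sequence for illustration, not taken from the colleague's file
print(get_gc_content("ATGCGC"))
```

Note that this sketch does not guard against an empty sequence, which would raise a ZeroDivisionError.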

  4. This is a follow-up to Functions 6.

    A colleague has produced a file with one DNA sequence on each line. So far you’ve been manually extracting each DNA sequence and calculating its GC content, which has worked OK with five sequences, but isn’t going to work very well when the sequencer really gets going and you have to handle hundreds or thousands of sequences.

    Use a for loop and your function from Functions 6 to calculate the GC content of each sequence and print them out. The function should work on a single sequence at a time and the for loop should repeatedly call the function and print out the result.

  5. The number of birds banded at a series of sampling sites has been counted by your field crew and entered into the following list. Counts are entered in order and sites are numbered starting at one. Copy and paste the list into your assignment and then answer the following questions by printing the answers to the screen. Some Python functions that will come in handy include len(), max(), min(), and sum().

    number_of_birds = [28, 32, 1, 0, 10, 22, 30, 19, 145, 27, 36,
                       25, 9, 38, 21, 12, 122, 87, 36, 3, 0, 5, 55,
                       62, 98, 32, 900, 33, 14, 39, 56, 81, 29, 38,
                       1, 0, 143, 37, 98, 77, 92, 83, 34, 98, 40,
                       45, 51, 17, 22, 37, 48, 38, 91, 73, 54, 46,
                       102, 273, 600, 10, 11]
    
    1. How many sites are there? Hint: the function len() works on lists as well as strings.
    2. How many birds were counted at site 42? Remember, the number of the site and the number of its position may not be exactly the same.
    3. How many birds were counted at the last site? Have the computer choose the last site automatically in some way, not by manually entering its position.
    4. What is the total number of birds counted across all of the sites?
    5. What is the smallest number of birds counted?
    6. What is the largest number of birds counted?
    7. What is the average number of birds seen at a site? You will need to combine two built-in functions.
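The built-in functions and indexing ideas used in these questions can be tried out first on a small list. The sketch below uses a made-up five-site list (the name counts and its values are hypothetical) so that each result is easy to check by hand.

```python
# Hypothetical counts for illustration only; not the assignment data
counts = [4, 0, 17, 9, 12]

number_of_sites = len(counts)         # how many sites were sampled
site_3_count = counts[3 - 1]          # site n is stored at position n - 1
last_site_count = counts[-1]          # negative index counts from the end
total = sum(counts)
smallest = min(counts)
largest = max(counts)
average = sum(counts) / len(counts)   # combine two built-ins for the mean

print(number_of_sites, site_3_count, last_site_count)
print(total, smallest, largest, average)
```

Because Python positions start at zero while the sites are numbered from one, the count for site n sits at position n - 1.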
  6. One of your collaborators has posted a comma-delimited text file online for you to analyze. The file contains dimensions of a series of shrubs (Length, Width, Height) and they need you to determine the associated volumes. You could do this using a spreadsheet, but the project that you are working on is going to be generating lots of these files so you decide to write a program to automate the process.

    Download the data, use np.loadtxt() to import it into Python, and then use a for loop to calculate the volumes and return a list of the volumes. There should be one value in the list for each shrub. Once you have created this list, use another for loop to print out each volume on its own line in a string like ‘The volume of the shrub is 22.5.’
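The two-loop structure described above might be sketched like this. To keep the example self-contained, a small hard-coded list of hypothetical rows stands in for the array that np.loadtxt() would return (for a comma-delimited file it would be called with delimiter=","). The box-shaped volume formula is also an assumption, since the assignment does not say how volume is defined.

```python
# Hypothetical rows of (length, width, height); in the assignment these
# would come from np.loadtxt() called with delimiter=","
shrub_dimensions = [
    [2.5, 3.0, 3.0],
    [1.0, 2.0, 4.0],
]

volumes = []
for length, width, height in shrub_dimensions:
    # Assumes a box-shaped shrub: volume = length * width * height
    volumes.append(length * width * height)

for volume in volumes:
    print(f"The volume of the shrub is {volume}.")
```

Building the list in one loop and printing in a second keeps the calculation reusable if the volumes are needed elsewhere later.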