This problem is related to Combining Basics 2, but using the Pandas library.
Dr. Granger is interested in studying the factors controlling the size and carbon storage of shrubs. This research is part of a larger area of research trying to understand carbon storage by plants. She has conducted a small preliminary experiment looking at the effect of three different treatments on shrub volume at four different locations. She wants to conduct a preliminary analysis of these data to include in a grant proposal and she would like you to conduct the analysis for her (she might be a world renowned expert in carbon storage in plants, but she sure doesn’t know much about computers). She has placed a data file on the web for you to download.
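Import the data using Pandas and print out the first few rows using the .head() method.
Evaluate the equation 1.8 + 2 * log(volume) for each shrub, where volume is the volume of the shrub. You’ll need to use the numpy version of the log() function.
The following code calculates the average of each column at each site and then selects the average height:
data_means = data.groupby('site').mean()
data_means['height']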
Modify the code to calculate the average height of a plant in each experiment type.
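The steps above can be tried out on a small made-up table before touching the real file. This is only a sketch: the column names (site, experiment, height, volume) and every value are assumptions for illustration, not the contents of Dr. Granger's data file.

```python
import numpy as np
import pandas as pd

# Made-up shrub measurements; column names and values are hypothetical.
data = pd.DataFrame({
    "site": [1, 1, 2, 2],
    "experiment": [1, 2, 1, 2],
    "height": [1.0, 2.0, 3.0, 4.0],
    "volume": [2.0, 12.0, 24.0, 20.0],
})

# .head() shows the first rows (at most five by default).
print(data.head())

# np.log is the natural logarithm and works element-wise on a whole column,
# so the equation is applied to every shrub at once.
data["equation_value"] = 1.8 + 2 * np.log(data["volume"])

# Average every column within each site, then pick out the height column.
data_means = data.groupby("site").mean()
site_heights = data_means["height"]
print(site_heights)
```

The column name passed to groupby() is the only thing that decides how the rows are split into groups.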
There were a relatively large number of extinctions of mammalian species roughly 10,000 years ago. To help understand why these extinctions happened scientists are interested in understanding whether there were differences in the body size of those species that went extinct and those that did not. Since we’re starting to get pretty good at this whole programming thing let’s stop messing around with made up datasets and do some serious analysis.
Download the largest dataset on mammalian body size in the world. Fortunately this dataset has data on the mass of recently extinct mammals as well as extant mammals (i.e., those that are still alive today). Take a look at the metadata to understand the structure of the data. One key thing to remember is that species can occur on more than one continent, and if they do then they will occur more than once in this dataset. Also let’s ignore species that went extinct in the very recent past (designated by the word ‘historical’ in the ‘status’ column).
Import the data into Python. If you’ve looked at a lot of data you’ll realize
that this dataset is tab delimited. The special character to indicate tab in
Python is \t.
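As a rough illustration of reading tab-delimited text with Pandas, the sketch below parses a few made-up rows from a string instead of the real download. The column names and all values are assumptions; only the 'status' column and the word 'historical' come from the description above.

```python
import io

import pandas as pd

# Stand-in for the downloaded file: made-up rows with tab-separated fields.
raw_text = (
    "continent\tstatus\tgenus\tspecies\tmass\n"
    "AF\textant\tAus\talpha\t10.0\n"
    "AF\textinct\tAus\tbeta\t200.0\n"
    "EA\thistorical\tBus\tgamma\t35.0\n"
    "EA\textant\tBus\talpha\t30.0\n"
)

# sep="\t" tells read_csv that fields are separated by tab characters.
data = pd.read_csv(io.StringIO(raw_text), sep="\t")

# Ignore species that went extinct in the very recent past.
data = data[data["status"] != "historical"]
print(data)
```

With a real file you would pass its path instead of the StringIO object. If a file has no header row, read_csv accepts header=None together with a names=[...] list of column names.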
To start let’s explore the data a little and then start looking at the major question.
The following code counts the number of genera in the dataset:
len(data.groupby(['genus'])). Modify this code to determine
the number of species. Remember that a species is uniquely defined by the
combination of its genus name and its species name. Print the result to
the screen. The number should be between 4000 and 5000.
Calculate the average mass of extant species and of extinct species; mean()
should help you here. It is available as both a numpy function and a Pandas
DataFrame method. Don’t worry about species that occur more than once. We’ll
consider the values on different continents to represent independent data
points. Print out the results in the following sentence: “The average mass of
extant species is X and the average mass of extinct species is Y.” with the
appropriate values filled in for X and Y.
This is a follow-up to Scientific Python 1.
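For reference, the counting and averaging steps that this follow-up builds on can be sketched on a toy table. The column names and all values here are made up for illustration and are not taken from the real dataset.

```python
import pandas as pd

# Toy data: 'Aus alpha' appears on two continents, as a species may.
data = pd.DataFrame({
    "continent": ["AF", "AF", "EA", "EA", "EA"],
    "status": ["extant", "extinct", "extant", "extant", "extinct"],
    "genus": ["Aus", "Aus", "Aus", "Bus", "Bus"],
    "species": ["alpha", "beta", "alpha", "alpha", "gamma"],
    "mass": [10.0, 200.0, 12.0, 30.0, 400.0],
})

# len() of a groupby object is the number of distinct groups.
n_genera = len(data.groupby(["genus"]))
# Grouping by two columns gives one group per genus/species combination.
n_species = len(data.groupby(["genus", "species"]))
print(n_genera, n_species)

# Mean mass per status; repeated species count as independent data points.
mean_mass = data.groupby("status")["mass"].mean()
print(
    f"The average mass of extant species is {mean_mass['extant']:.1f} "
    f"and the average mass of extinct species is {mean_mass['extinct']:.1f}."
)
```

Selecting the mass column before calling mean() keeps the text columns out of the calculation.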
Looking at the average mass of extinct and extant species overall is useful, but
there are lots of different processes that could cause size-biased extinctions
so it’s not as informative as we might like. However, if we see the exact same
pattern on each of the different continents that might really tell us
something. Repeat the analysis in
Scientific Python 1, but this time compare the
mean masses within each of the different continents. Export your results to a
csv file where the first entry on each line is the continent, the second entry
is the average mass of the extant species on that continent, the third entry is
the average mass of the extinct species on that continent, and the fourth entry
is the difference between the average extant and average extinct masses. Call
the file continent_mass_differences.csv. If you notice anything
strange think about what’s going on and present the final data in the way that
makes the most sense to you.
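One possible shape for the per-continent comparison is sketched below on made-up data. The column names, continent codes, and masses are all assumptions, and grouping by two columns followed by unstack() is just one way to get the layout described above, not necessarily the intended one.

```python
import pandas as pd

# Made-up rows; the continent 'OC' deliberately has no extinct species.
data = pd.DataFrame({
    "continent": ["AF", "AF", "AF", "EA", "EA", "OC"],
    "status": ["extant", "extant", "extinct", "extant", "extinct", "extant"],
    "mass": [10.0, 30.0, 200.0, 50.0, 500.0, 5.0],
})

# Mean mass for every continent/status pair, then pivot status into columns.
means = data.groupby(["continent", "status"])["mass"].mean().unstack("status")

# Fix the column order so the file layout matches the description.
means = means[["extant", "extinct"]]
means["difference"] = means["extant"] - means["extinct"]
print(means)

# One line per continent: continent, extant mean, extinct mean, difference.
means.to_csv("continent_mass_differences.csv", header=False)
```

In this sketch a continent that lacks one of the two statuses ends up with missing values in its row, which to_csv() writes as empty fields. How to present such rows is left to your judgment.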