For this and many of the following problems you will create queries that retrieve the relevant information from the Portal small mammal survey database. You can create the SQLite version of this database using the EcoData Retriever by first installing the software and then running:
retriever install sqlite PortalMammals
from the command line. Or you can download an already assembled copy of the database.
You will need to know some details regarding what is in this database in order to answer the questions. For example, you may need to know what species is associated with the two character species ID or you may need to know the units for the individual’s weight. This type of information associated with data is called metadata and the metadata for this dataset is available online at Ecological Archives.
All Survey Data
.Size Differences Among
Sexes Data
.A population biologist (Dr. Undómiel) who studies the population
dynamics of Dipodomys spectabilis would like to use some data from
Portal, but she doesn’t know how to work with large datasets. Being the
kind and benevolent person that you are, write a query to extract the
data that she needs. She wants only the data for her species of
interest, when each individual was trapped, and what sex it was. She
doesn’t care about where it was trapped within the site because she is
going to analyze the entire site as a whole and she doesn’t care about
the size of the individuals. Obviously she doesn’t need the species
codes because you’re only providing her with the data for one species,
and besides since she isn’t actually looking at the database itself the
two character abbreviation would probably be confusing. Save this query
as a view with the name Spectabilis Population Data
.
Scroll through the results of your query. Do you notice anything that might be an issue for the scientist to whom you are providing this data? [you should] Think about what you should do in this situtation…
You decide that to avoid invoking her wrath, you’ll send her a short
e-mail* requesting clarification regarding what she would like you to
do regarding this complexity. Dr. Undómiel e-mails you back and asks
that you create two additional queries so that she can decided what to
do about this issue later. She would like you to add a query to the same
data as above, but only for cases where the sex is known to be male, and
an additional query with the same data, but only where the sex is known
to be female. Save these as views with the names Spectabilis
Population Data Males
and Spectabilis Population Data Females
.
*Short for elven-mail
The graduate students that work at the Portal site are hanging out late one evening drinking… soda pop… and they decide it would be an epically awesome idea to put together a list of the 100 largest rodents ever sampled at the site. Since you’re the resident “computer genius” they text you, and since you’re up late working and this sounds like a lot more fun that than homework you’re working on (which isn’t really saying much, if you know what I’m saying) you decide you’ll make the list of them.
The rules that the Portal students have come up with (and they did spend a sort of disturbingly long time coming up with these rules; I guess you just had to be there) are:
You may find the SQL command LIMIT to be helpful. Save the final query
as 100 Largest Individuals
.
Write a query that returns a list of the dates that mammal surveys took
place at Portal (with no duplicates). Save it as Dates Sampled
.
Write a query that returns the number of individuals of each species
captured in each year and the total biomass of those individuals (the
sum of the wgt column). The units for biomass should be in kilograms.
Include the year and species ID in the output. Sort the result
chronologically by year and then alphabetically by species. Save as
Mass-Abundance Data
.
The Plots table in the Portal database can be joined to the main table
by joining plot
to PlotID
and the species table can be joined to
the main table by joining species
to new_code
.
The Portal mammal data include data from a number of different
experimental manipulations. You want to do a time-series analysis of the
natural population dynamics of all of the rodent species at the site, so
write a query that returns the year, month, day, and full species name
of every individual captured on the control plots. Exclude all
non-rodent species (i.e., species for which the Rodent field in the
species table is equal to 0) and all individuals that have not been
identified to species (i.e., species for which the Unknown field in the
species table is equal to 1). Save this query as Data From Controls
.
Using the Portal dataset write a query that returns the average number
of individuals sampled on each plot type on a per plot basis. In other
words, how many individuals are sampled on average on a plot of a given
plot type. Save this query as Per plot individuals per plot type
.
Write a query that returns the year, month, day, species, and mass for
every record were there is no missing data in any of these fields. Save
it as No Missing Data
.
Start a new Python file that you will use to answer the rest of the questions for this assignment.
Using Python create a new table in the Portal database called FieldNotes and should include information on the date of sampling that the notes apply to (designed in such a way as to allow it to be linked to the main table) and a large text field for entering notes.
I would definitely recommend using VARCHAR for the notes field because you have no idea how extensive future notes might be and the notes will probably vary wildly in length.
Add the following note to the FieldNotes table in the Portal database for the date 04/01/1963:
“Just completed the April 1963 census of the site. The region is teeming with Dipodomys spectabilis. Using the time machine to conduct trapping prior to the start of the study is working out great!”
Add the following note to the FieldNotes table in the Portal database for the date 10/1/2013:
“Vegetation seems to have returned to normal for this time of year. The landscape isn’t exactly green, but there is a decent amount of plant activity and there should be enough food for the rodents to the winter”
Commit this change to the database. As soon as you’ve commited it you realized that this was the update for October 2012, not October 2013 (obviously; I mean it’s not like we have a time machine…). Update the record so that it contains the appropriate value.
Before starting this problem make sure that you have a backup of your database or that it has recently been committed to version control so that you can revert the changes if necessary. In fact, when developing the code it’s probably best to work on a copy of the database that is specifically just for development. Once it’s working then apply it to your actual database. Always, always, backup your databases before messing with them in this manner. Seriously.
The Species table in the Portal database has a structural problem in that the
oldcode
column often contains multiple pieces of information in a single
cell. This means that we can’t really run queries that use the oldcode
field
effectively. Think about what the best structure would be for this table. It
might include splitting the table into two separate tables (wink, wink, nudge,
nudge). Feel free to check with Ethan to make sure you’ve got the right
idea. Using Python, restructure the database and store the new species table as
PortalMammal_species
(i.e., the same name it has now; you may need to learn
about the DROP TABLE command in
order to do this) and naming any new tables you create with easy to understand
names.