Biodiversity and Data Science

From a quick review of the measures of biodiversity described on Wikipedia, there appear to be at least a couple of mathematical formulas for calculating it. Yet I don't think of biodiversity in such an abstract way. To me the term relates more to how natural our environment on Earth is. When we leave a rainforest alone, it thrives in a biodiverse way, with many interdependent species of plants, insects, animals, birds, and fungi. When we burn it down and plant a cash crop such as tobacco, we lose the biodiversity of the region and replace it with a single plant.

Based on this observation, I can start thinking about measuring biodiversity by looking at the percentage of the Earth's surface, land and sea, that has been left alone, untouched by human interference. That does not feel sufficient, however, since we, as one of many species on the planet, should be counted in the biodiversity equation too. A more thoughtful approach is needed.

One of the things I learned from taking Data Science courses was the value of exploratory data analysis. By starting to collect data based on my intuition and observations, I can piece together hypotheses and test them with further analysis.

Biodiversity, then, should not be measured in terms of how much of the Earth humans have modified. That is a very human-centric point of view. In On the Origin of Species, Darwin points out that cultivated plants vary more among themselves than plants of the same species do in a state of nature. I don't recall, or haven't yet come to, his explanation for this, but I think it has to do with the experimental nature of cultivation. Within our short human lifetimes, we want to see the effect of particular stimuli or inputs on the plants we are cultivating. We don't have the luxury of waiting for changes to occur in unattended nature. For example, if you give a plant twenty percent more water in a given week, will that affect its growth?

This suggests that we can engineer diversity in the laboratory or greenhouse. In the natural world, we can only observe over time in order to measure biodiversity. Or, as Darwin did, we can draw conclusions based on observations of flora and fauna in different environments.

To begin to put together some data analysis, I will need to draw on some data sets, or create some data sets, that represent plants and/or animals in their natural habitats. Another way to get my head around this issue is to survey the literature on biodiversity.
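
As a first exploratory step, here is a minimal sketch of what such a data set might look like and the simplest thing I could count from it: the number of distinct species seen in each habitat (often called species richness). The observations below are invented purely for illustration, and the function names are my own, not from any library.

```python
from collections import Counter, defaultdict

# Hypothetical field observations as (habitat, species) pairs.
# These values are made up for illustration, not real survey data.
observations = [
    ("rainforest", "fig tree"),
    ("rainforest", "leafcutter ant"),
    ("rainforest", "toucan"),
    ("rainforest", "leafcutter ant"),
    ("rainforest", "bracket fungus"),
    ("tobacco field", "tobacco"),
    ("tobacco field", "tobacco"),
    ("tobacco field", "tobacco"),
    ("tobacco field", "aphid"),
]


def counts_by_habitat(obs):
    """Group observations into a per-habitat tally of each species."""
    table = defaultdict(Counter)
    for habitat, species in obs:
        table[habitat][species] += 1
    return table


def species_richness(counts):
    """Number of distinct species observed at least once."""
    return sum(1 for n in counts.values() if n > 0)


table = counts_by_habitat(observations)
richness = {habitat: species_richness(c) for habitat, c in table.items()}

for habitat, r in richness.items():
    print(f"{habitat}: {r} species")
```

Richness ignores how many individuals of each species were seen, so it is only a starting point, but it already separates the rainforest from the tobacco field in this toy example.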

One article that shows up on the first Google results page for the term 'biodiversity' is http://www.nature.com/scitable/knowledge/library/biodiversity-and-ecosystem-stability-17059965.

Just the first paragraph of that article is full of meaning! There is a 'variety of scales' at which biological diversity can be understood, it states, and it goes on to say that the article is focused on species diversity.
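
Species diversity is also the scale at which the formulas I mentioned at the start apply. Two widely used ones are the Shannon index and the Gini-Simpson index; I am assuming these are among the formulas I saw, and the sketch below is my own minimal version rather than anything taken from the article. Both take a list of counts, one per species, and the example counts are made up.

```python
import math


def shannon_index(counts):
    """Shannon index H = -sum(p_i * ln(p_i)) over species proportions p_i."""
    total = sum(counts)
    if total == 0:
        return 0.0
    # Species with a zero count contribute nothing and are skipped.
    return sum(-(n / total) * math.log(n / total) for n in counts if n > 0)


def simpson_index(counts):
    """Gini-Simpson index 1 - sum(p_i ** 2): the chance that two individuals
    drawn at random (with replacement) belong to different species."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return 1.0 - sum((n / total) ** 2 for n in counts)


# Hypothetical counts: four equally common species versus a single crop.
even_community = [10, 10, 10, 10]
monoculture = [40]

print(shannon_index(even_community), simpson_index(even_community))
print(shannon_index(monoculture), simpson_index(monoculture))
```

For the four equally common species, the Shannon index works out to ln 4 and the Gini-Simpson index to 0.75, while the single-crop case scores zero on both. That matches my rainforest-versus-tobacco intuition, at least for this toy input.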