Friday, November 1, 2013

Distribution modelling, or: muwahaha, vindication!

Distribution modelling is an increasingly important technique in which one tries to infer where a species occurs, or at least can occur. The principle is at the same time mathematically complex and conceptually simple:

You take a number of variables that might influence whether a species can occur in an area: average annual rainfall, average temperature, average temperature in the hottest month, minimum temperature, phosphate content of the soil, soil pH, and so on. Then you need geocoded known occurrences of the species you are interested in (the more the better) and, for some models, also known absences. A fancy piece of software can then produce the distribution model for you, projecting onto a map the probability that the species will be found in each of the map's grid cells.
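To make the idea concrete, here is a toy sketch in Python. It is not what real distribution modelling software does internally; it simply fits a plain logistic regression to presence and absence records, which is one of the simplest ways to turn environmental variables into a probability of occurrence. All records, variable choices and grid cell names are made up for illustration.

```python
import math

# Made-up records: (annual rainfall in mm, mean temperature in deg C, presence 1 / absence 0)
records = [
    (1200, 14, 1), (1100, 15, 1), (1000, 13, 1), (900, 16, 1),
    (400, 24, 0), (300, 26, 0), (500, 22, 0), (350, 25, 0),
]

def mean_sd(values):
    """Return mean and standard deviation, used to put variables on one scale."""
    mean = sum(values) / len(values)
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return mean, sd

rain_mean, rain_sd = mean_sd([r[0] for r in records])
temp_mean, temp_sd = mean_sd([r[1] for r in records])

def features(rain, temp):
    """Intercept plus standardised rainfall and temperature."""
    return [1.0, (rain - rain_mean) / rain_sd, (temp - temp_mean) / temp_sd]

def predict(weights, rain, temp):
    """Probability of occurrence under the logistic model."""
    z = sum(w * x for w, x in zip(weights, features(rain, temp)))
    return 1.0 / (1.0 + math.exp(-z))

# Fit the weights with plain gradient descent (500 steps, learning rate 0.1).
weights = [0.0, 0.0, 0.0]
for _ in range(500):
    gradient = [0.0, 0.0, 0.0]
    for rain, temp, present in records:
        error = predict(weights, rain, temp) - present
        for i, x in enumerate(features(rain, temp)):
            gradient[i] += error * x
    weights = [w - 0.1 * g / len(records) for w, g in zip(weights, gradient)]

# "Project onto the map": predict for two hypothetical grid cells.
grid = {"wet_cool_cell": (1050, 14), "dry_hot_cell": (380, 25)}
suitability = {name: predict(weights, *env) for name, env in grid.items()}
print(suitability)
```

With these invented numbers the species prefers wet, cool conditions, so the wet, cool cell gets a high probability of occurrence and the dry, hot one a low probability.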

Potential uses of distribution modelling are many. You could be interested in how far a newly arrived invasive organism is potentially going to spread in your country, so you model its potential distribution based on data showing where it can survive in its area of origin. You might be interested in where a species can live and where it cannot live in the year 2100 given this or that climate change scenario. Going back in time, you might want to know where a species was able to live during the last ice age. If you can reconstruct the probable niche of ancestral species, you might also want to know where they would have been able to live given paleoclimatic assumptions.

In a paper published last year, a few colleagues and I made another use of distribution modelling. I was interested in how the species richness of daisies is distributed across the continent. The problem is that species richness inferred from known occurrences alone will often under-estimate the real number of species in a grid cell, because some areas are vastly under-sampled. In extreme cases, a species-poor area may appear as rich as a diversity hotspot if the former is very intensively sampled and the latter is rarely visited by field biologists.

So what we did was to construct distribution models for all species of my study group and then stack them on top of each other to see how many species would be in each grid cell. As mentioned above, what a model gives you for each species and grid cell is a probability of occurrence. How do you add those up? We did it quite directly and simply summed the probabilities: if a cell had, hypothetically, ten species with a probability of occurrence of 50% each, we would have added that up to five species.
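The direct summation can be sketched in a few lines of Python. The grid cell names and probabilities here are hypothetical; the first cell reproduces the example of ten species at 50% each.

```python
# Hypothetical predicted probabilities of occurrence, one value per species,
# for two grid cells.
cell_probabilities = {
    "cell_A": [0.5] * 10,            # ten species at 50% each
    "cell_B": [0.9, 0.8, 0.1, 0.05],
}

def expected_richness(probabilities):
    """Sum the per-species probabilities of occurrence for one grid cell."""
    return sum(probabilities)

richness = {cell: expected_richness(p) for cell, p in cell_probabilities.items()}
print(richness["cell_A"])  # 5.0
```

The sum is the expected number of species in the cell, which is why it does not need to be a whole number.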

Some reviewers did not like the idea at first and argued that we should use some cut-off: Set all species that have more than X% probability of occurrence to present, all others to absent, and then count the presences. In the end, however, we convinced the editor that our approach made more sense.
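To see how much the two approaches can differ, here is a small Python comparison. The probabilities and the 50% cut-off are made-up values for illustration, not numbers from our paper.

```python
def thresholded_richness(probabilities, cutoff):
    """Count the species whose probability of occurrence exceeds the cut-off."""
    return sum(1 for p in probabilities if p > cutoff)

def summed_richness(probabilities):
    """Add up the probabilities of occurrence directly."""
    return sum(probabilities)

# Hypothetical grid cell: ten species, each with a 60% probability of occurrence.
probabilities = [0.6] * 10

thresholded = thresholded_richness(probabilities, cutoff=0.5)  # all ten count as present
summed = summed_richness(probabilities)                        # about six species

print(thresholded, round(summed, 2))
```

In this example the cut-off turns ten uncertain predictions into ten certain presences, while the sum stays at roughly six. Nudge every probability just below the cut-off and the thresholded count drops to zero, so the result depends heavily on where the line is drawn.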

And guess what? Last week I came across a paper presenting a meta-analysis on the issue; I found it because it cited our study. Calabrese et al. examined ours and numerous similar studies that used stacked distribution models to infer species numbers, and they concluded that what we did was exactly how it should be done! The cut-off approaches used in many other papers vastly over-estimate the real numbers of species.

Feels good. As a colleague said, "it is nice to get feedback in a citation!" And next time I do something like this, I will have a reference to justify why I am doing it that way.
