M4: Data Compilation, Assessment, and Treatment
Learning Objectives: This module explains proper data collection, assessment, and treatment of the data in systematic conservation planning. The learner will be able to conceptualize the theoretical and practical issues that are part of the data collection, assessment, and treatment process.
Systematic Conservation Planning data consist of areas and “features” of those areas:
Areas are geographical units defined in space. Areas can be large or small and can have different shapes. Examples are grid cells, catchments, habitat remnants, and tenure parcels.
Features are properties of areas, including the biodiversity surrogates that can be used for conservation planning (see M5: Surrogacy Identification and Analysis). Example features are species, characters of species, or more heterogeneous entities such as species assemblages, ecological communities, habitat types, and environmental classes. Species assemblages are groups of species that occur together in a habitat.
Other important features include economic and social aspects of an area.
Systematic Conservation Planning requires the collection of good data on the distribution and abundance patterns of biodiversity features.
The minimum data required for conservation planning are information on what species there are and where they are. In practice, “what species there are” is often represented by information on biodiversity surrogates such as species assemblages, habitats, and environments (terrain, climate, chemical or physical properties).
Data sets need to be consistent in the type of data they contain and in how the relevant features are measured across localities, regions, biomes, etc.
Good biological data ultimately provide the information from which the complementarity of one area relative to another is assessed.
Supplementary data on abundance, and accurate predictions of species/habitat responses to environmental changes/conditions, are desirable but rarely obtainable with the limited funds and time normally available for planning exercises.
Example 4.1: Papua New Guinea
(Margules and Sarkar 2007; Faith et al. 2001; Nix et al. 2000; Bellamy and McAlpine 1995; Keig and Quigley 1995; Margules et al. 1995)
In 1996, the Global Environment Facility (GEF) sponsored a study to identify biodiversity priority areas in Papua New Guinea, in order to evaluate and assist the country's biodiversity planning and management. The Papua New Guinea Resource Information System (PNGRIS) holds information on natural resources, land use, and human population densities for the whole country (Bellamy and McAlpine 1995; Keig and Quigley 1995). The land units for which this information is recorded are called resource mapping units (RMUs). Because these units are widely used by government agencies in Papua New Guinea, they were adopted as the candidate conservation areas for this study. The RMUs were mapped from aerial photographs from the extensive land resource surveys carried out by the Australian Commonwealth Scientific and Industrial Research Organisation (CSIRO) in the 1970s. The RMUs provided the mapping of vegetation types, while climate, terrain, and lithological data provided the environmental data.
Vegetation types, environmental data, and the Papua New Guinea government's 1997 list of endangered and threatened species were then used together as biodiversity surrogates to identify conservation areas for the country. Land uses that compete with conservation in Papua New Guinea include agriculture and forestry.
Three types of data are available: (i) survey results, (ii) remote-sensed data, and (iii) modeled data. All of these can be used for conservation planning, not just survey results.
Surveys may introduce sampling biases that will percolate through to the planning process.
Types of survey data include presence-absence data, presence-only data, and abundance data.
Presence-only data: a species has been recorded in some areas, but there is no indication of abundance, and the lack of a recorded presence in other areas does not necessarily mean the species is absent there. It only means that the species has not been recorded there, so we do not know whether it is there or not.
Presence-absence data: absences are real within the limits of sampling intensity and thoroughness. Surrogates were searched for and recorded as present where they were found or as absent where they were not found.
Abundance data: estimates of the abundance (population size or number of individuals) or extent (e.g., percentage of canopy cover) of the surrogates. Zero abundance indicates the absence of a surrogate.
There is more
confidence in plans based on abundance data than those based on presence-absence
data, and more confidence in those based on presence-absence data than those
based on presence-only data.
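The following minimal Python sketch makes the difference between the three kinds of survey data concrete. It starts from a hypothetical abundance survey (the cell names and counts are invented) and reduces it first to presence-absence data and then to presence-only data. The helper functions are illustrative only and are not part of any planning package.

```python
# Hypothetical survey: number of individuals counted in each surveyed grid cell.
# Cells that were never surveyed do not appear in the dictionary at all.
abundance = {"A1": 12, "A2": 0, "B1": 3, "B2": 0}
all_cells = ["A1", "A2", "B1", "B2", "C1"]  # C1 was never surveyed


def to_presence_absence(counts):
    """Reduce abundance to presence (True) / absence (False) for surveyed cells."""
    return {cell: n > 0 for cell, n in counts.items()}


def to_presence_only(counts):
    """Keep only the cells where the species was recorded; absences are lost."""
    return {cell for cell, n in counts.items() if n > 0}


def status(cell, presence_only):
    """All a presence-only data set can say about a cell: present or unknown."""
    return "present" if cell in presence_only else "unknown"


pa = to_presence_absence(abundance)
po = to_presence_only(abundance)
for cell in all_cells:
    # columns: cell, abundance, presence-absence, presence-only status
    print(cell, abundance.get(cell), pa.get(cell), status(cell, po))
```

Presence-absence data still separate A2 (surveyed, species absent) from C1 (never surveyed). Presence-only data report both cells as unknown, which is one reason plans built on them deserve the least confidence.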
Remote-sensed data: data obtained by remote monitoring technology, especially satellites. These data can be used in several stages of systematic conservation planning.
Such data are increasingly available for all areas of the world, making it possible to carry out systematic conservation planning almost anywhere.
The website of the U.S.
Geological Survey provides a digital elevation model (DEM) for the entire
world with a horizontal grid spacing of 30 arc seconds (approximately 1
kilometer).
The WorldClim website provides climate data for the entire world at a resolution of about 1 km² (30 arc seconds).
Modeled data: data obtained from models of species distributions (see Examples 4.3 and 4.4).
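A short calculation checks the statement that a 30 arc-second grid spacing is approximately 1 kilometer. This sketch assumes a spherical Earth with a mean radius of 6,371 km. Real elevation and climate grids are referenced to an ellipsoid, so their exact figures differ slightly. The function name is hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean radius; assumes a spherical Earth


def arcsec_cell_size_km(arcsec, latitude_deg):
    """Approximate north-south and east-west size (km) of a grid cell whose
    sides span `arcsec` arc seconds, at the given latitude."""
    angle_rad = math.radians(arcsec / 3600.0)
    ns_km = EARTH_RADIUS_KM * angle_rad                   # arc length along a meridian
    ew_km = ns_km * math.cos(math.radians(latitude_deg))  # parallels shrink poleward
    return ns_km, ew_km


for lat in (0, 30, 60):
    ns, ew = arcsec_cell_size_km(30, lat)
    print(f"lat {lat:>2}: {ns:.3f} km N-S x {ew:.3f} km E-W = {ns * ew:.3f} km^2")
```

At the equator, a 30 arc-second cell is roughly 0.93 km on each side, or about 0.86 km². The east-west side shrinks with the cosine of latitude, to about half that width at 60°. The "1 km" description of such grids is therefore an approximation that holds best at low latitudes.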
When compiling data, it is important to check whether data points are unduly correlated with major roads, waterways, and similar access routes.
Existing data collections can be
found in museums and herbaria, from various departments of government (natural
resource management agencies), and from non-government organizations (NGOs).
Such collections often mirror road networks, because many records come from near roads or townships. This makes it difficult to determine the accurate distribution patterns of species.
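One simple way to check whether records are unduly correlated with roads is to compare two shares: the share of records within some buffer distance of a road, and the share of the study area within that buffer. The sketch below illustrates the idea under several assumptions and is not a standard test. It uses planar coordinates in kilometers and treats roads as straight segments. It estimates the area share with random points, and all coordinates are hypothetical.

```python
import math
import random


def point_segment_distance(p, a, b):
    """Shortest distance from point p to the line segment a-b (planar coordinates)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment's line, clamping to the segment's end points.
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def share_near_roads(points, roads, buffer_km):
    """Fraction of points lying within buffer_km of any road segment."""
    near = sum(
        1 for p in points
        if any(point_segment_distance(p, a, b) <= buffer_km for a, b in roads)
    )
    return near / len(points)


def expected_share(roads, buffer_km, bounds, n=20000, seed=0):
    """Share of the study area within the buffer, estimated from random points."""
    rng = random.Random(seed)
    (xmin, ymin), (xmax, ymax) = bounds
    pts = [(rng.uniform(xmin, xmax), rng.uniform(ymin, ymax)) for _ in range(n)]
    return share_near_roads(pts, roads, buffer_km)


# Hypothetical 10 km x 10 km study area crossed by one straight road at y = 5.
roads = [((0.0, 5.0), (10.0, 5.0))]
records = [(1.0, 5.2), (3.5, 4.6), (6.0, 5.9), (8.2, 5.1), (2.0, 9.0)]
observed = share_near_roads(records, roads, buffer_km=1.0)
expected = expected_share(roads, 1.0, ((0.0, 0.0), (10.0, 10.0)))
print(f"records near roads: {observed:.2f}; area near roads: {expected:.2f}")
```

In this made-up case, 80% of the records fall within 1 km of the road, although only about 20% of the area does. This is the kind of pattern seen in the koala records of Example 4.2.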
Even results
of systematic surveys may provide presence-only data if the taxon in question
was not the focus of the survey.
In reality, the available data sets are far from ideal. Planners must make full use of the data while also acknowledging their limitations.
Example 4.2: Koala Data in New South Wales, Australia
(Margules and Austin 1994; Margules and Sarkar 2007)
When koala records collected in New South Wales, Australia, were reviewed, the data appeared to mimic road networks. The records were compiled from museum records and from a field survey by volunteers who recorded sightings of the animals; such records are effectively presence-only data. When compared with a map of the road network, most of the records lay near roads and townships (Figure 4.2; Margules and Austin 1994).
“No systematic state-wide survey has ever been conducted of this very high profile, charismatic species; so, the limits of its geographic range still cannot easily be defined.” (Margules and Sarkar 2007, 65)
Figure 4.2
Data treatment: the ways in which raw field data are systematically modified for use in conservation planning.
Data
treatment may include the rejection of obviously erroneous data.
It may also include resampling data with Geographical Information Systems (GIS) software so that all the data are available at a uniform spatial resolution.
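Two of the treatment steps just described can be sketched without GIS software: rejecting obviously erroneous records and bringing layers to a common resolution. The rules used here are simplified assumptions: a bounding-box check for coordinates, and block averaging or block maximum for aggregation. GIS packages also handle map projections, grid alignment, and resampling to finer resolutions. The function names are hypothetical.

```python
def reject_erroneous(records, bounds):
    """Drop records whose coordinates are missing or fall outside the planning
    region's bounding box (a crude test for obvious errors such as 0, 0)."""
    (xmin, ymin), (xmax, ymax) = bounds
    return [
        (x, y) for x, y in records
        if x is not None and y is not None
        and xmin <= x <= xmax and ymin <= y <= ymax
    ]


def aggregate(grid, factor, how="mean"):
    """Resample a fine raster (a list of rows) to a coarser one by combining
    factor x factor blocks of cells."""
    rows, cols = len(grid), len(grid[0])
    if rows % factor or cols % factor:
        raise ValueError("grid dimensions must be multiples of factor")
    coarse = []
    for r in range(0, rows, factor):
        out_row = []
        for c in range(0, cols, factor):
            block = [grid[i][j] for i in range(r, r + factor)
                     for j in range(c, c + factor)]
            if how == "mean":    # continuous layers, e.g. elevation
                out_row.append(sum(block) / len(block))
            elif how == "max":   # presence (1) / absence (0) layers
                out_row.append(max(block))
            else:
                raise ValueError(f"unknown aggregation: {how}")
        coarse.append(out_row)
    return coarse


elevation = [  # hypothetical 4 x 4 elevation layer (m)
    [100, 120, 300, 310],
    [110, 130, 320, 330],
    [500, 520, 700, 720],
    [510, 530, 710, 730],
]
print(aggregate(elevation, 2))  # 4 x 4 fine cells -> 2 x 2 coarse cells
raw = [(2.0, 3.0), (0.0, 0.0), (None, 4.0), (15.0, 2.0)]
print(reject_erroneous(raw, ((1.0, 1.0), (10.0, 10.0))))
```

Averaging suits continuous layers such as elevation. Taking the maximum keeps a species recorded as present in a coarse cell if any of its fine cells had a record.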
Most importantly, data treatment includes modeling to fill in species distributions when only presence-only data are available.
Such models are important because very little reliable presence-absence data exist for most taxa in most places in the world.
Modeling species distributions, often called “niche modeling” (Moritz et al. 2001; Peterson et al. 1999), has emerged as a major strategy of data treatment. It fills in species distributions for areas that have never been adequately surveyed, so that conservation planning can proceed as efficiently as possible.
All popular models of this kind take two inputs: environmental data for the planning region, and as many georeferenced presence-absence or presence-only records of species’ occurrences as possible.
A variety of
modeling techniques exist and these are being extended in research efforts in
many laboratories around the world.
Ecologically specific models: these use the best available ecological knowledge to predict species distributions. However, they have had at best a moderate record of success, and they often require data that are not available in most conservation contexts.
Regression and other statistical association (correlation) models: these use correlations between species records and environmental variables to predict species distributions (see Example 4.3).
Heuristic methods: these are based on other kinds of statistical association. Examples such as Bioclim (Nix 1986; Busby 1991) have been used successfully in the past.
Machine learning methods: these have led to software packages such as Maxent (Phillips et al. 2006) and GARP (Stockwell and Peters 1999), which are increasingly being used with predictive success to model species distributions. These packages use algorithms that learn the association between species’ known presences (the places and times at which species were found) and environmental variables. They then use that association to predict where else the species are likely to occur (see Example 4.4).
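The climatic envelope idea behind Bioclim, one of the heuristic methods above, can be sketched briefly. A site is predicted suitable when every environmental variable there lies within the range observed at the species’ known records. This minimal version assumes plain minimum-maximum limits; implementations also commonly use percentile limits to reduce the influence of outlying records. The variables and values are hypothetical.

```python
def envelope(records):
    """Per-variable (min, max) of the environmental values at known presences."""
    variables = records[0].keys()
    return {v: (min(r[v] for r in records), max(r[v] for r in records))
            for v in variables}


def inside_envelope(site, env):
    """True only if every variable at the site lies within the presence envelope."""
    return all(lo <= site[v] <= hi for v, (lo, hi) in env.items())


presences = [  # hypothetical environmental values at known records
    {"temp_c": 11.0, "rain_mm": 900},
    {"temp_c": 13.5, "rain_mm": 1200},
    {"temp_c": 12.2, "rain_mm": 1050},
]
env = envelope(presences)
candidate_sites = {
    "site_1": {"temp_c": 12.0, "rain_mm": 1000},  # inside both ranges
    "site_2": {"temp_c": 18.0, "rain_mm": 1000},  # too warm
    "site_3": {"temp_c": 12.5, "rain_mm": 600},   # too dry
}
for name, site in candidate_sites.items():
    print(name, "suitable" if inside_envelope(site, env) else "outside envelope")
```

Every variable must pass, so a single out-of-range variable is enough to exclude a site: site_2 fails on temperature and site_3 fails on rainfall.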
Example 4.3: Distribution of Eucalyptus radiata in New South Wales
(Nicholls 1989)
When presence-absence data are available, regression analyses such as Generalized Linear Modeling (GLM) can be used to estimate species distribution patterns.
Nicholls (1989) applied GLM analysis to presence-absence data for E. radiata from coastal hardwood forests in New South Wales, Australia (Figure 4.3a). Each presence or absence record was then plotted in a plane in which the x-axis represented the mean annual rainfall of the record and the y-axis represented its mean annual temperature (Figure 4.3b).
A forward stepwise procedure was used to find the
variables that were important for explaining the distribution of the
species. It was found that
temperature, altitude, and lithology (sediment characteristics) were the
most important variables for predicting the distribution of E. radiata.
Since temperature and altitude are strongly correlated with each other (lower altitudes are warmer and higher altitudes are cooler), the two final variables used to predict the distribution of E. radiata were temperature and lithology.
However, further examination of the data showed that the lithological data were not complete for all of the tested categories, so the resulting predicted distribution was invalid. The model was re-fitted and the variables were tested again in the stepwise procedure.
Next, the GLM was run with each known lithological category treated separately (in this case, fine-grained sediments) together with temperature. The predicted distribution for fine-grained sediments then formed a contoured surface in environmental space (Figure 4.3c). Finally, the procedure was run again with temperature and all known lithological types, giving the predicted geographical distribution and probability of occurrence shown in Figure 4.3d.
Figure 4.3a: Geographical Distribution of E. radiata from Survey Records
Figure 4.3b: The Mean Annual Temperature and Mean Annual Rainfall of the E. radiata Records
Figure 4.3c: Predicted Distribution of E. radiata in Environmental Space based on Fine-grained Lithology
Figure 4.3d: Predicted Geographical Distribution and Probability of Occurrence with All Sediment Types
Example 4.4: Distribution of the brown-throated three-toed sloth, Bradypus variegatus, and a small-bodied rodent, Microryzomys minutus, in South America
(Phillips et al. 2006)
The Maximum Entropy (Maxent) technique is a niche-modeling tool whose algorithms learn the association between species’ known presences and environmental variables, and then use it to predict where the species is likely to occur. It is designed for use with presence-only data.
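A toy version illustrates the maximum entropy principle behind Maxent. Consider all probability distributions over the background sites of a study region. Among those whose expected feature values match the averages at the presence records, Maxent chooses the one with maximum entropy. That distribution has the Gibbs form p(x) ∝ exp(λ·f(x)), and λ can be found by gradient ascent on the log-likelihood of the presences. This sketch makes several simplifying assumptions:

- a single hypothetical temperature variable, with linear and quadratic features;
- no regularization;
- plain (slowly converging) gradient ascent.

The real Maxent software adds several feature classes, regularization, and a faster optimizer, so this is only an illustration of the idea, not its implementation.

```python
import math

# Hypothetical background sites: mean annual temperature (deg C) from 0 to 30.
background = [0.5 * k for k in range(61)]
# Hypothetical presence records: temperatures at sites where the species was found.
presences = [18.0, 19.0, 19.5, 20.0, 20.5, 21.0, 22.0]


def features(temp):
    """Linear and quadratic features of a rescaled temperature."""
    z = (temp - 15.0) / 10.0
    return (z, z * z)


def gibbs(lam):
    """Maxent distribution over background sites: p(x) proportional to exp(lam . f(x))."""
    scores = [sum(c * f for c, f in zip(lam, features(t))) for t in background]
    top = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


def fit_maxent(steps=2000, rate=0.5):
    """Gradient ascent on the mean log-likelihood of the presences. The gradient
    is (mean feature value at presences) - (expected feature value under p)."""
    target = [sum(features(t)[i] for t in presences) / len(presences) for i in range(2)]
    lam = [0.0, 0.0]
    for _ in range(steps):
        p = gibbs(lam)
        expected = [sum(pk * features(t)[i] for pk, t in zip(p, background))
                    for i in range(2)]
        lam = [c + rate * (tg - ex) for c, tg, ex in zip(lam, target, expected)]
    return lam


lam = fit_maxent()
p = gibbs(lam)
best = background[p.index(max(p))]
print("most suitable background temperature:", best)
```

The fitted relative suitability should peak near the temperatures at which the species was recorded and fall away toward the extremes of the background. Figure 4.4b maps this kind of surface across geographical space.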
For the brown-throated three-toed sloth, Bradypus variegatus, and the small-bodied rodent Microryzomys minutus, the geographical locations of the presence-only records were first mapped (Figure 4.4a).
Figure 4.4a: Occurrence Records for Bradypus variegatus and Microryzomys minutus
Records for Bradypus variegatus (left) and Microryzomys minutus (right), as derived from vouchered (verified) museum specimens (Phillips et al. 2006).
Maxent used the following explanatory variables to construct the models of the species' potential distributions: climatic variables, elevation, and potential vegetation.
The climatic variables were annual cloud cover, annual diurnal temperature range (the difference between daily maximum and minimum temperatures), annual frost frequency, annual vapor pressure, precipitation in January, April, July, and October, annual precipitation, and minimum, maximum, and mean annual temperature.
Maxent was run with two environmental “suites”: (i) climate, elevation, and potential vegetation together, and (ii) climate and elevation only. The final predictions, showing the likelihood of occurrence of each species at each site, are shown in Figure 4.4b.
Figure 4.4b: Predicted Potential Geographic Distributions for Bradypus variegatus and Microryzomys minutus
Distributions for Bradypus variegatus (left) and Microryzomys minutus (right), as predicted by Maxent. Four colors are used to indicate the strength of the predictions, with darker colors indicating stronger predictions (Phillips et al. 2006).