
Data Collection, Channel Islands, California, United States. The image shows Jenn Caselle, Partnership for Interdisciplinary Studies of Coastal Oceans (PISCO)/ University of California at Santa Barbara (UCSB), who is the science coordinator at Anacapa Island. This example is also discussed in Module 3. ©2006 Brad Doane.





Two-Toed Sloth, Costa Rica. Female Choloepus didactylus with young at the Canadian Organization for Tropical Forest Research station near Tortuguero, Costa Rica. This species is rarely active during the day. ©1995 Sahotra Sarkar.




© 2006 Vanessa Lujan, Trevon Fuller, Alex Moffett, and Sahotra Sarkar. Tutorial written by Vanessa Lujan, Trevon Fuller, Alex Moffett, and Sahotra Sarkar with assistance from James Justus, Chris Kelley, Chris Margules, and Samraat Pawar.
 

 



M4: Data Compilation, Assessment, and Treatment

 

Learning Objectives: This module explains proper data collection, assessment, and treatment in systematic conservation planning. After completing it, the learner should be able to conceptualize the theoretical and practical issues involved in collecting, assessing, and treating data.


 

*      Data for Systematic Conservation Planning consist of areas and “features” of those areas:


      Areas are geographical units defined in space.  Areas can be large or small, and can have different shapes.


       Examples include grid cells, catchments, habitat remnants, and tenure parcels.


      Features are properties of areas including the biodiversity surrogates that can be used for conservation planning (see M5: Surrogacy Identification and Analysis).


       Examples of features include species, characters of species, and more heterogeneous entities such as species assemblages, ecological communities, habitat types, and environmental classes.


       Species assemblages are groups of species that occur together in a habitat.


       Other important features include economic and social aspects of an area.



*      Systematic Conservation Planning requires collecting good data on the distribution and abundance patterns of biodiversity features.


      At a minimum, conservation planning requires information on what species there are and where they are.


       In practice, information on “what species there are” often takes the form of data on biodiversity surrogates such as species assemblages, habitats, and environments (terrain, climate, chemical or physical properties).


      Data sets need to be consistent in the type of data they contain and in how the relevant features are measured across localities, regions, biomes, etc.


       Good biological data ultimately provide the information from which the complementarity of one area relative to another is assessed.


      Supplementary data on abundance, and accurate predictions of how species or habitats respond to environmental changes and conditions, are desirable but rarely obtainable with the limited funds and time normally available for planning exercises.
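Complementarity is commonly measured as the number of features a candidate area would add that are not already represented in the areas selected so far. The minimal sketch below illustrates that idea under this assumption; the area names and feature sets are hypothetical and only show how presence data on "what is where" feed the comparison.

```python
# Hypothetical presence data: each area maps to the set of features recorded there.
areas = {
    "A": {"sp1", "sp2", "sp3"},
    "B": {"sp2", "sp3"},
    "C": {"sp4", "sp5"},
    "D": {"sp1", "sp4"},
}

def complementarity(candidate, selected, data):
    """Count the features in `candidate` not already represented in the `selected` areas."""
    represented = set()
    for area in selected:
        represented |= data[area]
    return len(data[candidate] - represented)

selected = ["A"]
scores = {a: complementarity(a, selected, areas) for a in areas if a not in selected}
print(scores)  # {'B': 0, 'C': 2, 'D': 1}
```

With area A already selected, B adds no unrepresented features, whereas C adds two, so C is the better complement to A.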

 

 

Example 4.1

 

Papua New Guinea

(Margules and Sarkar 2007; Faith et al. 2001; Nix et al. 2000; Bellamy and McAlpine 1995; Keig and Quigley 1995; Margules et al. 1995)

 

In 1996, the Global Environment Facility (GEF) sponsored a study to identify biodiversity priority areas in Papua New Guinea and to evaluate and support the country's biodiversity planning and management. The Papua New Guinea Resource Information System (PNGRIS) contains information on natural resources, land use, and human population densities for the whole country (Bellamy and McAlpine 1995; Keig and Quigley 1995). The land units for which this information is recorded are called resource mapping units (RMUs). Because these units are widely used by government agencies in Papua New Guinea, they were adopted as the candidate conservation areas for the study. The RMUs were mapped from aerial photographs taken during extensive land resource surveys carried out by the Australian Commonwealth Scientific and Industrial Research Organisation (CSIRO) in the 1970s. The RMUs served as the mapping units for vegetation types, while climate, terrain, and lithological data provided the environmental information. Vegetation types, environmental data, and the Papua New Guinea government's 1997 list of endangered and threatened species were then used together as biodiversity surrogates to identify conservation areas for the country. Land uses that compete with conservation in Papua New Guinea include agriculture and forestry.

 


 

*      Three types of data are available: (i) survey results, (ii) remotely sensed data, and (iii) modeled data.


      All of these, not just survey results, can be used for conservation planning.


      Surveys may introduce sampling biases, which will carry through into the planning process.


      Types of survey data include presence-absence data, presence-only data, and abundance data.


       Presence-only data: species have been recorded in some areas, but there is no indication of abundance, and the lack of a recorded presence in other areas does not mean the species is absent there. It only means the species has not been recorded there, so we do not know whether it occurs there or not.


       Presence-absence data: absences are real within the limits of sampling intensity and thoroughness. Surrogates were searched for and recorded as present where they were found and as absent where they were not found.


       Abundance data: estimates of the abundance (population size or number of individuals) or extent (e.g., percentage of canopy cover) of the surrogates. Zero abundance indicates that a surrogate is absent.


      There is more confidence in plans based on abundance data than those based on presence-absence data, and more confidence in those based on presence-absence data than those based on presence-only data.


      Remotely sensed data: data obtained by monitoring technology, especially satellites; these data can be used in several stages of systematic conservation planning.


       Such data are increasingly available for all areas of the world, allowing systematic conservation planning to take place anywhere.


       The website of the U.S. Geological Survey provides a digital elevation model (DEM) for the entire world with a horizontal grid spacing of 30 arc seconds (approximately 1 kilometer).


       The WorldClim website provides climate information for the entire world at a resolution of approximately 1 sq. km.


      Modeled data: data obtained from models of species distributions; see Examples 4.3 and 4.4.
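The three kinds of survey data above carry different amounts of information, which is why confidence in a plan depends on which kind it uses. The sketch below, with hypothetical site names and counts, shows that abundance data can be collapsed into presence-absence data, while presence-only records leave every unrecorded site unknown. Using `None` for "unknown" is a choice made for this illustration, not a standard format.

```python
from typing import Optional

# Hypothetical abundance data for one species at five surveyed sites (0 means absent).
abundance = {"s1": 12, "s2": 0, "s3": 3, "s4": 0, "s5": 7}

def to_presence_absence(counts: dict[str, int]) -> dict[str, bool]:
    """Collapse abundance to presence-absence: zero abundance means absence."""
    return {site: n > 0 for site, n in counts.items()}

def from_presence_only(records: set[str], sites: list[str]) -> dict[str, Optional[bool]]:
    """Recorded sites are present; unrecorded sites are unknown (None), not absent."""
    return {site: (True if site in records else None) for site in sites}

sites = ["s1", "s2", "s3", "s4", "s5"]
pa = to_presence_absence(abundance)
po = from_presence_only({"s1", "s5"}, sites)
print(pa)  # {'s1': True, 's2': False, 's3': True, 's4': False, 's5': True}
print(po)  # {'s1': True, 's2': None, 's3': None, 's4': None, 's5': True}
```

The conversion only runs one way: presence-absence or presence-only records cannot be turned back into abundance, and presence-only records cannot be turned into presence-absence without further survey work or modeling.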


 

*      During data collection, it is important to check whether data points are unduly correlated with major roads, waterways, etc.


      Existing data collections can be found in museums and herbaria, from various departments of government (natural resource management agencies), and from non-government organizations (NGOs).


      The distribution of records in data collections often mirrors road networks: many records come from near roads or townships, which makes it difficult to determine species' true distribution patterns.


      Even results of systematic surveys may provide presence-only data if the taxon in question was not the focus of the survey.


      In reality, existing data sets are far from ideal. Planners must make full use of the data, but they must also acknowledge their limitations.
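One simple way to check whether records are unduly correlated with roads is to compare the share of records lying close to a road with the share of randomly placed locations that do. The sketch below assumes planar coordinates in kilometres, roads stored as straight line segments, and an arbitrary 5 km buffer; the records and road are hypothetical, and a real analysis would use the road layer in a GIS and might add a formal significance test.

```python
import math
import random

def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the line segment ab (planar coordinates)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp the projection to the segment's endpoints.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len2))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def fraction_near_roads(points, roads, buffer_km):
    """Share of points lying within buffer_km of any road segment."""
    near = sum(
        1 for p in points
        if min(point_segment_distance(p, a, b) for a, b in roads) <= buffer_km
    )
    return near / len(points)

# Hypothetical 100 x 100 km region crossed by one straight road along y = 50.
roads = [((0.0, 50.0), (100.0, 50.0))]
records = [(10, 51), (25, 49), (40, 52), (60, 50), (75, 48), (90, 80)]

rng = random.Random(42)  # fixed seed so the random background sample is reproducible
background = [(rng.uniform(0, 100), rng.uniform(0, 100)) for _ in range(5000)]

observed = fraction_near_roads(records, roads, buffer_km=5)
expected = fraction_near_roads(background, roads, buffer_km=5)
print(f"records near roads: {observed:.2f}, random locations near roads: {expected:.2f}")
```

Here five of six records fall within the buffer, while only about a tenth of random locations do, which is the kind of pattern that suggests the records reflect access rather than the species' true distribution.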

 

 

Example 4.2

 

Koala Data in New South Wales, Australia

(Margules and Austin 1994; Margules and Sarkar 2007)

 

In the case of koala records collected in New South Wales, Australia, review showed that the spatial pattern of the records mimicked the road network. The records were compiled from museum collections and from a field survey by volunteers who recorded sightings of the animals, which together amount to presence-only data. When compared with a map of the road network, most of the records lay near roads and townships; see Figure 4.2 (Margules and Austin 1994). No systematic state-wide survey has ever been conducted of this very high-profile, charismatic species, so the limits of its geographic range still cannot easily be defined (Margules and Sarkar 2007, 65).

 

Figure 4.2

 


 

*      Data Treatment: the ways in which raw field data are systematically modified for use in conservation planning.


      Data treatment may include the rejection of obviously erroneous data.


      It may include resampling data with Geographic Information System (GIS) software so that all data are available at a uniform spatial resolution.


      Most importantly, data treatment includes modeling to fill in species distributions when only presence-only data are available.


      Such models are important because very little reliable presence-absence data exist for most taxa in most places in the world.
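GIS packages provide resampling tools for bringing layers to a common resolution. The sketch below shows only the arithmetic of one common approach, aggregating a fine raster to a coarser uniform resolution by averaging each block of cells. It assumes a continuous variable such as elevation, stored as a list of rows whose dimensions are exact multiples of the aggregation factor; categorical layers such as vegetation types would normally use a majority rule rather than a mean. The grid values are hypothetical.

```python
def aggregate_grid(grid, factor):
    """Resample a fine raster (list of rows) to a coarser one by averaging factor x factor blocks."""
    n_rows, n_cols = len(grid), len(grid[0])
    if n_rows % factor or n_cols % factor:
        raise ValueError("grid dimensions must be multiples of factor")
    coarse = []
    for r in range(0, n_rows, factor):
        row = []
        for c in range(0, n_cols, factor):
            # Collect every fine cell that falls inside this coarse cell.
            block = [grid[i][j] for i in range(r, r + factor) for j in range(c, c + factor)]
            row.append(sum(block) / len(block))
        coarse.append(row)
    return coarse

# Hypothetical 4 x 4 elevation grid (metres) at the finer resolution.
elevation = [
    [100, 110, 200, 210],
    [120, 130, 220, 230],
    [300, 310, 400, 410],
    [320, 330, 420, 430],
]
print(aggregate_grid(elevation, 2))  # [[115.0, 215.0], [315.0, 415.0]]
```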


 

*      Modeling species distributions, often called “niche modeling” (Moritz et al. 2001; Peterson et al. 1999), has emerged as a major data treatment strategy for filling in species distributions in areas that have never been adequately surveyed, so that conservation planning can proceed as efficiently as possible.


      All popular data treatment models take as input environmental data for the planning region and as many geo-referenced presence-absence or presence-only records of species' occurrences as possible.


      A variety of modeling techniques exist and these are being extended in research efforts in many laboratories around the world.


       Ecologically specific models use the best available ecological knowledge to predict species distributions, but they have had at best a moderate record of success and often require data that are not available in most conservation contexts.


       Regression and other statistical association (correlation) models use correlations between species records and environmental variables to predict species distributions; see Example 4.3.


       Heuristic methods such as Bioclim (Nix 1986; Busby 1991) rely on other kinds of statistical association, for example the climatic envelope of known occurrences, and have been used successfully in the past.


       Machine-learning methods have led to software packages such as Maxent (Phillips et al. 2006) and GARP (Stockwell and Peters 1999), which are increasingly used, with predictive success, to model species distributions. These packages use algorithms that learn from past successes (finding species at particular places and times) to predict associations between species' presences and environmental variables; see Example 4.4.
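A Bioclim-style heuristic treats the range of environmental conditions at known occurrence sites as an envelope and predicts the species to be potentially present wherever every condition falls inside it. The sketch below uses the simplest min-max envelope, which is an assumption: Bioclim implementations often trim the envelope to percentiles of the observed values and use many more climatic variables. All values are hypothetical.

```python
def fit_envelope(presences):
    """Min-max envelope: the range of each environmental variable across presence records."""
    variables = presences[0].keys()
    return {v: (min(p[v] for p in presences), max(p[v] for p in presences)) for v in variables}

def in_envelope(site, envelope):
    """A site is predicted suitable only if every variable falls inside the envelope."""
    return all(lo <= site[v] <= hi for v, (lo, hi) in envelope.items())

# Hypothetical presence records: mean annual temperature (deg C) and rainfall (mm).
presences = [
    {"temp": 11.5, "rain": 900},
    {"temp": 13.0, "rain": 1100},
    {"temp": 12.2, "rain": 1250},
    {"temp": 14.1, "rain": 1000},
]
env = fit_envelope(presences)
print(env)  # {'temp': (11.5, 14.1), 'rain': (900, 1250)}

candidates = {"site_a": {"temp": 12.8, "rain": 1050}, "site_b": {"temp": 17.0, "rain": 1050}}
print({name: in_envelope(s, env) for name, s in candidates.items()})
# {'site_a': True, 'site_b': False}
```

Site B is excluded because its temperature lies outside the envelope, even though its rainfall lies inside it.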

 

 

Example 4.3

 

Distribution of Eucalyptus radiata in New South Wales

(Nicholls 1989)

 

When presence-absence data are available, regression analyses such as Generalized Linear Modelling (GLM) can be used to estimate species distribution patterns. Nicholls (1989) applied GLM to presence-absence records from coastal hardwood forests in New South Wales, Australia (Figure 4.3a). Each presence or absence record was then plotted in a plane in which the x-axis represents the mean annual rainfall of the record and the y-axis its mean annual temperature (Figure 4.3b).

 

A forward stepwise procedure was used to find the variables that best explained the distribution of the species. Temperature, altitude, and lithology (the type of underlying rock or sediment) were found to be the most important variables for predicting the distribution of E. radiata. Because temperature and altitude are strongly correlated with each other (lower altitudes are warmer and higher altitudes cooler), the two final variables used to predict the distribution of E. radiata were temperature and lithology.

 

However, further examination showed that the lithological data were not complete for all of the tested categories, so the resulting predicted distribution was not valid. The model was re-fitted and the variables were tested again in the stepwise procedure. When the known lithological categories were then considered separately (in this case, fine-grained sediments) and modeled together with temperature, the predicted distribution for fine-grained sediments showed a contoured pattern (Figure 4.3c). Finally, the procedure was run again with temperature and all known lithological types, producing the predicted distribution shown in Figure 4.3d.
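For presence-absence data, a GLM is usually a binomial model with a logit link (logistic regression). The sketch below fits such a model by Newton-Raphson, which for this link is equivalent to the iteratively reweighted least squares used by GLM software, and runs a forward stepwise selection. The selection criterion here is AIC, an assumption, since the criterion Nicholls (1989) used is not stated in this module. The 16 survey sites are hypothetical, not Nicholls's data: rainfall was deliberately made uninformative, and categorical lithology is omitted (it would need indicator variables).

```python
import math

def standardize(values):
    """Centre a variable on zero and scale it to unit (population) standard deviation."""
    m = sum(values) / len(values)
    sd = math.sqrt(sum((v - m) ** 2 for v in values) / len(values))
    return [(v - m) / sd for v in values]

def design(columns, names):
    """Design matrix: a leading 1.0 for the intercept, then the named variables."""
    n = len(next(iter(columns.values())))
    return [[1.0] + [columns[name][i] for name in names] for i in range(n)]

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (small systems only)."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def fit_logistic(X, y, iters=25):
    """Binomial GLM with a logit link, fitted by Newton-Raphson. Returns (coefficients, deviance)."""
    k = len(X[0])
    beta = [0.0] * k

    def probs(b):
        return [1.0 / (1.0 + math.exp(-sum(bj * xj for bj, xj in zip(b, row)))) for row in X]

    for _ in range(iters):
        p = probs(beta)
        grad = [sum((yi - pi) * row[j] for row, yi, pi in zip(X, y, p)) for j in range(k)]
        info = [[sum(pi * (1 - pi) * row[i] * row[j] for row, pi in zip(X, p))
                 for j in range(k)] for i in range(k)]
        beta = [bj + s for bj, s in zip(beta, solve(info, grad))]
    p = [min(max(pi, 1e-12), 1 - 1e-12) for pi in probs(beta)]
    deviance = -2 * sum(yi * math.log(pi) + (1 - yi) * math.log(1 - pi) for yi, pi in zip(y, p))
    return beta, deviance

def forward_stepwise(columns, y, candidates):
    """Forward selection: add the variable that lowers AIC most; stop when none does."""
    selected = []
    best_aic = fit_logistic(design(columns, selected), y)[1] + 2  # intercept-only model
    while True:
        trials = []
        for v in candidates:
            if v not in selected:
                dev = fit_logistic(design(columns, selected + [v]), y)[1]
                trials.append((dev + 2 * (len(selected) + 2), v))  # AIC = deviance + 2k
        if not trials or min(trials)[0] >= best_aic:
            return selected
        best_aic, v = min(trials)
        selected.append(v)

# Hypothetical survey: presences mostly at cooler sites; each temperature occurs
# once at 800 mm and once at 1400 mm of rainfall, so rainfall carries no signal.
temps = [t for t in (9, 10, 11, 12, 13, 14, 15, 16) for _ in range(2)]
rain = [800, 1400] * 8
present = [v for v in (1, 1, 1, 0, 1, 0, 0, 0) for _ in range(2)]

columns = {"temperature": standardize(temps), "rainfall": standardize(rain)}
print(forward_stepwise(columns, present, ["temperature", "rainfall"]))  # ['temperature']
```

Temperature enters first because it reduces the deviance substantially; rainfall is then rejected because it does not lower AIC, mirroring how uninformative or redundant variables drop out of a stepwise procedure.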

 

 

Figure 4.3a

Geographical Distribution of E. radiata from Survey Records

 

Figure 4.3b

The Mean Annual Temperature and Mean Annual Rainfall of the E. radiata Records

 

 

Figure 4.3c

Predicted Distribution of E. radiata in Environmental Space based on Fine-grained Lithology

              

 

 

Figure 4.3d

Predicted Geographical Distribution and Probability of Occurrence with All Sediment Types

 

 

 

 

Example 4.4

 

Distribution of the brown-throated three-toed sloth, Bradypus variegatus, and a small-bodied rodent, Microryzomys minutus, in South America

(Phillips et al. 2006)

 

The Maximum Entropy (Maxent) technique is a niche-modeling tool that uses algorithms that learn from past successes to predict associations between species' presences and environmental variables. It is particularly suited to presence-only data. In this example, the geographical locations of presence-only records for the brown-throated three-toed sloth, Bradypus variegatus, and a small-bodied rodent, Microryzomys minutus, were first mapped (see Figure 4.4a).

Figure 4.4a

Occurrence Records for Bradypus variegatus and Microryzomys minutus

Records for Bradypus variegatus (left) and Microryzomys minutus (right), derived from vouchered (verified) museum specimens (Phillips et al. 2006).

 

The following explanatory variables were used by Maxent to construct the models of the species' potential distributions: climatic variables, elevation, and potential vegetation. The climatic variables were annual cloud cover, annual diurnal temperature range (the difference between daily maximum and minimum temperatures), annual frost frequency, annual vapor pressure, precipitation for January, April, July, and October and for the year as a whole, and minimum, maximum, and mean annual temperature.

 

Maxent used two environmental “suites”: (i) climate, elevation and potential vegetation together and (ii) climate and elevation.  The final predictions, showing the likelihood of occurrence of each species in each site, are pictured in Figure 4.4b.

Figure 4.4b

Predicted Potential Geographic Distributions for Bradypus variegatus and Microryzomys minutus

Distributions for Bradypus variegatus (left) and Microryzomys minutus (right), as predicted by Maxent. Four colors indicate the strength of the predictions, with darker colors indicating stronger predictions (Phillips et al. 2006).
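At its core, Maxent estimates a probability distribution over the cells of the study region that has maximum entropy subject to the constraint that the expected value of each environmental feature matches its mean over the occurrence records; the solution has the form of a Gibbs distribution, with probability proportional to exp(λ·f). The sketch below fits that distribution by gradient ascent for linear features only. It omits the regularization and additional feature types (such as quadratic and product features) that the Maxent software uses, and the 10-cell study region and its values are hypothetical.

```python
import math

def gibbs(lam, cells):
    """Fitted distribution over cells: probability proportional to exp(lam . features)."""
    scores = [sum(l * f for l, f in zip(lam, x)) for x in cells]
    top = max(scores)  # subtracting the maximum avoids overflow in exp
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]

def fit_maxent(cells, presence_idx, iters=5000, lr=1.0):
    """Unregularized Maxent with linear features, fitted by gradient ascent.

    cells: feature vectors (scaled to [0, 1]) for every cell of the study region.
    presence_idx: indices of the cells holding occurrence records.
    """
    k = len(cells[0])
    # Constraints: the fitted distribution must reproduce these presence means.
    target = [sum(cells[i][j] for i in presence_idx) / len(presence_idx) for j in range(k)]
    lam = [0.0] * k
    for _ in range(iters):
        probs = gibbs(lam, cells)
        expected = [sum(p * x[j] for p, x in zip(probs, cells)) for j in range(k)]
        # Gradient of the mean log-likelihood of the presences: target minus expectation.
        lam = [l + lr * (t - e) for l, t, e in zip(lam, target, expected)]
    return lam, gibbs(lam, cells)

# Hypothetical 10-cell region; features are (temperature, precipitation) scaled to [0, 1].
temperature = [i / 9 for i in range(10)]
precipitation = [0.2, 0.9, 0.5, 0.1, 0.7, 0.3, 0.6, 0.4, 0.8, 0.0]
cells = [[t, p] for t, p in zip(temperature, precipitation)]
presences = [6, 7, 8]  # cells with occurrence records

lam, probs = fit_maxent(cells, presences)
for j, name in enumerate(["temperature", "precipitation"]):
    fitted = sum(p * x[j] for p, x in zip(probs, cells))
    observed = sum(cells[i][j] for i in presences) / len(presences)
    print(f"{name}: fitted mean {fitted:.3f}, presence mean {observed:.3f}")
```

After fitting, the two means for each feature should agree closely, which is the defining constraint of the maximum-entropy solution; `probs` then plays the role of the relative likelihood of occurrence mapped in Figure 4.4b.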

 

 

