Mapping data gaps to estimate biomass across Brazilian Amazon forests

Tropical forests play a fundamental role in the provision of diverse ecosystem services, such as biodiversity, climate and air quality regulation, freshwater provision, carbon cycling, agricultural support and culture. To understand the role of forests in the carbon balance, aboveground biomass (AGB) estimates are needed. Given the importance of Brazilian tropical forests, there is an urgent need to improve AGB estimates to support the Brazilian commitments under the United Nations Framework Convention on Climate Change (UNFCCC). Many AGB maps and datasets exist, varying in availability, scale and coverage. Thus, stakeholders, policy makers and scientists must decide which AGB product, dataset or combination of data to use for their particular goals. In this study, we assessed the gaps in the spatial AGB data across the Brazilian Amazon forests not only to orient the decision makers about the data that are currently available but also to provide a guide for future initiatives. We obtained a map of the gaps in the forest AGB spatial data for the Brazilian Amazon using statistics and differences between AGB maps and a spatial multicriteria evaluation that considered the current AGB datasets. The AGB spatial data gap map represents areas with good coverage of AGB data and, consequently, the main gaps or priority areas where further biomass assessments should focus, including the northeast of Amazon State, Amapá and northeast of Pará. Additionally, by quantifying the variability in both the AGB maps and field data on multiple environmental factors, we provide valuable elements for understanding the current AGB data as a function of climate, soil, vegetation and geomorphology. The map of AGB data gaps could become a useful tool for policy makers and different stakeholders working on National Communications, Reducing Emissions from Deforestation and Degradation (REDD+), or carbon emissions modeling to prioritize places to implement further AGB assessments. 
Only 0.2% of the Amazon biome forest is sampled, and extensive effort is necessary to improve what we know about the tropical forest.


Background
Tropical forests play a fundamental role in the provision of ecosystem services such as biodiversity, food production, traditional knowledge and carbon cycling. Aboveground biomass (AGB) estimates are needed to understand the role of tropical forests in the global carbon budget (Pan et al. 2011).
In the Brazilian Amazon, the total AGB stock has been estimated by several sources, including forest inventory plots and remote sensing approaches (Saatchi et al. 2011, 2015; Baccini et al. 2012). Given the extent, complexity and diversity of landscapes in tropical forest areas, remote sensing is one of the best tools for estimating AGB (Saatchi et al. 2011, 2015). However, remote sensing methods still depend on the availability of AGB field data (e.g. inventory plots) to ensure proper calibration, validation and spatial extrapolation (Mitchard et al. 2014; Saatchi et al. 2015).
Differences in remote sensing products and field data have resulted in great discrepancies in the spatial distribution of AGB estimates on different AGB maps (Mitchard et al. 2014; Ometto et al. 2014; Tejada 2014). Previous studies have indicated that considerable spatial uncertainties exist in biomass estimates (Ometto et al. 2014; Tejada 2014). To tackle the uncertainty associated with biomass estimates, the IPCC guidelines on greenhouse gases (GHGs) (IPCC 2006) suggest using environmental factor maps to find classes or strata with homogeneous AGB (a process known as stratification). Nonetheless, stratification has inherent methodological challenges, such as selecting the environmental factor maps with proper classification schemes and quality as a function of the scale (IPCC 2006; Angelsen et al. 2012).
There is an urgent need to improve and validate biomass estimates to support Brazilian commitments in the context of climate change, such as the National Communications (NC) and Reducing Emissions from Deforestation and Degradation (REDD+) commitments under the United Nations Framework Convention on Climate Change (UNFCCC). Progressive improvement is expected because these aspects are a growing concern in the scientific and political communities (MMA 2015; Fearnside 2018). Until improvements become available, estimates are made with the currently available AGB databases and environmental factor maps. Whether for NC, REDD+ or carbon emissions modeling, stakeholders, policy makers and scientists have to decide which AGB product, dataset or combination of data to use based on the availability, scale and coverage.
In this study, we assessed the gaps in the spatial AGB data across the Brazilian Amazon forests not only to orient the decision makers about what data are currently available but also to provide a guide for future initiative support. To achieve this goal, we used the current AGB dataset coverage and analyzed the differences in the AGB maps. We contrasted the AGB maps and the RadamBrasil field data within different environmental factor maps, such as climate, soil, vegetation and geomorphology maps. The previous results were merged, and we obtained the gaps in the forest AGB spatial data referring to the places where data acquisition should be improved. In other words, we assessed priority areas for further AGB assessments in the Brazilian Amazon.

Study area
The Brazilian portion of the Amazon Basin has an area of 3,869,653 km² and covers 60% of the basin (Fig. 1). This study focuses on only the forest area considered intact by the Deforestation Monitoring Program (PRODES) in 2014 (~3,139,172 km²) (INPE 2015) within the Brazilian Amazon biome (IBGE 2004a).

AGB field and laser data
We used extensive AGB data, which were collected by contacting the most important research groups involved in the subject. Both the data locations and methodology were registered in a geospatial database in Tejada et al. (2019). This study recorded 5351 AGB plots and 619,858 ha of airborne laser scanning data in the Brazilian forest biome (Fig. 2 and Table 1).

Forest AGB maps
We chose five published AGB maps at the pantropical or Brazilian Amazon scale. At the pantropical scale, we selected the AGB maps published by Saatchi et al. (2011), Baccini et al. (2012) and Avitabile et al. (2016). The first two maps used LiDAR remote sensing data to extrapolate the field data. The AGB map of Avitabile et al. (2016) combined the maps from Saatchi et al. (2011) and Baccini et al. (2012) and included additional field data. At the Brazilian Amazon scale, we used the AGB maps published by Nogueira et al. (2015) and the third National Communication of Brazil (MCT 2016); both maps are based on field data extrapolated using vegetation classes. The AGB maps and their main characteristics are described in Table 2.

Environmental factor maps
We gathered information on five environmental factors: vegetation, soil, climate, topography and geomorphology. The maps came from different sources and are detailed in Table 3; further information is provided in Tejada et al. (2019).
The vegetation map of Brazil (IBGE and USGS 1992) was digitized by the U.S. Geological Survey. The vegetation map (SIVAM 2002) was based on radar images and field work during the RadamBrasil project (RadamBrasil 1983) and was updated based on the SIVAM (Sistema de Vigilância da Amazônia) project in 2002 (Malkomes et al. 2002). In 2004, the IBGE published a wall-to-wall map series at a 1:5,000,000 scale, including the vegetation map of Brazil (IBGE 2004b), to reconstruct the original vegetation cover. The Brazilian Biological Diversity Project (PROBIO) combined all the previous vegetation mapping efforts by SIVAM, RadamBrasil, PRODES and IBGE (among many others) to generate a unique geographic database for the Amazon biome (MMA 2006a).
The soil map of Brazil (IBGE 2001) is part of the IBGE wall-to-wall maps at a 1:5,000,000 scale and uses the Embrapa soil classification system and RadamBrasil data. The soil data were taken from the Legal Amazon map that was produced by the Ministry of Environment of Brazil (MMA) via the Environmental and Ecological Zoning project (ZEE) in the context of the scenarios for the Legal Amazon project and the IBGE (MMA 2006b). At the Amazon basin scale, the soil map from Quesada et al. (2011) was created using references for the RAINFOR forest sites with soil data. The map of the soil carbon stocks from Bernoux et al. (2002) links the vegetation and global soil classes of the IPCC (2006).
The climate map of Brazil is an update of a previous climate map from 1978 (Nimer 1979) that reflects the climate zones, thermic regions and wetness expressed by dry months (IBGE 2002a). The water deficit map shows the cumulative water deficit from 1988 to 2014 calculated by Fonseca et al. (2016) using Tropical Rainfall Measuring Mission (TRMM) data.
The relief map is part of the 4th IBGE Atlas (IBGE 2002b). To improve the original classification (i.e. the relief map), the relief map units were based on geomorphology classes at a 1:5,000,000 scale and remote sensing images from the SIVAM project (IBGE 2006). In the context of the ZEE project, we used the geomorphology map of the Legal Amazon (MMA 2006b) at 1:250,000, which also used satellite images.

Analyses of forest AGB variability and environmental factors
First, we performed a variability analysis between the forest AGB maps, RadamBrasil field data and environmental factors. Then, the differences between the AGB maps were analyzed. For both analyses, we standardized the carbon pools of the AGB maps by removing the belowground biomass (BGB). The biomass maps from Saatchi et al. (2011), Baccini et al. (2012) and Avitabile et al. (2016) considered AGB and not BGB, whereas MCT (2016) and Nogueira et al. (2015) considered both AGB and BGB. To compare these maps, we removed BGB using expansion factors (BGB is 25.8% of AGB) according to Nogueira et al. (2018).
The variability in the AGB maps within the different environmental factor maps (soil, vegetation, topography and climate) was measured in terms of population variance (considering every environmental factor map, Eq. 1) and stratified variance (SV) (considering the environmental factor map classes, Eq. 2). The population variance and SV in the RadamBrasil field plot data were also calculated in each environmental factor map to compare the variance in the field data versus the variance in the AGB maps to see if the tendency of the AGB in the maps was corrected. As we had access to the data from the RadamBrasil field plots, we assumed that the AGB remained stable unless the area was deforested (we removed the deforested areas with the PRODES mask), as was assumed by many AGB maps that used this dataset (e.g. MCT 2010, 2016; Nogueira et al. 2015).
Eq. 1, global (population) variance:
$$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}\left(X_i - \mu\right)^2$$
where $X_i$ is an observation, $\mu$ is the population mean, and $N$ is the population size.
Eq. 2, stratified variance:
$$s^2 = \sum_{j} \frac{n_j}{N}\, s_j^2$$
where $s^2$ is the total stratified variance, $n_j$ is the size of stratum $j$, $N$ is the population size and $s_j^2$ is the sample variance in stratum $j$.
We expect that each class of an environmental factor map should be homogeneous. Therefore, the AGB should exhibit a smaller variance within a class than in the entire map. Stratification could help to reduce the cost and effort required to sample large areas by calculating the number of AGB plots needed to represent each class (Pearson et al. 2005; IPCC 2006).
We carried out SV analysis to identify the environmental factor maps (and classes) with low variance in the AGB maps and RadamBrasil. We expected that a class with lower AGB variance would better represent the AGB.
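The two variance measures can be illustrated with a minimal Python sketch. It assumes that Eq. 2 is the size-weighted mean of the within-class sample variances, which is one reading of the symbol definitions above and may not be the exact form used in the study. The function names and the four AGB values are hypothetical.

```python
from collections import defaultdict
from statistics import pvariance, variance


def population_variance(values):
    """Eq. 1: mean squared deviation from the population mean."""
    return pvariance(values)


def stratified_variance(values, classes):
    """Assumed reading of Eq. 2: sum over strata of (n_j / N) * s_j^2."""
    strata = defaultdict(list)
    for value, label in zip(values, classes):
        strata[label].append(value)
    n_total = len(values)
    total = 0.0
    for members in strata.values():
        # A stratum with a single cell has no spread; treat its variance as 0.
        s2_j = variance(members) if len(members) > 1 else 0.0
        total += len(members) / n_total * s2_j
    return total


# Hypothetical AGB values (Mg/ha) in two environmental classes.
agb = [100.0, 110.0, 300.0, 310.0]
env_class = ["A", "A", "B", "B"]

pv = population_variance(agb)
sv = stratified_variance(agb, env_class)
print(pv, sv)
```

In this toy case the classes separate low and high AGB cleanly, so the stratified variance is far smaller than the population variance, which is the behavior expected of a well-chosen environmental factor map.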
The differences between the AGB maps were analyzed from the two-by-two differences in the five AGB maps (Saatchi et al. 2011; Baccini et al. 2012; Nogueira et al. 2015; MCT 2016; Avitabile et al. 2016), generating 10 maps. Then, we calculated the cell statistics by combining all the AGB maps to obtain the average, standard deviation and range to summarize the tendencies.
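The map harmonization and comparison steps can be sketched as follows. This is an illustration only: each map is reduced to a flat list of three hypothetical cell values, the BGB removal assumes that maps including BGB report AGB x (1 + 0.258), and the per-cell standard deviation is computed as a population standard deviation, which the text does not specify.

```python
from itertools import combinations
from statistics import mean, pstdev

BGB_FRACTION = 0.258  # BGB as a share of AGB, as stated in the text


def remove_bgb(total_biomass):
    """Convert AGB+BGB totals to AGB, assuming total = AGB * (1 + 0.258)."""
    return [t / (1.0 + BGB_FRACTION) for t in total_biomass]


def pairwise_differences(maps):
    """Cell-by-cell difference for every unordered pair of maps."""
    return {
        (a, b): [x - y for x, y in zip(maps[a], maps[b])]
        for a, b in combinations(maps, 2)
    }


def cell_statistics(maps):
    """Per-cell mean, standard deviation and range across all maps."""
    stats = []
    for cell in zip(*maps.values()):
        stats.append({
            "mean": mean(cell),
            "std": pstdev(cell),  # population std is an assumption
            "range": max(cell) - min(cell),
        })
    return stats


# Hypothetical cell values (Mg/ha); the keys only label the five maps.
maps = {
    "saatchi": [200.0, 250.0, 300.0],
    "baccini": [210.0, 240.0, 280.0],
    "avitabile": [190.0, 260.0, 310.0],
    "nogueira": remove_bgb([377.4, 314.5, 440.3]),
    "mct": remove_bgb([352.2, 327.1, 415.1]),
}

diffs = pairwise_differences(maps)
stats = cell_statistics(maps)
print(len(diffs), stats[0])
```

Five maps give ten unordered pairs, matching the 10 difference maps mentioned above.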

Map of the gaps in forest AGB spatial data
To obtain the forest AGB data gap map, we performed a spatial multicriteria evaluation (SMCE) in the GIS integrated land and watershed management information system (ILWIS) environment (Meijerink et al. 1988) using the distance maps from the LiDAR transects and AGB plots and the standard deviation map of all AGB maps as inputs. For the SMCE, all the input maps were previously standardized to make them fully comparable, converting the original values to a 0 to 1 range, as shown in Fig. 3.
The distance and the standard deviation maps were conceived as a benefit factor, which, under the ILWIS-SMCE criterion, means that the higher the value is, the more it contributes to the goal. In this case, the goal is to map the gaps in the representativeness of the AGB data, including AGB maps and plots. Thus, areas with greater distances to the sampling plots or LiDAR transects and with higher standard deviation are more likely to be considered gaps. Areas with shorter distances to plots and high standard deviation will have an intermediate weight in the gap map.
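The logic of the multicriteria combination can be sketched in a few lines of Python. This is not the ILWIS implementation: it assumes a linear min-max standardization to the 0 to 1 range and equal weights for all criteria, neither of which is stated in the text, and the criterion names and cell values are hypothetical.

```python
def standardize_benefit(values):
    """Linear min-max rescaling to 0..1; higher input means higher score."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def gap_score(criteria, weights):
    """Weighted mean of the standardized benefit criteria for each cell."""
    total_weight = sum(weights.values())
    standardized = {k: standardize_benefit(v) for k, v in criteria.items()}
    n_cells = len(next(iter(criteria.values())))
    return [
        sum(weights[k] * standardized[k][i] for k in criteria) / total_weight
        for i in range(n_cells)
    ]


# Hypothetical values for three cells.
criteria = {
    "distance_to_plots_km": [0.0, 50.0, 100.0],
    "distance_to_lidar_km": [0.0, 100.0, 200.0],
    "agb_std_between_maps": [10.0, 40.0, 20.0],
}
weights = {name: 1.0 for name in criteria}  # equal weights are an assumption

scores = gap_score(criteria, weights)
print(scores)
```

A cell close to plots and LiDAR transects where the maps agree scores near 0 (well covered), while a remote cell scores high (a gap). A cell at intermediate distance but with strong disagreement between maps falls in between.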

Forest AGB maps and environmental factors
As expected, the global variance in each AGB product (Fig. 4a) was higher than the SV considering each environmental factor (Fig. 4b). The soil maps had the highest SV among all environmental factors, except for the IBGE (2002a, 2002b) climate map. Relief, geomorphology, and the two vegetation maps (SIVAM 2002; MMA 2006a) showed the lowest SV. Climate had the highest SV. It is particularly interesting that the IBGE (2006) relief map had the lowest SV in the AGB maps among all environmental factor maps, being lower than even the PROBIO vegetation map (MMA 2006a), which has more detailed classes (except for the SV of the RadamBrasil field data).
The RadamBrasil field data had higher SV values than the AGB maps (Fig. 4b). Analysis of the climate maps (Fig. 5a and b) indicated that the classes with high precipitation and low water deficits were those with low SV. The climate maps with a few large classes had high SV.
The tendency among the five vegetation maps (Fig. 5c-g) is a high SV in the central Amazon (lowland dense humid forests or Db) close to the main rivers and in the northeast (submontane dense humid forest or Ds). The large sizes of these two vegetation classes coincide in all vegetation maps and cover almost 50% of the total area (Fig. 5c, e, f and g), except for in the PROBIO vegetation map where these classes cover 30% of the area (Fig. 5d). If we consider the first 5 classes, they cover 70% of the area of all vegetation maps (except for in the PROBIO map, where they cover 57% of the area), showing that few classes represent large areas, reflecting high SV.
The vegetation maps with more classes, such as the PROBIO map (Fig. 5d) with 298 classes and SIVAM (2002) with 80 classes, have low SV. Additionally, the RadamBrasil field data have the lowest SV in these vegetation maps.
Of all the environmental factor maps, the IBGE (2006) relief (or geomorphology) map has the lowest SV with 69 classes (Fig. 6c); only two classes have high SV, depression of the Solimões River and depression of Southern Amazonia. However, these classes represent only 19% of the area, and the first five classes represent only 36% of the total map area.
The MMA (2006b) geomorphology map with 64 classes (Fig. 6a) presents low SV, with the exception of the convex dissection (Dc 53) and pediplain exposed (Pru) classes, which both covered 25% of the area, and the first five classes covered 40% of the total map area. The IBGE (2002b) relief map with only half as many classes (32) has a high SV (Fig. 6b), and the first 5 classes represent 70% of the total map area. This result indicates that many classes with uniform sizes (Fig. 6a and c) have lower SV than those with few large classes (Fig. 6b).
Fig. 3 Forest AGB spatial gap mapping flowchart using a spatial multicriteria evaluation (SMCE)
Fig. 4 Global and stratified variance in the AGB maps and the RadamBrasil data within each environmental factor map. a Global variance in each AGB map and RadamBrasil data; b stratified variance in the AGB maps and RadamBrasil AGB field data within each environmental factor map
Considering the SV in each class of each environmental factor, soil had the highest SV. The high activity clay soils with an open Amazon forest class, the southwest classes (Fig. 7d), and the podzol hydromorphic class (Fig. 7e) have low SV, possibly because of the small sizes of these classes. The soil map of Bernoux et al. (2002) indicates that the Nogueira et al. (2015) AGB map has a lower SV than the rest of the maps, possibly because it considers vegetation, while the rest of the AGB maps have high SV. The first two classes of the soil maps covered almost 50% of the total area, showing many large classes with high SV and few with low SV.

Analysis of differences between forest AGB maps
As the AGB maps are a result of several AGB datasets, it is interesting to determine where the main differences and similarities in the AGB estimates occur. It is assumed that the places with the greatest AGB similarities are the places with better biomass estimates (Fig. 7).
The main difference between the Saatchi et al. (2011)
The cell statistics calculated across the AGB maps show that the extreme differences occur next to rivers, mainly the Amazon River in Amapá and northeast of Pará. The standard deviation (Fig. 8) calculated from this set of AGB maps quantifies the magnitude of these differences. Additionally, the range, which is the difference between the maximum and minimum AGB values, represents the discrepancies among the AGB maps. Most of the differences in the standard deviation map are found in the west-central and northwestern Amazon. The extremes are along the riverbanks in Amapá and northeast of Pará.

Forest AGB spatial data gap map
The final map of the gaps in the AGB spatial data (Fig. 9) shows the areas with high gaps in red and the areas that have moderate coverage of field and LiDAR data in orange where the differences in AGB are intermediate. Yellow areas are the places that have good coverage of AGB plots and LiDAR transects and where the AGB maps exhibit great similarities. Consequently, the main gaps or priority areas where further biomass assessments should be focused are the northeast of Amazon State, Amapá, northeast of Pará and along the rivers.

Discussion
According to Goetz et al. (2009), there are different approaches to map carbon stocks: the direct remote sensing (DR) approach and the stratify and multiply (SM) approach. According to our analysis, AGB maps that are derived from the DR approach (e.g. Saatchi et al. 2011; Baccini et al. 2012; Avitabile et al. 2016) have lower AGB values than the maps derived from the SM approach (e.g. Nogueira et al. 2015; MCT 2016) (see Fig. 7). The reason for this difference is that the maps created using the DR approach reflect the actual biomass and consider forest degradation (deforestation areas were removed by using a forest mask), while the SM maps represent the potential biomass per vegetation class. The differences between the DR maps are located in specific places (west Amazon, Amapá, northeast of Pará), while there are larger areas with substantial differences in the SM maps due to the large areas with high biomass values (whole Amazon State, west Pará and the same places as those in the DR maps) (Figs. 7 and 8). On the other hand, the differences in scale between the DR and SM maps are worth mentioning. The DR maps represent a pantropical scale, which applies general assumptions to extrapolate the AGB, while the SM maps are conceived specifically for the Brazilian Amazon and adopt local assumptions.
To obtain a stratification that adheres to the IPCC (2006) guidelines and the Voluntary Carbon Standard (VCS 2015), the ideal is to combine environmental factors to represent the AGB distribution in the Brazilian Amazon. Our SV results show great variation in terms of the SV between the AGB maps and AGB RadamBrasil field data. The reason could be the different acquisition or generation dates of the RadamBrasil field data (from 1973 to 1983) and some AGB maps (i.e. Nogueira et al. 2015; MCT 2016). The AGB maps of MCT (2016) and Nogueira et al. (2015) (both with the SM approach) used the RadamBrasil field data, which do not consider the degradation of later years, since our mask removes only deforested areas. However, the AGB maps of Saatchi et al. (2011), Baccini et al. (2012) and Avitabile et al. (2016) (all with the DR approach) considered degradation and represent the 2000s. Another reason why RadamBrasil field data have higher variance is that they alone do not represent the large size of the main vegetation classes, which is why the AGB maps used other inputs, such as remote sensing images and models.
The SV analysis also showed that the number and size of environmental factor classes influence the variance. The PROBIO map (MMA 2006a) with 298 classes has the lowest SV of all the vegetation maps, followed by the SIVAM (2002) with 80 classes. The relief map from the IBGE (2006) with 69 classes also has low SV. The RadamBrasil field data served as a reference during the calculation of the AGB with the field plots, and in these three maps, the SV was lower than that in the rest of the environmental factor maps.
The geomorphology map of MMA (2006b) with 64 classes had low SV values, except in the RadamBrasil field data. The relief map (IBGE 2002b) and vegetation maps (IBGE and USGS 1992; IBGE 2004b; MCT 2010) with few classes had moderate SV, while the three soil maps and the climate map (IBGE 2002a) had the highest SV. The water deficit map was a continuous map, so we could classify the map into more classes, and the SV had relatively low values. This map could be further explored for its use as an AGB indicator.
Table 4 Vegetation map (SIVAM 2002). I: Both used satellite images; I: Both used SRTM and satellite images; D: Nogueira et al. (2015) used this map
Many AGB maps exhibit a direct relationship with one or many environmental factor maps. A specific environmental factor map could be used to produce the AGB map (Table 4). Although direct and indirect relationships exist, we chose to keep all the variance analyses to see how these relationships influence the SV. It could be useful to understand when the variance is reduced due to dependency. For example, the low SV in the Nogueira et al. (2015) map in the SIVAM (2002) vegetation map is due to a direct relationship between them (this vegetation map was used to extrapolate AGB) (Table 4).
The comparison analysis between all AGB map statistics (Fig. 8) reveals that the areas with high standard deviations coincide with the areas of high SV in the three vegetation maps (i.e. SIVAM 2002; IBGE 2004b; MMA 2006a), while no such matches were found in the rest of the environmental factors. This result could mean that there is high uncertainty in the central Amazon (lowland dense humid forests or Db) close to the main rivers and in the northeast (submontane dense humid forest or Ds) due to the large size of these vegetation classes; thus, there should be further analysis in these areas (e.g. plot establishment). The same pattern occurs in the forest AGB spatial data gap map.
The forest AGB spatial data gap map (Fig. 9) shows the places with few or no AGB field plots or LiDAR datasets, which are also the places where the AGB maps differ most. In other words, the map of the AGB data gaps represents the priority areas for further AGB assessments. The 5351 AGB field plots that we accessed represent only 0.001% of the Brazilian Amazon biome area, and the LiDAR data represent 0.197%, meaning that less than 0.2% of the forest area is sampled (Tejada et al. 2019). Areas with medium weight in the data gap map, where plots are nearby but the standard deviation between AGB maps is high, could indicate that the AGB plot and LiDAR transect data were not used to generate the AGB maps (perhaps because of limited access to these datasets). The places with the greatest gaps are close to rivers in the States of Amazon, Amapá and northeast and west of Pará, coinciding with the two major vegetation classes (Db and Ds). The vegetation map was used by the Nogueira et al. (2015) and MCT (2016) AGB maps. Considering the large extent, accessibility difficulties and costs to establish field and airborne LiDAR AGB assessments in the Brazilian Amazon, this information is of high relevance for designing further studies.

Conclusions
The map of the forest AGB spatial data gaps represents the zones with limited information and where the AGB map estimates differ the most. Only 0.2% of the Amazon biome forest is sampled, and extensive effort is necessary to improve what we know about the tropical forest.
The variance analysis between the environmental factors and AGB data showed that it is important to correctly find an environmental class (or a combination of classes) that represents the AGB as a guideline (IPCC 2006; VCS 2015) to assess the biomass according to the NC and REDD+ recommendations. Our SV analysis should serve as a reference for AGB products and their relationship with environmental factors, not only in Brazil but also in other countries that seek to obtain AGB maps using the IPCC (2006) guidelines recommended under REDD+ projects.
The AGB data gap map could become a useful tool for policy makers and different stakeholders working on NC, REDD+, or carbon emissions modeling to prioritize places to implement further AGB assessments.