Sampling design and data collection
The study area is located in three different regions of Northern Spain. In total, seven monitoring sites comprising 90 sampling plots (100 m² each) were considered in this study. Of those, 39 plots correspond to pure P. sylvestris forest stands, whilst 51 plots represent pure P. pinaster forest stands.
The P. sylvestris plots are located in the Catalonia region (North-eastern Spain, 19 plots measured between 1997 and 2015), in Soria Province, Eastern Castilla y León region (North-Central Spain, 17 plots measured between 1995 and 2015), and in Palencia Province, Western Castilla y León region (North-Western Spain, 3 plots measured between 2008 and 2015). The P. pinaster plots are located in the Catalonia region (North-eastern Spain, 28 plots measured between 2008 and 2015), in Soria Province, Eastern Castilla y León region (North-Central Spain, 14 plots measured between 1997 and 2015), in Palencia Province, Western Castilla y León region (North-Western Spain, 3 plots measured between 2003 and 2015), and in Valladolid Province, Western Castilla y León region (North-Western Spain, 6 plots measured between 2006 and 2015).
All plots were monitored weekly during the autumn season, with data recorded for at least 9 consecutive years. The permanent mushroom plots are representative of the heterogeneity of the forest areas of both pine species in northern Spain: they are located across a wide range of ecological conditions and were subjected to different forest management treatments (mainly forest thinning). The data were characterized by high interannual variability in mushroom occurrence and yield, associated with differences in weather conditions between years and between plots, and represented different past forest management practices (Additional file 1: Table S1). Weather data were obtained from the weather station nearest to each study area. More detailed descriptions of the study areas and of the sampling methodology can be found in Vásquez-Gassibe et al. (2016) and Hernández-Rodríguez et al. (2015) for Western-Castilla y León (Palencia and Valladolid provinces), Martínez-Peña et al. (2012) and Taye et al. (2016) for Eastern-Castilla y León, and de-Miguel et al. (2014) and Alday et al. (2017b) for Catalonia.
Preliminary analysis
An exploratory principal component analysis (PCA, Legendre and Legendre 1998) was performed to explore the variability among plots with respect to mushroom production and climate variables. The PCA was run with the PRINCOMP procedure available in SAS version 9.4 (SAS Institute Inc. 2016).
Next, we clustered the plots based on mushroom production, climatic and site conditions. Hierarchical clustering forms clusters iteratively, starting with each object in its own cluster and then combining the most similar pair of clusters step by step, thus forming a hierarchy of clusters (e.g. Everitt et al. 2011). We performed hierarchical clustering on the mean values of the yields over the study period, using Euclidean distance as the measure of dissimilarity and Ward’s minimum variance method as the clustering method. The number of clusters was selected based on the dendrogram and the cubic clustering criterion (CCC) (Milligan and Cooper 1983; Yeo and Truxillo 2005). The cluster analysis was run with the CLUSTER procedure available in SAS version 9.4 (SAS Institute Inc. 2016).
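The agglomeration step described above can be illustrated with a minimal sketch in plain Python. It is not the SAS CLUSTER procedure used in the study: the function names and the plot values are hypothetical, and the merge criterion is the textbook form of Ward's method, in which the pair of clusters merged at each step is the one that least increases the within-cluster sum of squared Euclidean distances.

```python
from itertools import combinations

def centroid(points):
    """Component-wise mean of a list of equal-length numeric tuples."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]

def ward_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b merge."""
    ca, cb = centroid(a), centroid(b)
    sq_dist = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return len(a) * len(b) / (len(a) + len(b)) * sq_dist

def ward_clusters(points, k):
    """Agglomerate points into k clusters with Ward's minimum variance method."""
    clusters = [[p] for p in points]  # start: every plot is its own cluster
    while len(clusters) > k:
        # Find the pair of clusters whose merge is cheapest
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: ward_cost(clusters[ij[0]], clusters[ij[1]]))
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return clusters

# Hypothetical plot-level means: (yield in kg/ha/yr, autumn rainfall in mm)
plots = [(10.0, 200.0), (12.0, 210.0), (11.0, 190.0), (80.0, 400.0), (85.0, 410.0)]
groups = ward_clusters(plots, 2)
```

With these illustrative values the three low-yield plots end up in one group and the two high-yield plots in the other. In practice the variables would normally be standardized first, and the number of clusters would be chosen from the dendrogram and the CCC rather than fixed in advance.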
Modeling mushroom production
The annual mushroom yields were modelled for the following groups of fungi separately: all ectomycorrhizal mushrooms, edible mushrooms (those ectomycorrhizal fungi considered as edible in the available fungal literature) and marketed mushrooms (ectomycorrhizal edible fungi usually sold in markets) (de-Miguel et al. 2014; Alday et al. 2017a), as a function of location and variables representing different meteorological conditions (i.e., monthly total rainfall and mean temperature), stand characteristics (i.e., stand basal area) and thinning treatment. Since, in Spain, wild edible mushrooms are usually commercialized on a fresh weight basis, fresh mushroom biomass in kg∙ha−1∙yr−1 was selected as the response variable by pooling the yield data of all fungal species according to the three levels of grouping. Different combinations and transformations of predictors were tested. Weather variables were aggregated in different ways, e.g., the accumulated precipitation during August and/or September (late summer) or during September and/or October (early autumn), to further test their combined effect on the response variables in addition to testing the influence of the disaggregated monthly rainfall and temperature. The different combinations of meteorological variables also aimed at testing hypothetical delayed responses of mushroom yield to the combined effect of different predictors (e.g., previous research has reported a delay of several weeks in the combined effect between rainfall events and favorable temperatures) (Martínez de Aragón et al. 2007; Martínez-Peña et al. 2012).
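As an illustration of how such aggregated predictors can be derived from monthly records, the following sketch sums monthly rainfall over late summer and early autumn. The function name, the month keys and the rainfall values are hypothetical and serve only to show the arithmetic.

```python
def aggregate_rainfall(monthly_mm, months):
    """Sum monthly rainfall totals (mm) over the given months."""
    return sum(monthly_mm[m] for m in months)

# Hypothetical monthly rainfall totals (mm) for one plot and year
rain = {"Aug": 20.0, "Sep": 45.0, "Oct": 80.0}

late_summer = aggregate_rainfall(rain, ["Aug", "Sep"])   # August + September
early_autumn = aggregate_rainfall(rain, ["Sep", "Oct"])  # September + October
```

Each aggregated variable can then be tested as a candidate predictor alongside the disaggregated monthly values.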
The data have three characteristics that condition the modeling approach. First, since the available data are based on repeated measurements of the same sampling plots over several years, measurements taken on a given plot are likely to be more correlated than measurements taken from different plots. Similarly, measurements taken closer in time on the same plot (i.e., in a given year) are likely to be more correlated than measurements taken further apart in time. Such autocorrelation patterns imply that the assumption of independent errors is no longer valid (Wolfinger 1996; Littell et al. 2000). The analysis of repeated measurements therefore requires that correlations between observations made on the same sampling unit be taken into account, as well as possible heterogeneous variances among observations on the same plot over time. Second, the data are unbalanced because the number of sample plots varied among groups and data were not available for the same years in every plot. Third, mushroom yield is stochastic, which, coupled with the rather small size of the sample plots, results in “zero” production in many sample plots.
To deal with these characteristics of the data, hurdle models within a generalized linear mixed-effects modeling (GLMM) framework were used. Hurdle models treat the zeros and non-zeros as two separate processes (Hamilton and Brickell 1983), which, compared with single-model functions, can also provide further insight into mushroom dynamics by analyzing the factors driving mushroom occurrence and abundance separately (de-Miguel et al. 2014). We therefore applied this approach to the three groups of fungi, although the proportion of observations with zero mushroom production differed among them: 4.25% for all ectomycorrhizal mushrooms, 8.86% for edible mushrooms and 31% for marketed mushrooms. The first part of the hurdle models predicted the probability of mushroom occurrence from binomially distributed data (i.e., absence or presence) using logistic regression (Eq. 1) with a logit link function (Eq. 2). The second part predicted mushroom yield conditional on mushroom occurrence by means of Gamma regression (Eq. 3) with a log link function (Eq. 4). Finally, the expected mushroom yield was obtained by multiplying the estimates provided by Eq. 1 and Eq. 3, as expressed in Eq. 5.
$$ p\left({y}_{ij}=1|x\right)=\pi (x)=\frac{1}{1+{e}^{-\left[\left({\alpha}_0+{v}_{1j}\right)+\alpha {\boldsymbol{X}}_{ij1}\right]}} $$
(1)
$$ g(x)=\log \left[\frac{\pi (x)}{1-\pi (x)}\right]=\left({\alpha}_0+{v}_{1j}\right)+\alpha {\boldsymbol{X}}_{ij1} $$
(2)
$$ {\mathrm{yield}}_{cij}={\mathrm{e}}^{\beta_0+{v}_{2j}}{\boldsymbol{X}}_{ij2}^{\beta } $$
(3)
$$ g(x)=\log \left({\mathrm{e}}^{\beta_0+{v}_{2j}}{\boldsymbol{X}}_{ij2}^{\beta}\right)=\left({\beta}_0+{v}_{2j}\right)+\beta \log \left({\boldsymbol{X}}_{ij2}\right) $$
(4)
$$ {\mathrm{yield}}_{ij}=p\left({y}_{ij}=1|x\right)\bullet {\mathrm{yield}}_{cij} $$
(5)
where \( p\left({y}_{ij}=1|x\right) \) is the probability of occurrence of all ectomycorrhizal, edible or marketed mushrooms in plot i and year j, \( {\mathrm{yield}}_{cij} \) is the yield of all ectomycorrhizal, edible or marketed mushrooms conditional on mushroom occurrence in plot i and year j (kg∙ha−1∙yr−1), \( {\mathrm{yield}}_{ij} \) is the predicted yield of all ectomycorrhizal, edible or marketed mushrooms in plot i and year j (kg∙ha−1∙yr−1), \( {\alpha}_0 \), \( \alpha \), \( {\beta}_0 \) and \( \beta \) denote fixed-effects parameters, \( {v}_{1j} \) and \( {v}_{2j} \) denote year random effects, which were specified as crossed effects, and \( {\boldsymbol{X}}_{ij1} \) and \( {\boldsymbol{X}}_{ij2} \) denote vectors of predictor variables in plot i and year j.
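The way the two parts of the hurdle model combine into a prediction can be sketched as follows. This is only an illustration of Eqs. 1, 4 and 5 for a single plot and year, not the fitted models. The function names and all coefficient and predictor values are hypothetical, and the conditional yield is computed by back-transforming the right-hand side of Eq. 4.

```python
import math

def occurrence_probability(alpha0, v1, alpha, x1):
    """Eq. 1: logistic probability of mushroom occurrence for one plot-year."""
    eta = (alpha0 + v1) + sum(a * x for a, x in zip(alpha, x1))
    return 1.0 / (1.0 + math.exp(-eta))

def conditional_yield(beta0, v2, beta, x2):
    """Right-hand side of Eq. 4, back-transformed from the log scale."""
    eta = (beta0 + v2) + sum(b * math.log(x) for b, x in zip(beta, x2))
    return math.exp(eta)

def expected_yield(prob, cond_yield):
    """Eq. 5: probability of occurrence times conditional yield."""
    return prob * cond_yield

# Hypothetical values, chosen only so that the arithmetic is easy to follow
p = occurrence_probability(alpha0=-1.0, v1=0.0, alpha=[0.02], x1=[50.0])
y_c = conditional_yield(beta0=0.0, v2=0.0, beta=[1.0], x2=[100.0])
y = expected_yield(p, y_c)
```

With these values the linear predictor of the occurrence part is zero, so the probability of occurrence is 0.5; the conditional yield is about 100 kg∙ha−1∙yr−1 and the expected yield is therefore about 50 kg∙ha−1∙yr−1. Setting the year random effects to zero, as done here, corresponds to a prediction for an average year.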
The groups of plots resulting from the cluster analysis were included as fixed dummy variables in the models. In addition, to test whether differences among groups and years were statistically significant, a repeated measures ANOVA was performed. The differences were examined using pairwise comparisons according to the Tukey method, using the MIXED procedure available in SAS version 9.4 (SAS Institute Inc. 2016).
Several site and climatic variables, as well as their transformations, were included as potential predictors in the model. Models were fitted by adding two year random effects, \( {v}_1 \) and \( {v}_2 \), one to the intercept of each part of the hurdle model. These effects were assumed to be normally distributed with mean zero and variances \( {\sigma}_{\mathrm{b}}^2 \) and \( {\sigma}_{\mathrm{s}}^2 \). The unstructured covariance structure was used to describe the variance-covariance structure of the random effects (Littell et al. 2000). Plot random effects were not included in the models because their variance was practically zero. All the models were fitted using the NLMIXED procedure available in SAS version 9.4 (SAS Institute Inc. 2016).
Model selection and evaluation
The models were selected and evaluated according to the following criteria: biological sense, goodness-of-fit and predictive ability. The biological sense was evaluated considering whether alternative models behaved logically, i.e., whether they represented biologically or ecologically consistent relationships between predictors and the response variables according to current scientific and expert knowledge. Only those models whose coefficients were statistically significant (p < 0.05) were further considered in the analysis.
The goodness-of-fit of the models was analyzed according to the mean bias, which reflects the deviation of model predictions from observed values, and the root mean square error (RMSE), which accounts for the precision of the estimates. The relative values of these statistics (BIAS%, RMSE%) were calculated by dividing the mean bias and RMSE by the mean of the predicted mushroom yield. Akaike’s information criterion (AIC) and likelihood ratio tests were also used to guide the selection of predictor variables and to prevent overfitting by accounting for the trade-off between model parsimony and goodness of fit. Furthermore, uncertainty was assessed using resampling techniques, namely bootstrapping based on 2000 bootstrap samples drawn with replacement to ensure the stabilization of the estimates, and by computing prediction and confidence intervals accounting for the residual variance, the uncertainty in the fixed coefficients, and the uncertainty in the variance parameters of the year random effects. In addition, receiver operating characteristic (ROC) curves and the corresponding area under the ROC curve (AUC), along with their bootstrapped confidence intervals, were computed for the logistic models for the probability of mushroom occurrence.
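The fit statistics named above can be sketched in a few lines of plain Python. Several details are assumptions rather than the study's definitions: the bias is taken as observed minus predicted, the RMSE uses the number of observations as divisor, and the AUC is computed in its rank-based form, i.e., the proportion of presence-absence pairs in which the presence receives the higher predicted probability (ties counted as one half). All data values are hypothetical.

```python
import math

def mean_bias(observed, predicted):
    """Mean of (observed - predicted); the sign convention is an assumption."""
    return sum(o - p for o, p in zip(observed, predicted)) / len(observed)

def rmse(observed, predicted):
    """Root mean square error, with n as divisor (an assumption)."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)

def relative_percent(stat, predicted):
    """Statistic as a percentage of the mean predicted yield (BIAS%, RMSE%)."""
    return 100.0 * stat / (sum(predicted) / len(predicted))

def auc(labels, scores):
    """Share of presence-absence pairs ranked correctly; ties count one half."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical yields (kg/ha/yr) and occurrence predictions
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 37.0]
bias_pct = relative_percent(mean_bias(observed, predicted), predicted)
rmse_pct = relative_percent(rmse(observed, predicted), predicted)
area = auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

In this example the over- and under-predictions cancel, so the bias is zero while the RMSE is about 10% of the mean predicted yield, and three of the four presence-absence pairs are ranked correctly, giving an AUC of 0.75. Bootstrapped confidence intervals would be obtained by recomputing these statistics on resampled data.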