
Table 1 Occurrence categories identified by our data cleaning workflow. A non-comprehensive list of potential actions is outlined to emphasize how the workflow could improve current datasets.

From: Big data of tree species distributions: how big and how good?

| Label | Label name | Label information (I) and potential actions to be developed (A) |
| --- | --- | --- |
| H | Missing coordinates | (I) No coordinates detected. (A) Trace back record and assign coordinates. |
| G | Duplicated records | (I) If a record is duplicated within the environmental grid cell, it may provide information on sampling effort. |
| F | Unknown range | (I) Range is not known. (A) Double-check country-level databases and invasive species registries of the countries where the record occurs. If present, update database. (A) Re-check common coordinate errors (Yesson et al. 2007). |
| E | Missing environmental information or unlikely environment (botanic garden) | (A) Check suitability of spatial layers. (A) Confirm botanic garden location. (A) Re-check common coordinate errors (Yesson et al. 2007). |
| D | Geographic coordinate issues and environment issues | (A) Re-check common coordinate errors (Yesson et al. 2007). (A) Check values in environmental layers. |
| C | Geographic coordinate issues | (A) Re-check common coordinate errors. |
| B | Environmental space issues | (A) Check values in environmental layers. |
| A | No issues detected | (A) Send a ‘thank you’ email to the database custodian. |
| AA | High precision | (A) Send a ‘thank you’ email to the database custodian. |
| AAA | High precision and low environment uncertainty | (A) Send a ‘thank you’ email to the database custodian. |
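
As an illustration only, the labels in Table 1 could be attached to records with a small lookup and a rule cascade. The sketch below is not the authors' workflow: the `RecordFlags` fields, the `assign_label` function, and the assumption that checks are applied in order from H down to A are all hypothetical, and the action strings are abbreviated from the table. Labels AA and AAA are left out because the table gives no precision or uncertainty thresholds for them.

```python
from dataclasses import dataclass

# Label -> (label name, abbreviated potential actions), copied from Table 1.
CATEGORIES = {
    "H": ("Missing coordinates", ["Trace back record and assign coordinates"]),
    "G": ("Duplicated records", []),
    "F": ("Unknown range", ["Check country-level databases and registries",
                            "Re-check common coordinate errors"]),
    "E": ("Missing environmental information or unlikely environment",
          ["Check suitability of spatial layers",
           "Confirm botanic garden location",
           "Re-check common coordinate errors"]),
    "D": ("Geographic coordinate issues and environment issues",
          ["Re-check common coordinate errors",
           "Check values in environmental layers"]),
    "C": ("Geographic coordinate issues", ["Re-check common coordinate errors"]),
    "B": ("Environmental space issues", ["Check values in environmental layers"]),
    "A": ("No issues detected", ["Thank the database custodian"]),
}


@dataclass
class RecordFlags:
    """Hypothetical per-record check results; field names are illustrative."""
    has_coordinates: bool = True
    is_duplicate: bool = False
    range_known: bool = True
    has_environment: bool = True
    geographic_issue: bool = False
    environmental_issue: bool = False


def assign_label(flags: RecordFlags) -> str:
    """Return a Table 1 label, ASSUMING checks run in order from H down to A."""
    if not flags.has_coordinates:
        return "H"
    if flags.is_duplicate:
        return "G"
    if not flags.range_known:
        return "F"
    if not flags.has_environment:
        return "E"
    if flags.geographic_issue and flags.environmental_issue:
        return "D"
    if flags.geographic_issue:
        return "C"
    if flags.environmental_issue:
        return "B"
    return "A"
```

For example, `assign_label(RecordFlags(geographic_issue=True))` returns `"C"`, and `CATEGORIES["C"][1]` then lists the follow-up actions for that record. Keeping the names and actions in a single dictionary means the labelling rules and the table text cannot drift apart.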