Table 1 Occurrence categories identified by our data cleaning workflow. A non-comprehensive list of potential actions is outlined to emphasize how the workflow could improve current datasets

From: Big data of tree species distributions: how big and how good?

| Label | Label name | Label information (I) and potential actions to be developed (A) |
| --- | --- | --- |
| H | Missing coordinates | (I) No detected coordinates. (A) Trace back record and assign coordinates. |
| G | Duplicated records | (I) If record is duplicated within the environmental grid cell, it may give information on sampling effort. |
| F | Unknown range | (I) Range not known. (A) Double-check country-level databases and invasive registries of countries where the record occurs. If present, update database. (A) Re-check common coordinate errors (Yesson et al. 2007). |
| E | Missing environmental information or unlikely environment (botanic garden) | (A) Check suitability of spatial layers. (A) Confirm botanic garden location. (A) Re-check common coordinate errors (Yesson et al. 2007). |
| D | Geographic coordinate issues and environment issues | (A) Re-check common coordinate errors (Yesson et al. 2007). (A) Check values in environmental layers. |
| C | Geographic coordinate issues | (A) Re-check common coordinate errors. |
| B | Environmental space issues | (A) Check values in environmental layers. |
| A | No issues detected | (A) Send a ‘thank you’ email to the database custodian. |
| AA | High precision | (A) Send a ‘thank you’ email to the database custodian. |
| AAA | High precision and low environment uncertainty | (A) Send a ‘thank you’ email to the database custodian. |
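
The categories in Table 1 can be read as a decision cascade over per-record quality flags. The following is a minimal sketch of that reading, not the authors' implementation: the table does not state the order in which categories are tested or the exact criteria behind each one, so the flag names (`has_coordinates`, `geo_issue`, and so on), the `RecordFlags` class, and the H-to-AAA checking order are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class RecordFlags:
    """Hypothetical per-record quality flags (names are illustrative only)."""
    has_coordinates: bool = True
    is_duplicate: bool = False          # duplicated within an environmental grid cell
    range_known: bool = True
    has_env_data: bool = True
    in_botanic_garden: bool = False
    geo_issue: bool = False             # geographic coordinate issue detected
    env_issue: bool = False             # environmental space issue detected
    high_precision: bool = False
    low_env_uncertainty: bool = False


def assign_label(f: RecordFlags) -> str:
    """Return one Table 1 label for a record.

    Assumption: categories are tested from the most severe (H) downwards,
    and the first matching category wins.
    """
    if not f.has_coordinates:
        return "H"
    if f.is_duplicate:
        return "G"
    if not f.range_known:
        return "F"
    if not f.has_env_data or f.in_botanic_garden:
        return "E"
    if f.geo_issue and f.env_issue:
        return "D"
    if f.geo_issue:
        return "C"
    if f.env_issue:
        return "B"
    # No issues detected: grade by precision and environmental uncertainty.
    if f.high_precision and f.low_env_uncertainty:
        return "AAA"
    if f.high_precision:
        return "AA"
    return "A"
```

Under these assumptions, `assign_label(RecordFlags(geo_issue=True, env_issue=True))` yields `"D"`, while a record with no flags raised falls through to `"A"`. A real workflow would derive each flag from the occurrence data and spatial layers rather than take it as given.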