The proposed approach operates exclusively on the point cloud graph. The key idea is to walk each node through the graph edges in search of valid root nodes, using the height information inherently embedded in the node features. Ultimately, all nodes that arrive at the same root node belong to the same cluster (i.e., tree). The entire workflow is illustrated in Fig. 2. Below, we describe in detail the key steps: constructing the point cloud graph, pathing through the graph, and determining the node clusters.

### Graph construction

A point cloud graph \(\mathcal{G} = (\mathcal{V}, \mathcal{E}, \mathcal{W})\) consists of a set of vertices \(\mathcal{V}\) (i.e., nodes), a set of edges \(\mathcal{E}\) representing unordered pairs of vertices, and edge weights \(\mathcal{W}: \mathcal{E} \to \mathbb{R}\) mapping each edge to the Euclidean distance between its endpoints.

Conventionally, there exist several methods to convert a point cloud into a graph. The most widely used, the *k*-nearest neighbors (KNN) graph, connects each point to its *k* nearest neighbors. Similarly, the neighboring connectivity can be defined by bounding points within a given radius *R*. Alternatively, such connectivity can be modeled directly by the Delaunay triangulation. Each method has certain advantages and drawbacks. For example, KNN generates a balanced graph, but is vulnerable to distant neighbors, which produce dispersed edges. Moreover, KNN may be trapped in locally dense regions, resulting in a disconnected graph. The radius graph is robust to distant neighbors but is sensitive to heterogeneous point densities. Lastly, the Delaunay triangulation produces a fully connected graph but is likewise vulnerable to noise and distant neighbors. Ben-Shabat et al. (2018) compared these methods for graph construction in the context of point cloud segmentation, and recommended KNN based on their results.

In our study, we leverage the advantages of both the KNN and Delaunay triangulation graphs by merging them into a hybrid graph (Fig. 2). Specifically, we first construct a KNN (*k* = 10) graph based on the KD-Tree structure. We then prune this graph to remove dispersed edges: for each node, edges more than one standard deviation longer than the average length of its connected edges are removed. In this way, the graph is locally pruned and optimized. Subsequently, the Delaunay triangulation is applied to generate a fully connected graph. Similarly, we prune the Delaunay graph to remove long edges; however, this pruning is performed globally, by excluding edges longer than the 80th percentile of all edge lengths. A sensitivity analysis of these pruning criteria is given in the results section. Finally, the pruned edges from both graphs are merged. In this way, we achieve a hybrid graph with rich and continuous point connectivity, taking advantage of both techniques while avoiding undesirable edges. Figure 3 illustrates the advantage of such a hybrid graph.
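The hybrid construction above can be sketched as follows. This is a minimal illustration, not the authors' implementation; the function name `hybrid_graph` and the return type (a set of index pairs) are our own choices, and SciPy's `cKDTree` and `Delaunay` stand in for whatever KD-Tree and triangulation routines the original code uses.

```python
import numpy as np
from scipy.spatial import cKDTree, Delaunay

def hybrid_graph(points, k=10, percentile=80.0):
    """Sketch of the hybrid KNN + Delaunay graph (illustrative only)."""
    # --- KNN graph with local pruning ---
    dists, idx = cKDTree(points).query(points, k=k + 1)  # column 0 is the point itself
    edges = set()
    for i in range(len(points)):
        d, nbrs = dists[i, 1:], idx[i, 1:]
        keep = d <= d.mean() + d.std()  # drop edges > mean + 1 std of this node's edges
        for j in nbrs[keep]:
            edges.add((min(i, int(j)), max(i, int(j))))
    # --- Delaunay graph with global pruning ---
    dedges = set()
    for simplex in Delaunay(points).simplices:
        for a in range(len(simplex)):
            for b in range(a + 1, len(simplex)):
                lo, hi = sorted((int(simplex[a]), int(simplex[b])))
                dedges.add((lo, hi))
    dedges = sorted(dedges)
    lengths = np.array([np.linalg.norm(points[i] - points[j]) for i, j in dedges])
    cutoff = np.percentile(lengths, percentile)  # keep edges up to the 80th percentile
    edges |= {e for e, l in zip(dedges, lengths) if l <= cutoff}
    return edges
```

The merged edge set inherits the KNN graph's balance and the triangulation's continuous connectivity, while both pruning passes suppress the long, dispersed edges each technique is prone to.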

### Node pathing

The key step of our method is to walk each node through the graph (i.e., pathing or path finding). Initially, we move each node to its lowest neighboring node (Fig. 4b), assuming that the tree root node has the lowest height. The neighboring relationship is bounded by graph edges. The lowest node among the neighbors then becomes the seed node, and this procedure continues until the reached node cannot be moved any further (i.e., it is the lowest in its vicinity). Accordingly, we obtain an initial clustering result, in which nodes that reach the same lowest node are grouped together (Fig. 4c). These reached lowest nodes are denoted as root nodes. Nevertheless, tree branching structures obviously do not follow a rigid downward centripetal orientation. Some graph nodes will eventually land at a local lowest node, rather than the global lowest node, which is expected to be the tree root node (Fig. 4c).
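The descent described above can be sketched as a pointer-following procedure. This is our own minimal reading of the step, assuming node heights and an adjacency list as inputs; the name `walk_to_roots` is hypothetical.

```python
import numpy as np

def walk_to_roots(heights, adjacency):
    """Each node repeatedly steps to its lowest neighbor until it reaches
    a node that is the lowest in its vicinity (a root node)."""
    n = len(heights)
    step = np.arange(n)  # step[i]: lowest neighbor of i, or i itself at a local minimum
    for i in range(n):
        nbrs = adjacency[i]
        if nbrs:
            j = min(nbrs, key=lambda v: heights[v])
            if heights[j] < heights[i]:
                step[i] = j
    # follow the pointers to a fixed point; heights strictly decrease, so no cycles
    root = step.copy()
    changed = True
    while changed:
        nxt = step[root]
        changed = not np.array_equal(nxt, root)
        root = nxt
    return root  # root[i]: the (possibly local) lowest node that node i reaches
```

Nodes sharing the same entry in `root` form one initial cluster; entries that are not true tree bases are the invalid root nodes handled by the refinement step.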

We then refine the detected root nodes to locate the valid root nodes that represent actual tree roots. First, we prune root nodes that are higher than a threshold *H*. This simple step already eliminates the majority of invalid root nodes. The choice of *H* depends on the quality of the point cloud. For example, if a plot is well sampled by multi-scan TLS, so that tree stems are clearly represented in the point cloud, *H* can be set low, e.g., 1 m. Otherwise, *H* can be increased, e.g., to 3 m, to mitigate the impact of poorly sampled tree stems. Second, the remaining root nodes are merged, as several of them can originate from the same tree stem. Specifically, we merge two root nodes if their Euclidean distance is shorter than a threshold *Ed*, and their graph distance is shorter than *n* (e.g., 3) times *Ed*. The graph distance is defined as the shortest-path distance resolved by the Dijkstra algorithm (Dijkstra 1959). Intuitively, *Ed* is linked to the distance between two adjacent tree stems. The graph distance is a more robust measure than the Euclidean distance in this circumstance: two root nodes from two neighboring trees can be spatially close, yet their graph distance is either very large or infinite. Therefore, by evaluating the graph distance, two spatially close root nodes can be further examined to determine whether they belong to the same tree.
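A possible sketch of this refinement is given below. It is an illustration under our own assumptions, not the paper's code: `merge_roots` is a hypothetical name, the z coordinate is assumed to be height above ground, and the simple dictionary-based merging stands in for a proper union-find.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import dijkstra

def merge_roots(points, roots, edges, H=1.0, Ed=0.5, n=3):
    """Drop root nodes above height H, then merge root pairs whose Euclidean
    distance is < Ed and whose graph (Dijkstra) distance is < n * Ed."""
    roots = [r for r in roots if points[r][2] < H]  # assumes z = height above ground
    # weighted sparse adjacency matrix for shortest-path queries
    i, j = map(list, zip(*edges))
    w = np.linalg.norm(points[i] - points[j], axis=1)
    m = len(points)
    g = csr_matrix((np.r_[w, w], (np.r_[i, j], np.r_[j, i])), shape=(m, m))
    merged = {r: r for r in roots}
    for a in roots:
        gd = dijkstra(g, indices=a)  # shortest-path distances from root a
        for b in roots:
            if a < b and np.linalg.norm(points[a] - points[b]) < Ed and gd[b] < n * Ed:
                merged[b] = merged[a]
    return merged  # maps each surviving root to its merged representative
```

Roots on disconnected components have infinite graph distance, so spatially close roots from different trees are never merged, which is exactly the robustness argument made above.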

Consequently, only valid root nodes are retained. All graph nodes that initially landed at invalid root nodes are rerouted to the valid root node with the shortest path to them (Fig. 4d).
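The rerouting amounts to a multi-source shortest-path query, which can be sketched as follows. The function name `assign_to_roots` is our own; SciPy's `dijkstra` is assumed as the shortest-path solver, consistent with the algorithm named in the text.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import dijkstra

def assign_to_roots(points, edges, valid_roots):
    """Assign every graph node to the valid root node with the shortest
    graph path to it."""
    i, j = map(list, zip(*edges))
    w = np.linalg.norm(points[i] - points[j], axis=1)
    m = len(points)
    g = csr_matrix((np.r_[w, w], (np.r_[i, j], np.r_[j, i])), shape=(m, m))
    d = dijkstra(g, indices=valid_roots)  # one row of distances per valid root
    return np.asarray(valid_roots)[np.argmin(d, axis=0)]
```

Running Dijkstra from the valid roots only (rather than from every node) keeps this step cheap even on large plot graphs.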

### Node clustering

The above-mentioned pathing step allocates each graph node to a root node, which is essentially a clustering procedure. The outcome of the method is therefore a set of point clusters representing individual trees. Notably, our method can first be run on a coarsened point cloud (e.g., superpoints), and the results can then be mapped back to the full resolution. In this study, we sampled points from uniformly distributed 10 cm voxels to extract trees, and then encoded the results back to the original resolution. This further accelerates the processing.
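The coarse-then-map-back scheme can be sketched with a simple voxel subsampling step; `voxel_downsample` is a hypothetical helper, and keeping one representative point per voxel is our own simplification of how superpoints might be formed.

```python
import numpy as np

def voxel_downsample(points, voxel=0.1):
    """Keep one representative point per voxel (10 cm by default); return the
    sampled points and, for each original point, its representative's index."""
    keys = np.floor(points / voxel).astype(np.int64)
    _, rep, inverse = np.unique(keys, axis=0, return_index=True, return_inverse=True)
    return points[rep], inverse.ravel()
```

Tree labels computed on the coarse cloud then transfer to the full cloud as `labels_full = labels_coarse[inverse]`, so the expensive graph operations run only on the subsampled points.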

### Assessments

Our method is assessed in two parts: tree locations and detailed crown segmentation.

Tree locations are assessed following the metrics used in the TLS benchmarking project (Liang et al. 2018). Three metrics are calculated: the completeness, the correctness, and the mean accuracy of detection. The completeness measures the percentage of reference trees that are detected. The correctness measures the percentage of detected trees that match reference trees. The mean accuracy is a joint metric based on the completeness and correctness, given by:

$$\begin{array}{*{20}l} \text{Completeness}=\frac{n_{\text{match}}}{n_{\text{ref}}}, \end{array} $$

(1)

$$\begin{array}{*{20}l} \text{Correctness}=\frac{n_{\text{match}}}{n_{\text{extr}}}, \end{array} $$

(2)

$$\begin{array}{*{20}l} \text{Mean \ accuracy}=\frac{2 n_{\text{match}}}{\left(n_{\text{ref}}+n_{\text{extr}}\right)}, \end{array} $$

(3)

where *n*_{match} is the number of detected reference trees, *n*_{ref} is the number of reference trees, and *n*_{extr} is the number of detected trees.
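Eqs. 1–3 translate directly into code; the function below is a straightforward transcription (the name `detection_metrics` is ours).

```python
def detection_metrics(n_match, n_ref, n_extr):
    """Completeness, correctness, and mean accuracy (Eqs. 1-3)."""
    completeness = n_match / n_ref           # fraction of reference trees detected
    correctness = n_match / n_extr           # fraction of detections that are correct
    mean_accuracy = 2 * n_match / (n_ref + n_extr)  # harmonic-style joint metric
    return completeness, correctness, mean_accuracy
```

For example, 18 matched trees out of 20 references and 24 detections gives a completeness of 0.90, a correctness of 0.75, and a mean accuracy of about 0.82.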

In contrast, the Intersection over Union (IoU), a standard metric for segmentation evaluation, is used to assess the detailed crown segmentation. For an *N*×*N* confusion matrix (*N* = 36 in this study), each entry *c*_{ij} refers to the number of points from reference tree *i* predicted as tree *j*. The IoU of tree *i* is then calculated as:

$$ \mathrm{I o U}_{i}=\frac{c_{i i}}{c_{i i}+\sum_{j \neq i} c_{i j}+\sum_{k \neq i} c_{k i}}. $$

(4)

The mean IoU (mIoU) of all trees is then estimated by:

$$ \text{mIoU} = \frac{\sum_{i=1}^{N} \mathrm{I o U}_{i}}{N}. $$

(5)

We additionally assessed our results for estimating crown area and tree volume. Crown area was calculated as the vertically projected bounding area of a tree, and tree volume as the volume of the convex hull of the entire tree. The root mean square error (RMSE) and its relative value were reported as:

$$\begin{array}{*{20}l} \text{RMSE}=\sqrt{\frac{1}{k} \sum_{i=1}^{k}\left(y_{i}-\widehat{y}_{i}\right)^{2}}, \end{array} $$

(6)

$$\begin{array}{*{20}l} \mathrm{RMSE (\%)}=100\% \times \frac{\text{RMSE}}{\bar{y}}, \end{array} $$

(7)

where *k* is the number of observations, *y*_{i} is the estimated value, \(\widehat{y}_{i}\) denotes the corresponding reference value, and \(\bar{y}\) is the mean value of the variable.
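Eqs. 6–7 in code form (the name `rmse` is ours; we take \(\bar{y}\) as the mean of the reference values, which is one reasonable reading of "the mean value of the variable"):

```python
import numpy as np

def rmse(y, y_ref):
    """RMSE (Eq. 6) and relative RMSE in percent (Eq. 7); y are estimates,
    y_ref the references, and the relative value is normalized by mean(y_ref)."""
    y, y_ref = np.asarray(y, dtype=float), np.asarray(y_ref, dtype=float)
    e = np.sqrt(np.mean((y - y_ref) ** 2))
    return e, 100.0 * e / np.mean(y_ref)
```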

### Method comparisons

Several studies that also used the benchmark dataset from Finland reported their accuracies in locating single trees. We therefore performed quantitative comparisons of our results with two state-of-the-art approaches, Zhang et al. (2019) and Wang (2020). Unfortunately, it is not feasible to compare our detailed crown segmentation results with both approaches, as only Wang (2020) segmented individual crowns, while Zhang et al. (2019) only detected tree locations.