Uncovering a Random Tree
Benjamin Hackl • June 20, 2022 • AofA2022

## Uncovering a Random Tree

Joint work with Alois Panholzer and Stephan Wagner

### The Uncover Process

• Take a random labeled tree with labels $1, 2, \dots, n$
• "Cover" all vertices and edges
• Process: uncover vertex $j$ in step $j$
• Edges are uncovered if both endpoints are uncovered

### Motivation: study of coalescent models

• None of these models capture the behavior of the uncover process!

### What are we interested in?

• number of uncovered edges / clusters
• size of root cluster
• size of largest cluster

### Uncovered Edges: Generating Function

Theorem. Let $1 = j_0 < j_1 < \cdots < j_r < n$ . In
$E_n = n^{n - j_r - 1} \prod_{i=1}^r \bigg(n - j_r + j_i z_i + \sum_{h=i+1}^r (j_h - j_{h-1}) z_h\bigg)^{j_i - j_{i-1}},$
the coefficient of $z_1^{a_1} z_2^{a_2 - a_1} \cdots z_r^{a_r - a_{r-1}}$ counts labeled trees of size $n$ with $a_i$ uncovered edges after $j_i$ uncovered vertices.

Key ingredient: edge weight $(i, j) \rightsquigarrow x_i y_j$, then via [Martin, Reiner] / [Remmel, Williamson]:

$\sum_{|T| = n} w(T) = x_1 y_n \prod_{j=2}^{n-1} \big( \sum_{i=1}^n x_{\min(i, j)} y_{\max(i, j)} \big)$

### Uncovered Edges: Stochastic Process

• $(K_j^{(n)})_{1\leq j\leq n}$ ... # of uncovered edges after $j$ uncovered vertices
Observation:
• with $j$ uncovered vertices, $\binom{j}{2}$ of $\binom{n}{2}$ possible edge positions are uncovered
• there are $n-1$ edges in total, all positions are equally likely
$\Longrightarrow \mathbb{E} K_j^{(n)} = (n-1) \frac{\binom{j}{2}}{\binom{n}{2}} = \frac{j(j-1)}{n}$

### Uncovered Edges: Deviation

• Linearly interpolated, continuous process: $(\tilde K_t^{(n)})_{0\leq t\leq 1}$
• Deviation: subtract mean, rescale. Limiting behavior?

### Uncovered Edges: Characterization – Statement

Theorem (H.–Panholzer–Wagner, '22).
The centered, rescaled, linearly interpolated process $Z^{(n)}(t) = \frac{\tilde K_t^{(n)} - t^2 n}{\sqrt{n}}$
converges weakly to $Z^{\infty}(t) = (1-t) W(t^2 / (1-t))$, where $(W(t))_{t\geq 0}$ is the standard Wiener process.
• Note: $Z^{\infty}$ is almost a Brownian bridge, $(1-t) W(t/(1-t))$

### Uncovered Edges: Characterization – Ingredients

Tightness

$\mathbb{P}\big(\sup_{t\in[0,1]} |Z^{(n)}(t)| \geq C\big) \leq \frac{4}{(C-1)^2}$ For $C\to\infty$: process exceeds $C$ with probability $\to 0$.

Proof: construction of related martingale; application of Doob's $L^p$-inequality.

Finite-dim. distributions

\begin{align*} \mathbf{K}_{\mathbf{t}n} = \big( K_{\lfloor t_1 n\rfloor}^{(n)}, \dots, K_{\lfloor t_r n\rfloor}^{(n)} \big) \end{align*} \begin{align*} \frac{\mathbf{K}_{\mathbf{t}n} - \mathbb{E} \mathbf{K}_{\mathbf{t}n}}{\sqrt{n}} \xrightarrow[n\to\infty]{d} \mathcal{N} \end{align*}

Proof: construct p.g.f. from $E_n$, $P_n = \prod_{i=2}^{n-1} \Big( \frac{1}{n} + \frac{i}{n} z_i + \sum_{h=i+1}^{n-1} \frac{1}{n} z_h \Big),$ sum of multinomial variables! Then: multivariate central limit theorem.

### Root Cluster Size: Approach

• $\mathcal{T}^{\bullet}$ ... rooted labeled trees
• $\mathcal{G}$ ... refinement of $\mathcal{T}^{\bullet}$, uncovered vertices are marked with $U$
• $\mathcal{F}$ ... refinement of $\mathcal{G}$, vertices in root cluster marked with $V$

\begin{align*} \mathcal{F} &= \mathcal{Z} \ast \mathrm{Set}(\mathcal{G}) + \mathcal{Z}\times \{U, V\} \ast \mathrm{Set}(\mathcal{F}) \\ \mathcal{G} &= (\mathcal{Z} + \mathcal{Z}\times \{U\}) \ast \mathrm{Set}(\mathcal{G}) \\ \end{align*}

### Root Cluster Size: Approach

\begin{align*} \mathcal{F} &= \mathcal{Z} \ast \mathrm{Set}(\mathcal{G}) + \mathcal{Z}\times \{U, V\} \ast \mathrm{Set}(\mathcal{F}) \\ \mathcal{G} &= (\mathcal{Z} + \mathcal{Z}\times \{U\}) \ast \mathrm{Set}(\mathcal{G}) \\ \end{align*}

$F = z e^G + zuv e^F \qquad G = z (1 + u) e^G$

Theorem.
$R_n^{(k)}$ ... root cluster size after $k$ uncovered vertices (out of $n$) $\mathbb{E} R_n^{(k)} = \sum_{j=1}^k \frac{j k^{\underline{j}}}{n^j}$

### Root Cluster Size: Analysis

$\mathbb{E} R_n^{(k)} = \sum_{j=1}^k \frac{j k^{\underline{j}}}{n^j} \fragment{1}{= \int_0^{\infty} (x-1) e^{-x} \big(1 + \frac{x}{n}\big)^k~dx}$
Theorem (ctd').
Depending on $k = k(n)$, we find $\mathbb{E} R_n^{(k)} \sim f(n)$, with
 $k$ $k = o(n)$ $k \sim \alpha n$ $d=\omega(\sqrt{n})$$k = n-d d\sim c\sqrt{n}$$k = n-d$ $d=o(\sqrt{n})$$k = n-d$ $f(n)$ $k/n$ $\frac{\alpha}{(1 - \alpha)^2}$ $n^2/d^2$ $\kappa n$ $n - d\sqrt{\pi n /2}$

### Root Cluster Size: Probabilities

Lagrange inversion on $F$ or ...
Proposition. Labeled trees on $[n]$ without edges between $[r]$ and $[r+1, k]$ and where $[r]$ forms a subtree are enumerated by $r n^{n - k - 1} (n-k) (n - r)^{k-r-1}.$

$\mathbb{P}(R_n^{(k)} = r) = \begin{cases} 1 - k/n & \text{if } r = 0,\\ r^r (n-k) (n-r)^{k-r-1} \binom{k}{r} n^{-k} & \text{if } 1\leq r\leq k < n,\\ 1 & \text{if } r = k = n. \end{cases}$

### Root Cluster Size: Limiting Behavior

Theorem. Depending on $k = k(n)$ the root cluster size $R_n^{(k)}$ behaves like ...
• $k = o(n)$: $R_n^{(k)} \to 0$
• $k \sim \alpha n$: $R_n^{(k)} \to R_{\alpha}$, non-degenerate discrete RV
• subcritical, $k = n-d$ with $d = \omega(\sqrt{n})$: $(d/n)^2 R_n^{(k)} \to \mathrm{Gamma}(1/2, 1/2)$
• critical, $k = n - d$ with $d \sim c\sqrt{n}$: $R_n^{(k)} / n \to R(c)$
• supercritical, $k = n - d$ with $d = o(\sqrt{n})$: $(n - R_n^{(k)})/d^2 \to D$

### Counting Clusters

• $X_{n, r}^{(k)}$ ... # clusters of size $r$ after uncovering $k$ out of $n$ vertices
• Counting vertices in $T$ w.r.t. clusters: $\sum_{r=1}^{n} r X_{n,r}^{(k)}(T) = k$
Proposition. $\mathbb{E} X_{n, r}^{(k)} = \binom{k}{r} \Big(\frac{r}{n}\Big)^{r-1} \Big(1 - \frac{k}{n}\Big) \Big(1 - \frac{r}{n}\Big)^{k-r-1}$
• From before: $r n^{n - k - 1} (n-k) (n - r)^{k-r-1}$ special forests
• Cayley's tree formula: $r^{r-2}$ possible clusters, $n^{n-2}$ trees overall
• Abel's binomial theorem: $\sum_{r=1}^k \binom{k}{r} r^r (n-r)^{k-r-1} = \frac{k}{n-k} n^{k-1}$

### Largest Cluster: Characterization

Theorem (H.–Panholzer–Wagner, '22).
$c_{\max}^{(k)}(T_n)$ ... largest cluster of unif. random tree after uncovering $k$ vertices
• subcritical, $k = n - \omega(\sqrt{n})$: $c_{\max}^{(k)}(T_n)/n \to 0$ in probability,
• critical, $k = n- \Theta(\sqrt{n})$: $c_{\max}^{(k)}(T_n)/n$ converges weakly to (non-degenerate) continuous limiting distribution
• supercritical, $k = n-o(\sqrt{n})$: w.h.p. there is a giant component of size $\sim n$

### Summary

• number of uncovered edges: process mean is $j(j-1)/n$, (rescaled) deviation behaves similar to Brownian bridge
• characterization of distribution of root cluster size, critical region is $n - \Theta(\sqrt{n})$ uncovered vertices
• analysis of size of largest cluster in and around critical region of $n - \Theta(\sqrt{n})$ uncovered vertices
• along the way: several neat combinatorial results (e.g., enumeration of special forests; Abel's binomial theorem, ...)