# A Survey of Tree Representations

ABSTRACT: In this report, we classify trees, then classify tree representations, then give a survey of tree representations. We also discuss operations on a tree. The purpose of this report is to provide economy of thought, provide consistent terminology, and assist in identifying "convenient" representations of trees to use in applications.

## Introduction

The word "tree" is often encountered in computer data structures theory and in graph theory. We use definitions and theorems from graph theory. But our focus is on trees commonly used as computer data structures.

Definition.
• A tree is an undirected graph with the following equivalent characterizations.
• It is connected and acyclic.
• Each two nodes have exactly one path between them.
• If one edge is removed, then it is separated.
• It is acyclic and if one edge is added, it becomes cyclic.
• Recursively, a tree is a single node or two trees connected with an edge.
• A forest is a set of disconnected trees.
• The degree (``valency'') of a node is its number of incident edges.
• The root node is a unique distinguished node of a tree.
• An internal node has degree 2 or more.
An external node has degree 1.
A leaf node is an external node that is not the root.
• For trees with a root node, a node's neigbor is a parent node if it is closer to the root and a child node if it is further from the root.

Theorems.
• A tree has n nodes has n-1 edges.
• A rooted tree with n nodes has between 1 and n-1 leaves.
• For any three nodes, the three paths between pairs have exactly one node in common.

## Classification of Trees

### Knuth's Classification

Knuth [K] said that there are "three fundamental classes of trees," free, oriented, and ordered, defined as follows.

Definition.
• Free trees have no designated "root" node and have no order on incident vertices.
• Oriented ("rooted") trees are free trees which have a designated "root" node, but still no ordering on children.
• Ordered ("plane rooted") trees are oriented trees with an ordering on children of each node.

In addition, Knuth says each of these three fundamental classes can be "labelled" or "unlabelled".

Definition.
• A labelled tree has a unique label on each node. Without loss of generality, let those labels be {0,1,2,...,n-1} for a n node tree.
• An unlabelled tree has labels ignored.

So we can partition free, oriented, and ordered trees by whether they are labelled or unlabelled. Knuth says that the most important trees in computer data structures are ordered trees.

 Free Oriented Ordered Unlabelled counted by OEIS A000055 counted by OEIS A000081 counted by OEIS A000108 Counted by Catalan numbers C(n)=(2n)!/((n+1)!n!) Labelled Counted by OEIS A000272 Counted by n^(n-2) Counted by OEIS A000169 Counted by n^(n-1) Counted by OEIS A001813 Counted by quadrupleFactorialNumbers(n-1) = (2(n-1))!/((n-1)!)

Theorem.

• The minimum number of bits to encode a structure of size n counted by catalan numbers is 2n bits

Beyond the above classification, many trees in practice are equipped with additional structure, which we identify next.

#### Number of Children

Consider ordered trees. We can restrict the number of children of each node.

Definition.
• k-ary ("k-arity") tree - all nodes have at most k children. Alternative def: recursively, each node has up to k children, each a k-ary tree.
notation: binary tree is a 2-ary tree, ternary tree is a 3-ary tree, quad tree is a 4-ary tree, and oct tree is a 8-ary tree.
• strict ("full") k-ary tree - all nodes have exactly k or 0 children.
• complete k-ary tree - ordered tree with the maximum number of nodes at every depth, except maybe the bottommost depth where only the rightmost nodes are absent.
• perfect k-ary tree - complete k-ary tree with no missing nodes.
• k-ary tree with omitted children - strict k-ary ordered tree such that some children can be "omitted" i.e. some indices of children can be ``skipped''.

Example.
```binary tree:           Strict binary tree:         Complete binary tree:      Perfect Binary Tree:          Binary tree with omitted children:
x                        x                           x                         x                          x
/   \                     /  \                        / \                       /  \                       / \
/     \                   /    \                      /   \                     /    \                     /   \
x       x                 x      x                    x     x                   x       x                  x     x
/ \      |                / \                         / \   / \                 / \     / \                / \     \
x   x     x               x   x                       x   x  x  x               x   x   x   x              x   x     x
|                        / \                     / \  |                    / \ / \ / \ / \                /     /
x                       x   x                   x   x x                    x x x x x x x x               x     x

```
Theorems. Consider a k-ary tree on n nodes.
• If it is strict, then #internal nodes=(n-1)/k, #leaves=(n(k-1)+1)/k, and #edges = n-1.
• If it is perfect then it is also complete and strict.
• There is also a bijection between strict k-ary trees and k-ary trees with omitted children.
• Unlabelled full k-ary trees are counted by Fuss-Catalan numbers (For k=2 through 8, see OEIS A000108, A001764, A002293, A002294, A002295, A002296 and A007556).
In particular, unlabelled strict binary trees are counted by Catalan numbers, as illustrated in the figures.  Strict binary trees Binary trees with omitted children, missing children drawn as empty squares.  • Labelled versions of strict k-ary trees are counted by n! times Fuss-Catalan numbers.

### Depth

More structure can be given by bounding the length of the path from the root to the leaves.

Definition.
• The depth (``height'', ``level'', ``breadth'') of a node is the number of edges on path to the root.

Example.
```Tree with height 3:

x     <---depth 0
/ | \
x  x  x  <---depth 1
/ \    |
x   x   x  <---depth 2

```
Theorems.
• A perfect k-ary tree has (k^(max depth+1)-1)/(k-1) nodes [pf: truncated geometric series k^0+k^1+...+k^depth].
e.g. perfect binary trees has 2^(max depth+1)-1 nodes
• A k-ary tree has at between 1 and k^(max depth) leaves.
• A strict k-ary tree with a bounded depth has number of nodes between k*(max depth)+1 and that of the the perfect tree.

## A Survey of Tree Representations

Knuth [K] gave examples of three types of tree representations: linked, sequence, and tree. In addition, there may be some other representations. A classification of types of tree representations:
• linked representation has pointers between nodes
• sequence representation is a serialized representation of tree structure
• tree representation represent tree as other trees, usually with better structure so easier to handle
• other representation does not satisfy the above definitions above, eg adjacency matrix or adjacency list

Below is a table of tree representations, classified by type of tree and by type of representation.

 Free Oriented Ordered Extra structure Unlabelled linked linked Sequence Tree binary trees with omitted children Binary tree Labelled Sequence Tree Quad tree representation of adjacency matrix Other any ordered labelled tree with ordering of children ignored any free tree with a designated root other Sequence Many unlabelled encodings above can be extended to labelled

### Unlabelled Free Trees Each node has pointer to each neighbor.

### Unlabelled Oriented Trees Each node has pointer to each child or its parent.

### Unlabelled Ordered Trees

#### Balanced Parentheses Traverse tree depth-first, pre-order. When first visit a subtree, print '(' (or '1') and when leaving that subtree, print ')' (or '0'). Alternatively, label edges with '(' on left and ')' on right, walk around tree

Theorem

• The encoding has length 2n for a n node tree.
• This definition is recursive, eg the tree '(' subtree1 subtree2 ... subtreek ')' has a root with k children

[KM]
[Re] for binary version

#### Branching Sequence ("Arity List")

```    *
/ \
*   *
/\  /|\
* * * * *
encoding depth first: 2 2 0 0 3 0 0 0
encoding breadth first: 2 2 3 0 0 0 0 0
```

Label each node with its arity, traverse (depth or breadth first) to accumulate sequence of arities

Remark.

• A binary encoding uses eg 00001 to represent 4 and 0001 to represent 3

[Re] for binary version

#### Depth Sequence

```    *
/|\
* * *
/\
*  *
depth list: 0 1 2 2 1 1
```
```   *
|
*
|
*
/\
*  *
depth list: 0 1 2 3 4 4
```

Label each node with its depth, traverse depth-first pre-order to accumulate sequence of depths.

Remark.

• A binary encoding uses eg 00001 to represent 4 and 0001 to represent 3

[RR] binary version

The navigation sequence ("generalized index") of a tree node is a list indices of children when traversing from the root to that node.

Notation: For tree T, let T[i1,i2,...,in] denote the node at navigation sequence i1,i2,...,in, are the indices of the child at each depth up to depth n.

Storing every node's navigation sequence is a tree representation, they can be ordered in any way.

#### Left-child right-sibling  rotated: Draw edges between siblings. Delete all edges to children except the leftmost child.

Theorem.

• This is a bijection between forests and binary trees with omitted children.

[K]

#### Leaves of binary tree  ref: [CG], [OEIS A000081]

back to table

#### binary tree with omitted children

##### Zak's sequence Label nodes '1' and extend all possible leaves and label them '0', traverse depth-first preorder accumulating string, drop last 0.

Theorem.

• Encoding is length 2n binary word where n is number of internal nodes (ie 2n+1 total nodes)

Remark.

• In the image, x is the indices of 0's and y is the indices of 1's.

ref: [Za], [KM]

##### children pattern sequence
```     11
/  \
01  01
\    \
00   10
/
11
/  \
00   00

Encoding: 11 01 00 01 10 11 00 00
```
label nodes of binary tree: 00=leaf, 01=only right child, 10=only left child, 11=two children, traverse in preorder

Theorem

• length 2n for n node tree

[KM]

##### Level-order representation Label all nodes 1's and extend all external nodes with children labelled 0. Traverse breadth-first accumulating encoding.

Theorem Consider a k-ary tree. Let the nodes be ordered in breadth-first left-to-right order, and consider the "level-order sequence" as the binary encoding above.

• The first child of the ith node corresponds to the level-order sequence index k*i-k+2. The remaining children correspond to the following level-order sequence indices. E.g. for binary trees, the ith node has children in the level-order sequence at indices 2i and 2i+1.
• Consider index p in the level-order sequence. If it is the 1st child, then its parent is node (p+k-2)/k. If it is the 2nd child, then its parent is node (p+k-3)/k. And so on until the last child with parent at node p/k. E.g. for binary trees, the level-order index p has parent node p/2 or (p-1)/2, depending on if it is the first or second child.
• The level-order encoding has length is kn+1 bits.

Remark.

• This encoding also works well if there are no missing children. It works best for perfect k-ary trees, where the level-order sequence is the same as the node sequence.
• So can jump parent to children by 2 times #preceding 1's and from child to parant by index/2.
• When implementing, the level-order sequence can be stored separately for each level.

[De]

##### Rotation based [KM]

#### binary trees

##### Strict binary tree as compressed branching sequence (and compressed balanced parentheses)
```    0
/ \
1   0
/\
1  1

encoding: 01011
```

For a strict binary tree, label internal nodes with 0's and external nodes with 1's. Then traverse depth first pre-order accumulating encoding.

Theorem.

• The length of the encoding string equals the number of nodes.

[Ko]

##### Binary pixel  Store array of pixel values.

### Labelled Free Trees

#### Prufer Code Remove the leaf with the smallest label, and append the label of the node it was removed from. Continue until 2 vertices are left.

The Prufer code encodes labelled free trees as n-2 digit sequence from {1,2,...,n}.

Adjacency matrix is a n-by-n binary matrix representing either labelled free tree or labelled oriented trees. For labelled oriented trees, there are n-1 1's each corresponding to an edge from parent to child. For labelled free trees there are 2*(n-1) 1's each corresponding to an edge from one vertex to another and another edge back. Example:
```    1
/\
2  3
/\
4  5
```
```  1 2 3 4 5
1 0 1 1 0 0
2 1 0 0 1 1
3 1 0 0 0 0
4 0 1 0 0 0
5 0 1 0 0 0
```
For orieted trees with root labelled '1':
```  1 2 3 4 5
1 0 1 1 0 0
2 0 0 0 1 1
3 0 0 0 0 0
4 0 0 0 0 0
5 0 0 0 0 0
```

The adjacency matrix can be compressed with a runlength encoding or quad-tree encoding

One wonders about whether the adjacency matrix can be reduced to some (block- or tri-)diagonal form. The following refutes such a possibility.
Theorem: There exists a tree whose adjacency matrix is irreducable to some diagonal form.

Because adjacency matrices are sparse, they can be compressed with a runlength coding. For example the top matrix above, read row-by-row, is: 1:0, 2:1, 2:0, 1:1, 2:0, 3:1, 5:0, 1:1, 5:0, 1:1, 3:0.

Adjacency Lists can represent both labelled free tree and labelled oriented trees. This is a list of the n-1 edges in an n node tree. Alternatively, this can be represented. Example: consider the tree
```     1
/  \
2    5
/\   /|\
3  4 6 7 8
```
```(1,2),(2,3),(2,4),(1,5),(5,6),(5,7),(5,8)
```
```1: 2,5
2: 1,3,4
3: 2
4: 2
5: 1,6,7,8
6: 3
7: 3
8: 3
```
For a oriented tree, say with root '1', list only inderior nodes and children:
```1: 2,5
2: 3,4
5: 6,7,8
```
back to table

### Ordered Labelled

#### parenthesesed strings in pre or postfix ntn ```prefix: A(BC(K)))D((E(H))F(J)G)
postfix: (B(K)C)A((H)E(J)FG)D
```

[K] (knuth pg336)

#### Mitchell's cononical recursive representation List of indices to parent they were hung from when added, label i corresponds to ith node added

Remark.

• great for adding leaf nodes

[Mi] has some linear-time operations on this
[K] mentions this pg 390

## Classification of Operations on Trees

### Base and Derived Operations

Definitions.

• A tree operation takes a tree and outputs a part of the tree or a new tree.
• A complete set of tree operations means they can be combined to derive any operation on a tree.
• A minimal set of tree operations means no operations can be derived from others.
• Base tree operations is a complete and minimal set of operations. Note: need not be unique.
• A derived tree operation, with respect to a base set, is a tree operation derived from the base operations.

We fix a base set of operations as follows.

Conjecture.

• Define our base operations as access next child, access parent, add node, and remove node. We conjecture that these base operations is a minimal complete set of tree operations.

### Common Operations

A classification of tree operations follow.
 Access Modify Local access parent access next child access m'th child access neighbor ... insert and graft delete and prune refine ... Global traverse breadth first - all nodes or leaves only depth first - all nodes (pre-, in-, post-order) or leaves only search access node at a navigation sequence ... balance merge trees in some way Convert from one representation to another ...

## Efficienct Implementation on a Computer

### Store a given tree representation in computer memory.

• contiguous or not; possibly contiguous in chunks
• serialized with respect to some tree traversal
• tree structure can be stored as:
• Per-node structure
• number of children
• depth
• link(s) to other part(s) of tree
• info about neighborhood (eg #nodes in breadth first until next sibling)
• Not per-node structure
• landmarks - eg some sample of navigation sequences to assist in search
• balanced-parentheses
• level-order structure
• Other
• the order in which the above is stored, eg depth list stored in depth-first pre-order is sufficient to represent the tree
• can store per-node tree structure with node data, or separately
• array of structures or structure of arrays
• tree structure can be stored or recomputed when needed
• extra structure can be stored, e.g. a depth list stored in depth-first pre-order is sufficient to represent a tree, but might want to store additional structural data to make some operation easier

### Choosing a tree representation based on desired constraints and operations.

• declare tree at compile-time or run-time
• constraints on tree structure, used when deciding on a representation
• arity - a set of arities, bounded above or below, as a function of depth
• depth - bounded, refinable
• complete or perfect - for bounded depth and strict arity
• once structure once built, whether it is fixed or can con updated
• storage footprint
• desired operations
• If one operation is important, should represent tree around that operation.
• Fast traversal is solved by serializing tree nodes in the order of the traversal.
• Fast searching is is partially solved by keeping the tree balanced.
• If multiple operations are important, we can can keep extra structure data to help specific operations.
• operations
• It is expensive to change tree structure if changes ripple through rest of representation, e.g. insertion of subtree may require modifying the rest of the tree's layout in memory.
• May be wise to break tree representation into chunks and maybe mix encodings, ie building blocks ### Miscellaneous thoughts

• allow a user-supplied function to be called at certain events
• allocation strategy
• vectorization, parallelization, and later, supercomputing
• omitted child: can add_child() skip indices?
• Optionally we can give a more efficient implementation of important derived operations.
• Search
• for a normal search, a user-supplied comparison operation decides which child to access
• for searching for the index of navigation sequences, can use interpolation methods to choose the next index to check

# References

[KM] Katajainen, Makinen. Tree Compression and Optimizatin With Applications, 1990.
[Za] S. Zaks, "Lexicographic generation of ordered trees", Theoret. Comput. Sci. 10 (1980) 63-82.
[K] Knuth, The Art of Computer Programming, Volume 1.
[De] Erik Demaine, MIT 6.897: Advanced Data Structures, Lecture 12, Spring 2003
[Mi] Mitchell, Linear algorithms on recursive representations of trees
[Re] Read, Ronald C. The coding of various kinds of unlabeled trees. Graph theory and computing, pp. 153–182. Academic Press, New York, 1972.
[Ko] Thomas Koshy, Catalan Numbers with Applications
[RR] Rajeev Raman and S. Srinivasa Rao, Succinct Representations of Ordinal Trees
[CG] Conway and Guy, The book of numbers