A Survey of Tree Representations
ABSTRACT: In this report, we classify trees, then classify tree representations, then give a survey of tree representations. We also discuss operations on a tree. The purpose of this report is to provide economy of thought, provide consistent terminology, and assist in identifying "convenient" representations of trees to use in applications.
Introduction
The word "tree" is often encountered in computer data structures theory and in graph theory. We use definitions and theorems from graph theory. But our focus is on trees commonly used as computer data structures.
Definition.
 A tree is an undirected graph with the following equivalent characterizations.
 It is connected and acyclic.
 Each two nodes have exactly one path between them.
 If one edge is removed, then it is separated.
 It is acyclic and if one edge is added, it becomes cyclic.
 Recursively, a tree is a single node or two trees connected with an edge.
 A forest is a set of disconnected trees.
 The degree (``valency'') of a node is its number of incident edges.
 The root node is a unique distinguished node of a tree.
 An internal node has degree 2 or more.
An external node has degree 1.
A leaf node is an external node that is not the root.
 For trees with a root node, a node's neigbor is a parent node if it is closer to the root and a child node if it is further from the root.
Theorems.
 A tree has n nodes has n1 edges.
 A rooted tree with n nodes has between 1 and n1 leaves.
 For any three nodes, the three paths between pairs have exactly one node in common.
Classification of Trees
Knuth's Classification
Knuth [K] said that there are "three fundamental classes of trees," free, oriented, and ordered, defined as follows.
Definition.
 Free trees have no designated "root" node and have no order on incident vertices.
 Oriented ("rooted") trees are free trees which have a designated "root" node, but still no ordering on children.
 Ordered ("plane rooted") trees are oriented trees with an ordering on children of each node.
In addition, Knuth says each of these three fundamental classes can be "labelled" or "unlabelled".
Definition.
 A labelled tree has a unique label on each node. Without loss of generality, let those labels be {0,1,2,...,n1} for a n node tree.
 An unlabelled tree has labels ignored.
So we can partition free, oriented, and ordered trees by whether they are labelled or unlabelled. Knuth says that the most important trees in computer data structures are ordered trees.
Theorem.
 The minimum number of bits to encode a structure of size n counted by catalan numbers is 2n bits
Beyond the above classification, many trees in practice are equipped with additional structure, which we identify next.
Trees With Additional Structure
Number of Children
Consider ordered trees. We can restrict the number of children of each node.
Definition.
 kary ("karity") tree  all nodes have at most k children. Alternative def: recursively, each node has up to k children, each a kary tree.
notation: binary tree is a 2ary tree, ternary tree is a 3ary tree, quad tree is a 4ary tree, and oct tree is a 8ary tree.
 strict ("full") kary tree  all nodes have exactly k or 0 children.
 complete kary tree  ordered tree with the maximum number of nodes at every depth, except maybe the bottommost depth where only the rightmost nodes are absent.
 perfect kary tree  complete kary tree with no missing nodes.
 kary tree with omitted children  strict kary ordered tree such that some children can be "omitted" i.e. some indices of children can be ``skipped''.
Example.
binary tree: Strict binary tree: Complete binary tree: Perfect Binary Tree: Binary tree with omitted children:
x x x x x
/ \ / \ / \ / \ / \
/ \ / \ / \ / \ / \
x x x x x x x x x x
/ \  / \ / \ / \ / \ / \ / \ \
x x x x x x x x x x x x x x x x
 / \ / \  / \ / \ / \ / \ / /
x x x x x x x x x x x x x x x x
Theorems. Consider a kary tree on n nodes.
 If it is strict, then #internal nodes=(n1)/k, #leaves=(n(k1)+1)/k, and #edges = n1.
 If it is perfect then it is also complete and strict.
 There is also a bijection between strict kary trees and kary trees with omitted children.
 Unlabelled full kary trees are counted by FussCatalan numbers (For k=2 through 8, see OEIS A000108, A001764, A002293, A002294, A002295, A002296 and A007556).
In particular, unlabelled strict binary trees are counted by Catalan numbers, as illustrated in the figures.
 Labelled versions of strict kary trees are counted by n! times FussCatalan numbers.
Depth
More structure can be given by bounding the length of the path from the root to the leaves.
Definition.
 The depth (``height'', ``level'', ``breadth'') of a node is the number of edges on path to the root.
Example.
Tree with height 3:
x <depth 0
/  \
x x x <depth 1
/ \ 
x x x <depth 2
Theorems.
 A perfect kary tree has (k^(max depth+1)1)/(k1) nodes [pf: truncated geometric series k^0+k^1+...+k^depth].
e.g. perfect binary trees has 2^(max depth+1)1 nodes
 A kary tree has at between 1 and k^(max depth) leaves.
 A strict kary tree with a bounded depth has number of nodes between k*(max depth)+1 and that of the the perfect tree.
A Survey of Tree Representations
Knuth [K] gave examples of three types of tree representations: linked, sequence, and tree. In addition, there may be some other representations.
A classification of types of tree representations:
 linked representation has pointers between nodes
 sequence representation is a serialized representation of tree structure
 tree representation represent tree as other trees, usually with better structure so easier to handle
 other representation does not satisfy the above definitions above, eg adjacency matrix or adjacency list
Below is a table of tree representations, classified by type of tree and by type of representation.
Unlabelled Free Trees
Linked Neighbors
Each node has pointer to each neighbor.
back to table
Unlabelled Oriented Trees
Linked ChildParent
Each node has pointer to each child or its parent.
back to table
Unlabelled Ordered Trees
Balanced Parentheses
Traverse tree depthfirst, preorder. When first visit a subtree, print '(' (or '1') and when leaving that subtree, print ')' (or '0'). Alternatively, label edges with '(' on left and ')' on right, walk around tree
Theorem
 The encoding has length 2n for a n node tree.
 This definition is recursive, eg the tree '(' subtree1 subtree2 ... subtreek ')' has a root with k children
[KM]
[Re] for binary version
back to table
Branching Sequence ("Arity List")
*
/ \
* *
/\ /\
* * * * *
encoding depth first: 2 2 0 0 3 0 0 0
encoding breadth first: 2 2 3 0 0 0 0 0
Label each node with its arity, traverse (depth or breadth first) to accumulate sequence of arities
Remark.
 A binary encoding uses eg 00001 to represent 4 and 0001 to represent 3
[Re] for binary version
back to table
Depth Sequence
*
/\
* * *
/\
* *
depth list: 0 1 2 2 1 1
*

*

*
/\
* *
depth list: 0 1 2 3 4 4
Label each node with its depth, traverse depthfirst preorder to accumulate sequence of depths.
Remark.
 A binary encoding uses eg 00001 to represent 4 and 0001 to represent 3
[RR] binary version
back to table
Navigation Sequence
The navigation sequence ("generalized index") of a tree node is a list indices of children when traversing from the root to that node.
Notation: For tree T, let T[i1,i2,...,in] denote the node at navigation sequence i1,i2,...,in, are the indices of the child at each depth up to depth n.
Storing every node's navigation sequence is a tree representation, they can be ordered in any way.
back to table
Leftchild rightsibling
rotated:
Draw edges between siblings. Delete all edges to children except the leftmost child.
Theorem.
 This is a bijection between forests and binary trees with omitted children.
[K]
back to table
Additional Structure
binary tree with omitted children
Label nodes '1' and extend all possible leaves and label them '0', traverse depthfirst preorder accumulating string, drop last 0.
Theorem.
 Encoding is length 2n binary word where n is number of internal nodes (ie 2n+1 total nodes)
Remark.
 In the image, x is the indices of 0's and y is the indices of 1's.
ref: [Za], [KM]
back to table
11
/ \
01 01
\ \
00 10
/
11
/ \
00 00
Encoding: 11 01 00 01 10 11 00 00
label nodes of binary tree: 00=leaf, 01=only right child, 10=only left child, 11=two children, traverse in preorder
Theorem
 length 2n for n node tree
[KM]
back to table
Label all nodes 1's and extend all external nodes with children labelled 0. Traverse breadthfirst accumulating encoding.
Theorem Consider a kary tree. Let the nodes be ordered in breadthfirst lefttoright order, and consider the "levelorder sequence" as the binary encoding above.
 The first child of the ith node corresponds to the levelorder sequence index k*ik+2. The remaining children correspond to the following levelorder sequence indices. E.g. for binary trees, the ith node has children in the levelorder sequence at indices 2i and 2i+1.
 Consider index p in the levelorder sequence. If it is the 1st child, then its parent is node (p+k2)/k. If it is the 2nd child, then its parent is node (p+k3)/k. And so on until the last child with parent at node p/k. E.g. for binary trees, the levelorder index p has parent node p/2 or (p1)/2, depending on if it is the first or second child.
 The levelorder encoding has length is kn+1 bits.
Remark.
 This encoding also works well if there are no missing children. It works best for perfect kary trees, where the levelorder sequence is the same as the node sequence.
 So can jump parent to children by 2 times #preceding 1's and from child to parant by index/2.
 When implementing, the levelorder sequence can be stored separately for each level.
[De]
back to table
binary trees
0
/ \
1 0
/\
1 1
encoding: 01011
For a strict binary tree, label internal nodes with 0's and external nodes with 1's. Then traverse depth first preorder accumulating encoding.
Theorem.
 The length of the encoding string equals the number of nodes.
[Ko]
back to table
Labelled Free Trees
Prufer Code
Remove the leaf with the smallest label, and append the label of the node it was removed from. Continue until 2 vertices are left.
The Prufer code encodes labelled free trees as n2 digit sequence from {1,2,...,n}.
back to table
Adjacency matrix (for free and oriented trees)
Adjacency matrix is a nbyn binary matrix representing either labelled free tree or labelled oriented trees. For labelled oriented trees, there are n1 1's each corresponding to an edge from parent to child. For labelled free trees there are 2*(n1) 1's each corresponding to an edge from one vertex to another and another edge back.
Example:
1
/\
2 3
/\
4 5
For free tree:
1 2 3 4 5
1 0 1 1 0 0
2 1 0 0 1 1
3 1 0 0 0 0
4 0 1 0 0 0
5 0 1 0 0 0
For orieted trees with root labelled '1':
1 2 3 4 5
1 0 1 1 0 0
2 0 0 0 1 1
3 0 0 0 0 0
4 0 0 0 0 0
5 0 0 0 0 0
The adjacency matrix can be compressed with a runlength encoding or quadtree encoding
One wonders about whether the adjacency matrix can be reduced to some (block or tri)diagonal form. The following refutes such a possibility.
Theorem: There exists a tree whose adjacency matrix is irreducable to some diagonal form.
Because adjacency matrices are sparse, they can be compressed with a runlength coding. For example the top matrix above, read rowbyrow, is: 1:0, 2:1, 2:0, 1:1, 2:0, 3:1, 5:0, 1:1, 5:0, 1:1, 3:0.
back to table
Adjacency List
Adjacency Lists can represent both labelled free tree and labelled oriented trees. This is a list of the n1 edges in an n node tree. Alternatively, this can be represented.
Example: consider the tree
1
/ \
2 5
/\ /\
3 4 6 7 8
Adjacency list version 1:
(1,2),(2,3),(2,4),(1,5),(5,6),(5,7),(5,8)
Adjacency list version 2:
1: 2,5
2: 1,3,4
3: 2
4: 2
5: 1,6,7,8
6: 3
7: 3
8: 3
For a oriented tree, say with root '1', list only inderior nodes and children:
1: 2,5
2: 3,4
5: 6,7,8
back to table
Ordered Labelled
parenthesesed strings in pre or postfix ntn
prefix: A(BC(K)))D((E(H))F(J)G)
postfix: (B(K)C)A((H)E(J)FG)D
[K]
(knuth pg336)
back to table
Mitchell's cononical recursive representation
List of indices to parent they were hung from when added, label i corresponds to ith node added
Remark.
 great for adding leaf nodes
 adjacency list in disguise
[Mi] has some lineartime operations on this
[K] mentions this pg 390
back to table
Classification of Operations on Trees
Base and Derived Operations
Definitions.
 A tree operation takes a tree and outputs a part of the tree or a new tree.
 A complete set of tree operations means they can be combined to derive any operation on a tree.
 A minimal set of tree operations means no operations can be derived from others.
 Base tree operations is a complete and minimal set of operations. Note: need not be unique.
 A derived tree operation, with respect to a base set, is a tree operation derived from the base operations.
We fix a base set of operations as follows.
Conjecture.
 Define our base operations as access next child, access parent, add node, and remove node. We conjecture that these base operations is a minimal complete set of tree operations.
Common Operations
A classification of tree operations follow.
Efficienct Implementation on a Computer
Store a given tree representation in computer memory.
 contiguous or not; possibly contiguous in chunks
 serialized with respect to some tree traversal
 tree structure can be stored as:
 Pernode structure
 number of children
 depth
 navigation sequence
 links to children
 link to parent
 link(s) to other part(s) of tree
 info about neighborhood (eg #nodes in breadth first until next sibling)
 Not pernode structure
 landmarks  eg some sample of navigation sequences to assist in search
 balancedparentheses
 levelorder structure
 Other
 the order in which the above is stored, eg depth list stored in depthfirst preorder is sufficient to represent the tree
 can store pernode tree structure with node data, or separately
 array of structures or structure of arrays
 tree structure can be stored or recomputed when needed
 extra structure can be stored, e.g. a depth list stored in depthfirst preorder is sufficient to represent a tree, but might want to store additional structural data to make some operation easier
Choosing a tree representation based on desired constraints and operations.
 declare tree at compiletime or runtime
 constraints on tree structure, used when deciding on a representation
 arity  a set of arities, bounded above or below, as a function of depth
 depth  bounded, refinable
 complete or perfect  for bounded depth and strict arity
 once structure once built, whether it is fixed or can con updated
 storage footprint
 desired operations
 If one operation is important, should represent tree around that operation.
 Fast traversal is solved by serializing tree nodes in the order of the traversal.
 Fast searching is is partially solved by keeping the tree balanced.
 If multiple operations are important, we can can keep extra structure data to help specific operations.
 operations
 It is expensive to change tree structure if changes ripple through rest of representation, e.g. insertion of subtree may require modifying the rest of the tree's layout in memory.
 May be wise to break tree representation into chunks and maybe mix encodings, ie building blocks
Miscellaneous thoughts
 allow a usersupplied function to be called at certain events
 allocation strategy
 vectorization, parallelization, and later, supercomputing
 omitted child: can add_child() skip indices?
 Optionally we can give a more efficient implementation of important derived operations.
 Search
 for a normal search, a usersupplied comparison operation decides which child to access
 for searching for the index of navigation sequences, can use interpolation methods to choose the next index to check
References
[KM] Katajainen, Makinen. Tree Compression and Optimizatin With Applications, 1990.
[Za] S. Zaks, "Lexicographic generation of ordered trees", Theoret. Comput. Sci. 10 (1980) 6382.
[K] Knuth, The Art of Computer Programming, Volume 1.
[De] Erik Demaine, MIT 6.897: Advanced Data Structures, Lecture 12, Spring 2003
[Mi] Mitchell, Linear algorithms on recursive representations of trees
[Re] Read, Ronald C. The coding of various kinds of unlabeled trees. Graph theory and computing, pp. 153–182. Academic Press, New York, 1972.
[Ko] Thomas Koshy, Catalan Numbers with Applications
[RR] Rajeev Raman and S. Srinivasa Rao, Succinct Representations of Ordinal Trees
[CG] Conway and Guy, The book of numbers