From charlesreid1

Line 3: Line 3:
==Goodrich Chapter 8 Trees==
==Goodrich Chapter 8 Trees==


===Definitions===
See [[Trees]] for more notes from this chapter.


Trees - most important nonlinear data structures in computing
===Binary Trees===


Relationships in trees are hierarchical
A binary tree is an ordered tree that has the properties:
* Every node has max 2 children
* Each child labeled as left or right
* Left has precedence over right


Each element in a tree has a parent element and zero or more child elements
Other definitions:
* Subtree rooted at left or right child of an internal node is the left subtree/right subtree
* Proper binary trees - trees in which each node has either zero or two children
* Improper binary tree - one or more nodes with a single child


Formally, define tree T as a set of nodes storing elements such that the nodes have a parent-child relationship satisfying the following properties:
===Binary Tree Abstract Data Type===
* If T is non-empty, it has a special node, called the root of T, that has no parent.
* Each node v of T different from the root has a unique parent node w; every node with parent w is a child of w.


===Relationships===
* tree.left(p) - returns the position that is the left child of p
* tree.right(p) - returns the position that is the right child of p
* tree.sibling(p) - return the position that represents the sibling of p


Node relationships:
===Binary Tree Properties===
* Sibling nodes are nodes that share a parent
* External nodes are nodes with no children - total number is n_E
* Internal nodes are nodes with 1 or more children - total number is n_I
* Node u is an ancestor of node v is u = v or if u is an ancestor of the parent of v.
* The stubtree of T that is rooted at node v is the tree consisting of all descendants of v in T - including v itself.


===Edges Paths===
Let T be a non-empty binary tree, and let n, n_E, n_I, and h denote number of nodes, number of external nodes, number of internal nodes, and height of T, respectively.


Edges and paths:
Then T has the following properties:
* An edge of three T is a pair of nodes (u,v) such that u is the parent of v or vice-versa
* A path is a set of consecutive edges - or, a set of consecutive nodes connected by edges.
* A tree is ordered if there is a meaningful linear order among the children of each node.


Example:
<math>
* Documents like books are hierarchically organized as a tree whose internal nodes are parts, chapters, sections, and leaves are paragraphs, tables, figures
h+1 \leq n \leq 2^{h+1} - 1
</math>


===Ordered/Unordered===
<math>
1 \leq n_{E} \leq 2^h
</math>


Ordered vs unordered
<math>
* Ordered tree - the order of the sibling nodes is important
h \leq n_I \leq 2^h - 1
* Unordered tree - there is no precedence with respect to one sibling or the other
</math>
* Examples of ordered trees occur when there are ordered operations (e.g., siblings ordered according to order of birth)
* Examples of unordered trees occur with enumerations or functionality (org charts, divisions in a company)


===Implementation===
<math>
\log{(n+1)} - 1 \leq h \leq n-1
</math>


Tree abstract data types:
===Proper binary tree properties===
* Things get a bit confusing from here. Goodrich separates the concept of a "Position" from the concept of a "Node."
* The Position is a location in the tree, the Node is a four-pointer container. k


Abstract class:
If T is a tree that is proper, it has the following properties:
* Tree is an example of an abstract class implementation
* In Python, each virtual function will raise a NotImplementedError("Must be implemented by subclass")
* In Python, errors in arguments pointing to invalid positions should raise a ValueError
* In Java, we define the functions to be abstract and require that inheriting objects define them. [[Java/Abstract Classes]]


The actual implementation will differ depending on the type of data we are storing and the various requirements we might have for it (e.g., a faster algorithm may be rejected due to high memory usage and memory constraints).
<math>
2h + 1 \leq n \leq 2^{h+1} - 1
</math>


There are some concrete methods that can be defined, if they are independent of implementation. Some examples include:
<math>
* is_root (parent is null)
h + 1 \leq n_{E} \leq 2^h
* is_leaf (number of children is 0)
</math>
* is_empty (T if no nodes in tree)


Computing depth and height:
<math>
* Depth of p is the number of ancestors of p, excluding p itself
h \leq n_I \leq 2^h - 1
* Depth is the distance (in other nodes) to the root node
</math>
* The height of a node is the '''maximum''' height of each of the node's children
* The height of the entire tree is the '''maximum''' depth of its leaf positions.
 
===Algorithmic analysis===
 
Analyze the following two methods to obtain the height:
 
<pre>
def _height(self):
  return max(self.depth(p) for p in self.positions() if self.is_leaf(p))
</pre>
 
Theis will call depth function on each tree, and the worst-case depth is n.
 
That means we're doing


<math>
<math>
\sum_{i=1}^{n} j = \dfrac{n(n+1)}{2} \sim n^2
\log{(n+1)} - 1 \leq h \leq \dfrac{(n-1)}{2}
</math>
</math>
Height runs in O(n^2)
An alternative method utilizes recursion:
<pre>
def _height2(self,p):
  if(self.is_leaf(p)):
    return 0
  else:
    return 1 + max(self._height2(c) for c in self.children(p))
    # for each child, compute height, and return the maximum height + 1
</pre>
===Binary Trees===
See [[Binary Trees]]


==Skiena Chapter 3 Data Structures==
==Skiena Chapter 3 Data Structures==

Revision as of 01:58, 7 June 2017

Notes

Goodrich Chapter 8 Trees

See Trees for more notes from this chapter.

Binary Trees

A binary tree is an ordered tree that has the properties:

  • Every node has max 2 children
  • Each child labeled as left or right
  • Left has precedence over right

Other definitions:

  • Subtree rooted at left or right child of an internal node is the left subtree/right subtree
  • Proper binary trees - trees in which each node has either zero or two children
  • Improper binary tree - one or more nodes with a single child

Binary Tree Abstract Data Type

  • tree.left(p) - returns the position that is the left child of p
  • tree.right(p) - returns the position that is the right child of p
  • tree.sibling(p) - return the position that represents the sibling of p

Binary Tree Properties

Let T be a non-empty binary tree, and let n, n_E, n_I, and h denote number of nodes, number of external nodes, number of internal nodes, and height of T, respectively.

Then T has the following properties:

$ h+1 \leq n \leq 2^{h+1} - 1 $

$ 1 \leq n_{E} \leq 2^h $

$ h \leq n_I \leq 2^h - 1 $

$ \log{(n+1)} - 1 \leq h \leq n-1 $

Proper binary tree properties

If T is a tree that is proper, it has the following properties:

$ 2h + 1 \leq n \leq 2^{h+1} - 1 $

$ h + 1 \leq n_{E} \leq 2^h $

$ h \leq n_I \leq 2^h - 1 $

$ \log{(n+1)} - 1 \leq h \leq \dfrac{(n-1)}{2} $

Skiena Chapter 3 Data Structures

Question 3-5

Question 3-5) Find the overhead fraction (data space/total space) for each binary tree implementation on n nodes given the following conditions:

  • All nodes store data, 2 child pointers, and 1 parent pointer. Data fields are 4 bytes, pointers are 4 bytes.
  • Only leaf nodes store data; internal nodes store 2 child pointers. Data field requires 4 bytes, 2 bytes per pointer.

First case:

  • Binary tree with n nodes -> n-1 edges
  • Child/parent ppointers means 2x edges
  • 2(n-1) edges, 2(n-1) pointers
  • Alternatively, here's the analysis:

n nodes x (4 bytes of data/node) = 4n bytes data

n nodes x (12 bytes of pointers/node) = 12 n bytes

Total space is 16 n bytes, so overhead fraction is 1/4, i.e., the data space to total space ratio is 1/4

Second case:

  • If we have n nodes, we have ~n/2 leaves
  • n nodes total x (1 leaf node / 2 nodes) ~ n/2 lleaf nodes

n/2 empty nodes x (2 pointers/1 empty node) x (2 bytes/pointer) = 2n bytes for empty nodes with pointers

n/2 data nodes x (4 bytes/1 empty node) = 2n bytes data

The overhead fraction is thus 1/2: half the bytes are for data, half the bytes are for pointers.

Question 3-6

Question 3-6) Describe how to modify any balanced tree structure such that search, insert, delete, min, max each take the expected amount of time, O(log n), but sucessor and predecessor methods now take O(1) time each. What modifications are required, and to which methods?

The methods:

  • Predecessor(D,k) - returns the predecessor key (in order) to the given key k
  • Successor(D,k) - returns the successor key to the given key k

Question is, how to we implement neighbor lookup in O(1) time in a balanced tree?

Two options:

  • Option one - add a previous and next node pointers. This increases the complexity of the bookkeeping, and of the add/insert/remove methods, which now have to traverse the tree to fix their links.
  • Option two - make a data tree that is twice as big, and only holds data in the bottom-most leaf nodes. This obviates the need for a next and previous pointer, because the tree is easy to navigate - up and over... at most log(n) operations to traverse tree.

Option one:

We would need to modify the add() and insert() method, the delete method, no need to modify search or min or max methods.

We modify them to keep track of the previous node. Finding the previous node would be somewhat tricky logic. When we add a new node, find its next and previous nodes, and link them correctly.

Option two:

It would be algorithmically easier (but more expensive and more complicated implementation wise) to use a tree structure with only data in the leaf nodes. Finding/keeping track of next or previous node then doesn't require the extra two pointers and the extra logic of traversing a binary tree, it just requires twice the number of nodes (tree of size N has N/2 leaf nodes).

This also requires implementation of the entire data structure again, and it is not so obvious how you keep a tree structure like that balanced and dynamically resized.

Flags