Materialized Path trees

This is an efficient implementation of Materialized Path trees for Django 1.0+, as described by Vadim Tropashko in SQL Design Patterns. Materialized Path is probably the fastest way of working with trees in SQL without the need of extra work in the database, like Oracle’s CONNECT BY or sprocs and triggers for nested intervals.

In a materialized path approach, every node in the tree will have a path attribute, where the full path from the root to the node will be stored. This has the advantage of needing very simple and fast queries, at the risk of inconsistency because of the denormalization of parent/child foreign keys. This can be prevented with transactions.

django-treebeard uses a particular approach: every step in the path has a fixed width and has no separators. This makes queries predictable and faster at the cost of using more characters to store a step. To address this problem, every step number is encoded.

Also, two extra fields are stored in every node: depth and numchild. This makes the read operations faster, at the cost of a little more maintenance on tree updates/inserts/deletes. Don’t worry, even with these extra steps, materialized path is more efficient than other approaches.

Note

The materialized path approach makes heavy use of LIKE in your database, with clauses like WHERE path LIKE '002003%'. If you think that LIKE is too slow, you’re right, but in this case the path field is indexed in the database, and all LIKE clauses that don’t start with a % character will use the index. This is what makes the materialized path approach so fast.

Inheritance diagram of MP_Node

class treebeard.mp_tree.MP_Node(*args, **kwargs)

Bases: treebeard.models.Node

Abstract model to create your own Materialized Path Trees.

Warning

Do not change the values of path, depth or numchild directly: use one of the included methods instead. Consider these values read-only.

Warning

Do not change the values of the steplen, alphabet or node_order_by after saving your first model. Doing so will corrupt the tree. If you must do it:

  1. Backup the tree with dump_bulk()
  2. Empty your model’s table
  3. Change depth, alphabet and/or node_order_by in your model
  4. Restore your backup using load_bulk() with keep_ids=True to keep the same primary keys you had.

Example:

class SortedNode(MP_Node):
   node_order_by = ['numval', 'strval']

   numval = models.IntegerField()
   strval = models.CharField(max_length=255)

Read the API reference of treebeard.Node for info on methods available in this class, or read the following section for methods with particular arguments or exceptions.

steplen
Attribute that defines the length of each step in the path of a node. The default value of 4 allows a maximum of 1679615 children per node. Increase this value if you plan to store large trees (a steplen of 5 allows more than 60M children per node). Note that increasing this value, while increasing the number of children per node, will decrease the max depth of the tree (by default: 63). To increase the max depth, increase the max_length attribute of the path field in your model.
alphabet

Attribute: the alphabet that will be used in base conversions when encoding the path steps into strings. The default value, 0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ is the most optimal possible value that is portable between the supported databases (which means: their default collation will order the path field correctly).

Note

In case you know what you are doing, there is a test that is disabled by default that can tell you the optimal default alphabet in your enviroment. To run the test you must enable the TREEBEARD_TEST_ALPHABET enviroment variable:

$ TREEBEARD_TEST_ALPHABET=1 python manage.py test treebeard.TestTreeAlphabet

On my Ubuntu 8.04.1 system, these are the optimal values for the three supported databases in their default configuration:

Database Optimal Alphabet
MySQL 5.0.51 0-9A-Z
PostgreSQL 8.2.7 0-9A-Z
Sqlite3 0-9A-Za-z
node_order_by

Attribute: a list of model fields that will be used for node ordering. When enabled, all tree operations will assume this ordering.

Example:

node_order_by = ['field1', 'field2', 'field3']
path

CharField, stores the full materialized path for each node. The default value of it’s max_length, 255, is the max efficient and portable value for a varchar. Increase it to allow deeper trees (max depth by default: 63)

Note

django-treebeard uses Django’s abstract model inheritance, so:

  1. To change the max_length value of the path in your model, you can’t just define it since you’d get a django exception, you have to modify the already defined attribute:

    class MyNodeModel(MP_Node):
        pass
    
    MyNodeModel._meta.get_field('path').max_length = 1024
    
  2. You can’t rely on Django’s auto_now properties in date fields for sorting, you’ll have to manually set the value before creating a node:

    class TestNodeSortedAutoNow(MP_Node):
        desc = models.CharField(max_length=255)
        created = models.DateTimeField(auto_now_add=True)
        node_order_by = ['created']
    
    TestNodeSortedAutoNow.add_root(desc='foo',
                                   created=datetime.datetime.now())
    

Note

For performance, and if your database allows it, you can safely define the path column as ASCII (not utf-8/unicode/iso8859-1/etc) to keep the index smaller (and faster). Also note that some databases (mysql) have a small index size limit. InnoDB for instance has a limit of 765 bytes per index, so that would be the limit if your path is ASCII encoded. If your path column in InnoDB is using unicode, the index limit will be 255 characters since in MySQL’s indexes, unicode means 3 bytes.

Note

treebeard uses numconv for path encoding: http://code.tabo.pe/numconv/

depth
PositiveIntegerField, depth of a node in the tree. A root node has a depth of 1.
numchild
PositiveIntegerField, the number of children of the node.
classmethod add_root(**kwargs)

Adds a root node to the tree.

Raises PathOverflow:
 when no more root objects can be added

See: treebeard.Node.add_root()

add_child(**kwargs)

Adds a child to the node.

Raises PathOverflow:
 when no more child nodes can be added

See: treebeard.Node.add_child()

add_sibling(pos=None, **kwargs)

Adds a new node as a sibling to the current node object.

Raises PathOverflow:
 when the library can’t make room for the node’s new position

See: treebeard.Node.add_sibling()

move(target, pos=None)

Moves the current node and all it’s descendants to a new position relative to another node.

Raises PathOverflow:
 when the library can’t make room for the node’s new position

See: treebeard.Node.move()

classmethod get_tree(parent=None)
Returns:A queryset of nodes ordered as DFS, including the parent. If no parent is given, the entire tree is returned.

See: treebeard.Node.get_tree()

Note

This metod returns a queryset.

classmethod find_problems()

Checks for problems in the tree structure, problems can occur when:

  1. your code breaks and you get incomplete transactions (always use transactions!)
  2. changing the steplen value in a model (you must dump_bulk() first, change steplen and then load_bulk()
Returns:

A tuple of five lists:

  1. a list of ids of nodes with characters not found in the alphabet
  2. a list of ids of nodes when a wrong path length according to steplen
  3. a list of ids of orphaned nodes
  4. a list of ids of nodes with the wrong depth value for their path
  5. a list of ids nodes that report a wrong number of children

Note

A node won’t appear in more than one list, even when it exhibits more than one problem. This method stops checking a node when it finds a problem and continues to the next node.

Note

Problems 1, 2 and 3 can’t be solved automatically.

Example:

MyNodeModel.find_problems()
classmethod fix_tree(destructive=False)

Solves some problems that can appear when transactions are not used and a piece of code breaks, leaving the tree in an inconsistent state.

The problems this method solves are:

  1. Nodes with an incorrect depth or numchild values due to incorrect code and lack of database transactions.
  2. “Holes” in the tree. This is normal if you move/delete nodes a lot. Holes in a tree don’t affect performance,
  3. Incorrect ordering of nodes when node_order_by is enabled. Ordering is enforced on node insertion, so if an attribute in node_order_by is modified after the node is inserted, the tree ordering will be inconsistent.
Parameter:destructive

A boolean value. If True, a more agressive fix_tree method will be attemped. If False (the default), it will use a safe (and fast!) fix approach, but it will only solve the depth and numchild nodes, it won’t fix the tree holes or broken path ordering.

Warning

Currently what the destructive method does is:

  1. Backup the tree with dump_data()
  2. Remove all nodes in the tree.
  3. Restore the tree with load_data()

So, even when the primary keys of your nodes will be preserved, this method isn’t foreign-key friendly. That needs complex in-place tree reordering, not available at the moment (hint: patches are welcome).

Example:

MyNodeModel.fix_tree()

Previous topic

API

Next topic

Nested Sets trees

This Page