This is an efficient implementation of Materialized Path trees for Django 1.0+, as described by Vadim Tropashko in SQL Design Patterns. Materialized Path is probably the fastest way of working with trees in SQL without the need of extra work in the database, like Oracle’s CONNECT BY or sprocs and triggers for nested intervals.
In a materialized path approach, every node in the tree will have a path attribute, where the full path from the root to the node will be stored. This has the advantage of needing very simple and fast queries, at the risk of inconsistency because of the denormalization of parent/child foreign keys. This can be prevented with transactions.
django-treebeard uses a particular approach: every step in the path has a fixed width and has no separators. This makes queries predictable and faster at the cost of using more characters to store a step. To address this problem, every step number is encoded.
Also, two extra fields are stored in every node: depth and numchild. This makes the read operations faster, at the cost of a little more maintenance on tree updates/inserts/deletes. Don’t worry, even with these extra steps, materialized path is more efficient than other approaches.
The materialized path approach makes heavy use of LIKE in your database, with clauses like WHERE path LIKE '002003%'. If you think that LIKE is too slow, you’re right, but in this case the path field is indexed in the database, and all LIKE clauses that don’t start with a % character will use the index. This is what makes the materialized path approach so fast.
Abstract model to create your own Materialized Path Trees.
class SortedNode(MP_Node): node_order_by = ['numval', 'strval'] numval = models.IntegerField() strval = models.CharField(max_length=255)
Read the API reference of treebeard.Node for info on methods available in this class, or read the following section for methods with particular arguments or exceptions.
Attribute: the alphabet that will be used in base conversions when encoding the path steps into strings. The default value, 0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ is the most optimal possible value that is portable between the supported databases (which means: their default collation will order the path field correctly).
In case you know what you are doing, there is a test that is disabled by default that can tell you the optimal default alphabet in your enviroment. To run the test you must enable the TREEBEARD_TEST_ALPHABET enviroment variable:
$ TREEBEARD_TEST_ALPHABET=1 python manage.py test treebeard.TestTreeAlphabet
On my Ubuntu 8.04.1 system, these are the optimal values for the three supported databases in their default configuration:
Database Optimal Alphabet MySQL 5.0.51 0-9A-Z PostgreSQL 8.2.7 0-9A-Z Sqlite3 0-9A-Za-z
Attribute: a list of model fields that will be used for node ordering. When enabled, all tree operations will assume this ordering.
node_order_by = ['field1', 'field2', 'field3']
CharField, stores the full materialized path for each node. The default value of it’s max_length, 255, is the max efficient and portable value for a varchar. Increase it to allow deeper trees (max depth by default: 63)
django-treebeard uses Django’s abstract model inheritance, so:
To change the max_length value of the path in your model, you can’t just define it since you’d get a django exception, you have to modify the already defined attribute:
class MyNodeModel(MP_Node): pass MyNodeModel._meta.get_field('path').max_length = 1024
You can’t rely on Django’s auto_now properties in date fields for sorting, you’ll have to manually set the value before creating a node:
class TestNodeSortedAutoNow(MP_Node): desc = models.CharField(max_length=255) created = models.DateTimeField(auto_now_add=True) node_order_by = ['created'] TestNodeSortedAutoNow.add_root(desc='foo', created=datetime.datetime.now())
For performance, and if your database allows it, you can safely define the path column as ASCII (not utf-8/unicode/iso8859-1/etc) to keep the index smaller (and faster). Also note that some databases (mysql) have a small index size limit. InnoDB for instance has a limit of 765 bytes per index, so that would be the limit if your path is ASCII encoded. If your path column in InnoDB is using unicode, the index limit will be 255 characters since in MySQL’s indexes, unicode means 3 bytes.
treebeard uses numconv for path encoding: http://code.tabo.pe/numconv/
Adds a root node to the tree.
|when no more root objects can be added|
Adds a child to the node.
|when no more child nodes can be added|
Adds a new node as a sibling to the current node object.
|when the library can’t make room for the node’s new position|
Moves the current node and all it’s descendants to a new position relative to another node.
|when the library can’t make room for the node’s new position|
|Returns:||A queryset of nodes ordered as DFS, including the parent. If no parent is given, the entire tree is returned.|
This metod returns a queryset.
Checks for problems in the tree structure, problems can occur when:
- your code breaks and you get incomplete transactions (always use transactions!)
- changing the steplen value in a model (you must dump_bulk() first, change steplen and then load_bulk()
A tuple of five lists:
A node won’t appear in more than one list, even when it exhibits more than one problem. This method stops checking a node when it finds a problem and continues to the next node.
Problems 1, 2 and 3 can’t be solved automatically.
Solves some problems that can appear when transactions are not used and a piece of code breaks, leaving the tree in an inconsistent state.
The problems this method solves are:
- Nodes with an incorrect depth or numchild values due to incorrect code and lack of database transactions.
- “Holes” in the tree. This is normal if you move/delete nodes a lot. Holes in a tree don’t affect performance,
- Incorrect ordering of nodes when node_order_by is enabled. Ordering is enforced on node insertion, so if an attribute in node_order_by is modified after the node is inserted, the tree ordering will be inconsistent.
A boolean value. If True, a more agressive fix_tree method will be attemped. If False (the default), it will use a safe (and fast!) fix approach, but it will only solve the depth and numchild nodes, it won’t fix the tree holes or broken path ordering.
Currently what the destructive method does is:
So, even when the primary keys of your nodes will be preserved, this method isn’t foreign-key friendly. That needs complex in-place tree reordering, not available at the moment (hint: patches are welcome).