Joe Celko's Trees and Hierarchies in SQL for Smarties

Joe Celko

Mentioned 42

Expert advice for smarties is offered from the #1 SQL guru. Trees and hierarchies are topics that all SQL users need to know, and this is the first developer's guide that addresses these concepts that are universally difficult for programmers to master. The book is Web-enhanced with downloadable SQL code, ready to use.

Mentioned in questions and answers.

Good Overviews

Generally speaking you're making a decision between fast read times (e.g. nested set) or fast write times (adjacency list). Usually you end up with a combination of the options below that best fit your needs. The following provides some in depth reading:

Options

Ones I am aware of and general features:

  1. Adjacency List:
    • Columns: ID, ParentID
    • Easy to implement.
    • Cheap node moves, inserts, and deletes.
    • Expensive to find level (can store as a computed column), ancestry & descendants (Bridge Table combined with level column can solve), path (Lineage Column can solve).
    • Use Common Table Expressions in those databases that support them to traverse.
  2. Nested Set (a.k.a. Modified Preorder Tree Traversal)
    • Popularized by Joe Celko in numerous articles and his book Trees and Hierarchies in SQL for Smarties
    • Columns: Left, Right
    • Cheap level, ancestry, descendants
    • Compared to Adjacency List, moves, inserts, deletes more expensive.
    • Requires a specific sort order (e.g. created). So sorting all descendants in a different order requires additional work.
  3. Nested Intervals
    • Combination of Nested Sets and Materialized Path where left/right columns are floating point decimals instead of integers and encode the path information. In the later development of this idea nested intervals gave rise to matrix encoding.
  4. Bridge Table (a.k.a. Closure Table: some good ideas about how to use triggers for maintaining this approach)
    • Columns: ancestor, descendant
    • Stands apart from table it describes.
    • Can include some nodes in more than one hierarchy.
    • Cheap ancestry and descendants (albeit not in what order)
    • For complete knowledge of a hierarchy needs to be combined with another option.
  5. Flat Table
    • A modification of the Adjacency List that adds a Level and Rank (e.g. ordering) column to each record.
    • Expensive move and delete
    • Cheap ancestry and descendants
    • Good Use: threaded discussion - forums / blog comments
  6. Lineage Column (a.k.a. Materialized Path, Path Enumeration)
    • Column: lineage (e.g. /parent/child/grandchild/etc...)
    • Limit to how deep the hierarchy can be.
    • Descendants cheap (e.g. LEFT(lineage, #) = '/enumerated/path')
    • Ancestry tricky (database specific queries)
  7. Multiple lineage columns
    • Columns: one for each lineage level, refers to all the parents up to the root, levels down from the items level are set to NULL
    • Limit to how deep the hierarchy can be
    • Cheap ancestors, descendants, level
    • Cheap insert, delete, move of the leaves
    • Expensive insert, delete, move of the internal nodes
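
As a rough illustration of the Adjacency List notes above (levels are expensive to find, and Common Table Expressions can do the traversal), here is a minimal sketch using Python's built-in sqlite3 module. The table name node, its columns, and the sample rows are assumptions made up for this example; other databases use slightly different recursive CTE syntax.

```python
import sqlite3

# Hypothetical adjacency-list table: each row points at its parent.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE node (id INTEGER PRIMARY KEY, parent_id INTEGER, name TEXT);
    INSERT INTO node (id, parent_id, name) VALUES
        (1, NULL, 'root'),
        (2, 1,    'child a'),
        (3, 1,    'child b'),
        (4, 2,    'grandchild');
""")

# The recursive CTE starts at the root rows and walks down one generation
# per iteration, carrying the level along.
rows = conn.execute("""
    WITH RECURSIVE tree (id, name, level) AS (
        SELECT id, name, 0 FROM node WHERE parent_id IS NULL
        UNION ALL
        SELECT n.id, n.name, t.level + 1
        FROM node n JOIN tree t ON n.parent_id = t.id
    )
    SELECT id, name, level FROM tree ORDER BY id
""").fetchall()

for node_id, name, level in rows:
    print("  " * level + name)
```

The level is computed at query time here; storing it in a column, as the list suggests, trades that work for extra maintenance on moves.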

Database Specific Notes

MySQL

Oracle

PostgreSQL

SQL Server

  • General summary
  • SQL Server 2008 offers the HierarchyId data type, which appears to help with the Lineage Column approach and expands the depth that can be represented.

This is the kind of question that is still interesting even after all of the big three vendors implemented the recursive WITH clause. I'd suggest that different readers would be pleased with different answers.

  1. Comprehensive list of references by Troels Arvin.
  2. For lack of competition, the introductory textbook by Joe Celko, "Trees and Hierarchies in SQL for Smarties", can indeed be considered a classic.
  3. Review of various tree encodings with emphasis on nested intervals.

Joe Celko wrote the book on SQL Trees & Hierarchies

This is the first edition. Look at the second edition in Bob's comment.

Assume you have a flat table that stores an ordered tree hierarchy:

Id   Name         ParentId   Order
 1   'Node 1'            0      10
 2   'Node 1.1'          1      10
 3   'Node 2'            0      20
 4   'Node 1.1.1'        2      10
 5   'Node 2.1'          3      10
 6   'Node 1.2'          1      20

Here's a diagram, where we have [id] Name. Root node 0 is fictional.

                       [0] ROOT
                          /    \ 
              [1] Node 1          [3] Node 2
              /       \                   \
    [2] Node 1.1     [6] Node 1.2      [5] Node 2.1
          /          
 [4] Node 1.1.1

What minimalistic approach would you use to output that to HTML (or text, for that matter) as a correctly ordered, correctly indented tree?

Assume further you only have basic data structures (arrays and hashmaps), no fancy objects with parent/children references, no ORM, no framework, just your two hands. The table is represented as a result set, which can be accessed randomly.

Pseudo code or plain English is okay; this is purely a conceptual question.

Bonus question: Is there a fundamentally better way to store a tree structure like this in a RDBMS?


EDITS AND ADDITIONS

To answer one commenter's (Mark Bessey's) question: A root node is not necessary, because it is never going to be displayed anyway. ParentId = 0 is the convention to express "these are top level". The Order column defines how nodes with the same parent are going to be sorted.

The "result set" I spoke of can be pictured as an array of hashmaps (to stay in that terminology). For my example it was meant to be already there. Some answers go the extra mile and construct it first, but that's okay.

The tree can be arbitrarily deep. Each node can have N children. I did not exactly have a "millions of entries" tree in mind, though.

Don't mistake my choice of node naming ('Node 1.1.1') for something to rely on. The nodes could equally well be called 'Frank' or 'Bob', no naming structure is implied, this was merely to make it readable.

I have posted my own solution so you guys can pull it to pieces.
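
For readers who want something concrete, one minimal way to do this with only arrays and hashmaps can be sketched in Python. The function name render_tree and the two-space indent are assumptions of this sketch, not part of the question; the rows are the sample data from the table above.

```python
from collections import defaultdict

# The result set from the question, as an array of hashmaps.
rows = [
    {"Id": 1, "Name": "Node 1",     "ParentId": 0, "Order": 10},
    {"Id": 2, "Name": "Node 1.1",   "ParentId": 1, "Order": 10},
    {"Id": 3, "Name": "Node 2",     "ParentId": 0, "Order": 20},
    {"Id": 4, "Name": "Node 1.1.1", "ParentId": 2, "Order": 10},
    {"Id": 5, "Name": "Node 2.1",   "ParentId": 3, "Order": 10},
    {"Id": 6, "Name": "Node 1.2",   "ParentId": 1, "Order": 20},
]

def render_tree(rows, root_id=0, indent="  "):
    """Return the tree as a list of indented lines (hypothetical helper)."""
    # Pass 1: group rows by parent, then sort each sibling group by Order.
    children = defaultdict(list)
    for row in rows:
        children[row["ParentId"]].append(row)
    for siblings in children.values():
        siblings.sort(key=lambda r: r["Order"])

    # Pass 2: depth-first walk from the fictional root (ParentId = 0).
    lines = []
    def walk(parent_id, depth):
        for row in children.get(parent_id, []):
            lines.append(indent * depth + row["Name"])
            walk(row["Id"], depth + 1)
    walk(root_id, 0)
    return lines

print("\n".join(render_tree(rows)))
```

The walk is recursive, so a very deep tree would need an explicit stack instead; for HTML output the same walk could emit nested list tags rather than indentation.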

There are several ways to store tree-structured data in a relational database. What you show in your example uses two methods:

  • Adjacency List (the "parent" column) and
  • Path Enumeration (the dotted-numbers in your name column).

Another solution is called Nested Sets, and it can be stored in the same table too. Read "Trees and Hierarchies in SQL for Smarties" by Joe Celko for a lot more information on these designs.

I usually prefer a design called Closure Table (aka "Adjacency Relation") for storing tree-structured data. It requires another table, but then querying trees is pretty easy.

I cover Closure Table in my presentation Models for Hierarchical Data with SQL and PHP and in my book SQL Antipatterns: Avoiding the Pitfalls of Database Programming.

CREATE TABLE ClosureTable (
  ancestor_id   INT NOT NULL REFERENCES FlatTable(id),
  descendant_id INT NOT NULL REFERENCES FlatTable(id),
  PRIMARY KEY (ancestor_id, descendant_id)
);

Store all paths in the Closure Table, where there is a direct ancestry from one node to another. Include a row for each node to reference itself. For example, using the data set you showed in your question:

INSERT INTO ClosureTable (ancestor_id, descendant_id) VALUES
  (1,1), (1,2), (1,4), (1,6),
  (2,2), (2,4),
  (3,3), (3,5),
  (4,4),
  (5,5),
  (6,6);

Now you can get a tree starting at node 1 like this:

SELECT f.* 
FROM FlatTable f 
  JOIN ClosureTable a ON (f.id = a.descendant_id)
WHERE a.ancestor_id = 1;

The output (in MySQL client) looks like the following:

+----+
| id |
+----+
|  1 | 
|  2 | 
|  4 | 
|  6 | 
+----+

In other words, nodes 3 and 5 are excluded, because they're part of a separate hierarchy, not descending from node 1.
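
To try the closure table without a MySQL server, the same schema and rows can be loaded into an in-memory SQLite database from Python. This is only a sketch: the helper names descendants and ancestors are made up for the example, and FlatTable is reduced to id and name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE FlatTable (id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO FlatTable (id, name) VALUES
        (1, 'Node 1'), (2, 'Node 1.1'), (3, 'Node 2'),
        (4, 'Node 1.1.1'), (5, 'Node 2.1'), (6, 'Node 1.2');

    CREATE TABLE ClosureTable (
        ancestor_id   INT NOT NULL REFERENCES FlatTable(id),
        descendant_id INT NOT NULL REFERENCES FlatTable(id),
        PRIMARY KEY (ancestor_id, descendant_id)
    );
    INSERT INTO ClosureTable (ancestor_id, descendant_id) VALUES
        (1,1), (1,2), (1,4), (1,6),
        (2,2), (2,4),
        (3,3), (3,5),
        (4,4),
        (5,5),
        (6,6);
""")

def descendants(conn, node_id):
    """Ids of the node and everything below it (hypothetical helper)."""
    rows = conn.execute("""
        SELECT f.id
        FROM FlatTable f
          JOIN ClosureTable a ON (f.id = a.descendant_id)
        WHERE a.ancestor_id = ?
        ORDER BY f.id
    """, (node_id,)).fetchall()
    return [r[0] for r in rows]

def ancestors(conn, node_id):
    """Ids of the node and everything above it: the same join, mirrored."""
    rows = conn.execute("""
        SELECT f.id
        FROM FlatTable f
          JOIN ClosureTable a ON (f.id = a.ancestor_id)
        WHERE a.descendant_id = ?
        ORDER BY f.id
    """, (node_id,)).fetchall()
    return [r[0] for r in rows]

print(descendants(conn, 1))
print(ancestors(conn, 4))
```

Because every node references itself, both helpers include the starting node in their result.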


Re: comment from e-satis about immediate children (or immediate parent). You can add a "path_length" column to the ClosureTable to make it easier to query specifically for an immediate child or parent (or any other distance).

INSERT INTO ClosureTable (ancestor_id, descendant_id, path_length) VALUES
  (1,1,0), (1,2,1), (1,4,2), (1,6,1),
  (2,2,0), (2,4,1),
  (3,3,0), (3,5,1),
  (4,4,0),
  (5,5,0),
  (6,6,0);

Then you can add a term in your search for querying the immediate children of a given node. These are descendants whose path_length is 1.

SELECT f.* 
FROM FlatTable f 
  JOIN ClosureTable a ON (f.id = a.descendant_id)
WHERE a.ancestor_id = 1
  AND path_length = 1;

+----+
| id |
+----+
|  2 | 
|  6 | 
+----+
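
The closure rows with path_length do not have to be typed by hand. As a sketch, they can be derived from the parent column by walking each node up to its top-level ancestor. The function name closure_rows is hypothetical, and top-level nodes are given None as their parent here instead of the ParentId = 0 convention used in the question.

```python
# Parent of each node in the sample data; None marks a top-level node
# (an assumption of this sketch, the question uses ParentId = 0 instead).
parent = {1: None, 2: 1, 3: None, 4: 2, 5: 3, 6: 1}

def closure_rows(parent):
    """Return (ancestor_id, descendant_id, path_length) for every path."""
    rows = []
    for node in parent:
        # Start with the self-reference (length 0), then climb one parent
        # at a time, adding a row for each ancestor met on the way up.
        ancestor, length = node, 0
        while ancestor is not None:
            rows.append((ancestor, node, length))
            ancestor, length = parent[ancestor], length + 1
    return sorted(rows)

for row in closure_rows(parent):
    print(row)
```

The same walk also shows what an insert costs in this design: a new leaf needs one row per ancestor plus its self-reference.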

Re comment from @ashraf: "How about sorting the whole tree [by name]?"

Here's an example query to return all nodes that are descendants of node 1, join them to the FlatTable that contains other node attributes such as name, and sort by the name.

SELECT f.name
FROM FlatTable f 
JOIN ClosureTable a ON (f.id = a.descendant_id)
WHERE a.ancestor_id = 1
ORDER BY f.name;

Re comment from @Nate:

SELECT f.name, GROUP_CONCAT(b.ancestor_id order by b.path_length desc) AS breadcrumbs
FROM FlatTable f 
JOIN ClosureTable a ON (f.id = a.descendant_id) 
JOIN ClosureTable b ON (b.descendant_id = a.descendant_id) 
WHERE a.ancestor_id = 1 
GROUP BY a.descendant_id 
ORDER BY f.name

+------------+-------------+
| name       | breadcrumbs |
+------------+-------------+
| Node 1     | 1           |
| Node 1.1   | 1,2         |
| Node 1.1.1 | 1,2,4       |
| Node 1.2   | 1,6         |
+------------+-------------+

A user suggested an edit today. SO moderators approved the edit, but I am reversing it.

The edit suggested that the ORDER BY in the last query above should be ORDER BY b.path_length, f.name, presumably to make sure the ordering matches the hierarchy. But this doesn't work, because it would order "Node 1.1.1" after "Node 1.2".

If you want the ordering to match the hierarchy in a sensible way, that is possible, but not simply by ordering by the path length. For example, see my answer to MySQL Closure Table hierarchical database - How to pull information out in the correct order.
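
One possible way to get a hierarchy-respecting order, sketched here in Python rather than SQL, is to sort each node by its full breadcrumb trail instead of by path length. This is an illustration only and not necessarily the approach taken in the linked answer; the function name hierarchy_order is hypothetical and the data is the subtree of node 1 from above.

```python
from collections import defaultdict

names = {1: "Node 1", 2: "Node 1.1", 4: "Node 1.1.1", 6: "Node 1.2"}

# (ancestor_id, descendant_id, path_length) rows for the subtree of node 1.
closure = [
    (1, 1, 0), (1, 2, 1), (1, 4, 2), (1, 6, 1),
    (2, 2, 0), (2, 4, 1),
    (4, 4, 0),
    (6, 6, 0),
]

def hierarchy_order(closure, names):
    """Return node ids in depth-first order, siblings sorted by name."""
    by_descendant = defaultdict(list)
    for ancestor, descendant, length in closure:
        by_descendant[descendant].append((length, ancestor))

    # Longest path first puts the root at the front of each breadcrumb list.
    breadcrumbs = {
        node: [names[a] for _, a in sorted(pairs, reverse=True)]
        for node, pairs in by_descendant.items()
    }
    # Lists compare element by element, so a parent sorts directly before
    # its own descendants and after everything under earlier siblings.
    return sorted(breadcrumbs, key=lambda node: breadcrumbs[node])

ordered = hierarchy_order(closure, names)
for node in ordered:
    print(names[node])
```

Sorting by the whole trail keeps "Node 1.1.1" directly under "Node 1.1", which ordering by path length alone cannot do.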

It wasn't that long ago that I was a beginning coder, trying to find good books/tutorials on languages I wanted to learn. Even still, there are times I need to pick up a language relatively quickly for a new project I am working on. The point of this post is to document some of the best tutorials and books for these languages. I will start the list with the best I can find, but hope you guys out there can help with better suggestions/new languages. Here is what I found:

Since this is now wiki editable, I am giving control up to the community. If you have a suggestion, please put it in this section. I decided to also add a section for general be a better programmer books and online references as well. Once again, all recommendations are welcome.

General Programming

Online Tutorials
Foundations of Programming By Karl Seguin - From Codebetter; it's C#-based, but the ideas ring true across the board. Can't believe no one's posted this yet, actually.
How to Write Unmaintainable Code - An anti-manual that teaches you how to write code in the most unmaintainable way possible. It would be funny if a lot of these suggestions didn't ring so true.
The Programming Section of Wiki Books - suggested by Jim Robert as having a large amount of books/tutorials on multiple languages in various stages of completion
Just the Basics To get a feel for a language.

Books
Code Complete - This book goes without saying; it is truly brilliant in too many ways to mention.
The Pragmatic Programmer - The next best thing to working with a master coder, teaching you everything they know.
Mastering Regular Expressions - Regular Expressions are an essential tool in every programmer's toolbox. This book, recommended by Patrick Lozzi, is a great way to learn what they are capable of.
Algorithms in C, C++, and Java - A great way to learn all the classic algorithms if you find Knuth's books a bit too in depth.

C

Online Tutorials
This tutorial seems to be pretty concise and thorough. I looked over the material and it seems to be pretty good. Not sure how friendly it would be to new programmers, though.
Books
K&R C - a classic for sure. It might be argued that all programmers should read it.
C Primer Plus - Suggested by Imran as being the ultimate C book for beginning programmers.
C: A Reference Manual - A great reference recommended by Patrick Lozzi.

C++

Online Tutorials
The tutorial on cplusplus.com seems to be the most complete. I found another tutorial here but it doesn't include topics like polymorphism, which I believe is essential. If you are coming from C, this tutorial might be the best for you.

Another useful tutorial is C++ Annotation. In the Ubuntu family you can get the ebook in multiple formats (PDF, txt, PostScript, and LaTeX) by installing the c++-annotation package from Synaptic (the installed package can be found in /usr/share/doc/c++-annotation/).

Books
The C++ Programming Language - crucial for any C++ programmer.
C++ Primer Plus - Originally added as a typo, but the Amazon reviews are so good, I am going to keep it here until someone says it is a dud.
Effective C++ - Ways to improve your C++ programs.
More Effective C++ - Continuation of Effective C++.
Effective STL - Ways to improve your use of the STL.
Thinking in C++ - Great book, both volumes. Written by Bruce Eckel and Chuck Allison.
Programming: Principles and Practice Using C++ - Stroustrup's introduction to C++.
Accelerated C++ - Andy Koenig and Barbara Moo - An excellent introduction to C++ that doesn't treat C++ as "C with extra bits bolted on", in fact you dive straight in and start using STL early on.

Forth

Books
FORTH, a text and reference. Mahlon G. Kelly and Nicholas Spies. ISBN 0-13-326349-5 / ISBN 0-13-326331-2. 1986 Prentice-Hall. Leo Brodie's books are good but this book is even better. For instance it covers defining words and the interpreter in depth.

Java

Online Tutorials
Sun's Java Tutorials - An official tutorial that seems thorough, but I am not a Java expert. You guys know of any better ones?
Books
Head First Java - Recommended as a great introductory text by Patrick Lozzi.
Effective Java - Recommended by pek as a great intermediate text.
Core Java Volume 1 and Core Java Volume 2 - Suggested by FreeMemory as some of the best java references available.
Java Concurrency in Practice - Recommended by MDC as great resource for concurrent programming in Java.

The Java Programming Language

Python

Online Tutorials
Python.org - The online documentation for this language is pretty good. If you know of any better let me know.
Dive Into Python - Suggested by Nickola. Seems to be a python book online.

Perl

Online Tutorials
perldoc perl - This is how I personally got started with the language, and I don't think you will be able to beat it.
Books
Learning Perl - a great way to introduce yourself to the language.
Programming Perl - greatly referred to as the Perl Bible. Essential reference for any serious perl programmer.
Perl Cookbook - A great book that has solutions to many common problems.
Modern Perl Programming - newly released, contains the latest wisdom on modern techniques and tools, including Moose and DBIx::Class.

Ruby

Online Tutorials
Adam Mika suggested Why's (Poignant) Guide to Ruby but after taking a look at it, I don't know if it is for everyone. Found this site which seems to offer several tutorials for Ruby on Rails.
Books
Programming Ruby - suggested as a great reference for all things ruby.

Visual Basic

Online Tutorials
Found this site which seems to devote itself to visual basic tutorials. Not sure how good they are though.

PHP

Online Tutorials
The main PHP site - A simple tutorial that allows user comments for each page, which I really like.
PHPFreaks Tutorials - Various tutorials of different difficulty lengths.
Quakenet/PHP tutorials - PHP tutorial that will guide you from ground up.

JavaScript

Online Tutorials
Found a decent tutorial here geared toward non-programmers. Found another more advanced one here. Nickolay suggested A reintroduction to javascript as a good read here.

Books
Head first JavaScript
JavaScript: The Good Parts (with a Google Tech Talk video by the author)

C#

Online Tutorials
C# Station Tutorial - Seems to be a decent tutorial that I dug up, but I am not a C# guy.
C# Language Specification - Suggested by tamberg. Not really a tutorial, but a great reference on all the elements of C#
Books
C# to the point - suggested by tamberg as a short text that explains the language in amazing depth

ocaml

Books
nlucaroni suggested the following:
OCaml for Scientists Introduction to ocaml
Using, Understanding, and Unraveling the OCaml Language: From Practice to Theory and Vice Versa
Developing Applications using Ocaml - O'Reilly
The Objective Caml System - Official Manual

Haskell

Online Tutorials
nlucaroni suggested the following:
Explore functional programming with Haskell
Books
Real World Haskell
Total Functional Programming

LISP/Scheme

Books
wfarr suggested the following:
The Little Schemer - Introduction to Scheme and functional programming in general
The Seasoned Schemer - Followup to Little Schemer.
Structure and Interpretation of Computer Programs - The definitive book on Lisp (also available online).
Practical Common Lisp - A good introduction to Lisp with several examples of practical use.
On Lisp - Advanced Topics in Lisp
How to Design Programs - An Introduction to Computing and Programming
Paradigms of Artificial Intelligence Programming: Case Studies in Common Lisp - an approach to high quality Lisp programming

What about you guys? Am I totally off on some of these? Did I leave out your favorite language? I will take the best comments and modify the question with the suggestions.

Java: SCJP for Java 6. I still use it as a reference.

Haskell:

O'Reilly Book:

  1. Real World Haskell, a great tutorial-oriented book on Haskell, available online and in print.

My favorite general, less academic online tutorials:

  1. The Haskell wikibook which contains all of the excellent Yet Another Haskell Tutorial. (This tutorial helps with specifics of setting up a Haskell distro and running example programs, for example.)
  2. Learn you a Haskell for Great Good, in the spirit of Why's Poignant Guide to Ruby but more to the point.
  3. Write yourself a Scheme in 48 hours. Get your hands dirty learning Haskell with a real project.

Books on Functional Programming with Haskell:

  1. Lambda calculus, combinators, more theoretical, but in a very down to earth manner: Davie's Introduction to Functional Programming Systems Using Haskell
  2. Laziness and program correctness, thinking functionally: Bird's Introduction to Functional Programming Using Haskell

Some books on Java I'd recommend:

For Beginners: Head First Java is an excellent introduction to the language. And I must also mention Head First Design Patterns, which is a great resource for learners to grasp what can be quite challenging concepts. The easy-going, fun style of these books is ideal for ppl new to programming.

A really thorough, comprehensive book on Java SE is Bruce Eckel's Thinking In Java v4. (At just under 1500 pages it's good for weight-training as well!) For those of us not on fat bank-bonuses there are older versions available for free download.

Of course, as many ppl have already mentioned, Josh Bloch's Effective Java v2 is an essential part of any Java developer's library.

Let's not forget Head First Java, which could be considered the essential first step in this language or maybe the step after the online tutorials by Sun. It's great for the purpose of grasping the language concisely, while adding a bit of fun, serving as a stepping stone for the more in-depth books already mentioned.

Sedgewick offers great series on Algorithms which are a must-have if you find Knuth's books to be too in-depth. Knuth aside, Sedgewick brings a solid approach to the field and he offers his books in C, C++ and Java. The C++ books could be used backwardly on C since he doesn't make a very large distinction between the two languages in his presentation.

Whenever I'm working on C, C: A Reference Manual, by Harbison and Steele, goes with me everywhere. It's concise and efficient while being extremely thorough, making it priceless (to me anyways).

Languages aside, if this thread is to become a go-to for references, which I think is where it's heading given the number of solid contributions, please include Mastering Regular Expressions, for reasons I think most of us are aware of. Some would also say that regex can be considered a language in its own right. Further, its usefulness in a wide array of languages makes it invaluable.

C: “Programming in C”, Stephen G. Kochan, Developer's Library.

Organized, clear, elaborate, beautiful.

C++

The first one is good for beginners and the second one requires a more advanced level in C++.

I know this is a cross post from here... but, I think one of the best Java books is Java Concurrency in Practice by Brian Goetz. A rather advanced book - but, it will wear well on your concurrent code and Java development in general.

C#

C# to the Point by Hanspeter Mössenböck. On a mere 200 pages he explains C# in astonishing depth, focusing on underlying concepts and concise examples rather than hand waving and Visual Studio screenshots.

For additional information on specific language features, check the C# language specification ECMA-334.

Framework Design Guidelines, a book by Krzysztof Cwalina and Brad Abrams from Microsoft, provides further insight into the main design decisions behind the .NET library.

For Lisp and Scheme (hell, functional programming in general), there are few things that provide a more solid foundation than The Little Schemer and The Seasoned Schemer. Both provide a very simple and intuitive introduction to both Scheme and functional programming that proves far simpler for new students or hobbyists than any of the typical volumes that rub off like a nonfiction rendition of War & Peace.

Once they've moved beyond the Schemer series, SICP and On Lisp are both fantastic choices.

For C++ I am a big fan of C++ Common Knowledge: Essential Intermediate Programming. I like that it is organized into small sections (usually less than 5 pages per topic), so it is easy for me to grab it and read up on concepts that I need to review.

It is a must read for me the night before and on the plane to a job interview.

C Primer Plus, 5th Edition - The C book to get if you're learning C without any prior programming experience. It's a personal favorite of mine as I learned to program from this book. It has all the qualities a beginner friendly book should have:

  • Doesn't assume any prior exposure to programming
  • Enjoyable to read (without becoming annoying like For Dummies /
  • Doesn't oversimplify

For Javascript:

For PHP:

For OO design & programming, patterns:

For Refactoring:

For SQL/MySQL:

  • C - The C Programming Language - Obviously I had to reference K&R, one of the best programming books out there full stop.
  • C++ - Accelerated C++ - This clear, well written introduction to C++ goes straight to using the STL and gives nice, clear, practical examples. Lives up to its name.
  • C# - Pro C# 2008 and the .NET 3.5 Platform - Bit of a mouthful but wonderfully written and huge depth.
  • F# - Expert F# - Designed to take experienced programmers from zero to expert in F#. Very well written, one of the authors invented F# so you can't go far wrong!
  • Scheme - The Little Schemer - Really unique approach to teaching a programming language done really well.
  • Ruby - Programming Ruby - Affectionately known as the 'pick axe' book, this is THE de facto introduction to Ruby. Very well written, clear and detailed.

Does anyone know where I can find a library of common but difficult (out of the ordinary) SQL script examples. I am talking about those examples you cannot find in the documentation but do need very often to accomplish tasks such as finding duplicates etc.

It would be a big time saver to have something like that handy.

EDIT: Thanks everyone, I think this is turning into a great quick reference. The more descriptive the more effective it would be, so please if you see your way open - please edit and add some descriptions of what one could find. Many thanks to those that have already done so!

Riffing off the Celko answer: SQL For Smarties. This has great in depth chapters that will augment the SQL Puzzles book. Also there is another Celko book I just learned of named Joe Celko's Trees and Hierarchies in SQL for Smarties.

I'm thinking the answer is no, but I'd love it if anybody had any insight into how to crawl a tree structure to any depth in SQL (MySQL), but with a single query.

More specifically, given a tree structured table (id, data, data, parent_id), and one row in the table, is it possible to get all descendants (child/grandchild/etc), or for that matter all ancestors (parent/grandparent/etc) without knowing how far down or up it will go, using a single query?

Or is using some kind of recursion required, where I keep querying deeper until there are no new results?

Specifically, I'm using Ruby and Rails, but I'm guessing that's not very relevant.

Yes, this is possible. It's called Modified Preorder Tree Traversal, as best described here

Joe Celko's Trees and Hierarchies in SQL for Smarties

A working example (in PHP) is provided here

http://www.sitepoint.com/article/hierarchical-data-database/2/
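
To make the idea concrete, here is a small sketch of Modified Preorder Tree Traversal (nested sets) using Python and an in-memory SQLite database. The table name tree, the column names lft and rgt, and the sample numbering are assumptions of this example, not taken from the book or the linked article. Each node stores the counter values from when a depth-first walk enters and leaves it, so a whole subtree is a single range comparison.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tree (id INTEGER PRIMARY KEY, name TEXT, lft INTEGER, rgt INTEGER);
    INSERT INTO tree (id, name, lft, rgt) VALUES
        (1, 'Node 1',      1,  8),
        (2, 'Node 1.1',    2,  5),
        (4, 'Node 1.1.1',  3,  4),
        (6, 'Node 1.2',    6,  7),
        (3, 'Node 2',      9, 12),
        (5, 'Node 2.1',   10, 11);
""")

def descendants(conn, node_id):
    """All nodes whose lft falls strictly inside the given node's interval."""
    rows = conn.execute("""
        SELECT child.name
        FROM tree AS parent
        JOIN tree AS child
          ON child.lft > parent.lft AND child.lft < parent.rgt
        WHERE parent.id = ?
        ORDER BY child.lft
    """, (node_id,)).fetchall()
    return [name for (name,) in rows]

def ancestors(conn, node_id):
    """All nodes whose interval strictly encloses the given node's interval."""
    rows = conn.execute("""
        SELECT parent.name
        FROM tree AS child
        JOIN tree AS parent
          ON parent.lft < child.lft AND parent.rgt > child.rgt
        WHERE child.id = ?
        ORDER BY parent.lft
    """, (node_id,)).fetchall()
    return [name for (name,) in rows]

print(descendants(conn, 1))
print(ancestors(conn, 4))
```

Both directions work at any depth with one query each; the price is that inserting or moving a node means renumbering lft and rgt for part of the table.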

I have a CMS which stores comments against articles. These comments can be both threaded and non-threaded, although technically they are the same, just with the reply column left blank when a comment is not threaded. My application works on SQLite, MySQL and pgsql, so I need fairly standard SQL.

I currently have a comment table

comment_id
article_id
user_id
comment
timestamp
thread (this is the reply column)

My question is to figure out how to best represent the threaded comments in the database. Perhaps in a separate table that supports the tree set without the content and a simple table to hold the text? Perhaps in the way it already is? Perhaps another way?

If the comments are un-threaded I can easily just order by the timestamp.

If they are threaded I sort like this

ORDER BY SUBSTRING(c.thread, 1, (LENGTH(c.thread) - 1))

As you can see from the ORDER BY, the commenting queries will never use an index, as function-based indexes only really live in Oracle. Help me have lightning-fast comment pages.

You've got a choice between the adjacency and the nested set models. The article Managing Hierarchical Data in MySQL makes for a nice introduction.

For a theoretical discussion, see Celko's Trees and Hierarchies.

It's rather easy to implement a threaded list if your database supports windowing functions. All you need is a recursive reference in your target database table, such as:

create table Tablename (
  RecordID integer not null auto_increment primary key,
  ParentID integer default null references Tablename(RecordID),
  ...
)

You can then use a recursive Common Table Expression to display a threaded view. An example is available here.
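
As a hedged sketch of such a threaded view (not the linked example), the following uses Python with an in-memory SQLite database. The table name Comments, the body column, the sample rows, and the zero-padded sort_path are all assumptions made for illustration. The recursive CTE carries a depth and a sortable path so that each reply lands directly under its parent.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Comments (
        RecordID INTEGER PRIMARY KEY,
        ParentID INTEGER REFERENCES Comments(RecordID),
        body     TEXT NOT NULL
    );
    INSERT INTO Comments (RecordID, ParentID, body) VALUES
        (1, NULL, 'first'),
        (2, 1,    'reply to first'),
        (3, NULL, 'second'),
        (4, 2,    'reply to reply'),
        (5, 1,    'another reply to first');
""")

# sort_path is the chain of zero-padded ids from the top-level comment down
# to the current row; ordering by it yields a depth-first, threaded listing.
thread = conn.execute("""
    WITH RECURSIVE thread (RecordID, body, depth, sort_path) AS (
        SELECT RecordID, body, 0, printf('%08d', RecordID)
        FROM Comments WHERE ParentID IS NULL
        UNION ALL
        SELECT c.RecordID, c.body, t.depth + 1,
               t.sort_path || '/' || printf('%08d', c.RecordID)
        FROM Comments c JOIN thread t ON c.ParentID = t.RecordID
    )
    SELECT RecordID, depth, body FROM thread ORDER BY sort_path
""").fetchall()

for record_id, depth, body in thread:
    print("    " * depth + body)
```

The zero padding keeps string comparison in line with numeric order up to eight digits. Recursive CTE support and syntax vary between databases and versions, so check the target engines before relying on this.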

I have a tree structure in the DB with a TreeNodes table. The table has nodeId, parentId and parameterId. In the EF, the structure is like TreeNode.Children where each child is a TreeNode... I also have a Tree table which contains id, name and rootNodeId.

At the end of the day I would like to load the tree into a TreeView but I can't figure how to load it all at once. I tried:

var trees = from t in context.TreeSet.Include("Root").Include("Root.Children").Include("Root.Children.Parameter")
        .Include("Root.Children.Children")
                        where t.ID == id
                        select t;

This will get me the first 2 generations but not more. How do I load the entire tree with all generations and the additional data?

When you use Include(), you are asking the Entity Framework to translate your query into SQL. So think: How would you write an SQL statement which returns a tree of an arbitrary depth?

Answer: Unless you are using specific hierarchy features of your database server (which are not SQL standard, but supported by some servers, such as SQL Server 2008, though not by its Entity Framework provider), you wouldn't. The usual way to handle trees of arbitrary depth in SQL is to use the nested sets model rather than the parent ID model.

Therefore, there are three ways which you can use to solve this problem:

  1. Use the nested sets model. This requires changing your metadata.
  2. Use SQL Server's hierarchy features, and hack the Entity Framework into understanding them (tricky, but this technique might work). Again, you'll need to change your metadata.
  3. Use explicit loading or EF 4's lazy loading instead of eager loading. This will result in many database queries instead of one.

I have a table in SQL server that has the normal tree structure of Item_ID, Item_ParentID. Suppose I want to iterate and get all CHILDREN of a particular Item_ID (at any level).

Recursion seems an intuitive candidate for this problem and I can write an SQL Server function to do this.

Will this affect performance if my table has many many records? How do I avoid recursion and simply query the table? Please any suggestions?

Joe Celko has a book (<- link to Amazon) specifically on tree structures in SQL databases. While you would need recursion for your model and there would definitely be a potential for performance issues there, there are alternative ways to model a tree structure depending on what your specific problem involves which could avoid recursion and give better performance.

I'm considering using PostgreSQL's Ltree module in my application to help with threaded comments. I've been eying it for a while to use for threaded comments. I figure it would help with cases where you need to update a node and its children, like when you want to hide a comment and its replies.

I'm thinking ltree (or something like it) would be useful if it was coupled with a traditional adjacency list ("comment_id"/"parent_comment_id").

Before taking the plunge into using ltree, I'm wondering a few things:

  1. Are you, or have you, used ltree? Is it what one might call "production ready"?
  2. If so, what problems did you use it to solve? Did it do a good job?
  3. Do you think it is a good fit for a threaded comment system?
    1. If you used it, what did you use for the "text" part of the path? Did you set up something like the DMOZ example they use "Top.Astronomy.Cosmology" or base it on something like the primary key "1.403.29.5"?
    2. Is there a better way to do this? I'm a bit nervous using a nested list approach--everything I've read suggests that it isn't all too hot with UPDATES or INSERTS (don't you have to reorder the whole thing?). I'm also not a CS major and that kind of data structure is something I might forget in the future. Is anybody using nested lists for comments or something like it?

If it is of any help, here is the schema I'm considering:

CREATE TABLE comments (
    comment_id SERIAL PRIMARY KEY,
    parent_comment_id int REFERENCES comments(comment_id) ON UPDATE CASCADE ON DELETE CASCADE,
    thread_id int NOT NULL  REFERENCES threads(thread_id) ON UPDATE CASCADE ON DELETE CASCADE,
    path ltree NOT NULL,
    comment_body text NOT NULL,
    hide boolean not null default false
);

The "path" column, used by ltree, would look something like:

<thread_id>.<parent_comment_id_#1>.<parent_comment_id_#2>.<my_comment_id>

Is there anything wrong with using the primary keys in the path? Should I be including the node's own primary key in the path? If I did, would it make sense to put a unique index on it to serve as a constraint?
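
To show the "hide a comment and its replies" case with primary keys in the path, here is a rough sketch that imitates the path column with plain text in SQLite, driven from Python. It is not ltree itself: the TEXT column, the LIKE prefix match, the UNIQUE constraint, the sample ids, and the helper name hide_subtree are all assumptions of this example. In PostgreSQL with ltree, the descendant operator <@ would take the place of the LIKE test.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE comments (
        comment_id        INTEGER PRIMARY KEY,
        parent_comment_id INTEGER REFERENCES comments(comment_id),
        path              TEXT NOT NULL UNIQUE,  -- <thread_id>.<ancestors>.<own id>
        hide              INTEGER NOT NULL DEFAULT 0
    );
    INSERT INTO comments (comment_id, parent_comment_id, path) VALUES
        (10,  NULL, '1.10'),
        (11,  10,   '1.10.11'),
        (12,  11,   '1.10.11.12'),
        (13,  NULL, '1.13'),
        (101, NULL, '1.101');
""")

def hide_subtree(conn, comment_id):
    """Hide one comment and every reply underneath it (hypothetical helper)."""
    (path,) = conn.execute(
        "SELECT path FROM comments WHERE comment_id = ?", (comment_id,)
    ).fetchone()
    # Match the node itself, or any path that continues with '.' after it.
    # The trailing dot stops '1.10' from also matching the sibling '1.101'.
    conn.execute(
        "UPDATE comments SET hide = 1 WHERE path = ? OR path LIKE ? || '.%'",
        (path, path),
    )

hide_subtree(conn, 10)
hidden = [row[0] for row in conn.execute(
    "SELECT comment_id FROM comments WHERE hide = 1 ORDER BY comment_id")]
print(hidden)
```

Ending the path with the node's own key is what makes the path unique per comment, so a unique constraint on it is at least consistent with this layout.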

I recommend anyone implementing hierarchical relationships in SQL read Joe Celko's Trees and Hierarchies in SQL for Smarties.

Traversing arbitrary depth parent child links can be very inefficient when using just a parent_id. The book describes techniques that make this access fast.

One strategy (which I happen to use) can also be found for free in this series of articles:

I know there are two approaches: adjacency list and nested tree. It's said that adjacency list can become slow to use on traversal because of numerous queries. But I don't know any realistic figures for this. The site I'm making will have in the region of 200 pages. Is traversal to generate (for example) a sitemap going to take longer than about 0.3 seconds?

Running on MySQL (InnoDB) with LAMP stack.

I'd prefer to implement adjacency if possible because of the simpler design.

Thanks.

There are more options than just the two you mention. There are:

  • Adjacency List (the "parent_id" one almost everyone uses)
  • Nested Sets
  • Path Enumeration
  • Closure Table (aka Adjacency Relation)

See my answer to "What is the most efficient/elegant way to parse a flat table into a tree?"

Or a couple of books:

I'm curious to know what the best way (best practice) to handle hierarchies is in regards to database design. Here is a small example of how I usually handle them.

Node Table

NodeId int PRIMARY KEY
NodeParentId int NULL
DisplaySeq int NOT NULL
Title nvarchar(255)

Ancestor Table

NodeId int
AncestorId int
Hops int

with Indexes on NodeId, AncestorId, Hops

Tables look like this:

Node Table

NodeId    NodeParentId    DisplaySeq    Title
1         NULL            1             'Root'
2         1               1             'Child 1'
3         1               2             'Child 2'
4         2               1             'Grandchild 1'
5         2               2             'Grandchild 2'

Ancestor Table

NodeId    AncestorId    Hops
1         NULL          0
1         1             0
2         1             1
2         2             0
3         1             1
3         3             0
4         1             2
4         2             1
4         4             0
5         1             2
5         2             1
5         5             0

With this design, I've found that with large hierarchies I can get an entire section of the hierarchy very quickly by joining on the Ancestor table for AncestorId = target NodeId, like:

SELECT *
FROM Node n
INNER JOIN Ancestor a on a.NodeId=n.NodeId
WHERE a.AncestorId = @TargetNodeId

It's also easy to get direct children as well

SELECT *
FROM Node n
INNER JOIN Ancestor a on a.NodeId=n.NodeId
WHERE a.AncestorId = @TargetNodeId
AND Hops = 1
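The queries above only read the Ancestor table; keeping it in sync on insert can be sketched as follows. This is a minimal illustration in SQLite (via Python's sqlite3), not the poster's actual code: the `add_node` helper is hypothetical, and the sketch assumes AncestorId is NOT NULL, so it omits the `1, NULL, 0` row shown in the sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Node (
    NodeId       INTEGER PRIMARY KEY,
    NodeParentId INTEGER NULL,
    DisplaySeq   INTEGER NOT NULL,
    Title        TEXT
);
CREATE TABLE Ancestor (
    NodeId     INTEGER NOT NULL,
    AncestorId INTEGER NOT NULL,
    Hops       INTEGER NOT NULL,
    PRIMARY KEY (NodeId, AncestorId)
);
""")

def add_node(node_id, parent_id, display_seq, title):
    """Hypothetical helper: insert a node and its ancestor rows together."""
    conn.execute("INSERT INTO Node VALUES (?, ?, ?, ?)",
                 (node_id, parent_id, display_seq, title))
    # Every node is its own ancestor at zero hops.
    conn.execute("INSERT INTO Ancestor VALUES (?, ?, 0)", (node_id, node_id))
    # Copy the parent's ancestor rows, one hop farther away.
    # For a root (parent_id is None) the WHERE clause matches nothing.
    conn.execute(
        """INSERT INTO Ancestor (NodeId, AncestorId, Hops)
           SELECT ?, AncestorId, Hops + 1
           FROM Ancestor
           WHERE NodeId = ?""",
        (node_id, parent_id))

add_node(1, None, 1, "Root")
add_node(2, 1, 1, "Child 1")
add_node(3, 1, 2, "Child 2")
add_node(4, 2, 1, "Grandchild 1")
add_node(5, 2, 2, "Grandchild 2")

rows_for_4 = conn.execute(
    "SELECT AncestorId, Hops FROM Ancestor WHERE NodeId = 4 ORDER BY AncestorId"
).fetchall()
print(rows_for_4)
```

Inserts stay cheap because only the new node's rows are written; moving a subtree is the expensive operation in this design, since every row linking the subtree to its old ancestors has to be replaced.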

I'm interested in knowing what other solutions you may have used for this type of thing. In my experience, hierarchies can get pretty hairy, and any way to optimize their retrieval is very important.

There are some vendor-specific extensions to do this, but my favorite db-neutral way comes from Joe Celko - google 'Joe Celko Trees and Hierarchies' or buy his book.

This is a very clever set-based way to go. Easy to query hierarchy. I added the 'parentID' field you have just because I ask the 'direct children' and 'parent' questions a lot and that speeds those up. But this is a wonderful way to get an 'ancestry' or 'descendant' query.

We have the following example table (actually taken from another example here on stackoverflow...)

CREATE TABLE example (
  id integer primary key,
  name char(200),
  parentid integer,
  value integer);

And given a specific child we want to get the top Parent.

I know of the tablefunc connectby function but that is for getting a parent's children.

But, I'm interested in the other direction, given a child what is its top parent? What type of query would I try and use?

Any friendly advice is appreciated.

Look into Joe Celko's books, SQL for Smarties and his book on Trees and Hierarchies. He has a section or two in SQL for Smarties on trees and hierarchies, or if you want to really get into it then you can get the other book. SQL for Smarties will also touch on a lot of other database design and querying info. Some really good stuff in there. He presents alternative ways of modeling trees which can work much better than the adjacency list model that you're using.

In one of his models the question of "who is the top most parent" becomes very trivial.
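If the adjacency list table has to stay as it is, one way to answer the "top parent" question is a recursive CTE that walks upward from the child. The sketch below runs in SQLite through Python's sqlite3; PostgreSQL accepts the same WITH RECURSIVE form in reasonably recent versions. The sample rows are made up, and it assumes roots have a NULL parentid and the data contains no cycles.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE example (
  id integer primary key,
  name char(200),
  parentid integer,
  value integer);
-- made-up sample rows
INSERT INTO example VALUES
  (1, 'root',       NULL, 0),
  (2, 'child',      1,    0),
  (3, 'grandchild', 2,    0),
  (4, 'other root', NULL, 0);
""")

TOP_PARENT_SQL = """
WITH RECURSIVE chain(id, parentid) AS (
    -- start at the given child
    SELECT id, parentid FROM example WHERE id = ?
    UNION ALL
    -- step to the parent of the row reached so far
    SELECT e.id, e.parentid
    FROM example AS e
    JOIN chain AS c ON e.id = c.parentid
)
SELECT id FROM chain WHERE parentid IS NULL
"""

top_of_3 = conn.execute(TOP_PARENT_SQL, (3,)).fetchone()[0]
top_of_4 = conn.execute(TOP_PARENT_SQL, (4,)).fetchone()[0]
print(top_of_3, top_of_4)
```

A node that is already a root is returned as its own top parent.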

The title might be worded strangely, but it's probably because I don't even know if I'm asking the right question.

So essentially what I'm trying to build is a "breadcrumbish" categorization type system (like a file directory) where each node has a parent (except for root) and each node can contain either data or another node. This will be used for organizing email addresses in a database. I have a system right now where you can create a "group" and add email addresses to that group, but it would be very nice to add an organizational system to it.

This (in my head) is in a tree format, but I don't know what tree.

The issue I'm having is building it using MySQL. It's easy to traverse trees that are in memory, but on database, it's a bit trickier.


Image of tree: http://j.imagehost.org/0917/asdf.png


SELECT * FROM Businesses: Tim's Hardware Store, 7-11, Kwik-E-Mart, Cub Foods, Bob's Grocery Store, CONGLOM-O

SELECT * FROM Grocery Stores: Cub Foods, Bob's Grocery Store, CONGLOM-O

SELECT * FROM Big Grocery Stores: CONGLOM-O

SELECT * FROM Churches: St. Peter's Church, St. John's Church


I think this should be enough information so I can accurately describe what my goal is.

As always when I see questions about modeling trees and hierarchies, my suggestion is that you get a hold of a copy of Joe Celko's book on the subject. He presents various ways to model them in a RDBMS, some of which are fairly imaginative, and he gives the pros and cons for each pattern.

What in everyone's opinion is the best representation for a time-bound hierarchy in SQL?

What I mean by this is:
- On any given date you have a normal tree hierarchy
- This hierarchy can change from day to day
- Each child still only has one parent on any given date

Day 1...

Business
 |
 |-Joe
 |  |-Happy
 |  |-Sneezy
 |  |-Doc(*)
 |
 |-Moe
    |-Bashful
    |-Sleepy

Day 2...

Business
 |
 |-Joe
 |  |-Happy
 |  |-Sneezy
 |
 |-Moe
    |-Doc(*)
    |-Bashful
    |-Sleepy

At any time, a child can join the hierarchy for the first time, or leave the hierarchy completely. (For example, new employees, and retired employees.)

The main considerations:

  • Updating the hierarchy
  • Viewing the whole hierarchy across a date range
  • Reporting on whole sub-trees within the hierarchy
  • Reporting on whole sub-trees across a date range

I know how I do it at present, but am intrigued as to how other people may do it :)

EDIT

I naively assumed a few considerations so will be more explicit...

  • Each 'team' or 'person' will have a unique ID in a dimension table elsewhere
  • Other fact tables will use those IDs (storing performance metrics, for example)
  • The structure needs to facilitate historical reporting across date ranges
  • Use of ETL or triggers to maintain alternative structures is an option

The generic nature is most important (forming just one part of a generic relational model), combined with ease of use for driving reports (for any part of the tree across any range of dates) and the ability to be updated reliably.

There are several different books of relevance here - one set is for 'temporal databases', and the other for 'hierarchical structures in RDBMS'.

The tricky parts of your question, it seems to me, are:

  • Viewing the whole hierarchy across a date range

  • Reporting on whole sub-trees across a date range

The other items are, if not straight-forward, then manageable using the techniques outlined in the books, and along the lines suggested in other answers. Part of the problem is understanding what those two bullet points mean. In one sense, they are 'the same'; the 'whole hierarchy' is just a special case of 'whole sub-trees'. But the deeper question is 'how do you want to demonstrate - visualize, represent - the changes in the hierarchy over time?' Are you seeking to compare the states at the start and end times, or are you seeking to see the intermediate changes too? How do you want to represent the moves of an individual within a hierarchy?

More questions than answers - but I hope the pointers are some help.
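As one possible starting point (not something taken from the books mentioned), the hierarchy can be stored as an adjacency list whose rows carry a validity period, and a subtree can then be read "as of" a date. The sketch below uses SQLite via Python's sqlite3; the table name, column names, dates, and the half-open [valid_from, valid_to) convention are all assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE hierarchy (
    child      TEXT NOT NULL,
    parent     TEXT,               -- NULL for the root
    valid_from TEXT NOT NULL,      -- ISO date, inclusive
    valid_to   TEXT,               -- ISO date, exclusive; NULL = still valid
    PRIMARY KEY (child, valid_from)
);
INSERT INTO hierarchy VALUES
    ('Business', NULL,       '2024-01-01', NULL),
    ('Joe',      'Business', '2024-01-01', NULL),
    ('Moe',      'Business', '2024-01-01', NULL),
    ('Happy',    'Joe',      '2024-01-01', NULL),
    ('Sneezy',   'Joe',      '2024-01-01', NULL),
    ('Doc',      'Joe',      '2024-01-01', '2024-01-02'),
    ('Doc',      'Moe',      '2024-01-02', NULL),
    ('Bashful',  'Moe',      '2024-01-01', NULL),
    ('Sleepy',   'Moe',      '2024-01-01', NULL);
""")

SUBTREE_AS_OF_SQL = """
WITH RECURSIVE subtree(member) AS (
    SELECT :root
    UNION ALL
    SELECT h.child
    FROM hierarchy AS h
    JOIN subtree AS s ON h.parent = s.member
    WHERE h.valid_from <= :as_of
      AND (h.valid_to IS NULL OR :as_of < h.valid_to)
)
SELECT member FROM subtree
"""

def subtree(root, as_of):
    """Members of root's subtree (root included) on the given date."""
    rows = conn.execute(SUBTREE_AS_OF_SQL, {"root": root, "as_of": as_of})
    return sorted(r[0] for r in rows)

joe_day1 = subtree("Joe", "2024-01-01")
joe_day2 = subtree("Joe", "2024-01-02")
moe_day2 = subtree("Moe", "2024-01-02")
print(joe_day1, joe_day2, moe_day2)
```

Reporting across a date range then comes down to deciding which dates to evaluate, which is exactly the open question raised above: start and end states only, or every intermediate change.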

What is the best way to build the table that will represent the tree? I want to implement a select, insert, update and delete that will work well with big data. The select for example will have to support "Expand ALL" - getting all the children (and their children) for a given node.

Check out Joe Celko's book on trees and hierarchies for multiple ways to tackle the hierarchy problem. The model that you choose will depend on how you weight lookups vs. updates vs. complexity. You can make the lookups pretty fast (especially for getting all children of a node) using the nested set model, but updates to the tree are slower.

My employer, a small office supply company, is switching suppliers and I am looking through their electronic content to come up with a robust database schema; our previous schema was pretty much just thrown together without any thought at all, and it's pretty much led to an unbearable data model with corrupt, inconsistent information.

The new supplier's data is much better than the old one's, but their data is what I would call hypernormalized. For example, their product category structure has 5 levels: Master Department, Department, Class, Subclass, Product Block. In addition the product block content has the long description, search terms and image names for products (the idea is that a product block contains a product and all variations - e.g. a particular pen might come in black, blue or red ink; all of these items are essentially the same thing, so they apply to a single product block). In the data I've been given, this is expressed as the products table (I say "table" but it's a flat file with the data) having a reference to the product block's unique ID.

I am trying to come up with a robust schema to accommodate the data I'm provided with, since I'll need to load it relatively soon, and the data they've given me doesn't seem to match the type of data they provide for demonstration on their sample website (http://www.iteminfo.com). In any event, I'm not looking to reuse their presentation structure so it's a moot point, but I was browsing the site to get some ideas of how to structure things.

What I'm unsure of is whether or not I should keep the data in this format, or for example consolidate Master/Department/Class/Subclass into a single "Categories" table, using a self-referencing relationship, and link that to a product block (product block should be kept separate as it's not a "category" as such, but a group of related products for a given category). Currently, the product blocks table references the subclass table, so this would change to "category_id" if I consolidate them together.

I am probably going to be creating an e-commerce storefront making use of this data with Ruby on Rails (or that's my plan, at any rate) so I'm trying to avoid getting snagged later on or having a bloated application - maybe I'm giving it too much thought but I'd rather be safe than sorry; our previous data was a real mess and cost the company tens of thousands of dollars in lost sales due to inconsistent and inaccurate data. Also I am going to break from the Rails conventions a little by making sure that my database is robust and enforces constraints (I plan on doing it at the application level, too), so that's something I need to consider as well.

How would you tackle a situation like this? Keep in mind that I have the data to be loaded already in flat files that mimic a table structure (I have documentation saying which columns are which and what references are set up); I'm trying to decide if I should keep them as normalized as they currently are, or if I should look to consolidate; I need to be aware of how each method will affect the way I program the site using Rails since if I do consolidate, there will be essentially 4 "levels" of categories in a single table, but that definitely seems more manageable than separate tables for each level, since apart from Subclass (which directly links to product blocks) they don't do anything except show the next level of category under them. I'm always at a loss for the "best" way to handle data like this - I know of the saying "Normalize until it hurts, then denormalize until it works" but I've never really had to implement it until now.

If I understand correctly, you want to take their separate tables and turn them into a hierarchy that's kept in a single table with a self-referencing FK.

This is generally a more flexible approach (for example, if you want to add a fifth level), BUT SQL and relational data models don't tend to work well with linked lists like this, even with new syntax like MS SQL Server's CTEs. Admittedly, CTEs make it much better though.

It can be difficult and costly to enforce things, like that a product must always be on the fourth level of the hierarchy, etc.

If you do decide to do it this way, then definitely check out Joe Celko's SQL for Smarties, which I believe has a section or two on modeling and working with hierarchies in SQL or better yet get his book that is devoted to the subject (Joe Celko's Trees and Hierarchies in SQL for Smarties).

I'm creating a web site where all pages hang off a database-driven tree-hierarchy.

All but one node has a parent node. Nodes may have role-based read permissions. Some nodes may have special rules (such as: don't display within navigation menus).

Nodes may represent links to other nodes (like a shortcut in Windows). Nodes typically represent pages.

Pages present either HTML content or execute programming. Some pages may be roots of subtrees (alternate masterpages and stylesheets).

Please help me setup my nodes database in Microsoft SQL Server for use by Linq to SQL.

I've got three ideas:

  1. Many lightweight tables with almost zero nullable fields.

  2. Heavyweight Node table with lots of nullable fields.

  3. Best (or worst) of both: Lots of nullable foreign keys to many lightweight tables.

Which do you feel best represents the data? Which will be easiest to use with Linq to SQL?

How can I keep my data integrity rules within the database? How do I best enforce them within my programming?

  • Nodes must be either (but not both) links or pages.

  • Pages must be either (but not both) html or code.

  • Links may not be roots, html, nor code.

Can I make an ASP.NET Site Map Provider with such a structure? Should I?


Update: I've asked a more general question:

What’s the best way to handle one-to-one relationships in SQL?


Related question:
How do I enforce data integrity rules in my database?

I agree that designing with Linq in mind is bass-ackwards. Joe Celko's book "Trees and Hierarchies in SQL For Smarties" has many good ideas for schemas to represent what you're trying to do. Linq ought to be able to deal with these just fine.

Suppose I have a MySQL table that defines a collection of things, each of which is associated with either 1 or 2 owners. For example:

CREATE TABLE thing (
    id INT UNSIGNED PRIMARY KEY AUTO_INCREMENT
    , name CHAR(10)
    , first_owner INT UNSIGNED NOT NULL
    , second_owner INT UNSIGNED DEFAULT NULL
    );

+----+------------+-------------+--------------+
| id | name       | first_owner | second_owner |
+----+------------+-------------+--------------+
| 1  | skateboard | Joe         | NULL         |
| 2  | flashlight | Joe         | NULL         |
| 3  | drill      | Joe         | Erica        |
| 4  | computer   | Erica       | NULL         |
| 5  | textbook   | Diane       | NULL         |
| 6  | cell phone | Amy         | Diane        |
| 7  | piano      | Paul        | Amy          |
+----+------------+-------------+--------------+

Each distinct owner is a node of a graph, and two owners in the same row constitute an edge between their nodes. A graph drawn from the above example rows looks like this:

In this example, there are two components: Joe and Erica are one; Diane, Paul and Amy are the other.

I want to identify these components in my table, so I add another column:

ALTER TABLE thing ADD COLUMN `group` INT UNSIGNED;

How could I write an UPDATE statement that would populate this new column by uniquely identifying the connected component to which the row belongs? Here's an example of an acceptable result for the above example rows:

+----+------------+-------------+--------------+-------+
| id | name       | first_owner | second_owner | group |
+----+------------+-------------+--------------+-------+
| 1  | skateboard | Joe         | NULL         | 1     |
| 2  | flashlight | Joe         | NULL         | 1     |
| 3  | drill      | Joe         | Erica        | 1     |
| 4  | computer   | Erica       | NULL         | 1     |
| 5  | textbook   | Diane       | NULL         | 2     |
| 6  | cell phone | Amy         | Diane        | 2     |
| 7  | piano      | Paul        | Amy          | 2     |
+----+------------+-------------+--------------+-------+

I could do this with a stored procedure, but my actual scenario involves more tables and millions of rows, so I'm hoping there's a clever way to do this without looping through cursors for a week.

This is a simplified example for the purpose of illustrating the problem. Each component is supposed to represent a "household" and most will have only 1 or 2 nodes, but those with more nodes are especially important. There isn't necessarily any strict upper limit to the size of a household.
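One set-based way to label the components is a recursive CTE that collects, for every owner, all owners reachable through shared rows, and then picks one representative id per component. The sketch below is an illustration in SQLite via Python's sqlite3, not a tested MySQL solution: it stores owner names as text (as in the displayed rows), uses the smallest `thing.id` in the component as the group value (so the labels come out as 1 and 5 rather than 1 and 2), and assumes a database with recursive CTE support (for MySQL that means 8.0 or later).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE thing (
    id           INTEGER PRIMARY KEY,
    name         TEXT,
    first_owner  TEXT NOT NULL,
    second_owner TEXT DEFAULT NULL,
    "group"      INTEGER
);
INSERT INTO thing (id, name, first_owner, second_owner) VALUES
    (1, 'skateboard', 'Joe',   NULL),
    (2, 'flashlight', 'Joe',   NULL),
    (3, 'drill',      'Joe',   'Erica'),
    (4, 'computer',   'Erica', NULL),
    (5, 'textbook',   'Diane', NULL),
    (6, 'cell phone', 'Amy',   'Diane'),
    (7, 'piano',      'Paul',  'Amy');
""")

conn.execute("""
WITH RECURSIVE
edge(a, b) AS (
    -- every owner reaches itself, and shared rows link owners both ways
    SELECT first_owner, first_owner FROM thing
    UNION
    SELECT second_owner, second_owner FROM thing WHERE second_owner IS NOT NULL
    UNION
    SELECT first_owner, second_owner FROM thing WHERE second_owner IS NOT NULL
    UNION
    SELECT second_owner, first_owner FROM thing WHERE second_owner IS NOT NULL
),
reach(start, node) AS (
    SELECT a, b FROM edge
    UNION                      -- UNION (not UNION ALL) stops on revisits
    SELECT r.start, e.b
    FROM reach AS r
    JOIN edge AS e ON e.a = r.node
)
UPDATE thing
SET "group" = (
    SELECT MIN(t2.id)
    FROM reach AS r
    JOIN thing AS t2 ON t2.first_owner = r.node
    WHERE r.start = thing.first_owner
)
""")

groups = conn.execute('SELECT id, "group" FROM thing ORDER BY id').fetchall()
print(groups)
```

The `reach` set holds every pair of owners within a component, so its size grows quadratically with household size. That is fine when most households have 1 or 2 members, but a few very large components could make it expensive.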

A very good answer to a related question

"What is the most efficient/elegant way to parse a flat table into a tree?"

There are several ways to store tree-structured data in a relational database. What you show in your example uses two methods:

  • Adjacency List (the "parent" column) and
  • Path Enumeration (the dotted-numbers in your name column).

Another solution is called Nested Sets, and it can be stored in the same table too. Read "Trees and Hierarchies in SQL for Smarties" by Joe Celko for a lot more information on these designs.

I usually prefer a design called Closure Table (aka "Adjacency Relation") for storing tree-structured data. It requires another table, but then querying trees is pretty easy.

Please have a look at original question for reference.

I'm making a website and I have a problem generating the parent/child tree, like:

Page 1
---Sub page 1
------Sub page 1 - 1
Page 2
Page 3
---Sub page 3
------Sub page 3 - 1
------Sub page 3 - 2

If I use UID, it's impossible to write the "ORDER BY" T-SQL to make the tree. I'm thinking of a function that can generate the ID (varchar) such as:

001000000
---001001000
------001001001
002000000
003000000
---003001000
------003001001
------003001002

Look into the nested set model for hierarchies. Joe Celko has a book which covers this in addition to other ways to model trees and hierarchies in SQL. With the nested set model you would have something like this:

CREATE TABLE Tree_Nodes
(
    lft     INT         NOT NULL,
    rgt     INT         NOT NULL,
    name    VARCHAR(40) NOT NULL,
    CONSTRAINT PK_Tree_Nodes PRIMARY KEY CLUSTERED (lft, rgt)
)
INSERT INTO Tree_Nodes (lft, rgt, name)
SELECT 1, 6, 'Page 1' UNION ALL
SELECT 2, 5, 'Sub page 1' UNION ALL
SELECT 3, 4, 'Sub page 1 - 1' UNION ALL
SELECT 7, 8, 'Page 2' UNION ALL
SELECT 9, 16, 'Page 3' UNION ALL
SELECT 10, 15, 'Sub page 3' UNION ALL
SELECT 11, 12, 'Sub page 3 - 1' UNION ALL
SELECT 13, 14, 'Sub page 3 - 2'

Then to get the result that you're trying to get, it's simply:

SELECT
    lft,
    rgt,
    name
FROM
    Tree_Nodes
ORDER BY
    lft
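Ordering by lft gives the right sequence; the indentation can be derived from the same table by counting how many intervals enclose each node. The following sketch loads the rows above into SQLite (via Python's sqlite3, an assumption made so the example is runnable; the query itself is plain SQL) and renders the tree in the format from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Tree_Nodes (
    lft  INT         NOT NULL,
    rgt  INT         NOT NULL,
    name VARCHAR(40) NOT NULL,
    PRIMARY KEY (lft, rgt)
);
INSERT INTO Tree_Nodes (lft, rgt, name) VALUES
    (1, 6,   'Page 1'),
    (2, 5,   'Sub page 1'),
    (3, 4,   'Sub page 1 - 1'),
    (7, 8,   'Page 2'),
    (9, 16,  'Page 3'),
    (10, 15, 'Sub page 3'),
    (11, 12, 'Sub page 3 - 1'),
    (13, 14, 'Sub page 3 - 2');
""")

# A node's depth = number of intervals that contain it, minus itself.
rows = conn.execute("""
SELECT child.name, COUNT(*) - 1 AS depth
FROM Tree_Nodes AS child
JOIN Tree_Nodes AS parent
  ON child.lft BETWEEN parent.lft AND parent.rgt
GROUP BY child.lft, child.name
ORDER BY child.lft
""").fetchall()

lines = ["---" * depth + name for name, depth in rows]
print("\n".join(lines))
```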

I don't know in advance how deep my tree will be, so from the docs I've read, the nested set model (NSM) seems like a fit for me. In SQL, this model seems to assume an integer primary key. I thought of creating a twin table that stores only the integers (PK, left, right), linked one-to-one to the real table. But that complicates things and wastes disk space, which matters when the server is not mine and I have to pay for each megabyte. Help!!

UPDATE

Excellent! Fabulous!! Thanks Macka and Bill, I could skip reading a whole book, for now. Celko is a future order on Amazon. ;-)

As @Macka writes, the left and right values are not foreign keys to tree nodes, so they don't have to be the same type. They can be integers while the tree node primary key is a GUID.

Celko also wrote "Trees and Hierarchies in SQL for Smarties" which goes into more detail about Nested Set Model, and other solutions. Reading this book will save you a lot of time and a lot of mistakes.

There are other solutions to storing hierarchical data in a database. See my answer here: What is the most efficient/elegant way to parse a flat table into a tree?

In my forum I have threads and replies.

One thread has multiple replies, but a reply can also be a reply to another reply (like Google Wave). Because of that, a reply needs a "reply_id" column that points to its parent reply. The "top-level" replies (those directly under the thread), however, have no parent reply.

So how can I fix this? What should the columns of the reply table (and the thread table) look like?

At the moment it looks like this:

threads: id, title, body

replies: id, thread_id (all replies belong to a thread), reply_id (here lies the problem: top-level replies won't have a parent reply), body

What could a smart design that allows replying to a reply look like?

My data fits a tree form naturally. Therefore, I have a simple SQL table to store the data: {id, parentid, data1, ..., dataN}

I want to be able to "zoom in" on the data and produce a report which summarizes the data found below the current branch. That is, when standing in the root, I want to have the totals of all the data. When I have traveled down a certain branch of the tree, I want to only have the summation of the data found only for that node and its child nodes.

How do I write such a query in SQL?

Thanks in advance!

/John

Vlad's reference on nested sets looks pretty good. If you want something that covers trees and hierarchies in more detail then you can also check out Joe Celko's book.

The "ID, ParentID" adjacency list model is really an "old time" way of looking at hierarchies in a relational database model.
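If the table has to stay in the {id, parentid, ...} form, databases with recursive CTEs can still produce the per-branch totals directly. A minimal sketch in SQLite via Python's sqlite3 follows; the table name `tree`, the single `data1` column, and the sample values are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tree (id INTEGER PRIMARY KEY, parentid INTEGER, data1 INTEGER);
-- made-up sample: 1 is the root, 2 and 3 are its children, 4 is under 2
INSERT INTO tree VALUES (1, NULL, 10), (2, 1, 20), (3, 1, 5), (4, 2, 7);
""")

SUBTREE_TOTAL_SQL = """
WITH RECURSIVE subtree(id) AS (
    SELECT id FROM tree WHERE id = ?          -- the branch we are standing on
    UNION ALL
    SELECT t.id FROM tree AS t
    JOIN subtree AS s ON t.parentid = s.id    -- its children, grandchildren, ...
)
SELECT SUM(t.data1)
FROM tree AS t
JOIN subtree AS s ON s.id = t.id
"""

def subtree_total(node_id):
    """Sum of data1 for the node and everything below it."""
    return conn.execute(SUBTREE_TOTAL_SQL, (node_id,)).fetchone()[0]

total_root = subtree_total(1)
total_branch = subtree_total(2)
total_leaf = subtree_total(3)
print(total_root, total_branch, total_leaf)
```

With nested sets the same report needs no recursion at all: the subtree is simply the set of rows whose left value falls between the branch's left and right values.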

Apologies for the less than ideal title; had a hard time coming up with something.

Three entities: Products, Clients and Client_Product_Authorization.

A Client has access to many Products.

A Product is accessed by many Clients.

A Client_Product_Authorization authorizes a Client to access a Product.

We have 100,000+ Products (virtual goods) and managing Client_Product_Authorizations becomes a very labor intensive task. So we have Products that are containers of other Products. Example:

Product X is a container for Products 1 through 2000.

So, by creating a Client_Product_Authorization granting a Client the Product X, we are indirectly providing for the client to access products 1 through 2000. Mind that product 1 might be contained in different container products (so, yes, it is a many-to-many self relationship).

Here is the entity level data model:

[Image: entity-level data model diagram]

The advantage of this mechanism is that we can just change the composition of Product X (adding or removing other products) and it automatically adjusts the product list available to the clients authorized to Product X. Managing awarding access to our large product-base is a matter of selecting a few container products.

The disadvantage is that it now became harder (in terms of creating a SQL statement, because of the many-to-many self-relationship) to know what the Client is actually authorized to see (as in the individual non-container products). As in:

Product Z is a container for Product X and Product Y

Product X is a container for Products 1 through 2000

Product Y is a container for Products 2001 through 5000

What are the actual non-container products a client authorized to Product Z can see?

Products 1 through 2000 and 2001 through 5000.

I would like to make the list of non-container products a client is authorized for to be materialized in some way. So that questions like:

Should Client ABC be allowed to see Product 78?

OR

What products is Client ABC authorized to see?

can be easily responded with a query.

The goal is to make the job of software trying to determine the list of products accessible to a client a simple mechanism, instead of requiring a traversal through all container products, their sub-container products, etc etc etc.

Three questions:

a) Is there a better data-model for this scenario?

b) Is there a different mechanism to simplify the management of access authorization for this large set of products?

c) How would you go about making the retrieval of the list of non-container products available to a client as simple as possible?

I appreciate the collective's input. Thanks in advance!


Update: Limiting the number of nested products is not an option for business reasons.

Since you're using SQL 2005 you should have access to common table expressions (CTEs) which makes the recursion of finding the children of a product much easier. You might want to look into CTEs and see if that's sufficient for what you're doing.

Also, I don't recall this specific scenario and my copy of the book is at home, but Joe Celko wrote a very good book on modeling hierarchies and trees in an RDBMS. It's probably worth looking into to see if there is a better model for this. He had a few rather ingenious ones for other scenarios that didn't seem obvious at first, but which are very efficient. Even if there isn't a direct match, some of the techniques which he uses might be useful.

The model which you have is what's referred to as the adjacency list model. Celko also shows how to model hierarchies using what are called the nested set model and the path enumeration model.

The nested set model may seem a little complicated at first, but it's actually simple in a way. It's more expensive for updating, but selects from it are VERY fast compared to just about any other way to model hierarchies. You can find an abbreviated description of it here. Since a product can be contained in multiple trees you would have to adapt it slightly for your case.

The path enumeration model basically just uses a delimited (or XML) string to list out the path to the row in question, starting at the root of the tree. You then use string (or XQuery) functions to find children of a parent, etc. As far as I know, it's only really useful for trees, which have a single root, so I don't think that you could use it in your case.
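To illustrate the CTE route mentioned at the start of this answer, here is a minimal sketch that expands a client's authorizations down to non-container products. It runs in SQLite via Python's sqlite3; the table and column names (`product_container`, `client_product_authorization`, and so on) and the sample ids are hypothetical, and product 2 is deliberately placed in two containers to mirror the many-to-many case.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE product_container (
    container_id INTEGER NOT NULL REFERENCES product(id),
    member_id    INTEGER NOT NULL REFERENCES product(id),
    PRIMARY KEY (container_id, member_id)
);
CREATE TABLE client_product_authorization (
    client_id  TEXT NOT NULL,
    product_id INTEGER NOT NULL REFERENCES product(id)
);
INSERT INTO product VALUES
    (1, 'P1'), (2, 'P2'), (3, 'P3'), (4, 'P4'),
    (100, 'Z'), (101, 'X'), (102, 'Y');
-- Z contains X and Y; X contains 1 and 2; Y contains 2 and 3
INSERT INTO product_container VALUES
    (100, 101), (100, 102), (101, 1), (101, 2), (102, 2), (102, 3);
INSERT INTO client_product_authorization VALUES ('ABC', 100);
""")

VISIBLE_SQL = """
WITH RECURSIVE granted(product_id) AS (
    SELECT product_id
    FROM client_product_authorization
    WHERE client_id = ?
    UNION                         -- dedupes products reachable via two containers
    SELECT pc.member_id
    FROM product_container AS pc
    JOIN granted AS g ON pc.container_id = g.product_id
)
SELECT g.product_id
FROM granted AS g
WHERE NOT EXISTS (                -- keep only non-container products
    SELECT 1 FROM product_container AS pc
    WHERE pc.container_id = g.product_id)
ORDER BY g.product_id
"""

visible = [r[0] for r in conn.execute(VISIBLE_SQL, ("ABC",))]
can_see_4 = 4 in visible
print(visible, can_see_4)
```

On SQL Server the keyword RECURSIVE is dropped and the recursive member has to be combined with UNION ALL, so duplicates would be removed with DISTINCT in the outer query instead. If the result needs to be materialized, the output of this query could be written to a table and refreshed whenever a container's composition changes.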

I am translating SQL Server SQL Statements into their ANSI generic equivalent at present, and am stuck with a recursive statement using a WITH statement.

For the sake of concentrating on the issue, I'll simplify the issue as follows

If I have two tables

  1. ReportingUnit

    • col1: Key
    • col2: ParentReportingUnitKey
  2. Facility

    • col1: Key
    • col2: ParentReportingUnitKey

This structure is describing a hierarchy of reporting units down to a facility, where a reporting unit may have 0 .. 1 direct parent reporting units and 0 .. n child reporting units.

A facility is a 'leaf' record, that links to a reporting unit.

I need to craft an ANSI 92 valid SQL Statement (or at worst one that will work on Oracle, DB2 and SQL Server) that will return all facilities related to a given reporting unit anywhere up the hierarchy.

e.g.

  • ReportingUnit R1 has ReportingUnit children R1.1 and R1.2
  • ReportingUnit R1.1 has children R1.1.1, R1.1.2
  • ReportingUnit R1.2 has children R1.2.1, R1.2.2

  • Facility F1 has a parent reporting unit R1.1.1

  • Facility F2 has a parent reporting unit R1.1.2
  • Facility F3 has a parent reporting unit R1.2.1
  • Facility F4 has a parent reporting unit R1.2.2

Bearing in mind there may be 0 .. n levels of recursion in the ReportingUnit table, how can I return all 4 facilities from a SQL Statement given the parameter ReportingUnit=R1?

There is no SQL-92 solution for recursive queries.

The best option is to use one of the solutions for encoding hierarchical relationships so that you can query all descendants or ancestors, using standard SQL.

See a brief description here: "What is the most efficient/elegant way to parse a flat table into a tree?".

Or read "Trees and Hierarchies in SQL for Smarties" by Joe Celko.
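As an example of such an encoding, a closure table stores every (ancestor, descendant) pair, after which the lookup is an ordinary join with no recursion. The sketch below uses SQLite via Python's sqlite3; the table names, the `unit_key`/`parent_key` columns, and the Python loop that fills the closure table are assumptions for illustration (in practice the closure rows would be maintained by the application or by triggers).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE reporting_unit (unit_key TEXT PRIMARY KEY, parent_key TEXT);
CREATE TABLE facility (facility_key TEXT PRIMARY KEY, parent_key TEXT NOT NULL);
CREATE TABLE reporting_unit_closure (
    ancestor_key   TEXT NOT NULL,
    descendant_key TEXT NOT NULL,
    PRIMARY KEY (ancestor_key, descendant_key)
);
INSERT INTO reporting_unit VALUES
    ('R1', NULL), ('R1.1', 'R1'), ('R1.2', 'R1'),
    ('R1.1.1', 'R1.1'), ('R1.1.2', 'R1.1'),
    ('R1.2.1', 'R1.2'), ('R1.2.2', 'R1.2');
INSERT INTO facility VALUES
    ('F1', 'R1.1.1'), ('F2', 'R1.1.2'), ('F3', 'R1.2.1'), ('F4', 'R1.2.2');
""")

# Fill the closure table: each unit is paired with itself and every ancestor.
parents = dict(conn.execute("SELECT unit_key, parent_key FROM reporting_unit"))
for unit in parents:
    node = unit
    while node is not None:
        conn.execute("INSERT INTO reporting_unit_closure VALUES (?, ?)",
                     (node, unit))
        node = parents[node]

# The lookup itself is a plain join, with no recursive syntax.
FACILITIES_SQL = """
SELECT f.facility_key
FROM facility AS f
JOIN reporting_unit_closure AS c ON c.descendant_key = f.parent_key
WHERE c.ancestor_key = ?
ORDER BY f.facility_key
"""

under_r1 = [r[0] for r in conn.execute(FACILITIES_SQL, ("R1",))]
under_r12 = [r[0] for r in conn.execute(FACILITIES_SQL, ("R1.2",))]
print(under_r1, under_r12)
```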

I have a table whose 'path' column has values and I would like to update the table's 'child_count' column so that I get the following output.

 path   | child_count 
--------+-------------
        |           5
 /a     |           3
 /a/a   |           0
 /a/b   |           1
 /a/b/c |           0
 /b     |           0

My present solution - which is way too inefficient - uses a stored procedure as follows:

CREATE FUNCTION child_count() RETURNS VOID AS $$
DECLARE
  parent VARCHAR; 
BEGIN   
  FOR parent IN
    SELECT path FROM my_table
  LOOP
    DECLARE
      tokens VARCHAR[] := REGEXP_SPLIT_TO_ARRAY(parent, '/');
      str VARCHAR := '';
    BEGIN
      FOR i IN 2..ARRAY_LENGTH(tokens, 1)
      LOOP
        UPDATE my_table
          SET child_count = child_count + 1
        WHERE path = str;
        str := str || '/' || tokens[i];     
      END LOOP;
    END;    
  END LOOP;
END;
$$ LANGUAGE plpgsql;

Does anyone know of a single UPDATE statement that does the same thing?
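One candidate is a correlated subquery that counts the rows whose path extends the current row's path. The sketch below checks the idea against the sample data in SQLite via Python's sqlite3; the same statement shape should carry over to PostgreSQL, but that is an assumption rather than something tested here. It also assumes path segments never contain the LIKE wildcards `%` or `_`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE my_table (path TEXT PRIMARY KEY, child_count INTEGER NOT NULL DEFAULT 0);
INSERT INTO my_table (path) VALUES
    (''), ('/a'), ('/a/a'), ('/a/b'), ('/a/b/c'), ('/b');
""")

# Descendants of a row are exactly the rows whose path starts with
# "<this path>/", so one correlated count per row replaces the loops.
conn.execute("""
UPDATE my_table
SET child_count = (
    SELECT COUNT(*)
    FROM my_table AS c
    WHERE c.path LIKE my_table.path || '/%'
)
""")

counts = conn.execute(
    "SELECT path, child_count FROM my_table ORDER BY path").fetchall()
print(counts)
```

Unlike the loop, this version recomputes the counts from scratch, so it does not depend on child_count starting at zero.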

Managing trees in SQL is not a simple subject. I would recommend trying to find a good library that does it. Counting direct and indirect descendants is only a small part of what you might want to do with your tree. And storing your "children count" in the database is maybe not the best idea, since the tree can change in the future.

I'm not sure it will fit your development environment, but there is a nice gem for Rails called ancestry that does that just fine (it uses a materialized path as well). But that's Ruby, and if I understand you correctly, you want to implement that in PostgreSQL. I would then recommend buying the book Trees and Hierarchies in SQL for Smarties, by Joe Celko.

Update:

Here is a postgresql additional module you might want to have a look at: http://www.postgresql.org/docs/current/static/ltree.html

I have the following data set, which represents nodes in a directed graph.

CREATE TABLE nodes (NODE_FROM VARCHAR2(10),
                    NODE_TO VARCHAR2(10));

INSERT INTO nodes VALUES('GT','TG');
INSERT INTO nodes VALUES('GG','GC');
INSERT INTO nodes VALUES('AT','TG');
INSERT INTO nodes VALUES('TG','GC');
INSERT INTO nodes VALUES('GC','CG');
INSERT INTO nodes VALUES('TG','GG');
INSERT INTO nodes VALUES('GC','CA');
INSERT INTO nodes VALUES('CG','GT');

Visual representation: http://esser.hopto.org/temp/image1.JPG

Using this data set, I want a user to enter a level (e.g. 2) and get back all nodes within 2 "hops" of a specific node:

NODE_FROM  NODE_TO

TG         GC
TG         GG
AT         TG
GT         TG

http://esser.hopto.org/temp/image2.JPG

My current attempt looks like this:

SELECT node_from, node_to
  FROM nodes
  WHERE level <= 2   -- Display nodes two "hops" from 'AT'
START WITH node_from = 'AT'
CONNECT BY NOCYCLE PRIOR node_to = node_from
    OR    node_to = PRIOR node_from
GROUP BY node_from, node_to;

http://esser.hopto.org/temp/image3.JPG

As you can see, the relationship: GT -> TG is missing.

Sounds like you need to get a copy of Joe Celko's Trees and Hierarchies in SQL for Smarties.
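For the specific query in the question, one alternative to CONNECT BY is a recursive CTE that first collects the nodes reachable in fewer than N hops, ignoring edge direction, and then returns every edge touching one of them. The sketch below runs in SQLite via Python's sqlite3, so it is an illustration of the technique rather than tested Oracle syntax; the `edges_within` helper is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE nodes (node_from VARCHAR(10), node_to VARCHAR(10));
INSERT INTO nodes VALUES
    ('GT','TG'), ('GG','GC'), ('AT','TG'), ('TG','GC'),
    ('GC','CG'), ('TG','GG'), ('GC','CA'), ('CG','GT');
""")

EDGES_WITHIN_SQL = """
WITH RECURSIVE reach(node, depth) AS (
    SELECT ?, 0
    UNION                       -- UNION drops repeated (node, depth) rows
    SELECT CASE WHEN n.node_from = r.node THEN n.node_to ELSE n.node_from END,
           r.depth + 1
    FROM reach AS r
    JOIN nodes AS n ON r.node IN (n.node_from, n.node_to)
    WHERE r.depth + 1 < ?       -- only nodes closer than the requested level
)
SELECT DISTINCT n.node_from, n.node_to
FROM nodes AS n
JOIN reach AS r ON r.node IN (n.node_from, n.node_to)
ORDER BY n.node_from, n.node_to
"""

def edges_within(start, level):
    """Edges lying within `level` hops of `start`, in either direction."""
    return conn.execute(EDGES_WITHIN_SQL, (start, level)).fetchall()

two_hops = edges_within("AT", 2)
one_hop = edges_within("AT", 1)
print(two_hops)
```

Because the depth is capped, cycles in the graph cannot make the recursion run forever, which is the job NOCYCLE does in the CONNECT BY version.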

I'm begin to developing a scial sharing website so I'm curious about database design Schema... So in Data-Mining Star-Schema is the best one but how about a social sharing website... And as a nature of the SS websites there will be (i hope :)) many users in same time... Which better for performance for overdose using...

What do you want to do? Star Schema and Snowflake are reporting schemas. Social sharing would not need them, except maybe for reporting.

You need something representing the social relations. That is usually done with a graph database (http://en.wikipedia.org/wiki/Graph_database); in an RDBMS there are graph techniques such as this. More details are in the book by Celko.

I'm building a shopping cart website and using SQL tables

CATEGORY

Id int,
Parent_Id,
Description varchar(100)

Data:

1   0   Electronics
2   0   Furniture
3   1   TVs
4   3   LCD
5   4   40 inches
6   4   42 inches

PRODUCTS

Id int,
Category_Id int
Description...

Data:

1   5   New Samsung 40in LCD TV
2   6   Sony 42in LCD TV

As you can see I only have one column for the last Child Category

Now what I need to do is search by main category on the homepage. For example, if the user clicks Electronics, show both TVs, since Electronics is their parent's parent's parent, keeping in mind that the Products table has only one column for Category.

Should I update the Products table and include 6 columns for child categories in order to solve this? Or how can I build an effective SQL stored procedure for this?

Thank you

Jerry

If you're using SQL 2008 then you might want to look at the HIERARCHYID data type. Otherwise, you might want to consider redesigning the Category table. As you have it modeled now, you have to use recursion to get from child nodes to parents or from parents down through children.

Instead of using the linked list model (which is what you have) you could use the nested set model for hierarchies. Do a search on Joe Celko and Nested Set Model and you should be able to find some good descriptions of it. He also wrote an entire book on modeling trees and hierarchies in SQL. The nested set model requires a bit of set up to maintain the data, but it's much easier to work with when selecting out data. Since your categories will probably remain relatively stable it seems like a good solution.

EDIT: To actually answer your question... you could write a stored procedure that sits in a WHILE loop, selecting children and collecting any products found in a table variable. Check @@ROWCOUNT in each loop and if it's 0 then you've gotten to the end. Then you just select out from your table variable. It's a recursive (and slow) method, which is why this type of a model doesn't work very well in many cases in SQL.

Under almost no circumstances should you just add 6 (or 7 or 8) category IDs to your products table. Bad. Bad. Bad. It will be a maintenance nightmare among other things (what happens when your categories go 7 levels deep... then 8... then 9?).
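
To make the loop idea concrete, here is a rough sketch in Python with an in-memory SQLite database standing in for SQL Server. It walks the category tree one level at a time and stops when a level returns no children, which plays the role of the @@ROWCOUNT check. The table and column names follow the question; the function name products_under is made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE category (id INTEGER PRIMARY KEY, parent_id INTEGER, description TEXT);
    CREATE TABLE products (id INTEGER PRIMARY KEY, category_id INTEGER, description TEXT);
    INSERT INTO category VALUES (1,0,'Electronics'),(2,0,'Furniture'),(3,1,'TVs'),
                                (4,3,'LCD'),(5,4,'40 inches'),(6,4,'42 inches');
    INSERT INTO products VALUES (1,5,'New Samsung 40in LCD TV'),(2,6,'Sony 42in LCD TV');
""")


def products_under(conn, root_id):
    """Collect products in root_id and in every category beneath it."""
    found = {root_id}      # categories visited so far
    frontier = [root_id]   # categories whose children have not been read yet
    while frontier:        # ends when a level has no children
        marks = ",".join("?" * len(frontier))
        rows = conn.execute(
            f"SELECT id FROM category WHERE parent_id IN ({marks})", frontier)
        frontier = [r[0] for r in rows if r[0] not in found]
        found.update(frontier)
    marks = ",".join("?" * len(found))
    rows = conn.execute(
        f"SELECT description FROM products WHERE category_id IN ({marks}) ORDER BY id",
        sorted(found))
    return [r[0] for r in rows]


print(products_under(conn, 1))
```

One query per level is issued, so the cost grows with the depth of the tree rather than with the number of categories.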

I've come across this problem a few times now, and I would like to find a better way to solve the problem.

Basically I have a Category which has Sub Categories. If the Sub Category does not have a Description I would like to retrieve the Parent Category Description.

It starts to get difficult when I have Sub Sub Categories.

The Sql Table looks like this:

CategoryID         int,
ParentCategoryID   int null,
Name               varchar(255),
Description        varchar(MAX) null

I would like to create a function that would look up the Description, but I have no idea if that's the best solution or how to create the function.

I'm mainly looking for the best / right way to solve this issue.

Given your structure, Alex Martelli's solution is probably the best you'll find. Another option, if you can change the model, would be to change from a linked-list tree structure to the nested set model.

Joe Celko has a book on trees and hierarchies which goes into the various ways that you can model them and the advantages/disadvantages of each method. You can also probably find a lot of information on the subject through Google or Google Groups.

One of the disadvantages of the nested set model is that changes to the tree structure are a bit more expensive, but for products you're typically retrieving much more than you're updating, especially since moving things around between categories is rare in most business cases.

Using the nested set model, the following would give you what you want:

SELECT
     P1.Name,
     COALESCE(P1.Description, P2.Description) AS Description
FROM
     Products P1
LEFT OUTER JOIN Products P2 ON
     P2.lft < P1.lft AND
     P2.rgt > P1.rgt AND
     P2.Description IS NOT NULL
LEFT OUTER JOIN Products P3 ON
     P3.lft < P1.lft AND P3.lft > P2.lft AND
     P3.rgt > P1.rgt AND P3.rgt < P2.rgt AND
     P3.Description IS NOT NULL
WHERE
     P3.ID IS NULL
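
To check how the query above behaves, here is a small sketch that runs it against SQLite from Python. The sample rows and their lft/rgt values are invented for illustration, and the table is called Products only to match the query; an ORDER BY is added so the output is stable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Products (ID INTEGER PRIMARY KEY, Name TEXT, Description TEXT,
                           lft INTEGER, rgt INTEGER);
    -- Hypothetical tree: All > Tools > (Hammers > Claw), Saws
    INSERT INTO Products VALUES
        (1, 'All',     'Everything',    1, 10),
        (2, 'Tools',   'Hand tools',    2,  9),
        (3, 'Hammers', NULL,            3,  6),
        (4, 'Claw',    NULL,            4,  5),
        (5, 'Saws',    'Cutting tools', 7,  8);
""")

QUERY = """
SELECT P1.Name, COALESCE(P1.Description, P2.Description) AS Description
FROM Products P1
LEFT OUTER JOIN Products P2 ON        -- P2: an ancestor that has a description
     P2.lft < P1.lft AND P2.rgt > P1.rgt AND P2.Description IS NOT NULL
LEFT OUTER JOIN Products P3 ON        -- P3: a described ancestor between P1 and P2
     P3.lft < P1.lft AND P3.lft > P2.lft AND
     P3.rgt > P1.rgt AND P3.rgt < P2.rgt AND
     P3.Description IS NOT NULL
WHERE P3.ID IS NULL                   -- keep only the nearest described ancestor
ORDER BY P1.lft
"""


def effective_descriptions(conn):
    return conn.execute(QUERY).fetchall()


for name, description in effective_descriptions(conn):
    print(name, "->", description)
```

Hammers and Claw have no description of their own, so they pick up the one from Tools, their nearest ancestor that has one.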

Our business has a tiered salesman relation, sometimes called an omni-tier. It's 3 deep.

In English: A-tier salesmen have people under them, whom we'll call B-tier salesmen, and B-tier salesmen have C-tier salesmen under them.

table:

id, name, agentId 
1011, bob, 0
1012, jim, 1011
1013, tim, 1011
1014, sam, 1011
1015, dav, 1013
1016, kim, 1013
1017, sal, 1015
1018, vin, 1015

(the id is the agent's ID; the field called agentId is that salesman's upstream agent)

What I need is a list of all the salesmen under a given agent (in this case bob, or id=1011), 3 tiers deep.

I've gotten 2 levels deep but get stuck after that. Figuring there's a better approach I cannot see myself, I'm asking for help.

my sql so far:

select s.id, s.name, s.agentId from salesmen s where s.agentId = 1011 or s.agentId = (select ss.agentId from salesmen ss where ss.id=s.agentid)

This gets me 2 tiers deep but i cannot get a third.

any help is appreciated. thanks in advance, Matthew

I know that it's often a pain (or close to impossible) to rearchitect a table like this, but if that's an option then you should check out Joe Celko's book on trees and hierarchies in SQL. He has some alternative table designs, like the nested set model, which can make a query like yours trivial. Here's a brief example that I was able to find from Google.

Barring a redesign, if you're in MS SQL Server 2005 or greater then you can use CTEs as Rob suggests. I don't know what (if any) recursive functions other RDBMS's may offer.
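
As a rough illustration of the CTE route, here is a sketch run through Python's sqlite3 module. SQLite spells it WITH RECURSIVE, while SQL Server uses plain WITH, so treat the exact syntax as an assumption to adapt. The CTE name downline, the tier column, and the function name are made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE salesmen (id INTEGER PRIMARY KEY, name TEXT, agentId INTEGER);
    INSERT INTO salesmen VALUES
        (1011,'bob',0),(1012,'jim',1011),(1013,'tim',1011),(1014,'sam',1011),
        (1015,'dav',1013),(1016,'kim',1013),(1017,'sal',1015),(1018,'vin',1015);
""")

QUERY = """
WITH RECURSIVE downline(id, name, tier) AS (
    -- tier 1: salesmen reporting directly to the top agent
    SELECT id, name, 1 FROM salesmen WHERE agentId = :top
    UNION ALL
    -- next tier: salesmen reporting to someone already in the downline
    SELECT s.id, s.name, d.tier + 1
    FROM salesmen s JOIN downline d ON s.agentId = d.id
    WHERE d.tier < :max_tiers
)
SELECT id, name, tier FROM downline ORDER BY tier, id
"""


def downline(conn, top, max_tiers):
    return conn.execute(QUERY, {"top": top, "max_tiers": max_tiers}).fetchall()


for row in downline(conn, 1011, 3):
    print(row)
```

Changing max_tiers is enough to go deeper or shallower; no extra self-join is needed per level.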

Table1 has a list of items. Table2 has a list of groups the items can be associated with. Table3 is a cross-reference between 1 and 2.

The groups in table 2 are set up in hierarchical fashion.

Key    ParentKey    Name
1      NULL         TopGroup1
2      NULL         TopGroup2
3      1            MiddleGroup1
4      2            MiddleGroup2
5      3            NextGroup1
6      4            NextGroup1
7      2            MiddleGroup3

I want to be able to select from Table1 filtered by Table3.
Select Items from Table1 Where Table3.ParentKey is NOT '2' or any of its descendants.

From another post here on stackoverflow I've been able to use CTE to identify the hierarchy.

WITH Parent AS
(
    SELECT
        table2.Key,
        cast(table2.Key as varchar(128))  AS Path
    FROM
        table2
    WHERE
        table2.ParentKey IS NULL

   UNION ALL

    SELECT
        TH.Key,
        CONVERT(varchar(128), Parent.Path + ',' + CONVERT(varchar(128),TH.Key)) AS Path
    FROM
        table2 TH
    INNER JOIN
        Parent
    ON
        Parent.Key = TH.ParentKey
)
SELECT * FROM Parent

I guess this is really a two part question.

  1. How do you filter the above? For example, return all groups where TopGroup1 isn't in the lineage.
  2. How would I apply that to filtering results in the cross-referenced table1.

There is a whole book on this topic, see: 'Joe Celko's Trees and Hierarchies in SQL for Smarties'

Personally, when I had to solve this problem, I used a temp table to unroll the hierarchy and then selected from the temp table. Essentially, each query adds one more layer of the hierarchy to the temp table. Hierarchies are usually only 5-10 layers deep, so you can unroll one in 5 to 10 queries.
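
For the filtering part of the question, one possible approach is to collect the excluded group together with its descendants and then filter the cross-reference against that set. The sketch below does this with a recursive CTE in SQLite, driven from Python. The table names groups, items and item_groups, their column names, and the three sample items are assumptions standing in for table2, table1 and table3.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE groups (group_key INTEGER PRIMARY KEY, parent_key INTEGER, name TEXT);
    CREATE TABLE items (item_key INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE item_groups (item_key INTEGER, group_key INTEGER);
    INSERT INTO groups VALUES (1,NULL,'TopGroup1'),(2,NULL,'TopGroup2'),
        (3,1,'MiddleGroup1'),(4,2,'MiddleGroup2'),(5,3,'NextGroup1'),
        (6,4,'NextGroup1'),(7,2,'MiddleGroup3');
    -- Hypothetical items, one per leaf group
    INSERT INTO items VALUES (10,'item A'),(11,'item B'),(12,'item C');
    INSERT INTO item_groups VALUES (10,5),(11,6),(12,7);
""")

QUERY = """
WITH RECURSIVE excluded(group_key) AS (
    -- start from the group to exclude
    SELECT group_key FROM groups WHERE group_key = :root
    UNION ALL
    -- add every group whose parent is already excluded
    SELECT g.group_key FROM groups g JOIN excluded e ON g.parent_key = e.group_key
)
SELECT i.name
FROM items i JOIN item_groups ig ON ig.item_key = i.item_key
WHERE ig.group_key NOT IN (SELECT group_key FROM excluded)
ORDER BY i.item_key
"""


def items_outside(conn, root):
    return [row[0] for row in conn.execute(QUERY, {"root": root})]


print(items_outside(conn, 2))
```

Note that an item linked to several groups would still be returned if at least one of its groups lies outside the excluded subtree; whether that is the desired behaviour depends on the requirement.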

Thanks a lot in advance for your answers. I need to build the following.

I have a table friendship (id,user_id,friend_id,status,timestamp)

So let's say I am a user with user_id=43 and I am visiting a user with user_id=15.

The profile should show a connection line of friendships.

Let me describe: let's say I have a friendship with a user (user_id=3), and the user with user_id=3 is a friend of the user whose profile I am visiting.

So on web site I will see

Connection

MyIcon->UserIcon(15)->UserIcon(3)->UserIcon(i am visiting)

And only when all of the friendships involved have status=1...

Can anybody tell me what the query should look like?

Had you modeled this as a Nested Set modeled hierarchy instead of the Adjacency List model which you have then this query would be trivial. As it is, you're looking at having to use recursion, which isn't natural to a relational database.

For some great information on modeling hierarchies, check out Joe Celko's book.

Can SQL be used to find all the brands that have the most categories in common?

For example, the brand "Dove" can have the categories Soap, Skin Care, Shampoo. The goal is to find all the brands that have the most matching categories, in other words, the most similar brands.

It can be done programmatically using Ruby or PHP: just take a brand, and loop through all the other brands, and see how many matching categories there are, and sort by it. But if there are 2000 brands, then there needs to be 2000 queries per brand. (unless we pre-cache all the 2000 query results, so for all 2000 brands, we re-use those results)

Can it be done in SQL / MySQL with one query?

Say, the table has:

entities
--------
id
type =  brand or category or product
name


entities_parent_child
--------------------
parent_id
child_id

the table above has an entry for each parent = brand and child = product, and also an entry for each parent = category and child = product, so brand has to relate to category by products.

I think the hard part for SQL is: find all the maximum matching counts, and sort by those numbers.

I agree with wuputah's comment. For this problem an "entities" table is not the answer. You've given yourself a hint the design is wrong when you say you cannot form a query to get the answers you want.

Create a proper hierarchy for your data, with separate tables for separate real-world entities. Yours will be:

[Brands] 
[Categories]
[Products]

If you need help with defining trees and hierarchies in SQL I suggest you pick up a copy of Celko's Trees and Hierarchies in SQL for Smarties.

SQL has no concept of polymorphism so don't try to design your database to fit your programming language. Databases work with sets, so think in sets.

To find similar brands join your tables and use grouping:

SELECT Brands.brand_name, COUNT(Categories.category_name) as category_count
FROM Brands INNER JOIN Categories
ON Brands.brand_name = Categories.brand_name
GROUP BY Brands.brand_name
ORDER BY Brands.brand_name, COUNT(Categories.category_name) -- add DESC if you want largest count at the top

That gives you the basic idea, if you can expand on the requirement:

...find all the maximum matching counts, and sort by those numbers

Then I can help redesign the query and, if necessary, the schema design.
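
One possible reading of "maximum matching counts" is: for a given brand, count how many categories each other brand shares with it and sort by that count. Here is a minimal sketch of that reading using a self-join, run against SQLite from Python. The brand_categories table, its columns, and the brands other than Dove are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical link table: one row per (brand, category) pair
    CREATE TABLE brand_categories (brand_name TEXT, category_name TEXT);
    INSERT INTO brand_categories VALUES
        ('Dove','Soap'),('Dove','Skin Care'),('Dove','Shampoo'),
        ('Brand B','Soap'),('Brand B','Skin Care'),
        ('Brand C','Shampoo'),
        ('Brand D','Hardware');
""")

QUERY = """
SELECT b.brand_name, COUNT(*) AS shared
FROM brand_categories a
JOIN brand_categories b
  ON b.category_name = a.category_name   -- same category
 AND b.brand_name <> a.brand_name        -- but a different brand
WHERE a.brand_name = ?
GROUP BY b.brand_name
ORDER BY shared DESC, b.brand_name
"""


def similar_brands(conn, brand):
    return conn.execute(QUERY, (brand,)).fetchall()


print(similar_brands(conn, "Dove"))
```

Brands with no category in common (Brand D here) simply do not appear in the result.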

I have a set of data that models a hierarchy of categories. A root category contains a set of top-level categories. Each top-level category contains a set of sub-categories.

Each sub category has a set of organizations. A given organization can appear in multiple sub categories.

The leaf nodes of this hierarchy are organizations. An organization can potentially appear in multiple sub-categories.

The data is stored in three SQL tables:

organizations
organization_id organization_name
1               Org A
2               Org B
3               Org C
4               Org D
5               Org E
6               Org F

categories
category_id parent_id category_name
0           NULL      Top Level Category
1           0         First Category
2           0         Second Category
3           1         Sub Category A
4           1         Sub Category B
5           1         Sub Category C
6           2         Sub Category D

organizations_categories -- Maps organizations to sub_categories
organization_id category_id
1               3
2               3
2               6
3               4
4               4
5               4
6               5
6               4
7               6
8               6

I would like to be able to select a list of all unique organizations under a given category or sub-category.

The way I'm doing it right now involves first figuring out which sub categories have been requested and then looping through each sub_category in code and performing a select to get all organizations mapped to that category. The results of each select are appended to an array. This array contains duplicates whenever an organization appears in multiple sub categories.

I would love to replace this kludge with a query that can efficiently select a list of distinct organizations given an id of one of the categories in the hierarchy.

I am developing this solution using PHP and MySQL.

Thanks for your time and suggestions.

Assuming that your hierarchy is always exactly 3 levels deep:

SELECT DISTINCT
     O.organization_id,
     O.organization_name
FROM
     Categories CAT
INNER JOIN Categories SUB ON
     SUB.parent_id = CAT.category_id
INNER JOIN organizations_categories CO ON
     CO.category_id = SUB.category_id
INNER JOIN Organizations O ON
     O.organization_id = CO.organization_id
WHERE
     CAT.category_id = @category_id

You can modify that by one level to allow you to pass a sub category id. If you don't know at the time whether or not you have a category id or a sub category id then you can do the following:

SELECT DISTINCT
     O.organization_id,
     O.organization_name
FROM
     Categories CAT
LEFT OUTER JOIN Categories SUB ON
     SUB.parent_id = CAT.category_id
INNER JOIN organizations_categories CO ON
     CO.category_id IN (CAT.category_id, SUB.category_id)
INNER JOIN Organizations O ON
     O.organization_id = CO.organization_id
WHERE
     CAT.category_id = @category_id
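
Here is a quick way to try the second query against the question's sample data, using SQLite from Python. The mapping table uses the question's name organizations_categories, the @category_id parameter becomes a ? placeholder, and an ORDER BY is added for stable output; the function name is made up for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE organizations (organization_id INTEGER PRIMARY KEY, organization_name TEXT);
    CREATE TABLE categories (category_id INTEGER PRIMARY KEY, parent_id INTEGER, category_name TEXT);
    CREATE TABLE organizations_categories (organization_id INTEGER, category_id INTEGER);
    INSERT INTO organizations VALUES (1,'Org A'),(2,'Org B'),(3,'Org C'),
                                     (4,'Org D'),(5,'Org E'),(6,'Org F');
    INSERT INTO categories VALUES (0,NULL,'Top Level Category'),(1,0,'First Category'),
        (2,0,'Second Category'),(3,1,'Sub Category A'),(4,1,'Sub Category B'),
        (5,1,'Sub Category C'),(6,2,'Sub Category D');
    INSERT INTO organizations_categories VALUES (1,3),(2,3),(2,6),(3,4),(4,4),
        (5,4),(6,5),(6,4),(7,6),(8,6);
""")

QUERY = """
SELECT DISTINCT O.organization_id, O.organization_name
FROM categories CAT
LEFT OUTER JOIN categories SUB ON SUB.parent_id = CAT.category_id
INNER JOIN organizations_categories CO
    ON CO.category_id IN (CAT.category_id, SUB.category_id)
INNER JOIN organizations O ON O.organization_id = CO.organization_id
WHERE CAT.category_id = ?
ORDER BY O.organization_id
"""


def organizations_under(category_id):
    """Names of distinct organizations under a category or sub-category."""
    return [name for _id, name in conn.execute(QUERY, (category_id,))]


print(organizations_under(1))   # a category
print(organizations_under(4))   # a sub-category
```

Organization ids 7 and 8 appear in the mapping table but not in organizations, so the inner join drops them.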

If your hierarchy may have an unknown number of levels (or you think it might in the future) then check out Joe Celko's Trees and Hierarchies in SQL for Smarties for alternative ways to model a hierarchy. It's probably a good idea to do that anyway.

I have a web system which has a classical parent-children menu saved in a database, with the fields id as the PK and parent_id pointing to the owning menu. (Yes, I know this doesn't scale very well, but that's another topic.)

So for these records (parent_id-id pairs):

0-7 0-4 4-9 4-14 4-16 9-6

I have this tree:

0
├ 7
└ 4
  ├ 9
  | └ 6     
  ├ 14
  └ 16

I need to hide a top node, so I have to make a list of all the descendants of that node, i.e. for 4, they will be (9, 6, 14, 16). Order doesn't matter.

I'm confused... does this fit into the classical tree problems? Or is it a graph one?

How can I build this structure and solve this problem using PHP?

Adjacency list models are very difficult to deal with. The company I am with now uses them for hierarchies and it causes great headaches. I have successfully used Celko's nested set models for prior employers and they work great for creating, maintaining and using hierarchies (trees).

I found this link which describes them: http://www.intelligententerprise.com/001020/celko.jhtml

But I would also recommend the book "SQL for Smarties: Advanced SQL Programming", written by Joe Celko, which covers nested sets.

Joe Celko's SQL for Smarties: Advanced SQL Programming

Joe Celko's Trees and Hierarchies in SQL for Smarties
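
If the menu stays as an adjacency list, the descendants of a node can also be collected in application code once the pairs are loaded. Below is a minimal sketch of that traversal; it is written in Python rather than PHP, and the function name descendants is an assumption, but the same loop carries over directly.

```python
from collections import defaultdict

# (parent, child) pairs from the question.
PAIRS = [(0, 7), (0, 4), (4, 9), (4, 14), (4, 16), (9, 6)]


def descendants(pairs, root):
    """Return every node below root, visiting each branch depth-first."""
    children = defaultdict(list)
    for parent, child in pairs:
        children[parent].append(child)

    result = []
    # The stack is seeded in reverse so children are visited in stored order.
    stack = list(reversed(children[root]))
    while stack:
        node = stack.pop()
        result.append(node)
        stack.extend(reversed(children[node]))
    return result


print(descendants(PAIRS, 4))
```

Since each node has a single parent, this is a tree problem; the explicit stack avoids recursion limits on deep menus.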

I have the following list:

enter image description here

It is a many-to-many recursive association where a category (Computers, Mac, PCs etc.) can have many other categories. Also, a category (like optical drives) can belong to both PCs and Mac. But "optical drives" shouldn't be allowed to have "PCs" as one of its categories, because that would create an endless loop.

I am wondering how I can create the above validation in Rails in an optimized way. Should I check all the parent categories? That would result in many SQL queries if there are many nested categories (for example, child categories nested 200 levels deep).

Another thought was to create a depth column in the database and add a validation so that a parent category cannot have as a child a category with a depth smaller than its own. But that would rule out placing a category with no common parents at another depth. For example, in the list above, optical drives could have Mac as a category even though Mac is at a higher depth level. But it couldn't have PCs because of the endless loop.

Since a node can have more than one parent, this is a graph, not a tree. While there are patterns to handle graphs in relational databases, I've found them cumbersome. This is where graph databases shine.

For sake of illustration, I'll show you a potential solution to your problem in Gremlin, a graph traversal language that runs in Neo4j through a plugin and is available through the Neo4j REST API.

The latest version of Gremlin included in Neo4j is Gremlin 1.5. You can download this version from Github to try out this code:

g = new Neo4jGraph('/tmp/so')

// add all vertices
categories = ['Computers', 'Mac', 'PCs', 'Hard Disks', 'Memory', 'Graphic Cards',
'Optical Drives', 'DVD-Reader', 'DVD-RW', 'Blue Ray', 'Blue Ray / DVD Combo']

categories.each { x -> g.addVertex(['name':x]) }

// show all of the vertices we just created with their properties
g.V.map
==>{name=Computers}
==>{name=Mac}
==>{name=PCs}
==>{name=Hard Disks}
==>{name=Memory}
==>{name=Graphic Cards}
==>{name=Optical Drives}
==>{name=DVD-Reader}
==>{name=DVD-RW}
==>{name=Blue Ray}
==>{name=Blue Ray / DVD Combo}

// for ease of this example, create a lookup table of these vertices
// in a production system, you would look up vertices in a Lucene index
i = [:]
g.V.transform {i[it.name] = it}

// create edges representing one category 'in' another
g.addEdge(i['Mac'], i['Computers'], 'in')
g.addEdge(i['PCs'], i['Computers'], 'in')

// PCs subgraph
g.addEdge(i['Hard Disks'], i['PCs'], 'in')
g.addEdge(i['Memory'], i['PCs'], 'in')
g.addEdge(i['Graphic Cards'], i['PCs'], 'in')

// optical drives subgraph
g.addEdge(i['Optical Drives'], i['PCs'], 'in')
g.addEdge(i['DVD-Reader'], i['Optical Drives'], 'in')
g.addEdge(i['DVD-RW'], i['Optical Drives'], 'in')
g.addEdge(i['Blue Ray'], i['Optical Drives'], 'in')
g.addEdge(i['Blue Ray / DVD Combo'], i['Optical Drives'], 'in')

// adding the optical drive subgraph to Mac is a one-liner
g.addEdge(i['Optical Drives'], i['Mac'], 'in')

// show the names of all vertices in the paths from Computers to child nodes three-levels down
i['Computers'].in.in.in.paths {it.name}
==>[Computers, PCs, Optical Drives, Blue Ray / DVD Combo]
==>[Computers, PCs, Optical Drives, Blue Ray]
==>[Computers, PCs, Optical Drives, DVD-RW]
==>[Computers, PCs, Optical Drives, DVD-Reader]
==>[Computers, Mac, Optical Drives, Blue Ray / DVD Combo]
==>[Computers, Mac, Optical Drives, Blue Ray]
==>[Computers, Mac, Optical Drives, DVD-RW]
==>[Computers, Mac, Optical Drives, DVD-Reader]

g.shutdown()

Gremlin can take some time to get used to, as it's a functional programming approach, but it's very powerful once you get the hang of it.

But the 'optical drives' shouldn't allow to have a category as the 'PCs' category because it will have an endless loop.

This sort of validation can be handled by pathfinding. If there exists a path from the current vertex to the child vertex, then don't allow the edge to be created. Neo4j includes a Pathfinder API to facilitate these sorts of searches.
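
The reachability check itself is small. Here is a language-neutral sketch of it in Python (not Gremlin and not the Neo4j Pathfinder API): before adding a "child in parent" edge, test whether the parent can already reach the child by following existing 'in' edges. The dictionary layout and both function names are assumptions for illustration.

```python
# Existing 'in' edges, stored as child -> set of parents (a subset of the example).
IN_EDGES = {
    "Mac": {"Computers"},
    "PCs": {"Computers"},
    "Optical Drives": {"PCs"},
    "DVD-RW": {"Optical Drives"},
}


def has_path(in_edges, source, target):
    """True if target can be reached from source by following 'in' edges."""
    seen, stack = set(), [source]
    while stack:
        node = stack.pop()
        if node == target:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(in_edges.get(node, ()))
    return False


def can_add_edge(in_edges, child, parent):
    """Adding child -> parent would close a loop if parent already reaches child."""
    return not has_path(in_edges, parent, child)


print(can_add_edge(IN_EDGES, "Optical Drives", "Mac"))   # allowed
print(can_add_edge(IN_EDGES, "PCs", "Optical Drives"))   # rejected: would loop
```

Each check visits a node at most once, so the cost is bounded by the size of the subgraph reachable from the proposed parent.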

Since you're in Rails, you might find Neo4j paired up with a RESTful wrapper like neography provides the integration between Rails and Neo4j that you'll need. Neo4j also offers the ability to create unmanaged extensions if you want to create custom RESTful endpoints.

I'm working with MySQL, and I'm in a situation where I need to select data from one table that matches an ID at any level of a parent -> child data hierarchy in the other table.

Furthermore, I would like to resolve this with a well-written SQL query, rather than a recursive function in my PHP code, as this feature will be used quite a bit.

I did try searching, and I have stumbled upon numerous similar problems (most of them being resolved), however none of them helped me.

To help illustrate the situation, here's my current setup

table "articles":

  • article_id
  • category_id
  • ...

table categories

  • category_id
  • parent_id
  • ...

I need to select all the articles from "articles" where "articles.category_id" is, let's say, 10. But also receive all the articles from all categories from the tree the "categories.category_id" 10 belongs to.

Meaning, where "10" is the parent and all of its children, and upwards where 10 is the child and all of its parents.

Possible without a recursive php function?

Thank you.

It is not possible to fetch an entire tree in one query using the Adjacency List design you're using, given that you're using MySQL.

Some other brands of database support SQL extensions to handle this kind of design: Oracle, Microsoft SQL Server, IBM DB2, and PostgreSQL 8.4 (currently in beta).

Other database designs exist that allow you to query trees more efficiently. This question has been addressed many times on StackOverflow, on blogs, and in articles.

You can also read "Trees and Hierarchies in SQL for Smarties" by Joe Celko, which goes into several such designs in depth.

I need to create MySQL tables that represent a tree structure like this:

Root
|- Chapter 1
|     |- Chapter 1.1
|     |    |- Article 1.1.1
|     |    |- Article 1.1.2
|     |- Article 1.2
|     |- Chapter 1.3
|          |- Chapter 1.3.1
|          |      |- Article 1.3.1.1
|          |      |- Article 1.3.1.2
|          |- Article 1.3.2
|          |- Article 1.3.3
|- Chapter 2
      |-Chapter 2.1
      |     |- ...
      |- Chapter 2.2
      |- ...

Simply speaking, there are two types of entity: Chapter and Article. An Article is the smallest entity and has no children, while a Chapter can contain sub-chapters or articles as child entities. Each entity will have an ID and a Name.

The order of children has no set rule, it can be Chapter, then Article, then Chapter again.

Another challenge is, when a chapter is re-positioned from one chapter to another chapter, all the children should also be re-positioned accordingly. For example, when I move chapter 1.3.1 to under chapter 1.1 (so chapter 1.3.1 becomes chapter 1.1.3), then the article 1.3.1.1 and article 1.3.1.2 should also be moved and become article 1.1.3.1 and article 1.1.3.2. And at the same time Article 1.3.2 and Article 1.3.3 will become Article 1.3.1 and Article 1.3.2 respectively.

So what I'm asking is: how should I design the database tables to represent these relationships? And what will the SQL look like for adding, deleting and re-positioning elements? (I can use Ajax to handle the re-positioning interaction, and use PHP to generate the hierarchy numbering.)

Also, as the tree is generally quite long, I wish to avoid updating of all elements just because only one element is repositioned. (Not sure if this wish is technically possible.)

The best information I have found on representing tree structures in a database is in Joe Celko's Trees and Hierarchies in SQL for Smarties.

You can probably find enough information on the web but I'd recommend getting the book, I found it very useful when implementing a nested set hierarchy.

You can either use an adjacency list or nested sets to model the tree in the database, I'm assuming you are using an adjacency list (where each entry has a parent attribute)

If you want to be able to move entire subtrees from one parent to another then it should be as easy as changing the parent_id (or whatever column you use to reference the parent) to point at the new parent. A nested set model requires changes to all the nodes when a subtree is moved.

However, other operations are easier on a nested set, such as selecting all the child nodes under a specific parent. This can be more difficult with the adjacency list model but it has become easier with the advent of recursive CTE.

If you are not worried about the order of the content I would avoid storing the chapter numbers with your data. Apply them when you select the data and then you avoid having to update every node when your tree changes.
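
Here is a rough sketch of that combination: an adjacency list with a sibling position column, where a move touches a single row and the chapter numbers are derived when the tree is read. It uses SQLite from Python; the node table, its columns, the one-letter names and both function names are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE node (id INTEGER PRIMARY KEY, parent_id INTEGER, position INTEGER,
                       kind TEXT, name TEXT);
    INSERT INTO node VALUES
        (1, NULL, 1, 'chapter', 'A'),   -- 1
        (2, 1,    1, 'chapter', 'B'),   -- 1.1
        (3, 1,    2, 'article', 'C'),   -- 1.2
        (4, 1,    3, 'chapter', 'D'),   -- 1.3
        (5, 4,    1, 'chapter', 'E'),   -- 1.3.1
        (6, 4,    2, 'article', 'F');   -- 1.3.2
""")


def numbered(conn):
    """Return {name: '1.3.1'}; numbers come from sibling order, not stored data."""
    rows = conn.execute(
        "SELECT id, parent_id, name FROM node ORDER BY position, id").fetchall()
    children = {}
    for node_id, parent_id, name in rows:
        children.setdefault(parent_id, []).append((node_id, name))

    labels = {}

    def walk(parent_id, prefix):
        for index, (node_id, name) in enumerate(children.get(parent_id, []), start=1):
            label = f"{prefix}.{index}" if prefix else str(index)
            labels[name] = label
            walk(node_id, label)

    walk(None, "")
    return labels


def move(conn, node_id, new_parent_id):
    """Re-parent one node (and implicitly its subtree) as the last child."""
    (last,) = conn.execute(
        "SELECT COALESCE(MAX(position), 0) FROM node WHERE parent_id = ?",
        (new_parent_id,)).fetchone()
    conn.execute("UPDATE node SET parent_id = ?, position = ? WHERE id = ?",
                 (new_parent_id, last + 1, node_id))


print(numbered(conn))
```

Calling move(conn, 5, 2) updates only row 5; on the next read E is numbered under B and F closes the gap under D, because positions are only used for ordering.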

Is there a helpful tool or script resource to aid conversion from old-school adjacency list tables to MPTT?

I would have thought it was a problem faced by a few cleverer souls than I in the past, and thought I'd check here first in case they came up with a clever solution in their travels - before embarking on my own journey to do such a thing.

This link includes code to make the conversion. Look at the bottom for the SQL version of the code (there's also a PHP script to do it).

I know that I've seen code from Joe Celko on how to do it, but I don't recall if that was on the intertubes or in his book on trees and hierarchies in SQL.
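
The core of such a conversion is a depth-first walk that hands out left and right numbers. Below is a minimal sketch in Python, not the linked script and not Celko's code; the function name to_nested_set and the sample tree are assumptions for illustration.

```python
def to_nested_set(parent_of):
    """Convert {node: parent} (parent None for a root) into {node: (lft, rgt)}."""
    children = {}
    roots = []
    for node, parent in parent_of.items():
        if parent is None:
            roots.append(node)
        else:
            children.setdefault(parent, []).append(node)

    bounds = {}
    counter = 1

    def visit(node):
        nonlocal counter
        lft = counter              # number handed out on the way down
        counter += 1
        for child in children.get(node, []):
            visit(child)
        bounds[node] = (lft, counter)   # number handed out on the way back up
        counter += 1

    for root in roots:
        visit(root)
    return bounds


# Hypothetical adjacency list: node -> parent
TREE = {"Electronics": None, "TVs": "Electronics", "LCD": "TVs",
        "40 inches": "LCD", "42 inches": "LCD"}
print(to_nested_set(TREE))
```

The walk is recursive, so a very deep tree may need an explicit stack instead; the resulting pairs can then be written back with one UPDATE per node.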

I've got model called Post:

class Post(models.Model):
    poster = models.ForeignKey(User)
    content = models.TextField(verbose_name='Text', max_length=1000)
    reply_to = models.ForeignKey('self', null=True, blank=True, default=None)   

This allows adding a 'first post' (with a blank reply_to), a reply to a post, and even a 'reply to reply'.

For example I've got in my database something like that:

First Post
    Reply one
        Reply to reply one
    Reply two
        Reply to reply two

How to load that tree of replies?

When I use:

r = Post.objects.filter(reply_to=FirstPost)

It returns of course:

Reply one
Reply two

Is it possible to load all related posts at once? I need it mainly to count all replies to first post.

No, I don't think there is a way to load all replies at once.

But you can add extra metadata to the post type to be able to run an in-order-style query, where counting the number of replies becomes a simple calculation with data already loaded for the parent node.

See this article on how you could do that (it uses the MySQL SQL dialect, and PHP, but the principles still apply).

Basically, you add left and right fields to the nodes in your tree that define an ordering, letting you easily count the number of items below a given root element in the tree. It's like a Binary Tree in a database table. The principle is taken from this excellent database design book: "Joe Celko's Trees and Hierarchies in SQL for Smarties".
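
To show the arithmetic, here is a small sketch in plain Python (no Django) using the reply tree from the question. The left/right numbers are assigned by hand for illustration, and both function names are made up; in a nested set each node consumes two numbers, so the count of descendants is (right - left - 1) / 2.

```python
# Hypothetical (left, right) values for the example thread.
POSTS = {
    "First Post": (1, 10),
    "Reply one": (2, 5),
    "Reply to reply one": (3, 4),
    "Reply two": (6, 9),
    "Reply to reply two": (7, 8),
}


def reply_count(left, right):
    """Number of descendants, computed from the node's own bounds only."""
    return (right - left - 1) // 2


def replies_under(posts, name):
    """Names of all posts nested inside the given post."""
    left, right = posts[name]
    return [n for n, (l, r) in posts.items() if left < l and r < right]


print(reply_count(*POSTS["First Post"]))
print(replies_under(POSTS, "Reply one"))
```

The count needs no extra query once the parent row is loaded, and the range test in replies_under corresponds to a single WHERE clause on the left column.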

See the image below: if I input z, I need to get x as the output.

This is used to find parents and grandparents.

Can I do this using MySQL?

Any help with this would be appreciated.

Can this be done using Facebook FQL?

sample

Hierarchies are complicated in MySQL. I don't think you can store a bunch of custom stuff in FQL, so you probably can't do it that way unless it's a built-in feature.

Here's an article about MySQL hierarchies:
http://www.phpro.org/tutorials/Managing-Hierarchical-Data-with-PHP-and-MySQL.html

See also this question with good answers:
Hierarchical Data in MySQL

There are even books written on the subject:
http://www.amazon.com/Hierarchies-Smarties-Kaufmann-Management-Systems/dp/1558609202

Complicated, but doable.

I would like to know how to insert, update and delete using a tree view backed by a database. I want to have one table but still be able to have parents and child nodes in the tree view, retrieved from the database.

Guys, if you have any links that might help me, please share them. Thanks.

if you have any links that might help me

No problem. Joe Celko's Trees and Hierarchies in SQL for Smarties has got you covered. All the tree inserts, updates and deletes you can handle.

Also:
Populate TreeView from DataBase