[Django]-MongoDB storage along with MySQL XPath features


So, if I understand the question right, you want to:

  1. find a given node in a tree, given some path through a portion of the tree to that node plus extra query expressions;
  2. then get back that node and everything below it.

With a materialized paths approach you can do the above. The main thing that needs tweaking is how to make the lookup fast when there is a path “a..b..c..d..e” to a document and you want to find documents whose path contains “..b..c..d..”. If the query started from the very top of the tree it would be easy, since a prefix match on the path can use an ordinary index. Here, however, it does not. It may make sense to use a combination approach, where the document for a node holds both its materialized path and an array of the node’s ancestors, something like:

{ path : ",a,b,c,d,e,", 
  ancestor : ['a','b','c','d','e']
}
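As a rough illustration of how such a document could be produced, here is a minimal Python sketch. The helper name `make_node_doc` is hypothetical, and the sketch assumes (as in the example above) that the node's own name is included in both fields.

```python
def make_node_doc(names):
    """Build the combined document for the node reached via `names` (root first)."""
    if not names:
        raise ValueError("a node needs at least one path component")
    return {
        # Leading and trailing commas let ",b,c,d," match whole components only.
        "path": "," + ",".join(names) + ",",
        # The same components as an array, so a multikey index can cover them.
        "ancestor": list(names),
    }


doc = make_node_doc(["a", "b", "c", "d", "e"])
```

Storing the path with a comma on both ends is what keeps a search for `,b,c,d,` from matching a component such as `bb` or `dd` by accident.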

We could index on ancestor, which will create a multikey index (one index entry per array element). Then we would do a query like the following to find nodes on path “…b,c,d…” with some efficiency:

find( { path : /,b,c,d,/, ancestor : 'd', <more_query_expressions_optionally> } )

In the above, the index on ancestor would be used, so only docs from ‘d’ down need to be inspected. The following could be tried, which might be even better depending on how smart the query optimizer is:

find( { path : /,b,c,d,/, ancestor : { $all : ['b','c','d'] }, ... } )
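To make the two-step idea concrete, here is a small in-memory Python sketch. It does not talk to MongoDB: a plain dictionary stands in for the multikey index, and the names `make_doc`, `ancestor_index` and `find_by_subpath` are hypothetical. It only shows the shape of the lookup, which is to narrow the candidates by the last ancestor first and then confirm each one with the path pattern.

```python
import re
from collections import defaultdict


def make_doc(names):
    # Same document shape as above: comma-delimited path plus ancestor array.
    return {"path": "," + ",".join(names) + ",", "ancestor": list(names)}


# A small sample tree, written as slash-separated paths for brevity.
docs = [make_doc(p.split("/"))
        for p in ("a/b/c/d/e", "a/b/c/d/f", "a/b/c/d", "a/x/d", "a/b/c")]

# Stand-in for the multikey index: one entry per array element.
ancestor_index = defaultdict(set)
for i, doc in enumerate(docs):
    for name in doc["ancestor"]:
        ancestor_index[name].add(i)


def find_by_subpath(names, docs, index):
    """Return the docs whose path contains the consecutive components `names`."""
    pattern = re.compile(re.escape("," + ",".join(names) + ","))
    # Step 1: the index limits the scan to docs at or below the last component.
    candidates = index.get(names[-1], set())
    # Step 2: the pattern rejects candidates reached through a different route.
    return [docs[i] for i in sorted(candidates) if pattern.search(docs[i]["path"])]


matches = find_by_subpath(["b", "c", "d"], docs, ancestor_index)
```

Here the node at `a/x/d` is a candidate because it has `d` in its ancestor array, but the path pattern filters it out, which is the same division of labour the queries above rely on.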
👤dm.


This is a very broad question, but one thing I would consider is an XML datastore such as MarkLogic or eXist. They are very good at optimizing queries on tree-structured data.

You might also consider rolling your own with a basic search index like MySQL, or possibly Lucene/Solr if you want better full-text search capabilities (phrases, synonyms, near queries, etc.).

I would index the text content of every named element and attribute (this is more or less the approach taken by the XML datastores I referred to above), use that index to retrieve a candidate list of documents, and then evaluate the XPath expressions on the candidates to weed out the false positives. This is not a small project, though.
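A toy version of that candidate-then-verify flow might look like the Python sketch below. It is only an illustration under several assumptions: the sample documents and the names `term_index` and `search` are made up, a dictionary stands in for the real search index, only element text (not attributes) is indexed, and the XPath is limited to the subset that the standard library's `xml.etree.ElementTree` supports.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical sample documents keyed by id.
xml_docs = {
    "doc1": "<library><book><title>Trees</title><author>Ann</author></book></library>",
    "doc2": "<library><book><title>Graphs</title><author>Bob</author></book></library>",
    "doc3": "<inbox><note><author>Ann</author></note></inbox>",
}

# Index the text content of every named element: (tag, text) -> document ids.
term_index = defaultdict(set)
for doc_id, source in xml_docs.items():
    for elem in ET.fromstring(source).iter():
        text = (elem.text or "").strip()
        if text:
            term_index[(elem.tag, text)].add(doc_id)


def search(xpath, tag, text):
    """Fetch candidates from the index, then keep those the XPath really matches."""
    candidates = term_index.get((tag, text), set())
    return sorted(
        doc_id for doc_id in candidates
        if ET.fromstring(xml_docs[doc_id]).find(xpath) is not None
    )


# Documents containing a book whose author is Ann.
hits = search("./book[author='Ann']", "author", "Ann")
```

The index returns both `doc1` and `doc3` as candidates because each has an `author` element with the text `Ann`; evaluating the XPath then drops `doc3`, where that author belongs to a note rather than a book.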
