RetroZilla/extensions/transformiix/docs/optimized-xpath.html
2015-10-20 23:03:22 -04:00

524 lines
24 KiB
HTML

<html>
<head>
<title>Optimizing XPath</title>
<style>
.comment {
font-style: italic;
}
h4 {
margin: 0;
padding: 0;
}
</style>
</head>
<body>
<h1>Optimizing XPath</h1>
<h2>Overview</h2>
<p>
This document outlines optimizations that we can perform to execute
xpath-expressions faster.
</p>
<h2>Stage 1, <a href="http://bugzilla.mozilla.org/show_bug.cgi?id=94471">DONE</a></h2>
<h3>Summary</h3>
<p>
Speed up retrieval of orderInfo objects by storing them in resp.
node instead of in a hash.
</p>
<h3>Details</h3>
<p>
We currently spend a GREAT deal of time looking through a
DOMHelper::orders hash looking for the orderInfo object for a
specific node. If we moved the ownership and retrieval of these
orderInfo objects to the Node class instead we will probably save
a lot of time. I.E. instead of calling
<code>myDOMHelper->getDocumentOrder(node)</code> you call
<code>node->getDocumentOrder()</code> which then returns the
orderInfo object.
</p>
<p>
It would also be nice if we at the same time fixed some bugs wrt the
orderInfo objects and the function that sorts nodes using them.
</p>
<p>
Bugs filed at this are 88964 and 94471
</p>
<h2>Stage 2, <a href="http://bugzilla.mozilla.org/show_bug.cgi?id=85893">DONE</a></h2>
<h3>Summary</h3>
<p>
Speed up document-order sorting by having the XPath engine always
return document-ordered nodesets.
</p>
<h3>Details</h3>
<p>
Currently the nodesets returned from the XPath engine are totally
unordered (or rather, have undefined order) which forces the XSLT
code to sort the nodesets. This is quite expensive since it requires
us to generate orderInfo objects for every node. Considering that
many XPath classes actually returns nodesets that are already
ordered in document order (or reversed document order) this seems a
bit unnecessary.
</p>
<p>
However we still need to handle the classes that don't by default
return document-ordered nodesets. A good example of this is the id()
function. For example "id('foo bar')" produces two nodes which the
id-function has no idea how they relate in terms of document order.
Another example is "foo | bar", where the UnionExpr object gets two
nodesets (ordered in document order since all XPath classes should
now return ordered nodesets) and need to merge them into a single
ordered nodeset.
</p>
<h2>Stage 3, <a href="http://bugzilla.mozilla.org/show_bug.cgi?id=205703">DONE</a></h2>
<h3>Summary</h3>
<p>
Refcount <code>ExprResult</code>s to reduce the number of objects
created during evaluation.
</p>
<h3>Details</h3>
<p>
Right now every subexpression creates a new object during evaluation.
If we refcounted objects we would be often be able to reuse the same
objects across multiple evaluations. We should also keep global
result-objects for true and false, that way expressions that return
bool-values would never have to create any objects.
</p>
<p>
This does however require that the returned objects arn't modified
since they might be used elsewhere. This is not a big problem in the
current code where we pretty much only modify nodesets in a couple
of places.
</p>
<p>
To be able to reuse objects across subexpressions we chould have an
<code>ExprResult::ensureModifyable</code>-function. This would
return the same object if the refcount is 1, and create a new object
to return otherwise. This is especially usefull for nodesets which
would be mostly used by a single object at a time. But it could be
just as usefull for other types, though then we might need a
<code>ExprResult::ensureModifyableOfType(ExprResult::ResultType)</code>-function
that only returned itself if it has a refcount of 1 and is of the
requsted type.
</p>
<h2>Stage 4</h2>
<h3>Summary</h3>
<p>
Speed up evaluation of XPath expressions by using specialized
classes for common optimizable expressions.
</p>
<h3>Details</h3>
<p>
Some common expressions are possible to execute faster if we have
classes that are specialized for them. For example the expression
"@foo" can be evaluated by simply calling |context->getAttributeNode
("foo")|, instead we now walk all attributes of the context node and
filter each node using a AttributeExpr. Below is a list of
expressions that I can think of that are optimizable, but there are
probably more.
</p>
<p>
One thing that we IMHO should keep in mind is to only put effort on
optimising expressions that are actually used in realworld
stylesheets. For example "foo | foo", "foo | bar[0]" and
"foo[position()]" can all be optimised to "foo", but since noone
should be so stupid as to write such an expression we shouldn't
spend time or codesize on that. Of course we should return the
correct result according to spec for those expressions, we just
shouldn't bother with evaluating them fast.
</p>
<p>
Apart from finding expression that we can evaluate more cleverly
there is also the problem of how and where do we create these
optimised objects instead of the unoptimised, general ones we create
now. And what are these optimised classes, should they be normal
Expr classes or should they be something else? We could also add
"optional" methods to Expr which have default implementations in
Expr, for example a ::isContextSensitive() which returns MB_TRUE
unless overridden. However we probably can't answer all this until
we know which expressions we want to optimised and how we want to
optimise them.
</p>
<p>
These expressions can be optimised:
</p>
<p><h4><a name="usecase01" href="#usecase01">Use case 1</a></h4>
<h4>Class:</h4>
Steps along the attribute axis which doesn't contain wildcards
<h4>Example:</h4>
@foo
<h4>What we do today:</h4>
Walk through the attributes NamedNodeMap and filter each node using a
NameTest.
<h4>What we could do:</h4>
Call getAttributeNode (or actually getAttributeNodeNS) on the
contextnode and return a nodeset containing just the returned node, or
an empty nodeset if NULL is returned.
</p>
<p><h4><a name="usecase02" href="#usecase02">Use case 2</a></h4>
<h4>Class:</h4>
Union expressions where each expression consists of a LocationStep and
all LocationSteps have the same axis. None of the LocationSteps have any
predicates (well, this could be relaxed a bit)
<h4>Example:</h4>
foo | bar | baz
<h4>What we do today:</h4>
Evaluate each LocationStep separately and thus walk the same path through
the document each time. During the walking the NodeTest is applied to
filter out the correct nodes. The resulting nodesets are then merged and
thus we generate orderInfo objects for most nodes.
<h4>What we could do:</h4>
Have just one LocationStep object which contains a NodeTest that is a
"UnionNodeTest" which contains a list of NodeTests. The UnionNodeTest
then tests each NodeTest until it finds one that returns true. If none
do then false is returned.
This results in just one walk along the axis and no need to generate any
orderInfo objects.
</p>
<p><h4><a name="usecase03" href="#usecase03">Use case 3</a></h4>
<h4>Class:</h4>
Steps where the predicates isn't context-node-list sensitive.
<h4>Example:</h4>
foo[@bar]
<h4>What we do today:</h4>
Build a nodeset of all nodes that match 'foo' and then filter the
nodeset through the predicate and thus do some node shuffling.
<h4>What we could do:</h4>
Create a "PredicatedNodeTest" that contains a NodeTest and a list of
predicates. The PredicatedNodeTest returns true if both the NodeTest
returns true and all predicats evaluate to true. Then let the
LocationStep have that PredicateNodeTest as NodeTest and no predicates.
This will save us the predicate filtering and thus some node shuffling.
(Note how this combines nicely with the previous optimisation...)
(Actually this can be done even if some predicates are context-list
sensitive, but only up until the first that isn't.)
</p>
<p><h4><a name="usecase04" href="#usecase04">Use case 4</a></h4>
<h4>Class:</h4>
PathExprs that only contains steps that from the child:: and attribute::
axes.
<h4>Example:</h4>
foo/bar/baz
<h4>What we do today:</h4>
For each step we evaluate the step once for every node in a nodeset
(for example for the second step the nodeset is the list of all "foo"
children) and then merge the resulting nodesets while making sure that
we keep the nodes in document order (and thus generate orderInfo
objects).
<h4>What we could do:</h4>
The same thing except that we don't merge the resulting nodeset, but
rather just concatenate them. We always know that the resulting nodesets
are after each other in node order.
</p>
<p><h4><a name="usecase05" href="#usecase05">Use case 5</a></h4>
<h4>Class:</h4>
List of predicates where some predicate are not context-list sensitive
<h4>Example:</h4>
foo[position() > 3][@bar][.//baz][position() > size() div 2][.//@fud]
<h4>What we do today:</h4>
Apply each predicate separately requiring us to shuffle nodes five times
in the above example.
<h4>What we could do:</h4>
Merge all predicates that are not node context-list sensitive into the
previous predicate. The above predicate list could be merged into the
following predicate list
foo[(position() > 3) and (@bar) and (.//baz)][(position() > size() div 2) and (.//@fud)]
Which only requires two node-shuffles
</p>
<p><h4><a name="usecase06" href="#usecase06">Use case 6</a></h4>
<h4>Class:</h4>
Predicates that are only context-list-position sensitive and not
context-list-size sensitive
<h4>Example:</h4>
foo[position() > 5][position() mod 2]
<h4>What we do today:</h4>
Build the entire list of nodes that matches "foo" and then apply the
predicates
<h4>What we could do:</h4>
Apply the predicates during the initial build of the first nodeset. We
would have to keep track of how many nodes has passed each and somehow
override the code that calculates the context-list-position.
</p>
<p><h4><a name="usecase07" href="#usecase07">Use case 7</a></h4>
<h4>Class:</h4>
Predicates that are constants
<h4>Example:</h4>
foo[5]
<h4>What we do today:</h4>
Perform the appropriate walk and build the entire nodeset. Then apply
the predicate.
<h4>What we could do:</h4>
There are three types of constant results; 1) Numerical values 2)
Results with a true boolean-value 3) Results with a false boolean value.
In the case of 1) we should only step up until the n:th node (5 in above
example) and then stop. For 2) we should completely ignore the predicate
and for 3) we should return an empty nodeset without doing any walking.
In some cases we can't at parsetime decide if a constant expression will
return a numerical or not, for example for "foo[$pos]", so the decision
of 1) 2) or 3) would have to be made at evaltime. However we should be
able to decide if it's a constant or not at parsetime.
Note that while evaluating a LocationStep [//foo] can be considered
constant.
</p>
<p><h4><a name="usecase08" href="#usecase08">Use case 8</a></h4>
<h4>Class:</h4>
PathExprs that contains '//' followed by an unpredicated child-step.
<h4>Example:</h4>
.//bar
<h4>What we do today:</h4>
We walk the entire subtree below the contextnode and at every node we
evaluate the 'bar'-expression which walks all the children of the
contextnode. This means that we'll walk the entire subtree twice.
<h4>What we could do:</h4>
Change the expression into "./descendant::bar". This means that we'll
only walk the tree once. This can only be done if there are no
predicates since the context-node-list will be different for
predicates in the new expression.
Note that this combines nicely with the "Steps where the predicates
isn't context-node-list sensitive" optimization.
</p>
<p><h4><a name="usecase09" href="#usecase09">Use case 9</a></h4>
<h4>Class:</h4>
PathExprs where the first step is '.'
<h4>Example:</h4>
./*
<h4>What we do today:</h4>
Evaluate the step "." which always returns the same node and then
evaluate the rest of the PathExpr.
<h4>What we could do:</h4>
Remove the '.'-step and simply evaluate the other steps. In the example
we could even remove the entire PathExpr-object and replace it with a
single Step-object.
</p>
<p><h4><a name="usecase10" href="#usecase10">Use case 10</a></h4>
<h4>Class:</h4>
Steps along the attribute axis which doesn't contain wildcards and
we only care about the boolean value.
<h4>Example:</h4>
foo[@bar], @foo or @bar
<h4>What we do today:</h4>
Evaluate the step and create a nodeset. Then get the bool-value of
the nodeset by checking if the nodeset contain any nodes.
<h4>What we could do:</h4>
Simply check if the current element has an attribute of the
requested name and return a bool-result.
</p>
<p><h4><a name="usecase11" href="#usecase11">Use case 11</a></h4>
<h4>Class:</h4>
Unpredicated steps where we only care about the boolean value.
<h4>Example:</h4>
foo[<u>processing-instruction()</u>]
<h4>What we do today:</h4>
Evaluate the step and create a nodeset. Then get the bool-value of
the nodeset by checking if the nodeset contain any nodes.
<h4>What we could do:</h4>
Walk along the axis until we find a node that matches the nodetest.
If one is found we can stop the walking and return a true
bool-result immediatly, otherwise a false bool-result is returned.
It might not be worth implementing all axes unless we can reuse
code from the normal Step-code. This could also be applied to
<code>PathExpr</code>s by getting the boolvalue of the last step.
</p>
<p><h4><a name="usecase12" href="#usecase12">Use case 12</a></h4>
<h4>Class:</h4>
Unpredicated steps where we only care about the string-value.
<h4>Example:</h4>
starts-with(<u>processing-instruction()</u>, 'hello')
<h4>What we do today:</h4>
Evaluate the step and create a nodeset. Then get the string-value of
the nodeset by getting the stringvalue of the first node.
<h4>What we could do:</h4>
Walk along the axis until we find a node that matches the nodetest.
If one is found we can stop the walking and return a string-result
containing the value of that node. Otherwise an empty string-result
can be returned.<br>
This can also be done when we only care about the number-value.<br>
This could be combined with the "Unpredicated steps where we only
care about the boolean value" optimization by instead of returning
a bool-value or string-value return a nodeset containing just the
found node. If that is done this optimization could be applied to
<code>PathExpr</code>s.
</p>
<p><h4><a name="usecase13" href="#usecase13">Use case 13</a></h4>
<h4>Class:</h4>
Expressions where the value of an attribute is compared to
a literal.
<h4>Example:</h4>
@bar = 'value'
<h4>What we do today:</h4>
Evaluate the attribute-step and then compare the resulting nodeset
to the value.
<h4>What we could do:</h4>
Get the attribute-value for the element and compare that directly
to the value. In the above example we would just call
<code>getAttr('bar', kNameSpaceID_None)</code> and compare the
resulting string with 'value'.
</p>
<p><h4><a name="usecase14" href="#usecase14">Use case 14</a></h4>
<h4>Class:</h4>
PathExprs where the last step has a predicate that is not
context-nodeset dependent and that contains a part that is not
context-node dependent.
<h4>Example:</h4>
foo/*[@bar = current()/@bar]
<h4>What we do today:</h4>
<h4>What we could do:</h4>
First evaluate "foo/*" and "current()/@bar". Then replace
"current()/@bar" with a literal (and possibly optimize) and filter
all nodes in the nodeset from "foo/*".
</p>
<p><h4><a name="usecase15" href="#usecase15">Use case 15</a></h4>
<h4>Class:</h4>
local-name() or namespace-uri() compared to a literal
<h4>Example:</h4>
local-name() = 'foo'
<h4>What we do today:</h4>
evaluate the local-name function and compare the string-result to
the string-result of the literal.
<h4>What we could do:</h4>
Atomize the literal (or get the namespaceID in case of
namespace-uri()) and then compare that to the atom-name of the
contextnode. This is primarily usefull when combined with the
previous class.
</p>
<p><h4><a name="usecase16" href="#usecase16">Use case 16</a></h4>
<h4>Class:</h4>
Comparisons where one side is a nodeset and the other is not a
bool-value.
<h4>Example:</h4>
//myElem = @baz
<h4>What we do today:</h4>
Evaluate both sides and then compare them according to the spec.
<h4>What we could do:</h4>
First of all we should start by evaluating the nodeset-side, if the
result is an empty nodeset false can be returned immediatly.
Otherwise we evaluate as normal. When both sides are nodesets we
should examine them and try to figure out which is faster to
evaluate. That expression should be evaluated first (probably
by making it the left-hand-side expression).
</p>
<p><h4><a name="usecase17" href="#usecase17">Use case 17</a></h4>
<h4>Class:</h4>
Comparisons where one side is a PathExpr and the other is a
bool-value.
<h4>Example:</h4>
baz = ($foo > $bar)
<h4>What we do today:</h4>
Evaluate both sides and then compare them.
<h4>What we could do:</h4>
Apply the "Steps where we only care about the boolean
value"-optimization on the PathExpr-side and then evaluate as usual.
</p>
<p><h4><a name="usecase18" href="#usecase18">Use case 18</a></h4>
<h4>Class:</h4>
Subexpressions that will be evaluated more then once where the only
change is in context it doesn't depend on
<h4>Example:</h4>
foo[@bar = <u>sum($var/@bar)</u>]
<h4>What we do today:</h4>
Reevaluate the subexpression every time we need it and every time
get the same result.
<h4>What we could do:</h4>
We should save the result from the first evaluation and just bring
it back the following time we need it. This can be done by
inserting an extra expression between the subexpression and its
parent, this expression would then first go look in a cache
available through the <code>nsIEvalContext</code>, if the value
isn't available there the original expression is evaluated and its
result is saved in the cache. The cache can be keyed on an integer
which is stored in the inserted 'cache-expression'.<br>
The cache itself could be created by another expression that is
inserted at the top of the expression. This way that expression
works as a boundry-point for the cache and can in theory be
inserted anywhere in an expression if needed.
</p>
<h2>Stage 5</h2>
<h3>Summary</h3>
<p>
Detect when we can concatenate nodesets instead of merge them in
PathExpr.
</p>
<h3>Details</h3>
<p>
Why can we for expressions like "foo/bar/baz" concatenate the resulting
nodesets without having to check nodeorder? Because at every step two
statements are true:
<ol>
<li>We iterate a nodeset where no node is an ancestor of another</li>
<li>The LocationStep only returns nodes that are members of the subtree
below the context-node</li>
</ol>
</p>
<p>
For example; While evaluating the second step in "foo/bar/baz" we
iterate a nodelist containing all "foo" children of the original
contextnode, i.e. none can be an ancestor of another. And the
LocationStep "bar" only returns children of the contextnode.
</p>
<p>
So, it would be nice if we can detect when this occurs as often as
possible. For example the expression "id(foo)/bar/baz" fulfils those
requirements if the nodeset returned from contains doesn't contain any
ancestors of other nodes in the nodeset, which probably often is the
case in real-world stylesheets.
</p>
<p>
We should perform this check on every step to be able to take advantage
of it as often as possible. For example the in expression
"id(@boss)/ancestor::team/members" we can't use this optimisation at the
second step since the ancestor axis returns nodes that are not members
of the contextnodes subtree. However we will probably be able to use the
optimisation at the third step since if iterated nodeset contains only
one node (and thus can't contain ancestors of it's members).
</p>
</body>
</html>