Module SuperlinearBackTracking
This module implements the analysis described in the paper: Valentin Wustholz, Oswaldo Olivo, Marijn J. H. Heule, and Isil Dillig: Static Detection of DoS Vulnerabilities in Programs that use Regular Expressions (Extended Version). (https://arxiv.org/pdf/1701.04045.pdf)
Theorem 3 from the paper describes the basic idea.
The following explains the idea using variables and predicate names that are used in the implementation:
We consider a pair of repetitions, which we will call pivot and pumpEnd.
We create a product automaton of 3-tuples of states (see StateTuple).
There exists a transition (a,b,c) -> (d,e,f) in the product automaton
iff there exists three transitions in the NFA a->d, b->e, c->f where those three
transitions all match a shared character char. (see getAThreewayIntersect)
We start a search in the product automaton at (pivot, pivot, pumpEnd),
and search for a series of transitions (a Trace), such that we end
at (pivot, pumpEnd, pumpEnd) (see isReachableFromStartTuple).
For example, consider the regular expression /^\d*5\w*$/.
The search will start at the tuple (\d*, \d*, \w*) and search
for a path to (\d*, \w*, \w*).
This path exists, and consists of a single transition in the product automaton,
where the three corresponding NFA edges all match the character "5".
The start-state in the NFA has an any-transition to itself, this allows us to
flag regular expressions such as /a*$/ - which does not have a start anchor -
and can thus start matching anywhere.
The implementation is not perfect.
It has the same suffix detection issue as the js/redos query, which can cause false positives.
It also doesn’t find all transitions in the product automaton, which can cause false negatives.
Import path
import codeql.regex.nfa.SuperlinearBackTrackingModules
| Make | A parameterized module implementing the analysis described in the above papers. |