A Little Love for /.*?/, Please

December 5th, 2005

I’ve recently been working in someone’s code that uses Regex quite a bit, and wanted to make a simple observation: I contend that in 99% of the regex usage I’ve seen, the pattern /.*/ is used where /.*?/ is intended.

When Regex processes .*, it consumes as much of the input as it can, then backtracks until it finds the pattern that follows your .*. Most of the time, though, the coder is clearly thinking in left-to-right terms: match up to the first occurrence of whatever comes next. An example:


string input = "<blah>here's some stuff</blah><other>otherstuff</other>";
foreach(string pattern in new string[] {">.*<", ">.*?<"})
{
    Match m = Regex.Match(input, pattern);
    Console.WriteLine(m.Value);
}

The first pattern is the greedy one. It will match “>here’s some stuff</blah><other>otherstuff<”. The second is the non-greedy version, and it matches “>here’s some stuff<”, which is probably the desired match. But even if you end up matching the right stuff most of the time, the non-greedy form should be more performant when the match ends well before the end of a long input string, since it stops at the first “<” instead of running to the end of the input and backtracking.
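To make the left-to-right point a bit more concrete, here is a rough sketch that collects every match of each pattern rather than just the first. The class name GreedyVsLazy and the helper AllMatches are made up for this illustration, and the input assumes the two-element string with the “<other>” tag intact.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class GreedyVsLazy
{
    // Hypothetical helper: returns every match of pattern in input,
    // in the order the regex engine finds them (left to right).
    public static string[] AllMatches(string input, string pattern)
    {
        return Regex.Matches(input, pattern)
                    .Cast<Match>()
                    .Select(m => m.Value)
                    .ToArray();
    }

    public static void Main()
    {
        string input = "<blah>here's some stuff</blah><other>otherstuff</other>";

        // Greedy first, then non-greedy.
        foreach (string pattern in new[] { ">.*<", ">.*?<" })
        {
            string[] found = AllMatches(input, pattern);
            Console.WriteLine("{0} -> {1} match(es)", pattern, found.Length);
            foreach (string value in found)
            {
                Console.WriteLine("    " + value);
            }
        }
    }
}
```

The greedy pattern yields a single match that swallows everything from the first “>” to the last “<”. The non-greedy pattern yields three: “>here’s some stuff<”, “><” (the gap between the two elements), and “>otherstuff<”. The stray “><” is a reminder that non-greedy is closer to what you meant, not a substitute for checking what actually got matched.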

5 Responses to “A Little Love for /.*?/, Please”

  1. Doug Finke Says:

    Ken

    Does anchoring a pattern “turn off” the greediness?

    “^>.*$”

  2. Ken Overton Says:

    As I understand it, adding ‘$’ would enforce greediness rather than turn it off. What I can’t figure out is what ‘.*?$’ would mean. I think it would force the non-greedy expression to be interpreted in a greedy fashion.

  3. Damien Morton Says:

    I’ve always wondered why greedy was the default and non-greedy was an option.

  4. Daniel Chait Says:

    Because this is America!

  5. Damien Morton Says:

    Don’t get me started!