A Little Love for /.*?/, Please
I’ve recently been working in someone’s code that uses Regex quite a bit, and wanted to make a simple observation: I contend that in 99% of the regex usage I’ve seen, the pattern /.*/ is used where /.*?/ is intended.
When Regex processes .*, it first consumes as much of the input as it can (to the end of the line, by default), then backtracks one character at a time until the rest of the pattern, whatever follows your .*, matches. Most of the time, though, the coder clearly is thinking in left-to-right terms. An example:
string input = "<blah>here's some stuff</blah><other>otherstuff</other>";
foreach (string pattern in new string[] {">.*<", ">.*?<"})
{
    // Regex and Match live in System.Text.RegularExpressions
    Match m = Regex.Match(input, pattern);
    Console.WriteLine(m.Value);
}
The first pattern is the greedy one. It will match “>here’s some stuff</blah><other>otherstuff<”. The second is the non-greedy version, and it matches “>here’s some stuff<”, which is probably the desired match. And even when the greedy form happens to match the right text, the non-greedy form should usually be faster on long inputs where the closing “<” sits near the start of the match, because the engine stops at the first “<” instead of running to the end of the input and backtracking.
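To see the difference across the whole string rather than just the first match, here is a minimal sketch that collects every match for each pattern. The class name GreedyVsLazy and the helper AllMatches are made up for illustration, and the input assumes both tags are well formed.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class GreedyVsLazy
{
    // Hypothetical helper: returns the text of every non-overlapping
    // match of pattern in input, in left-to-right order.
    public static string[] AllMatches(string input, string pattern)
    {
        return Regex.Matches(input, pattern)
                    .Cast<Match>()
                    .Select(m => m.Value)
                    .ToArray();
    }

    public static void Main()
    {
        // Assumed input: two well-formed tags back to back.
        string input = "<blah>here's some stuff</blah><other>otherstuff</other>";

        foreach (string pattern in new[] { ">.*<", ">.*?<" })
        {
            string[] matches = AllMatches(input, pattern);
            Console.WriteLine("{0} -> {1} match(es)", pattern, matches.Length);
            foreach (string value in matches)
            {
                Console.WriteLine("  " + value);
            }
        }
    }
}
```

The greedy pattern finds one long match that swallows both tags. The non-greedy pattern finds three: “>here's some stuff<”, the empty “><” between the two tags (since .*? is happy to match zero characters), and “>otherstuff<”. So non-greedy is closer to what people usually mean, but you may still need to filter out the empty ones.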



December 5th, 2005 at 4:00 pm
Ken
Does anchoring a pattern “turn off” the greediness?
“^>.*$”
December 5th, 2005 at 4:09 pm
As I understand it, adding ‘$’ would enforce greediness rather than turn it off. What I can’t figure out is what ‘.*?$’ would mean. I think it would force the non-greedy expression to be interpreted in a greedy fashion.
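A quick way to check is to run the anchored patterns side by side. This is only a sketch: the input string, the class name AnchoredLazy, and the helper FirstMatch are made up for illustration, and it assumes the default options (no RegexOptions.Multiline) and an input with no newlines.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AnchoredLazy
{
    // Hypothetical helper: returns the first match of pattern in input,
    // or null when the pattern does not match at all.
    public static string FirstMatch(string input, string pattern)
    {
        Match m = Regex.Match(input, pattern);
        return m.Success ? m.Value : null;
    }

    public static void Main()
    {
        // Assumed sample input with two '<' characters and no newline.
        string input = ">one<two<three";

        foreach (string pattern in new[] { "^>.*$", "^>.*?$", "^>.*<", "^>.*?<" })
        {
            Console.WriteLine("{0,-8} matches \"{1}\"", pattern, FirstMatch(input, pattern));
        }
    }
}
```

On this input, ‘^>.*$’ and ‘^>.*?$’ both match the whole string. The quantifier in the second one is still non-greedy and grows one character at a time, but ‘$’ can only succeed at the end, so it has to keep growing until it gets there. The difference only shows when something other than an anchor follows: ‘^>.*<’ matches “>one<two<” while ‘^>.*?<’ stops at “>one<”.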
December 6th, 2005 at 1:00 am
I’ve always wondered why greedy was the default and non-greedy was an option.
December 6th, 2005 at 12:23 pm
Because this is America!
December 7th, 2005 at 3:24 pm
Don’t get me started!