From: "Russ Cox" <rsc@swt...>
Subject: Re: [9fans] non greedy regular expressions
Date: Fri, 24 Oct 2008 14:10:36 -0700

> I thought greedy=leftmost-longest, while non-greedy=leftmost-first:

Greedy leftmost-first is different from leftmost-longest.
Search for /a*(ab)?/ in "ab".  The leftmost-longest match
is "ab", but the leftmost-first match (because of the
greedy star) is "a".  In the leftmost-first case, the greediness
of the star produced a shorter overall match.
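
A rough way to see the difference, sketched in Python: its re module
is a backtracking (leftmost-first) engine, so re.search gives the
leftmost-first answer directly.  Python has no leftmost-longest mode,
so the helper leftmost_longest below is a hypothetical brute-force
emulation (try every end point, longest first), not how a real
leftmost-longest engine works.

```python
import re

def leftmost_first(pattern, text):
    """Match reported by a backtracking engine (Python's re)."""
    m = re.search(pattern, text)
    return m.group(0) if m else None

def leftmost_longest(pattern, text):
    """Brute-force emulation: at the earliest start position that can
    match at all, return the longest substring matched in full."""
    compiled = re.compile(pattern)
    for start in range(len(text) + 1):
        # Try the longest candidate first, down to the empty string.
        for end in range(len(text), start - 1, -1):
            if compiled.fullmatch(text, start, end):
                return text[start:end]
    return None

print(leftmost_first(r"a*(ab)?", "ab"))    # a
print(leftmost_longest(r"a*(ab)?", "ab"))  # ab
```

The emulation is quadratic in the length of the text and is only meant
to make the two definitions concrete.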

> All the thinking about this is simply removed with 'non-greedy' ops.

But it isn't (or shouldn't be).
Using /\(.*?\)/ to match small parenthesized expressions
is fragile: /\(.*?\)/ in "(a(b))" matches "(a(b)".
In contrast, the solution you rejected /\([^)]*\)/ is more robust.
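
One way to see the fragility, sketched with Python's re module (a
leftmost-first engine that supports the non-greedy /.*?/): a
non-greedy star only prefers short matches, it does not guarantee
them.  The second test string and the trailing "x" are made up for
illustration; first_match is a hypothetical helper.

```python
import re

def first_match(pattern, text):
    """Return the text of the first match, or None."""
    m = re.search(pattern, text)
    return m.group(0) if m else None

# The nested case: the non-greedy star stops at the first ")".
print(first_match(r"\(.*?\)", "(a(b))"))       # (a(b)

# Add trailing context and the non-greedy star stretches across
# a ")" to make the rest of the pattern succeed.
print(first_match(r"\(.*?\)x", "(a) (b)x"))    # (a) (b)x

# The negated class can never cross a ")", so it finds the
# group that is actually followed by "x".
print(first_match(r"\([^)]*\)x", "(a) (b)x"))  # (b)x
```

The negated class states the constraint in the pattern itself, so the
result does not depend on what follows it.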

It doesn't make sense to shoehorn non-greedy and greedy
operators into an engine that provides leftmost-longest matching.
If you want a different model, you need to use a different program.
Perl has been ported.

Russ