Need a regexp superhero
User:
mcholste
Date: 12/18/2012 1:19 pm
Views: 782
Rating: 0
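Ok, this is getting way harder than it should be, trying to parse something like this:
"funcOne(funcTwo | funcThree(a,b)),c,d)"
parsed into:
"funcOne", "funcTwo | funcThree(a,b))", "c", "d"
I've got a hack working where I use one regex:
/([^\(]+)\(?( [^()]*+ | (?0) )\)?$/x
to capture the first function name and detect the inner nested parens, then I use that to create a "mask" to replace those strings within the larger string (to remove the commas), then do a normal split to find the last two params ("c", "d"). There's got to be a better way, suggestions?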
Thanks,
Martin
Re: Need a regexp superhero
User:
tmurray
Date: 12/18/2012 2:37 pm
Views: 0
Rating: 0
Perl's regex engine might be technically capable of this, but depending on
how generalized a solution you need, it might be easier to use Marpa or
Parse::RecDescent.
>
> mcholste wrote:
> Ok, this is getting way harder than it should be,
> trying to parse something like this:
>
> "funcOne(funcTwo | funcThree(a,b)),c,d)"
>
> parsed into:
> "funcOne", "funcTwo | funcThree(a,b))", "c", "d"
>
> I've got a hack working where I use one regex:
>
> /([^\(]+)\(?( [^()]*+ | (?0) )\)?$/x
>
> to capture the first function name and detect the inner nested parens,
> then I use that to create a "mask" to replace those strings within the
> larger string, (to remove the commas), then do a normal split to find the
> last two params ("c", "d"). There's got to be a better way,
> suggestions?
>
> Thanks,
>
> Martin
>
> View Online
>
> Madison Area Perl Mongers - MadMongers
> http://www.madmongers.org
>
Re: Need a regexp superhero
User:
miner
Date: 12/18/2012 4:25 pm
Views: 0
Rating: 0
On 12/18/12 2:37 PM, tmurray@wumpus-cave.net wrote:
> Perl's regex engine might be technically capable of this, but depending on
> how generalized a solution you need, it might be easier to use Marpa or
> Parse::RecDescent.
Agreed, not something that will easily be solved by a RegEx. Better solved with a parser.
Martin, do you have a typo in your string to be parsed?
"funcOne(funcTwo | funcThree(a,b)),c,d)"
Has unbalanced parens, unless my mind is playing tricks on me.
jon
--
Jonathan J. Miner
jon@jjminer.org | photos - http://photos.jjminer.org/
R.A.W. #1629 - http://www.reggaeambassadors.org
LOCS Webmaster - http://www.locs-buffett.org
jabber/gchat: camrycurbhopper@gmail.com  AIM: camrycurbhopper
"We don't have a town drunk... We all take turns!" -- James Slater, "Key West Address"
Re: Need a regexp superhero
User:
afbach
Date: 12/18/2012 3:55 pm
Views: 133
Rating: 0
On Tue, Dec 18, 2012 at 1:19 PM, <mcholste@gmail.com> wrote:
> trying to parse something like this:
> "funcOne(funcTwo | funcThree(a,b)),c,d)"
How consistent is your spacing/formatting there? What sort of variance (number of params etc.) are you working with?
I can't get your RE to work - did it get munged (I added some more space, but "?0" isn't something I could figure out):
/([^\(]+) \(? ( [^()]*+ | (?0) ) \)? $ /x
--
a
Andy Bach,
afbach@gmail.com
608 658-1890 cell
608 261-5738 wk
Re: Need a regexp superhero
User:
mcholste
Date: 12/18/2012 4:23 pm
Views: 0
Rating: 0
It's parsing user input, so the spacing and number of params are completely variable. From what I can tell (didn't dig into any man pages), "?0" will refer to the other paren in the paren pairs when matching, but that's more of a guess. That particular RE will get out the first function name followed by everything after (including the trailing paren).
perl -le 'my $re = qr/([^\(]+) \(? ( [^()]*+ | (?0) ) \)? $ /x; $str = "funcOne(funcTwo | funcThree(a,b)),c,d)"; if (@m = $str =~ $re){ print "y: " . join("#", @m); }'
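For reference, perlre describes "(?0)" as recursing into the whole pattern rather than pointing at a matching paren. A minimal sketch of that behaviour, assuming Perl 5.10 or later; the helper name first_balanced is made up for illustration and is not from the thread:

```perl
use strict;
use warnings;

# Hypothetical helper: return the first balanced "(...)" group in $str,
# or nothing if there is none.
sub first_balanced {
    my ($str) = @_;
    # (?0) re-enters the whole pattern, so every nested "(" starts a
    # fresh match of the same "( ... )" shape.
    return unless $str =~ /\((?:[^()]++|(?0))*+\)/;
    return substr $str, $-[0], $+[0] - $-[0];   # text of the overall match
}

# Balanced input: the group spans the whole argument list.
print first_balanced('funcOne(funcTwo | funcThree(a,b),c,d)'), "\n";
# Input with the extra ")": the first balanced group ends early.
print first_balanced('funcOne(funcTwo | funcThree(a,b)),c,d)'), "\n";
```

On the string with the extra closing paren, the match stops at "(funcTwo | funcThree(a,b))", which is one quick way to spot that kind of imbalance.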
Re: Need a regexp superhero
User:
tmurray
Date: 12/18/2012 4:49 pm
Views: 0
Rating: 0
If it's from user input, I'd definitely look into a proper parser. I'd go
for Marpa, though Parse::RecDescent is also a popular choice. See my
parsing talk from last month:
https://github.com/frezik/parsing-talk
Re: Need a regexp superhero
User:
afbach
Date: 12/18/2012 6:39 pm
Views: 115
Rating: 0
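On Tue, Dec 18, 2012 at 4:23 PM, <mcholste@gmail.com> wrote:
> It's parsing user input, so the spacing and number of params are completely variable. From what I can tell (didn't dig into any man pages), "?0" will refer to the other paren in the paren pairs when matching, but that's more of a guess.
Cool! I did not know about ?0 etc. - but my reading is a bit different. But
qr/([^\(]+) \(? ( [^()]*+ | (?0) ) \)? $ /x;
Nope. "[^()]*+" seems a syntax error (to me), yet I see [^()]++ in perlre [1], and I sort of get the first example there. I also can't see why you're using "\(?" - I thought you'd need one literal paren (not zero or one). I can't get it to work as intended, though I do get: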
$ perl -le 'my $re = qr/([^\(]+) \(? ( [^()]*+ | (?0) ) \)? $ /x; $str = "funcOne(funcTwo | funcThree(a,b),c,d)"; if (@m = $str =~ $re){ print "y: " . join("#", @m); }'
y: funcOne#funcTwo | funcThree(a,b),c,d)
Sorry, no help.
a
[1]
perldoc perlre says:
"(?PARNO)" "(?-PARNO)" "(?+PARNO)" "(?R)" "(?0)"
Similar to "(??{ code })" except it does not involve
compiling any code, instead it treats the contents of a
capture buffer as an independent pattern that must match at
the current position. Capture buffers contained by the
pattern will have the value as determined by the outermost
recursion.
PARNO is a sequence of digits (not starting with 0) whose
value reflects the paren-number of the capture buffer to
recurse to. "(?R)" recurses to the beginning of the whole
pattern. "(?0)" is an alternate syntax for "(?R)". If PARNO
is preceded by a plus or minus sign then it is assumed to be
relative, with negative numbers indicating preceding capture
buffers and positive ones following. Thus "(?-1)" refers to
the most recently declared buffer, and "(?+1)" indicates the
next buffer to be declared. Note that the counting for
relative recursion differs from that of relative
backreferences, in that with recursion unclosed buffers are
included.
The following pattern matches a function foo() which may
contain balanced parentheses as the argument.
    $re = qr{ (                   # paren group 1 (full function)
                foo
                (                 # paren group 2 (parens)
                  \(
                    (             # paren group 3 (contents of parens)
                    (?:
                     (?> [^()]+ ) # Non-parens without backtracking
                    |
                     (?2)         # Recurse to start of paren group 2
                    )*
                    )
                  \)
                )
              )
            }x;
If the pattern was used as follows
    'foo(bar(baz)+baz(bop))'=~/$re/
        and print "\$1 = $1\n",
                  "\$2 = $2\n",
                  "\$3 = $3\n";
the output produced should be the following:
$1 = foo(bar(baz)+baz(bop))
$2 = (bar(baz)+baz(bop))
$3 = bar(baz)+baz(bop)
If there is no corresponding capture buffer defined, then it
is a fatal error. Recursing deeper than 50 times without
consuming any input string will also result in a fatal error.
The maximum depth is compiled into perl, so changing it
requires a custom build.
The following shows how using negative indexing can make it
easier to embed recursive patterns inside of a "qr//"
construct for later use:
    my $parens = qr/(\((?:[^()]++|(?-1))*+\))/;
    if (/foo $parens \s+ \+ \s+ bar $parens/x) {
       # do something here...
    }
Note that this pattern does not behave the same way as the
equivalent PCRE or Python construct of the same form. In Perl
you can backtrack into a recursed group, in PCRE and Python
the recursed into group is treated as atomic. Also, modifiers
are resolved at compile time, so constructs like (?i:(?1)) or
(?:(?i)(?1)) do not affect how the sub-pattern will be
processed.
--
a
Andy Bach,
afbach@gmail.com
608 658-1890 cell
608 261-5738 wk
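The "$parens" idiom from the perlre excerpt above can be stretched into a small splitter for the original problem. This is only a sketch under a few assumptions: Perl 5.10 or later, function names made of word characters, and input whose parens actually balance. The helper name parse_call_recursive is made up for illustration.

```perl
use strict;
use warnings;

# Balanced parenthesised group, as in the perlre example; (?-1) recurses
# into the group itself, wherever the pattern ends up being interpolated.
my $balanced = qr/(\((?:[^()]++|(?-1))*+\))/;

# Hypothetical helper: returns (name, arg1, arg2, ...), or an empty list
# when the string is not a single balanced call.
sub parse_call_recursive {
    my ($str) = @_;
    my ($name, $group) = $str =~ /^\s*(\w+)\s*$balanced\s*\z/
        or return;
    my $body = substr $group, 1, -1;          # strip the outer parens
    my @args;
    # One argument = a run of plain text and balanced groups, ended by a
    # top-level comma or the end of the body.
    while ($body =~ /\G\s*((?:[^(),]++|$balanced)+)\s*(?:,|\z)/g) {
        (my $arg = $1) =~ s/\s+\z//;          # trim trailing whitespace
        push @args, $arg;
    }
    return ($name, @args);
}

print join('#', parse_call_recursive('funcOne(funcTwo | funcThree(a,b),c,d)')), "\n";
# prints: funcOne#funcTwo | funcThree(a,b)#c#d
```

Commas inside nested parens are swallowed by the balanced group, so no masking step is needed. The string with the extra ")" is rejected outright because the anchored match fails.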
Re: Need a regexp superhero
User:
chrisdolan
Date: 12/18/2012 9:08 pm
Views: 109
Rating: 0
If you want to parse with regexps, then a good pattern to consider is:
m/ \G ... /c
That's the continued match, which lets you mix a collection of regexp snippets with code to do the stuff that doesn't come naturally to regexps. I used that pattern extensively in CAM::PDF which needed to support arbitrarily deep recursive data structures.
Chris
Re: Need a regexp superhero
User:
mcholste
Date: 12/18/2012 11:02 pm
Views: 113
Rating: 0
Wow, a lot to go through here. Yes, Jon, there was a typo in the example with the extra closing paren. I think that Marpa and a full grammar may be a bit overkill, but then again, maybe not. I'm trying to get Chris's continued match to work in a demo but am thus far unsuccessful.
Re: Need a regexp superhero
User:
chrisdolan
Date: 12/19/2012 7:39 am
Views: 0
Rating: 0
If you need an example, go here: http://cpansearch.perl.org/src/CDOLAN/CAM-PDF-1.58/lib/CAM/PDF.pm
and search for "sub parseAny" and look at the methods it calls.
I should have said "m/ \G ... /cg". The "g" is a critical piece, and the "\G" position doesn't work without it. The "c" flag's purpose is "don't reset pos on failed matches when using /g".
Chris
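A rough sketch of the continued-match idea applied to the function-call string from this thread. It is not taken from CAM::PDF; the helper name parse_call_scan and the error messages are made up for illustration. Each snippet is anchored with \G and uses /gc, and plain code tracks the paren depth:

```perl
use strict;
use warnings;

# Hypothetical helper: returns (name, arg1, arg2, ...) or dies on bad input.
sub parse_call_scan {
    my ($str) = @_;
    pos($str) = 0;
    $str =~ /\G\s*(\w+)\s*\(/gc or die "expected a function name and '('\n";
    my ($name, $depth, $cur, @args) = ($1, 1, '');
    while ($depth > 0) {
        if    ($str =~ /\G([^(),]+)/gc) { $cur .= $1 }                  # plain text
        elsif ($str =~ /\G\(/gc)        { $depth++; $cur .= '(' }       # nested open
        elsif ($str =~ /\G\)/gc)        { $cur .= ')' if --$depth > 0 } # close
        elsif ($str =~ /\G,/gc) {
            if ($depth > 1) { $cur .= ',' }                # comma inside nested parens
            else            { push @args, $cur; $cur = '' } # top-level separator
        }
        else { die "unbalanced '(' in input\n" }           # ran off the end
    }
    push @args, $cur;
    $str =~ /\G\s*\z/gc or die "trailing text at offset " . pos($str) . "\n";
    s/^\s+|\s+\z//g for @args;                             # trim each argument
    return ($name, @args);
}

print join('#', parse_call_scan('funcOne(funcTwo | funcThree(a,b),c,d)')), "\n";
# prints: funcOne#funcTwo | funcThree(a,b)#c#d
```

Because /c keeps pos() where it was after a failed match, the elsif chain can try one snippet after another at the same position, and the extra ")" in the original example surfaces as a "trailing text" error instead of a silent mis-parse.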
Re: Need a regexp superhero
User:
david-delikat
Date: 12/18/2012 4:25 pm
Views: 0
Rating: 0
here's a solution...
perl -le 'my $parm = qr/((?:(?:\w+(?:\([\w,]+\))?)|[\s|]+)+)/;
  print q/"/, join( q/" "/, ("funcOne(funcTwo | funcThree(a,b),c,d)" =~ /^(\w+)\($parm(?:,$parm(?:,$parm))\)$/)), q/"/'
==>
"funcOne" "funcTwo | funcThree(a,b)" "c" "d"
problem is that you have to have '(?:,$parm)' for every possible parameter
otherwise you only get the last one.
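A tiny illustration of that "only the last one" behaviour, using a made-up input string: a capture group inside a repeated group keeps just its final iteration, whereas a /g match in list context returns every hit.

```perl
use strict;
use warnings;

# The third capture group sits inside (?: ... )*, so each repetition
# overwrites it; 'b' is lost and only 'c' survives.
my @captured = 'f(a,b,c)' =~ /^(\w+)\((\w+)(?:,(\w+))*\)$/;
print join('#', @captured), "\n";   # prints: f#a#c

# With /g in list context, every match of the group is returned.
my @all = 'a,b,c' =~ /(\w+)/g;
print join('#', @all), "\n";        # prints: a#b#c
```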
your solution would be simpler if you did something like:
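my $parm = qr/((?:(?:\w+(?:\([\w,]+\))?)|[\s|]+)+)/;   # qr// takes no /g or /c
my $res;
if( $line =~ /\G(\w+)\(/gc ) {             # function name, then "("
    $res->{name} = $1;
    until( $line =~ /\G\)/gc ) {           # stop at the closing ")"
        if( $line =~ /\G,?$parm/gc ) {     # optional comma, then one parameter
            push @{$res->{parms}}, $1;
        }
        else {
            last;                          # unparseable input; don't loop forever
        }
    }
}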
or something like that… it would handle any number of parameters.
or better yet, try Marpa or Parse::RecDescent...