Profile of PPIx/Regexp.pm

Filename	/Users/timbo/perl5/perlbrew/perls/perl-5.18.2/lib/site_perl/5.18.2/PPIx/Regexp.pm
Statements	Executed 22 statements in 890µs

Subroutines
Calls	P	F	Exclusive Time	Inclusive Time	Subroutine
1	1	1	1.79ms	32.9ms	PPIx::Regexp::::BEGIN@90PPIx::Regexp::BEGIN@90
1	1	1	12µs	24µs	PPIx::Regexp::::BEGIN@85PPIx::Regexp::BEGIN@85
1	1	1	11µs	16µs	PPIx::Regexp::::BEGIN@86PPIx::Regexp::BEGIN@86
1	1	1	8µs	29µs	PPIx::Regexp::::BEGIN@92PPIx::Regexp::BEGIN@92
1	1	1	7µs	4.17ms	PPIx::Regexp::::BEGIN@88PPIx::Regexp::BEGIN@88
1	1	1	7µs	26µs	PPIx::Regexp::::BEGIN@93PPIx::Regexp::BEGIN@93
1	1	1	4µs	4µs	PPIx::Regexp::::BEGIN@91PPIx::Regexp::BEGIN@91
0	0	0	0s	0s	PPIx::Regexp::::_cache_sizePPIx::Regexp::_cache_size
0	0	0	0s	0s	PPIx::Regexp::::_componentPPIx::Regexp::_component
0	0	0	0s	0s	PPIx::Regexp::::can_be_quantifiedPPIx::Regexp::can_be_quantified
0	0	0	0s	0s	PPIx::Regexp::::capture_namesPPIx::Regexp::capture_names
0	0	0	0s	0s	PPIx::Regexp::::delimitersPPIx::Regexp::delimiters
0	0	0	0s	0s	PPIx::Regexp::::errstrPPIx::Regexp::errstr
0	0	0	0s	0s	PPIx::Regexp::::failuresPPIx::Regexp::failures
0	0	0	0s	0s	PPIx::Regexp::::flush_cachePPIx::Regexp::flush_cache
0	0	0	0s	0s	PPIx::Regexp::::max_capture_numberPPIx::Regexp::max_capture_number
0	0	0	0s	0s	PPIx::Regexp::::modifierPPIx::Regexp::modifier
0	0	0	0s	0s	PPIx::Regexp::::modifier_assertedPPIx::Regexp::modifier_asserted
0	0	0	0s	0s	PPIx::Regexp::::newPPIx::Regexp::new
0	0	0	0s	0s	PPIx::Regexp::::new_from_cachePPIx::Regexp::new_from_cache
0	0	0	0s	0s	PPIx::Regexp::::regular_expressionPPIx::Regexp::regular_expression
0	0	0	0s	0s	PPIx::Regexp::::replacementPPIx::Regexp::replacement
0	0	0	0s	0s	PPIx::Regexp::::sourcePPIx::Regexp::source
0	0	0	0s	0s	PPIx::Regexp::::typePPIx::Regexp::type

Call graph for these subroutines as a Graphviz dot language file.

Line	State ments	Time on line	Calls	Time in subs	Code
1					=head1 NAME
2
3					PPIx::Regexp - Represent a regular expression of some sort
4
5					=head1 SYNOPSIS
6
7					use PPIx::Regexp;
8					use PPIx::Regexp::Dumper;
9					my $re = PPIx::Regexp->new( 'qr{foo}smx' );
10					PPIx::Regexp::Dumper->new( $re )
11					->print();
12
13					=head1 INHERITANCE
14
15					C<PPIx::Regexp> is a L<PPIx::Regexp::Node\|PPIx::Regexp::Node>.
16
17					C<PPIx::Regexp> has no descendants.
18
19					=head1 DESCRIPTION
20
21					The purpose of the F<PPIx-Regexp> package is to parse regular
22					expressions in a manner similar to the way the L<PPI\|PPI> package parses
23					Perl. This class forms the root of the parse tree, playing a role
24					similar to L<PPI::Document\|PPI::Document>.
25
26					This package shares with L<PPI\|PPI> the property of being round-trip
27					safe. That is,
28
29					my $expr = 's/ ( \d+ ) ( \D+ ) /$2$1/smxg';
30					my $re = PPIx::Regexp->new( $expr );
31					print $re->content() eq $expr ? "yes\n" : "no\n"
32
33					should print 'yes' for any valid regular expression.
34
35					Navigation is similar to that provided by L<PPI\|PPI>. That is to say,
36					things like C<children>, C<find_first>, C<snext_sibling> and so on all
37					work pretty much the same way as in L<PPI\|PPI>.
38
39					The class hierarchy is also similar to L<PPI\|PPI>. Except for some
40					utility classes (the dumper, the lexer, and the tokenizer) all classes
41					are descended from L<PPIx::Regexp::Element\|PPIx::Regexp::Element>, which
42					provides basic navigation. Tokens are descended from
43					L<PPIx::Regexp::Token\|PPIx::Regexp::Token>, which provides content. All
44					containers are descended from L<PPIx::Regexp::Node\|PPIx::Regexp::Node>,
45					which provides for children, and all structure elements are descended
46					from L<PPIx::Regexp::Structure\|PPIx::Regexp::Structure>, which provides
47					beginning and ending delimiters, and a type.
48
49					There are two features of L<PPI\|PPI> that this package does not provide
50					- mutability and operator overloading. There are no plans for serious
51					mutability, though something like L<PPI\|PPI>'s C<prune> functionality
52					might be considered. Similarly there are no plans for operator
53					overloading, which appears to the author to represent a performance hit
54					for little tangible gain.
55
56					=head1 NOTICE
57
58					The author will attempt to preserve the documented interface, but if the
59					interface needs to change to correct some egregiously bad design or
60					implementation decision, then it will change. Any incompatible changes
61					will go through a deprecation cycle.
62
63					The goal of this package is to parse well-formed regular expressions
64					correctly. A secondary goal is not to blow up on ill-formed regular
65					expressions. The correct identification and characterization of
66					ill-formed regular expressions is B<not> a goal of this package.
67
68					This policy attempts to track features in development releases as well
69					as public releases. However, features added in a development release and
70					then removed before the next production release B<will not> be tracked,
71					and any functionality relating to such features B<will be removed>. The
72					issue here is the potential re-use (with different semantics) of syntax
73					that did not make it into the production release.
74
75					=head1 METHODS
76
77					This class provides the following public methods. Methods not documented
78					here are private, and unsupported in the sense that the author reserves
79					the right to change or remove them without notice.
80
81					=cut
82
83					package PPIx::Regexp;
84
85	2	20µs	2	35µs	# spent 24µs (12+12) within PPIx::Regexp::BEGIN@85 which was called: # once (12µs+12µs) by Perl::Critic::Document::BEGIN@28 at line 85 use strict; # spent 24µs making 1 call to PPIx::Regexp::BEGIN@85 # spent 12µs making 1 call to strict::import
86	2	23µs	2	21µs	# spent 16µs (11+5) within PPIx::Regexp::BEGIN@86 which was called: # once (11µs+5µs) by Perl::Critic::Document::BEGIN@28 at line 86 use warnings; # spent 16µs making 1 call to PPIx::Regexp::BEGIN@86 # spent 5µs making 1 call to warnings::import
87
88	2	26µs	2	8.32ms	# spent 4.17ms (7µs+4.16) within PPIx::Regexp::BEGIN@88 which was called: # once (7µs+4.16ms) by Perl::Critic::Document::BEGIN@28 at line 88 use base qw{ PPIx::Regexp::Node }; # spent 4.17ms making 1 call to PPIx::Regexp::BEGIN@88 # spent 4.16ms making 1 call to base::import
89
90	2	98µs	1	32.9ms	# spent 32.9ms (1.79+31.1) within PPIx::Regexp::BEGIN@90 which was called: # once (1.79ms+31.1ms) by Perl::Critic::Document::BEGIN@28 at line 90 use PPIx::Regexp::Lexer (); # spent 32.9ms making 1 call to PPIx::Regexp::BEGIN@90
91	2	20µs	1	4µs	# spent 4µs within PPIx::Regexp::BEGIN@91 which was called: # once (4µs+0s) by Perl::Critic::Document::BEGIN@28 at line 91 use PPIx::Regexp::Token::Modifier (); # For its modifier manipulations. # spent 4µs making 1 call to PPIx::Regexp::BEGIN@91
92	2	21µs	2	50µs	# spent 29µs (8+22) within PPIx::Regexp::BEGIN@92 which was called: # once (8µs+22µs) by Perl::Critic::Document::BEGIN@28 at line 92 use PPIx::Regexp::Util qw{ __instance }; # spent 29µs making 1 call to PPIx::Regexp::BEGIN@92 # spent 22µs making 1 call to Exporter::import
93	2	675µs	2	46µs	# spent 26µs (7+20) within PPIx::Regexp::BEGIN@93 which was called: # once (7µs+20µs) by Perl::Critic::Document::BEGIN@28 at line 93 use Scalar::Util qw{ refaddr }; # spent 26µs making 1 call to PPIx::Regexp::BEGIN@93 # spent 20µs making 1 call to Exporter::import
94
95	1	600ns			our $VERSION = '0.036';
96
97					=head2 new
98
99					my $re = PPIx::Regexp->new('/foo/');
100
101					This method instantiates a C<PPIx::Regexp> object from a string, a
102					L<PPI::Token::QuoteLike::Regexp\|PPI::Token::QuoteLike::Regexp>, a
103					L<PPI::Token::Regexp::Match\|PPI::Token::Regexp::Match>, or a
104					L<PPI::Token::Regexp::Substitute\|PPI::Token::Regexp::Substitute>.
105					Honestly, any L<PPI::Element\|PPI::Element> will do, but only the three
106					Regexp classes mentioned previously are likely to do anything useful.
107
108					Optionally you can pass one or more name/value pairs after the regular
109					expression. The possible options are:
110
111					=over
112
113					=item default_modifiers array_reference
114
115					This option specifies a reference to an array of default modifiers to
116					apply to the regular expression being parsed. Each modifier is specified
117					as a string. Any actual modifiers found supersede the defaults.
118
119					When applying the defaults, C<'?'> and C<'/'> are completely ignored,
120					and C<'^'> is ignored unless it occurs at the beginning of the modifier.
121					The first dash (C<'-'>) causes subsequent modifiers to be negated.
122
123					So, for example, if you wish to produce a C<PPIx::Regexp> object
124					representing the regular expression in
125
126					use re '/smx';
127					{
128					no re '/x';
129					m/ foo /;
130					}
131
132					you would (after some help from L<PPI\|PPI> in finding the relevant
133					statements), do something like
134
135					my $re = PPIx::Regexp->new( 'm/ foo /',
136					default_modifiers => [ '/smx', '-/x' ] );
137					`
138					=item encoding name
139
140					This option specifies the encoding of the regular expression. This is
141					passed to the tokenizer, which will C<decode> the regular expression
142					string before it tokenizes it. For example:
143
144					my $re = PPIx::Regexp->new( '/foo/',
145					encoding => 'iso-8859-1',
146					);
147
148					=item trace number
149
150					If greater than zero, this option causes trace output from the parse.
151					The author reserves the right to change or eliminate this without
152					notice.
153
154					=back
155
156					Passing optional input other than the above is not an error, but neither
157					is it supported.
158
159					=cut
160
161					{
162
163	2	600ns			my $errstr;
164
165					sub new {
166					my ( $class, $content, %args ) = @_;
167					ref $class and $class = ref $class;
168
169					$errstr = undef;
170
171					my $tokenizer = PPIx::Regexp::Tokenizer->new(
172					$content, %args ) or do {
173					$errstr = PPIx::Regexp::Tokenizer->errstr();
174					return;
175					};
176
177					my $lexer = PPIx::Regexp::Lexer->new( $tokenizer, %args );
178					my @nodes = $lexer->lex();
179					my $self = $class->SUPER::_new( @nodes );
180					$self->{source} = $content;
181					$self->{failures} = $lexer->failures();
182					$self->{effective_modifiers} =
183					$tokenizer->__effective_modifiers();
184					return $self;
185					}
186
187					sub errstr {
188					return $errstr;
189					}
190
191					}
192
193					=head2 new_from_cache
194
195					This static method wraps L</new> in a caching mechanism. Only one object
196					will be generated for a given L<PPI::Element\|PPI::Element>, no matter
197					how many times this method is called. Calls after the first for a given
198					L<PPI::Element\|PPI::ELement> simply return the same C<PPIx::Regexp>
199					object.
200
201					When the C<PPIx::Regexp> object is returned from cache, the values of
202					the optional arguments are ignored.
203
204					Calls to this method with the regular expression in a string rather than
205					a L<PPI::Element\|PPI::Element> will not be cached.
206
207					B<Caveat:> This method is provided for code like
208					L<Perl::Critic\|Perl::Critic> which might instantiate the same object
209					multiple times. The cache will persist until L</flush_cache> is called.
210
211					=head2 flush_cache
212
213					$re->flush_cache(); # Remove $re from cache
214					PPIx::Regexp->flush_cache(); # Empty the cache
215
216					This method flushes the cache used by L</new_from_cache>. If called as a
217					static method with no arguments, the entire cache is emptied. Otherwise
218					any objects specified are removed from the cache.
219
220					=cut
221
222					{
223
224	2	100ns			my %cache;
225
226	1	200ns			our $DISABLE_CACHE; # Leave this undocumented, at least for
227					# now.
228
229					sub _cache_size {
230					return scalar keys %cache;
231					}
232
233					sub new_from_cache {
234					my ( $class, $content, %args ) = @_;
235
236					__instance( $content, 'PPI::Element' )
237					or return $class->new( $content, %args );
238
239					$DISABLE_CACHE and return $class->new( $content, %args );
240
241					my $addr = refaddr( $content );
242					exists $cache{$addr} and return $cache{$addr};
243
244					my $self = $class->new( $content, %args )
245					or return;
246
247					$cache{$addr} = $self;
248
249					return $self;
250
251					}
252
253					sub flush_cache {
254					my @args = @_;
255
256					ref $args[0] or shift @args;
257
258					if ( @args ) {
259					foreach my $obj ( @args ) {
260					if ( __instance( $obj, __PACKAGE__ ) &&
261					__instance( ( my $parent = $obj->source() ),
262					'PPI::Element' ) ) {
263					delete $cache{ refaddr( $parent ) };
264					}
265					}
266					} else {
267					%cache = ();
268					}
269					return;
270					}
271
272					}
273
274					sub can_be_quantified { return; }
275
276
277					=head2 capture_names
278
279					foreach my $name ( $re->capture_names() ) {
280					print "Capture name '$name'\n";
281					}
282
283					This convenience method returns the capture names found in the regular
284					expression.
285
286					This method is equivalent to
287
288					$self->regular_expression()->capture_names();
289
290					except that if C<< $self->regular_expression() >> returns C<undef>
291					(meaning that something went terribly wrong with the parse) this method
292					will simply return.
293
294					=cut
295
296					sub capture_names {
297					my ( $self ) = @_;
298					my $re = $self->regular_expression() or return;
299					return $re->capture_names();
300					}
301
302					=head2 delimiters
303
304					print join("\t", PPIx::Regexp->new('s/foo/bar/')->delimiters());
305					# prints '// //'
306
307					When called in list context, this method returns either one or two
308					strings, depending on whether the parsed expression has a replacement
309					string. In the case of non-bracketed substitutions, the start delimiter
310					of the replacement string is considered to be the same as its finish
311					delimiter, as illustrated by the above example.
312
313					When called in scalar context, you get the delimiters of the regular
314					expression; that is, element 0 of the array that is returned in list
315					context.
316
317					Optionally, you can pass an index value and the corresponding delimiters
318					will be returned; index 0 represents the regular expression's
319					delimiters, and index 1 represents the replacement string's delimiters,
320					which may be undef. For example,
321
322					print PPIx::Regexp->new('s{foo}<bar>')-delimiters(1);
323					# prints '<>'
324
325					If the object was not initialized with a valid regexp of some sort, the
326					results of this method are undefined.
327
328					=cut
329
330					sub delimiters {
331					my ( $self, $inx ) = @_;
332
333					my @rslt;
334					foreach my $method ( qw{ regular_expression replacement } ) {
335					defined ( my $obj = $self->$method() ) or next;
336					push @rslt, $obj->delimiters();
337					}
338
339					defined $inx and return $rslt[$inx];
340					wantarray and return @rslt;
341					defined wantarray and return $rslt[0];
342					return;
343					}
344
345					=head2 errstr
346
347					This static method returns the error string from the most recent attempt
348					to instantiate a C<PPIx::Regexp>. It will be C<undef> if the most recent
349					attempt succeeded.
350
351					=cut
352
353					# defined above, just after sub new.
354
355					=head2 failures
356
357					print "There were ", $re->failures(), " parse failures\n";
358
359					This method returns the number of parse failures. This is a count of the
360					number of unknown tokens plus the number of unterminated structures plus
361					the number of unmatched right brackets of any sort.
362
363					=cut
364
365					sub failures {
366					my ( $self ) = @_;
367					return $self->{failures};
368					}
369
370					=head2 max_capture_number
371
372					print "Highest used capture number ",
373					$re->max_capture_number(), "\n";
374
375					This convenience method returns the highest capture number used by the
376					regular expression. If there are no captures, the return will be 0.
377
378					This method is equivalent to
379
380					$self->regular_expression()->max_capture_number();
381
382					except that if C<< $self->regular_expression() >> returns C<undef>
383					(meaning that something went terribly wrong with the parse) this method
384					will too.
385
386					=cut
387
388					sub max_capture_number {
389					my ( $self ) = @_;
390					my $re = $self->regular_expression() or return;
391					return $re->max_capture_number();
392					}
393
394					=head2 modifier
395
396					my $re = PPIx::Regexp->new( 's/(foo)/${1}bar/smx' );
397					print $re->modifier()->content(), "\n";
398					# prints 'smx'.
399
400					This method retrieves the modifier of the object. This comes from the
401					end of the initializing string or object and will be a
402					L<PPIx::Regexp::Token::Modifier\|PPIx::Regexp::Token::Modifier>.
403
404					B<Note> that this object represents the actual modifiers present on the
405					regexp, and does not take into account any that may have been applied by
406					default (i.e. via the C<default_modifiers> argument to C<new()>). For
407					something that takes account of default modifiers, see
408					L<modifier_asserted()\|/modifier_asserted>, below.
409
410					In the event of a parse failure, there may not be a modifier present, in
411					which case nothing is returned.
412
413					=cut
414
415					sub modifier {
416					my ( $self ) = @_;
417					return $self->_component( 'PPIx::Regexp::Token::Modifier' );
418					}
419
420					=head2 modifier_asserted
421
422					my $re = PPIx::Regexp->new( '/ . /',
423					default_modifiers => [ 'smx' ] );
424					print $re->modifier_asserted( 'x' ) ? "yes\n" : "no\n";
425					# prints 'yes'.
426
427					This method returns true if the given modifier is asserted for the
428					regexp, whether explicitly or by the modifiers passed in the
429					C<default_modifiers> argument.
430
431					=cut
432
433					sub modifier_asserted {
434					my ( $self, $modifier ) = @_;
435					return PPIx::Regexp::Token::Modifier::__asserts(
436					$self->{effective_modifiers},
437					$modifier,
438					);
439					}
440
441					# This is a kluge for both determining whether the object asserts
442					# modifiers (hence the 'ductype') and determining whether the given
443					# modifier is actually asserted. The signature is the invocant and the
444					# modifier name, which must not be undef. The return is a boolean.
445	1	800ns			*__ducktype_modifier_asserted = \&modifier_asserted;
446
447					=head2 regular_expression
448
449					my $re = PPIx::Regexp->new( 's/(foo)/${1}bar/smx' );
450					print $re->regular_expression()->content(), "\n";
451					# prints '/(foo)/'.
452
453					This method returns that portion of the object which actually represents
454					a regular expression.
455
456					=cut
457
458					sub regular_expression {
459					my ( $self ) = @_;
460					return $self->_component( 'PPIx::Regexp::Structure::Regexp' );
461					}
462
463					=head2 replacement
464
465					my $re = PPIx::Regexp->new( 's/(foo)/${1}bar/smx' );
466					print $re->replacement()->content(), "\n";
467					# prints '${1}bar/'.
468
469					This method returns that portion of the object which represents the
470					replacement string. This will be C<undef> unless the regular expression
471					actually has a replacement string. Delimiters will be included, but
472					there will be no beginning delimiter unless the regular expression was
473					bracketed.
474
475					=cut
476
477					sub replacement {
478					my ( $self ) = @_;
479					return $self->_component( 'PPIx::Regexp::Structure::Replacement' );
480					}
481
482					=head2 source
483
484					my $source = $re->source();
485
486					This method returns the object or string that was used to instantiate
487					the object.
488
489					=cut
490
491					sub source {
492					my ( $self ) = @_;
493					return $self->{source};
494					}
495
496					=head2 type
497
498					my $re = PPIx::Regexp->new( 's/(foo)/${1}bar/smx' );
499					print $re->type()->content(), "\n";
500					# prints 's'.
501
502					This method retrieves the type of the object. This comes from the
503					beginning of the initializing string or object, and will be a
504					L<PPIx::Regexp::Token::Structure\|PPIx::Regexp::Token::Structure>
505					whose C<content> is one of 's',
506					'm', 'qr', or ''.
507
508					=cut
509
510					sub type {
511					my ( $self ) = @_;
512					return $self->_component( 'PPIx::Regexp::Token::Structure' );
513					}
514
515					sub _component {
516					my ( $self, $class ) = @_;
517					foreach my $elem ( $self->children() ) {
518					$elem->isa( $class ) and return $elem;
519					}
520					return;
521					}
522
523	1	4µs			1;
524
525					__END__