Filename | /Users/timbo/perl5/perlbrew/perls/perl-5.18.2/lib/site_perl/5.18.2/PPI/Token/HereDoc.pm |
Statements | Executed 39 statements in 966µs |
Calls | P | F | Exclusive Time | Inclusive Time | Subroutine |
---|---|---|---|---|---|
1 | 1 | 1 | 115µs | 138µs | __TOKENIZER__on_char | PPI::Token::HereDoc::
1 | 1 | 1 | 20µs | 43µs | BEGIN@87 | PPI::Token::HereDoc::
1 | 1 | 1 | 15µs | 15µs | BEGIN@91 | PPI::Token::HereDoc::
1 | 1 | 1 | 10µs | 59µs | BEGIN@90 | PPI::Token::HereDoc::
2 | 2 | 1 | 7µs | 7µs | heredoc | PPI::Token::HereDoc::
1 | 1 | 1 | 6µs | 6µs | BEGIN@88 | PPI::Token::HereDoc::
3 | 2 | 1 | 5µs | 5µs | CORE:match (opcode) | PPI::Token::HereDoc::
1 | 1 | 1 | 1µs | 1µs | CORE:subst (opcode) | PPI::Token::HereDoc::
0 | 0 | 0 | 0s | 0s | terminator | PPI::Token::HereDoc::
Line | Statements | Time on line | Calls | Time in subs | Code |
---|---|---|---|---|---|
1 | package PPI::Token::HereDoc; | ||||
2 | |||||
3 | =pod | ||||
4 | |||||
5 | =head1 NAME | ||||
6 | |||||
7 | PPI::Token::HereDoc - Token class for the here-doc | ||||
8 | |||||
9 | =head1 INHERITANCE | ||||
10 | |||||
11 | PPI::Token::HereDoc | ||||
12 | isa PPI::Token | ||||
13 | isa PPI::Element | ||||
14 | |||||
15 | =head1 DESCRIPTION | ||||
16 | |||||
17 | Here-docs are incredibly handy when writing Perl, but incredibly tricky | ||||
18 | when parsing it, primarily because they don't follow the general flow of | ||||
19 | input. | ||||
20 | |||||
21 | They jump ahead and nab lines directly off the input buffer. Whitespace | ||||
22 | and newlines may not matter in most Perl code, but they matter in here-docs. | ||||
23 | |||||
24 | They are also tricky to store as an object. They look sort of like an | ||||
25 | operator and a string, but they don't act like it. And they have a second | ||||
26 | section that should be something like a separate token, but isn't because a | ||||
27 | string can span from above the here-doc content to below it. | ||||
28 | |||||
29 | So when parsing, this is what we do. | ||||
30 | |||||
31 | Firstly, the PPI::Token::HereDoc object does not represent the C<<< << >>> | ||||
32 | operator, or the "END_FLAG", or the content, or even the terminator. | ||||
33 | |||||
34 | It represents all of them at once. | ||||
35 | |||||
36 | The token itself has only the declaration part as its "content". | ||||
37 | |||||
38 | # This is what the content of a HereDoc token is | ||||
39 | <<FOO | ||||
40 | |||||
41 | # Or this | ||||
42 | <<"FOO" | ||||
43 | |||||
44 | # Or even this | ||||
45 | << 'FOO' | ||||
46 | |||||
47 | That is, the "operator", any whitespace separator, and the quoted or bare | ||||
48 | terminator. So when you call the C<content> method on a HereDoc token, you | ||||
49 | get '<< "FOO"'. | ||||
50 | |||||
51 | As for the content and the terminator, when treated purely in "content" terms | ||||
52 | they do not exist. | ||||
53 | |||||
54 | The content is made available with the C<heredoc> method, and the name of | ||||
55 | the terminator with the C<terminator> method. | ||||
56 | |||||
57 | To make things work in the way you expect, PPI has to play some games | ||||
58 | when doing line/column location calculation for tokens, and also during | ||||
59 | the content parsing and generation processes. | ||||
60 | |||||
61 | Documents cannot simply be recreated by stitching together the token | ||||
62 | contents; doing so involves a somewhat more expensive procedure, but the | ||||
63 | extra expense should be negligible unless you are doing huge | ||||
64 | quantities of them. | ||||
65 | |||||
66 | Please note that due to the immature nature of PPI in general, we expect | ||||
67 | C<HereDocs> to be a rich (bad) source of corner-case bugs for quite a while, | ||||
68 | but for the most part they should more or less DWYM. | ||||
69 | |||||
70 | =head2 Comparison to other string types | ||||
71 | |||||
72 | Although technically they can be considered quotes, for the time being | ||||
73 | C<HereDocs> are being treated as a completely separate C<Token> subclass, | ||||
74 | and will not be found in a search for L<PPI::Token::Quote> or | ||||
75 | L<PPI::Token::QuoteLike> objects. | ||||
76 | |||||
77 | This may change in the future, with it most likely to end up under | ||||
78 | QuoteLike. | ||||
79 | |||||
80 | =head1 METHODS | ||||
81 | |||||
82 | Although it has the standard set of C<Token> methods, C<HereDoc> objects | ||||
83 | have a relatively large number of unique methods all of their own. | ||||
84 | |||||
85 | =cut | ||||
86 | |||||
87 | 2 | 35µs | 2 | 66µs | # spent 43µs (20+23) within PPI::Token::HereDoc::BEGIN@87 which was called:
# once (20µs+23µs) by PPI::Token::BEGIN@70 at line 87 # spent 43µs making 1 call to PPI::Token::HereDoc::BEGIN@87
# spent 23µs making 1 call to strict::import |
88 | 2 | 32µs | 1 | 6µs | # spent 6µs within PPI::Token::HereDoc::BEGIN@88 which was called:
# once (6µs+0s) by PPI::Token::BEGIN@70 at line 88 # spent 6µs making 1 call to PPI::Token::HereDoc::BEGIN@88 |
89 | |||||
90 | 2 | 50µs | 2 | 108µs | # spent 59µs (10+49) within PPI::Token::HereDoc::BEGIN@90 which was called:
# once (10µs+49µs) by PPI::Token::BEGIN@70 at line 90 # spent 59µs making 1 call to PPI::Token::HereDoc::BEGIN@90
# spent 49µs making 1 call to vars::import |
91 | BEGIN { # spent 15µs within PPI::Token::HereDoc::BEGIN@91 which was called:
# once (15µs+0s) by PPI::Token::BEGIN@70 at line 94 | ||||
92 | 1 | 600ns | $VERSION = '1.215'; | ||
93 | 1 | 22µs | @ISA = 'PPI::Token'; | ||
94 | 1 | 706µs | 1 | 15µs | } # spent 15µs making 1 call to PPI::Token::HereDoc::BEGIN@91 |
95 | |||||
- - | |||||
100 | ##################################################################### | ||||
101 | # PPI::Token::HereDoc Methods | ||||
102 | |||||
103 | =pod | ||||
104 | |||||
105 | =head2 heredoc | ||||
106 | |||||
107 | The C<heredoc> method is the authoritative method for accessing the contents | ||||
108 | of the C<HereDoc> object. | ||||
109 | |||||
110 | It returns the contents of the here-doc as a list of newline-terminated | ||||
111 | strings. If called in scalar context, it returns the number of lines in | ||||
112 | the here-doc, B<excluding> the terminator line. | ||||
113 | |||||
114 | =cut | ||||
115 | |||||
116 | sub heredoc { # spent 7µs within PPI::Token::HereDoc::heredoc which was called 2 times, avg 3µs/call:
# once (4µs+0s) by PPI::Document::serialize at line 464 of PPI/Document.pm
# once (3µs+0s) by PPI::Document::index_locations at line 627 of PPI/Document.pm | ||||
117 | wantarray | ||||
118 | ? @{shift->{_heredoc}} | ||||
119 | 2 | 12µs | : scalar @{shift->{_heredoc}}; | ||
120 | } | ||||
121 | |||||
122 | =pod | ||||
123 | |||||
124 | =head2 terminator | ||||
125 | |||||
126 | The C<terminator> method returns the name of the terminating string for the | ||||
127 | here-doc. | ||||
128 | |||||
129 | Returns the terminating string as an unescaped string (in the rare case | ||||
130 | the terminator has an escaped quote in it). | ||||
131 | |||||
132 | =cut | ||||
133 | |||||
134 | sub terminator { | ||||
135 | shift->{_terminator}; | ||||
136 | } | ||||
137 | |||||
- - | |||||
142 | ##################################################################### | ||||
143 | # Tokenizer Methods | ||||
144 | |||||
145 | # Parse in the entire here-doc in one call | ||||
146 | sub __TOKENIZER__on_char { # spent 138µs (115+23) within PPI::Token::HereDoc::__TOKENIZER__on_char which was called:
# once (115µs+23µs) by PPI::Token::Operator::__TOKENIZER__on_char at line 102 of PPI/Token/Operator.pm | ||||
147 | 1 | 600ns | my $t = $_[1]; | ||
148 | |||||
149 | # We are currently located on the first char after the << | ||||
150 | |||||
151 | # Handle the most common form first for simplicity and speed reasons | ||||
152 | ### FIXME - This regex, and this method in general, do not yet allow | ||||
153 | ### for the null here-doc, which terminates at the first | ||||
154 | ### empty line. | ||||
155 | 1 | 1µs | my $rest_of_line = substr( $t->{line}, $t->{line_cursor} ); | ||
156 | 1 | 9µs | 1 | 2µs | unless ( $rest_of_line =~ /^( \s* (?: "[^"]*" | '[^']*' | `[^`]*` | \\?\w+ ) )/x ) { # spent 2µs making 1 call to PPI::Token::HereDoc::CORE:match |
157 | # Degenerate to a left-shift operation | ||||
158 | $t->{token}->set_class('Operator'); | ||||
159 | return $t->_finalize_token->__TOKENIZER__on_char( $t ); | ||||
160 | } | ||||
161 | |||||
162 | # Add the rest of the token, work out what type it is, | ||||
163 | # and suck in the content until the end. | ||||
164 | 1 | 500ns | my $token = $t->{token}; | ||
165 | 1 | 54µs | $token->{content} .= $1; | ||
166 | 1 | 1µs | $t->{line_cursor} += length $1; | ||
167 | |||||
168 | # Find the terminator, clean it up and determine | ||||
169 | # the type of here-doc we are dealing with. | ||||
170 | 1 | 900ns | my $content = $token->{content}; | ||
171 | 1 | 8µs | 2 | 3µs | if ( $content =~ /^\<\<(\w+)$/ ) { # spent 3µs making 2 calls to PPI::Token::HereDoc::CORE:match, avg 1µs/call |
172 | # Bareword | ||||
173 | $token->{_mode} = 'interpolate'; | ||||
174 | $token->{_terminator} = $1; | ||||
175 | |||||
176 | } elsif ( $content =~ /^\<\<\s*\'(.*)\'$/ ) { | ||||
177 | # ''-quoted literal | ||||
178 | 1 | 1µs | $token->{_mode} = 'literal'; | ||
179 | 1 | 1µs | $token->{_terminator} = $1; | ||
180 | 1 | 7µs | 1 | 1µs | $token->{_terminator} =~ s/\\'/'/g; # spent 1µs making 1 call to PPI::Token::HereDoc::CORE:subst |
181 | |||||
182 | } elsif ( $content =~ /^\<\<\s*\"(.*)\"$/ ) { | ||||
183 | # ""-quoted interpolating string | ||||
184 | $token->{_mode} = 'interpolate'; | ||||
185 | $token->{_terminator} = $1; | ||||
186 | $token->{_terminator} =~ s/\\"/"/g; | ||||
187 | |||||
188 | } elsif ( $content =~ /^\<\<\s*\`(.*)\`$/ ) { | ||||
189 | # ``-quoted command | ||||
190 | $token->{_mode} = 'command'; | ||||
191 | $token->{_terminator} = $1; | ||||
192 | $token->{_terminator} =~ s/\\`/`/g; | ||||
193 | |||||
194 | } elsif ( $content =~ /^\<\<\\(\w+)$/ ) { | ||||
195 | # Legacy backslash-escaped bareword | ||||
196 | $token->{_mode} = 'literal'; | ||||
197 | $token->{_terminator} = $1; | ||||
198 | |||||
199 | } else { | ||||
200 | # WTF? | ||||
201 | return undef; | ||||
202 | } | ||||
203 | |||||
204 | # Define $line outside of the loop, so that if we encounter the | ||||
205 | # end of the file, we have access to the last line still. | ||||
206 | 1 | 500ns | my $line; | ||
207 | |||||
208 | # Suck in the HEREDOC | ||||
209 | 1 | 1µs | $token->{_heredoc} = []; | ||
210 | 1 | 1µs | my $terminator = $token->{_terminator} . "\n"; | ||
211 | 1 | 2µs | 1 | 3µs | while ( defined($line = $t->_get_line) ) { # spent 3µs making 1 call to PPI::Tokenizer::_get_line |
212 | 6 | 900ns | if ( $line eq $terminator ) { | ||
213 | # Keep the actual termination line for consistency | ||||
214 | # when we are re-assembling the file | ||||
215 | 1 | 600ns | $token->{_terminator_line} = $line; | ||
216 | |||||
217 | # The HereDoc is now fully parsed | ||||
218 | 1 | 6µs | 2 | 5µs | return $t->_finalize_token->__TOKENIZER__on_char( $t ); # spent 2µs making 1 call to PPI::Token::Whitespace::__TOKENIZER__on_char
# spent 2µs making 1 call to PPI::Tokenizer::_finalize_token |
219 | } | ||||
220 | |||||
221 | # Add the line | ||||
222 | 5 | 9µs | 5 | 10µs | push @{$token->{_heredoc}}, $line; # spent 10µs making 5 calls to PPI::Tokenizer::_get_line, avg 2µs/call |
223 | } | ||||
224 | |||||
225 | # End of file. | ||||
226 | # Error: Didn't reach end of here-doc before end of file. | ||||
227 | # $line might be undef if we get NO lines. | ||||
228 | if ( defined $line and $line eq $token->{_terminator} ) { | ||||
229 | # If the last line matches the terminator | ||||
230 | # but is missing the newline, we want to allow | ||||
231 | # it anyway (like perl itself does). In this case | ||||
232 | # perl would normally throw a warning, but we will | ||||
233 | # ignore that as well. | ||||
234 | pop @{$token->{_heredoc}}; | ||||
235 | $token->{_terminator_line} = $line; | ||||
236 | } else { | ||||
237 | # The HereDoc was not properly terminated. | ||||
238 | $token->{_terminator_line} = undef; | ||||
239 | |||||
240 | # Trim off the trailing whitespace | ||||
241 | if ( defined $token->{_heredoc}->[-1] and $t->{source_eof_chop} ) { | ||||
242 | chop $token->{_heredoc}->[-1]; | ||||
243 | $t->{source_eof_chop} = ''; | ||||
244 | } | ||||
245 | } | ||||
246 | |||||
247 | # Set a hint for PPI::Document->serialize so it can | ||||
248 | # inexpensively repair it if needed when writing back out. | ||||
249 | $token->{_damaged} = 1; | ||||
250 | |||||
251 | # The HereDoc is not fully parsed | ||||
252 | $t->_finalize_token->__TOKENIZER__on_char( $t ); | ||||
253 | } | ||||
254 | |||||
255 | 1 | 3µs | 1; | ||
256 | |||||
257 | =pod | ||||
258 | |||||
259 | =head1 TO DO | ||||
260 | |||||
261 | - Implement PPI::Token::Quote interface compatibility | ||||
262 | |||||
263 | - Check CPAN for any use of the null here-doc or here-doc-in-s///e | ||||
264 | |||||
265 | - Add support for the null here-doc | ||||
266 | |||||
267 | - Add support for here-doc in s///e | ||||
268 | |||||
269 | =head1 SUPPORT | ||||
270 | |||||
271 | See the L<support section|PPI/SUPPORT> in the main module. | ||||
272 | |||||
273 | =head1 AUTHOR | ||||
274 | |||||
275 | Adam Kennedy E<lt>adamk@cpan.orgE<gt> | ||||
276 | |||||
277 | =head1 COPYRIGHT | ||||
278 | |||||
279 | Copyright 2001 - 2011 Adam Kennedy. | ||||
280 | |||||
281 | This program is free software; you can redistribute | ||||
282 | it and/or modify it under the same terms as Perl itself. | ||||
283 | |||||
284 | The full text of the license can be found in the | ||||
285 | LICENSE file included with this module. | ||||
286 | |||||
287 | =cut | ||||
sub PPI::Token::HereDoc::CORE:match; # opcode | |||||
# spent 1µs within PPI::Token::HereDoc::CORE:subst which was called:
# once (1µs+0s) by PPI::Token::HereDoc::__TOKENIZER__on_char at line 180 |
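
The POD above says that a HereDoc token's `content` holds only the declaration, while the body and the terminator name come from the `heredoc` and `terminator` methods. The following is a minimal sketch of that split, assuming PPI is installed; the sample source string and the variable names are illustrative and not taken from the profiled run.

```perl
use strict;
use warnings;
use PPI;

# Illustrative input: one double-quoted here-doc with a two-line body.
my $source = <<'END_PERL';
print <<"FOO";
Hello
World
FOO
END_PERL

my $document = PPI::Document->new( \$source )
    or die "PPI could not parse the sample source";

# find() returns an array reference, or a false value when nothing matches.
my $heredocs = $document->find('PPI::Token::HereDoc') || [];

for my $token (@$heredocs) {
    # List context: the body lines, each newline-terminated,
    # with the terminator line excluded.
    my @lines = $token->heredoc;

    printf "declaration: %s\n", $token->content;
    printf "terminator:  %s\n", $token->terminator;
    printf "body lines:  %d\n", scalar @lines;
}
```

Because the body lives outside `content`, concatenating the `content` of every token would drop the here-doc text; `PPI::Document::serialize` is the supported way to get the full source back.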
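
The if/elsif chain at source lines 171 to 202 decides the here-doc mode and unescapes the terminator. As a rough standalone illustration, the same cascade can be written as a plain function. The name `classify_heredoc_declaration` is hypothetical and not part of PPI; the regexes are copied from the listing above, so the sketch shares its limitation of not handling the null here-doc.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of PPI): map a here-doc declaration such as
# '<<FOO' or q{<< 'FOO'} to a ( mode, terminator ) pair.
# Returns an empty list when the text is not a recognised declaration.
sub classify_heredoc_declaration {
    my ($content) = @_;

    # Bareword: interpolating
    if ( $content =~ /^<<(\w+)$/ ) {
        return ( 'interpolate', $1 );
    }

    # ''-quoted: literal, with escaped single quotes unescaped
    if ( $content =~ /^<<\s*'(.*)'$/ ) {
        ( my $terminator = $1 ) =~ s/\\'/'/g;
        return ( 'literal', $terminator );
    }

    # ""-quoted: interpolating, with escaped double quotes unescaped
    if ( $content =~ /^<<\s*"(.*)"$/ ) {
        ( my $terminator = $1 ) =~ s/\\"/"/g;
        return ( 'interpolate', $terminator );
    }

    # ``-quoted: command, with escaped backticks unescaped
    if ( $content =~ /^<<\s*`(.*)`$/ ) {
        ( my $terminator = $1 ) =~ s/\\`/`/g;
        return ( 'command', $terminator );
    }

    # Legacy backslash-escaped bareword: literal
    if ( $content =~ /^<<\\(\w+)$/ ) {
        return ( 'literal', $1 );
    }

    return;
}

for my $declaration ( '<<FOO', q{<< 'FOO'}, '<<"END TEXT"', '<<`CMD`', '<<\\FOO', '<<' ) {
    my ( $mode, $terminator ) = classify_heredoc_declaration($declaration);
    if ( defined $mode ) {
        print "$declaration => mode=$mode terminator=$terminator\n";
    }
    else {
        print "$declaration => not a here-doc declaration\n";
    }
}
```

Returning a list rather than mutating a token keeps the sketch testable in isolation; the real method stores the same two values in the token's `_mode` and `_terminator` fields.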