| Filename | /Users/timbo/perl5/perlbrew/perls/perl-5.18.2/lib/site_perl/5.18.2/PPI/Token/HereDoc.pm |
| Statements | Executed 39 statements in 966µs |
| Calls | P | F | Exclusive Time | Inclusive Time | Subroutine |
|---|---|---|---|---|---|
| 1 | 1 | 1 | 115µs | 138µs | PPI::Token::HereDoc::__TOKENIZER__on_char |
| 1 | 1 | 1 | 20µs | 43µs | PPI::Token::HereDoc::BEGIN@87 |
| 1 | 1 | 1 | 15µs | 15µs | PPI::Token::HereDoc::BEGIN@91 |
| 1 | 1 | 1 | 10µs | 59µs | PPI::Token::HereDoc::BEGIN@90 |
| 2 | 2 | 1 | 7µs | 7µs | PPI::Token::HereDoc::heredoc |
| 1 | 1 | 1 | 6µs | 6µs | PPI::Token::HereDoc::BEGIN@88 |
| 3 | 2 | 1 | 5µs | 5µs | PPI::Token::HereDoc::CORE:match (opcode) |
| 1 | 1 | 1 | 1µs | 1µs | PPI::Token::HereDoc::CORE:subst (opcode) |
| 0 | 0 | 0 | 0s | 0s | PPI::Token::HereDoc::terminator |
| Line | Statements | Time on line | Calls | Time in subs | Code |
|---|---|---|---|---|---|
| 1 | package PPI::Token::HereDoc; | ||||
| 2 | |||||
| 3 | =pod | ||||
| 4 | |||||
| 5 | =head1 NAME | ||||
| 6 | |||||
| 7 | PPI::Token::HereDoc - Token class for the here-doc | ||||
| 8 | |||||
| 9 | =head1 INHERITANCE | ||||
| 10 | |||||
| 11 | PPI::Token::HereDoc | ||||
| 12 | isa PPI::Token | ||||
| 13 | isa PPI::Element | ||||
| 14 | |||||
| 15 | =head1 DESCRIPTION | ||||
| 16 | |||||
| 17 | Here-docs are incredibly handy when writing Perl, but incredibly tricky | ||||
| 18 | when parsing it, primarily because they don't follow the general flow of | ||||
| 19 | input. | ||||
| 20 | |||||
| 21 | They jump ahead and nab lines directly off the input buffer. Whitespace | ||||
| 22 | and newlines may not matter in most Perl code, but they matter in here-docs. | ||||
| 23 | |||||
| 24 | They are also tricky to store as an object. They look sort of like an | ||||
| 25 | operator and a string, but they don't act like it. And they have a second | ||||
| 26 | section that should be something like a separate token, but isn't because a | ||||
| 27 | string can span from above the here-doc content to below it. | ||||
| 28 | |||||
| 29 | So when parsing, this is what we do. | ||||
| 30 | |||||
| 31 | Firstly, the PPI::Token::HereDoc object does not represent the C<<< << >>> | ||||
| 32 | operator, or the "END_FLAG", or the content, or even the terminator. | ||||
| 33 | |||||
| 34 | It represents all of them at once. | ||||
| 35 | |||||
| 36 | The token itself has only the declaration part as its "content". | ||||
| 37 | |||||
| 38 | # This is what the content of a HereDoc token is | ||||
| 39 | <<FOO | ||||
| 40 | |||||
| 41 | # Or this | ||||
| 42 | <<"FOO" | ||||
| 43 | |||||
| 44 | # Or even this | ||||
| 45 | << 'FOO' | ||||
| 46 | |||||
| 47 | That is, the "operator", any whitespace separator, and the quoted or bare | ||||
| 48 | terminator. So when you call the C<content> method on a HereDoc token, you | ||||
| 49 | get '<< "FOO"'. | ||||
| 50 | |||||
| 51 | As for the content and the terminator, when treated purely in "content" terms | ||||
| 52 | they do not exist. | ||||
| 53 | |||||
| 54 | The content is made available with the C<heredoc> method, and the name of | ||||
| 55 | the terminator with the C<terminator> method. | ||||
| 56 | |||||
| 57 | To make things work in the way you expect, PPI has to play some games | ||||
| 58 | when doing line/column location calculation for tokens, and also during | ||||
| 59 | the content parsing and generation processes. | ||||
| 60 | |||||
| 61 | Documents cannot simply be recreated by stitching together the token | ||||
| 62 | contents; they require a somewhat more expensive procedure. The extra | ||||
| 63 | expense should be negligible unless you are processing huge | ||||
| 64 | quantities of documents. | ||||
| 65 | |||||
| 66 | Please note that due to the immature nature of PPI in general, we expect | ||||
| 67 | C<HereDocs> to be a rich (bad) source of corner-case bugs for quite a while, | ||||
| 68 | but for the most part they should more or less DWYM. | ||||
| 69 | |||||
| 70 | =head2 Comparison to other string types | ||||
| 71 | |||||
| 72 | Although technically it can be considered a quote, for the time being | ||||
| 73 | C<HereDocs> are being treated as a completely separate C<Token> subclass, | ||||
| 74 | and will not be found in a search for L<PPI::Token::Quote> or | ||||
| 75 | L<PPI::Token::QuoteLike> objects. | ||||
| 76 | |||||
| 77 | This may change in the future; if it does, C<HereDoc> will most likely | ||||
| 78 | end up under QuoteLike. | ||||
| 79 | |||||
| 80 | =head1 METHODS | ||||
| 81 | |||||
| 82 | Although it has the standard set of C<Token> methods, C<HereDoc> objects | ||||
| 83 | have a relatively large number of unique methods all of their own. | ||||
| 84 | |||||
| 85 | =cut | ||||
| 86 | |||||
| 87 | 2 | 35µs | 2 | 66µs | use strict; # spent 43µs (20+23) within PPI::Token::HereDoc::BEGIN@87 which was called: # once (20µs+23µs) by PPI::Token::BEGIN@70 at line 87 # spent 43µs making 1 call to PPI::Token::HereDoc::BEGIN@87 # spent 23µs making 1 call to strict::import |
| 88 | 2 | 32µs | 1 | 6µs | # spent 6µs within PPI::Token::HereDoc::BEGIN@88 which was called: # once (6µs+0s) by PPI::Token::BEGIN@70 at line 88 # spent 6µs making 1 call to PPI::Token::HereDoc::BEGIN@88 |
| 89 | |||||
| 90 | 2 | 50µs | 2 | 108µs | # spent 59µs (10+49) within PPI::Token::HereDoc::BEGIN@90 which was called: # once (10µs+49µs) by PPI::Token::BEGIN@70 at line 90 # spent 59µs making 1 call to PPI::Token::HereDoc::BEGIN@90 # spent 49µs making 1 call to vars::import |
| 91 | BEGIN { # spent 15µs within PPI::Token::HereDoc::BEGIN@91 which was called: # once (15µs+0s) by PPI::Token::BEGIN@70 at line 94 | ||||
| 92 | 1 | 600ns | $VERSION = '1.215'; | ||
| 93 | 1 | 22µs | @ISA = 'PPI::Token'; | ||
| 94 | 1 | 706µs | 1 | 15µs | } # spent 15µs making 1 call to PPI::Token::HereDoc::BEGIN@91 |
| 95 | |||||
| - - | |||||
| 100 | ##################################################################### | ||||
| 101 | # PPI::Token::HereDoc Methods | ||||
| 102 | |||||
| 103 | =pod | ||||
| 104 | |||||
| 105 | =head2 heredoc | ||||
| 106 | |||||
| 107 | The C<heredoc> method is the authoritative method for accessing the contents | ||||
| 108 | of the C<HereDoc> object. | ||||
| 109 | |||||
| 110 | It returns the contents of the here-doc as a list of newline-terminated | ||||
| 111 | strings. If called in scalar context, it returns the number of lines in | ||||
| 112 | the here-doc, B<excluding> the terminator line. | ||||
| 113 | |||||
| 114 | =cut | ||||
| 115 | |||||
| 116 | sub heredoc { # spent 7µs within PPI::Token::HereDoc::heredoc which was called 2 times, avg 3µs/call: # once (4µs+0s) by PPI::Document::serialize at line 464 of PPI/Document.pm # once (3µs+0s) by PPI::Document::index_locations at line 627 of PPI/Document.pm | ||||
| 117 | wantarray | ||||
| 118 | ? @{shift->{_heredoc}} | ||||
| 119 | 2 | 12µs | : scalar @{shift->{_heredoc}}; | ||
| 120 | } | ||||
| 121 | |||||
| 122 | =pod | ||||
| 123 | |||||
| 124 | =head2 terminator | ||||
| 125 | |||||
| 126 | The C<terminator> method returns the name of the terminating string for the | ||||
| 127 | here-doc. | ||||
| 128 | |||||
| 129 | Returns the terminating string as an unescaped string (in the rare case | ||||
| 130 | the terminator has an escaped quote in it). | ||||
| 131 | |||||
| 132 | =cut | ||||
| 133 | |||||
| 134 | sub terminator { | ||||
| 135 | shift->{_terminator}; | ||||
| 136 | } | ||||
| 137 | |||||
| - - | |||||
| 142 | ##################################################################### | ||||
| 143 | # Tokenizer Methods | ||||
| 144 | |||||
| 145 | # Parse in the entire here-doc in one call | ||||
| 146 | sub __TOKENIZER__on_char { # spent 138µs (115+23) within PPI::Token::HereDoc::__TOKENIZER__on_char which was called: # once (115µs+23µs) by PPI::Token::Operator::__TOKENIZER__on_char at line 102 of PPI/Token/Operator.pm | ||||
| 147 | 1 | 600ns | my $t = $_[1]; | ||
| 148 | |||||
| 149 | # We are currently located on the first char after the << | ||||
| 150 | |||||
| 151 | # Handle the most common form first for simplicity and speed reasons | ||||
| 152 | ### FIXME - This regex, and this method in general, do not yet allow | ||||
| 153 | ### for the null here-doc, which terminates at the first | ||||
| 154 | ### empty line. | ||||
| 155 | 1 | 1µs | my $rest_of_line = substr( $t->{line}, $t->{line_cursor} ); | ||
| 156 | 1 | 9µs | 1 | 2µs | unless ( $rest_of_line =~ /^( \s* (?: "[^"]*" | '[^']*' | `[^`]*` | \\?\w+ ) )/x ) { # spent 2µs making 1 call to PPI::Token::HereDoc::CORE:match |
| 157 | # Degenerate to a left-shift operation | ||||
| 158 | $t->{token}->set_class('Operator'); | ||||
| 159 | return $t->_finalize_token->__TOKENIZER__on_char( $t ); | ||||
| 160 | } | ||||
| 161 | |||||
| 162 | # Add the rest of the token, work out what type it is, | ||||
| 163 | # and suck in the content until the end. | ||||
| 164 | 1 | 500ns | my $token = $t->{token}; | ||
| 165 | 1 | 54µs | $token->{content} .= $1; | ||
| 166 | 1 | 1µs | $t->{line_cursor} += length $1; | ||
| 167 | |||||
| 168 | # Find the terminator, clean it up and determine | ||||
| 169 | # the type of here-doc we are dealing with. | ||||
| 170 | 1 | 900ns | my $content = $token->{content}; | ||
| 171 | 1 | 8µs | 2 | 3µs | if ( $content =~ /^\<\<(\w+)$/ ) { # spent 3µs making 2 calls to PPI::Token::HereDoc::CORE:match, avg 1µs/call |
| 172 | # Bareword | ||||
| 173 | $token->{_mode} = 'interpolate'; | ||||
| 174 | $token->{_terminator} = $1; | ||||
| 175 | |||||
| 176 | } elsif ( $content =~ /^\<\<\s*\'(.*)\'$/ ) { | ||||
| 177 | # ''-quoted literal | ||||
| 178 | 1 | 1µs | $token->{_mode} = 'literal'; | ||
| 179 | 1 | 1µs | $token->{_terminator} = $1; | ||
| 180 | 1 | 7µs | 1 | 1µs | $token->{_terminator} =~ s/\\'/'/g; # spent 1µs making 1 call to PPI::Token::HereDoc::CORE:subst |
| 181 | |||||
| 182 | } elsif ( $content =~ /^\<\<\s*\"(.*)\"$/ ) { | ||||
| 183 | # ""-quoted interpolated string | ||||
| 184 | $token->{_mode} = 'interpolate'; | ||||
| 185 | $token->{_terminator} = $1; | ||||
| 186 | $token->{_terminator} =~ s/\\"/"/g; | ||||
| 187 | |||||
| 188 | } elsif ( $content =~ /^\<\<\s*\`(.*)\`$/ ) { | ||||
| 189 | # ``-quoted command | ||||
| 190 | $token->{_mode} = 'command'; | ||||
| 191 | $token->{_terminator} = $1; | ||||
| 192 | $token->{_terminator} =~ s/\\`/`/g; | ||||
| 193 | |||||
| 194 | } elsif ( $content =~ /^\<\<\\(\w+)$/ ) { | ||||
| 195 | # Legacy backslash-escaped bareword | ||||
| 196 | $token->{_mode} = 'literal'; | ||||
| 197 | $token->{_terminator} = $1; | ||||
| 198 | |||||
| 199 | } else { | ||||
| 200 | # WTF? | ||||
| 201 | return undef; | ||||
| 202 | } | ||||
| 203 | |||||
| 204 | # Define $line outside of the loop, so that if we encounter the | ||||
| 205 | # end of the file, we have access to the last line still. | ||||
| 206 | 1 | 500ns | my $line; | ||
| 207 | |||||
| 208 | # Suck in the HEREDOC | ||||
| 209 | 1 | 1µs | $token->{_heredoc} = []; | ||
| 210 | 1 | 1µs | my $terminator = $token->{_terminator} . "\n"; | ||
| 211 | 1 | 2µs | 1 | 3µs | while ( defined($line = $t->_get_line) ) { # spent 3µs making 1 call to PPI::Tokenizer::_get_line |
| 212 | 6 | 900ns | if ( $line eq $terminator ) { | ||
| 213 | # Keep the actual termination line for consistency | ||||
| 214 | # when we are re-assembling the file | ||||
| 215 | 1 | 600ns | $token->{_terminator_line} = $line; | ||
| 216 | |||||
| 217 | # The HereDoc is now fully parsed | ||||
| 218 | 1 | 6µs | 2 | 5µs | return $t->_finalize_token->__TOKENIZER__on_char( $t ); # spent 2µs making 1 call to PPI::Token::Whitespace::__TOKENIZER__on_char # spent 2µs making 1 call to PPI::Tokenizer::_finalize_token |
| 219 | } | ||||
| 220 | |||||
| 221 | # Add the line | ||||
| 222 | 5 | 9µs | 5 | 10µs | push @{$token->{_heredoc}}, $line; # spent 10µs making 5 calls to PPI::Tokenizer::_get_line, avg 2µs/call |
| 223 | } | ||||
| 224 | |||||
| 225 | # End of file. | ||||
| 226 | # Error: Didn't reach end of here-doc before end of file. | ||||
| 227 | # $line might be undef if we get NO lines. | ||||
| 228 | if ( defined $line and $line eq $token->{_terminator} ) { | ||||
| 229 | # If the last line matches the terminator | ||||
| 230 | # but is missing the newline, we want to allow | ||||
| 231 | # it anyway (like perl itself does). In this case | ||||
| 232 | perl would normally throw a warning, but we | ||||
| 233 | ignore that as well. | ||||
| 234 | pop @{$token->{_heredoc}}; | ||||
| 235 | $token->{_terminator_line} = $line; | ||||
| 236 | } else { | ||||
| 237 | # The HereDoc was not properly terminated. | ||||
| 238 | $token->{_terminator_line} = undef; | ||||
| 239 | |||||
| 240 | # Trim off the trailing whitespace | ||||
| 241 | if ( defined $token->{_heredoc}->[-1] and $t->{source_eof_chop} ) { | ||||
| 242 | chop $token->{_heredoc}->[-1]; | ||||
| 243 | $t->{source_eof_chop} = ''; | ||||
| 244 | } | ||||
| 245 | } | ||||
| 246 | |||||
| 247 | # Set a hint for PPI::Document->serialize so it can | ||||
| 248 | # inexpensively repair it if needed when writing back out. | ||||
| 249 | $token->{_damaged} = 1; | ||||
| 250 | |||||
| 251 | # The HereDoc is not fully parsed | ||||
| 252 | $t->_finalize_token->__TOKENIZER__on_char( $t ); | ||||
| 253 | } | ||||
| 254 | |||||
| 255 | 1 | 3µs | 1; | ||
| 256 | |||||
| 257 | =pod | ||||
| 258 | |||||
| 259 | =head1 TO DO | ||||
| 260 | |||||
| 261 | - Implement PPI::Token::Quote interface compatibility | ||||
| 262 | |||||
| 263 | - Check CPAN for any use of the null here-doc or here-doc-in-s///e | ||||
| 264 | |||||
| 265 | - Add support for the null here-doc | ||||
| 266 | |||||
| 267 | - Add support for here-doc in s///e | ||||
| 268 | |||||
| 269 | =head1 SUPPORT | ||||
| 270 | |||||
| 271 | See the L<support section|PPI/SUPPORT> in the main module. | ||||
| 272 | |||||
| 273 | =head1 AUTHOR | ||||
| 274 | |||||
| 275 | Adam Kennedy E<lt>adamk@cpan.orgE<gt> | ||||
| 276 | |||||
| 277 | =head1 COPYRIGHT | ||||
| 278 | |||||
| 279 | Copyright 2001 - 2011 Adam Kennedy. | ||||
| 280 | |||||
| 281 | This program is free software; you can redistribute | ||||
| 282 | it and/or modify it under the same terms as Perl itself. | ||||
| 283 | |||||
| 284 | The full text of the license can be found in the | ||||
| 285 | LICENSE file included with this module. | ||||
| 286 | |||||
| 287 | =cut | ||||
sub PPI::Token::HereDoc::CORE:match; # opcode | |||||
# spent 1µs within PPI::Token::HereDoc::CORE:subst which was called: # once (1µs+0s) by PPI::Token::HereDoc::__TOKENIZER__on_char at line 180 |
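
The two steps that `__TOKENIZER__on_char` performs in the listing above (classify the declaration, then pull lines until the terminator) can be sketched in core Perl without PPI. This is only an illustrative sketch, not PPI's API: the helper names `classify_declaration` and `collect_body` are hypothetical, the patterns are copied from source lines 171 to 194, and the end-of-file repair logic from source lines 228 to 249 is deliberately left out.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of PPI): classify a here-doc declaration
# such as '<<FOO' or '<< "FOO"' with the same patterns as the profiled code.
# Returns ( $mode, $terminator ), or an empty list if nothing matches.
sub classify_declaration {
    my ($content) = @_;
    my ( $mode, $terminator );

    if ( $content =~ /^<<(\w+)$/ ) {
        # Bareword terminator: body interpolates
        ( $mode, $terminator ) = ( 'interpolate', $1 );
    }
    elsif ( $content =~ /^<<\s*'(.*)'$/ ) {
        # Single-quoted terminator: body is literal
        ( $mode, $terminator ) = ( 'literal', $1 );
        $terminator =~ s/\\'/'/g;
    }
    elsif ( $content =~ /^<<\s*"(.*)"$/ ) {
        # Double-quoted terminator: body interpolates
        ( $mode, $terminator ) = ( 'interpolate', $1 );
        $terminator =~ s/\\"/"/g;
    }
    elsif ( $content =~ /^<<\s*`(.*)`$/ ) {
        # Backtick-quoted terminator: body is run as a command
        ( $mode, $terminator ) = ( 'command', $1 );
        $terminator =~ s/\\`/`/g;
    }
    elsif ( $content =~ /^<<\\(\w+)$/ ) {
        # Backslash-escaped bareword: body is literal
        ( $mode, $terminator ) = ( 'literal', $1 );
    }
    else {
        return;
    }
    return ( $mode, $terminator );
}

# Hypothetical helper: shift lines off @$lines until the terminator line.
# Returns ( \@body, $terminated ); the terminator line itself is consumed
# but not stored in the body.
sub collect_body {
    my ( $terminator, $lines ) = @_;
    my @body;
    while ( defined( my $line = shift @$lines ) ) {
        return ( \@body, 1 ) if $line eq "$terminator\n";
        push @body, $line;
    }
    return ( \@body, 0 );    # ran out of input before the terminator
}

# Usage: classify a declaration, then consume its body from a line buffer.
my ( $mode, $terminator ) = classify_declaration(q{<< 'END'});
my @input = ( "line one\n", "line two\n", "END\n", "print 1;\n" );
my ( $body, $terminated ) = collect_body( $terminator, \@input );
printf "%s here-doc, terminator %s, %d body lines, terminated=%d\n",
    $mode, $terminator, scalar @$body, $terminated;
```

Keeping the body lines separate from the declaration mirrors the split the POD describes between `content` on one side and `heredoc` and `terminator` on the other. Lines after the terminator stay in the buffer for the next token.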