I wanted to use the tokenizer functions to count source lines of code, including comment lines. Doing this with regular expressions does not work well, for example when /* appears inside a string. The token_get_all() function makes the task easy because it detects all comments properly. However, it does not return newline characters as separate tokens. I wrote the functions below to also tokenize newline characters as T_NEW_LINE.
<?php
define('T_NEW_LINE', -1);

function token_get_all_nl($source)
{
    $new_tokens = array();

    // Get the tokens
    $tokens = token_get_all($source);

    // Split newlines into their own tokens
    foreach ($tokens as $token)
    {
        $token_name = is_array($token) ? $token[0] : null;
        $token_data = is_array($token) ? $token[1] : $token;

        // Do not split encapsed strings or multiline comments
        if ($token_name == T_CONSTANT_ENCAPSED_STRING || substr($token_data, 0, 2) == '/*')
        {
            $new_tokens[] = array($token_name, $token_data);
            continue;
        }

        // Split the data up by newlines
        $split_data = preg_split('#(\r\n|\n)#', $token_data, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);

        foreach ($split_data as $data)
        {
            if ($data == "\r\n" || $data == "\n")
            {
                // This is a new line token
                $new_tokens[] = array(T_NEW_LINE, $data);
            }
            else
            {
                // Add the token under the original token name
                $new_tokens[] = is_array($token) ? array($token_name, $data) : $data;
            }
        }
    }

    return $new_tokens;
}

function token_name_nl($token)
{
    if ($token === T_NEW_LINE)
    {
        return 'T_NEW_LINE';
    }

    return token_name($token);
}
?>
Example usage:
<?php
$tokens = token_get_all_nl(file_get_contents('somecode.php'));

foreach ($tokens as $token)
{
    if (is_array($token))
    {
        echo (token_name_nl($token[0]) . ': "' . $token[1] . '"<br />');
    }
    else
    {
        echo ('"' . $token . '"<br />');
    }
}
?>
I'm sure you can figure out how to count the lines of code and the lines of comments with these functions. This was a huge improvement on my previous attempt at counting lines of code with regular expressions. I hope this helps someone, as many of the user-contributed examples on this website have helped me in the past.
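The counting step that the note leaves to the reader could look roughly like the sketch below. It is only one possible approach and does not use token_get_all_nl(): it relies on the line number in element 2 of each array token instead. The function name count_source_lines() and the decision to count a line that holds both code and a comment in both totals are assumptions made for illustration.

```php
<?php
// Hypothetical helper: counts lines that contain code and lines that contain comments.
function count_source_lines(string $source): array
{
    // Tokens that count as neither code nor comment (an assumption of this sketch).
    $ignored = [T_WHITESPACE, T_OPEN_TAG, T_CLOSE_TAG, T_INLINE_HTML];
    $codeLines = [];
    $commentLines = [];
    $line = 1;

    foreach (token_get_all($source) as $token) {
        if (is_array($token)) {
            [$id, $text, $line] = $token; // array tokens carry their starting line
        } else {
            [$id, $text] = [null, $token]; // single-character tokens have no line number
        }

        // Older PHP versions include the trailing newline in "//" comments, so trim it
        // before working out which lines the token covers.
        $lastLine = $line + substr_count(rtrim($text, "\r\n"), "\n");

        if ($id === T_COMMENT || $id === T_DOC_COMMENT) {
            for ($n = $line; $n <= $lastLine; $n++) {
                $commentLines[$n] = true;
            }
        } elseif (!in_array($id, $ignored, true)) {
            for ($n = $line; $n <= $lastLine; $n++) {
                $codeLines[$n] = true;
            }
        }

        // Keep the running line number current for the string tokens that follow.
        $line += substr_count($text, "\n");
    }

    return ['code' => count($codeLines), 'comment' => count($commentLines)];
}

$sample = <<<'PHP'
<?php
// note
$a = 1; /* inline */
/* multi
 line */
echo $a;
PHP;

print_r(count_source_lines($sample)); // code => 2, comment => 4
```

Blank lines and the opening tag are counted in neither total here; adjust the $ignored list if a different definition of "line of code" is wanted.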
token_get_all
(PHP 4 >= 4.2.0, PHP 5, PHP 7, PHP 8)
token_get_all — Split given source into PHP tokens
token_get_all() parses the given code string into PHP language tokens using the Zend engine's lexical scanner.
For a list of parser tokens, see List of Parser Tokens, or use token_name() to translate a token value into its string representation.
Parameters
code
The PHP source to parse.
flags
Valid flags:
- TOKEN_PARSE - Recognises the ability to use reserved words in specific contexts.
Return Values
An array of token identifiers. Each individual token identifier is either a single character (e.g. ;, ., >, !, etc.), or a three-element array containing the token index in element 0, the string content of the original token in element 1, and the line number in element 2.
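Because the result mixes plain strings and arrays, consumers usually have to branch on is_array() for every token. The following is a minimal sketch of one way to normalise both shapes into a single structure. The helper name normalize_tokens() and the array keys are hypothetical, and the line number attached to single-character tokens is inferred by counting "\n" characters, which is an assumption of this sketch rather than something token_get_all() reports.

```php
<?php
// Hypothetical helper: returns every token as ['id' => ?int, 'text' => string, 'line' => int].
function normalize_tokens(string $code): array
{
    $line = 1;
    $out = [];

    foreach (token_get_all($code) as $token) {
        if (is_array($token)) {
            [$id, $text, $line] = $token; // element 2 is the starting line
        } else {
            $id = null;                   // single-character token, no id and no line
            $text = $token;
        }

        $out[] = ['id' => $id, 'text' => $text, 'line' => $line];

        // Advance the running line so later string tokens get a sensible value.
        $line += substr_count($text, "\n");
    }

    return $out;
}

foreach (normalize_tokens("<?php\necho 1;\n") as $t) {
    $name = $t['id'] === null ? 'CHAR' : token_name($t['id']);
    echo "Line {$t['line']}: {$name}", PHP_EOL;
}
```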
Examples
Example #1 token_get_all() example
<?php
$tokens = token_get_all('<?php echo; ?>');

foreach ($tokens as $token) {
    if (is_array($token)) {
        echo "Line {$token[2]}: ", token_name($token[0]), " ('{$token[1]}')", PHP_EOL;
    }
}
?>
The above example will output something similar to:
Line 1: T_OPEN_TAG ('<?php ')
Line 1: T_ECHO ('echo')
Line 1: T_WHITESPACE (' ')
Line 1: T_CLOSE_TAG ('?>')
Example #2 token_get_all() incorrect usage example
<?php
$tokens = token_get_all('/* comment */');

foreach ($tokens as $token) {
    if (is_array($token)) {
        echo "Line {$token[2]}: ", token_name($token[0]), " ('{$token[1]}')", PHP_EOL;
    }
}
?>
The above example will output something similar to:
Line 1: T_INLINE_HTML ('/* comment */')
Here, the string is parsed as T_INLINE_HTML instead of the expected T_COMMENT. This is because no open tag was used in the code provided. It is equivalent to putting a comment outside of the PHP tags in a normal file.
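For comparison, a minimal sketch of the same call with an open tag prepended, in which case the comment should be reported as T_COMMENT, preceded by a T_OPEN_TAG token:

```php
<?php
// Same input as Example #2, but with an open tag so the scanner is in PHP mode.
$tokens = token_get_all('<?php /* comment */');

foreach ($tokens as $token) {
    if (is_array($token)) {
        echo token_name($token[0]), " ('{$token[1]}')", PHP_EOL;
    }
}
```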
Example #3 token_get_all() on a class using a reserved word example
<?php
$source = <<<'code'
<?php
class A
{
    const PUBLIC = 1;
}
code;

$tokens = token_get_all($source, TOKEN_PARSE);

foreach ($tokens as $token) {
    if (is_array($token)) {
        echo token_name($token[0]) , PHP_EOL;
    }
}
?>
The above example will output something similar to:
T_OPEN_TAG
T_WHITESPACE
T_CLASS
T_WHITESPACE
T_STRING
T_CONST
T_WHITESPACE
T_STRING
T_LNUMBER
Without the TOKEN_PARSE flag, the penultimate token (T_STRING) would have been T_PUBLIC.
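The effect of the flag can be seen by tokenizing the same source twice. This is a small sketch rather than part of the official example; the helper token_names() is hypothetical and drops T_WHITESPACE tokens to keep the output short.

```php
<?php
$source = "<?php\nclass A\n{\n    const PUBLIC = 1;\n}\n";

// Hypothetical helper: names of all array tokens except whitespace.
function token_names(string $source, int $flags = 0): array
{
    $names = [];
    foreach (token_get_all($source, $flags) as $token) {
        if (is_array($token) && $token[0] !== T_WHITESPACE) {
            $names[] = token_name($token[0]);
        }
    }
    return $names;
}

// Without the flag the scanner has no context, so PUBLIC is reported as a keyword.
echo implode(' ', token_names($source)), PHP_EOL;

// With TOKEN_PARSE the parser's context is used and PUBLIC becomes an identifier.
echo implode(' ', token_names($source, TOKEN_PARSE)), PHP_EOL;
```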
See Also
- PhpToken::tokenize() - Splits given source into PHP tokens, represented by PhpToken objects.
- token_name() - Get the symbolic name of a given PHP token
User Contributed Notes 6 notes
Yes, there are some problems with token_get_all() (on WAMP, PHP 5.3.0).
1: Line number bug
Since PHP 5.2.2, token_get_all() should return line numbers in element 2.
On 5.3.0 on WAMP, for instance, this works perfectly only with pure PHP code (no mixed HTML). If token_get_all() detects some T_INLINE_HTML, you sometimes get wrong line numbers (it returns the next line). :(
2: Warning messages can impact loops
A warning is raised for incomplete PHP code (e.g. when feeding PHP code line by line).
For example, if a comment is not closed, token_get_all() can block loops with this warning:
Warning: Unterminated comment starting line
This problem does not seem to occur in CLI mode (PHP command line), only in web mode.
Until this is more stable, use token_get_all() only on pure PHP code (no mixed HTML):
First, extract the complete PHP code (with the opening and closing PHP tags).
Second, use token_get_all() on the pure PHP code.
3: Why is there no function to extract PHP code? (To extract HTML, we have Tidy.)
In the meantime, I use a function.
The code is at the end of this post:
http://www.developpez.net/forums/d786381/php/langage/
fonctions/analyser-fichier-php-token_get_all/
This function does not support:
- Old notation: "<? ?>" and "<% %>"
- heredoc syntax
- nowdoc syntax (since PHP 5.3.0)

As a caution: when using TOKEN_PARSE with an invalid PHP file, one can get an error like this:
Parse error: syntax error, unexpected '__construct' (T_STRING), expecting function (T_FUNCTION) or const (T_CONST) in on line 15
Notice the missing filename: this function accepts a string, not a filename, and thus has no idea of the latter.
However, an exception would be more appreciated.

The T_OPEN_TAG token will include the first trailing newline (\r, \n, or \r\n), tab (\t), or space. Any additional space after this token will be in a T_WHITESPACE token.
The T_CLOSE_TAG token will include the first trailing newline (\r, \n, or \r\n; as described here http://php.net/manual/en/language.basic-syntax.instruction-separation.php). Any additional space after this token will be in a T_INLINE_HTML token.

Not all tokens are returned as an array. The rule appears to be that single-character tokens are returned as plain strings, so you don't get a line number for them. This is the case for braces ("{", "}"), parentheses ("(", ")"), brackets ("[", "]"), comma (","), semicolon (";"), and a whole slew of single-character operator signs ("!", "=", "+", "*", "/", ".", ...). Multi-character operators such as "+=" are still returned as arrays (T_PLUS_EQUAL).

Well, there is a way to parse for errors. See
http://www.php.net/manual/function.php-check-syntax.php#77318
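Another possible way to check for syntax errors, sketched here under the assumption of PHP 7.0 or later (where TOKEN_PARSE is available and an invalid source raises a ParseError that can be caught), is to let token_get_all() run the parser and catch the error. The function name has_valid_syntax() is hypothetical. Note that this only detects errors found while parsing; problems that PHP reports later, at compile time or run time, are not caught.

```php
<?php
// Hypothetical helper: true if the source parses, false if the parser rejects it.
function has_valid_syntax(string $code): bool
{
    try {
        // TOKEN_PARSE runs the parser over the source without executing it.
        token_get_all($code, TOKEN_PARSE);
        return true;
    } catch (ParseError $e) {
        // $e->getMessage() and $e->getLine() describe the problem; there is no
        // filename because only a string was passed in.
        return false;
    }
}

var_dump(has_valid_syntax('<?php echo 1;'));
var_dump(has_valid_syntax('<?php class A { __construct() {} }'));
```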