<?php
/***
* This simple UTF-8 word count function (it only counts)
* is a bit faster than the one with preg_match_all and
* about 10x slower than the built-in str_word_count.
*
* If you need the hyphen or other code points as word characters,
* just put them into the [brackets], like [^\p{L}\p{N}\'\-].
* The pattern is expected to be valid UTF-8 (because of the u modifier),
* so if it contains ISO-8859-1 characters, utf8_encode() the pattern first.
**/
// Jonny 5's simple word splitter
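function str_word_count_utf8($str) {
    // PREG_SPLIT_NO_EMPTY drops the empty pieces left by leading or trailing separators.
    return count(preg_split('~[^\p{L}\p{N}\']+~u', $str, -1, PREG_SPLIT_NO_EMPTY));
}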
?>
str_word_count
(PHP 4 >= 4.3.0, PHP 5, PHP 7, PHP 8)
str_word_count — Return information about words used in a string
str_word_count(string $string, int $format = 0, ?string $characters = null): array|int
Counts the number of words inside string. If the optional format is not specified, the return value is an integer representing the number of words found. If format is specified, the return value is an array whose content depends on format. The possible values for format and the resulting outputs are listed below.
For the purposes of this function, a 'word' is defined as a locale-dependent string containing alphabetic characters, which may also contain, but not start with, "'" and "-" characters. Note that multibyte locales are not supported.
Parameters
string
The string
format
Specify the return value of this function. The currently supported values are:
- 0 - returns the number of words found
- 1 - returns an array containing all the words found inside string
- 2 - returns an associative array, where the key is the numeric position of the word inside string and the value is the word itself
characters
A list of additional characters which will be considered as part of a 'word'
Return Values
Returns an array or an integer, depending on the format chosen.
Changelog
| Version | Description |
|---|---|
| 8.0.0 | characters is nullable now. |
Examples
Example #1 A str_word_count() example
<?php
$str = "Hello fri3nd, you're
        looking          good today!";
print_r(str_word_count($str, 1));
print_r(str_word_count($str, 2));
print_r(str_word_count($str, 1, 'àáãç3'));
echo str_word_count($str);
?>
The above example will output:
Array
(
[0] => Hello
[1] => fri
[2] => nd
[3] => you're
[4] => looking
[5] => good
[6] => today
)
Array
(
[0] => Hello
[6] => fri
[10] => nd
[14] => you're
[29] => looking
[46] => good
[51] => today
)
Array
(
[0] => Hello
[1] => fri3nd
[2] => you're
[3] => looking
[4] => good
[5] => today
)
7
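Because the description notes that multibyte locales are not supported, a Unicode-aware count has to be built on the PCRE functions instead. The following is a minimal sketch, not part of PHP itself: the helper name `count_words_unicode` is hypothetical, the input is assumed to be valid UTF-8, and the pattern only approximates the built-in word definition (a letter, optionally followed by letters, apostrophes, or hyphens).

```php
<?php
// Hypothetical helper, not a PHP built-in: counts words in a UTF-8 string.
function count_words_unicode(string $text): int
{
    // A word starts with a letter and may continue with letters, "'" or "-".
    // The u modifier makes \p{L} match letters from any script.
    $count = preg_match_all('/\p{L}[\p{L}\'\-]*/u', $text);

    // preg_match_all() returns false on failure, e.g. for invalid UTF-8.
    return $count === false ? 0 : $count;
}

echo count_words_unicode("Not wôrking."), "\n";         // 2
echo count_words_unicode("Hello fri3nd, you're"), "\n"; // 4
```

As with the built-in default, digits split words here; adding `\p{N}` inside the brackets would treat them as word characters.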
See Also
- explode() - Split a string by a string
- preg_split() - Split string by a regular expression
- count_chars() - Return information about characters used in a string
- substr_count() - Count the number of substring occurrences
User Contributed Notes 11 notes
We can also specify a range of values for charlist.
<?php
$str = "Hello fri3nd, you're
looking good today!
look1234ing";
print_r(str_word_count($str, 1, '0..3'));
?>
will give the result as
Array ( [0] => Hello [1] => fri3nd [2] => you're [3] => looking [4] => good [5] => today [6] => look123 [7] => ing )
<?php
/**
* Returns the number of words in a string.
* As far as I have tested, it is very accurate.
* The string can have HTML in it,
* but you should do something like this first:
*
* $search = array(
* '@<script[^>]*?>.*?</script>@si',
* '@<style[^>]*?>.*?</style>@siU',
* '@<![\s\S]*?--[ \t\n\r]*>@'
* );
* $html = preg_replace($search, '', $html);
*
*/
function word_count($html) {
    # strip all html tags
    $wc = strip_tags($html);
    # remove 'words' that don't consist of alphanumerical characters or punctuation
    $pattern = "#[^(\w|\d|\'|\"|\.|\!|\?|;|,|\\|\/|\-|:|\&|@)]+#";
    $wc = trim(preg_replace($pattern, " ", $wc));
    # remove one-letter 'words' that consist only of punctuation
    $wc = trim(preg_replace("#\s*[(\'|\"|\.|\!|\?|;|,|\\|\/|\-|:|\&|@)]\s*#", " ", $wc));
    # remove superfluous whitespace
    $wc = preg_replace("/\s\s+/", " ", $wc);
    # split string into an array of words
    $wc = explode(" ", $wc);
    # remove empty elements
    $wc = array_filter($wc);
    # return the number of words
    return count($wc);
}
?>
For Spanish speakers a valid character map may be:
<?php
$characterMap = 'áéíóúüñ';
$count = str_word_count($text, 0, $characterMap);
?>
Here is a count words function which supports UTF-8 and Hebrew. I tried other functions but they don't work. Notice that in Hebrew, '"' and '\'' can be used in words, so they are not separators. This function is not perfect, I would prefer a function we are using in JavaScript which considers all characters except [a-zA-Zא-ת0-9_\'\"] as separators, but I don't know how to do it in PHP.
I removed some of the separators which don't work well with Hebrew ("\x20", "\xA0", "\x0A", "\x0D", "\x09", "\x0B", "\x2E"). I also removed the underline.
This is a fix to my previous post on this page - I found out that my function returned an incorrect result for an empty string. I corrected it and I'm also attaching another function - my_strlen.
<?php
function count_words($string) {
    // Return the number of words in a string.
    $string= str_replace("&#039;", "'", $string); // decode the apostrophe entity before '&', '#' and ';' are treated as separators
    $t= array(' ', "\t", '=', '+', '-', '*', '/', '\\', ',', '.', ';', ':', '[', ']', '{', '}', '(', ')', '<', '>', '&', '%', '$', '@', '#', '^', '!', '?', '~'); // separators
    $string= str_replace($t, " ", $string);
    $string= trim(preg_replace("/\s+/", " ", $string));
    $num= 0;
    if (my_strlen($string)>0) {
        $word_array= explode(" ", $string);
        $num= count($word_array);
    }
    return $num;
}
function my_strlen($s) {
    // Return mb_strlen with encoding UTF-8.
    return mb_strlen($s, "UTF-8");
}
?>
This example may not be pretty, but it proves accurate:
<?php
//count words
$words_to_count = strip_tags($body);
$pattern = "/[^(\w|\d|\'|\"|\.|\!|\?|;|,|\\|\/|\-\-|:|\&|@)]+/";
$words_to_count = preg_replace ($pattern, " ", $words_to_count);
$words_to_count = trim($words_to_count);
$total_words = count(explode(" ",$words_to_count));
?>
Hope I didn't miss any punctuation. ;-)
This function doesn't handle accents, even in a locale with accents.
<?php
echo str_word_count("Is working"); // =2
setlocale(LC_ALL, 'fr_FR.utf8');
echo str_word_count("Not wôrking"); // expects 2, got 3.
?>
Cito solution treats punctuation as words and thus isn't a good workaround.
<?php
function str_word_count_utf8($str) {
    // No PREG_SPLIT_NO_EMPTY here, so a trailing separator adds an empty piece to the count.
    return count(preg_split('~[^\p{L}\p{N}\']+~u',$str));
}
echo str_word_count_utf8("Is wôrking"); //=2
echo str_word_count_utf8("Not wôrking."); //=3
?>
My solution:
<?php
function str_word_count_utf8($str) {
    $a = preg_split('/\W+/u', $str, -1, PREG_SPLIT_NO_EMPTY);
    return count($a);
}
echo str_word_count_utf8("Is wôrking"); // = 2
echo str_word_count_utf8("Is wôrking! :)"); // = 2
?>
To count words after converting an MS Word document to plain text with antiword, you can use this function:
<?php
function count_words($text) {
    $text = str_replace(str_split('|'), '', $text); // remove these chars (you can specify more)
    $text = trim(preg_replace('/\s+/', ' ', $text)); // remove extra spaces
    $text = preg_replace('/-{2,}/', '', $text); // remove 2 or more dashes in a row
    $len = strlen($text);
    if (0 === $len) {
        return 0;
    }
    $words = 1;
    while ($len--) {
        if (' ' === $text[$len]) {
            ++$words;
        }
    }
    return $words;
}
?>
It strips the pipe "|" chars, which antiword uses to format tables in its plain text output, removes runs of two or more dashes (also used in tables), then counts the words.
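The counting loop above adds one word per space. Assuming the text has been normalised the same way (runs of whitespace collapsed to single spaces), the same tally can be sketched with substr_count(), which also avoids building an array of words. The function name `count_words_by_spaces` is hypothetical, and this sketch leaves out the antiword-specific pipe and dash stripping.

```php
<?php
// Hypothetical variant of the function above; the name is illustrative only.
function count_words_by_spaces(string $text): int
{
    // Collapse every run of whitespace into a single space and trim the ends.
    $text = trim(preg_replace('/\s+/', ' ', $text));
    if ($text === '') {
        return 0;
    }
    // N single spaces separate N + 1 words; no array of words is built.
    return substr_count($text, ' ') + 1;
}

echo count_words_by_spaces("  one   two\nthree "); // 3
```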
Counting words using explode() and then count() is not a good idea for huge texts, because it uses much memory to store the text once more as an array. This is why I'm using while() { .. } to walk the string.
Words also cannot end in a hyphen unless allowed by the charlist...
Hi, this is the first time I have posted on the PHP manual. I hope some of you will like this little function I wrote.
It returns a string with a certain character limit, but still retaining whole words.
It breaks out of the foreach loop once it has found a string short enough to display, and the character list can be edited.
<?php
function word_limiter( $text, $limit = 30, $chars = '0123456789' ) {
    if( strlen( $text ) > $limit ) {
        $words = str_word_count( $text, 2, $chars );
        $words = array_reverse( $words, TRUE );
        foreach( $words as $length => $word ) {
            if( $length + strlen( $word ) >= $limit ) {
                array_shift( $words );
            } else {
                break;
            }
        }
        $words = array_reverse( $words );
        $text = implode( " ", $words ) . '…';
    }
    return $text;
}
$str = "Hello this is a list of words that is too long";
echo '1: ' . word_limiter( $str );
$str = "Hello this is a list of words";
echo '2: ' . word_limiter( $str );
?>
1: Hello this is a list of words…
2: Hello this is a list of words
Here's a function that will trim a $string down to a certain number of words, and add a ... on the end of it.
(expansion of muz1's 1st 100 words code)
----------------------------------------------
<?php
function trim_text($text, $count) {
    // Collapse runs of whitespace so explode() yields no empty entries.
    $text = trim(preg_replace('/\s+/', ' ', $text));
    $words = explode(" ", $text);
    // Nothing to trim if the text already fits.
    if (count($words) <= $count) {
        return $text;
    }
    // Keep the first $count words and mark the cut with "...".
    return implode(" ", array_slice($words, 0, $count)) . "...";
}
?>
Usage
------------------------------------------------
<?php
$string = "one two three four";
echo trim_text($string, 3);
?>
returns:
one two three...
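A similar trim can be sketched with format 2 of str_word_count(), whose keys are the positions of the words in the string, so the original spacing and punctuation between the kept words survive. The helper name `first_words` and its `$suffix` parameter are hypothetical, and the sketch inherits the single-byte word definition of str_word_count().

```php
<?php
// Hypothetical helper, not a PHP built-in: keep the first $count words of $text.
function first_words(string $text, int $count, string $suffix = '...'): string
{
    if ($count < 1) {
        return '';
    }
    // Format 2 maps each word's byte offset in $text to the word itself.
    $words = str_word_count($text, 2);
    if (count($words) <= $count) {
        return $text; // already short enough, nothing to cut
    }
    // Cut right after the last word that should be kept.
    $offsets = array_keys($words);
    $last = $offsets[$count - 1];
    return substr($text, 0, $last + strlen($words[$last])) . $suffix;
}

echo first_words("one two three four", 3); // one two three...
```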