PHP.mk documentation

DOMDocument

Source: php.net/manual/en

The DOMDocument class

Introduction

Represents an entire HTML or XML document; serves as the root of the document tree.

Class synopsis

class DOMDocument extends DOMNode implements DOMParentNode {

/* Inherited constants */

public const int DOMNode::DOCUMENT_POSITION_DISCONNECTED = 0x1;

public const int DOMNode::DOCUMENT_POSITION_PRECEDING = 0x2;

public const int DOMNode::DOCUMENT_POSITION_FOLLOWING = 0x4;

public const int DOMNode::DOCUMENT_POSITION_CONTAINS = 0x8;

public const int DOMNode::DOCUMENT_POSITION_CONTAINED_BY = 0x10;

public const int DOMNode::DOCUMENT_POSITION_IMPLEMENTATION_SPECIFIC = 0x20;

/* Properties */

public readonly ?DOMDocumentType $doctype;

public readonly DOMImplementation $implementation;

public readonly ?DOMElement $documentElement;

public readonly ?string $actualEncoding;

public ?string $encoding;

public readonly ?string $xmlEncoding;

public bool $standalone;

public bool $xmlStandalone;

public ?string $version;

public ?string $xmlVersion;

public bool $strictErrorChecking;

public ?string $documentURI;

public readonly mixed $config;

public bool $formatOutput;

public bool $validateOnParse;

public bool $resolveExternals;

public bool $preserveWhiteSpace;

public bool $recover;

public bool $substituteEntities;

public readonly ?DOMElement $firstElementChild;

public readonly ?DOMElement $lastElementChild;

public readonly int $childElementCount;

/* Inherited properties */

public readonly string $nodeName;

public ?string $nodeValue;

public readonly int $nodeType;

public readonly ?DOMNode $parentNode;

public readonly ?DOMElement $parentElement;

public readonly DOMNodeList $childNodes;

public readonly ?DOMNode $firstChild;

public readonly ?DOMNode $lastChild;

public readonly ?DOMNode $previousSibling;

public readonly ?DOMNode $nextSibling;

public readonly ?DOMNamedNodeMap $attributes;

public readonly bool $isConnected;

public readonly ?DOMDocument $ownerDocument;

public readonly ?string $namespaceURI;

public string $prefix;

public readonly ?string $localName;

public readonly ?string $baseURI;

public string $textContent;

/* Methods */

public __construct(string $version = "1.0", string $encoding = "")

public adoptNode(DOMNode $node): DOMNode|false

public append(DOMNode|string ...$nodes): void

public createAttribute(string $localName): DOMAttr|false

public createAttributeNS(?string $namespace, string $qualifiedName): DOMAttr|false

public createCDATASection(string $data): DOMCdataSection|false

public createComment(string $data): DOMComment

public createDocumentFragment(): DOMDocumentFragment

public createElement(string $localName, string $value = ""): DOMElement|false

public createElementNS(?string $namespace, string $qualifiedName, string $value = ""): DOMElement|false

public createEntityReference(string $name): DOMEntityReference|false

public createProcessingInstruction(string $target, string $data = ""): DOMProcessingInstruction|false

public createTextNode(string $data): DOMText

public getElementById(string $elementId): ?DOMElement

public getElementsByTagName(string $qualifiedName): DOMNodeList

public getElementsByTagNameNS(?string $namespace, string $localName): DOMNodeList

public importNode(DOMNode $node, bool $deep = false): DOMNode|false

public load(string $filename, int $options = 0): bool

public loadHTML(string $source, int $options = 0): bool

public loadHTMLFile(string $filename, int $options = 0): bool

public loadXML(string $source, int $options = 0): bool

public normalizeDocument(): void

public prepend(DOMNode|string ...$nodes): void

public registerNodeClass(string $baseClass, ?string $extendedClass): true

public relaxNGValidate(string $filename): bool

public relaxNGValidateSource(string $source): bool

public replaceChildren(DOMNode|string ...$nodes): void

public save(string $filename, int $options = 0): int|false

public saveHTML(?DOMNode $node = null): string|false

public saveHTMLFile(string $filename): int|false

public saveXML(?DOMNode $node = null, int $options = 0): string|false

public schemaValidate(string $filename, int $flags = 0): bool

public schemaValidateSource(string $source, int $flags = 0): bool

public validate(): bool

public xinclude(int $options = 0): int|false

/* Inherited methods */

public DOMNode::appendChild(DOMNode $node): DOMNode|false

public DOMNode::C14N(
         bool $exclusive = false,
         bool $withComments = false,
         ?array $xpath = null,
         ?array $nsPrefixes = null
): string|false

public DOMNode::C14NFile(
         string $uri,
         bool $exclusive = false,
         bool $withComments = false,
         ?array $xpath = null,
         ?array $nsPrefixes = null
): int|false

public DOMNode::cloneNode(bool $deep = false): DOMNode|false

public DOMNode::compareDocumentPosition(DOMNode $other): int

public DOMNode::contains(DOMNode|DOMNameSpaceNode|null $other): bool

public DOMNode::getLineNo(): int

public DOMNode::getNodePath(): ?string

public DOMNode::getRootNode(?array $options = null): DOMNode

public DOMNode::hasAttributes(): bool

public DOMNode::hasChildNodes(): bool

public DOMNode::insertBefore(DOMNode $node, ?DOMNode $child = null): DOMNode|false

public DOMNode::isDefaultNamespace(string $namespace): bool

public DOMNode::isEqualNode(?DOMNode $otherNode): bool

public DOMNode::isSameNode(DOMNode $otherNode): bool

public DOMNode::isSupported(string $feature, string $version): bool

public DOMNode::lookupNamespaceURI(?string $prefix): ?string

public DOMNode::lookupPrefix(string $namespace): ?string

public DOMNode::normalize(): void

public DOMNode::removeChild(DOMNode $child): DOMNode|false

public DOMNode::replaceChild(DOMNode $node, DOMNode $child): DOMNode|false

public DOMNode::__sleep(): array

public DOMNode::__wakeup(): void

}

Properties

actualEncoding: Deprecated as of PHP 8.4.0. The actual encoding of the document; a read-only equivalent of encoding.
childElementCount: The number of child elements.
config: Deprecated as of PHP 8.4.0. Configuration used when DOMDocument::normalizeDocument() is invoked.
doctype: The Document Type Declaration associated with this document.
documentElement: The DOMElement object that is the first document element. If not found, this evaluates to null.
documentURI: The location of the document, or null if undefined.
encoding: Encoding of the document, as specified by the XML declaration. This attribute is not present in the final DOM Level 3 specification, but is the only way of manipulating XML document encoding in this implementation.
firstElementChild: First child element or null.
formatOutput: Nicely formats output with indentation and extra space. This has no effect if the document was loaded with preserveWhiteSpace enabled.
implementation: The DOMImplementation object that handles this document.
lastElementChild: Last child element or null.
preserveWhiteSpace: Do not remove redundant white space. Defaults to true. Setting this to false has the same effect as passing LIBXML_NOBLANKS as an option to DOMDocument::load() etc.
recover: Proprietary. Enables recovery mode, i.e. trying to parse non-well-formed documents. This attribute is not part of the DOM specification and is specific to libxml.
resolveExternals: Set it to true to load external entities from a doctype declaration. This is useful for including character entities in your XML document.
standalone: Deprecated. Whether or not the document is standalone, as specified by the XML declaration; corresponds to xmlStandalone.
strictErrorChecking: Throws DOMException on errors. Defaults to true.
substituteEntities: Proprietary. Whether or not to substitute entities. This attribute is not part of the DOM specification and is specific to libxml. Defaults to false.

Caution
Enabling entity substitution may facilitate XML External Entity (XXE) attacks.
validateOnParse: Loads and validates against the DTD. Defaults to false.

Caution
Enabling DTD validation may facilitate XML External Entity (XXE) attacks.
version: Deprecated. Version of XML; corresponds to xmlVersion.
xmlEncoding: An attribute specifying, as part of the XML declaration, the encoding of this document. This is null when unspecified or when it is not known, such as when the document was created in memory.
xmlStandalone: An attribute specifying, as part of the XML declaration, whether this document is standalone. This is false when unspecified. A standalone document is one where there are no external markup declarations. An example of such a markup declaration is when the DTD declares an attribute with a default value.
xmlVersion: An attribute specifying, as part of the XML declaration, the version number of this document. If there is no declaration and if this document supports the "XML" feature, the value is "1.0".
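
The interplay between preserveWhiteSpace and formatOutput described in the property list can be illustrated with a minimal sketch. The sample XML string and the variable names are assumptions made purely for illustration.

```php
<?php
// Hypothetical input: whitespace-only text nodes sit between the elements.
$source = '<a> <b>x</b> <c/> </a>';

$doc = new DOMDocument();
$doc->preserveWhiteSpace = false; // drop whitespace-only nodes while parsing
$doc->formatOutput = true;        // indent when serializing
$doc->loadXML($source);

$formatted = $doc->saveXML();
echo $formatted;
```

Both properties are set before loadXML() here because preserveWhiteSpace affects parsing, while formatOutput only matters at serialization time.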

Changelog

Version	Description
8.4.0	`actualEncoding` and `config` are formally deprecated now.
8.0.0	DOMDocument implements DOMParentNode now.
8.0.0	The unimplemented method DOMDocument::renameNode() has been removed.
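
Because DOMDocument implements DOMParentNode as of PHP 8.0.0, the append() and prepend() methods listed in the class synopsis can be called on the document itself. A minimal sketch, with the element name and comment text chosen only for illustration:

```php
<?php
// Requires PHP 8.0 or later.
$doc = new DOMDocument('1.0', 'UTF-8');

$doc->append($doc->createElement('root'));      // becomes the document element
$doc->prepend($doc->createComment(' header ')); // inserted before <root>

$first = $doc->firstChild;        // the comment node
$count = $doc->childElementCount; // counts element children only

echo $doc->saveXML();
```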

Notes

Note:
The DOM extension uses UTF-8 encoding. Use mb_convert_encoding(), UConverter::transcode(), or iconv() to handle other encodings.
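
As a sketch of the conversion this note recommends, the snippet below converts an ISO-8859-1 string to UTF-8 before passing it to the DOM extension. The input bytes and the element name are assumptions for illustration, and the mbstring extension is assumed to be available.

```php
<?php
// Hypothetical input: "café" encoded as ISO-8859-1 (0xE9 is "é").
$latin1 = "caf\xE9";

// Convert to UTF-8 before handing the string to the DOM extension.
$utf8 = mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');

$doc = new DOMDocument('1.0', 'UTF-8');
$item = $doc->createElement('item');
$item->appendChild($doc->createTextNode($utf8));
$doc->appendChild($item);

echo $doc->saveXML();
```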

Note:
When using json_encode() on a DOMDocument object the result will be that of encoding an empty object.
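
The behaviour in this note can be observed with a short sketch. Serializing the document first and encoding the resulting string is one possible workaround, not something the manual prescribes; the `xml` key is an arbitrary name chosen for illustration.

```php
<?php
$doc = new DOMDocument();
$doc->loadXML('<root><item>1</item></root>');

// Per the note above, the document itself encodes as an empty object.
$encoded = json_encode($doc);

// One possible workaround: serialize first, then encode the resulting string.
$wrapped = json_encode(['xml' => $doc->saveXML($doc->documentElement)]);

echo $encoded, "\n", $wrapped, "\n";
```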

See Also

» W3C specification for Document

Table of Contents

DOMDocument::adoptNode - Transfer a node from another document
DOMDocument::append - Appends nodes after the last child node
DOMDocument::__construct - Creates a new DOMDocument object
DOMDocument::createAttribute - Create new attribute
DOMDocument::createAttributeNS - Create new attribute node with an associated namespace
DOMDocument::createCDATASection - Create new cdata node
DOMDocument::createComment - Create new comment node
DOMDocument::createDocumentFragment - Create new document fragment
DOMDocument::createElement - Create new element node
DOMDocument::createElementNS - Create new element node with an associated namespace
DOMDocument::createEntityReference - Create new entity reference node
DOMDocument::createProcessingInstruction - Creates new PI node
DOMDocument::createTextNode - Create new text node
DOMDocument::getElementById - Searches for an element with a certain id
DOMDocument::getElementsByTagName - Searches for all elements with given local tag name
DOMDocument::getElementsByTagNameNS - Searches for all elements with given tag name in specified namespace
DOMDocument::importNode - Import node into current document
DOMDocument::load - Load XML from a file
DOMDocument::loadHTML - Load HTML from a string
DOMDocument::loadHTMLFile - Load HTML from a file
DOMDocument::loadXML - Load XML from a string
DOMDocument::normalizeDocument - Normalizes the document
DOMDocument::prepend - Prepends nodes before the first child node
DOMDocument::registerNodeClass - Register extended class used to create base node type
DOMDocument::relaxNGValidate - Performs relaxNG validation on the document
DOMDocument::relaxNGValidateSource - Performs relaxNG validation on the document
DOMDocument::replaceChildren - Replace children in document
DOMDocument::save - Dumps the internal XML tree back into a file
DOMDocument::saveHTML - Dumps the internal document into a string using HTML formatting
DOMDocument::saveHTMLFile - Dumps the internal document into a file using HTML formatting
DOMDocument::saveXML - Dumps the internal XML tree back into a string
DOMDocument::schemaValidate - Validates a document based on a schema. Only XML Schema 1.0 is supported.
DOMDocument::schemaValidateSource - Validates a document based on a schema
DOMDocument::validate - Validates the document based on its DTD
DOMDocument::xinclude - Substitutes XIncludes in a DOMDocument object

User Contributed Notes

Fernando H ¶

17 years ago

Showing a quick example of how to use this class, just so that new users can get a quick start without having to figure it all out by themselves. (At the day of posting, this documentation just got added and is lacking examples.)

<?php

// Set the content type to XML, so that the browser will recognise it as XML.
header( "content-type: application/xml; charset=ISO-8859-15" );

// "Create" the document.
$xml = new DOMDocument( "1.0", "ISO-8859-15" );

// Create some elements.
$xml_album = $xml->createElement( "Album" );
$xml_track = $xml->createElement( "Track", "The ninth symphony" );

// Set the attributes.
$xml_track->setAttribute( "length", "0:01:15" );
$xml_track->setAttribute( "bitrate", "64kb/s" );
$xml_track->setAttribute( "channels", "2" );

// Create another element, just to show you can add any (realistic to computer) number of sublevels.
$xml_note = $xml->createElement( "Note", "The last symphony composed by Ludwig van Beethoven." );

// Append the whole bunch.
$xml_track->appendChild( $xml_note );
$xml_album->appendChild( $xml_track );

// Repeat the above with some different values..
$xml_track = $xml->createElement( "Track", "Highway Blues" );

$xml_track->setAttribute( "length", "0:01:33" );
$xml_track->setAttribute( "bitrate", "64kb/s" );
$xml_track->setAttribute( "channels", "2" );
$xml_album->appendChild( $xml_track );

$xml->appendChild( $xml_album );

// Serialize the document to an XML string and print it.
print $xml->saveXML();

?>

Output:
<Album>
  <Track length="0:01:15" bitrate="64kb/s" channels="2">
    The ninth symphony
    <Note>
      The last symphony composed by Ludwig van Beethoven.
    </Note>
  </Track>
  <Track length="0:01:33" bitrate="64kb/s" channels="2">Highway Blues</Track>
</Album>

If you want your PHP->DOM code to run under the .xml extension, you should set your webserver up to run the .xml extension with PHP (refer to the installation/configuration documentation for PHP on how to do this).

Note that this:
<?php
$xml = new DOMDocument( "1.0", "ISO-8859-15" );
$xml_album = $xml->createElement( "Album" );
$xml_track = $xml->createElement( "Track" );
$xml_album->appendChild( $xml_track );
$xml->appendChild( $xml_album );
?>

is NOT the same as this:
<?php
// Will NOT work.
$xml = new DOMDocument( "1.0", "ISO-8859-15" );
$xml_album = new DOMElement( "Album" );
$xml_track = new DOMElement( "Track" );
$xml_album->appendChild( $xml_track );
$xml->appendChild( $xml_album );
?>

although this will work:
<?php
$xml = new DOMDocument( "1.0", "ISO-8859-15" );
$xml_album = new DOMElement( "Album" );
$xml->appendChild( $xml_album );
?>
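
The difference between the last two snippets may be easier to see in a sketch that attaches the element to the document before adding children to it. This assumes, as the DOMElement constructor documentation describes, that an element created with `new` stays read-only until it is associated with a document; the element names are illustrative.

```php
<?php
$xml = new DOMDocument('1.0', 'UTF-8');

// Attach first: appendChild() returns the node now owned by the document.
$album = $xml->appendChild(new DOMElement('Album'));

// Children can be added now that the element belongs to a document.
$album->appendChild(new DOMElement('Track'));

$out = $xml->saveXML();
echo $out;
```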

developer at nabtron dot com ¶

10 years ago

For those landing here and checking for encoding issues with UTF-8 characters, it's pretty easy to correct them without adding any additional output tag to your HTML.

We'll be utilizing: mb_convert_encoding

Thanks to the user who shared: SmartDOMDocument in previous comments, I got the idea of solving it. However I truly wish that he shared the method instead of giving a link.

Anyway coming back to the solution, you can simply use:

<?php

// Return early if the content is empty, to avoid the error loadHTML() raises on empty input
if ( empty( $content ) ) {
    return false;
}

// Convert non-ASCII characters to HTML entities so the parser does not misread them
// (the 'HTML-ENTITIES' target is deprecated as of PHP 8.2)
$content = mb_convert_encoding($content, 'HTML-ENTITIES', 'UTF-8');

// Create a new document
$doc = new DOMDocument('1.0', 'utf-8');

// Collect libxml errors internally instead of emitting warnings
libxml_use_internal_errors(true);

// Load the content without adding the enclosing html/body tags or the doctype declaration
$doc->loadHTML($content, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

// Do whatever you want with the document now

?>

I hope it solves the issue for someone! If you need my help or service to fix your code, you can reach me on nabtron.com or contact me at the email mentioned with this comment.

andreas at userbrain dot com ¶

4 years ago

After struggling with parsing and modifying partial HTML content for several hours, I came to this solution which does work for me and is relatively simple compared to what else I found online.

This solution fixes unwanted DOCTYPE and html, body tags as well as encoding issues.

<?php

// Assumption: content is utf-8 encoded
$content = "<h1>This is a heading</h1><p>This is a paragraph</p>";

// Load content to a div and specify encoding with a meta tag
$temp_dom = new DOMDocument();
$temp_dom->loadHTML("<meta http-equiv='Content-Type' content='charset=utf-8' /><div>$content</div>");

// As loadHTML() adds a DOCTYPE as well as <html> and <body> tag, let’s create another DOMDocument and import just the nodes we want
$dom = new DOMDocument();
$first_div = $temp_dom->getElementsByTagName('div')[0];
$first_div_node = $dom->importNode($first_div, true);
$dom->appendChild($first_div_node);

// Do whatever you want to do
$dom->getElementsByTagName('h1')[0]->setAttribute('class', 'happy');

// You could also just echo $dom->saveHtml() if you don’t mind the div and whitespace 
echo substr(trim($dom->saveHtml()), 5, -6);

// Outputs: <h1 class="happy">This is a heading</h1><p>This is a paragraph</p>
?>

jay at jaygilford dot com ¶

16 years ago

Here's a small function I wrote to get all page links using the DOMDocument which will hopefully be of use to others

<?php
/**
 * @author Jay Gilford
 */
 
/**
 * get_links()
 * 
 * @param string $url
 * @return array
 */
function get_links($url) {
 
    // Create a new DOM Document to hold our webpage structure
    $xml = new DOMDocument();
 
    // Load the url's contents into the DOM
    $xml->loadHTMLFile($url);
 
    // Empty array to hold all links to return
    $links = array();
 
    //Loop through each <a> tag in the dom and add it to the link array
    foreach($xml->getElementsByTagName('a') as $link) {
        $links[] = array('url' => $link->getAttribute('href'), 'text' => $link->nodeValue);
    }
 
    //Return the links
    return $links;
}
?>
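
The same technique can be tried without a network request by parsing markup held in a string. The helper name `get_links_from_html()` and the sample markup are hypothetical, introduced only for this sketch; it also buffers parser warnings with libxml_use_internal_errors(), which the original function does not do.

```php
<?php
// Hypothetical helper, modelled on get_links() above but taking markup instead of a URL.
function get_links_from_html(string $html): array
{
    $doc = new DOMDocument();

    // Collect parser warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = [];
    foreach ($doc->getElementsByTagName('a') as $link) {
        $links[] = ['url' => $link->getAttribute('href'), 'text' => $link->textContent];
    }

    return $links;
}

$links = get_links_from_html('<p><a href="/one">One</a> and <a href="/two">Two</a></p>');
print_r($links);
```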

tloach at gmail dot com ¶

16 years ago

For anyone else who has been having issues with formatOutput not working, here is a workaround:

rather than just doing something like:

<?php
$outXML = $xml->saveXML();
?>

force it to reload the XML from scratch, then it will format correctly:

<?php
$outXML = $xml->saveXML();
$xml = new DOMDocument();
$xml->preserveWhiteSpace = false;
$xml->formatOutput = true;
$xml->loadXML($outXML);
$outXML = $xml->saveXML();
?>

biker dot mike at gmx dot com ¶

9 years ago

Look out for the following gotcha when loading XML from a string:

<?php
$doc = new \DOMDocument;
$doc->documentURI = $myXmlFilename;
$doc->loadXML($myXmlString);
?>

documentURI is now set to the value of $myXmlFilename, right?

Wrong!

It's set to the current working directory.  If you want to manually set documentURI to something other than the CWD, do so AFTER the call to loadXML().

E.g.:
<?php
$doc = new \DOMDocument;
$doc->loadXML($myXmlString);
$doc->documentURI = $myXmlFilename;
?>

documentURI really is now set to the value of $myXmlFilename.

Nick M ¶

14 years ago

You may need to save all or part of a DOMDocument as an XHTML-friendly string, something compliant with both XML and HTML 4. Here's the DOMDocument class extended with a saveXHTML method:

<?php

/**
 * XHTML Document
 *
 * Represents an entire XHTML DOM document; serves as the root of the document tree.
 */
class XHTMLDocument extends DOMDocument {

  /**
   * These tags must always self-terminate. Anything else must never self-terminate.
   * 
   * @var array
   */
  public $selfTerminate = array(
      'area','base','basefont','br','col','frame','hr','img','input','link','meta','param'
  );
  
  /**
   * saveXHTML
   *
   * Dumps the internal XML tree back into an XHTML-friendly string.
   *
   * @param DOMNode $node
   *         Use this parameter to output only a specific node rather than the entire document.
   */
  public function saveXHTML(?DOMNode $node = null) {
    
    if (!$node) $node = $this->firstChild;
    
    $doc = new DOMDocument('1.0');
    $clone = $doc->importNode($node->cloneNode(false), true);
    $term = in_array(strtolower($clone->nodeName), $this->selfTerminate);
    $inner='';
    
    if (!$term) {
      $clone->appendChild(new DOMText(''));
      if ($node->childNodes) foreach ($node->childNodes as $child) {
        $inner .= $this->saveXHTML($child);
      }
    }
    
    $doc->appendChild($clone);
    $out = $doc->saveXML($clone);
    
    return $term ? substr($out, 0, -2) . ' />' : str_replace('><', ">$inner<", $out);

  }

}

?>

This hasn't been benchmarked, but is probably significantly slower than saveXML or saveHTML and should be used sparingly.

pastormontesinos at gmail dot com ¶

5 years ago

To handle script nodes safely when parsing, the best option is to extend DOMDocument: set the script tags aside while DOMDocument processes the markup, then restore them just after saveHTML() is called. Here is my custom class.

<?php 

class SafeDOMDocument extends \DOMDocument
{
    const REGEX_JS            = '#(\s*<!--(\[if[^\n]*>)?\s*(<script.*</script>)+\s*(<!\[endif\])?-->)|(\s*<script.*</script>)#isU';
    const SUBSTITUTION_FORMAT = '<!--<script class="script_%s"></script>-->';
    private $matchedScripts = [];

    public function loadHTML(string $source, int $options = 0): bool
    {
        $this->formatOutput        = false;
        $this->preserveWhiteSpace  = true;
        $this->validateOnParse     = false;
        $this->strictErrorChecking = false;
        $this->recover             = false;
        $this->resolveExternals    = false;
        $this->substituteEntities  = false;
        $matches = [];
        $success = preg_match_all(self::REGEX_JS, $source, $matches);

        if ($success && !empty($matches)) {
            foreach ($matches[0] as $match) {
                $storedScript = trim($match, "\n\r\t ");
                $scriptId = md5($storedScript);
                $key = sprintf(self::SUBSTITUTION_FORMAT, $scriptId);
                $source = str_replace($match, $key, $source);
                $this->matchedScripts[$key] = $storedScript;
            }
        }

        return parent::loadHTML($source, $options);
    }

    public function saveHTML(?\DOMNode $node = null): string|false
    {
        $output = parent::saveHTML($node);

        // Restore the original scripts in place of their placeholders.
        if ($output !== false) {
            foreach ($this->matchedScripts as $substitution => $originalSnippet) {
                $output = str_replace($substitution, $originalSnippet, $output);
            }
        }

        return $output;
    }
}
?>

fcartegnie ¶

16 years ago

Be careful with formatOutput.

Creating an empty node like this:
createElement('foo','')
instead of
createElement('foo')
will break formatOutput.

evert at er dot nl ¶

15 years ago

A nice and simple node 2 array I wrote, worth a try ;) 

<?php
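function getArray($node)
{
    // Start from an empty array; assigning keys to `false` is deprecated as of PHP 8.1.
    $array = [];

    // Copy every attribute as name => value.
    if ($node->hasAttributes())
    {
        foreach ($node->attributes as $attr)
        {
            $array[$attr->nodeName] = $attr->nodeValue;
        }
    }

    if ($node->hasChildNodes())
    {
        if ($node->childNodes->length == 1)
        {
            // A single child is stored directly under its node name.
            $array[$node->firstChild->nodeName] = $node->firstChild->nodeValue;
        }
        else
        {
            foreach ($node->childNodes as $childNode)
            {
                if ($childNode->nodeType != XML_TEXT_NODE)
                {
                    // This is a plain function, so recurse directly rather than via $this.
                    $array[$childNode->nodeName][] = getArray($childNode);
                }
            }
        }
    }

    return $array;
}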
?>

devour at php dot net ¶

1 year ago

While DOMDocument can technically be used to parse HTML, it is not ideal for HTML documents and is better suited for processing well-formed XML. One of the primary issues with using DOMDocument for HTML is its strict handling of special characters, such as the ampersand (&).

DOMDocument requires that ampersands be escaped as &amp;, which is in line with XML standards but can be counterintuitive for handling real-world HTML, where raw & characters are commonly found, especially in URLs and text. This behavior stems from the underlying XML-based parser (libxml), which treats HTML with the same strictness as XML.

This problem has been reported as far back as 2001, yet the same parsing errors continue to occur when using DOMDocument on HTML documents today.

A common workaround developers use is to suppress the error reporting from DOMDocument, particularly when parsing errors like unescaped ampersands occur. However, suppressing these errors is not recommended, especially in production environments, as it can hide important issues and pose potential security risks. Ignoring or suppressing errors can leave warnings unnoticed, which may result in vulnerabilities if not properly addressed.

For these reasons, it's advisable to use DOMDocument primarily for XML documents, or to consider more appropriate libraries when working with HTML to avoid these issues.

theCoder / MV
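
As an alternative to silencing warnings outright, the parser's messages can be buffered and then inspected. The sketch below uses libxml_use_internal_errors() and libxml_get_errors() for that; the sample markup is an assumption, and which messages are reported (if any) depends on the installed libxml2 version.

```php
<?php
// Hypothetical markup with unescaped ampersands, as often found in real pages.
$html = '<p>Tom & Jerry</p><a href="?a=1&b=2">link</a>';

$previous = libxml_use_internal_errors(true); // buffer errors instead of emitting warnings
$doc = new DOMDocument();
$doc->loadHTML($html);

$errors = libxml_get_errors(); // array of LibXMLError objects
libxml_clear_errors();
libxml_use_internal_errors($previous);

// Review or log what the parser reported rather than discarding it.
foreach ($errors as $error) {
    printf("line %d: %s\n", $error->line, trim($error->message));
}

$text = $doc->getElementsByTagName('p')->item(0)->textContent;
echo $text, "\n";
```

Restoring the previous libxml_use_internal_errors() setting keeps the change local to this one parse.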

cmyk777 at gmail dot com ¶

16 years ago

This function may help to debug current dom element:

<?php
function dom_dump($obj) {
    if ($classname = get_class($obj)) {
        $retval = "Instance of $classname, node list: \n";
        switch (true) {
            case ($obj instanceof DOMDocument):
                $retval .= "XPath: {$obj->getNodePath()}\n".$obj->saveXML($obj);
                break;
            case ($obj instanceof DOMElement):
                $retval .= "XPath: {$obj->getNodePath()}\n".$obj->ownerDocument->saveXML($obj);
                break;
            case ($obj instanceof DOMAttr):
                $retval .= "XPath: {$obj->getNodePath()}\n".$obj->ownerDocument->saveXML($obj);
                //$retval .= $obj->ownerDocument->saveXML($obj);
                break;
            case ($obj instanceof DOMNodeList):
                for ($i = 0; $i < $obj->length; $i++) {
                    $retval .= "Item #$i, XPath: {$obj->item($i)->getNodePath()}\n".
"{$obj->item($i)->ownerDocument->saveXML($obj->item($i))}\n";
                }
                break;
            default:
                return "Instance of unknown class";
        }
    } else {
        return 'no elements...';
    }
    return htmlspecialchars($retval);
}
?>

Example usage:

<?php
$dom = new DomDocument();
$dom->load('test.xml');
$body = $dom->documentElement->getElementsByTagName('book');
echo '<pre>'.dom_dump($body).'</pre>';
?>

Output:

Instance of DOMNodeList, node list: 
Item #0, XPath: /library/book[1]
<book isbn="0345342968">
<title>Fahrenheit 451</title>
<author>R. Bradbury</author>
<publisher>Del Rey</publisher>
</book>
Item #1, XPath: /library/book[2]
<book isbn="0048231398">
<title>The Silmarillion</title>
<author>J.R.R. Tolkien</author>
<publisher>G. Allen &amp; Unwin</publisher>
</book>
Item #2, XPath: /library/book[3]
<book isbn="0451524934">
<title>1984</title>
<author>G. Orwell</author>
<publisher>Signet</publisher>
</book>
Item #3, XPath: /library/book[4]
<book isbn="031219126X">
<title>Frankenstein</title>
<author>M. Shelley</author>
<publisher>Bedford</publisher>
</book>
Item #4, XPath: /library/book[5]
<book isbn="0312863551">
<title>The Moon Is a Harsh Mistress</title>
<author>R. A. Heinlein</author>
<publisher>Orb</publisher>
</book>

sites.sitesbr.net ¶

13 years ago

How to objectify a DOMDocument with a hierarchy like:
<root>
    <item>
          <prop1>info1</prop1>
          <prop2>info2</prop2>
          <prop3>info3</prop3>
     </item>
    <item>
          <prop1>info1</prop1>
          <prop2>info2</prop2>
          <prop3>info3</prop3>
     </item>
</root>

It's possible to use object-style access to retrieve information, such as:

<?php
     $theNodeValue = $aitem->prop1;
?>

Here is the code: one Class and 2 functions.

<?php
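 #[\AllowDynamicProperties] // getArrayNodes() adds properties at runtime, which is deprecated without this attribute as of PHP 8.2
 class ArrayNode{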
       public $nodeName, $nodeValue;
 }

 function getChildNodeElements( $domNode ){
     $nodes = array();
     for( $i=0; $i < $domNode->childNodes->length; $i++){
       $cn = $domNode->childNodes->item($i);
       if( $cn->nodeType == 1){
           $nodes[] = $cn;
           }
     }
    return $nodes;
 }

 function getArrayNodes( $domDoc ){
     $res = array();

       for( $i=0; $i < $domDoc->childNodes->length; $i++){
       $cn = $domDoc->childNodes->item($i);
       # The first is the root tag...
          if( $cn->nodeType == 1){
               # But we want its childNodes.
                $sub_cn = getChildNodeElements( $cn);
                # Found the tagName:
                $baseItemTagName = $sub_cn[0]->nodeName;
                break;
            }
        }

       $dnl = $domDoc->getElementsByTagName( $baseItemTagName);

       for( $i=0; $i< $dnl->length; $i++){
          $arrayNode = new ArrayNode();

      # Summary
      $arrayNode->nodeName = $dnl->item($i)->nodeName;
      $arrayNode->nodeValue = $dnl->item($i)->nodeValue;

      # Child Nodes
      $cn = $dnl->item($i)->childNodes;
      for( $k=0; $k<$cn->length; $k++){
           if( $cn->item($k)->nodeName == "#text" && trim($cn->item($k)->nodeValue) == "") continue;
           $arrayNode->{$cn->item($k)->nodeName} = $cn->item($k)->nodeValue;
      }

      # Attributes
      $attr = $dnl->item($i)->attributes;
      for( $k=0; $k < $attr->length; $k++){
           if(! is_null($attr)){
            if( $attr->item($k)->nodeName == "#text" && trim($attr->item($k)->nodeValue) == "") continue;
            $arrayNode->{$attr->item($k)->nodeName} = $attr->item($k)->nodeValue;
           }
      }

      $res[] = $arrayNode;

       }

     return $res;
 }
?>

To use it:

<?php

  # First you load a XML in a DomDocument variable.

   $url = "/path/to/yourxmlfile.xml";
   $domSrc = file_get_contents($url);
   $dom = new DomDocument();
   $dom->loadXML( $domSrc );

  # Then, you get the ArrayNodes from the DomDocument.

    $ans = getArrayNodes( $dom );

 
    for( $i=0; $i < count( $ans ) ; $i++){

    $cn =  $ans[ $i];

    $info1 =  $cn->prop1;
    $info2 =  $cn->prop2;
    $info3 =  $cn->prop3;
      
         // ...
 
   }

?>
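
Where the SimpleXML extension is available, similar object-style access may be possible without a custom class by importing the DOM tree with simplexml_import_dom(). This is a swapped-in technique rather than the approach used in the note above, and the sample document is an assumption modelled on the hierarchy shown there.

```php
<?php
// Hypothetical document with the same shape as the hierarchy above.
$dom = new DOMDocument();
$dom->loadXML(
    '<root>'
    . '<item><prop1>info1</prop1><prop2>info2</prop2></item>'
    . '<item><prop1>info3</prop1><prop2>info4</prop2></item>'
    . '</root>'
);

// simplexml_import_dom() wraps the DOM node in a SimpleXMLElement.
$root = simplexml_import_dom($dom->documentElement);

$values = [];
foreach ($root->item as $item) {
    // Cast to string to get the element's text content.
    $values[] = (string) $item->prop1;
}

print_r($values);
```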

610010559 at qq dot com ¶

4 years ago

When you add a new element to formatted XML data through the appendChild() method, you will find that the new element is not formatted (not indented, no line break). Here is my solution (in short: load the XML without preserving white space). Example below:
<?php
$doc = new \DOMDocument();
$doc->formatOutput = true;
$doc->preserveWhiteSpace = false; // this is the key; the default value is true
$doc->loadXML($xmlStr);
// append inside the root element so the document keeps a single root
$doc->documentElement->appendChild($doc->createElement('php', '666'));
$formattedXMLStr = $doc->saveXML(); // DOMDocument will format the XML string for you
echo $formattedXMLStr;
?>
It took me some time to figure this out. Hope it saves you time.

ashjkshdu283 at gmail dot com ¶

7 years ago

<?php
/* Function evolved from jay at jaygilford dot com post
 * This function will return an array of the values of the specified
 * attribute ($attr) for all the DOMDocument object's elements
 */

function getAttrData(string $attr, DomDocument $dom) { 
    // Empty array to hold all classes to return 
    $attrData = array(); 

    //Loop through each tag in the dom and add it's attribute data to the array 
    foreach($dom->getElementsByTagName('*') as $tag) {
        if(empty($tag->getAttribute($attr)) === false) {
            array_push($attrData, $tag->getAttribute($attr));
        }
    } 

    //Return the array of attribute data
    return array_unique($attrData); 
}

$html = '
<!DOCTYPE html>
<html>
<head>
<title>Page Title</title>
</head>
<body>
<a href="#someLink" id="someLink" class="link-class">Some Link</a>
<a href="#someOtherLink" id="someOtherLink" class="link-class">Some Other Link</a>
<h1 id="header1" class="header-class">My First Heading</h1>
<p id="para1" class="para-class">My first paragraph.</p>
</body>
</html>';
$dom = new DOMDocument();
$dom->loadHTML($html);
var_dump(getAttrData('class', $dom));
?>

ingjetel at gmail dot com

10 years ago

A simple function that prints the text content of an XML file by walking the DOM:

<?php
$dom = new DomDocument();
$dom->load("./file.xml") or die("error");
$start = $dom->documentElement;
fc($start);

function fc($node) {
  $child = $node->childNodes;
  foreach($child as $item) {
    if ($item->nodeType == XML_TEXT_NODE) {
      if (strlen(trim($item->nodeValue))) echo trim($item->nodeValue)."<br/>";
    }
    else if ($item->nodeType == XML_ELEMENT_NODE) fc($item);
  }
}
?>

admin at beerpla dot net

16 years ago

After seeing many complaints about certain DOMDocument shortcomings, such as bad handling of encodings and always saving HTML fragments with <html>, <head>, and DOCTYPE, I decided that a better solution is needed.

So here it is: SmartDOMDocument. You can find it at http://beerpla.net/projects/smartdomdocument/

Currently, the main highlights are:

- SmartDOMDocument inherits from DOMDocument, so it's very easy to use - just declare an object of type SmartDOMDocument instead of DOMDocument and enjoy the new behavior on top of all existing functionality (see example below).

- saveHTMLExact() - DOMDocument has an extremely badly designed "feature" where if the HTML code you are loading does not contain <html> and <body> tags, it adds them automatically (yup, there are no flags to turn this behavior off).
Thus, when you call $doc->saveHTML(), your newly saved content now has <html><body> and DOCTYPE in it. Not very handy when trying to work with code fragments (XML has a similar problem).
SmartDOMDocument contains a new function called saveHTMLExact() which does exactly what you would want - it saves HTML without adding that extra garbage that DOMDocument does.

- encoding fix - DOMDocument notoriously doesn't handle encoding (at least UTF-8) correctly and garbles the output.
SmartDOMDocument tries to work around this problem by enhancing loadHTML() to deal with encoding correctly. This behavior is transparent to you - just use loadHTML() as you would normally.

- SmartDOMDocument Object As String - you can use a SmartDOMDocument object as a string which will print out its contents.
For example:
<?php
echo "Here is the HTML: $smart_dom_doc";
?>

I'm going to maintain this code and try to fix bugs as they come in.

Enjoy.
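
For readers on newer PHP versions: as far as I know, since PHP 5.4 (with a sufficiently recent libxml, 2.7.8 or later) loadHTML() accepts an options argument, and the constants LIBXML_HTML_NOIMPLIED and LIBXML_HTML_NODEFDTD cover the fragment case described above without a subclass. The sketch below is not SmartDOMDocument's implementation; the helper name `saveFragment()` is an assumption, and it does not address the encoding issue.

```php
<?php
// Hypothetical helper; the name is an assumption made for this sketch.
function saveFragment(string $html): string
{
    $doc = new DOMDocument();
    // LIBXML_HTML_NOIMPLIED: do not add implied <html>/<body> elements.
    // LIBXML_HTML_NODEFDTD:  do not add a default DOCTYPE.
    $doc->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

    // saveHTML() ends its output with a newline; trim it for fragment use.
    return trim($doc->saveHTML());
}

echo saveFragment('<div><p>Hello <b>world</b></p></div>');
```

This appears to work best when the fragment has a single root element, as in the example; with several top-level elements the parser may restructure them, so wrapping the fragment in one container element first is a common precaution.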

danny dot nunez15 at gmail dot com

12 years ago

A simple function to grab all links in a page.

    function get_links($url) {

        // Create a new DOMDocument to hold the page structure
        $xml = new DOMDocument();

        // Load the URL's contents into the DOM
        $xml->loadHTMLFile($url);

        // Empty array to hold all links to return
        $links = array();

        // Loop through each <a> tag in the DOM and add its href to the array.
        // A separate variable is used so the $url parameter is not overwritten.
        foreach ($xml->getElementsByTagName('a') as $link) {
            $href = $link->getAttribute('href');
            if (!empty($href)) {
                $links[] = $href;
            }
        }

        // Return the links
        return $links;
    }
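
To try the same idea without network access, here is a minimal sketch that parses an HTML string instead of a URL. The function name `get_links_from_html()` and the sample markup are assumptions for illustration and are not part of the note above; parser warnings are collected with libxml_use_internal_errors() rather than printed.

```php
<?php
// Hypothetical variant of get_links() that works on an HTML string.
function get_links_from_html(string $html): array
{
    // Collect libxml parser warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    $doc->loadHTML($html);

    // Discard the collected warnings and restore the previous setting.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = [];
    foreach ($doc->getElementsByTagName('a') as $anchor) {
        // getAttribute() returns an empty string when the attribute is missing.
        $href = $anchor->getAttribute('href');
        if ($href !== '') {
            $links[] = $href;
        }
    }

    return $links;
}

$sample = '<p><a href="/a">A</a> <a>no href</a> <a href="https://example.com/b">B</a></p>';
print_r(get_links_from_html($sample));
```

The hrefs are returned exactly as written in the markup, so relative links such as `/a` are not resolved against any base URL.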

qrworld.net

11 years ago

In this post http://softontherocks.blogspot.com/2014/11/descargar-el-contenido-de-una-url_11.html I found a simple way to get the content of a URL with DOMDocument, loadHTMLFile and saveHTML().

function getURLContent($url){
    $doc = new DOMDocument;
    $doc->preserveWhiteSpace = FALSE;
    @$doc->loadHTMLFile($url);
    return $doc->saveHTML();
}
