HTML scraper API

HTML scraper API / HTML parsing service - documentation

 

Basic information

A service that parses HTML into a JSON structure using XPath selector logic.

 

Token and service authentication

You must be logged in before you can use the service.

We have made login and registration as simple as possible: you only need to sign in with your Google / Gmail account. After you log in, your personal token appears in the token field (below), and you can use it with PHP.mk services.

Personal Token

Log in to show your token!

 

Using the Service v1.0

Version ({version}): v1.0

URL of the page to parse ({url}): ?url= e.g. "https://www.gsmarena.com/samsung_galaxy_s10_5g-9588.php"

Selector ({rules}): ?rules= e.g. "div|class=article-info:0,h1|class=specs-phone-name-title:0"


Syntax of the rules parameter: tag|attribute=value:index,child_tag|attribute:index

e.g.

<div class="article-info">

   <p>Something...</p>

   <h1 class="specs-phone-name-title">Samsung Galaxy S10 5G</h1>

</div>

The rules for the HTML above would be: div|class=article-info:0,h1|class=specs-phone-name-title:0
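
The rules string above can also be assembled programmatically. The following is only a minimal sketch, assuming every step has a tag, an attribute, a value, and an index. The function name buildRules and the shape of the step objects are hypothetical and are not part of the service.

```javascript
// Hypothetical helper (not part of the service): builds a rules string
// from a list of selector steps, outermost element first.
function buildRules(steps) {
  return steps
    .map(function (step) {
      // Each step becomes tag|attribute=value:index
      return step.tag + "|" + step.attribute + "=" + step.value + ":" + step.index;
    })
    .join(","); // steps are comma-separated, parent before child
}

var rules = buildRules([
  { tag: "div", attribute: "class", value: "article-info", index: 0 },
  { tag: "h1", attribute: "class", value: "specs-phone-name-title", index: 0 }
]);
console.log(rules);
// prints: div|class=article-info:0,h1|class=specs-phone-name-title:0
```

Keeping the steps as objects makes it easier to add or reorder selectors than editing the delimited string by hand.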

Service endpoint

HTTP/GET

https://api.php.mk/scrape/{version}?token={token}&rules={rules}&url={url}

Example HTTP/GET request

https://api.php.mk/scrape/v1.0?token=&rules=div|class=article-info:0,h1|class=specs-phone-name-title:0&url=https://www.gsmarena.com/samsung_galaxy_s10_5g-9588.php
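
The request URL can be built from its parts instead of being written by hand. This is a sketch under assumptions: buildScrapeUrl is a hypothetical helper, "YOUR_TOKEN" is a placeholder for your personal token, and the service is assumed to decode percent-encoded query parameters in the usual way.

```javascript
// Hypothetical helper (not part of the service): assembles the request URL
// from the version, token, rules, and the URL of the page to parse.
function buildScrapeUrl(version, token, rules, pageUrl) {
  var params = new URLSearchParams();
  params.set("token", token);
  params.set("rules", rules);
  params.set("url", pageUrl);
  // URLSearchParams percent-encodes characters such as "|", ":" and "/".
  // Assumption: the service decodes them like any standard query string.
  return "https://api.php.mk/scrape/" + encodeURIComponent(version) + "?" + params.toString();
}

var requestUrl = buildScrapeUrl(
  "v1.0",
  "YOUR_TOKEN", // placeholder, replace with your personal token
  "div|class=article-info:0,h1|class=specs-phone-name-title:0",
  "https://www.gsmarena.com/samsung_galaxy_s10_5g-9588.php"
);
console.log(requestUrl);
```

Encoding the parameters keeps the query string of the target page, if it has one, from being mixed up with the parameters of the service itself.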

Example PHP

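// Request URL: put your personal token after "token=".
$url = 'https://api.php.mk/scrape/v1.0?token=&rules=div|class=article-info:0,h1|class=specs-phone-name-title:0&url=https://www.gsmarena.com/samsung_galaxy_s10_5g-9588.php';
$jsonResponse = file_get_contents($url);
if ($jsonResponse === false) {
    exit('Request failed');
}
// Decode the JSON body into an associative array.
$response = json_decode($jsonResponse, true);

echo '<pre>';
print_r($response['data']);
exit();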

Example cURL

curl -X GET 'https://api.php.mk/scrape/v1.0?token=&rules=div|class=article-info:0,h1|class=specs-phone-name-title:0&url=https://www.gsmarena.com/samsung_galaxy_s10_5g-9588.php'

Example JavaScript

var xhr = new XMLHttpRequest();
xhr.addEventListener("readystatechange", function () {
  if (this.readyState === 4) {
    var response_data=JSON.parse(this.responseText).data;
    console.log(response_data);
  }
});
xhr.open("GET", "https://api.php.mk/scrape/v1.0?token=&rules=div|class=article-info:0,h1|class=specs-phone-name-title:0&url=https://www.gsmarena.com/samsung_galaxy_s10_5g-9588.php");
xhr.send();

Example jQuery / JavaScript

$.getJSON('https://api.php.mk/scrape/v1.0?token=&rules=div|class=article-info:0,h1|class=specs-phone-name-title:0&url=https://www.gsmarena.com/samsung_galaxy_s10_5g-9588.php', function (r) {
    console.log(r.data);
}).fail(function (error) {
    // .fail() replaces .error(), which was removed in jQuery 3.0.
    console.log(error.responseJSON.msg);
});

 

Service response

The response is always in JSON format.

The "data" field of the response holds the element matched by the selector (its tag, attributes, and text).

Example response

{
    "error": false,
    "status_text": "OK",
    "status_code": 200,
    "data": {
        "tag": "h1",
        "class": "specs-phone-name-title",
        "data-spec": "modelname",
        "text": "Samsung Galaxy S10 5G"
    }
}
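
A response like the one above can be checked before its "data" field is used. This is a minimal sketch: extractData is a hypothetical helper, and it assumes that a failed request sets "error" to true and carries a "msg" field, as the jQuery example suggests. The exact shape of an error response is not documented here.

```javascript
// Hypothetical helper (not part of the service): returns the "data" object
// of a response, or throws if the response reports an error.
function extractData(responseText) {
  var response = JSON.parse(responseText);
  if (response.error) {
    // Assumption: error responses carry "msg"; fall back to "status_text".
    throw new Error(response.msg || response.status_text || "Unknown error");
  }
  return response.data;
}

// The example response from this section, as a JSON string.
var sample = '{"error": false, "status_text": "OK", "status_code": 200, ' +
  '"data": {"tag": "h1", "class": "specs-phone-name-title", ' +
  '"data-spec": "modelname", "text": "Samsung Galaxy S10 5G"}}';

console.log(extractData(sample).text); // prints: Samsung Galaxy S10 5G
```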