
Fetch website HTML, then find & copy columns and rows: PHP

Posted by: admin, February 25, 2020

Questions:

I’m fetching the entire HTML code of a website using file_get_contents and saving it into a variable.

The content of this website is time-based and updates frequently. I need to run a script that fetches specific columns and rows from its HTML, which I then plan to turn into human-readable text.

My problem is that I don’t know which method to use to scan the HTML, find the columns and rows I want to extract, and save them only if they have changed since the script last ran.

Answers:

If you are familiar with DOM traversal, try using DOMDocument::loadHTML. Then use the other DOMDocument methods to get at the information you need.
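
As a rough sketch of that traversal using only DOM methods (no XPath), the snippet below collects a table's cells into an array of rows, so that a single row or column can be picked out by index. The helper name tableToRows, the inline sample markup and the id target_table are assumptions made for illustration, and nested tables are not handled.

```php
<?php

// Hypothetical helper: returns the cells of the table with the given id
// as an array of rows, each row being an array of trimmed cell strings.
function tableToRows(string $html, string $tableId): array
{
    $doc = new DOMDocument();
    // Real pages are rarely valid HTML; buffer parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $rows = [];
    foreach ($doc->getElementsByTagName('table') as $table) {
        if ($table->getAttribute('id') !== $tableId) {
            continue; // skip every table except the one we want
        }
        foreach ($table->getElementsByTagName('tr') as $tr) {
            $cells = [];
            foreach ($tr->getElementsByTagName('td') as $td) {
                $cells[] = trim($td->textContent);
            }
            $rows[] = $cells;
        }
    }
    return $rows;
}

// Made-up sample markup standing in for the fetched page.
$html = '<html><body><table id="target_table">'
      . '<tr><td>this</td><td>something</td></tr>'
      . '<tr><td>is</td><td>in</td></tr>'
      . '</table></body></html>';

$rows = tableToRows($html, 'target_table');
echo $rows[1][0], PHP_EOL;                           // second row, first column
echo implode(', ', array_column($rows, 1)), PHP_EOL; // every value in the second column
```

Keeping the cells in a plain nested array means the later formatting step does not need to know anything about the DOM.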

Here is some example HTML:

<!DOCTYPE html>
<!-- test.html -->
<html><body>
    <table id = "target_table"><tbody>
        <tr><td>this</td><td>something</td></tr>
        <tr><td>is</td><td>in</td></tr>
        <tr><td>a</td><td>a</td></tr>
        <tr><td>test</td><td>column</td></tr>
    </tbody></table>
    <table><tbody>
        <tr><td>ignore</td><td>this</td></tr>
        <tr><td>table</td><td>.</td></tr>
    </tbody></table>
</body></html>

This will grab all of the rows in a specific table and dump their text content:

<?php

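// Load the page (a local test file here) and parse it into a DOM tree.
$string = file_get_contents("test.html");
$doc = new DOMDocument();
$doc->loadHTML($string); // loadHTML is an instance method; calling it statically throws an Error on PHP 8

// Select every row of the table whose id is "target_table".
$xpath = new DOMXPath($doc);
$elements = $xpath->query("//*[@id=\"target_table\"]/*/tr");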

foreach ($elements as $element) {
  echo $element->textContent, PHP_EOL;
}
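
The question also asks for the result to be saved only when it has changed since the previous run, which the answer does not cover. One possible approach, offered as a sketch rather than the answerer's method, is to keep the last extracted text in a snapshot file and compare against it. The helper name saveIfChanged and the temporary snapshot path are hypothetical.

```php
<?php

// Hypothetical helper: writes $text to $snapshotFile only when it differs
// from what the previous run stored. Returns true when an update was saved.
function saveIfChanged(string $text, string $snapshotFile): bool
{
    $previous = is_file($snapshotFile) ? file_get_contents($snapshotFile) : null;

    if ($previous === $text) {
        return false; // same content as last run, nothing to save
    }

    file_put_contents($snapshotFile, $text, LOCK_EX);
    return true;
}

// Usage, with a temporary file standing in for the real snapshot path.
$snapshot = tempnam(sys_get_temp_dir(), 'table_');
unlink($snapshot); // start with no previous snapshot

var_dump(saveIfChanged("this\tsomething\nis\tin", $snapshot));  // first run
var_dump(saveIfChanged("this\tsomething\nis\tin", $snapshot));  // unchanged
var_dump(saveIfChanged("this\tsomething\nis\tnew", $snapshot)); // changed
unlink($snapshot);
```

Comparing the extracted table text rather than the whole page avoids false positives when unrelated parts of the page, such as timestamps or ads, change between runs.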