
PHP Jquery:Convert HTML to JSON from given url and create a tree view of html elements

Posted by: admin, November 29, 2017

Questions:

Basically, I have a textbox where I enter a URL and click an “OK” button. The page then shows a preview of the HTML on the left side and, on the right side, a tree view of the HTML tags (body, header, div, span, etc.) used in that HTML, as in the attached image. The expected JSON result is shown at the end of this question. I am failing at traversing the JSON and creating the tree. I tried the following:

HTML and JS code:

<html>
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>ABC</title>
<link rel="stylesheet" type="text/css" href="css/main.css" />
</head>
<body>
<div id="wrapper">
    <header>
        <h1 class="logo"><img src="images/logo.png" alt="" title="" /></h1>
    </header>
    <div id="container">
        <div class="search-box">
            <input type="text" id="url" value="" class="txt-box" />
            <input type="button" value="OK" class="btn-search" />
        </div>
        <div class="inner-wrap">
            <div class="left-wrap" id="preview-sec">

            </div>
            <div class="right-wrap" id="tree-sec">

            </div>
        </div>
    </div>    
</div>

<script type="text/javascript" language="javascript" src="js/jquery-1.11.1.js"></script><!-- Jquery plugin -->
<script>
var counter = 0;
$(document).ready(function(){
    $('.btn-search').click(function(){
        if ($('#url').val() != '') {
            $.get(
                'http://localhost/test/getHTML.php', {url:$('#url').val()},
                function(response) {
                    $('#preview-sec').html(response);
            },'html');
            $.getJSON('http://localhost/test/results.json', function(json) {    
                traverse(json,0);               
            });
        }
    });
});
function traverse(obj,id){
    if (typeof(obj)=="object") {
        if (id == 0) {
            $('#tree-sec').append('<ul></ul>');
        } else {
            $(id).append('<ul></ul>');
        }
        $.each(obj, function(i,val){
            if (i != 'attributes' && i != 'value') {
                counter += 1;
                var li_populate = "<li id="+i+"-"+counter+">"+i+"</li>"; 
                if (id == 0) {
                    $('#tree-sec ul').append(li_populate);
                } else {
                    $(id).find('ul').append(li_populate);
                }
                traverse(val,"#"+i+"-"+counter);
            }
        })
    }
}
</script>
</body>
</html>

PHP code:

<?php
    $url = $_GET['url'];
    $html = file_get_contents($url);
    function html_to_obj($html) {
        $dom = new DOMDocument();
        $dom->loadHTML($html);
        return element_to_obj($dom->documentElement);
    }

    function element_to_obj($element) {
        //print_r($element);
        $obj = array();
        $attr = array();
        $arr = array();
        $name = $element->tagName;
        foreach ($element->attributes as $attribute) {
            $attr[$attribute->name] = $attribute->value;
            if ($attribute->name == 'id') {
                $name .= '#'.$attribute->value;
            }
        }
        if (!empty($attr)) {
            $arr["attributes"] = $attr;
        }
        if ($element->nodeValue != '') {
            $arr["value"] = $element->nodeValue;
        }

        foreach ($element->childNodes as $subElement) {         
            if ($subElement->nodeType == XML_TEXT_NODE) {

            }
            elseif ($subElement->nodeType == XML_CDATA_SECTION_NODE) {

            }
            else {
                $arr["child_nodes"][] = element_to_obj($subElement);
            }
        }
        $obj[$name] = $arr;
        return $obj;
    }
    $json = json_encode(html_to_obj($html));
    $fp = fopen('results.json', 'w');
    fwrite($fp,$json);
    fclose($fp);
    echo $html;exit();
?>

JSON tree output:

[Image: expected tree view of the HTML tags]

JSON Result:

Answers:

As per your question, the problematic part is where you traverse the returned JSON object and create the tree. In your code, the recursive function that traverses the JSON data had a few minor issues in the code that generates the ul elements. The structure of the returned object also made this a bit challenging.

I was able to modify your HTML/JavaScript code a bit (without changing it too much) to print out the tree. The relevant code is below:

CSS:

HTML & JS:

This should provide a properly nested, ul-based tree. If creating an image of the tree is a hard requirement, your best bet is to style the generated ul fragment, create an HTML page containing it on the server, and then use a server-side tool such as wkhtmltoimage (from the wkhtmltopdf package) to render that HTML document into an image.

One other thing I would like to mention: instead of loading the retrieved HTML into a div, I would recommend using an iframe, so that the retrieved HTML does not interfere with your current page. In my example above, I have added an iframe in the preview div. In that case, you can have PHP output only the JSON data, and previewing the URL becomes as simple as assigning it to the iframe’s src attribute, like this: $("#preview").prop("src", $("#url").val()).

Edit:
Updated the code with a fix. Also added a new JS function, makeCollapsible(), to retroactively convert the ul into a clickable, collapsible tree structure, as per the OP’s comment, along with CSS styles for the tree. The tree now looks like the picture below for me:

[Image: Collapsible, Clickable HTML Tree!]

Answers:

Addendum: This is a long answer, but it addresses specific problems and solutions for the code snippets you provided. I hope you and others will find it worth the time to compare. 🙂


First, modify your PHP to make cleaner JSON

When parsing the DOM, I recommend setting the element names from the returned object as the associative keys in $arr['child_nodes'] using array_merge() instead of pushing them onto the array as indexed items. To do this, $arr['child_nodes'] must be defined as an array first. Later, if no items get merged into it, you simply unset it before $arr gets added to the main object.

This makes the final JSON result simpler to parse by precluding the need to use a nested loop in your javascript when building the tree.
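To make the difference concrete, here is a minimal sketch of the two JSON shapes as they would look on the JavaScript side. The sample data and the helper names childNamesIndexed and childNamesKeyed are illustrative assumptions, not output captured from the PHP script.

```javascript
// Shape produced by the question's PHP: child_nodes is an indexed array,
// and each entry wraps a single element under its tag name.
var indexed = {
    html: { child_nodes: [
        { head: {} },
        { 'body#main': { attributes: { id: 'main' } } }
    ] }
};

// Shape after merging by tag name: child_nodes is keyed directly.
var keyed = {
    html: { child_nodes: {
        head: {},
        'body#main': { attributes: { id: 'main' } }
    } }
};

// Indexed shape: one loop over the array, a second over each wrapper.
function childNamesIndexed(node) {
    var names = [];
    (node.child_nodes || []).forEach(function (wrapper) {
        Object.keys(wrapper).forEach(function (name) {
            names.push(name);
        });
    });
    return names;
}

// Keyed shape: the keys are the element names, so one step is enough.
function childNamesKeyed(node) {
    return Object.keys(node.child_nodes || {});
}

console.log(childNamesIndexed(indexed.html));
console.log(childNamesKeyed(keyed.html));
```

One thing to keep in mind with the keyed shape: object keys are unique, so sibling elements that produce the same key (for example, two div elements without an id) would overwrite one another.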

I also recommend inserting conditional checks for ->length before doing foreach loops. Your existing code was throwing “Warning” messages when zero-length elements entered into the loop.

Lastly, you may choose to simplify your logic for handling node types by replacing your current if, else if, else statement with a single if checking for $subElement->nodeType === XML_ELEMENT_NODE, which I think is what you’re trying to accomplish.

Use an iframe

Insert an empty iframe into which you will load your target site. Inserting the markup from another site into yours can (and will likely) cause conflicts with your own code.

Simplify the traverse function, reorder async calls

The traverse function was suffering from three flaws:

  1. The use of a counter to create ids on the fly and then use jQuery to find previously made elements with those ids on which to append list items was a performance drain and confusing to debug.

  2. The use of .find('ul') matched every ul nested under the element, not only the list created for the current node, so list items were appended to several lists at once and child nodes were duplicated throughout the tree.

  3. Because it is the callback of a separate asynchronous call, it could execute before the first asynchronous call to getHTML.php had finished.

Move the async call to get the JSON into the callback function on the first async call to prevent it from fetching incomplete or old JSON from the server.

You should also use this first callback to set the iframe src and empty the #tree-sec container, so that subsequent searches don’t append more than one tree. You could accomplish the same thing by using .html() to replace the container’s contents instead of calling .empty() followed by .append().
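The ordering described above can be sketched roughly as follows. This is only an outline under a few assumptions: the function name loadPreviewAndTree, the renderTree callback, and the iframe id #preview are hypothetical, and the jQuery object is passed in as a parameter purely so the function can be exercised outside a browser. The two URLs are the ones from the question.

```javascript
// Sketch: request the JSON only after getHTML.php has responded.
// jq is the jQuery object ($); renderTree is a hypothetical callback
// that receives the parsed JSON and draws the tree.
function loadPreviewAndTree(jq, url, renderTree) {
    jq.get('http://localhost/test/getHTML.php', { url: url }, function () {
        // getHTML.php has finished, so results.json has been rewritten.
        jq('#preview').prop('src', url); // iframe id is an assumption
        jq('#tree-sec').empty();         // avoid stacking a second tree
        jq.getJSON('http://localhost/test/results.json', function (json) {
            renderTree(json);
        });
    });
}
```

In the page itself this would be called from the button's click handler, for example loadPreviewAndTree($, $('#url').val(), someRenderFunction), where someRenderFunction stands for whatever builds the tree.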

To build the tree, I recommend the following simpler approach, which recursively builds the list as a string so that the .append method is only called once. For larger trees, this will dramatically improve performance.

You may introduce a counter and dynamically assigned ids to this function if you want to, but I left it out to demonstrate more clearly that they are not needed to build the tree.

I also recommend checking for the existence of child nodes before entering a recursive call. Doing this check allows you to pass in only the child nodes object, which – because of the new JSON resulting from changes made to the PHP script – now contains tag names as the keys instead of indexed keys with the elements as children. If we hadn’t simplified the JSON, a second loop would have been required at this point to retrieve each element.
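A minimal sketch of this string-building approach might look like the following. It assumes the keyed JSON shape described earlier (tag names as keys, with an optional child_nodes object); the names buildTree and escapeHtml are illustrative and are not taken from the original answer's code.

```javascript
// Escape text before placing it in markup, since keys such as
// "div#main" come from an arbitrary fetched page.
function escapeHtml(text) {
    return String(text)
        .replace(/&/g, '&amp;')
        .replace(/</g, '&lt;')
        .replace(/>/g, '&gt;')
        .replace(/"/g, '&quot;');
}

// Recursively build the nested list as one string.
// nodes is an object whose keys are element names.
function buildTree(nodes) {
    var html = '<ul>';
    Object.keys(nodes).forEach(function (name) {
        var node = nodes[name];
        html += '<li>' + escapeHtml(name);
        // Recurse only when the element actually has children,
        // and pass in just the child_nodes object.
        if (node && node.child_nodes) {
            html += buildTree(node.child_nodes);
        }
        html += '</li>';
    });
    return html + '</ul>';
}

var sample = {
    html: { child_nodes: {
        head: {},
        'body#main': { child_nodes: { div: {} } }
    } }
};

console.log(buildTree(sample));
```

The resulting string can then be added to the page with a single call such as $('#tree-sec').append(buildTree(json)), so the DOM is touched once rather than once per node.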

You’ll also notice the inclusion of aria- attributes and role attributes. These lay the groundwork for making the tree accessible to assistive technology, if you choose to go that far.

See: Using the WAI-ARIA aria-expanded state to mark expandable and collapsible regions (w3.org)

It also provides you with a convenient and semantic way to control CSS and toggle state, which you can see demonstrated in the additional click handler added at the bottom of the script and the CSS example at the bottom of this answer.
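As a rough illustration of toggling state through aria-expanded, a click handler along these lines could be used. The helper name nextExpandedState and the selector are assumptions for this sketch, which presumes that list items carry role="treeitem" and that items with children also carry aria-expanded. The jQuery wiring is guarded so the snippet also loads where jQuery is absent.

```javascript
// Return the opposite aria-expanded value. Anything other than the
// string "true" is treated as collapsed, so the result is "true".
function nextExpandedState(current) {
    return current === 'true' ? 'false' : 'true';
}

if (typeof jQuery !== 'undefined') {
    // Delegated handler: one listener on the container covers every
    // tree item, including items added after the handler is bound.
    jQuery('#tree-sec').on('click', '[role="treeitem"][aria-expanded]', function (event) {
        // Keep a click on a nested item from also toggling its ancestors.
        event.stopPropagation();
        var item = jQuery(this);
        item.attr('aria-expanded', nextExpandedState(item.attr('aria-expanded')));
    });
}
```

Because the state lives in an attribute, the stylesheet can show or hide nested lists based on aria-expanded without any extra class names.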

Bonus: Use attribute selectors in the CSS

Finally, as mentioned above, the existence of the aria- and role attributes provides a semantic and convenient way to control the styles.

Answers:

Check out this library, which was written by Jack.

https://github.com/Jxck/html2json

Hope it helps you.

Answers:

Look at XSLT processing. It works fine and takes much less code.

Answers:

Construct a multidimensional PHP array for your HTML tags, then pass that array to PHP’s built-in json_encode($array) function, which returns tree-structured JSON output.