
My HTML fetcher program in Java returns incomplete results

Posted by: admin, November 1, 2017


My Java code is:

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.MalformedURLException;
import java.net.URL;
import java.util.ArrayList;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class celebGrepper {

    // Holds one celebrity's image URL and display name.
    static class CelebData {
        URL link;
        String name;

        CelebData(URL link, String name) {
            this.link = link;
            this.name = name;
        }
    }

    // Downloads the page at the given URL and returns its HTML as one string
    // (empty if the request fails).
    public static String grepper(String url) {
        URL source;
        String data = "";

        try {
            source = new URL(url);
            HttpURLConnection connection = (HttpURLConnection) source.openConnection();

            InputStream is = connection.getInputStream();

            // Attempting to fetch an entire line at a time instead of just a character each time!
            StringBuilder str = new StringBuilder();
            BufferedReader br = new BufferedReader(new InputStreamReader(is));

            String line;
            while((line = br.readLine()) != null) {
                str.append(line);
            }
            br.close();

            data = str.toString();
        } catch (IOException e) {
            e.printStackTrace();
        }

        return data;
    }

    // Pulls image URL / name pairs out of the HTML with a regular expression.
    public static ArrayList<CelebData> parser(String html) throws MalformedURLException {
        ArrayList<CelebData> list = new ArrayList<CelebData>();

        Pattern p = Pattern.compile("<td class=\"image\".*<img src=\"(.*?)\"[\\s\\S]*<td class=\"name\"><a.*?>([\\w\\s]+)<\\/a>");
        Matcher m = p.matcher(html);

        while(m.find()) {
            CelebData current = new CelebData(new URL(m.group(1)),m.group(2));
            list.add(current);
        }

        return list;
    }

    public static void main(String... args) throws MalformedURLException {
        String html = grepper("https://www.forbes.com/celebrities/list/");
        System.out.println("RAW Input: "+html);
        System.out.println("Start Grepping...");
        ArrayList<CelebData> celebList = parser(html);
        for(CelebData item: celebList) {
            System.out.println("Name:\t\t "+item.name);
            System.out.println("Image URL:\t "+item.link+"\n");
        }
        System.out.println("Grepping Done!");
    }
}

It’s supposed to fetch the entire HTML content of https://www.forbes.com/celebrities/list/. However, when I compare the actual result below to the original page, I find the entire table that I need is missing! Is it because the page isn’t completely loaded when I start getting the bytes from the page via the input stream? Please help me understand.
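One way to test the "page isn't completely loaded" guess is to read the response to the end and then look for the table markup. The sketch below is only an illustration: `ResponseCheck`, `readAll`, and `hasNameCell` are hypothetical names, and the sample string stands in for a real response so the example runs offline. `BufferedReader.readLine()` returns `null` only at end of stream, so a loop like the one in my code does consume the whole response body. If the `<td class="name">` marker is still absent, the server probably never sent the table in that response (for example because the page fills it in with JavaScript, which is an assumption here and not something checked against the Forbes page).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

class ResponseCheck {

    // Reads the reader to the end; readLine() returns null only at end of stream.
    static String readAll(Reader in) {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader br = new BufferedReader(in)) {
            String line;
            while ((line = br.readLine()) != null) {
                sb.append(line).append('\n'); // keep line breaks for readability
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    // True if the markup contains the table cell the regex is looking for.
    static boolean hasNameCell(String html) {
        return html.contains("<td class=\"name\">");
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for a server response that has no table rows.
        String sample = "<html><body>\n<div id=\"list\"></div>\n</body></html>";
        String html = readAll(new StringReader(sample));
        System.out.println("Characters read: " + html.length());
        System.out.println("Contains name cell: " + hasNameCell(html));
    }
}
```

Passing `new InputStreamReader(connection.getInputStream())` to `readAll` would run the same check against the live page.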

The Output of the page:


What can I do to extract just the image links and the names of the celebs?

I know it’s extremely bad practice to try to parse HTML using regex and is the stuff of nightmares, but in a certain video training course for Android, that’s exactly what the guy did, and I just wanna follow along since it’s just in this one lesson.
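For the regex part, here is a rough sketch of what the extraction could look like once the HTML really contains the rows. Everything in it is an assumption for illustration: the class `RowRegexSketch`, the `extract` helper, and the `SAMPLE` markup are made up, and the real page structure may differ. The one deliberate change from the pattern in my code is the lazy `[\s\S]*?`; a greedy `[\s\S]*` runs on to the last name cell, so a table with several rows can collapse into a single match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RowRegexSketch {

    // Hypothetical two-row table; the real page markup may look different.
    static final String SAMPLE =
        "<tr><td class=\"image\"><img src=\"https://example.com/a.jpg\"></td>"
      + "<td class=\"name\"><a href=\"/a\">First Person</a></td></tr>\n"
      + "<tr><td class=\"image\"><img src=\"https://example.com/b.jpg\"></td>"
      + "<td class=\"name\"><a href=\"/b\">Second Person</a></td></tr>";

    // Lazy quantifiers stop at the nearest match, so each row yields its own pair.
    static final Pattern ROW = Pattern.compile(
        "<td class=\"image\"[\\s\\S]*?<img src=\"(.*?)\"[\\s\\S]*?"
      + "<td class=\"name\"><a.*?>([\\w\\s]+)</a>");

    // Returns {imageUrl, name} pairs in document order.
    static List<String[]> extract(String html) {
        List<String[]> rows = new ArrayList<>();
        Matcher m = ROW.matcher(html);
        while (m.find()) {
            rows.add(new String[] { m.group(1), m.group(2).trim() });
        }
        return rows;
    }

    public static void main(String[] args) {
        for (String[] row : extract(SAMPLE)) {
            System.out.println("Name:\t\t " + row[1]);
            System.out.println("Image URL:\t " + row[0] + "\n");
        }
    }
}
```

Note that `[\w\s]+` only accepts letters, digits, underscores, and whitespace, so a name containing a period or an apostrophe would not match with this pattern.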