
web scraping – Can Scrapy work on PHP?

Posted by: admin July 12, 2020

Questions:

Can I use Scrapy on PHP or are there similar tools that work with PHP?

I am not a technical person; I am just researching the available web scraping tools and their features to support my technical colleagues.

Answer:

Scrapy is a Python framework, so you can't use it from PHP.

However, in PHP you can use Goutte for the same job. Behind the scenes it relies on an HTTP client (Guzzle in Goutte 3.x, Symfony HttpClient in 4.x) and on Symfony components such as BrowserKit and DomCrawler.

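For example, this prints the text of every h2 > a link on the Symfony blog (the post titles, as long as the page's markup still matches that selector):

<?php
// Install first with: composer require fabpot/goutte
require 'vendor/autoload.php';

use Goutte\Client;

$client = new Client();

// Fetch the Symfony blog index page
$crawler = $client->request('GET', 'https://symfony.com/blog/');

// Print the text of every link that is a direct child of an <h2> (the post titles)
$crawler->filter('h2 > a')->each(function ($node) {
    echo $node->text() . "\n"; // double quotes so "\n" is a real newline
});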

See the Goutte documentation for more on usage.

Note that Goutte does not execute JavaScript, so any content a page builds client-side will not be visible to it.
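To make the extraction step concrete without any dependencies, here is a minimal sketch using PHP's built-in DOM extension instead of Goutte. It is not Goutte's API: DOMXPath with the XPath //h2/a stands in for the CSS selector h2 > a, and the function name extract_titles and the sample HTML are hypothetical. Like Goutte, it only parses the HTML it is given and runs no JavaScript.

```php
<?php
// Sketch: select the equivalent of the CSS selector "h2 > a" with PHP's
// built-in DOM extension (no Goutte or Composer needed).
// extract_titles() and the sample HTML below are hypothetical.

/**
 * Return the text of every <a> that is a direct child of an <h2>.
 * Only parses the given HTML; it does not fetch pages or run JavaScript.
 */
function extract_titles(string $html): array
{
    $doc = new DOMDocument();
    // Collect libxml warnings about imperfect real-world HTML instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $titles = [];
    // "//h2/a" = any <a> whose parent is an <h2>, i.e. the CSS "h2 > a".
    foreach ($xpath->query('//h2/a') as $node) {
        $titles[] = trim($node->textContent);
    }
    return $titles;
}

// Usage with an inline HTML snippet standing in for a fetched page.
$html = <<<HTML
<html><body>
  <h2><a href="/blog/first">First post</a></h2>
  <h2><a href="/blog/second">Second post</a></h2>
  <p><a href="/about">About</a></p>
</body></html>
HTML;

// Prints "First post" and "Second post", one per line.
foreach (extract_titles($html) as $title) {
    echo $title . "\n";
}
```

In a real scraper the HTML would come from an HTTP request (for example one made with Goutte or Guzzle); Goutte's advantage is that it bundles fetching, cookies, link following and CSS selectors in one client.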

Answer:

You can also check out PHP SimpleTest's scriptable web browser (the SimpleBrowser class).

Answer:

You can’t write Scrapy spiders using PHP.

Nevertheless, it's common to use Scrapy (writing the spiders in Python) and store the extracted data in a database or another store your application can access. For example, it's fairly easy to store the extracted items directly in Elasticsearch and have your application query Elasticsearch to search, filter and aggregate the data.
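On the application side, querying Elasticsearch needs nothing Scrapy-specific: it is a plain HTTP call to the index's _search endpoint. Below is a minimal sketch in PHP using the cURL extension. It assumes, hypothetically, Elasticsearch at http://localhost:9200, an index named items that the spider writes to, and a text field called title; the helper names build_search_body and search_items are illustrative, not part of any library.

```php
<?php
// Sketch: a PHP application searching items that a Scrapy spider has
// already indexed into Elasticsearch. Hypothetical assumptions: Elasticsearch
// at http://localhost:9200, an index named "items", a text field "title".

/** Build the JSON body for a full-text "match" query on one field. */
function build_search_body(string $field, string $text, int $size = 10): string
{
    return json_encode([
        'size'  => $size,
        'query' => ['match' => [$field => $text]],
    ]);
}

/**
 * POST the query to the index's _search endpoint and return the _source of
 * each hit. Returns [] when the response contains no hits (this includes
 * Elasticsearch error responses, which a real application should check).
 */
function search_items(string $baseUrl, string $index, string $body): array
{
    $ch = curl_init(rtrim($baseUrl, '/') . '/' . $index . '/_search');
    curl_setopt_array($ch, [
        CURLOPT_POST           => true,
        CURLOPT_POSTFIELDS     => $body,
        CURLOPT_HTTPHEADER     => ['Content-Type: application/json'],
        CURLOPT_RETURNTRANSFER => true,
    ]);
    $response = curl_exec($ch);
    $error = curl_error($ch);
    curl_close($ch);
    if ($response === false) {
        throw new RuntimeException('Elasticsearch request failed: ' . $error);
    }

    $data = json_decode($response, true);
    $hits = $data['hits']['hits'] ?? [];
    return array_map(function (array $hit) {
        return $hit['_source'];
    }, $hits);
}

// Usage (needs the cURL extension and a running Elasticsearch instance):
// $docs = search_items('http://localhost:9200', 'items', build_search_body('title', 'symfony'));
```

The point of this split is that only the spiders are written in Python; the PHP application treats Elasticsearch as an ordinary data source and never needs to run Scrapy itself.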

However, if your colleagues don't know Python, they will need to spend some time learning the language and then the Scrapy framework.