php – 999 Error Code on HEAD request to LinkedIn

Posted by: admin April 23, 2020

Questions:

We’re using a curl HEAD request in a PHP application to verify the validity of generic links. We check the status code just to make sure that the link the user has entered is valid. Links to all websites have succeeded, except LinkedIn.

While it seems to work locally (Mac), when we attempt the request from any of our Ubuntu servers, LinkedIn returns a 999 status code. Not an API request, just a simple curl like we do for every other link. We’ve tried on a few different machines and tried altering the user agent, but no dice. How do I modify our curl so that working links return a 200?

A sample HEAD request:

curl -I --url https://www.linkedin.com/company/linkedin

Sample Response on Ubuntu machine:

HTTP/1.1 999 Request denied
Date: Tue, 18 Nov 2014 23:20:48 GMT
Server: ATS
X-Li-Pop: prod-lva1
Content-Length: 956
Content-Type: text/html

To respond to @alexandru-guzinschi a little better: we’ve tried masking the user agent. To sum up our trials:

  • Mac machine + Mac UA => works
  • Mac machine + Windows UA => works
  • Ubuntu remote machine + (no UA change) => fails
  • Ubuntu remote machine + Mac UA => fails
  • Ubuntu remote machine + Windows UA => fails
  • Ubuntu local virtual machine (on Mac) + (no UA change) => fails
  • Ubuntu local virtual machine (on Mac) + Windows UA => works
  • Ubuntu local virtual machine (on Mac) + Mac UA => works

So now I’m thinking they block any curl request that doesn’t provide an alternate UA, and that they also block hosting providers?

Is there any other way I can check whether a link to LinkedIn is valid, or whether it will lead to their 404 page, from an Ubuntu machine using PHP?

Answers:

It looks like they filter requests based on the user-agent:

$ curl -I --url https://www.linkedin.com/company/linkedin | grep HTTP
HTTP/1.1 999 Request denied

$ curl -A "Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3" -I --url https://www.linkedin.com/company/linkedin | grep HTTP
HTTP/1.1 200 OK
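For a link checker you usually only want the numeric status code rather than the full header dump. A minimal sketch of the same request follows, assuming bash and a reasonably recent curl. The names head_status, CURL_BIN and BROWSER_UA are made up for illustration and are not part of the original answer.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print only the HTTP status code of a HEAD request.
# CURL_BIN and BROWSER_UA are illustrative names, not from the original answer.
CURL_BIN="${CURL_BIN:-curl}"
BROWSER_UA="Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3"

head_status() {
  local url="$1"
  # -s  silent, no progress meter
  # -I  send a HEAD request
  # -o  discard the response headers
  # -A  send a browser-like user agent instead of curl's default
  # -w  print just the numeric status code
  "$CURL_BIN" -s -I -o /dev/null -A "$BROWSER_UA" -w '%{http_code}' "$url"
}
```

Calling head_status "https://www.linkedin.com/company/linkedin" prints the status code on stdout. As the other answers point out, the user agent alone may not be enough when the request comes from a hosting provider's IP range.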

Answer:

I found a workaround. It is important to set the accept-encoding header:

curl --url "https://www.linkedin.com/in/izman" \
--header "user-agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.94 Safari/537.36" \
--header "accept:text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8" \
--header "accept-encoding:gzip, deflate, sdch, br" \
| gunzip

Answer:

It seems like LinkedIn filters on both user agent AND IP address. I tried this both at home and from a DigitalOcean node:

curl -A "Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3" -I --url https://www.linkedin.com/company/linkedin

From home I got a 200 OK, from DO I got 999 Denied…

So you need a proxy service like HideMyAss or other (haven’t tested it so I couldn’t say if it’s valid or not). Here is a good comparison of proxy services.

Or you could set up a proxy on your home network, for example by using a Raspberry Pi to proxy your requests. Here is a guide on that.
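If you do go the proxy route, curl can send the request through it with the -x option. This is only a sketch under assumptions: head_via_proxy, CURL_BIN and PROXY_URL are hypothetical names, and the proxy address below is a placeholder that you would replace with your own proxy. I haven't verified that any particular proxy gets past the block.

```shell
#!/usr/bin/env bash
# Hypothetical helper: HEAD request routed through a proxy.
# PROXY_URL is a placeholder; point it at a proxy you actually control.
CURL_BIN="${CURL_BIN:-curl}"
PROXY_URL="${PROXY_URL:-http://proxy.example.invalid:3128}"
BROWSER_UA="Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3"

head_via_proxy() {
  local url="$1"
  # -x routes the request through the given proxy; the remaining flags
  # send a HEAD request with a browser-like UA and print only the status code.
  "$CURL_BIN" -s -I -o /dev/null -x "$PROXY_URL" -A "$BROWSER_UA" -w '%{http_code}' "$url"
}
```

Set PROXY_URL in the environment before calling head_via_proxy with the link you want to check.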

Answer:

A proxy would work, but I think there’s another way around it. I see that requests from AWS and other clouds are blocked by IP. I can issue the request from my own machine and it works just fine.

I did notice that the response sent to the cloud service contains some JS that the browser has to execute to take you to a login page. Once there, you can log in and access the page. The login page is only shown to those accessing via a blocked IP.

If you use a headless client that executes JS, or maybe go straight to the subsequent link and provide the credentials of a LinkedIn user, you may be able to bypass it.

Answer:

LinkedIn does not allow direct access. They have blacklisted Heroku/AWS IP addresses, and the only way to access the data is to use their APIs. That said, it can be accessed from a local machine or a headless browser if you want to scrape LinkedIn, or you can use a proxy, because LinkedIn has blocked many server IPs.
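One practical consequence for a link checker: a 999 says the request was denied, not that the link is broken, so it probably should not be reported to the user as an invalid link. A rough sketch of that decision follows. classify_status and its labels are made up for illustration, and treating only 2xx as valid is an assumption you may want to relax for redirects.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map a status code to a link-checker verdict.
# The labels are illustrative; adjust them to your application's needs.
classify_status() {
  case "$1" in
    2??) echo "valid" ;;     # the page answered normally
    404) echo "missing" ;;   # the link leads to a 404 page
    999) echo "blocked" ;;   # request denied; validity is unknown
    *)   echo "unknown" ;;   # redirects, server errors, and so on
  esac
}
```

A "blocked" verdict could then be accepted as is or retried from another machine, instead of rejecting the link outright.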