Home » Javascript » Can a website detect when you are using selenium with chromedriver?

Can a website detect when you are using selenium with chromedriver?

Posted by: admin November 30, 2017 Leave a comment

Questions:

I’ve been testing out Selenium with Chromedriver and I noticed that some pages can detect that you’re using Selenium even though there’s no automation at all. Even when I’m just browsing manually just using chrome through Selenium and Xephyr I often get a page saying that suspicious activity was detected. I’ve checked my user agent, and my browser fingerprint, and they are all exactly identical to the normal chrome browser.

When I browse to these sites in normal chrome everything works fine, but the moment I use Selenium I’m detected.

In theory chromedriver and chrome should look literally exactly the same to any webserver, but somehow they can detect it.

If you want some testcode try out this:

from pyvirtualdisplay import Display
from selenium import webdriver

display = Display(visible=1, size=(1600, 902))
display.start()
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--disable-extensions')
chrome_options.add_argument('--profile-directory=Default')
chrome_options.add_argument("--incognito")
chrome_options.add_argument("--disable-plugins-discovery");
chrome_options.add_argument("--start-maximized")
driver = webdriver.Chrome(chrome_options=chrome_options)
driver.delete_all_cookies()
driver.set_window_size(800,800)
driver.set_window_position(0,0)
print 'arguments done'
driver.get('http://stubhub.com')

If you browse around stubhub you’ll get redirected and ‘blocked’ within one or two requests. I’ve been investigating this and I can’t figure out how they can tell that a user is using Selenium.

How do they do it?

EDIT UPDATE:

I installed the Selenium IDE plugin in Firefox and I got banned when I went to stubhub.com in the normal firefox browser with only the additional plugin.

EDIT:

When I use Fiddler to view the HTTP requests being sent back and forth I’ve noticed that the ‘fake browser\’s’ requests often have ‘no-cache’ in the response header.

EDIT:

results like this Is there a way to detect that I'm in a Selenium Webdriver page from Javascript suggest that there should be no way to detect when you are using a webdriver. But this evidence suggests otherwise.

EDIT:

The site uploads a fingerprint to their servers, but I checked and the fingerprint of selenium is identical to the fingerprint when using chrome.

EDIT:

This is one of the fingerprint payloads that they send to their servers

{"appName":"Netscape","platform":"Linuxx86_64","cookies":1,"syslang":"en-US","userlang":"en-US","cpu":"","productSub":"20030107","setTimeout":1,"setInterval":1,"plugins":{"0":"ChromePDFViewer","1":"ShockwaveFlash","2":"WidevineContentDecryptionModule","3":"NativeClient","4":"ChromePDFViewer"},"mimeTypes":{"0":"application/pdf","1":"ShockwaveFlashapplication/x-shockwave-flash","2":"FutureSplashPlayerapplication/futuresplash","3":"WidevineContentDecryptionModuleapplication/x-ppapi-widevine-cdm","4":"NativeClientExecutableapplication/x-nacl","5":"PortableNativeClientExecutableapplication/x-pnacl","6":"PortableDocumentFormatapplication/x-google-chrome-pdf"},"screen":{"width":1600,"height":900,"colorDepth":24},"fonts":{"0":"monospace","1":"DejaVuSerif","2":"Georgia","3":"DejaVuSans","4":"TrebuchetMS","5":"Verdana","6":"AndaleMono","7":"DejaVuSansMono","8":"LiberationMono","9":"NimbusMonoL","10":"CourierNew","11":"Courier"}}

Its identical in selenium and in chrome

EDIT:

VPNs work for a single use but get detected after I load the first page. Clearly some javascript is being run to detect Selenium.

Answers:

Basically the way the selenium detection works, is that they test for pre-defined javascript variables which appear when running with selenium. The bot detection scripts usually look anything containing word “selenium” / “webdriver” in any of the variables (on window object), and also document variables called $cdc_ and $wdc_. Of course, all of this depends on which browser you are on. All the different browsers expose different things.

For me, I used chrome, so, all that I had to do was to ensure that $cdc_ didn’t exist anymore as document variable, and voila (download chromedriver source code, modify chromedriver and re-compile $cdc_ under different name.)

this is the function I modified in chromedriver:

call_function.js:

function getPageCache(opt_doc) {
  var doc = opt_doc || document;
  //var key = '$cdc_asdjflasutopfhvcZLmcfl_';
  var key = 'randomblabla_';
  if (!(key in doc))
    doc[key] = new Cache();
  return doc[key];
}

(note the comment, all I did I turned ‘$cdc_’ to ‘randomblabla_’.

Here is a pseudo-code which demonstrates some of the techniques that bot networks might use:

runBotDetection = function () {
    var documentDetectionKeys = [
        "__webdriver_evaluate",
        "__selenium_evaluate",
        "__webdriver_script_function",
        "__webdriver_script_func",
        "__webdriver_script_fn",
        "__fxdriver_evaluate",
        "__driver_unwrapped",
        "__webdriver_unwrapped",
        "__driver_evaluate",
        "__selenium_unwrapped",
        "__fxdriver_unwrapped",
    ];

    var windowDetectionKeys = [
        "_phantom",
        "__nightmare",
        "_selenium",
        "callPhantom",
        "callSelenium",
        "_Selenium_IDE_Recorder",
    ];

    for (const windowDetectionKey in windowDetectionKeys) {
        const windowDetectionKeyValue = windowDetectionKeys[windowDetectionKey];
        if (window[windowDetectionKeyValue]) {
            return true;
        }
    };
    for (const documentDetectionKey in documentDetectionKeys) {
        const documentDetectionKeyValue = documentDetectionKeys[documentDetectionKey];
        if (window['document'][documentDetectionKeyValue]) {
            return true;
        }
    };

    for (const documentKey in window['document']) {
        if (documentKey.match(/$[a-z]dc_/) && window['document'][documentKey]['cache_']) {
            return true;
        }
    }

    if (window['external'] && window['external'].toString() && (window['external'].toString()['indexOf']('Sequentum') != -1)) return true;

    if (window['document']['documentElement']['getAttribute']('selenium')) return true;
    if (window['document']['documentElement']['getAttribute']('webdriver')) return true;
    if (window['document']['documentElement']['getAttribute']('driver')) return true;

    return false;
};

Questions:
Answers:

As we’ve already figured out in the question and the posted answers, there is an anti Web-scraping and a Bot detection service called “Distil Networks” in play here. And, according to the company CEO’s interview:

Even though they can create new bots, we figured out a way to identify
Selenium the a tool they’re using, so we’re blocking Selenium no
matter how many times they iterate on that bot
. We’re doing that now
with Python and a lot of different technologies. Once we see a pattern
emerge from one type of bot, then we work to reverse engineer the
technology they use and identify it as malicious.

It’ll take time and additional challenges to understand how exactly they are detecting Selenium, but what can we say for sure at the moment:

  • it’s not related to the actions you take with selenium – once you navigate to the site, you get immediately detected and banned. I’ve tried to add artificial random delays between actions, take a pause after the page is loaded – nothing helped
  • it’s not about browser fingerprint either – tried it in multiple browsers with clean profiles and not, incognito modes – nothing helped
  • since, according to the hint in the interview, this was “reverse engineering”, I suspect this is done with some JS code being executed in the browser revealing that this is a browser automated via selenium webdriver

Decided to post it as an answer, since clearly:

Can a website detect when you are using selenium with chromedriver?

Yes.


Also, what I haven’t experimented with is older selenium and older browser versions – in theory, there could be something implemented/added to selenium at a certain point that Distil Networks bot detector currently relies on. Then, if this is the case, we might detect (yeah, let’s detect the detector) at what point/version a relevant change was made, look into changelog and changesets and, may be, this could give us more information on where to look and what is it they use to detect a webdriver-powered browser. It’s just a theory that needs to be tested.

Questions:
Answers:

Example of how it’s implemented on wellsfargo.com:

try {
 if (window.document.documentElement.getAttribute("webdriver")) return !+[]
} catch (IDLMrxxel) {}
try {
 if ("_Selenium_IDE_Recorder" in window) return !+""
} catch (KknKsUayS) {}
try {
 if ("__webdriver_script_fn" in document) return !+""

Questions:
Answers:

It sounds like they are behind a web application firewall. Take a look at modsecurity and owasp to see how those work. In reality, what you are asking is how to do bot detection evasion. That is not what selenium web driver is for. It is for testing your web application not hitting other web applications. It is possible, but basically, you’d have to look at what a WAF looks for in their rule set and specifically avoid it with selenium if you can. Even then, it might still not work because you don’t know what WAF they are using. You did the right first step, that is faking the user agent. If that didn’t work though, then a WAF is in place and you probably need to get more tricky.

Edit:
Point taken from other answer. Make sure your user agent is actually being set correctly first. Maybe have it hit a local web server or sniff the traffic going out.

Questions:
Answers:

Try to use selenium with a specific user profile of chrome, That way you can use it as specific user and define any thing you want, When doing so it will run as a ‘real’ user, look at chrome process with some process explorer and you’ll see the difference with the tags.

For example:

username = os.getenv("USERNAME")
userProfile = "C:\Users\" + username + "\AppData\Local\Google\Chrome\User Data\Default"
options = webdriver.ChromeOptions()
options.add_argument("user-data-dir={}".format(userProfile))
# add here any tag you want.
options.add_experimental_option("excludeSwitches", ["ignore-certificate-errors", "safebrowsing-disable-download-protection", "safebrowsing-disable-auto-update", "disable-client-side-phishing-detection"])
chromedriver = "C:\Python27\chromedriver\chromedriver.exe"
os.environ["webdriver.chrome.driver"] = chromedriver
browser = webdriver.Chrome(executable_path=chromedriver, chrome_options=options)

chrome tag list here

Questions:
Answers:

partial interface Navigator {
readonly attribute boolean webdriver;
};

The webdriver IDL attribute of the Navigator interface must return the value of the webdriver-active flag, which is initially false.

This property allows websites to determine that the user agent is under control by WebDriver, and can be used to help mitigate denial-of-service attacks.

Taken directly from the 2017 W3C Editor’s Draft of WebDriver. This heavily implies that at the very least, future iterations of selenium’s drivers will be identifiable to prevent misuse. Ultimately, it’s hard to tell without the source code, what exactly causes chrome driver in specific to be detectable.

Questions:
Answers:

Even if you are sending all the right data (e.g. Selenium doesn’t show up as an extension, you have a reasonable resolution/bit-depth, &c), there are a number of services and tools which profile visitor behaviour to determine whether the actor is a user or an automated system.

For example, visiting a site then immediately going to perform some action by moving the mouse directly to the relevant button, in less than a second, is something no user would actually do.

It might also be useful as a debugging tool to use a site such as https://panopticlick.eff.org/ to check how unique your browser is; it’ll also help you verify whether there are any specific parameters that indicate you’re running in Selenium.

Questions:
Answers:

Firefox is said to set window.navigator.webdriver === true if working with a webdriver. That was according to one of the older specs (e.g.: archive.org) but I couldn’t find it in the new one except for some very vague wording in the appendices.

A test for it is in the selenium code in the file fingerprint_test.js where the comment at the end says “Currently only implemented in firefox” but I wasn’t able to identify any code in that direction with some simple greping, neither in the current (41.0.2) Firefox release-tree nor in the Chromium-tree.

I also found a comment for an older commit regarding fingerprinting in the firefox driver b82512999938 from January 2015. That code is still in the Selenium GIT-master downloaded yesterday at javascript/firefox-driver/extension/content/server.js with a comment linking to the slightly differently worded appendix in the current w3c webdriver spec.

Questions:
Answers:

You can bypass the security issue of the site by adding chrome vpn extension which makes your browser more secure. I have tried it with using ZenMate Security VPN. I have seen that if you set the country as US it works. To activate extension you add chrome extension to chrome browser then find the path it should look like /Users/mesutgunes/Library/Application Support/Google/Chrome/Default/Extensions/fdcgdnkidjaadafnichfpabhfomcebme/5.3.1_0.

Set a option variable with extention path and create chrome driver with that option and navigate the url, activate the VPN then you navigate the stubhub. I wish it would help.

>>> from selenium import webdriver
>>> from selenium.webdriver.chrome.options import Options
>>> opt = Options()
>>> opt.add_argument("load-extension=/Users/mesutgunes/Library/Application Support/Google/Chrome/Default/Extensions/fdcgdnkidjaadafnichfpabhfomcebme/5.3.1_0")
>>> dr = webdriver.Chrome(chrome_options=opt)
>>> dr.get("http://www.stubhub.com/") #activate the vpn with US then get the url again
>>> dr.get("http://www.stubhub.com/")
>>> dr.title
u'Buy sports, concert and theater tickets on StubHub!'
>>>

enter image description here

Questions:
Answers:

Write an html page with the following code. You will see that in the DOM selenium applies a webdriver attribute in the outerHTML

<html>
<head>
  <script type="text/javascript">
  <!--
    function showWindow(){
      javascript:(alert(document.documentElement.outerHTML));
    }
  //-->
  </script>
</head>
<body>
  <form>
    <input type="button" value="Show outerHTML" onclick="showWindow()">
  </form>
</body>
</html>

Questions:
Answers:

Some sites are detecting this:

function d() {
try {
    if (window.document.$cdc_asdjflasutopfhvcZLmcfl_.cache_)
        return !0
} catch (e) {}

try {
    //if (window.document.documentElement.getAttribute(decodeURIComponent("%77%65%62%64%72%69%76%65%72")))
    if (window.document.documentElement.getAttribute("webdriver"))
        return !0
} catch (e) {}

try {
    //if (decodeURIComponent("%5F%53%65%6C%65%6E%69%75%6D%5F%49%44%45%5F%52%65%63%6F%72%64%65%72") in window)
    if ("_Selenium_IDE_Recorder" in window)
        return !0
} catch (e) {}

try {
    //if (decodeURIComponent("%5F%5F%77%65%62%64%72%69%76%65%72%5F%73%63%72%69%70%74%5F%66%6E") in document)
    if ("__webdriver_script_fn" in document)
        return !0
} catch (e) {}