There was, however, a wrinkle in the question that was asked…
An official Google developer page says this:
“If your site’s robots.txt file disallows crawling of these assets, it directly harms how well our algorithms render and index your content. This can result in suboptimal rankings.”
The person asking the question had a valid reason to be concerned about how Google might react to blocking external resources.
“If you use robots.txt to block JS or CSS on external JS files/CSS files in other domain or if other domain blocks them, so the user will see different things than Googlebot, right?
Would Google distrust this kind of page and downrank them?”
Google’s Martin Splitt answered confidently:
“No, we won’t downrank anything. It’s not cloaking. Cloaking very specifically means misleading the user.
Just because we can’t see content doesn’t necessarily mean that you’re misleading the user.”
Cloaking is a trick spammers use to show one set of content to Google, in order to trick Google into ranking it, while showing users a completely different web page, such as a virus- or spam-laden page.
Cloaking is also a way to keep Google from crawling URLs publishers don’t want Google to see, like affiliate links.
Martin’s answer approaches the question from the direction of whether blocking external resources would be seen as cloaking, and his answer is no.
How Blocking External Resources Can Be Problematic
Martin then goes on to describe how blocking external resources can become an issue:
“It is still potentially problematic if your content only shows up when we can fetch these resources and we don’t see the content in the rendered HTML because it’s blocked by robots.txt.
Then we can’t index it. If there’s content missing, we can’t index that.”
Google’s Testing Tools Will Reveal Problems
Martin then went on to explain how a publisher can use Google’s testing tools to diagnose whether blocking a resource is problematic.
The Publisher Asked a Trick Question
It’s an interesting answer that it’s okay to block external resources associated with a chat box or a comment widget. Blocking those resources may be useful, for example, if it helps speed up rendering of the site for Google.
But there’s a slight wrinkle in the question that was asked: you can’t block external resources (on another domain) using robots.txt.
The original question was a two-parter.
This is the problematic first part:
“If you use robots.txt to block JS or CSS on external JS files/CSS files in other domain…”
That part of the question is impossible to accomplish with robots.txt.
Google’s developer page says this about robots.txt:
“It is valid for all files in all subdirectories on the same host, protocol and port number.”
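To illustrate that scope (example.com is a placeholder domain), a robots.txt file only governs URLs on its own host, protocol, and port:

```
https://example.com/robots.txt applies to:
  https://example.com/page.html          (same host, protocol, port)

It does NOT apply to:
  http://example.com/page.html           (different protocol)
  https://cdn.example.com/script.js      (different host)
  https://example.com:8080/page.html     (different port)
```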
What was overlooked in that question is that robots.txt only uses relative URLs, not absolute URLs (except for the location of a sitemap).
A relative URL means that the URL is “relative” to the root of the site, omitting the protocol and domain.
In a robots.txt file, the URLs look like this:
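For instance, relative paths in robots.txt directives (the paths below are placeholders):

```
User-agent: *
Disallow: /scripts/chat-widget.js
Disallow: /css/
```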
And this is what an absolute URL looks like:
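A hypothetical example (example.com is a placeholder domain):

```
https://www.example.com/scripts/chat-widget.js
```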
So, if you can’t use an absolute URL in robots.txt, then you can’t block an external resource with robots.txt.
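This path-only matching can be demonstrated with Python’s standard-library robots.txt parser, which follows the same model: rules match only the path portion of a URL, and the host is ignored. The domains and file names below are placeholders:

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt that tries to block one script by a
# relative path and another by an absolute URL.
rp = RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /blocked.js",
    "Disallow: https://cdn.other.com/widget.js",
])

# The path rule matches no matter which host appears in the URL,
# because only the path portion is compared.
print(rp.can_fetch("*", "https://example.com/blocked.js"))    # False
print(rp.can_fetch("*", "https://cdn.other.com/blocked.js"))  # False

# The absolute-URL rule matches nothing at all: it is not a valid
# relative path, so the external widget is not blocked.
print(rp.can_fetch("*", "https://cdn.other.com/widget.js"))   # True
```

In practice a crawler only applies the robots.txt of the host it is currently fetching from, which is why a rule on your domain can never block a resource hosted elsewhere.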
The second part of the question is technically correct:
“…or if other domain blocks them, so the user will see different things than Googlebot, right? Would Google distrust this kind of page and down rank them?”
External resources are often blocked by the sites hosting them. So the question and answer make more sense from that direction.
Martin Splitt said that blocking those external resources is not cloaking. That statement is true if you don’t use robots.txt.
That’s probably what Martin was referring to, but the question was specifically about robots.txt.
In the real world, if one wishes to block external resources with robots.txt, then one would have to resort to cloaking.