Author: Piwowar, Heather
"Check out the robot.txt files for PMC for /pmc/articles/ and notice that GoogleBot is allowed, Bing and a few others are allowed, but User-Agent:* (the rest of us) are not. The same is true for ScienceDirect robots.txt: Google may textmine everything, experimenting scientists, nothing. (hat tip to Alf Eaton on twitter)
Is this defensible on the grounds that Google knows what it is doing but The Rest Of Us Can Not Be Trusted? I sure hope not. Scientists are routinely trusted with a lot more than writing a script that won’t bring down a server. There are ways other than a global robots.txt ban to ensure someone won’t bring down a server."
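
To make the policy concrete, here is a minimal sketch of what a robots.txt like the one described above looks like, and how a well-behaved crawler would interpret it using Python's standard urllib.robotparser. The rules and the article URL below are illustrative placeholders mirroring the described behavior, not the literal PMC or ScienceDirect files.

```python
import urllib.robotparser

# Illustrative rules mirroring the policy described in the quote:
# Googlebot may crawl /pmc/articles/, every other user agent may not.
# (Placeholder, not the actual PMC robots.txt.)
RULES = """\
User-agent: Googlebot
Allow: /pmc/articles/

User-agent: *
Disallow: /pmc/articles/
"""

rp = urllib.robotparser.RobotFileParser()
rp.parse(RULES.splitlines())

# Placeholder article URL for illustration.
url = "https://www.ncbi.nlm.nih.gov/pmc/articles/PMC0000000/"

for agent in ("Googlebot", "MyTextMiningScript"):
    print(agent, "allowed:", rp.can_fetch(agent, url))

# Expected output:
# Googlebot allowed: True
# MyTextMiningScript allowed: False
```

An obedient text-mining script checking can_fetch() before requesting each article would simply stop at the gate, which is exactly the asymmetry the quote objects to.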