The Future of Research Communications and e-Scholarship

Why may Google textmine but Scientists may not?

Author: Piwowar, Heather

"Check out the robots.txt file for PMC, under /pmc/articles/, and notice that GoogleBot is allowed, Bing and a few others are allowed, but User-Agent:* (the rest of us) is not.  The same is true for the ScienceDirect robots.txt:  Google may textmine everything, experimenting scientists, nothing.  (hat tip to Alf Eaton on Twitter)

Is this defensible on the grounds that Google knows what it is doing but The Rest Of Us Can Not Be Trusted?  I sure hope not.  Scientists are routinely trusted with a lot more than writing a script that won’t bring down a server.  There are other ways to ensure someone won’t bring down a server than a global robots.txt ban."

http://researchremix.wordpress.com/2013/03/13/why-google/

Archive: https://www.force11.org/node/6435
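
For readers who want to see how such a rule set plays out, here is a minimal sketch using Python's standard urllib.robotparser module. The rules in EXAMPLE_ROBOTS_TXT are a hypothetical reconstruction of the pattern described in the quote (named crawlers allowed, everyone else disallowed), not a copy of the actual PMC or ScienceDirect files; the helper can_crawl, the example.org URL, and the agent name "my-research-script" are likewise assumptions made for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules that mimic the pattern described in the post.
# This is NOT the real PMC or ScienceDirect robots.txt.
EXAMPLE_ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /pmc/articles/

User-agent: bingbot
Allow: /pmc/articles/

User-agent: *
Disallow: /pmc/articles/
"""


def can_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Placeholder URL; only the path matters for the rule match.
    article = "https://example.org/pmc/articles/PMC0000000/"
    for agent in ("Googlebot", "bingbot", "my-research-script"):
        verdict = "allowed" if can_crawl(EXAMPLE_ROBOTS_TXT, agent, article) else "disallowed"
        print(f"{agent}: {verdict}")
```

Under these assumed rules, a named crawler matches its own entry, while any other agent falls through to the wildcard entry and is refused. Note that robots.txt is advisory: it tells well-behaved clients what the site operator wants, and it does not itself rate-limit or block requests.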
