[Project_owners] Bot ignoring robots.txt

Pete Collins pete at mozdev.org
Tue Oct 28 09:51:14 EST 2003


Here is an example of a bot ignoring robots.txt just today:

Snip:

xprint.mozdev.org 195.111.1.2 - - [28/Oct/2003:05:01:54 -0500] "GET /bugs/describecomponents.cgi?product=xprint HTTP/1.0" 200 20569 "-" "Computer_and_Automation_Research_Institute_Crawler nospamspider at nospamspider.nospamilab.nospamsztaki.hu" 0
xprint.mozdev.org 195.111.1.2 - - [28/Oct/2003:05:02:53 -0500] "GET /bugs/describekeywords.cgi HTTP/1.0" 200 4378 "-" "Computer_and_Automation_Research_Institute_Crawler nospamspider at nospamspider.nospamilab.nospamsztaki.hu" 0
xprint.mozdev.org 195.111.1.2 - - [28/Oct/2003:05:03:23 -0500] "GET /bugs/attachment.cgi?id=764&action=view HTTP/1.0" 200 19564 "-" "Computer_and_Automation_Research_Institute_Crawler nospamspider at nospamspider.nospamilab.nospamsztaki.hu" 0
xprint.mozdev.org 195.111.1.2 - - [28/Oct/2003:05:03:53 -0500] "GET /bugs/attachment.cgi?id=764&action=edit HTTP/1.0" 200 19564 "-" "Computer_and_Automation_Research_Institute_Crawler nospamspider at nospamspider.nospamilab.nospamsztaki.hu" 0
xprint.mozdev.org 195.111.1.2 - - [28/Oct/2003:05:04:23 -0500] "GET /bugs/attachment.cgi?bugid=3166&action=enter HTTP/1.0" 200 2794 "-" "Computer_and_Automation_Research_Institute_Crawler nospamspider at nospamspider.nospamilab.nospamsztaki.hu" 0
xprint.mozdev.org 195.111.1.2 - - [28/Oct/2003:05:04:54 -0500] "GET /bugs/attachment.cgi?bugid=3166&action=viewall HTTP/1.0" 200 2794 "-" "Computer_and_Automation_Research_Institute_Crawler nospamspider at nospamspider.nospamilab.nospamsztaki.hu" 1

. . . .

As you can see, it just starts traversing the Bugzilla CGI scripts.
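
For anyone who wants to check their own logs for the same thing, here is a rough Python sketch. It assumes each log entry sits on a single line in the format shown above, and it assumes a robots.txt that disallows /bugs/ for all agents (the actual file is not quoted in this message, so ASSUMED_ROBOTS_TXT is only a stand-in). The names LOG_RE and disallowed_hits are made up for illustration.

```python
import re
from urllib.robotparser import RobotFileParser

# Assumption: stand-in rules only; the real robots.txt is not shown here.
ASSUMED_ROBOTS_TXT = """\
User-agent: *
Disallow: /bugs/
"""

# Fields: vhost, client IP, two ignored fields, [timestamp],
# "METHOD path protocol", status, size, "referer", "user agent".
LOG_RE = re.compile(
    r'^(?P<vhost>\S+) (?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def disallowed_hits(log_lines, robots_txt=ASSUMED_ROBOTS_TXT):
    """Return (ip, time, path) for each request that robots.txt disallows."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    hits = []
    for line in log_lines:
        match = LOG_RE.match(line)
        if match is None:
            continue  # skip lines that do not look like log entries
        if not parser.can_fetch(match.group("agent"), match.group("path")):
            hits.append(
                (match.group("ip"), match.group("time"), match.group("path"))
            )
    return hits
```

Feeding it the lines of an access log gives a list of the offending requests, which can then be grouped by IP or user agent before deciding what to block.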

Do I block the IP?

Does anyone know the best way to deal with this?

Thanks

--pete

-- 
Pete Collins
www.mozdev.org
www.mozdevgroup.com



