Sets request.is_crawler
Allows locking bots out of certain URLs in the URLconf by adding the view parameter 'deny_crawlers', e.g.:
url(r'^foo/$', 'views.foo', {'deny_crawlers': True}, name='foo')
The view parameter is removed during the middleware pass, so the view itself never receives it.
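To enable the middleware, add it to MIDDLEWARE_CLASSES in settings.py. A minimal sketch; the module path 'myapp.middleware' is only an assumption, adjust it to wherever you save the class:

MIDDLEWARE_CLASSES = (
    # ... other middleware ...
    'myapp.middleware.CrawlerBlocker',  # hypothetical path to this snippet
)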
from django.http import HttpResponseForbidden

# Substrings matched against the User-Agent header.
BotNames = ['Googlebot', 'Slurp', 'Twiceler', 'msnbot', 'KaloogaBot',
            'YodaoBot', 'Baiduspider', 'googlebot', 'Speedy Spider', 'DotBot']

# Extra view kwarg in the URLconf that marks a URL as off-limits to crawlers.
param_name = 'deny_crawlers'


class CrawlerBlocker:
    def process_request(self, request):
        user_agent = request.META.get('HTTP_USER_AGENT', None)
        if not user_agent:
            return HttpResponseForbidden('Requests without a user agent are not supported, sorry.')
        # Flag the request so views can also check request.is_crawler themselves.
        request.is_crawler = False
        for botname in BotNames:
            if botname in user_agent:
                request.is_crawler = True

    def process_view(self, request, view_func, view_args, view_kwargs):
        if param_name in view_kwargs:
            if view_kwargs[param_name]:
                # Strip the marker kwarg so the view is never called with it.
                del view_kwargs[param_name]
                if request.is_crawler:
                    return HttpResponseForbidden('Address removed from crawling. Check robots.txt.')
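Because process_request sets request.is_crawler on every request, views can also branch on it directly. A small illustrative example (the view name 'foo' follows the URLconf line above; it is not part of the snippet):

from django.http import HttpResponse

def foo(request):
    # request.is_crawler is set by CrawlerBlocker.process_request;
    # 'deny_crawlers' has already been stripped from the view kwargs.
    if request.is_crawler:
        return HttpResponse('Hello, bot.')
    return HttpResponse('Hello, human.')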
Comments
Using the user agent to block bots will only stop rookie/noob spammers, since the user agent can easily be changed to whatever the bot writer wants. I think a much better solution is to whitelist the IP address blocks of Google, MSN and other safe bots and block everything else.