Middleware to detect visitors who arrived from a search engine

import urlparse
import cgi
import re

class SearchReferrerMiddleware(object):
    SEARCH_PARAMS = {
        'AltaVista': 'q',
        'Ask': 'q',
        'Google': 'q',
        'Live': 'q',
        'Lycos': 'query',
        'MSN': 'q',
        'Yahoo': 'p',
    }
    
    NETWORK_RE = r"""(?ix)^
        (?P<subdomain>[-.a-z\d]+\.)?
        (?P<engine>%s)
        (?P<top_level>(?:\.[a-z]{2,3}){1,2})
        (?P<port>:\d+)?
        $"""
    
    @classmethod
    def parse_search(cls, url):
        """
        Extract the search engine, domain, and search term from `url`
        and return them as (engine, domain, term). For example,
        ('Google', 'www.google.co.uk', 'django framework'). Note that
        the search term will be converted to lowercase and have normalized
        spaces.

        The first tuple item will be None if the referrer is not a
        search engine.
        """
        try:
            parsed = urlparse.urlsplit(url)
            network = parsed[1]
            query = parsed[3]
        except (AttributeError, IndexError):
            return (None, None, None)
        for engine, param in cls.SEARCH_PARAMS.iteritems():
            match = re.match(cls.NETWORK_RE % engine, network)
            if match and match.group('engine'):
                term = cgi.parse_qs(query).get(param)
                if term and term[0]:
                    term = ' '.join(term[0].split()).lower()
                    return (engine, network, term)
        return (None, network, None)

    # Here's where your code goes!
    # It can be any middleware method that needs search engine detection
    # functionality... this is just my example.
    def process_view(self, request, view_func, view_args, view_kwargs):
        from django.views.generic.date_based import object_detail
        referrer = request.META.get('HTTP_REFERER')
        engine, domain, term = self.parse_search(referrer)
        if engine and view_func is object_detail:
            # The client got to this object's page from a search engine.
            # This might be useful for determining the object's popularity.
            # Get the object using object_detail's queryset.
            # Log this search using a custom Visit model or something.
            pass
        # Returning None tells Django to continue processing the request normally.
        return None
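The snippet targets Python 2 (`urlparse`, `cgi.parse_qs`, `dict.iteritems`). As a sketch assuming you are on Python 3, the same parsing logic can be written with `urllib.parse`. The engine table and pattern are copied from the snippet, with the inline `(?ix)` flags placed at the start of the pattern, since recent Python 3 releases reject global flags anywhere else. It is written as a plain function rather than a classmethod so it can be tested without Django.

```python
import re
from urllib.parse import urlsplit, parse_qs

# Engine name -> query-string parameter that carries the search term
# (same table as SearchReferrerMiddleware.SEARCH_PARAMS).
SEARCH_PARAMS = {
    'AltaVista': 'q',
    'Ask': 'q',
    'Google': 'q',
    'Live': 'q',
    'Lycos': 'query',
    'MSN': 'q',
    'Yahoo': 'p',
}

# Same pattern as the snippet; the global flags must come first in Python 3.
NETWORK_RE = r"""(?ix)^
    (?P<subdomain>[-.a-z\d]+\.)?
    (?P<engine>%s)
    (?P<top_level>(?:\.[a-z]{2,3}){1,2})
    (?P<port>:\d+)?
    $"""


def parse_search(url):
    """Return (engine, domain, term); engine is None for non-search referrers."""
    if not url:
        # No Referer header at all.
        return (None, None, None)
    parsed = urlsplit(url)
    network, query = parsed.netloc, parsed.query
    for engine, param in SEARCH_PARAMS.items():
        if re.match(NETWORK_RE % engine, network):
            # parse_qs() decodes %XX escapes as UTF-8 and returns str values.
            terms = parse_qs(query).get(param)
            if terms and terms[0]:
                # Collapse runs of whitespace and lowercase, as the original does.
                return (engine, network, ' '.join(terms[0].split()).lower())
    return (None, network, None)
```

For example, `parse_search('http://www.google.co.uk/search?q=Django++Framework')` returns `('Google', 'www.google.co.uk', 'django framework')`, matching the docstring above. Inside a middleware you would call it with `request.META.get('HTTP_REFERER')`, as `process_view` does.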

More like this

  1. Allow filtering and ordering by counts of related query results by exogen 7 years ago
  2. Search Engine Referrer info in request by zenx 5 years, 4 months ago
  3. Class Feeds DRY TemplateTag by gmandx 3 years, 11 months ago
  4. browscap.ini-parser by henning 6 years, 10 months ago
  5. RequestStack middleware by simonbun 6 years, 12 months ago

Comments

zenx (on December 10, 2008):

On line 42 replace: match = re.match(NETWORK_RE % engine, network)

with: match = re.match(cls.NETWORK_RE % engine, network)
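This fix is needed because names defined in a class body are not in scope inside that class's methods; they have to be reached through `cls` (or `self`). A minimal illustration with a hypothetical `Demo` class (not part of the snippet):

```python
import re


class Demo(object):
    NETWORK_RE = r'^(?P<engine>google)\.com$'

    @classmethod
    def broken(cls, network):
        # Class attributes are not visible as bare names inside a method
        # body, so this lookup raises NameError when the method runs.
        return re.match(NETWORK_RE, network)

    @classmethod
    def fixed(cls, network):
        # Looking the attribute up through `cls` finds the class attribute.
        return re.match(cls.NETWORK_RE, network)
```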


exogen (on October 17, 2009):

@zenx: Fixed, thanks!


udfalkso (on December 23, 2010):

Thanks, this worked nicely.


sdcooke (on January 14, 2011):

We've been using this, and after a very long and painful bug hunt we have discovered there are a couple of tweaks you might want to make.

You'll probably want to edit line 44 to make it "parse_qs(unicode(query))" and line 46 to make it "u' '.join"; otherwise you may spend a while trying to work out why your template is throwing a DjangoUnicodeDecodeError!
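These tweaks apply to Python 2, where `cgi.parse_qs()` returns byte strings. For comparison, a minimal sketch assuming Python 3: `urllib.parse.parse_qs()` decodes percent-escapes as UTF-8 by default and returns `str` values, so neither `unicode()` nor a `u''` literal is needed. The query string below is a made-up example.

```python
from urllib.parse import parse_qs

# A referrer query string as a browser would send it: non-ASCII characters
# are percent-encoded UTF-8 and spaces may be encoded as '+'.
query = 'q=Caf%C3%A9++cr%C3%A8me&hl=fr'

# parse_qs() turns '+' into spaces and decodes the escapes into str.
terms = parse_qs(query)['q']

# Same normalisation as the snippet: collapse whitespace, then lowercase.
term = ' '.join(terms[0].split()).lower()
print(term)  # café crème
```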


sugraskan10 (on May 15, 2012):

How do I use this code? I also need code for detecting Google referrers.
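The class is ordinary Django middleware, and Google referrers are already covered by the `'Google'` entry in `SEARCH_PARAMS`. On the Django versions this snippet targets (it imports `django.views.generic.date_based`, which was removed in Django 1.5), middleware is enabled by adding its dotted path to `MIDDLEWARE_CLASSES` in `settings.py`. A sketch; `myapp.middleware` is a hypothetical module path, so substitute the module where you saved the class:

```python
# settings.py (Django versions that use MIDDLEWARE_CLASSES, i.e. before 1.10).
# 'myapp.middleware' is hypothetical: point it at your own module.
MIDDLEWARE_CLASSES = (
    'django.middleware.common.CommonMiddleware',
    'django.contrib.sessions.middleware.SessionMiddleware',
    'myapp.middleware.SearchReferrerMiddleware',
)
```

Then replace the body of `process_view` with whatever should happen when `engine` is set, for example recording the search term.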

