You can not select more than 25 topics
Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
350 lines
13 KiB
350 lines
13 KiB
.. _ref-contrib-sitemaps: |
|
|
|
===================== |
|
The sitemap framework |
|
===================== |
|
|
|
.. module:: django.contrib.sitemaps |
|
:synopsis: A framework for generating Google sitemap XML files. |
|
|
|
Django comes with a high-level sitemap-generating framework that makes |
|
creating sitemap_ XML files easy. |
|
|
|
.. _sitemap: http://www.sitemaps.org/ |
|
|
|
Overview |
|
======== |
|
|
|
A sitemap is an XML file on your Web site that tells search-engine indexers how |
|
frequently your pages change and how "important" certain pages are in relation |
|
to other pages on your site. This information helps search engines index your |
|
site. |
|
|
|
The Django sitemap framework automates the creation of this XML file by letting |
|
you express this information in Python code. |
|
|
|
It works much like Django's :ref:`syndication framework |
|
<ref-contrib-syndication>`. To create a sitemap, just write a |
|
:class:`~django.contrib.sitemaps.Sitemap` class and point to it in your |
|
:ref:`URLconf <topics-http-urls>`. |
|
|
|
Installation |
|
============ |
|
|
|
To install the sitemap app, follow these steps: |
|
|
|
1. Add ``'django.contrib.sitemaps'`` to your :setting:`INSTALLED_APPS` |
|
setting. |
|
|
|
2. Make sure ``'django.template.loaders.app_directories.load_template_source'`` |
|
is in your :setting:`TEMPLATE_LOADERS` setting. It's in there by default, |
|
so you'll only need to change this if you've changed that setting. |
|
|
|
3. Make sure you've installed the |
|
:mod:`sites framework <django.contrib.sites>`. |
|
|
|
(Note: The sitemap application doesn't install any database tables. The only |
|
reason it needs to go into :setting:`INSTALLED_APPS` is so that the |
|
:func:`~django.template.loaders.app_directories.load_template_source` template |
|
loader can find the default templates.) |
|
|
|
Initialization |
|
============== |
|
|
|
To activate sitemap generation on your Django site, add this line to your |
|
:ref:`URLconf <topics-http-urls>`:: |
|
|
|
(r'^sitemap.xml$', 'django.contrib.sitemaps.views.sitemap', {'sitemaps': sitemaps}) |
|
|
|
This tells Django to build a sitemap when a client accesses :file:`/sitemap.xml`. |
|
|
|
The name of the sitemap file is not important, but the location is. Search |
|
engines will only index links in your sitemap for the current URL level and |
|
below. For instance, if :file:`sitemap.xml` lives in your root directory, it may |
|
reference any URL in your site. However, if your sitemap lives at |
|
:file:`/content/sitemap.xml`, it may only reference URLs that begin with |
|
:file:`/content/`. |
|
|
|
The sitemap view takes an extra, required argument: ``{'sitemaps': sitemaps}``. |
|
``sitemaps`` should be a dictionary that maps a short section label (e.g., |
|
``blog`` or ``news``) to its :class:`~django.contrib.sitemaps.Sitemap` class |
|
(e.g., ``BlogSitemap`` or ``NewsSitemap``). It may also map to an *instance* of |
|
a :class:`~django.contrib.sitemaps.Sitemap` class (e.g., |
|
``BlogSitemap(some_var)``). |
|
|
|
Sitemap classes |
|
=============== |
|
|
|
A :class:`~django.contrib.sitemaps.Sitemap` class is a simple Python |
|
class that represents a "section" of entries in your sitemap. For example, |
|
one :class:`~django.contrib.sitemaps.Sitemap` class could represent |
|
all the entries of your weblog, while another could represent all of the |
|
events in your events calendar. |
|
|
|
In the simplest case, all these sections get lumped together into one |
|
:file:`sitemap.xml`, but it's also possible to use the framework to generate a |
|
sitemap index that references individual sitemap files, one per section. (See |
|
`Creating a sitemap index`_ below.) |
|
|
|
:class:`~django.contrib.sitemaps.Sitemap` classes must subclass |
|
``django.contrib.sitemaps.Sitemap``. They can live anywhere in your codebase. |
|
|
|
A simple example |
|
================ |
|
|
|
Let's assume you have a blog system, with an ``Entry`` model, and you want your |
|
sitemap to include all the links to your individual blog entries. Here's how |
|
your sitemap class might look:: |
|
|
|
from django.contrib.sitemaps import Sitemap |
|
from mysite.blog.models import Entry |
|
|
|
class BlogSitemap(Sitemap): |
|
changefreq = "never" |
|
priority = 0.5 |
|
|
|
def items(self): |
|
return Entry.objects.filter(is_draft=False) |
|
|
|
def lastmod(self, obj): |
|
return obj.pub_date |
|
|
|
Note: |
|
|
|
* :attr:`~Sitemap.changefreq` and :attr:`~Sitemap.priority` are class |
|
attributes corresponding to ``<changefreq>`` and ``<priority>`` elements, |
|
respectively. They can be made callable as functions, as |
|
:attr:`~Sitemap.lastmod` was in the example. |
|
* :attr:`~Sitemap.items()` is simply a method that returns a list of |
|
objects. The objects returned will get passed to any callable methods |
|
corresponding to a sitemap property (:attr:`~Sitemap.location`, |
|
:attr:`~Sitemap.lastmod`, :attr:`~Sitemap.changefreq`, and |
|
:attr:`~Sitemap.priority`). |
|
* :attr:`~Sitemap.lastmod` should return a Python ``datetime`` object. |
|
* There is no :attr:`~Sitemap.location` method in this example, but you |
|
can provide it in order to specify the URL for your object. By default, |
|
:attr:`~Sitemap.location()` calls ``get_absolute_url()`` on each object |
|
and returns the result. |
|
|
|
Sitemap class reference |
|
======================= |
|
|
|
.. class:: Sitemap |
|
|
|
A ``Sitemap`` class can define the following methods/attributes: |
|
|
|
.. attribute:: Sitemap.items |
|
|
|
**Required.** A method that returns a list of objects. The framework |
|
doesn't care what *type* of objects they are; all that matters is that |
|
these objects get passed to the :attr:`~Sitemap.location()`, |
|
:attr:`~Sitemap.lastmod()`, :attr:`~Sitemap.changefreq()` and |
|
:attr:`~Sitemap.priority()` methods. |
|
|
|
.. attribute:: Sitemap.location |
|
|
|
**Optional.** Either a method or attribute. |
|
|
|
If it's a method, it should return the absolute URL for a given object as |
|
returned by :attr:`~Sitemap.items()`. |
|
|
|
If it's an attribute, its value should be a string representing an absolute URL |
|
to use for *every* object returned by :attr:`~Sitemap.items()`. |
|
|
|
In both cases, "absolute URL" means a URL that doesn't include the protocol or |
|
domain. Examples: |
|
|
|
* Good: :file:`'/foo/bar/'` |
|
* Bad: :file:`'example.com/foo/bar/'` |
|
* Bad: :file:`'http://example.com/foo/bar/'` |
|
|
|
If :attr:`~Sitemap.location` isn't provided, the framework will call the |
|
``get_absolute_url()`` method on each object as returned by |
|
:attr:`~Sitemap.items()`. |
|
|
|
.. attribute:: Sitemap.lastmod |
|
|
|
**Optional.** Either a method or attribute. |
|
|
|
If it's a method, it should take one argument -- an object as returned by |
|
:attr:`~Sitemap.items()` -- and return that object's last-modified date/time, as a Python |
|
``datetime.datetime`` object. |
|
|
|
If it's an attribute, its value should be a Python ``datetime.datetime`` object |
|
representing the last-modified date/time for *every* object returned by |
|
:attr:`~Sitemap.items()`. |
|
|
|
.. attribute:: Sitemap.changefreq |
|
|
|
**Optional.** Either a method or attribute. |
|
|
|
If it's a method, it should take one argument -- an object as returned by |
|
:attr:`~Sitemap.items()` -- and return that object's change frequency, as a Python string. |
|
|
|
If it's an attribute, its value should be a string representing the change |
|
frequency of *every* object returned by :attr:`~Sitemap.items()`. |
|
|
|
Possible values for :attr:`~Sitemap.changefreq`, whether you use a method or attribute, are: |
|
|
|
* ``'always'`` |
|
* ``'hourly'`` |
|
* ``'daily'`` |
|
* ``'weekly'`` |
|
* ``'monthly'`` |
|
* ``'yearly'`` |
|
* ``'never'`` |
|
|
|
.. method:: Sitemap.priority |
|
|
|
**Optional.** Either a method or attribute. |
|
|
|
If it's a method, it should take one argument -- an object as returned by |
|
:attr:`~Sitemap.items()` -- and return that object's priority, as either a string or float. |
|
|
|
If it's an attribute, its value should be either a string or float representing |
|
the priority of *every* object returned by :attr:`~Sitemap.items()`. |
|
|
|
Example values for :attr:`~Sitemap.priority`: ``0.4``, ``1.0``. The default priority of a |
|
page is ``0.5``. See the `sitemaps.org documentation`_ for more. |
|
|
|
.. _sitemaps.org documentation: http://www.sitemaps.org/protocol.html#prioritydef |
|
|
|
Shortcuts |
|
========= |
|
|
|
The sitemap framework provides a couple convenience classes for common cases: |
|
|
|
.. class:: FlatPageSitemap |
|
|
|
The :class:`django.contrib.sitemaps.FlatPageSitemap` class looks at all |
|
:mod:`flatpages <django.contrib.flatpages>` defined for the current |
|
:setting:`SITE_ID` (see the |
|
:mod:`sites documentation <django.contrib.sites>`) and |
|
creates an entry in the sitemap. These entries include only the |
|
:attr:`~Sitemap.location` attribute -- not :attr:`~Sitemap.lastmod`, |
|
:attr:`~Sitemap.changefreq` or :attr:`~Sitemap.priority`. |
|
|
|
.. class:: GenericSitemap |
|
|
|
The :class:`django.contrib.sitemaps.GenericSitemap` class works with any |
|
:ref:`generic views <ref-generic-views>` you already have. |
|
To use it, create an instance, passing in the same :data:`info_dict` you pass to |
|
the generic views. The only requirement is that the dictionary have a |
|
:data:`queryset` entry. It may also have a :data:`date_field` entry that specifies a |
|
date field for objects retrieved from the :data:`queryset`. This will be used for |
|
the :attr:`~Sitemap.lastmod` attribute in the generated sitemap. You may |
|
also pass :attr:`~Sitemap.priority` and :attr:`~Sitemap.changefreq` |
|
keyword arguments to the :class:`~django.contrib.sitemaps.GenericSitemap` |
|
constructor to specify these attributes for all URLs. |
|
|
|
Example |
|
------- |
|
|
|
Here's an example of a :ref:`URLconf <topics-http-urls>` using both:: |
|
|
|
from django.conf.urls.defaults import * |
|
from django.contrib.sitemaps import FlatPageSitemap, GenericSitemap |
|
from mysite.blog.models import Entry |
|
|
|
info_dict = { |
|
'queryset': Entry.objects.all(), |
|
'date_field': 'pub_date', |
|
} |
|
|
|
sitemaps = { |
|
'flatpages': FlatPageSitemap, |
|
'blog': GenericSitemap(info_dict, priority=0.6), |
|
} |
|
|
|
urlpatterns = patterns('', |
|
# some generic view using info_dict |
|
# ... |
|
|
|
# the sitemap |
|
(r'^sitemap.xml$', 'django.contrib.sitemaps.views.sitemap', {'sitemaps': sitemaps}) |
|
) |
|
|
|
.. _URLconf: ../url_dispatch/ |
|
|
|
Creating a sitemap index |
|
======================== |
|
|
|
The sitemap framework also has the ability to create a sitemap index that |
|
references individual sitemap files, one per each section defined in your |
|
:data:`sitemaps` dictionary. The only differences in usage are: |
|
|
|
* You use two views in your URLconf: :func:`django.contrib.sitemaps.views.index` |
|
and :func:`django.contrib.sitemaps.views.sitemap`. |
|
* The :func:`django.contrib.sitemaps.views.sitemap` view should take a |
|
:data:`section` keyword argument. |
|
|
|
Here is what the relevant URLconf lines would look like for the example above:: |
|
|
|
(r'^sitemap.xml$', 'django.contrib.sitemaps.views.index', {'sitemaps': sitemaps}) |
|
(r'^sitemap-(?P<section>.+)\.xml$', 'django.contrib.sitemaps.views.sitemap', {'sitemaps': sitemaps}) |
|
|
|
This will automatically generate a :file:`sitemap.xml` file that references both |
|
:file:`sitemap-flatpages.xml` and :file:`sitemap-blog.xml`. The |
|
:class:`~django.contrib.sitemaps.Sitemap` classes and the :data:`sitemaps` dict |
|
don't change at all. |
|
|
|
If one of your sitemaps is going to have more than 50,000 URLs you should |
|
create an index file. Your sitemap will be paginated and the index will |
|
reflect that. |
|
|
|
Pinging Google |
|
============== |
|
|
|
You may want to "ping" Google when your sitemap changes, to let it know to |
|
reindex your site. The sitemaps framework provides a function to do just |
|
that: :func:`django.contrib.sitemaps.ping_google()`. |
|
|
|
.. function:: ping_google |
|
|
|
:func:`ping_google` takes an optional argument, :data:`sitemap_url`, |
|
which should be the absolute URL of your site's sitemap (e.g., |
|
:file:`'/sitemap.xml'`). If this argument isn't provided, |
|
:func:`ping_google` will attempt to figure out your |
|
sitemap by performing a reverse looking in your URLconf. |
|
|
|
:func:`ping_google` raises the exception |
|
:exc:`django.contrib.sitemaps.SitemapNotFound` if it cannot determine your |
|
sitemap URL. |
|
|
|
.. admonition:: Register with Google first! |
|
|
|
The :func:`ping_google` command only works if you have registered your |
|
site with `Google Webmaster Tools`_. |
|
|
|
.. _`Google Webmaster Tools`: http://www.google.com/webmasters/tools/ |
|
|
|
One useful way to call :func:`ping_google` is from a model's ``save()`` |
|
method:: |
|
|
|
from django.contrib.sitemaps import ping_google |
|
|
|
class Entry(models.Model): |
|
# ... |
|
def save(self, force_insert=False, force_update=False): |
|
super(Entry, self).save(force_insert, force_update) |
|
try: |
|
ping_google() |
|
except Exception: |
|
# Bare 'except' because we could get a variety |
|
# of HTTP-related exceptions. |
|
pass |
|
|
|
A more efficient solution, however, would be to call :func:`ping_google` from a |
|
cron script, or some other scheduled task. The function makes an HTTP request |
|
to Google's servers, so you may not want to introduce that network overhead |
|
each time you call ``save()``. |
|
|
|
Pinging Google via `manage.py` |
|
------------------------------ |
|
|
|
.. versionadded:: 1.0 |
|
|
|
Once the sitemaps application is added to your project, you may also |
|
ping the Google server's through the command line manage.py interface:: |
|
|
|
python manage.py ping_google [/sitemap.xml]
|
|
|