A sitemap is an XML file at a well-known path (usually /sitemap.xml) that lists every URL on your site you want indexed. It's a hint to search engines — not a guarantee — about which pages exist and when they were last updated.
Why a sitemap matters
- Discovery: helps Google find URLs that have few or no internal links (e.g. orphan pages, or new posts before any other page links to them).
- Recrawl prioritisation: <lastmod> tells Google when content changed, which can speed up re-indexing.
- Search Console reporting: lets you submit a sitemap and see exactly which URLs Google has indexed vs ignored.
Minimum useful structure
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/</loc>
    <lastmod>2026-05-08</lastmod>
  </url>
  <url>
    <loc>https://example.com/about</loc>
    <lastmod>2026-04-15</lastmod>
  </url>
</urlset>
<changefreq> and <priority> are also valid child elements of <url>, but Google has publicly said it largely ignores them.
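The structure above can also be generated from code instead of being written by hand. The following is a minimal Python sketch using only the standard library. The function name build_sitemap and the (URL, date) input shape are assumptions made for illustration, not part of any sitemap tooling.

```python
import xml.etree.ElementTree as ET
from datetime import date

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Return sitemap XML as bytes for an iterable of (url, last_modified) pairs.

    build_sitemap is a hypothetical helper; last_modified is a datetime.date.
    """
    # Setting xmlns as a plain attribute makes it the default namespace on output.
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, last_modified in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        # date.isoformat() yields YYYY-MM-DD, the form used in the example above.
        ET.SubElement(url, "lastmod").text = last_modified.isoformat()
    # xml_declaration requires Python 3.8 or newer.
    return ET.tostring(urlset, encoding="UTF-8", xml_declaration=True)


if __name__ == "__main__":
    xml_bytes = build_sitemap([
        ("https://example.com/", date(2026, 5, 8)),
        ("https://example.com/about", date(2026, 4, 15)),
    ])
    print(xml_bytes.decode("utf-8"))
```

ElementTree escapes characters such as & in URLs automatically, which is one reason to prefer it over string concatenation.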
Common mistakes
- Listing non-canonical URLs — submit the canonical version, not parameter-tagged or duplicate variants.
- Listing noindex pages — sends a contradictory signal.
- Stale lastmod: if you set lastmod to today on every URL every day without actual changes, Google learns to ignore the field.
- Forgetting to reference the sitemap in robots.txt: declare Sitemap: https://example.com/sitemap.xml for autodetection.
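Most of the mistakes above can be caught before a sitemap is published. The sketch below is one possible approach in Python. The Page record and its fields (canonical_url, noindex, content_changed) are hypothetical stand-ins for whatever your CMS exposes, and both function names are invented for illustration. The robots.txt check relies on urllib.robotparser, whose site_maps() method is available from Python 3.8.

```python
from dataclasses import dataclass
from datetime import date
from urllib.robotparser import RobotFileParser


@dataclass
class Page:
    """Hypothetical page record; field names are assumptions for this sketch."""
    url: str
    canonical_url: str
    noindex: bool
    content_changed: date  # date of the last real content change, not the build date


def sitemap_entries(pages):
    """Keep only indexable, canonical pages, paired with their real change date."""
    return [
        (page.url, page.content_changed)
        for page in pages
        if not page.noindex and page.url == page.canonical_url
    ]


def declared_sitemaps(robots_txt):
    """Return the Sitemap: URLs declared in the text of a robots.txt file."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when no Sitemap: line is present.
    return parser.site_maps() or []


if __name__ == "__main__":
    pages = [
        Page("https://example.com/", "https://example.com/", False, date(2026, 5, 8)),
        Page("https://example.com/?utm_source=x", "https://example.com/", False, date(2026, 5, 8)),
        Page("https://example.com/drafts", "https://example.com/drafts", True, date(2026, 4, 1)),
    ]
    print(sitemap_entries(pages))
    print(declared_sitemaps("User-agent: *\nAllow: /\nSitemap: https://example.com/sitemap.xml\n"))
```

Taking lastmod from a stored content-change date, rather than from the time the sitemap was generated, is what keeps the field meaningful.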
A single sitemap file is limited to 50,000 URLs or 50 MB uncompressed. For sites larger than that, use a sitemap index file that references multiple per-section sitemaps.
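A sitemap index uses the same namespace as a regular sitemap, but with a sitemapindex root and one sitemap element per child file. The Python sketch below shows one way to produce it. The function names and the per-section file URLs are assumptions for illustration. The chunk helper is only needed if a single section exceeds 50,000 URLs.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_SITEMAP = 50_000


def chunk(urls, size=MAX_URLS_PER_SITEMAP):
    """Yield successive slices of urls, each small enough for one sitemap file."""
    for start in range(0, len(urls), size):
        yield urls[start:start + size]


def build_sitemap_index(sitemap_urls):
    """Return sitemap index XML as bytes, referencing each child sitemap URL."""
    root = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for loc in sitemap_urls:
        sitemap = ET.SubElement(root, "sitemap")
        ET.SubElement(sitemap, "loc").text = loc
    # xml_declaration requires Python 3.8 or newer.
    return ET.tostring(root, encoding="UTF-8", xml_declaration=True)


if __name__ == "__main__":
    # Hypothetical per-section sitemap locations.
    index = build_sitemap_index([
        "https://example.com/sitemaps/blog.xml",
        "https://example.com/sitemaps/products.xml",
    ])
    print(index.decode("utf-8"))
```

The index file is then what you submit in Search Console or declare in robots.txt, in place of the individual sitemaps.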