Filtering sitemap pages and images

Filtering lets you control precisely which pages and images are excluded from your sitemap in our online and Windows sitemap generators. This guide shows how to write filtering rules as expressions, from simple wildcard patterns to more advanced techniques. The wildcard syntax follows the ANSI-92 SQL standard.

Simple Wildcard Filters

Wildcard expressions match URLs against patterns. Use them to filter on partial filenames or filename patterns; regular expressions are available for more complex matching.

“%” Wildcard

The “%” wildcard matches any number of characters. It can be used as the first or last character of the pattern.

Example:

  • wh% matches “what,” “white,” and “why,” but not “awhile” or “watch.”

“_” Wildcard

The “_” wildcard matches any single alphabetic character.

Example:

  • B_ll matches “ball,” “bell,” and “bill.”

“[]” Wildcard

The “[]” wildcard allows you to match any single character within the brackets.

Example:

  • B[ae]ll matches “ball” and “bell,” but not “bill.”

“^” Wildcard

The “^” wildcard, placed immediately after the opening bracket, matches any single character that is not listed in the brackets.

Example:

  • b[^ae]ll matches “bill” and “bull,” but not “ball” or “bell.”

“-” Wildcard

The “-” wildcard, used inside brackets, matches any single character within a range. The range must be given in ascending order (A to Z, not Z to A).

Example:

  • b[a-c]d matches “bad,” “bbd,” and “bcd.”
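
One way to see how the wildcard rules above fit together is to translate a pattern into a regular expression. The sketch below is for illustration only and is not the generator's actual implementation: the names wildcard_to_regex and wildcard_match are hypothetical, and it assumes that a pattern must match the whole string and that matching ignores case (as the “B_ll” example suggests).

```python
import re

def wildcard_to_regex(pattern: str) -> re.Pattern:
    """Translate a %, _, [] wildcard pattern into a compiled regex.

    Assumptions (not confirmed by the generator): whole-string matching
    and case-insensitive comparison.
    """
    parts = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "%":
            parts.append(".*")            # any number of characters
        elif ch == "_":
            parts.append("[A-Za-z]")      # one alphabetic character
        elif ch == "[":
            end = pattern.find("]", i + 1)
            body = pattern[i + 1:end] if end != -1 else ""
            if body in ("", "^"):
                parts.append(re.escape(ch))   # no usable set: literal "["
            else:
                # Sets, "^" negation and "a-c" ranges share regex syntax;
                # only backslashes and "[" need escaping inside the set.
                body = body.replace("\\", "\\\\").replace("[", "\\[")
                parts.append("[" + body + "]")
                i = end                   # skip past the closing bracket
        else:
            parts.append(re.escape(ch))   # ordinary character
        i += 1
    return re.compile("".join(parts), re.IGNORECASE | re.DOTALL)

def wildcard_match(pattern: str, text: str) -> bool:
    """Return True when the whole text matches the wildcard pattern."""
    return wildcard_to_regex(pattern).fullmatch(text) is not None

print(wildcard_match("wh%", "white"))      # True
print(wildcard_match("wh%", "awhile"))     # False
print(wildcard_match("b[^ae]ll", "bull"))  # True
print(wildcard_match("b[a-c]d", "bdd"))    # False
```

A range written in descending order, such as b[c-a]d, makes re.compile raise an error in this sketch, which mirrors the ascending-order rule described above.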

Advanced Filtering Examples

Now that we’ve covered the basics, let’s explore some more advanced filtering examples to demonstrate the versatility of URL filtering expressions.

Complex Patterns

You can combine wildcards and regular expressions to create complex filters. For example:

  • .*blog.* matches any URL containing the word “blog.”
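
To check a pattern like this before adding it as a filter, you can try it against a few sample URLs. The sketch below uses Python's re module as a stand-in; the generator's regular expression dialect may differ, and the URLs are made up for illustration.

```python
import re

blog_filter = re.compile(r".*blog.*")

# Hypothetical sample URLs.
urls = [
    "https://example.com/blog/first-post",
    "https://example.com/about",
    "https://blog.example.com/",
]

# fullmatch requires the pattern to cover the entire URL, which
# the leading and trailing ".*" allow.
matched = [url for url in urls if blog_filter.fullmatch(url)]
print(matched)
# ['https://example.com/blog/first-post', 'https://blog.example.com/']
```

Note that the pattern matches “blog” anywhere in the URL, including the host name.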

Parameter-Based Filtering

You can use expressions to filter URLs based on query parameters. For example:

  • ^.*\?utm_source=facebook$ matches URLs whose query string is exactly “utm_source=facebook.” URLs that carry additional parameters before or after it do not match.
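
Because the pattern above is anchored at both ends, it only matches when “utm_source=facebook” is the entire query string. The sketch below, written with Python's re and urllib.parse modules, contrasts that behavior with a looser check that finds the parameter anywhere in the query string. The helper has_utm_source and the sample URLs are hypothetical.

```python
import re
from urllib.parse import parse_qs, urlsplit

exact = re.compile(r"^.*\?utm_source=facebook$")

def has_utm_source(url: str, value: str = "facebook") -> bool:
    """True when utm_source=value appears anywhere in the query string."""
    query = parse_qs(urlsplit(url).query)
    return value in query.get("utm_source", [])

only_param = "https://example.com/page?utm_source=facebook"
with_extra = "https://example.com/page?utm_source=facebook&utm_medium=social"
not_first = "https://example.com/page?id=1&utm_source=facebook"

print(bool(exact.match(only_param)))   # True
print(bool(exact.match(with_extra)))   # False: text follows "facebook"
print(bool(exact.match(not_first)))    # False: "?" is not directly before utm_source
print(has_utm_source(with_extra))      # True
print(has_utm_source(not_first))       # True
```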

Exclusion Filters

To exclude specific URLs or patterns, use the negation operator “!”. For example:

  • !/private/* excludes all URLs under the “/private/” directory of your site (for example, “example.com/private/”).
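
A rule of this kind can be approximated as follows. This is a minimal sketch under stated assumptions, not the generator's own logic: it assumes “*” behaves like a shell-style glob, that the rule is compared against the path portion of the URL, and that only rules starting with “!” exclude. The function name is_excluded is hypothetical.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlsplit

def is_excluded(url, rules):
    """Return True when any "!" rule matches the URL's path."""
    path = urlsplit(url).path
    for rule in rules:
        # Strip the leading "!" and glob-match the rest against the path.
        if rule.startswith("!") and fnmatchcase(path, rule[1:]):
            return True
    return False

rules = ["!/private/*"]
print(is_excluded("https://example.com/private/report.html", rules))  # True
print(is_excluded("https://example.com/public/index.html", rules))    # False
```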

Case-Insensitive Matching

To perform case-insensitive matching, use the “i” flag. For example:

  • /products/i matches URLs containing “products” in any case (e.g., “Products,” “products,” “PrOdUcTs”).
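
The /pattern/flags notation can be mapped onto Python's re.IGNORECASE flag to try out such a filter. In the sketch below, compile_filter is a hypothetical helper that handles only the “i” flag; how the generator itself parses flags is not specified here.

```python
import re

def compile_filter(expression: str) -> re.Pattern:
    """Compile a /pattern/flags expression; only the "i" flag is handled."""
    if not expression.startswith("/") or expression.count("/") < 2:
        raise ValueError(f"expected /pattern/flags, got {expression!r}")
    # Everything after the last "/" is the flag string.
    body, _, flags = expression[1:].rpartition("/")
    unknown = set(flags) - {"i"}
    if unknown:
        raise ValueError(f"unsupported flags: {sorted(unknown)}")
    return re.compile(body, re.IGNORECASE if "i" in flags else 0)

insensitive = compile_filter("/products/i")
sensitive = compile_filter("/products/")

url = "https://example.com/PrOdUcTs/shoes"
print(bool(insensitive.search(url)))  # True
print(bool(sensitive.search(url)))    # False
```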

With these advanced filtering examples, you can tailor your URL filtering rules to meet your specific needs, ensuring that your sitemap includes only the content that matters most to you.