Regular expressions

You can use regular expressions to define Page visits and Multi-step goals, and to segment by conditions that use a URL (such as traffic sources).

The expression is processed according to RE2 syntax and the following rules:

  • The regular expression is applied to the page's full URL, including the protocol and domain. For example, the regular expression ^http:// matches only URLs that use the HTTP protocol.
  • The regular expression is applied twice: first to the original URL, and then to the same URL with the www prefix added or removed. As a result, matching does not depend on whether the domain includes the www prefix.
  • The regular expression is applied to the decoded URL, in which URL escape codes (% sequences) are replaced with the characters they encode. The exception is the codes for /, &, =, ?, and #, which are not decoded (for example, %2F is not replaced with /). The plus sign (+) is decoded as a space. For example, the regular expression text=слон will match, while text=%D1%81%D0%BB%D0%BE%D0%BD (the encoded form of слон) and text=%\w\w will not.
  • Cyrillic domain names are not converted to Punycode. For example, the regular expression ^http://ввв\.сайт\.рф/ will match, but ^http://xn--b1aaa\.xn--80aswg\.xn--p1ai/ will not.
  • Before regular expressions are checked, the characters ?, #, &, and . (dot) are removed from the end of the URL. For example, the URLs http://example.com/?, http://example.com/#, and http://example.com/?var=1& are compared as http://example.com/, http://example.com/, and http://example.com/?var=1, respectively. If the user opens the URL http://example.com./, the regular expression \./$ will not match it.
  • By default, quantifiers match the longest possible string (lazy quantifiers such as *? are the exception).
  • Matching is case-sensitive: letters in the expression must match the case of letters in the URL.

Regular expression syntax

In the table below, a, b, c, d, and e stand for any characters, and n and m are non-negative integers (n ≤ m).

Alternative variants
abc|de Matches one of the variants: abc or de.
Character classes
[abc] or [a-c] Matches any single character from the list (or range).
[^abc] or [^a-c] Matches any single character that is not in the list (or range).
\d Matches a digit. Equivalent to [0-9].
\D Matches a non-digit. Equivalent to [^0-9].
\s Matches a whitespace character. Equivalent to [\t\n\f\r ].
\S Matches any character that is not whitespace. Equivalent to [^\t\n\f\r ].
\pL Matches any Unicode letter.
\w Matches an uppercase or lowercase Latin letter, a digit, or an underscore. Equivalent to [0-9A-Za-z_]. When working with Unicode characters, use the \pL class instead of \w.
\W Matches any character that is not an uppercase or lowercase Latin letter, a digit, or an underscore. Equivalent to [^0-9A-Za-z_]. When working with Unicode characters, use the negated class \PL instead of \W.
Number of occurrences (quantifiers)
a* Matches the character a repeated 0 or more times (the longest possible sequence is selected).
a+ Matches the character a repeated 1 or more times (the longest possible sequence is selected).
a? Matches the character a repeated 0 or 1 times (one occurrence is preferred over none).
a{n,m} Matches the character a repeated at least n and at most m times (the longest possible sequence is selected).
a{n,} Matches the character a repeated at least n times (the longest possible sequence is selected).
a{n} Matches the character a repeated exactly n times.
a*? Matches the character a repeated 0 or more times (the shortest possible sequence is selected).
a+? Matches the character a repeated 1 or more times (the shortest possible sequence is selected).
a?? Matches the character a repeated 0 or 1 times (no occurrence is preferred over one).
a{n,m}? Matches the character a repeated at least n and at most m times (the shortest possible sequence is selected).
a{n,}? Matches the character a repeated at least n times (the shortest possible sequence is selected).
Position within the string
^ Matches the beginning of the string.
$ Matches the end of the string.
\b Matches a word boundary: the position between a word character (\w) and a non-word character (\W), or between a word character and the beginning or end of the string.
\B Matches any position that is not a word boundary. Defined through the classes \w and \W.
Escape sequences
\ A backslash before a special character [ ] \ ^ $ . | ? * + ( ) { } means that the character is interpreted literally rather than as a metacharacter. Example: \$ matches the dollar sign.
\Q...\E All special characters between \Q and \E are interpreted as regular characters.
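To experiment with the table entries, Go's standard regexp package is a convenient stand-in because it accepts RE2 syntax, although it is not guaranteed to behave exactly like Metrica in every detail. The sketch below (the find helper is hypothetical) applies several patterns to plain strings, without the URL preprocessing described earlier. It shows, among other things, that \w does not match Cyrillic letters while \pL does, and how greedy and lazy quantifiers differ.

```go
package main

import (
	"fmt"
	"regexp"
)

// find (hypothetical helper) returns the leftmost match of pattern in input, or "" if none.
func find(pattern, input string) string {
	return regexp.MustCompile(pattern).FindString(input)
}

func main() {
	examples := []struct{ pattern, input string }{
		{`abc|de`, "xdex"},          // alternation: "de"
		{`[a-c]+`, "zzcabz"},        // character range: "cab"
		{`\d{2,3}`, "id=12345"},     // greedy bounded quantifier: "123"
		{`\d{2,3}?`, "id=12345"},    // lazy bounded quantifier: "12"
		{`a.*b`, "a1b2b"},           // greedy: "a1b2b"
		{`a.*?b`, "a1b2b"},          // lazy: "a1b"
		{`^\w+$`, "слон"},           // \w is ASCII-only: no match
		{`^\pL+$`, "слон"},          // \pL covers Unicode letters: "слон"
		{`\bcat\b`, "concatenate"},  // "cat" is not a separate word: no match
		{`\$\d+`, "sum=$25"},        // escaped metacharacter: "$25"
		{`\Q?id=1\E$`, "/page?id=1"}, // \Q...\E literal text: "?id=1"
	}
	for _, e := range examples {
		fmt.Printf("%s on %q -> %q\n", e.pattern, e.input, find(e.pattern, e.input))
	}
}
```

Because find returns the leftmost match, the greedy and lazy pairs start at the same position and differ only in how far they extend.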