Sampling

When a counter has collected a large amount of data, Yandex Metrica can base its calculations on only part of that data. For example, it can process 1/10 of all sessions and then multiply the results by 10 where necessary.

Selecting this subset of the data is called sampling. Sampling trades some accuracy for faster results.

As a result of sampling, a report might, for example, contain no data on very rarely visited URLs or uncommon keywords.
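The following sketch illustrates both effects described above: scaling a 1/10 sample back up by 10, and a rarely visited URL dropping out of the report. It assumes a simple "every 10th session" selection purely for illustration; how Yandex Metrica actually picks sessions is not stated here, and the URLs and counts are made up.

```python
from collections import Counter

# Hypothetical data: 100 sessions, 99 on "/home" and one on "/rare".
sessions = ["/home"] * 100
sessions[37] = "/rare"

# Assumption for illustration: keep every 10th session (a 1/10 sample).
SAMPLE_RATE = 10
sample = sessions[::SAMPLE_RATE]

# Count page visits in the sample, then multiply by 10 to estimate totals.
estimate = {url: count * SAMPLE_RATE for url, count in Counter(sample).items()}
actual = dict(Counter(sessions))

print("actual:  ", actual)
print("estimate:", estimate)
```

In this run the single "/rare" session is not among the sampled ones, so it is absent from the estimate, while the "/home" total is off by only one session.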

If you want to always use a complete dataset without sampling for your analytics, subscribe to Yandex Metrica Pro.

You can control sampling with the accuracy request parameter, which sets the sample size used for calculations.

This parameter accepts the following named values:

  • low: Returns a fast result based on a limited data sample.
  • medium: Returns a result based on a sample that balances speed and accuracy.
  • high: Returns the most precise value by using the largest data sample. Requests in this mode may take longer to process.
  • full: Uses all data, without sampling.

This parameter can also take a numeric value in the interval (0, 1]:

  • 1: No sampling (corresponds to the full value).
  • 0.1 or 0.01: The share of data used (10% or 1%). Any other value (for example, 0.42) is rounded to the nearest power of 10.

By default, the accuracy parameter is set to medium.
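As a minimal sketch of the rules above, the helper below checks an accuracy value and adds it to a request query string. The function names, the counter number, and the `ids` and `metrics` parameter names are assumptions made for this example and are not taken from this section; only `accuracy` and its allowed values come from the text.

```python
from urllib.parse import urlencode

NAMED_LEVELS = {"low", "medium", "high", "full"}

def validate_accuracy(value="medium"):
    """Return value if it is a named level or a number in (0, 1]."""
    if isinstance(value, str):
        if value not in NAMED_LEVELS:
            raise ValueError(f"unknown accuracy level: {value!r}")
        return value
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        raise TypeError("accuracy must be a string or a number")
    if not 0 < value <= 1:
        raise ValueError("numeric accuracy must be in the interval (0, 1]")
    return value

def build_query(counter_id, metrics, accuracy="medium"):
    """Build a query string; 'ids' and 'metrics' are assumed parameter names."""
    params = {
        "ids": counter_id,
        "metrics": metrics,
        "accuracy": validate_accuracy(accuracy),
    }
    return urlencode(params)

# Hypothetical counter number and metric name.
print(build_query(12345, "ym:s:visits"))
print(build_query(12345, "ym:s:visits", accuracy=0.1))
```

The sketch deliberately passes numeric values through unchanged and leaves any rounding to the service.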

In returned results, the applied sampling is described using the following parameters:

  • sampled: Whether data sampling was performed (true if sampled, false if not).
  • sample_share: The share of data used for calculation (value from 0 to 1).
  • sample_size: Number of rows in the data sample.
  • sample_space: Total number of rows in the source data (without sampling).
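The fields above could be read from a parsed response roughly as follows. This is a sketch only: the JSON fragment and its numbers are invented for the example, the `describe_sampling` helper is hypothetical, and a real response contains other fields as well.

```python
import json

# Hypothetical response fragment containing only the sampling fields.
raw = """
{
  "sampled": true,
  "sample_share": 0.1,
  "sample_size": 12000,
  "sample_space": 120000
}
"""

def describe_sampling(response):
    """Summarize the sampling that was applied to a result."""
    if not response.get("sampled", False):
        return "no sampling: all rows used"
    share = response["sample_share"]   # share of data used, from 0 to 1
    size = response["sample_size"]     # rows in the data sample
    space = response["sample_space"]   # rows in the unsampled source data
    return f"sampled: {size} of {space} rows ({share:.0%})"

print(describe_sampling(json.loads(raw)))
```

Checking `sampled` first lets a caller decide whether to repeat the request with a higher accuracy setting before relying on the figures.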