Sampling occurs in Google Analytics reports that cover a lengthy time frame or a large number of sessions (more than 500k sessions at the property level for GA Free). When that threshold is exceeded, GA performs its calculations on a subset of the data in order to deliver the report to you quickly.
If your data is fairly consistent, or if the sample size is high (80% to 95% of sessions), then sampled data may be trustworthy enough for decision-making, because the subset is a good representation of all of your data. However, if your data set is larger or fluctuates (for example, a retail business whose traffic swings with sales and promotions, a Software-as-a-Service product with heavy daily usage, or a media or remote-learning site serving audio and video content), then sampled data is likely not a fair representation.
Risks of Using Sampled Google Analytics Data
Fundamentally, sampled data can undermine confidence. Analysts have a difficult time explaining data sampling to their leadership teams and helping them understand when it is or isn’t a concern. So when executives see a report based on X% of sessions, doubts arise.
Sampled data can also be an unreliable foundation for business decisions. Evaluating the way you're acquiring traffic, for example, is difficult when you're looking at only a subset of sessions. Likewise, UX decisions, such as how to design and deploy an interstitial to get users to sign up for gated content, may be compromised by considering the experience of just a small group of users. Sampling becomes more of a problem when you're looking at the medium- to long-tail of reports with more granular dimensions, precisely the kind of customized reports analysts rely on to answer specific business questions.
Know When Data Sampling Affects Google Analytics
GA is most likely to deliver sampled data:
- When your site traffic is high, generating a high volume of sessions
- When your query involves a long time frame — perhaps a month or a quarter
- When you add segments or secondary dimensions to the analysis, which can double or triple the calculation load
For example, a major business publisher that we work with generates tons of data because of their traffic volume; as soon as they add a secondary dimension into any report — like analyzing landing pages by traffic source to understand where visitors are coming from — they start seeing sampled data. For even a week’s worth of data, the sample size might be 1% — far too small to enable meaningful, accurate analysis.
GA displays an indicator in the top left corner of each report that tells you whether sampling is happening. A green check mark means the report is based on 100% of sessions; a yellow check mark means the report is based on a sample, and clicking the indicator shows what percentage of sessions it includes. The lower the percentage, the higher the level of sampling and the greater the uncertainty. You can use the drop-down menu in the sampling indicator to favor greater precision over faster response, but that adjustment isn't likely to produce a meaningful change in the sample size.
How to Address Data Sampling
Data analysts have several options to mitigate data sampling in GA Standard:
1) Run your analysis for shorter time frames and without segmentation to avoid sampling.
2) Download unsampled data and use Excel or a business intelligence tool to perform the analysis. A workaround that we've developed for our GA clients involves capturing data on a daily basis and aggregating it over time so that we can visualize it with a product like DOMO or Google's free Data Studio. Alternatively, you can use R or Python packages that break a long date range into many smaller queries, each small enough to stay unsampled, and fold the results together.
3) Upgrade to Google Analytics 360 or another premium tool with higher thresholds before sampling occurs. In GA Standard, sampling kicks in when you try to analyze more than 500k sessions at the property level for a selected date range; in Analytics 360, the threshold is 100M sessions at the view level. GA 360 also gives you access to all of your raw data in Google BigQuery, which you can query or export for offline analysis.
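The "many smaller queries" workaround described above can be sketched in a few lines of Python. This is an illustrative sketch, not tied to any specific GA client library: `fetch_report` is a hypothetical stub standing in for one API request per date range, and a real implementation would issue a Google Analytics Reporting API call there instead.

```python
from datetime import date, timedelta

def daily_ranges(start, end):
    """Yield one (start, end) date range per day from start to end inclusive.

    Querying one day at a time keeps each individual request small,
    which makes it far less likely to trip GA's sampling threshold.
    """
    day = start
    while day <= end:
        yield day, day
        day += timedelta(days=1)

def aggregate(reports):
    """Fold per-day reports (dicts of dimension -> sessions) into one total."""
    totals = {}
    for report in reports:
        for dimension, sessions in report.items():
            totals[dimension] = totals.get(dimension, 0) + sessions
    return totals

def fetch_report(start, end):
    # Hypothetical stub: a real implementation would call the GA
    # Reporting API here with the given date range and return its rows.
    return {"organic": 100, "cpc": 40}

reports = [fetch_report(s, e)
           for s, e in daily_ranges(date(2023, 1, 1), date(2023, 1, 31))]
totals = aggregate(reports)
```

Summing additive metrics like sessions across days is safe; ratio metrics such as bounce rate or pages per session must be recomputed from their summed components rather than averaged.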
Making significant business decisions based on 1% of the data should raise concern. And because sampling gets worse as the date range grows, it discourages you from looking at long-term trends: you can generate monthly reports, but not quarterly and certainly not yearly, which means sampled data reinforces a short-term point of view. Be sure to resolve issues that arise with sampled data so that your leadership team can confidently make decisions based on meaningful insights.