To ensure accurate web analytics data in reports and BI tools, a leading publisher selected Google Analytics 360 (GA 360) and its BigQuery integration to replace Adobe Visual Sciences. The publisher’s data warehouse needed raw, unsampled data for more up-to-date reporting. Management also sought user inputs in key reports, to edit filters and date ranges without coding. In particular, reports using new GA 360 Custom Dimensions and Events suffered from sampled data and a tedious spreadsheet workflow, delaying insights on A/B tests and content engagement.
Empirical Path proposed building an automatically updated, robust, user-friendly report template that could be re-used across the organization and accommodate advances in GA 360 data collection. Working closely with stakeholders, Empirical Path reaffirmed business objectives and reviewed the existing report and underlying data sets. We mocked up a Google Sheets report for client approval. Next, we wrote BigQuery SQL code, modified it to accept dynamic user inputs (using the OWOX BI Google Sheets Add-On) and created an automatically refreshable template. We provided a helpful README tab within the Sheet explaining how to implement, modify, and reuse the template. Final delivery of the BigQuery SQL code was made available on a shared GitHub repository.
To populate our client’s warehouse with unsampled data, Empirical Path first defined the BigQuery GA 360 data export requirements, including output schema, filters, dates and frequency. We then helped the client revise a query to generate the required dataset. Once the data was validated, Empirical Path wrote Python scripts—using the BigQuery API to execute BigQuery jobs—which collected unsampled GA 360 data from BigQuery, transformed it, uploaded the file to Google Cloud Storage, and finally notified a Slack channel. Next, we hosted the Python scripts on a Google Cloud Compute Engine Virtual Machine Instance and scheduled a task to prompt automatic, daily refresh of the data. Finally, a synchronized Compute Instance on a GitHub repository provided further code transparency and enabled seamless collaboration on future updates.
The company has reclaimed an extra 20-25 hours per month to spend on more valuable analytics projects. Instead of relying on error-prone manual processes, management can now replicate a new report from our template and deliver automated reporting. Analysts now have direct access to unsampled Google Analytics 360 data in their warehouse alongside existing enterprise datasets. This enables powerful and rapid reporting of joined sources without the burden of learning of new tools or engineering new workflows.