Google proudly announced that it will show more detailed search query data in Google Webmaster Tools. Since the introduction of the query dashboard, Google has been showing rounded and summarized data points, which were really useless, especially for low-volume search queries. The only useful insights you could get out of them were the ratios between different types of queries, branded versus non-branded impressions for example. The SEO community reacted very positively, but I think you should always keep in mind that Google never does something just to serve webmasters and SEOs. So did Google really improve the data quality and usability for analysis purposes?
Last Tuesday I already did a quick calculation on some domains, and I saw more people complaining about the erroneous data Google presents in the WMT dashboard. Most online marketers are already familiar with the fact that you can never trust the data Google gives you for free, so I decided to dive into the numbers and try to detect a cause or general pattern in the data discrepancies. I have used Google Analytics for now, but I’m gathering some log file data to set up a more reliable comparison.
Since Google states the data improvements have been applied retroactively, I took the month of December (31 days) as the period. First of all, I had to collect all the data I needed:
- WMT: Top Search queries, per day and the total of December
- WMT: Top pages, per day and the total of December
- Google Analytics: organic search queries from Google. Add the secondary dimension Source / Medium and include a filter containing [google].
- Google Analytics: Landing Pages, originating from organic Google traffic. Add the secondary dimension Source / Medium and include a filter containing [google / organic].
Within Webmaster Tools, make sure you use the right filters depending on which dimensions and filters you use within Google Analytics. The default filter is on “Web”. Make sure to use the “All” setting to get all your data. This can cause big differences if your website has a lot of images.
Update 14-01: I got some questions about how to use these filters through their API. If you have a look at the URLs, you can adjust your code accordingly:
Available variables within Search Query dashboard:
- QM: filter keyword lists based on input. If you want to exclude the keyword, add qme=true
- Prop: choose between different types of Google search engines: All, Image, Mobile, Video or Web
- Region: you can choose a specific region
- More: All queries OR Queries with +10 impressions
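A minimal sketch of how such a URL could be assembled in code. The lowercase parameter names, the value spellings and the base URL are assumptions on my side (the base URL is a placeholder), so compare them with the URL your own browser shows before relying on them:

```python
from urllib.parse import urlencode

# Placeholder, not a real endpoint: copy the actual dashboard URL
# from your browser's address bar.
DASHBOARD_URL = "https://example.invalid/search-queries"


def build_dashboard_url(keyword=None, exclude=False, prop="All",
                        region=None, more=None):
    """Build a dashboard URL from the filter variables listed above.

    Parameter names (qm, qme, prop, region, more) are assumed to be the
    lowercase form of the variables in the list; this is not verified.
    """
    params = {"prop": prop}
    if keyword:
        params["qm"] = keyword
        if exclude:
            # Per the list above: qme=true excludes the keyword.
            params["qme"] = "true"
    if region:
        params["region"] = region
    if more is not None:
        params["more"] = more
    return DASHBOARD_URL + "?" + urlencode(params)
```

Calling `build_dashboard_url(keyword="brand", exclude=True, prop="Web")` would give a URL that filters out queries containing "brand" for web search only, assuming the parameter names above match.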
For my own website I have also compared Analytics with log file data, to check the validity of the Google Analytics data (which should be OK for the smaller accounts). To automate this process, I recommend using the APIs from WMT and Google Analytics. For people using PHP, the following libraries are useful:
- PHP Server-Side Google Analytics Client
- PHP script for automating downloading of data tables from Google Webmaster Tools
I have only used pages that got more than 10 clicks in either the WMT or the GA dataset. The data I used is certainly not absolutely correct, but I just wanted to show that there is no logic within the differences between the data sets. Another factor you have to take into consideration is that WMT does not show all the data if you have a popular website, so make sure you take into account the ratio between shown / not shown pages or queries:
Some aspects to consider when using Google Analytics data:
- I haven’t checked all the profile settings, so some profiles included IP filtering, for example.
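A rough sketch of that filtering and of the shown / not shown ratio, assuming both exports have already been loaded into plain dictionaries of URL to click count. The function names and the data layout are hypothetical, not taken from either API:

```python
def compare_pages(wmt_clicks, ga_visits, min_clicks=10):
    """Compare per-page counts from WMT and GA.

    Keeps a page when it has more than `min_clicks` in either dataset and
    returns {url: (wmt, ga, relative_difference)}. The relative difference
    is (wmt - ga) / ga, or None when GA reports zero visits.
    """
    rows = {}
    for url in set(wmt_clicks) | set(ga_visits):
        wmt = wmt_clicks.get(url, 0)
        ga = ga_visits.get(url, 0)
        if max(wmt, ga) > min_clicks:
            rel = (wmt - ga) / ga if ga else None
            rows[url] = (wmt, ga, rel)
    return rows


def shown_share(listed_clicks, reported_total):
    """Fraction of the reported total that the listed rows account for."""
    if not reported_total:
        return 0.0
    return sum(listed_clicks) / reported_total
```

A `shown_share` well below 1 means a large part of the clicks is not broken down per page or query, so the per-row comparison only covers part of the traffic.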
- Sampled data versus unsampled data, more information at Blastam.com
Webmaster Tools also has some disclaimers:
- “Some processing of our source data might cause your stats to differ from stats listed in other sources (e.g., to eliminate duplicates and visits from robots). However, these changes should not be significant.” by Google
- “To protect user privacy, Google doesn’t aggregate all data. For example, we might not track some queries that are made a very small number of times or those that contain personal or sensitive information.” by Google
Comparison of the datasets
I have also analysed the differences on keyword level, but because of the (not provided) data within Google Analytics, the results vary even more than on page level. I did have a look at the total click volume per month per domain, WMT versus GA data:
Top pages per URL, WMT vs GA:
I have only used pages with more than 9 clicks, so that the percentage differences are not blown out of proportion by very small click counts:
Top pages per day, WMT vs GA.
WMT = blue triangles, GA = orange squares. If there is a structural reason behind the differences, the trend lines should be parallel. As you can see for domain 2, they are definitely not parallel. For domain 3 they are almost identical. These are the numbers for the three domains:
Even considering the fact that most Google Analytics data is sampled for the bigger accounts, the differences between the datasets are really inconsistent, which makes the data quite useless. I have tried to look at the ratios because there are a lot of differences between accounts, and the bigger the accounts are, the bigger the individual data errors will be. I have analysed 25 domains in total, and every account has unique differences between the data presented in Webmaster Tools and the data visible in Google Analytics.
My message to everyone: think about data quality and validity before using Google Webmaster Tools data for your research, predictions or reporting purposes. It is OK to analyse trends and movements, but using individual data points is not recommended.
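The parallel trend line check on the per-day data can also be approximated numerically instead of by eye. The sketch below is only an illustration, assuming you have one list of daily clicks per source. It fits an ordinary least-squares line through each series and compares the slopes; the function names are my own:

```python
def trend_slope(daily_values):
    """Least-squares slope of the values against the day index 0..n-1."""
    n = len(daily_values)
    if n < 2:
        raise ValueError("need at least two days of data")
    mean_x = (n - 1) / 2
    mean_y = sum(daily_values) / n
    cov = sum((x - mean_x) * (y - mean_y)
              for x, y in enumerate(daily_values))
    var = sum((x - mean_x) ** 2 for x in range(n))
    return cov / var


def slope_gap(wmt_daily, ga_daily):
    """Difference between the WMT and GA trend slopes.

    A gap close to zero means the two trend lines are roughly parallel,
    which would point to a structural offset rather than random noise.
    """
    return trend_slope(wmt_daily) - trend_slope(ga_daily)
```

What counts as "close to zero" depends on the traffic volume of the domain, so it makes sense to judge the gap relative to the average daily clicks.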
Reply by Google’s John Mueller
John’s answer starts at 20:02. He pointed out that you should make sure to deselect the “Web” only option within WMT, as explained above. So I think I should e-mail John some examples 🙂