The simplest explanation is that Google Correlate lets you see which search terms rise and fall in popularity at the same time as one another.
If you’re running a seasonal campaign for Wellington Boots, you’d want to know which other searches follow a similar pattern to your major keywords. Similarly, if you sell a product and its accessories, you might want to see how closely the two sets of searches are correlated and how much lag there is between them. Knowing the delay between searches for a printer and searches for cartridges for that printer would be PPC gold.
Google developed this as a result of the flu outbreak a couple of years ago. They realized that simple flu-related searches (“flu symptoms”, “feverish child”, etc.) were very closely correlated with actual flu cases. This information meant that Google could provide easily accessible data about where flu cases were (probably) worst to help planners deliver medications, etc.
But the system they used to set this up had the potential to be more flexible. What Google has produced is a system that can compare any two sets of searches and check the correlation. Even more powerfully, it can take non-search patterns and perform data mining to look for the best-correlated searches.
But Correlation ≠ Causation
I have two major warnings here. The first is from Statistics 101: correlation is not causation.
Creative Commons Licensed Image courtesy of xkcd.com
The fact that two terms are strongly correlated doesn’t mean that they are related. At all. Google push this point very hard in their comic, which they are using to explain the concept.
The second problem is really just an aspect that exacerbates the first: data mining is bad statistics.
Data mining is a blanket term for a technique that became popular in the field of econometrics (stats-driven economics) in the 70s. Around this time, computers were just starting to be used for economic calculations, and much larger models could be built.
Then some bright spark suggested: “Why start with the model? We have so much data available, why not just test every possible model and find the best fitting one?”
This ugly piece of non-scientific theory still rears its head today. There is a problem. A major one.
The scientific method is simple and it works well. Hypothesize, test, reject (or not).
Data mining breaks away from that process and removes the hypothesis stage. When a statistical test is performed you get a confidence level for how well the model describes reality. You might say: “We have 95% confidence that this model is true; there is a 5% chance this data pattern is just coincidence.”
Crucially, if you run 100 such tests on unrelated data, around 5 of them will come back as “true” even though they aren’t. The coincidences might not just be hiding somewhere in your results; they might even be the strongest-looking ones.
So if you say “show me the top keywords correlated with ‘boots’”, Google will give you a list of tens, even hundreds, of keywords with high correlations. Only some of these are really related, though. The more searches Google compare against, the greater the chance of picking up high correlations for completely unrelated terms. And with as much data as Google have, it’s almost inevitable that you’ll end up seeing lots of unrelated terms ranked even higher than the good ones!
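To see how easily this bites, here’s a quick sketch of the problem in Python, with entirely made-up data: one genuine seasonal pattern (think of it as normalised volume for “boots”) scanned against a thousand candidate series that are pure noise. Even at a 95% confidence level, roughly 5% of the noise still looks “significant”.

```python
import random
import statistics
from math import sin, pi

random.seed(0)
n_points = 100       # e.g. roughly two years of weekly data
n_candidates = 1000  # "keywords" to scan against

# One genuine seasonal pattern
target = [sin(4 * pi * i / n_points) for i in range(n_points)]

def pearson(xs, ys):
    """Pearson correlation: average product of the normalised series."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sx, sy = statistics.pstdev(xs), statistics.pstdev(ys)
    return statistics.fmean((x - mx) * (y - my) for x, y in zip(xs, ys)) / (sx * sy)

# Candidate series that are pure noise: no real relationship at all
hits = 0
for _ in range(n_candidates):
    noise = [random.gauss(0, 1) for _ in range(n_points)]
    # ~0.197 is the two-tailed 5% significance cutoff for 100 points
    if abs(pearson(noise, target)) > 0.197:
        hits += 1

# Typically around 50 of the 1000 pure-noise series pass the test,
# and every one of them is a coincidence
print(hits, "of", n_candidates, "pure-noise series pass the 5% test")
```

Scale that up to the billions of query patterns Google can compare against and you can see why a long list of “top correlations” is guaranteed to contain impressive-looking junk.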
So whatever tests you perform using this tool, you need to use human common sense when you look at the results. If some terms look unrelated, they probably are. Be very ruthless about cutting results out of what you’re presented with before you consider using them.
Testing by Search Term
This is the first and most obvious use of this tool. Choose a search term and see what else is correlated. Give yourself an idea of related products, accessories or keyword variations to potentially add to your campaign (or use as a crucial negative!).
The tool will let you select one of the terms from the list and show you the time series of that and the original together over the last few years.
If you want to see shorter cycles you can then select a section to zoom in.
Bear in mind that these figures are correlations rather than absolute numbers. If term A increases by 100% and term B increases by 100% then they will be perfectly correlated, even if term A has three times the volume. The numbers are normalised such that the average search volume for each term is marked as 0 and each unit is one standard deviation.
For those who need reminding: a standard deviation is the “average” amount by which points in a data set differ from the mean. So if you spent 50% of the time at 1 and 50% of the time at 2, your mean would be 1.5. But you never actually spent any time at 1.5, so the mean alone would be misleading. The standard deviation in this case would be 0.5, showing that you were on average 0.5 away from the mean. This gives you an idea of how spread out a dataset is.
In this case, each unit being one standard deviation means that although the average sits at 0, every unit the graph moves up the axis represents one extra “usual” amount of fluctuation.
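A couple of lines of Python (my sketch, using the population standard deviation that the 50/50 example implies) reproduce that worked example and show the normalisation step:

```python
import statistics

data = [1, 1, 2, 2]  # half the time at 1, half the time at 2

mean = statistics.fmean(data)   # 1.5
sd = statistics.pstdev(data)    # population standard deviation: 0.5

# Normalising expresses each point in "standard deviations from the
# mean", which is the unit on Google Correlate's y-axis
z_scores = [(x - mean) / sd for x in data]

print(mean, sd, z_scores)  # 1.5 0.5 [-1.0, -1.0, 1.0, 1.0]
```

After normalising, every value is exactly one “usual” fluctuation away from the average, which is what the ±1 units on the graph mean.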
Testing by Pattern
Let’s say you have something that isn’t a Google search term that you want to test against. If you have a theory that searches for sunblock vary a lot with the temperature, why not test that?
Input into Google Correlate the daily temperature for the last few years (several weather monitoring organisations will give you this data). What you have done is given Google a pattern. Google will normalise this data (subtract enough to make the mean 0 and divide enough to make the standard deviation 1) so that they’re only looking at patterns not at overall numbers, then compare this pattern against the patterns of searches.
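As a rough sketch of what happens to an uploaded pattern, here’s that normalise-then-compare step in Python. The monthly figures are invented for illustration: one series of temperatures, one series of search volumes for some summery term.

```python
import statistics

# Hypothetical monthly series: temperatures (°C) and search volume
# for a summer-related term over the same twelve months
temps    = [4, 5, 8, 11, 15, 18, 21, 20, 17, 12, 8, 5]
searches = [10, 12, 20, 35, 60, 80, 95, 90, 70, 40, 22, 14]

def normalise(xs):
    """Subtract the mean, divide by the standard deviation."""
    mean = statistics.fmean(xs)
    sd = statistics.pstdev(xs)
    return [(x - mean) / sd for x in xs]

# Once both series are normalised, Pearson's r is just the average
# of the pairwise products -- only the shapes matter, not the scales
a, b = normalise(temps), normalise(searches)
r = statistics.fmean(x * y for x, y in zip(a, b))

print(round(r, 3))  # close to 1: the two shapes track each other
```

Because both series are reduced to mean 0 and standard deviation 1 before comparing, a search term with a tenth of the volume can still score a near-perfect correlation if its pattern matches.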
I uploaded a dataset of all the monthly temperatures in London to see what the correlated terms were. I expected to see searches like “sunblock”, “sandals”, “beach holidays” and other sunshine-related terms. Unfortunately, it was at this point that it became clear that only US search data is currently available within Google Correlate:
High temperatures in London seem to correlate very closely with golf and cycling interest in the states (London’s top temperatures come in July and August most years).
Finding Negative Correlations
One of the most interesting types of correlation is the negative correlation. A negative correlation doesn’t imply a lack of relationship; it implies an opposing one. When A goes up, B goes down, and vice versa.
This is of crucial importance in financial circles: to help keep your portfolio safe you can choose investments that move counter to each other. If the interest rate rises, demand for housing goes down (while the currency typically strengthens), so by including currency and property in the same portfolio you’re reducing your susceptibility to economic cycles.
In a PPC sense you’d want to see what kind of searches are strongly negatively correlated with each other. If search term A increases, which terms are decreasing at the same time?
Google won’t let you sort correlation from low to high, so it’s time for another plan. You need to upload your own data again, this time in the form of an opposite trend to your normal search query patterns.
Download your impression data for a keyword of interest. Subtract the average impressions from each day’s data. Then divide that data by the standard deviation (Microsoft Excel has formulae to help you with this). Then simply add a minus sign in front. All you’ve done here is normalised the data yourself and reversed the trend.
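If you’d rather script it than fight Excel, the whole transformation is only a few lines of Python. The impression counts below are made up; substitute your own export.

```python
import statistics

# Hypothetical daily impression counts for a keyword of interest
impressions = [120, 95, 80, 60, 55, 70, 110, 140, 150, 130]

mean = statistics.fmean(impressions)
sd = statistics.pstdev(impressions)

# Normalise (mean 0, standard deviation 1), then flip the sign:
# peaks become troughs and vice versa
inverted = [-(x - mean) / sd for x in impressions]

print([round(v, 2) for v in inverted])
```

The inverted series has exactly the same spread as the original, just upside down, so anything Correlate matches strongly to it is moving against your keyword.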
Once that data is uploaded you’ll be able to see other keywords that match the opposite trend to your real keyword!
Conclusion
I’m really excited about this tool and the potential it has. I think this can in some ways be better than Google Insights or the Keyword Tool.
The only thing really lacking right now is the ability to enter two keywords and check the correlation between them. But it’s a step in the right direction for managing PPC campaigns and discovering new keyword ideas.