Using Natural Language Processing (NLP) for keyword research


I’ve seen a lot of articles about keyword research lately. Most of them relate to the changes Google has made over the past years, which data sources you should use, and how to define the set of keywords and keyphrases you are going to target. One of the articles I like was written by Nick Eubanks: “How To Build a Keyword Matrix”, in which he describes how to use a two-dimensional table to determine the opportunities within a specific set of keywords. I still haven’t seen any clear evidence that people use more words per query than they did 12 or 24 months ago, but I expect this to happen given the way search engines are able to interact with their users nowadays.

As you can imagine, the more words a query contains, the easier it is to determine intent, purpose or sentiment, and with that you can predict user behaviour. This is what Google started doing with Hummingbird: making sure it understands users and their actions. To determine possible traffic sources in terms of which keywords to target, it is important to get some insight into user behaviour based on keywords.

Let’s say you have an enormous list of 100,000 keywords for which you want to determine intent. This can be done manually, or by tagging the keywords with a macro based on the presence of specific words. Keyphrases containing the word “buy” are mostly transactional, since the intent is to buy a product. Queries containing the word “details” are informational, since users are still looking for information about specific products. This is a really basic way of classifying your keywords, so I started looking for ways to do this automatically and at scale.
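To make the macro approach concrete, here is a minimal sketch of word-based intent tagging. Only “buy” and “details” come from the paragraph above; the other trigger words and the function names are hypothetical examples you would replace with your own lists.

```python
import re
from collections import Counter

# "buy" and "details" come from the text; "order", "price", "review" and
# "specs" are hypothetical extra triggers, not a recommended list.
INTENT_RULES = {
    "transactional": {"buy", "order", "price"},
    "informational": {"details", "review", "specs"},
}

def tag_intent(keyword):
    """Return the first intent whose trigger words occur in the keyword, else 'unknown'."""
    # Lowercase and keep only alphanumeric tokens, so "details?" still matches "details".
    words = set(re.findall(r"[a-z0-9]+", keyword.lower()))
    for intent, triggers in INTENT_RULES.items():
        if words & triggers:
            return intent
    return "unknown"

def intent_summary(keywords):
    """Count how many keywords fall into each intent bucket."""
    return Counter(tag_intent(kw) for kw in keywords)
```

Because dictionaries keep insertion order, a keyword that contains triggers from both lists (for example “buy iphone details”) is tagged with whichever intent is listed first. That is exactly the kind of blind spot that makes this approach “really basic”.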

Automatic keyword processing

A commercial solution is available at http://www.alchemyapi.com/products/demo/ and they offer a free API you can test once you have signed up. Try their demo to get an idea of how such tools can help you determine relevance and intent based solely on a combination of words:

[buy best iphone]

[cheap iphone]

[Compare iPhones]

Another option is to build your own tool if you can program :) Google has done a lot of research on machine learning, not only for language processing, although that was one of its main focus areas in the early days. It actually built a publicly available solution called Google Prediction API, which anyone can use to build their own machine learning systems for spam detection, language processing and sentiment analysis, to name just a few possibilities. The basics of machine learning are easy to understand: “Machine learning focuses on prediction, based on known properties learned from the training data.” You feed training data into your system, test it on specific datasets and adjust it whenever it does not work properly, so defining the properties you want the system to use is a continuous process.
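To illustrate “prediction based on properties learned from the training data”, here is a minimal sketch of a tiny multinomial naive Bayes classifier in plain Python. This is a swapped-in textbook technique, not what the Prediction API does internally, and the training queries and labels are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())

class NaiveBayesIntent:
    """Multinomial naive Bayes: learns word frequencies per label from labelled queries."""

    def fit(self, examples):
        self.label_counts = Counter()
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in examples:
            self.label_counts[label] += 1
            for word in tokenize(text):
                self.word_counts[label][word] += 1
                self.vocab.add(word)
        return self

    def predict(self, text):
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.label_counts.items():
            score = math.log(count / total)  # log prior of the label
            label_total = sum(self.word_counts[label].values())
            for word in tokenize(text):
                # Add-one smoothing so an unseen word does not zero out a label.
                score += math.log((self.word_counts[label][word] + 1) / (label_total + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label

def accuracy(model, examples):
    """Share of (text, label) pairs the model gets right: the "test and adjust" step."""
    return sum(model.predict(text) == label for text, label in examples) / len(examples)

# Hypothetical training data.
TRAINING = [
    ("buy iphone 5s", "transactional"),
    ("cheap iphone deals", "transactional"),
    ("order iphone online", "transactional"),
    ("iphone 5s details", "informational"),
    ("compare iphone specs", "informational"),
    ("iphone battery review", "informational"),
]
model = NaiveBayesIntent().fit(TRAINING)
```

Measuring `accuracy` on queries that were not in the training set and then adding or relabelling training examples is the continuous loop the paragraph describes.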

First you need to create a Google Developers Console project with both the Google Prediction API and the Google Cloud Storage API activated. Once you have activated them, pick one of the six available client libraries or use Google’s APIs Explorer for Prediction API v1.6. Once everything is installed, follow these steps:

  1. Create a CSV file of your training data.
  2. Create a new project in the Prediction API and make sure you fill in your billing information (100 requests/day are free).
  3. Upload your CSV to Google Cloud Storage.
  4. Go to the Prediction API browser and upload the new training set.
  5. Use trainedmodels.predict to make predictions :)
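Steps 4 and 5 boil down to two HTTP requests. The sketch below only builds the URLs and JSON bodies, without sending them. The base URL and the field names (`id`, `modelType`, `storageDataLocation`, `input.csvInstance`) reflect my understanding of the v1.6 REST reference and should be treated as assumptions to double-check. The project ID, model ID and bucket path are hypothetical, and a real call would also need an OAuth 2.0 access token, which this sketch does not handle.

```python
import json
from urllib.parse import quote

# Assumed v1.6 REST base URL; verify against Google's reference before use.
BASE = "https://www.googleapis.com/prediction/v1.6/projects"

def training_request(project, model_id, storage_location):
    """URL and JSON body asking the API to train a classifier on a CSV in Cloud Storage."""
    url = f"{BASE}/{quote(project)}/trainedmodels"
    body = {
        "id": model_id,
        "modelType": "classification",
        "storageDataLocation": storage_location,  # "bucket/path/file.csv"
    }
    return url, json.dumps(body)

def predict_request(project, model_id, query):
    """URL and JSON body asking a trained model to classify a single query."""
    url = f"{BASE}/{quote(project)}/trainedmodels/{quote(model_id)}/predict"
    body = {"input": {"csvInstance": [query]}}
    return url, json.dumps(body)
```

Keeping request building separate from sending makes it easy to inspect what would be posted before spending any of the free daily requests.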

Alternatively, follow the “Hello World!” tutorial on their website and use labels such as “Positive transactional”, “Negative transactional” and “Neutral informational” for your training data. Make sure you understand the difference between queries like [compare cheapest iphones] and [find best iphones], especially since these queries tell you something about the user’s willingness to spend money, and how much. Based on these kinds of assumptions you can gain more insight into which keywords are really valuable for your website.
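Here is a minimal sketch of writing such a training file, assuming the layout the Prediction API tutorial used: the label in the first column and the quoted query text after it. The three labels come from the paragraph above; which query gets which label is purely illustrative, not a rule.

```python
import csv
import io

# Illustrative rows only: label first, then the query text.
TRAINING_ROWS = [
    ("Positive transactional", "buy best iphone"),
    ("Negative transactional", "cheap iphone"),
    ("Neutral informational", "compare iphones"),
]

def write_training_csv(rows, fileobj):
    """Write (label, query) rows as fully quoted CSV with the label in the first column."""
    writer = csv.writer(fileobj, quoting=csv.QUOTE_ALL)
    for label, query in rows:
        writer.writerow([label, query])
```

To produce the file you upload to Cloud Storage, pass a real file handle, for example `with open("training.csv", "w", newline="", encoding="utf-8") as f: write_training_csv(TRAINING_ROWS, f)`. Quoting every field keeps queries that contain commas intact.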

The biggest challenge is building a decent training dataset. For English you can also start with SentiWordNet, an English-language lexical resource for opinion mining, to test prediction modelling with Google’s API. You will get the best results by investing some time in building a decent training set, which you can reuse for every subsequent project.
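As a starting point with SentiWordNet, here is a minimal sketch that parses its tab-separated format (POS, ID, PosScore, NegScore, SynsetTerms, Gloss, with `#` comment lines) and gives each query an average polarity. Averaging PosScore minus NegScore over a word’s synsets is my own simplifying assumption, and the sample rows below use made-up IDs and scores, not real SentiWordNet values.

```python
from collections import defaultdict

def load_sentiwordnet(lines):
    """Map each lemma to its mean (PosScore - NegScore) over all synsets it appears in."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        parts = line.rstrip("\n").split("\t")
        if len(parts) < 5 or not parts[2]:
            continue  # skip malformed rows without scores
        polarity = float(parts[2]) - float(parts[3])
        for term in parts[4].split():
            # Terms look like "word#sense"; multiword terms use underscores.
            lemma = term.rsplit("#", 1)[0].replace("_", " ")
            totals[lemma] += polarity
            counts[lemma] += 1
    return {lemma: totals[lemma] / counts[lemma] for lemma in totals}

def query_polarity(query, lexicon):
    """Average polarity of the query words found in the lexicon (0.0 if none match)."""
    scores = [lexicon[w] for w in query.lower().split() if w in lexicon]
    return sum(scores) / len(scores) if scores else 0.0

# Made-up sample rows in SentiWordNet's column layout.
SAMPLE = [
    "# POS\tID\tPosScore\tNegScore\tSynsetTerms\tGloss",
    "a\t00000001\t0.75\t0\tbest#1\tillustrative gloss",
    "a\t00000002\t0\t0.5\tcheap#1\tillustrative gloss",
    "a\t00000003\t0.25\t0\tcheap#2\tillustrative gloss",
]
lexicon = load_sentiwordnet(SAMPLE)
```

With the real file you would pass `open(path, encoding="utf-8")` instead of `SAMPLE`. Scores like these can serve as an extra feature or as a rough first pass when labelling your training set.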

What are your tips & tricks for automating keyword research?

How to directly search with Google’s removed filters

In late January, Google removed some of its search filters that you could use to refine your search results by specific categories of content, such as places, patents or discussions. It removed a number of useful filters from the menu, but that doesn’t mean Google shut down these functions completely. These filters can still be used by adding specific parameters to Google’s URLs. Besides that, you can also use direct URLs to start with a specific filter before even searching Google’s databases.
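Building such a filtered URL is a matter of adding a query parameter. The sketch below assumes the `tbm` parameter and the values `pts` (patents), `dsc` (discussions) and `plcs` (places) that Google used around that time. These values are my assumptions rather than a guaranteed interface, and Google may change or drop them at any time.

```python
from urllib.parse import urlencode

# Assumed tbm values from that period; not an official or stable API.
FILTERS = {
    "patents": "pts",
    "discussions": "dsc",
    "places": "plcs",
}

def filtered_search_url(query, filter_name):
    """Build a Google search URL that applies one of the removed category filters."""
    params = {"q": query, "tbm": FILTERS[filter_name]}
    return "https://www.google.com/search?" + urlencode(params)
```

`urlencode` takes care of escaping, so a query like “keyword research” becomes `q=keyword+research` in the URL.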


Conferences: I’ll be speaking at #SMX Munich

Within a few weeks, the first conferences of the spring season will start. One of the conferences I’m really looking forward to is SMX Munich! Not only because it is the first SMX event I will be attending, but also because I will be presenting two sessions on the second day of the event. The overall program looks really diverse and promising, so I can’t wait until the end of March! These are the sessions I will be hosting:

The advantages and risks of using rich snippets for eCommerce websites
This session is sponsored by MajesticSEO, as I will be representing them as their ambassador. There are multiple ways to use eCommerce data as rich snippets on eCommerce websites. The biggest advantage is that you can add extra information to your SERP snippets, which can lead to higher click-through rates. What are the options for your platform? Which integrations will give the best results? Don’t forget that Google needs time for indexing and processing; what risks does this cause? What does Google accept as valid data? Make sure you make the right decision about which snippets to implement and which not to.

The science behind Hummingbird
Last summer Google launched a new engine behind its search engine, labelled Hummingbird. Many articles have been written about it, but not all of them describe the right implications and effects for the user. Tellingly, almost nobody noticed any changes in Google’s results before it publicly announced the new algorithm. By exploring the concepts and patent databases, I will take the audience on a journey through the science behind Google, with a focus on Hummingbird.

If you would like to attend the event too, please use ADVANTAGESMX as a discount code!