Yahoo on Web Mining and Improving the Quality of Web Sites

by Bill Slawski

A successful web site is one that fulfills the objectives of its owners and meets the expectations of the visitors that it was created to serve.

This is true of ecommerce web sites, news and informational sites, personal web pages, and even search engines. And, it’s a topic that even the search engines are exploring more deeply. A recent patent application from Yahoo tells us that:

The Web has been characterized by its rapid growth, massive usage, and its ability to facilitate business transactions. This has created an increasing interest for improving and optimizing websites to fit better the needs of their visitors. It is more important than ever for a website to be found easily on the Web and for visitors to reach effortlessly the content for which they are searching. Failing to meet these goals can mean the difference between success and failure on the Internet.

User Query Data Mining and Related Techniques, (US Patent Application 20080065631), by Ricardo Alberto Baeza-Yates and Barbara Poblete.

The patent filing discusses how information about the queries people use, collected from search boxes on a site (if it has one) and from search engines bringing people to the site, can reveal how people actually use that site.

The collection of this kind of information is often referred to as Web Mining, and looking closely at the words people use to find information on a site can tell us something about the actual information needs of those visitors.

Search engines have studied searchers’ queries mostly to try to make search engines work better, but looking at the words people use to find a site, and to search within it once they have found it, could help to make the web sites themselves better.

The abstract of Yahoo’s patent filing notes:

Methods and apparatus are described for mining user queries found within the access logs of a website and for relating this information to the website’s overall usage, structure, and content. Such techniques may be used to discover valuable information to improve the quality of the website, allowing the website to become more intuitive and adequate for the needs of its users.

One tool that many site owners use on their pages is an analytics program, though often those are looked at only to see how much traffic is coming to a site, and possibly to determine which words people are using to find it. Analytics programs can play a stronger role in helping site owners improve the experience of people visiting their pages, and the success of their sites.

Web Mining

The Yahoo patent is interesting in that it focuses less on how a search engine works, and more on how the owners of web sites can use the process of Web mining to discover patterns and relations in Web data. Web mining can be broken down into three main areas:

  • Content mining,
  • Structure mining, and
  • Usage mining.

These relate to three kinds of data that can be found on a web site:

  • Content — the information that a web site provides to visitors such as the text and images and possibly video and audio, that people see when they come to a site.
  • Structure data — this is information about how content is organized on a site, such as the links between pages, the organization of information on pages, the organization of the pages of the site itself, and the links to pages outside of the site.
  • Usage data — this information describes how people actually use the site, and may be reflected in the access log files of the server that the site is on, as well as data collected from specific applications on the site, such as people signing up for newsletters or registering with a site and using it in different ways.

Knowing which pages people visit and which pages they don’t can be helpful in figuring out if there are problems with a site. That knowledge can uncover a need to rewrite pages, reorganize links, or make other changes.
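As a rough illustration of that idea, the sketch below compares a list of a site’s pages against the paths that show up in its access log and reports the pages nobody requested. The function name and the sample data are hypothetical and are not taken from the patent filing.

```python
def unvisited_pages(site_pages, requested_paths):
    """Return the site's pages that never appear among the requested paths."""
    return sorted(set(site_pages) - set(requested_paths))


# Hypothetical data: a page inventory and the paths pulled from an access log.
site_pages = ["/", "/admissions.html", "/scholarships.html", "/contact.html"]
requested_paths = ["/", "/admissions.html", "/", "/contact.html"]

print(unvisited_pages(site_pages, requested_paths))
```

A page that turns up in this list may be poorly linked, badly described, or simply not needed.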

Mining User Queries

Understanding query terms used to find a site and to search on the site can help improve the overall quality of a site. Yahoo’s approach would be to create a model to use to understand how people are accessing a site, and navigating through it:

According to specific embodiments of the invention, a model is provided for mining user queries found within the access logs of a website, and for relating this information to the website’s overall usage, structure, and content. The aim of this model is to discover valuable information which may be used to improve the quality of the website, thereby allowing the website to become more intuitive and adequate for the needs of its users.

This model presents a methodology of analysis and classification of different types of queries registered in the usage logs of a website, including both queries submitted by users to the website’s internal search engine and queries from global search engines that lead to documents on the website. As will be shown, these queries provide useful information about topics that interest users visiting the website. In addition, the navigation patterns associated with these queries indicate whether or not the documents in the site satisfied the user’s needs.
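A minimal sketch of the first step, pulling queries out of an access log, might look like the following. It assumes the server writes the common “combined” log format with a referrer field, that search terms arrive in a `q`, `p`, or `query` URL parameter, and that the site lives at the hypothetical host `www.example.edu`. None of these details come from the patent filing.

```python
import re
from urllib.parse import urlparse, parse_qs

# Combined log format: host ident user [time] "request" status bytes "referrer" "agent"
LOG_LINE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] "\S+ (?P<path>\S+)[^"]*" \d{3} \S+ "(?P<referrer>[^"]*)"'
)

# Assumed parameter names; real search engines and site searches vary.
QUERY_PARAMS = ("q", "p", "query")
SITE_HOST = "www.example.edu"  # hypothetical host name of the site being analyzed


def extract_query(line):
    """Return (source, query, landing_path) for a log line, or None if no query is found."""
    match = LOG_LINE.match(line)
    if not match:
        return None
    referrer = urlparse(match.group("referrer"))
    params = parse_qs(referrer.query)
    for name in QUERY_PARAMS:
        if name in params:
            # A referrer on our own host means the site's internal search was used.
            source = "internal" if referrer.netloc == SITE_HOST else "external"
            return source, params[name][0].lower().strip(), match.group("path")
    return None


sample = (
    '1.2.3.4 - - [23/May/2008:10:00:00 +0000] "GET /admissions/test.html HTTP/1.1" '
    '200 5120 "http://search.yahoo.com/search?p=university+admission+test" "Mozilla/5.0"'
)
print(extract_query(sample))
```

Each result pairs a query with the page it led to, which is the raw material for relating queries to a site’s usage, structure, and content.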

Queries uncovered might be related to categories drawn from such things as navigational information found on a site.

Traffic through the site could tell someone using this invention how effective the site was at meeting the information needs of the people using certain queries. It could also provide suggestions for:

  1. The addition of new content
  2. Changes or additions in words found in anchor text in links
  3. New links between related documents
  4. Revisions to links between unrelated documents

Information Scent

Visitors to a site will follow links when the words in those links give them some confidence that the information they are looking for is on the other side (“The Right Trigger Words,” as User Interface Engineering’s Jared Spool calls them). Likewise, when someone searches at a search engine and sees a page title and a snippet of text for a site in the search results, the words used in the title and snippet may persuade them to visit the page. This is true both for results from a search engine and for results from a site’s internal search.

Understanding what kind of information people are looking for with a specific query, and how words are used in search results, on web pages, and in links to other pages, may provide some insight into making those search results, those pages, and that anchor text better.

The patent application describes how pages and the queries used to reach them can be classified based upon how they are typically used by a visitor – from external searches through a search engine, from internal searches through a web site search, or through navigation on the site itself.

It also classifies queries as successful or unsuccessful, based upon things such as whether someone visited a page in response to the display of a search result showing the page, or if they followed other links on pages visited to explore a site in more depth.
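One way to picture the successful/unsuccessful split is the sketch below. It uses a deliberately simple rule that is an assumption for illustration, not the patent’s definition: a search counts as successful if the visitor viewed at least one page after it. The function name and the session format are hypothetical.

```python
from collections import defaultdict


def success_rates(sessions):
    """Map each query to the fraction of its searches followed by a page view.

    sessions: iterable of (query, pages_viewed_after_query) pairs.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for query, pages in sessions:
        totals[query] += 1
        if pages:  # assumed rule: any page view after the search counts as success
            hits[query] += 1
    return {query: hits[query] / totals[query] for query in totals}


# Hypothetical sessions: the query used and the pages viewed afterwards.
sessions = [
    ("university admission test", ["/admissions/test.html", "/admissions/scores.html"]),
    ("university admission test", []),
    ("new student application", ["/apply.html"]),
]
print(success_rates(sessions))
```

Queries with a low rate are the ones worth a closer look, since the site is showing up for them without satisfying the people who use them.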

Seeing how pages are typically reached on a site in response to certain queries, and seeing which queries are successful and unsuccessful in bringing people to information that they want to find can help a site owner make positive changes to a site.

Example

The patent application provides an example using a portal targeted at university students and future applicants.

It explores how effective the site is when searchers use the queries “university admission test” and “new student application”, both on search engines and in the site’s own search. Two initial reports evaluated how effective the site was before any changes were made. Twenty of the top suggestions generated by the model described in this patent application were then incorporated into the site’s content and structure:

The suggested improvements were made mainly to the top pages of the site and included adding Information Scent to link descriptions, adding new relevant links, and suggestions extracted from frequent query patterns, and class A and B queries.

Other improvements included broadening the content on certain topics using class C queries, and adding new content to the site using class D queries. For example the site was improved to include more admission test examples, admission test scores, and more detailed information on scholarships, because these were issues consistently showing up in class C and D queries.

The “class C” queries mentioned are ones where there was very little information available on the pages of the web site. The “class D” queries were ones for which there was no information available on the site.
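To make the class C and class D distinction concrete, here is a small sketch that labels a query by how many pages mention all of its words. The matching rule (plain word containment), the `sparse_limit` threshold, and the “covered” label are assumptions made for illustration. The patent filing’s own classification, including its class A and B queries, is not reproduced here.

```python
def coverage_class(query, pages, sparse_limit=2):
    """Label a query by how much of the site's content mentions all of its words.

    pages: dict mapping a page path to that page's text.
    Returns "D" when no page matches, "C" when only a few do, else "covered".
    """
    terms = query.lower().split()
    matches = sum(
        1 for text in pages.values() if all(term in text.lower() for term in terms)
    )
    if matches == 0:
        return "D"  # no information on the site
    if matches <= sparse_limit:
        return "C"  # very little information on the site
    return "covered"


# Hypothetical site content.
pages = {
    "/admissions.html": "Information about the admission test and how to apply.",
    "/contact.html": "Contact the admissions office.",
}
print(coverage_class("admission test", pages))
print(coverage_class("scholarships", pages))
```

In this toy data, “admission test” lands in class C (one thin page) and “scholarships” in class D (nothing at all), which mirrors the kinds of gaps the example site filled with new content.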

One significant result of these changes was an increase in traffic from external search engines of more than 20%, due to improvements in content and in link text.

Conclusion

It’s interesting that a search engine would apply for a patent that explores how to use data mining to improve the quality, content, and navigation of a web site. It’s difficult to tell what Yahoo might do with the method described in this patent application – whether they will only use it internally, or will offer it to others for a fee, or for free.

Many of the concepts described in this patent application are ones that site owners can presently use to improve how well their site meets their objectives, and the objectives of people visiting their pages.

Understanding the terms that people will try to use to find your pages, and the words and concepts that they expect to see on the pages of your site can make a difference in how successful your site may be.

Using analytics tools to understand how visitors who use certain queries will explore your pages and navigate from one page to another can provide even more value to both searcher and site owner, by pointing out changes that can be made to improve the experience of those visitors.

And those changes may just lead to more visits from search engines.

Google Says Users Won’t be able to Tell Paid Ads from Natural

by Jim Gilbert

Scott Morrison of Dow Jones Newswires reports a top Google executive (Tim Armstrong, Google’s North American president for advertising and commerce) as saying:

“Speaking at the Bear Stearns Media Conference in Palm Beach, Fla., Armstrong said Google’s advertising platform will evolve over time so that it won’t distinguish between search and display ads.”

Anyone care to comment on what the heck that means?

Google’s Automatic Match – More Greedy than Expanded Broad Match

by Jim Gilbert

WELL… YOU HEARD IT HERE FIRST and WE WARNED YOU!
Quote from an Official Google email dated 23May2008:

“The feature will be enabled by default, although it won’t begin to affect your accounts until June 3, 2008.”

UPDATE! 23May2008 — Automatic Match to be the DEFAULT! see the full update at: Automatic Match to be Default

Google Automatic Match Beta

So far this “Automatic Match” option is only a beta and accessible by invitation only. BUT, if this monster goes live and removes our ability to “opt out” (like in Expanded Broad Match), something very, very ugly may happen:

  •   No matter how large your budgets, they WILL be spent — every penny (and dollar)!

I was going to let you read the Beta help file, but it disappeared… IT’S BACK, but you will have to be logged into an AdWords account to get to it: Automatic Match

Summarizing:

Just build your campaigns and they will come. Heck, you no longer even have to offer any keywords — Google will look at your ad and your site and make sure your ads show for any search query that even “smells” relevant.

 

Did Google’s revenue drop in January scare them that badly?

Hanes ‘Wedgie Free’ Campaign Misses Out on Online Marketing

By Li Evans

Madison Avenue advertising agencies may be good at TV commercials, and highly paid PR firms may know how to write a press release, but when it comes to translating that to an online medium (i.e. the internet), the majority of them have a lot to learn. I came across a post on AdFreak about Hanes’ new ad campaign for their new product, “Wedgie Free” underwear, which features actress Sarah Chalke of Scrubs fame. The commercials really hit the mark by capturing Sarah’s comedic timing and her all-around good looks. They can appeal to women by getting them to think, “Wow, she gets wedgies too?” Yes, I know, kind of corny, but all of us have been in that situation at least once in our lives.

While the commercials are catchy, and even premiered on American Idol (trying to capture that ‘young adult female’ demographic), I stopped and wondered how this was translating online. To any online marketer, it’s probably not a surprise that it hasn’t translated yet. If you’re a major online brand, maybe even Hanes, you are probably wondering “what is she talking about?” Well, let’s take a closer look at this.

Hanes’ PR people sent out a press release. It’s nice, contains images of Sarah Chalke from the commercials, and also includes the ability to play the videos on PR Newswire. Great! Hanes’ PR company has at least managed to figure out how to get the videos and images into the press release, but that’s where it seems to have stopped. The press release isn’t optimized for search, at least not the way normal people search, especially if the aim is “Wedgie Free” or “Wedgies”. I’m sorry, but not many women refer to their underwear crawling up their backsides as “no ride up”; it’s a “wedgie”, plain and simple. That may work in a commercial, but that’s not how people search.

When they launched this campaign, they probably didn’t even stop to think about an online strategy. I’m pretty certain it was more of an afterthought. Why? Well, because if you look at the search results, you’ll see that Hanes’ website doesn’t rank for the main phrase “Wedgie Free”, nor for “Wedgie Free” Hanes. They could own this term but they don’t, and they are missing out, especially with their PR people contacting blogs like AdFreak.

[Screenshot: Google search results for “Wedgie Free”]

[Screenshot: Google search results for “Wedgie Free Hanes”]

[Screenshot: Google Blog Search results for “Wedgie Free Hanes”]

You can see the results (in both regular search and blog search) brought back are minimal, and probably, until this point, not a lot of searches were conducted on “wedgie free”. However, if you launch a campaign on American Idol touting “Wedgie Free” underwear, what do you think will happen? Hello! The audience of American Idol is the demographic that uses the internet the most, and they are going to go online and search for videos, images and information on “Wedgie Free”. With as little competition as there is for the key phrases around this campaign, Hanes could have really hit the mark online without a lot of effort. Instead, their press release on PR Newswire gets the search results, as does AdFreak, which points to the PR Newswire and Wall Street Journal pieces, not even to the Hanes website.

Multimedia-wise, Hanes is sorely missing out too. They could really capitalize on this campaign if they only took the time to contact an online marketing agency to help them “get more bang for the buck” when it comes to their online efforts. In taking a closer look, I’ll show you some examples of where they are really missing out. First we’ll look at images and then go to video.

A search in Google Image Search for “Hanes Wedgie Free” shows the screen capture below. I also did a search on “Sarah Chalke”. Granted, the search for Sarah might be a bit more competitive, but had Hanes optimized the images on their site and in their press release for these terms, they could be capturing another segment of search, and it’s quite possible those images would show up in “blended” search results in the search engines (where images appear within the regular search results).

[Screenshot: Google Image Search results for “Hanes Wedgie Free”]

[Screenshot: Google Image Search results for “Sarah Chalke”]

Now let’s go to video. Here’s another chance for Hanes to get “blended” search results appearing for the phrases people are undoubtedly looking for after the commercials aired on American Idol, but again they are missing out. Google now incorporates relevant YouTube videos into its search results, and Yahoo incorporates videos from Yahoo, YouTube, Metacafe and a few others. Hanes is really missing out here!

Hanes doesn’t have a YouTube channel (as of this writing!), and they don’t have any of their videos or commercials out there. Instead, other users on YouTube do. This actually does say a lot for their commercials: they are clever and witty, and Cuba Gooding, Jr. is just hilarious in those commercials with Michael Jordan. People really LIKE them. It’s too bad Hanes isn’t taking advantage of this. People would subscribe to the channel, and it could be another way to disseminate their message quickly and easily. Instead, with these new commercials, only one video is out there, and it was put up by a division of a PR company. Plus, the video isn’t even optimized for what it should be; it just has that “PR spin” in the description.

[Screenshot: YouTube search results for “Hanes”]

[Screenshot: YouTube search results for “Hanes Wedgie Free”]

[Screenshot: YouTube search results for “Hanes Wedgie”]

[Screenshot: YouTube video of the Hanes Wedgie Free commercial with Sarah Chalke, uploaded in another company’s account]

The point here is that this ad campaign is clever, it hits its mark in speaking to its target audience, and it’s got a likable spokesperson, but wow, is it missing out on taking this to the next level. The video of how the commercials were made that’s included in the press release on PR Newswire is great, but it’s only included there. Why they haven’t put together an online marketing strategy to take advantage of this is really befuddling!

Hanes, if you’re listening, at least get your own YouTube channel! (That’s a little free advice!) 🙂