SEM ClubHouse, a Key Relevance blog
5:26 pm - March 10th, 2008
by Jim Gilbert
Scott Morrison of Dow Jones Newswires reports a top Google executive (Tim Armstrong, Google’s North American president for advertising and commerce) as saying:
“Speaking at the Bear Stearns Media Conference in Palm Beach, Fla., Armstrong said Google’s advertising platform will evolve over time so that it won’t distinguish between search and display ads.”
Anyone care to comment on what the heck that means?
2:33 pm - February 29th, 2008
By Bill Slawski
There are 3 major parts to what a search engine does.
The first is crawling the Web to revisit old pages and find new pages. The second is taking those crawled pages, and extracting data from them to index. The third part is presenting pages to searchers in response to search queries.
There’s been some interesting research published recently on the first of those parts.
Crawling Challenges
Crawling the Web to discover new pages, and identify changes in pages that a search engine already knows about can be a challenge for a search engine. The major issues that search engines face in crawling sites involve:
- How many pages they can crawl without becoming bogged down,
- How quickly they can crawl pages without overwhelming the sites that they visit, and
- How many resources they have to use to crawl and then revisit pages.
A search engine needs to be careful about how it spends its time crawling web pages, and about which pages it chooses to crawl, to keep these issues under control.
A recently published academic paper describes this important aspect of how a search engine works, the Web crawl, in more detail than most papers that have been published on the subject before.
Enter IRLbot at Texas A&M
The Department of Computer Science at Texas A&M University has been running a long-term research project known as IRLbot which “investigates algorithms for mapping the topology of the Internet and discovering the various parts of the web.”
In April, researchers from the school will be presenting some of their recent research in Beijing, China, at the 17th International World Wide Web Conference (WWW2008).
The title of their presentation is IRLbot: Scaling to 6 Billion Pages and Beyond (pdf), and the focus of the paper is this primary function that a search engine performs – crawling the Web and finding new Web pages.
Their research describes some interesting approaches to finding new pages on the Web, handling web sites with millions of pages, while also avoiding spam pages and infinite loops that could pose problems to web crawlers.
In a recent experiment that lasted 41 days, their crawler “IRLbot” ran on a single server and “successfully crawled 6.3 billion web pages at an average download time of approximately 1,789 pages per second.” This is a pretty impressive feat, and it’s even more impressive because of some of the obstacles faced while finding those pages.
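As a quick sanity check on those numbers, here is a small Python calculation. It uses only the two figures quoted above (6.3 billion pages and 41 days); the variable names are mine.

```python
# Figures quoted above from the description of the crawl.
pages_crawled = 6.3e9
crawl_days = 41

# Convert the length of the crawl to seconds, then divide.
seconds = crawl_days * 24 * 60 * 60
pages_per_second = pages_crawled / seconds

print(f"{pages_per_second:,.0f} pages per second")
```

The result comes out a little under 1,800 pages per second, which is in line with the quoted rate. The small gap is presumably because 6.3 billion and 41 days are themselves rounded figures.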
Problems Facing Crawling Programs
Politeness
One challenge that faces Web crawling programs is that those programs shouldn’t ask for too many pages from the same site at a time, or they could use up too many resources of the site and make the site inoperable. Keeping that from happening is known as politeness, and search crawlers that aren’t polite often find themselves blocked by site owners, or complained about to the internet service provider hosting the crawler.
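To make the idea of politeness concrete, here is a minimal sketch in Python of a per-host delay. It is not the scheme from the paper; the class name, method names, and the 10 second delay are assumptions made up for illustration.

```python
from urllib.parse import urlsplit


class PolitenessGate:
    """Tracks when each host may next be contacted (illustrative only)."""

    def __init__(self, min_delay_seconds=10.0):
        # The delay is an assumed value, not one taken from the paper.
        self.min_delay = min_delay_seconds
        self.next_allowed = {}  # host -> earliest time of the next request

    def wait_time(self, url, now):
        """Return how many seconds to wait before fetching url."""
        host = urlsplit(url).hostname
        return max(0.0, self.next_allowed.get(host, 0.0) - now)

    def record_fetch(self, url, now):
        """Note that url was fetched at time `now` (in seconds)."""
        host = urlsplit(url).hostname
        self.next_allowed[host] = now + self.min_delay


gate = PolitenessGate(min_delay_seconds=10.0)
gate.record_fetch("http://example.com/a", now=100.0)
print(gate.wait_time("http://example.com/b", now=103.0))  # same host: must wait
print(gate.wait_time("http://example.org/", now=103.0))   # different host: no wait
```

A crawler built this way would fetch from other hosts while it waits, so one slow or small site does not hold up the whole crawl.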
URL Management
As a crawling program indexes a site, it needs to pay attention to a file on the site known as a robots.txt file, which provides directions on pages and directories that the crawling program shouldn’t visit, so that it doesn’t crawl pages that it isn’t supposed to see. The program also needs to track which pages it has seen, so that it doesn’t try to crawl the same pages over and over again.
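Both checks can be sketched in a few lines of Python using the standard library's robots.txt parser. The `Frontier` class, the `ExampleBot` user agent, and the sample rules are hypothetical, and the in-memory set used here is a stand-in: a crawl of billions of pages would need something far more scalable to remember what it has seen.

```python
from urllib.parse import urldefrag
from urllib.robotparser import RobotFileParser


def make_robots(robots_txt_text):
    """Build a parser from the text of a robots.txt file."""
    parser = RobotFileParser()
    parser.parse(robots_txt_text.splitlines())
    return parser


class Frontier:
    """Decides whether a URL is allowed and has not been seen before."""

    def __init__(self, robots, user_agent="ExampleBot"):
        self.robots = robots
        self.user_agent = user_agent
        self.seen = set()

    def should_crawl(self, url):
        # Drop any #fragment so the same page is not counted twice.
        url, _fragment = urldefrag(url)
        if url in self.seen:
            return False
        if not self.robots.can_fetch(self.user_agent, url):
            return False
        self.seen.add(url)
        return True


robots = make_robots("User-agent: *\nDisallow: /private/\n")
frontier = Frontier(robots)
print(frontier.should_crawl("http://example.com/page"))       # allowed and new
print(frontier.should_crawl("http://example.com/page#top"))   # already seen
print(frontier.should_crawl("http://example.com/private/x"))  # blocked by robots.txt
```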
Avoiding Spam While Crawling
The crawling process described in the paper also tried to limit the crawling program from accessing pages that might more likely be spam pages. If a crawling program spends a lot of its time on spam pages and link farms, it has less time to spend on sites that may be more valuable to people searching for good results to their queries at search engines.
One key to how this research team decided how much attention a site should get from their web crawler was the number of legitimate links pointing to the site from other sites, which they refer to as domain reputation.
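Here is a rough sketch in Python of how a link count like that could be turned into a crawl budget. This is my own simplification, not the formula from the paper: the function names and the `base`, `per_link`, and `cap` values are invented, it compares hostnames rather than registered domains, and it makes no attempt to judge whether a link is legitimate.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def domain_in_degree(links):
    """Count, for each host, how many distinct other hosts link to it."""
    sources = defaultdict(set)
    for src_url, dst_url in links:
        src = urlsplit(src_url).hostname
        dst = urlsplit(dst_url).hostname
        # Links from a site to itself say nothing about its reputation.
        if src and dst and src != dst:
            sources[dst].add(src)
    return {host: len(srcs) for host, srcs in sources.items()}


def page_budget(in_degree, base=10, per_link=5, cap=1000):
    """Allow more pages for hosts that more other hosts link to (assumed values)."""
    return min(cap, base + per_link * in_degree)


links = [
    ("http://a.com/1", "http://c.com/"),
    ("http://a.com/2", "http://c.com/x"),  # second link from a.com counts once
    ("http://b.com/", "http://c.com/"),
    ("http://c.com/", "http://c.com/y"),   # self-link is ignored
]
degrees = domain_in_degree(links)
print(degrees)
print(page_budget(degrees.get("c.com", 0)))
```

Counting distinct linking hosts, rather than raw links, means a link farm cannot raise its own budget just by linking to itself thousands of times.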
Why This Paper is Important
The authors of the paper tell us that there are:
…only a limited number of papers describing detailed web-crawler algorithms and offering their experimental performance.
The paper provides a lot of details on the crawling process and the steps that the Texas A&M researchers took that enabled them to index multi-million page web sites, avoid spam pages, and remain “polite” while doing so. It explores the experiments that they conducted to test out ideas on how to handle very large sites, and crawls of very large numbers of pages.
They conducted their experiment using only a single computer. The major commercial search engines have considerably more resources to spend on crawling the web, but the issues involving managing which pages they choose to index, being polite to sites that they visit, and avoiding spam pages are problems that commercial search engines face too.
Learning about how search engines may crawl pages can help us understand how a search engine might treat individual sites during that process. If you are interested in learning about the web crawling process in depth, this paper is a good one to spend some time reading.
4:10 pm - January 31st, 2008
By Bill Slawski
If you run a business, and own a web site, it’s not a bad idea to include the address of your site on your invoices, your business cards, within the letterhead of your stationery, and other paperwork that comes out of your office. You may even want to include that URL on shipping boxes, on your business sign, and in other places where the address might be visible.
Every few months, I like to take a walk through the small town I live in, with a pen and notepad in hand, and look for web addresses in places that I haven’t seen them before. On a normal day, I don’t think that I pay too much attention to how the Web and the world interact on a stroll through town, but I see some surprises when I start looking more closely.
My town is a University town, and most of the students are away on winter break, which made this morning quieter than it is when school is in session.
I start searching for URLs as soon as I get out of my front door, and the first one that I see is in a nearby parking lot. There’s a Marine recruiting station close by, and a number of recruiters’ cars in the lot. Several of them have the Web address “marines.com” and “1-800-marines” written across their sides and backs.
As I walk past them, I decide to stop for a cup of coffee at one of the local coffee shops. Next to the credit card logos on the door of the shop is a small sign advertising a University meal plan. Students can pay for a card which they can use to buy food at different eating establishments in town, and these signs let them know which ones accept that meal card. It also acts as an advertisement for students, and the URL is shown so that they can find out more about the program.
I grab the local paper while I’m getting my coffee, and start looking through it for Web addresses. A front page banner ad, below the fold, looks more like it was designed for a web portal than news print. Appropriately, it advertises a web site.
Turning through the pages of the newspaper, I’m starting to see ads that don’t carry a street address or a phone number – just a URL. I wonder how many of them are actually local businesses, and how many are located somewhere else. The advertisements are for items that could be anywhere in the world.
I finish looking through the paper, and leave the coffee house for Main Street, when a bus passes by. I expect a URL on the bus, and don’t see one. I’ve seen their schedules online, so I’m surprised that they don’t include their web address next to their name.
A sticker from a local band, pasted on a utility pole catches my attention, and it provides a URL for their MySpace page. Another sticker, sloppily attached to a mail pickup box a little further down the street is for the state National Guard, and shows their toll free phone number, but not a web address.
A sign at the post office provides a list of dates that the office will be closed, but tells us that “We’re always open at usps.com.” I’ve been wondering why they didn’t choose the name “mail.gov.”
A paper company truck is stopped on Main Street, to make a delivery, and the side of their truck is a billboard for their goods. Under the sentence where they tell us that they’ve been around since 1919 is the URL for their business.
As I return home, I notice that I’ve received my mail, and on the back of one of the envelopes, I see a message that I can pay my bills online, along with a URL. I’m not sure if I’ve seen an envelope with both a web address and a call to action like that before.
I think I’ve seen more URLs on this walk than I’ve seen in previous trips through town. There are a few on business signs, and on posters in store windows, and in notices posted on the community bulletin board. Next time I try this, I’m going to have to take my phone with me, and see how well those show up on a screen for handhelds.
8:33 am - November 28th, 2007
by Mike Churchill
On 27 November, 2007, Google released an update to Google Maps: they are now including terrain as an optional view. This is especially cool for high-relief areas (mountains, hiking trails, and the like). For example, here are three views of the Grand Canyon:

The ‘Map’ view is pretty boring, and other than showing the size of the National Park and its boundaries, does little to convey the grandeur of the location.

The ‘Satellite’ view gives a better overview, but rather than helping, the various colors of the real terrain create a confusing image.

The new ‘terrain’ view gives the best impression of the feel of the location: the deep rift is clearly visible and the correlation between the valley and the park boundaries is clear. In addition, there are labels identifying certain landmarks. In city locales, the terrain view shows large buildings as well.
This new feature comes at a cost, however: while the terrain view is new, the ‘hybrid’ view which displayed the satellite imagery with the roads overlaid is now a sub-option under the ‘satellite’ view. Choose the ‘satellite’ view, and a “Show Labels” checkbox becomes available when hovering over the satellite button. Selecting the checkbox will generate the hybrid view. The hybrid view shows vegetation and other non-geological features, so the two views offer complementary insight into certain areas.
8:18 pm - October 18th, 2007
By Mike Churchill
I noticed something interesting tonight as I went to set up a new PPC campaign in Google AdWords. As I went to opt out of the Content Network, I got the following message:

I am guessing that this means that Google has been having some problems with increased opt-outs from the Content Network, and is taking steps to try to stem the loss. Not a good thing; not a bad thing; just an interesting thing that I thought I would share in case you had not seen it yet.
12:01 pm - October 15th, 2007
By Bill Slawski
This is my first post at the SEM Clubhouse, and it is a pleasure joining the team at KeyRelevance.
I’ve been reading patents from search engines for a few years to see what can be learned from them. A number of patent filings usually come out each week from the major search engines, and they often provide some insights into how search engines work.
This past week was no exception, and one of the filings that caught my eye was about the snippets that are shown on the search engine results pages that you see after performing a search.
Since this is my first appearance here, I also want to provide a brief introduction into why I like to look at patent filings from the search engines.
(more…)
11:39 am - August 31st, 2007
By Christine Churchill
I have some exciting news. Two people who I greatly admire are joining Jim, Mike, and the rest of the KeyRelevance team. The soft spoken and intellectual Bill Slawski and the energetic Li Evans are now on board and I’m extremely happy about it.
The travel and client schedule of the last summer convinced me it was time to make some changes. I like having a small company, but the workload was stressful. When your daughter looks in your eyes and says “Mom, you work too much” you know it’s time to hire.
Bill and Li are established search experts and are well known in the community. They are positively brilliant on search marketing and they are also a blast to work with. I’ve also known each of them for a long time. Years ago when I was on the SEMPO Board of Directors, I frequently used Bill as a sounding board. My position on the Board was to represent smaller SEMs and I would bounce ideas off Bill and always know we were mentally in sync. I first met Li a few years ago through the High Rankings Forum where I am a moderator. Li was an active contributor and always impressed me with her original creative responses to discussions.
Okay, enough rambling. Here’s a little background on Li and Bill.
Liana (Li) Evans is well known in the search industry for her energy and creativity. Li specializes in social media marketing, blog optimization, link building and viral marketing. She has a background in both Public Relations and information technology and is a regular speaker at industry conferences including Search Engine Strategies and WebmasterWorld’s PubCon. Li also is the creator and main contributor to Search Marketing Gurus and writes for Search Engine Guide and InformIT.
William (Bill) Slawski is a much-respected SEO who was one of the founders and administrators for the Cre8asite Forums with my dear friend Kim Krause Berg. Bill is also a featured columnist at Search Engine Land on search related patents and research and writes a monthly column on small business issues. Bill speaks regularly on search engine algorithms and search engine optimization at industry conferences such as Search Engine Strategies and WebmasterWorld’s PubCon. Bill has the gift of taking complex issues and explaining them in layman’s terms and is also the creative force behind the popular blog SEO By The Sea.
We’re thrilled to have Bill and Li as part of the company and now maybe with more help we’ll be able to post more on this much-neglected blog.
3:55 pm - July 14th, 2007
By Christine Churchill
June felt like one long road trip. I had three conferences in a four week period. All were absolutely fantastic and I wouldn’t have wanted to miss any of them, but I wished I could have stretched out the time between conferences.
The first week of June was Danny Sullivan’s new conference SMX Advanced in Seattle where I spoke on the Better Ways panel with Alex Bennert (Alex is a class act and I hope to be on another panel with her soon), Greg Boser (who never fails to make me smile with his quick quips), Jim Boykin (whose company WeBuildPages should rightfully be called We Build Links), Todd Friesen (Todd claims to have come over to the white hat side, but he’s not fooling anyone – we love him either way), Cameron Olthuis (Mr Social Search Extraordinaire), and Aaron Wall (SEObook himself). You can see Danny trying to keep this motley panel in check in the picture below.

There have been numerous articles covering it and the rest of the show, so I won’t cover the content of panels here. To me, the conference was a networker’s dream – the small size and relaxed feel made it a fabulous place to informally develop business relationships. With the web becoming more socially oriented, having a network of friends and business acquaintances to call on is growing in importance. That may not be the case for all, but I’m finding I’m working with more of my peers than ever before. I’ve included a few pictures below just for fun.

From Left to Right are Stephanie, Alex Bennert, Jane Copland and Christine Churchill

This is my all time favorite picture of Matt Cutts. It really shows his fun side and why he is loved by SEOs.

Did I mention the food at SMX was fantastic? It reminded me of food at SES in the early days when we still had hot meals. Here’s a picture after lunch – notice no box lunches! From the left are Jonathan Hochman, Stephan Spencer, Christine Churchill and Dave McClure.

Todd Malicoat after “yet another strike” during the bowling match at the SEOMOZ party.
The second week in June was SES Toronto hosted by Andrew Goodman and Chris Sherman. Andrew did a super job of planning out a phenomenal conference. He came up with some new sessions and elegantly mixed new with seasoned speakers to give the Conference a fresh feel. Andrew, pat yourself on the back because you pulled off a great conference! I also have to congratulate Incisive for moving the conference to June – the weather then was perfectly delightful. The only downside was I somehow managed to miss my friend, Toronto native Brendan Kerin, which bummed me out because I wanted to drop a baby gift by to his lovely wife. Sorry Brendan.
My third conference of the month was in Denver, one of my favorite cities in the world. I have fond memories of Colorado from my college days when I attended Colorado State University and from visiting my parents who lived in Estes Park, Colorado. The Denver conference was the High Rankings Seminar which I always enjoy because it’s an excuse to combine work with friendship. Jill Whalen, Scottie Claiborne, Karon Thackston, Jennifer Laycock, myself, and my charming husband Mike Churchill gave presentations on a variety of search related topics.
The weekend after the conference I played hostess to my friends and gave them a tour of some of my favorite Colorado places – Boulder, Estes Park, Rocky Mountain National Park, Trail Ridge Road over the Continental Divide and then through parts of ski country. It was a thoroughly enjoyable trip and it was great to be able to share places I love with people I care about. The gorgeous picture below was taken at Bear Lake in Rocky Mountain National Park. In the picture from left to right are Christine Churchill, Scottie Claiborne, Lee Laughlin, Kaitlin, and Jill Whalen (in the back – yes, the one putting rabbit ears on Lee).

9:19 am - July 12th, 2007
by Jim Gilbert
Rumor has it that new Yahoo executives are running around the trough asking other Yahooers about the direction of the company and what needs to be done. Ain’t that a laugh!
From my experience with the Yahooers I’m allowed to interact with, they are part of the problem, not part of the solution.
Many recommendations have been forwarded to Yahoo (from little ole me) regarding their PPC systems and to date NOT ONE has ever been implemented. But then, what do I know since I’ve only used PPC systems for years to manage many, many client accounts? A couple of these recommendations were almost guaranteed to put lots of $$$$ in Yahoo’s pockets.
So Yahoo… this is an open challenge to see if your new executives are serious about making improvements and money! Talk to the “right” people — yes, I’m one of them and not that hard to find.
You might even want to keep in mind that at least one high level executive on “Wall Street” values my opinion. That’s kind of funny… Wall Street wants my opinions on Yahoo, but Yahoo doesn’t.
3:41 pm - June 14th, 2007
by Jim Gilbert
“Stumbled” across a post at “Stuntdubl’s” blog today ( Confessions of an Advertising Man ) and feel everybody should read it! By the way, the book was originally written by David Ogilvy, whom I do not know personally but have gained a lot of respect for.
I mean, just consider some of the quotes from his book:
- The creative process requires more than reason. Most original thinking isn’t even verbal. It requires “a groping experimentation with ideas, governed by intuitive hunches and inspired by the unconscious.”
Strong statement… it can be learned (if you have what it takes), but nobody can teach it to you.
- They copied all they could follow, but they couldn’t copy my mind,
And I left ‘em sweating and stealing, a year and a half behind.
I’m telling you all… the best Internet Marketing tool on earth is still the BRAIN!
Of course there are a few good ole East Texas common sense quotes that were possibly missed, but unless you are from Texas you probably will not notice. Like these that tend to get handed down from fathers to sons (Hey… it’s a guy thing):
- Damn Son! If you don’t have time to do it right the first time, when the hell do you think you’ll have time to do it over again?
- Hey Boy! Quit thinking so much of yourself…… even blind hogs find acorns occasionally.
There are a few more REAL good ones, but I’ll reserve them for the next meeting in the bar.