<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	>

<channel>
	<title>SEMClubHouse - Key Relevance Blog &#187; SEO</title>
	<atom:link href="http://www.semclubhouse.com/category/SEO/feed" rel="self" type="application/rss+xml" />
	<link>http://www.semclubhouse.com</link>
	<description></description>
	<pubDate>Wed, 05 Nov 2008 21:58:52 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.5.1</generator>
	<language>en</language>
			<item>
		<title>Domain Moving Day the Key Relevance Way</title>
		<link>http://www.semclubhouse.com/domain-moving-day-the-key-relevance-way/</link>
		<comments>http://www.semclubhouse.com/domain-moving-day-the-key-relevance-way/#comments</comments>
		<pubDate>Fri, 17 Oct 2008 19:21:59 +0000</pubDate>
		<dc:creator>admin</dc:creator>
		
		<category><![CDATA[Ramblings]]></category>

		<category><![CDATA[SEO]]></category>

		<guid isPermaLink="false">http://www.semclubhouse.com/?p=139</guid>
		<description><![CDATA[Domain Moving Day the Key Relevance Way
by Mike Churchill
So, you&#8217;re gonna change hosting providers.  In many cases, moving the content of the site is as easy as zipping up the content and unzipping it on the new server.  There is another aspect of moving the domain that many people overlook: DNS.
The [...]]]></description>
			<content:encoded><![CDATA[<h3>Domain Moving Day the Key Relevance Way</h3>
<p>by Mike Churchill</p>
<p>So, you&#8217;re gonna change hosting providers.  In many cases, moving the content of the site is as easy as zipping up the content and unzipping it on the new server.  There is another aspect of moving the domain that many people overlook: DNS.</p>
<p>The Domain Name System (DNS) is the translation service that converts your domain name (e.g. keyrelevance.com) to the corresponding IP address.  When you move hosting companies, it&#8217;s like changing houses: if you don&#8217;t set up the Change of Address information correctly, you might have some visitors going to the old address for a while.  Proper handling of the changes to DNS records makes this transition time as short as possible.</p>
<p>Let&#8217;s assume that you are changing hosting, and the new hosting company is going to start handling the Authoritative DNS for the domain.  The first step is to configure the new hosting company as the authority.  This is best done two or more days before the site moves to the new location.</p>
<p><strong>What does &#8220;Authoritative DNS&#8221; mean?</strong><br />
There are a double-handful of servers (known as the Root DNS servers) whose purpose is to keep track of who is keeping track of the IP addresses for a domain.  Rather than handling EVERY DNS request themselves, they only keep track of who is the authoritative publisher of the DNS information for each domain.  In other words, they don&#8217;t know your address, but they tell you who does know it.</p>
<p>If we tell the Root level DNS servers that the authority is changing, this information may take up to 48 hours to propagate throughout the internet.  If you change the authority without changing the IP addresses, the old authority and the new authority will agree on the address while browsers are making requests during this transition (so no traffic gets forwarded before you move).</p>
<p><strong>Shortening the Transition</strong><br />
The authoritative DNS servers want to minimize their load, so every time they send out an answer to a request for the address of a given domain, they put an expiration date on it.  This is called the &#8220;Time To Live&#8221;, or TTL.  By default, many DNS servers set the domain TTL to 86400 seconds, which equals 1 day.  Thus, when a visitor requests the address from the authoritative DNS, it returns the IP address and says &#8220;and don&#8217;t bother asking again for 24 hours.&#8221;  This can cause problems during the actual transition, since the old address might continue to be accessed for a whole day after the address has changed.</p>
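<p>TTL values are always given in seconds, so it is worth checking what a number means in clock time before relying on it.  Here is a minimal Python sketch that does the conversion; the function name <code>describe_ttl</code> is just an illustration, and the three values are examples you might run into rather than settings from any particular DNS server.</p>

```python
from datetime import timedelta


def describe_ttl(seconds):
    """Return a TTL given in seconds as a readable duration string."""
    return str(timedelta(seconds=seconds))


# Example TTL values: 15 minutes, 4 hours, and 1 day.
for ttl in (900, 14400, 86400):
    print(ttl, "seconds =", describe_ttl(ttl))
```

<p>A full day is 86400 seconds (24 x 60 x 60), so a TTL of 14400 seconds works out to 4 hours.</p>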
<p><strong>The Day Before the Move</strong><br />
Since the new hosting company is the authority, they can shorten the TTL to a much shorter value.  We recommend 15 minutes (900 seconds) as a good compromise TTL value during the transition time.</p>
<p><strong>Moving Day</strong><br />
When you are ready to make the switch, have the new DNS servers change the IP information to the new address(es).  Since the TTL was set to 15 minutes, very quickly the other DNS servers on the &#8216;net will be asking for the IP address of the domain.  They will be provided with this info, and the switchover will happen much more quickly than if the authority had not changed.  Once the new site is live and you have verified nothing is horribly wrong, you can change the TTL on the new DNS servers back to 1 day.  If, on the other hand, something IS wrong with the new site, you can change the DNS back to the old IP address and within 15 minutes most if not all traffic should be back to the old servers.  We also recommend changing the old DNS info to point to the new IP address as a precaution, but if you follow these steps, most of the traffic should have already transitioned to the new DNS servers.</p>
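<p>To see why the short TTL matters on moving day, here is a toy simulation of a caching DNS server.  It is only a sketch under simple assumptions: the <code>CachingResolver</code> class, the dictionary standing in for the authoritative server, and the two IP addresses (taken from the ranges reserved for documentation) are all made up for illustration, and real resolvers are far more involved.</p>

```python
class CachingResolver:
    """Toy resolver that caches an answer until its TTL expires."""

    def __init__(self, authority):
        self.authority = authority  # dict with "ip" and "ttl" keys
        self.cached_ip = None
        self.expires_at = 0

    def lookup(self, now):
        # Ask the authority again only once the cached answer has expired.
        if self.cached_ip is None or now >= self.expires_at:
            self.cached_ip = self.authority["ip"]
            self.expires_at = now + self.authority["ttl"]
        return self.cached_ip


authority = {"ip": "192.0.2.10", "ttl": 900}  # old server, 15-minute TTL
resolver = CachingResolver(authority)

resolver.lookup(now=0)             # caches the old address until t=900
authority["ip"] = "198.51.100.20"  # moving day: the authority switches IPs

print(resolver.lookup(now=600))  # still inside the TTL: old address
print(resolver.lookup(now=900))  # TTL has expired: new address
```

<p>With a TTL of one day in place of 900 seconds, the same cache would keep handing out the old address for up to 24 hours after the switch.</p>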
<p><strong>A Bug in BIND</strong><br />
There is a bug in some versions of the BIND program (which executes the DNS translation).  This bug will cause a DNS server to continue to ask the same authoritative DNS server for the info as long as it is willing to give it.  To complete the transition cleanly, you need to turn the DNS records for the domain off in the old DNS servers.  This will cause the old server to generate an error, which in turn will cause the requesting DNS server to ask the Root level servers for the new authority.  Until you make this change, there is still a chance that some traffic will continue to visit the old server.</p>
<p><strong>Change of Address Forms</strong><br />
The USPS offers a Change of Address kit to help make moving your house easier.  Below is the Key Relevance Change of Address Checklist that may make your site&#8217;s transition painless.</p>
<table border=1>
<tr>
<td border=0>
<h3>Key Relevance Domain Change of Address Checklist</h3>
<p><strong>2+ Days Pre-Move</strong><br />
Set up new DNS servers to serve up the OLD IP addresses</p>
<ul>
<li> - handle old subdomains</li>
<li> - handle MX records</li>
</ul>
<p>Once that is complete, Change Authoritative DNS records to point to new DNS servers</p>
<p><strong>1 Day before move</strong><br />
 On new DNS servers, shorten TTL to 15 min (900 sec)</p>
<p><strong>Moving Day</strong><br />
On New DNS Servers</p>
<ul>
<li> - Change IP Addresses to new server</li>
<li> - Change TTL to 1 day (86400 sec), or whatever the default TTL is, once you are sure all is OK</li>
</ul>
<p>On Old DNS Servers</p>
<ul>
<li> - Change IP Addresses to new server to catch DNS stragglers</li>
</ul>
<p><strong>2 Days Post Move (or when convenient)</strong></p>
<ul>
<li> - Remove DNS records from OLD DNS servers (assuming they are still up)</li>
</ul>
</td>
</tr>
</table>
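<p>If you want to put the checklist on a calendar, the offsets can be worked out from the moving day.  The sketch below is only an illustration: the function name <code>move_schedule</code> and the example date are arbitrary, and the pre-move dates are the latest ones the checklist allows (starting earlier is fine).</p>

```python
from datetime import date, timedelta


def move_schedule(moving_day):
    """Return (date, task) pairs for the checklist, relative to moving day."""
    return [
        (moving_day - timedelta(days=2), "New DNS serves OLD IPs; switch authority"),
        (moving_day - timedelta(days=1), "Shorten TTL to 900 seconds on new DNS"),
        (moving_day, "Point new and old DNS at new IPs; restore TTL when stable"),
        (moving_day + timedelta(days=2), "Remove records from old DNS servers"),
    ]


# Arbitrary example date, chosen only to show the output format.
for when, task in move_schedule(date(2008, 10, 17)):
    print(when.isoformat(), task)
```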
]]></content:encoded>
			<wfw:commentRss>http://www.semclubhouse.com/domain-moving-day-the-key-relevance-way/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Google Expands Details on VisualRank - PageRank for Pictures</title>
		<link>http://www.semclubhouse.com/google-expands-details-on-visualrank-pagerank-for-pictures/</link>
		<comments>http://www.semclubhouse.com/google-expands-details-on-visualrank-pagerank-for-pictures/#comments</comments>
		<pubDate>Thu, 18 Sep 2008 17:57:51 +0000</pubDate>
		<dc:creator>bill</dc:creator>
		
		<category><![CDATA[SEO]]></category>

		<category><![CDATA[algorithms]]></category>

		<category><![CDATA[google]]></category>

		<category><![CDATA[image search]]></category>

		<category><![CDATA[pagerank]]></category>

		<category><![CDATA[picture search]]></category>

		<category><![CDATA[product search]]></category>

		<category><![CDATA[visualrank]]></category>

		<guid isPermaLink="false">http://www.semclubhouse.com/?p=132</guid>
		<description><![CDATA[By Bill Slawski
In April of this year, at the 17th International World Wide Web Conference in Beijing, China, Google researchers presented their findings on an experiment that they performed involving a new way of indexing images which relied to some degree on the actual content of the images instead of things such as text and [...]]]></description>
			<content:encoded><![CDATA[<p>By Bill Slawski</p>
<p>In April of this year, at the 17th International World Wide Web Conference in Beijing, China, Google researchers <a href="http://www2008.org/papers/fp506.html">presented their findings</a> on an experiment that they performed involving a new way of indexing images which relied to some degree on the actual content of the images instead of things such as text and meta data associated with those pictures.  </p>
<p><b>Our First Look at VisualRank</b></p>
<p>The paper, <a href="http://www.esprockets.com/papers/www2008-jing-baluja.pdf">PageRank for Product Image Search</a> (pdf), details the results of a series of experiments involving the retrieval of images for 2000 of the most popular queries that Google receives for products, such as the iPod and Xbox.  The authors of the paper tell us that user satisfaction and relevancy of results were significantly improved in comparison to results seen from Google&#8217;s image search.</p>
<p>News of this &#8220;PageRank for Pictures&#8221; or VisualRank spread quickly across many blogs including <a href="http://www.techcrunch.com/2008/04/27/google-experiments-with-next-generation-image-search/">TechCrunch</a> and <a href="http://googlesystem.blogspot.com/2008/04/improving-google-image-search-using.html">Google Operating System</a>, as well as media sources such as the <a href="http://www.nytimes.com/2008/04/28/technology/28google.html?_r=2&#038;oref=slogin&#038;oref=slogin">New York Times</a> and <a href="http://www.theregister.co.uk/2008/04/28/google_unveils_pagerank_for_images/">The Register</a> from the UK.</p>
<p>The authors of that paper tell us that it makes three contributions to the indexing of pictures: </p>
<blockquote><ol>
<li>We introduce a novel, simple, algorithm to rank images based on their visual similarities.</li>
<li>We introduce a system to re-rank current Google image search results. In particular, we demonstrate that for a large collection of queries, reliable similarity scores among images can be derived from a comparison of their local descriptors.</li>
<li>The scale of our experiment is the largest among the published works for content-based-image ranking of which we are aware. Basing our evaluation on the most commonly searched for object categories, we significantly improve image search results for queries that are of the most interest to a large set of people.</li>
</ol>
</blockquote>
<p>The process behind ranking images based upon visual similarities between them takes into account small features within the images, while adjusting for such things as differences in scale, rotation, perspective and lighting.  The paper shows an illustration of 1,000 pictures of the painting the Mona Lisa, with the two largest at the center of the illustration being the highest ranked images in a query for &#8220;mona lisa.&#8221;</p>
<p><a href='http://www.semclubhouse.com/wp-content/uploads/2008/09/monalisa.jpg'><img src="http://www.semclubhouse.com/wp-content/uploads/2008/09/monalisa.jpg" alt="" title="mona lisa ranking graph" width="446" height="440" border="0" class="aligncenter size-full wp-image-133" /></a></p>
<p><b>A Second Look at VisualRank</b></p>
<p>In the conclusion to <em>PageRank for Product Image Search</em>, the authors noted some areas that they needed to explore further, such as how effective their system might work in real world circumstances on the Web, where mislabeled spam images might appear, as well as many duplicate and near duplicate versions of images.</p>
<p>A new paper from the authors takes a deeper look at the algorithms behind VisualRank, and provides some answers to the problems of spam and duplicate images - <a href="http://www.computer.org/portal/cms_docs_transactions/transactions/tpami/featured_article/featured.pdf">VisualRank: Applying PageRank to Large-Scale Image Search</a> (pdf).</p>
<p>The new VisualRank paper also expands upon the experimentation described in the first paper, which focused upon queries for images of products, to include queries for 80 common landmarks such as the Eiffel Tower, Big Ben, and the Lincoln Memorial.  </p>
<p>This VisualRank approach appears to still rely initially upon older methods of ranking images which look at things such as text and meta data (like alt text) associated with those images, to come up with a limited number of images to compare with each other.  Once it receives those pictures in response to a query, a reranking of those images takes place based upon shared features and similarities between the images.</p>
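<p>The reranking step can be pictured as PageRank run over a graph of image similarities in place of links.  What follows is a minimal Python sketch of that idea, not Google&#8217;s implementation: the function name <code>visual_rank</code>, the damping value, the iteration count, and the three-image similarity table are all assumptions made for illustration, and a real system would first derive the similarity scores from local image features.</p>

```python
def visual_rank(similarity, damping=0.85, iterations=50):
    """Rank items by power iteration over a column-normalized similarity matrix."""
    n = len(similarity)
    # Column sums, used to scale each column so that it sums to 1.
    col_sums = [sum(similarity[i][j] for i in range(n)) for j in range(n)]
    rank = [1.0 / n] * n
    for _ in range(iterations):
        rank = [
            (1 - damping) / n
            + damping * sum(similarity[i][j] / col_sums[j] * rank[j] for j in range(n))
            for i in range(n)
        ]
    return rank


# Hypothetical pairwise similarity scores for three candidate images.
similarity = [
    [0.0, 0.9, 0.8],
    [0.9, 0.0, 0.1],
    [0.8, 0.1, 0.0],
]
scores = visual_rank(similarity)
best = max(range(len(scores)), key=scores.__getitem__)
print("highest ranked image:", best)
```

<p>In this toy data the first image resembles both of the others, so it collects the most score, much as the central Mona Lisa images do in the illustration from the paper.</p>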
<p><b>Conclusion</b></p>
<p>Hopefully, if you have a website where you include images to help visitors experience what your pages are about in a visual manner, you&#8217;re now asking yourself how good a representation your picture is of what your page is about.  </p>
<p>Being found for images on the web is another way that people can find your pages.  And, the possibility that a search engine might include a picture from your page in search results next to your page title and description and URL is a very real one - Google has been doing it for News searches for a while.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.semclubhouse.com/google-expands-details-on-visualrank-pagerank-for-pictures/feed/</wfw:commentRss>
		</item>
		<item>
		<title>How A Search Engine May Use Web Traffic Logs in Ranking Web Pages</title>
		<link>http://www.semclubhouse.com/how-a-search-engine-may-use-web-traffic-logs-in-ranking-web-pages/</link>
		<comments>http://www.semclubhouse.com/how-a-search-engine-may-use-web-traffic-logs-in-ranking-web-pages/#comments</comments>
		<pubDate>Thu, 10 Jul 2008 13:59:05 +0000</pubDate>
		<dc:creator>bill</dc:creator>
		
		<category><![CDATA[SEO]]></category>

		<category><![CDATA[crawling]]></category>

		<category><![CDATA[indexing]]></category>

		<category><![CDATA[link popularity]]></category>

		<category><![CDATA[patents]]></category>

		<category><![CDATA[toolbars]]></category>

		<category><![CDATA[yahoo]]></category>

		<guid isPermaLink="false">http://www.semclubhouse.com/?p=112</guid>
		<description><![CDATA[By Bill Slawski
A newly granted patent from Yahoo describes how information collected from usage log files from toolbars, ISPs, and web servers can be used to rank web pages, discover new pages, move a page into a higher tier in a multi-tier search engine, increase the weight of links and the relevance of anchor text [...]]]></description>
			<content:encoded><![CDATA[<p>By Bill Slawski</p>
<p><em>A newly granted patent from Yahoo describes how information collected from usage log files from toolbars, ISPs, and web servers can be used to rank web pages, discover new pages, move a page into a higher tier in a multi-tier search engine, increase the weight of links and the relevance of anchor text for pages based upon those weights, and determine when a page was last changed or updated.</em></p>
<p><img src="http://www.semclubhouse.com/wp-content/uploads/2008/07/yahoo-toolbar.jpg" width="473" height="56" alt="Yahoo search toolbar" /></p>
<p>When you perform a search at a search engine, and enter a query term to search with, there are a number of steps that a search engine will take before displaying a set of results to you.</p>
<p>One of them is to sort the results to be shown to you in an order based upon a combination of relevance and importance, or popularity.</p>
<p>Over the past few years, that &#8220;popularity&#8221; may have been determined by a search engine in a few different ways. One might be based upon whether or not a page is frequently selected from search results in response to a particular query.  </p>
<p>Another might be based upon a count by a search engine crawling program of the number of links that point to a page, so that the more incoming links to a page, the more popular the page might be considered.  Incoming links might even be treated differently, so that a link from a more popular page may count more than a link from a less popular page.</p>
<p><b>Problems with Click and Link Popularity</b></p>
<p>Those measures of the popularity of a page, based upon clicks in search results and links pointing to that page, are somewhat limited.  It&#8217;s possible for a page to be very popular and still be assigned a low popularity weight from a search engine.</p>
<p><b><em>Example</em></b></p>
<p>A web page is created, and doesn&#8217;t have many links pointing to it from other sites.  People find the site interesting, and send emails to people they know about the site.  The site gets a lot of visitors, but few links.  It becomes popular, but the search engines don&#8217;t know that, based upon a low number of links to the site, and few or no clicks in search results to the page.  A search engine may continue to consider the page to be one of little popularity.</p>
<p><b>Using Network Traffic Logs to Enhance Popularity Weights</b></p>
<p>Instead of just looking at those links and clicks, what if a search engine started paying attention to actual traffic to pages, measured by looking at traffic information from web browser plugins, web server logs, traffic server logs, and log files from other sources such as Internet Service Providers (ISPs)?</p>
<p>A good question, and it&#8217;s possible that at least one search engine has been using such information for a few years.  </p>
<p>Yahoo was granted a patent this week, originally filed in 2002, that describes how search traffic information could be used to create popularity weights for pages, to rerank search results based upon actual traffic to those pages, and in a number of other ways.</p>
<p>Here are some of them: </p>
<ul>
<li>The rank of a URL in search results might be influenced by the number of times the URL shows up in network traffic logs as a measure of popularity;</li>
<li>New URLs can be discovered by a search engine when they appear in network traffic logs;</li>
<li>More popular URLs can be placed into higher level tiers of a search index, based upon the number of times the URL appears in the network traffic logs;</li>
<li>Weights can be assigned to links, where the link weights are used to determine popularity and the indexing of pages, based upon the number of times a URL is present in network traffic logs; and,</li>
<li>Whether a page has been modified since the last time a search engine index was updated can be determined by looking at the traffic logs for a last modified date or an HTTP expiration date.</li>
</ul>
<p>The patent granted to Yahoo is:</p>
<p><a href="http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO2&#038;Sect2=HITOFF&#038;u=%2Fnetahtml%2FPTO%2Fsearch-adv.htm&#038;r=1&#038;p=1&#038;f=G&#038;l=50&#038;d=PTXT&#038;S1=7,398,271.PN.&#038;OS=pn/7,398,271&#038;RS=PN/7,398,271">Using network traffic logs for search enhancement</a><br />
Invented by Arkady Borkovsky, Douglas M. Cook, Jean-Marc Langlois, Tomi Poutanen, and Hongyuan Zha<br />
Assigned to Yahoo<br />
US Patent 7,398,271<br />
Granted July 8, 2008<br />
Filed April 16, 2002</p>
<p>Abstract</p>
<blockquote><p>A method and apparatus for using network traffic logs for search enhancement is disclosed. According to one embodiment, network usage is tracked by generating log files. These log files among other things indicate the frequency web pages are referenced and modified. These log files or information from these log files can then be used to improve document ranking, improve web crawling, determine tiers in a multi-tiered index, determine where to insert a document in a multi-tiered index, determine link weights, and update a search engine index.</p></blockquote>
<p><b>Network Usage Logs Improve Ranking Accuracy</b></p>
<p>The information contained in network usage logs can indicate how a network is actually being used, with popular web pages shown as being viewed more frequently than other web pages. </p>
<p>This popularity count could be used by itself to rank a page, or it could be combined with an older measure that uses such things as links pointing to the page, and clicks in search results.</p>
<p>Instead of looking at all traffic information for a page, visits over a fixed period of time may be counted, or new page views may be considered to be worth more than old page views. </p>
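<p>One way to make new page views worth more than old ones is to let each view lose weight as it ages.  The sketch below uses an exponential half-life, which is my own assumption for illustration (the patent does not prescribe a formula); the function name <code>popularity</code>, the 30-day half-life, and the sample log entries are likewise made up.</p>

```python
from collections import defaultdict


def popularity(visits, now, half_life_days=30.0):
    """Sum visits per URL, halving a visit's weight every half_life_days."""
    scores = defaultdict(float)
    for url, day in visits:
        age = now - day
        scores[url] += 0.5 ** (age / half_life_days)
    return dict(scores)


# Hypothetical log entries: (url, day number on which the visit occurred).
visits = [("/a", 100), ("/a", 100), ("/b", 40), ("/b", 70), ("/b", 100)]
print(popularity(visits, now=100))
```

<p>Here /a has only two views to three for /b, but both of its views are recent, so it ends up with the higher score (2.0 against 1.75).</p>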
<p><b>Better Web Crawling</b></p>
<p>Usually a search engine crawling program discovers new pages to index by finding links to pages on the pages that it crawls. The crawling program may not easily find sites that don&#8217;t have many links pointing to them.  </p>
<p>But, pages that show up in log files from ISPs or toolbars could be added to the queue of pages to be crawled by a search engine spider.</p>
<p>Pages that don&#8217;t have many links to them, but show up frequently in log information may even be promoted for faster processing by a search crawler.</p>
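<p>As a rough picture of how that could work, the following sketch builds a crawl queue from logged URLs that are not yet in the index, with the most frequently seen ones first.  The function name <code>crawl_queue</code> and the sample data are hypothetical, and a real crawler would weigh many other factors.</p>

```python
from collections import Counter


def crawl_queue(logged_urls, indexed_urls):
    """Return unindexed URLs seen in logs, most frequently seen first."""
    counts = Counter(u for u in logged_urls if u not in indexed_urls)
    return [url for url, _ in counts.most_common()]


# Hypothetical URLs pulled from toolbar or ISP logs.
logged = ["/new", "/home", "/new", "/rare", "/new", "/home"]
print(crawl_queue(logged, indexed_urls={"/home"}))
```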
<p><b>Multi-Tiered Search Indexes</b></p>
<p>It&#8217;s not unusual for a search engine to have more than one tier of indexes, with a relatively small first-tier index which includes the most popular documents. Lower tiers get relatively larger, and have relatively less popular documents included within them.</p>
<p>A search query would normally be run against the top level tier first, and if not enough results for a query are found in the first tier, the search engine might run the query against the next level of tiers of the index.</p>
<p>Network usage logs could be used to determine which tier of a multi-tier index should hold a particular page. For instance, a page in the second-tier index could be moved up to the first-tier index if its URL shows up with a high frequency in usage logs. More factors than frequency of a URL in a usage log could be used to determine which tier to assign a document.</p>
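<p>A simple way to imagine tier assignment is a set of traffic thresholds, one per tier.  The sketch below assumes exactly that and nothing more: the function name <code>assign_tier</code> and the threshold values are invented for illustration, and as noted above a real system could use more factors than frequency alone.</p>

```python
def assign_tier(hits, thresholds=(1000, 100)):
    """Return 1 for the top tier, 2 for the next, and so on down."""
    for tier, minimum in enumerate(thresholds, start=1):
        if hits >= minimum:
            return tier
    return len(thresholds) + 1


# Hypothetical counts of how often each URL appears in the usage logs.
for url, hits in {"/popular": 5000, "/steady": 250, "/quiet": 3}.items():
    print(url, "tier", assign_tier(hits))
```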
<p><b>Usage Logs for Link Weights</b></p>
<p>One use search engines have for link information is to determine the popularity of a document.</p>
<p>The number of incoming links to a page may be used to determine the popularity of that page.</p>
<p>A weight may also be assigned based upon the relationship between words used in a link and the documents being linked to with that link. If there is a strong logical tie between a page and a word, then the relationship between the word and the page is given a relatively higher weight than if there wasn&#8217;t.  This is known as a &#8220;correlation weight.&#8221;  The word &#8220;zebra&#8221; used in the anchor text of a link would have a high correlation weight if the article it points to is about zebras.  If the article is about automobiles, it would have a much lower correlation weight.  </p>
<p>Links could also be assigned weights (&#8220;link weights&#8221;) based on looking at usage logs to see which links were selected to request a page. As the patent&#8217;s authors tell us:</p>
<blockquote><p>Thus, those links that are frequently selected may be given a higher link weight than those links that are less frequently selected even when the links are to the same document.</p></blockquote>
<p>In other words, pages pointed to by frequently followed links could be assigned higher popularity values than pages with more incoming links that are rarely followed.</p>
<p><b>Link Weights Used to Determine the Relevance of Pages for Anchor Text</b> </p>
<p>If a word pointing to a page is in a link (as anchor text), and the link is one that is frequently followed, then the relevance of that page for the word in the anchor text may be increased in the search engine&#8217;s index.  </p>
<blockquote><p>For example, assume that a link to a document has the word &#8220;zebra&#8221;, and another link to the same document has the word &#8220;engine&#8221;. If the &#8220;zebra&#8221; link is rarely followed, then the fact that &#8220;zebra&#8221; is in a link to the document should not significantly increase the correlation weight between the word and the document. On the other hand, if the &#8220;engine&#8221; link is frequently followed, the fact that the word &#8220;engine&#8221; is in a frequently followed link to the document may be used to significantly increase the correlation weight between the word &#8220;engine&#8221; and the document.</p></blockquote>
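<p>The zebra and engine example can be turned into a small calculation.  In the sketch below, each anchor word gets the share of followed clicks that arrived through links using that word.  This share is my own stand-in for a link weight, not the formula from the patent, and the function name <code>anchor_weights</code> and the click log are hypothetical.</p>

```python
from collections import defaultdict


def anchor_weights(clicks):
    """Share of followed clicks per anchor word, grouped by target page."""
    totals = defaultdict(int)
    per_word = defaultdict(int)
    for word, target in clicks:
        totals[target] += 1
        per_word[(word, target)] += 1
    return {key: n / totals[key[1]] for key, n in per_word.items()}


# Hypothetical click log: (anchor word of the followed link, target page).
clicks = [("engine", "/doc")] * 9 + [("zebra", "/doc")]
weights = anchor_weights(clicks)
print(weights[("engine", "/doc")], weights[("zebra", "/doc")])
```

<p>With nine of ten clicks coming through the &#8220;engine&#8221; link, that word would earn a much larger boost for the page than &#8220;zebra&#8221; would.</p>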
<p><b>Conclusion</b></p>
<p>This patent was originally filed back in 2002, and some of the processes it covers are also discussed in more recent patent filings and papers from the search engines, such as popularity information being used to determine which tier a page might be on in a multi-tier search engine.</p>
<p>Some of the processes it describes have been assumed by many to be processes that a search engine uses, such as discovering new pages from information gathered by search engine toolbars.</p>
<p>A few of the processes described haven&#8217;t been discussed much, if at all, such as the weight of a link (and the relevance of anchor text in that link) being increased if it is a frequently used link, and decreased if it isn&#8217;t used often.</p>
<p>It&#8217;s possible that some of the processes described in this patent haven&#8217;t been used by a search engine, but it does appear that search engines are paying more and more attention to user information that they do collect from places like toolbars and log files from different sources.  This patent is one of the earliest from a major search engine that describes how such user data could be used in a fair amount of detail. </p>
<p>Another patent from Yahoo was also granted this week on how anchor text can be used to determine the relevancy of a page for specific words. I&#8217;ve written about that over on SEO by the Sea, in <a href="http://www.seobythesea.com/?p=1092">Yahoo Patents Anchor Text Relevance in Search Indexing</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.semclubhouse.com/how-a-search-engine-may-use-web-traffic-logs-in-ranking-web-pages/feed/</wfw:commentRss>
		</item>
	</channel>
</rss>
