Selasa, 29 Januari 2008

LATENT SEMANTIC INDEXING


Regular keyword searches approach a document collection with a kind of accountant mentality: a document contains a given word or it doesn't, with no middle ground. We create a result set by looking through each document in turn for certain keywords and phrases, tossing aside any documents that don't contain them, and ordering the rest based on some ranking system. Each document stands alone in judgement before the search algorithm - there is no interdependence of any kind between documents, which are evaluated solely on their contents.

Latent semantic indexing adds an important step to the document indexing process. In addition to recording which keywords a document contains, the method examines the document collection as a whole, to see which other documents contain some of those same words. LSI considers documents that have many words in common to be semantically close, and ones with few words in common to be semantically distant. This simple method correlates surprisingly well with how a human being, looking at content, might classify a document collection. Although the LSI algorithm doesn't understand anything about what the words mean, the patterns it notices can make it seem astonishingly intelligent.

When you search an LSI-indexed database, the search engine looks at similarity values it has calculated for every content word, and returns the documents that it thinks best fit the query. Because two documents may be semantically very close even if they do not share a particular keyword, LSI does not require an exact match to return useful results. Where a plain keyword search will fail if there is no exact match, LSI will often return relevant documents that don't contain the keyword at all.

Latent semantic indexing adds an important step to the document indexing process. In addition to recording which keywords a document contains, the method examines the document collection as a whole, to see which other documents contain some of those same words. LSI considers documents that have many words in common to be semantically close, and ones with few words in common to be semantically distant. This simple method correlates surprisingly well with how a human being, looking at content, might classify a document collection. Although the LSI algorithm doesn't understand anything about what the words mean, the patterns it notices can make it seem astonishingly intelligent.

By placing additional weight on related words in content, or words in similar positions in other related documents, LSI has a net effect of lowering the value of pages which only match the specific term and do not back it up with related terms.

LSI vs Semantically Related Words:
After being roasted by a few IR students and scientists I realized that many SEOs (like me) blended the concepts of semantically related words with latent semantic indexing, and due to constraints of the web it is highly unlikely that large scale search engines are using LSI on their main search indexes.

Nonetheless, it is overtly obvious to anyone who studies search relevancy algorithms by watching the results and ranking pages that the following are true for Google:

* search engines such as Google do try to figure out phrase relationships when processing queries, improving the rankings of pages with related phrases even if those pages are not focused on the target term
* pages that are too focused on one phrase tend to rank worse than one would expect (sometimes even being filtered out for what some SEOs call being over-optimized)
* pages that are focused on a wider net of related keywords tend to have more stable rankings for the core keyword and rank for a wider net of keywords

Given the above, here are tips to help increase your page relevancy scores and make your rankings far more stable...

Mix Your Anchor Text!
Latent semantic indexing (or similar technologies) can also be used to look at the link profile of your website. If all your links are heavy in a few particular phrases and light on other similar phrases then your site may not rank as well.

Example Related Terms:
Many of my links to this site say "SEO Book" but I also used various other anchor text combinations to make the linkage data appear less manipulative.

Instead of using SEO in all the links some of them may use phrases like
search engine optimization
search engine marketing
search engine placement
search engine positioning
search engine promotion
search engine ranking
etc.

Instead of using book in all the links some other good common words might be
ebook
manual
guide
tips
report
tutorial
etc.

How do I Know What Words are Related?
There are a variety of options to know what words are related to one another.

* Search Google for search results with related terms using a ~. For example, Google Search: ~seo will return pages with terms matching or related to seo and will highlight some of the related words in the search results.
* Use a lexical database
* Look at variations of keywords suggested by various keyword suggestion tools.
* write a page and use the Google AdSense sandbox to see what type of ads they would try to deliver to that page.
* Read the page copy and analyze the backlinks of high ranking pages.

Google Sandbox and Semantic Relationships:
The concept of "Google Sandbox" has become synonymous with "the damn thing won't rank" or whatever. The Sandbox idea is based upon sites with inadequate perceived trust taking longer to rank well.

Understanding the semantic relationships of words is just another piece of the relevancy algorithms, though many sites will significantly shift in rankings due to it. The Google sandbox theory typically has more to do with people getting the wrong kinds of links or not getting enough links than it does with semantic relationships. Some sites and pages are hurt though by being too focused on a particular keyword or phrase.

Where do I learn more about Latent Semantic Indexing?
A while ago I read Patterns in Unstructured Data and found it was wrote in a rather plain english easy to understand manner.

Source : www.seobook.com
Related Posts Plugin for WordPress, Blogger...