Sunday, August 9, 2009

The Black Art of SEO

I have been dabbling in the superstitious side of IT recently: Search Engine Optimization, or SEO for short.

Google guards its search engine secrets jealously, so no one knows for sure what is hot and what is not. Like any half-decent government organization that desires a certain behaviour from its citizens, it is glad to tell people how they should behave, but refrains from revealing exactly how the appraisal takes place. Better to let people act in the spirit of an idea than to give them the exact parameters and inner workings, which tends to provoke abuse.

On the downside, and for this very reason, the SEO market looks a lot like the paranormal community: no one knows for sure what is happening, so anyone can claim to have divined The Algorithm (tm). Proof is sketchy and scientific evidence hard to come by, all the more so because the search engine's crawlers take some time to revisit your website, leaving a big gap between a change and its effect. What does seem sure is that Google has succeeded in striking fear into the hearts of SEO experts. Ever since it degraded the ranking of sites meddling with link farms, people seem to be watching their step more carefully, aiming to please, not to anger. What more could the Great G ask for?

Having said that, SEO has its own scale, running from near-certain knowns to urban legends. On the near-certain side are the parts of a page that Google uses to determine what the page is about. Keywords found in the following places are thought to carry more weight:

  • URL; give your pages names which are meaningful in relation to the content

  • title; it is advised to put the important keywords at the start of the title

  • h1/h2/h3; in order of importance

  • bold/strong; emphasis may signal that a keyword is important

  • image; the name of the file and especially the alt text

  • link; the title of the link and the anchor text
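
To make the list above a bit more concrete, here is a minimal sketch of pulling some of those elements out of a page. It is not the tool described in this post: it uses plain regular expressions on an HTML string so that it runs without any libraries, where a real parser such as HtmlUnit would be far more robust. The class name SeoSignals and the map keys are made up for illustration, and the sketch covers only a subset of the list (title, h1 to h3, bold/strong, image alt text and anchor text).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not the author's program: regexes stand in for a real HTML parser.
class SeoSignals {

    // Returns the first capture group of every match, with nested tags
    // stripped and whitespace collapsed. Empty results are skipped.
    static List<String> findAll(String html, String regex) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex, Pattern.CASE_INSENSITIVE | Pattern.DOTALL).matcher(html);
        while (m.find()) {
            String text = m.group(1).replaceAll("<[^>]*>", " ").replaceAll("\\s+", " ").trim();
            if (!text.isEmpty()) {
                found.add(text);
            }
        }
        return found;
    }

    // Groups the extracted text by the kind of element it came from.
    static Map<String, List<String>> extract(String html) {
        Map<String, List<String>> signals = new LinkedHashMap<>();
        signals.put("title", findAll(html, "<title[^>]*>(.*?)</title>"));
        signals.put("headings", findAll(html, "<h[1-3][^>]*>(.*?)</h[1-3]>"));
        signals.put("emphasis", findAll(html, "<(?:b|strong)(?:\\s[^>]*)?>(.*?)</(?:b|strong)>"));
        signals.put("imageAlt", findAll(html, "<img\\s[^>]*?alt=\"([^\"]*)\""));
        signals.put("anchorText", findAll(html, "<a\\s[^>]*>(.*?)</a>"));
        return signals;
    }

    public static void main(String[] args) {
        // Made-up sample page.
        String html = "<html><head><title>Blue widgets for sale</title></head><body>"
                + "<h1>Blue widgets</h1><p>We sell <strong>handmade</strong> widgets.</p>"
                + "<img src=\"blue-widget.png\" alt=\"A blue widget\">"
                + "<a href=\"/about\" title=\"About us\">About the shop</a></body></html>";
        extract(html).forEach((kind, texts) -> System.out.println(kind + ": " + texts));
    }
}
```

Running main prints one line per kind of element, which is roughly the shape of report you want: if a keyword you care about shows up in none of the lines, a crawler is unlikely to think the page is about it.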



As I am naturally disinclined to human-crawl a website looking for these tags, I hacked together a Java program based on HtmlUnit (my latest love!) that does the job for me. I give it a URL and it parses the site, following all links to pages on the same site. On every page it checks the important HTML elements. The outcome is a report in which you can see which keywords are being picked up by a crawler. It does not do SEO for you, but it certainly helps if you do not see the keywords that you would have expected in there.
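
For the curious, the "follow all links on the same site" part can be sketched roughly as below. This is an illustration under assumptions, not the actual program: SameSiteCrawl and linksOf are hypothetical names, "same site" is taken to mean "same host", and fetching a page is hidden behind a function that returns the href values found on it, so the sketch runs without a network or HtmlUnit.

```java
import java.net.URI;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

// Hypothetical sketch of a same-site crawl; not the author's HtmlUnit-based tool.
class SameSiteCrawl {

    // Breadth-first walk over all pages reachable from start that share its host.
    // linksOf is a stand-in for "load this page and return the href of every link".
    static Set<URI> crawl(URI start, Function<URI, List<String>> linksOf) {
        String host = start.getHost();
        Set<URI> visited = new LinkedHashSet<>();
        Deque<URI> queue = new ArrayDeque<>();
        queue.add(start);
        while (!queue.isEmpty()) {
            URI page = queue.poll();
            if (!visited.add(page)) {
                continue; // already handled this page
            }
            for (String href : linksOf.apply(page)) {
                URI target;
                try {
                    target = page.resolve(href); // turns relative links into absolute ones
                } catch (IllegalArgumentException e) {
                    continue; // malformed href, ignore it
                }
                // Drop "#fragment" so in-page anchors do not count as new pages.
                String text = target.toString();
                int hash = text.indexOf('#');
                if (hash >= 0) {
                    target = URI.create(text.substring(0, hash));
                }
                // Assumption: "same site" means "same host"; mailto: and other hosts are skipped.
                boolean sameSite = host != null && host.equalsIgnoreCase(target.getHost());
                if (sameSite && !visited.contains(target)) {
                    queue.add(target);
                }
            }
        }
        return visited;
    }
}
```

In a real run, linksOf would load the page and collect its anchors, and the element check would happen at the point where a page is added to the visited set.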

If you want the app (disclaimer: no unit tests, thorough error checking, build etc. in place), are interested in a report for your site, or can add valuable SEO divining capacities to the tool, please drop me a line.

2 comments:

  1. Don't forget that <p> is also considered by crawlers (e.g. Google, Yahoo/Bing) for keywords. An important part of SEO is in the actual content. For example, copywriters are increasingly adding SEO as a part of their services.
