
Search Engine Optimization ( SEO )





 
Every Web master ( like me! ) wants his or her Web site listed at the top of all the Search Engines, and dreams of huge traffic bursting into the Web pages. In this tutorial I will show you the tricks I know to reach that goal. Below is a list of some of the robots ( also called spiders, bots or crawlers ) that visit and index your Web pages. The three big ones are Googlebot ( Google ), MSNBot ( MSN ) and Inktomi Slurp ( Yahoo! ). Google is, beyond doubt, the Search Engine most people use to search the Web, so it is the one a Web master thinks about most. Still, people do use other Search Engines ( or directories ) to reach your Web page.


Alexa (IA Archiver)
Ask
AskJeeves
Cfetch
Googlebot
Inktomi Slurp
Lycos
MSNBot
MSNBot-media
Netcraft
The web archive (IA Archiver)
Unknown robot (identified by 'bot/' or 'bot-')
Unknown robot (identified by 'crawl')
Unknown robot (identified by 'robot')
Unknown robot (identified by 'spider')
Unknown robot (identified by hit on 'robots.txt')
Voila
Walhello appie
WISENutbot
Yahoo Slurp


You should submit your Web site to Search Engines ( or directories ) so that they start to crawl and index it. If you look at your Web site logs ( the Apache log files ), you can figure out which robots or spiders are visiting your Web pages, and when. It takes a few weeks before they list your Web site. The top three show up in the logs as Googlebot, MSNBot and Inktomi Slurp ( Yahoo! Slurp ). Note that some Search Engines and directories base their results on the top guns; e.g., Alexa results are based on Google, and AltaVista results are based on Yahoo!.
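
To make the log reading above a little more concrete, here is a minimal Python sketch that labels a User-Agent string with one of the robot names from the list earlier in this tutorial. It is only an illustration: the function name classify_agent, the marker table and the token ia_archiver for the Alexa robot are my assumptions, and real agent strings vary.

```python
# Hypothetical helper: label a User-Agent string using the robot names and
# the generic markers ('bot/', 'crawl', 'spider', ...) listed in this tutorial.
KNOWN_ROBOTS = {
    "googlebot": "Googlebot",
    "msnbot-media": "MSNBot-media",   # checked before the shorter "msnbot"
    "msnbot": "MSNBot",
    "slurp": "Yahoo Slurp",
    "ia_archiver": "Alexa (IA Archiver)",  # assumed agent token
}
GENERIC_MARKERS = ("bot/", "bot-", "crawl", "robot", "spider")


def classify_agent(user_agent):
    """Return a robot name, 'Unknown robot', or None for an ordinary browser."""
    ua = user_agent.lower()
    for marker, name in KNOWN_ROBOTS.items():
        if marker in ua:
            return name
    if any(marker in ua for marker in GENERIC_MARKERS):
        return "Unknown robot"
    return None


# Sample agent strings, for illustration only.
for agent in (
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    "ExampleCrawler/0.1",
    "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)",
):
    print(classify_agent(agent), "<-", agent)
```

A string that matches none of the markers is treated as a normal browser, which is a simplification; robots that hide their identity will slip through.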

HTML META tags

You write META tags in the HEAD section of your Web page to provide information about the page to the robots or spiders that visit it. META tags and the HEAD section are simply part of HTML. Do you know HTML? Below is a list of META tags with examples. Not all of them are essential, but they do give information to Web page crawlers. At the very least, give your Web page a suitable title; the name ( of the physical HTML, PHP or ASP file ) and the title of a Web page are really very important.


<title>Search Engine Optimization ( SEO ) tutorial</title>
<meta content="Comprehensive tutorial on Search Engine Optimization ( SEO )." name="description">
<meta content="Comprehensive tutorial on Search Engine Optimization ( SEO )." name="abstract">
<meta content="search engine optimization, seo" name="keywords">
<meta content="index, follow, noarchive" name="robots">
<meta content="anshul shrivastava" name="author">
<meta content="anshul_rsn@yahoo.com" name="email">
<meta content="&copy; 2007 &ndash; mediasworks Group, India and worldwide" name="copyright">
<meta content="7 days" name="revisit-after">
<meta content="en" http-equiv="content-language">
<meta content="text/html; charset=iso-8859-1" http-equiv="content-type">
<meta content="no" http-equiv="imagetoolbar">
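
To check what a robot would read from the HEAD section, tags like the ones above can be parsed back with Python's standard html.parser module. This is a minimal sketch, not how any particular Search Engine works; the class name HeadReader, the function read_head and the shortened sample page are assumptions made for illustration.

```python
from html.parser import HTMLParser


class HeadReader(HTMLParser):
    """Collect the <title> text and each <meta> name/http-equiv with its content."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            # A META tag is keyed either by name or by http-equiv.
            key = attrs.get("name") or attrs.get("http-equiv")
            if key:
                self.meta[key.lower()] = attrs.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def read_head(html):
    """Return (title, dict of META values) for an HTML document."""
    reader = HeadReader()
    reader.feed(html)
    reader.close()
    return reader.title.strip(), reader.meta


SAMPLE = """<html><head>
<title>Search Engine Optimization ( SEO ) tutorial</title>
<meta content="search engine optimization, seo" name="keywords">
<meta content="index, follow, noarchive" name="robots">
<meta content="en" http-equiv="content-language">
</head><body></body></html>"""

title, meta = read_head(SAMPLE)
print(title)
print(meta)
if "description" not in meta:
    print("Warning: this page has no description META tag")
```

Running it over every page of a site is a quick way to spot pages that lack a title or a description.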


description and abstract give a general overview of your Web site. As in the example above, both META tags can be the same. The distinction ( if made ) is that description should tell immediately and clearly what the Web site is all about. In fact, Search Engines ( or directories ) often show the description when a Web site is listed in search results, so that users ( surfers ) can quickly understand the content of the Web site they may be about to visit. description and abstract can be a few phrases or a few lines long. Search Engines automatically cut off whatever is too long, so we need not worry about it. For example, Yahoo! shows more of the description of a Web page than Google does. A typical Web site has groups of Web pages; the pages in one group can share a description that differs from those of the other groups. My suggestion is to put an effective and relevant description on all your Web pages. If you can divide your Web site content into several groups, change the description to reflect what each group of Web pages is about.


As said above, the description META tag of a Web page is next in importance to its name ( the name of the physical file ) and title. Next comes the keywords META tag. It is just a list of words and phrases that occur in the Web page content. Put in the ones that repeat most often. What is already said in the name and title of the Web page need not be copied into keywords. Google is an amazing Search Engine; it reads all the content down to the bottom of your Web page, including images and even Flash animations, and it notes the patterns of repetition. Other Search Engines behave similarly. Just don't write fake keywords, as it won't help you. And I suggest you not put a long ( and unjustified ) list of words and phrases in keywords.
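
Picking the words that repeat most often can be sketched in a few lines of Python. This is only a rough illustration of counting repetition in page text; the function name top_words, the tiny stop-word list and the sample sentence are my assumptions, and it says nothing about how a Search Engine actually weighs words.

```python
import re
from collections import Counter

# Illustrative stop-word list; extend it for real pages.
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "it", "for", "on", "your"}


def top_words(text, limit=5):
    """Return the most repeated words in visible page text, most frequent first."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS and len(w) > 2)
    return counts.most_common(limit)


page_text = (
    "Search engine optimization helps a search engine find your page. "
    "Optimization of titles and descriptions is part of search engine optimization."
)
print(top_words(page_text, 3))
```

Words that come out on top and truly describe the page are reasonable candidates for the keywords META tag.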


Next is the robots META tag. Unlike the keywords META tag, robots works in a straightforward way. <meta content="index, follow, noarchive" name="robots"> tells the agent to index the Web page and follow the hyperlinks in it, but not to archive ( cache ) the Web page. If you don't want the hyperlinks of your Web page crawled, write <meta content="index, nofollow, noarchive" name="robots">. This will index only that Web page. What does a cached Web page mean? A Search Engine may keep a copy of your Web page and offer it to users as a cached version. If you think your Web page will need no ( or only infrequent ) updates in future, you may allow the agent to archive it. I'd say very few people click the cached version of a Web page; but if they do, your Web site saves some bandwidth! Just an opinion: if a Web page is complete and needs no update for months or years, allow it to be archived with <meta content="index, follow, archive" name="robots">. Some Web masters use <meta name="robots" content="all">.
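
How an agent might read the content value of the robots META tag can be sketched as follows. It is a simplified illustration under my own assumptions: the function name parse_robots_meta is hypothetical, anything not forbidden is treated as allowed, and real robots may support more values than the ones shown.

```python
def parse_robots_meta(content):
    """Turn a robots META content string into three yes/no permissions.

    Assumption: an agent may index, follow and archive unless told otherwise.
    """
    tokens = {t.strip().lower() for t in content.split(",") if t.strip()}
    if "none" in tokens:
        # "none" is shorthand for "noindex, nofollow".
        tokens |= {"noindex", "nofollow"}
    return {
        "index": "noindex" not in tokens,
        "follow": "nofollow" not in tokens,
        "archive": "noarchive" not in tokens,
    }


for value in ("index, follow, noarchive", "index, nofollow, noarchive", "all"):
    print(value, "->", parse_robots_meta(value))
```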


author and email give the author of the Web page and the author's contact e-mail. Remember that the content of a Web page may be contributed by many people, or by someone who is not the Web master of the site; author usually just holds the name of the Web master or of the coder who wrote the HTML. I suggest putting these META tags in every Web page of your Web site. I have seen, though rarely, Search Engines pay regard to these META tags!


revisit-after META is important, so be careful if you add it to your Web page. Many times I've caused myself trouble by giving this tag a large value, like <meta content="14 days" name="revisit-after"> or <meta content="21 days" name="revisit-after">. Search Engines may or may not obey this tag; it is not part of the HTML standard. For example, in my logs Google seemed to obey it strictly, while MSNBot did not. The point is that if you put enough time and hard work into updating and publishing your Web pages, you want robots to crawl them regularly. Many times I've updated ( or published ) a good Web page and Google did not visit it for hours, for days, while I was checking the logs every few hours! Moreover, I made the gross mistake of putting links to planned Web pages; Google hunted for them every day, got HTTP 404 errors, got annoyed after a few weeks and stopped visiting my Web site for weeks! Do you get what I mean? I'll say more about that, so please read on.


<meta content="en" http-equiv="content-language"> tells the agent that the Web page content is in English. Likewise, use en-US for the U.S. version of English and de if the page is in German. If you write a Web page in Hindi, use hi as the content value; sa for Sanskrit and ur for Urdu. It also enables language-dependent pronunciation rules when a browser reads the Web page aloud.


<meta content="text/html; charset=iso-8859-1" http-equiv="content-type"> specifies the character encoding of the Web page. ISO-8859-1 is a standard of ISO ( International Organization for Standardization ) that defines an 8-bit, single-byte coded graphic character set. You need not worry about it ( just use the tag above ) as long as you're writing in English. Note that you can write <META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=ISO-8859-1"> or <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1"> or similar; capitalization does not matter. All are valid. As said before, META tags are not obligatory and may be skipped. However, at least put in description. It's really important.


<meta content="no" http-equiv="imagetoolbar"> prevents the image toolbar from popping up in the MSIE browser ( in Internet Explorer this happens for images of sufficient dimensions ). Put it in if the Web page has images. Sometimes your users, and sometimes you, won't like such a pop-up! I've got into the habit of simply writing it on every Web page.


Thus we've discussed most of the META tags used to specify metainformation ( meta data ) about a Web page document. As said already, some of these are really useful for Search Engine Optimization. There are more that we've skipped, like copyright, expires, refresh, date, distribution and PICS-Label. If you hold the exclusive copyright to all the content, why not write a copyright META? refresh is an interesting META tag that I'm tempted to describe briefly, though it has nothing to do with Search Engine Optimization: after the given number of seconds, the browser loads the given URL. See the examples of META refresh below. A last word about METAs: if you want to know more, read the HTML documentation on the W3C Web site ( W3C stands for World Wide Web Consortium ), or download it and read it offline.


<meta content="9; url=/webstories/qhelp.htm" http-equiv="refresh">
<meta content="3; url=http://www.mediasworks.org/" http-equiv="refresh">


robots.txt

If you've read all the above, you've seen terms like robots, spiders, crawlers or Web wanderers. Do you know what it's all about? I'm not here to define them all or to draw ( subtle ) distinctions among them. Just note that they are automated programs ( complicated, major projects, obviously! ) that visit your Web pages. Search Engines use robots to get data for their Web search results. As you can guess, the useful data collected by robots is stored in terabytes of disk space. So what are we discussing here? Well, when robots visit your Web site, they look for a file named robots.txt in the root public directory. I suggest you put that file in that place, so that Search Engines don't get an HTTP 404 File Not Found error. Below are suitable contents for robots.txt that you can use.


User-Agent: *
Allow: /
Disallow: /cgi-bin

Can you understand the above? It allows all spiders ( or robot programs ) to crawl the ( public ) / directory and all the directories in it, except cgi-bin. The file uses three terms, each of which may appear zero, one or more times. ( Some Web masters just put up an empty, 0-byte robots.txt to prevent the HTTP 404 error; note that an empty file allows robots to crawl everything, cgi-bin included. ) User-Agent names the robot that the following lines apply to; * means all robots. Allow names a directory path or file path that the User-Agent just named may visit. Allow is an extension to the original robots.txt convention, though the major Search Engines understand it. Disallow is simply the inverse of Allow. Example values for User-Agent are the robot names listed at the start of this tutorial. For complete details, visit http://www.robotstxt.org/ and related sites, and see the documentation provided by Search Engines and directories. Why would you keep some robots out ( Disallow )? Are robots harmful? Do robots always obey robots.txt and META tag metainformation? These are interesting questions. There is no clear answer to them; obeying robots.txt is voluntary for a robot, and you would have to delve into a lot of documentation to find out more, if you want. However, I don't see a strong reason for such research. As for me, I allow all robots to crawl my Web pages. Yes, some files should be kept from being indexed by Search Engines, like the scripts in the cgi-bin directory, which are not meant for the public.
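
If you want to test such rules before uploading the file, Python's standard urllib.robotparser module can read them. The sketch below is only an illustration: example.com is a placeholder host, and I list Disallow before Allow because this parser applies the first rule that matches a path, whereas other robots may resolve the two lines differently.

```python
from urllib.robotparser import RobotFileParser

# Same rules as in this tutorial, with Disallow listed first (see note above).
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin
Allow: /
"""


def build_parser(robots_txt):
    """Parse robots.txt text without fetching anything from the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# example.com is a placeholder host used only for illustration.
for url in ("http://example.com/index.htm", "http://example.com/cgi-bin/form.cgi"):
    print(url, "->", parser.can_fetch("Googlebot", url))
```

With these rules the first URL should be reported as allowed and the one under cgi-bin as refused.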


How do Search Engines work?

As you ( now! ) know, Search Engines use sophisticated automated programs to gather information about your Web site and Web pages. These programs in turn use complicated algorithms to evaluate, index and rank your Web pages. That's machine intelligence. Imagine that a robot reads the HTML of the requested Web page and collects the anchor href links and the image src links. Then it requests all these links in turn and adds what it finds to its knowledge base. Search Engines crawl all the files linked from a Web page, like .htm, .html, .php, .txt, .css, .js, .gif, .png, .jpg, .jpeg, .mp3, .zip and .rar. Bots don't access the physical file system; they only see what your server sends over HTTP. Bots can't request the files in the above list directly if hotlinking is prevented ( the server then answers with status code 403, Forbidden ).
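
The link-collecting step described above can be sketched with Python's standard html.parser and urllib.parse modules. This is a toy illustration, not the algorithm of any real robot; the class name LinkCollector, the function collect_links and the www.example.com address are assumptions.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect anchor href and image src links, resolved against the page URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            target = attrs["href"]
        elif tag == "img" and attrs.get("src"):
            target = attrs["src"]
        else:
            return
        if target.startswith("#"):
            return  # in-page placeholder, nothing to fetch
        self.links.append(urljoin(self.base_url, target))


def collect_links(html, base_url):
    """Return the absolute URLs a robot would queue after reading this page."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    collector.close()
    return collector.links


page = '<a href="/tutorials/seo.htm">SEO</a><img src="images/logo.png"><a href="#">later</a>'
for link in collect_links(page, "http://www.example.com/index.htm"):
    print(link)
```

A real robot would then request each collected URL, respect robots.txt, and repeat the process on the pages it receives.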


What do Search Engines dislike?

  • Prevent HTTP 404 Page Not Found errors. Many times I have put links to planned Web pages that then stayed unavailable for months. Don't do this. Put href="#" if the target of an anchor link is not yet available, and upload the updated Web page once the target exists. If you're confident and do put links to planned Web pages, make those Web pages available as soon as possible. Browse your Web site ( patiently! ) and note the pages that are unavailable. Also look at your Apache log files ( or awstats reports ) to see all the HTTP Status codes in detail ( please see below ). You can use a .htaccess file ( on Apache ) to guide the robots when a Web page is moved, renamed or deleted, so that they can update their database instead of hunting in vain. Below are the HTTP/1.1 Status codes.
200 No Error ( success )
206 Partial Content
301 Moved permanently ( redirect )
302 Moved temporarily ( redirect )
303 See other document ( resource replaced )
304 Not modified ( document not modified )
400 Bad request
401 Unauthorized ( HTTP authentication required )
403 Forbidden
404 Document Not Found
408 Request Timeout
410 Gone ( resource removed )
500 Internal Server Error
503 Service Unavailable ( Server busy )
Your ISP ( Internet Service Provider ) or hosting company should take care of 503. Don't use scripts ( or change server configuration ) that cause 500. Don't link to non-existent Web pages; prevent 404. Next, 301, 302 and 303 are also disliked by Search Engines. There is little sense in 302. If you rename or move a document, you can notify robots that the original resource is now known by a new name and/or location, using 301 or 303. Do this temporarily, say for a week or two, until the resource is registered under its new name and/or location in Search Engines. In the case of 303 ( resource replaced ), since the original document is then unavailable, let the server say 410 to Search Engines. The server should also say 410 for all deleted files and for moved files ( except those still answered with 301 ). This can be done easily using httpd.conf or a .htaccess file. Don't you understand what is said in the lines above? Reread them, and please see this tutorial. 206 means the server sent only part of a file, normally because the client asked for just a range of it; if you see it often for your Web pages, recheck them. In short, eliminate 404 and all redirects, and keep an eye on 206.
  • Ensure your Web site is up all the time.
  • Don't put a big value in revisit-after ( if you use it ) in your Web pages. Some Web pages that are updated daily set it to '1 days'.
  • Although Search Engines do index dynamic URLs, static Web pages with clean URLs are best. Some Web masters even make a special effort to use static Web pages with the .htm or .html extension rather than static Web pages with an extension like .php. I suggest you get rid of querystring data and session data in URLs ( Uniform Resource Locators ) or URIs ( Uniform Resource Identifiers ). Apache ( the Apache module mod_rewrite ) will help you free your links of querystring data. Use cookies instead of session data in URLs. If your HTTP server is not Apache, you can use scripting ( PHP or ASP ) to deduce the querystring information from the static URL, for example by treating each part of the URL path as one value, and then use it to show the Web page.
  • Search Engines don't like HTTP redirects and META redirects ( the refresh tag ), discussed above.
  • Non-descriptive file names like 123905.html or img2309.jpeg are bad. Search Engines index files better if long, descriptive names are used.
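
To find the 404s and other status codes in your logs without reading them line by line, something like the following Python sketch can help. It assumes log lines in the Apache common or combined format; the regular expression, the function name status_report and the three sample lines are illustrative assumptions, not real log data.

```python
import re
from collections import Counter

# Picks out the request line and the status code that follows it in
# Apache common/combined log lines.
LOG_LINE = re.compile(r'"(?P<method>[A-Z]+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) ')


def status_report(lines):
    """Count status codes and count how often each path returned 404."""
    statuses = Counter()
    missing = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if not match:
            continue  # not a recognisable request line
        status = int(match.group("status"))
        statuses[status] += 1
        if status == 404:
            missing[match.group("path")] += 1
    return statuses, missing


# Made-up sample lines, for illustration only.
SAMPLE_LOG = [
    '203.0.113.5 - - [10/Mar/2007:10:00:01 +0530] "GET /index.htm HTTP/1.1" 200 5120 "-" "Googlebot/2.1"',
    '203.0.113.5 - - [10/Mar/2007:10:00:05 +0530] "GET /planned.htm HTTP/1.1" 404 310 "-" "Googlebot/2.1"',
    '203.0.113.9 - - [10/Mar/2007:10:02:11 +0530] "GET /planned.htm HTTP/1.1" 404 310 "-" "msnbot/1.0"',
]

statuses, missing = status_report(SAMPLE_LOG)
print(dict(statuses))
print(missing.most_common())
```

The paths listed under 404 are the ones to create, redirect or unlink first.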


Biased promotion

Google ( and maybe other Search Engines too ) reads everything in your Web page. Some Web masters put text in an invisible color ( i.e., the background color ) and/or in a tiny font size at the top or bottom of a Web page to pass that information to Search Engines. The results may even be welcome at first! There are other means of unfair promotion too. If you want to throw away your reputation, you can use such methods.


Google

Google is important because a surfer is most likely to use it. I've seen Google robots show the most activity ( in both visits and bandwidth ) in log files. More people come to your Web site through Google than through other Search Engines or directories. Google even indexes the text content in Flash files! So ensure that Google is regularly visiting your Web site and updating its information about it.


Yahoo!

Yahoo! is the oldest of the class, and it is considered easier than Google. Note that the listings of Search Engines may vary considerably. It is possible to maintain top listings in Yahoo! if you target it, whereas listings in Google are very volatile. That's the contrast. Personally, I've found it difficult to work out how Yahoo! orders its listings. The behaviour and search results of Yahoo! are very different ( and controversial! ) compared with the other top Search Engines. In fact, Yahoo! is at heart a Web directory for categorical searching and advertisement ideas. The juniors in the business also inherited that revolutionary Yellow Pages idea. Google is the one that took up the mega challenge of indexing the whole Web. It's like measuring the whole earth; that's the enterprise, that's the success! But one thing I noticed clearly is that Yahoo! values quality of content, and it may remove your Web site completely ( and harshly! ). Does it use human editors or Web site reviewers ( Web site critics! ) as well? Yes! Good luck with ( free? ) Yahoo!


MSN

I'd say MSN, now Live Search, is an easy one. In the beginning I was not happy with MSN, as it was not visiting my Web pages. With time its activity increased, at one point even beyond Google's! I checked my keywords ( phrases ) and found it was showing considerably higher listings than Google. Surprised! One thing I made out is that MSN perhaps reads mostly the title, while Google reads everything. I'm not saying that MSN does not read the Web page. Surely it does, but perhaps it gives more weight to the title than to the Web page content, compared with Google. The number of links listed in MSN is always higher than in Google. Despite Google's supremacy, many Web masters complain about the number of their Web site links shown in Google.


Alexa

Alexa claims to measure the traffic rankings of Web sites. Though Alexa search results are based on Google results, it's really difficult for new Web masters to get a traffic rank for their Web site in Alexa. Alexa pulls together many other sharp details about a Web site ( a non-human Web critic! ), such as speed, reach, page views, user reviews, a traffic graph, comparison with other Web sites, other Web sites linking to yours, and so on. It's a pleasure, and considered an achievement, to get a position in the Alexa traffic rank. Note that Alexa's research is based on the Alexa Toolbar, which means Internet Explorer and Windows only, so it may not be representative. Many people use Firefox! Okay, submit your Web site to Alexa and dream of getting a traffic rank there. However, as you know, Google matters most, if not entirely.


Re-submission of your Web site

If you find little activity from Google ( and other Search Engines ) even though you're adding Web pages to your Web site, you have trouble. In such cases, update your Web site, check the quality of the content and resubmit your Web site. Interestingly, I once resubmitted my Web site to Google as http://www.mediasworks.org/ ( previously I had submitted it as http://mediasworks.org/ ). A few days later I was disheartened to find my Web site no longer listed for the keywords it used to be listed for. However, within one week of re-submission I found it was there again, with 'www' added to the URL. So I think re-submission is not a problem. The principal reasons for re-submission are that your Web site has changed very much in content or metainformation, that you have added content, or that Search Engine activity is low and you need a fresh flush of Search Engine visits to your Web pages. Avoid unnecessary ( and frequent ) re-submission.


Link popularity

Is your Web site popular? How do you estimate popularity? How many Web sites link to your Web pages? How many Web pages ( in 100s ) do you publish? Do your Web pages have enough text content, images and other media? What about traffic ( in 1000s )? If you have positive answers to these questions, then yes, your Web site is a popular place on the WWW ( World Wide Web ).



Page Rank ( PR ) value

Search Engines provide toolbars that install in the browser, mostly MSIE, and make searching and browsing more comfortable for the surfer. In fact, Alexa uses the data sent from its toolbar to estimate traffic ranks. These toolbars also display the PR value, 0 to 10, of a Web page. Page Rank is the importance of a Web page, estimated differently by different Search Engines. Google PR is the popular and universal one. The higher the PR value, the more important the Web page is. A PR value of 0 or 1 shows that a Web page is of little importance. PR 2 or 3 means the Search Engine considers that something is there! PR 4 means the Web page is important. PR 5 is a good one. PR 6 or 7 is an achievement. A still higher PR is a very, very difficult ( herculean! ) task. Apart from the toolbars provided by top Search Engines like Google, Yahoo!, MSN and Alexa, many Web sites help you find the PR value; they ask for the URL of a Web page and output the result for it. Use these if the toolbar does not install in your browser because you're using something other than MSIE.


Life as a Web master!

A Web master is a busy creature whose closest friend is his ( or her ) machine! In fact, it is his ( or her ) darling! Other people wonder why that machine is so different, so important, and why he ( or she ) is so affected by a lifeless thing! He ( or she ) is always busy with Web pages, e-mail and surfing! Family and friends consider him ( or her ) a fool! Everyone of this class has great dreams, and I'm just trying to help the new ones in this role here!


Search Engine Optimization ( SEO ) Techniques

Let's now discuss what extra can be done to tame the Web! Here I show you all the great tricks known to me.


  • Increase the number of your Web pages and the amount of text content, graphics, images and other rich media, where possible.
  • To plan keywords, open your document in a browser and note the words/phrases that repeat often. Use a nice, meaningful, long description, abstract and title ( and physical file name ) for your Web pages.
  • Sub-domains ( like sub.mysite.com ) are treated as separate Web sites by Search Engines ( one exception is Alexa ). The purpose of sub-domains is to organize content, and for that, URLs like mysite.com/sub/ are better than sub.mysite.com. If sub-domains are nevertheless necessary, link to most ( if not all ) of your sub-domains from many places in the Web pages of your main Web site ( mysite.com ).
  • Try to convert your dynamic URLs to static ( clean! ) ones. Though Search Engines are getting ( very much! ) better at indexing Web pages with querystring and session values, it is better not to load these into the URLs ( or URIs ); use clean URLs. As you've already read, some Web masters even make the extra effort to use static .htm or .html Web pages instead of static .php or .asp Web pages.
  • Try things like affiliates and link exchange. Make your Web site a popular place on the Web. Try things like a link directory, forums and blogs. As a precaution, value each member of your audience and don't let yourself drift away from the central purpose or theme of your Web site. For example, if you plan an e-commerce store or online business, you need ( rich! ) people who come and make purchases, not just any messy traffic. For that, expert marketing, product quality, proper shipping and trustworthy security really matter. I warn you, don't use excessive linking or advertising; if you do, the surfer will come, click something and go off to a better place. You should be able to sustain the interest of your audience. Original quality content is the key. Many people should want to add your Web pages to their bookmarks or favorites!
  • You may start a subscription newsletter. Please don't SPAM if you want your Web site to mature and last.
  • Your Web site should be user-friendly, reachable ( does your remote server go down often? ), cross-platform and cross-browser compatible. Ensure your Web pages load quickly. Many people may be using a dial-up telephone connection, and they simply won't wait for a heavy page to load. Does your Web site have a proper navigation structure? Some people say absolute links ( URLs ) such as http://www.mediasworks.org/tutorials/search_engine_optimization.htm are better than relative ones like /tutorials/search_engine_optimization.htm. My advice is to use absolute URLs occasionally, in a few places, while generally using relative URLs.
  • Lastly, submit your Web site to many Search Engines and directories. If you own a profitable business Web site, you may invest a part of your earnings in paid services. Further, you may attract an audience with promotional techniques like banners, hoardings, printed pamphlets or magazines, T-shirts, skirts, caps, goggles or similar items. Online games or a lottery can be considered. You may race to get exclusive publishing rights to something, and link it up. You can do many things. Your creativity! Your imagination! Your enterprise! Your fancy!
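
The clean-URL advice in the list above can be sketched in Python: a script receives a static-looking URL and works out the values that would otherwise travel in the querystring. This is only one possible scheme; the parameter names section and page, both function names and the www.example.com address are hypothetical.

```python
from urllib.parse import urlparse


def clean_url_to_query(url, names=("section", "page")):
    """Map the path segments of a clean URL onto hypothetical parameter names."""
    path = urlparse(url).path
    segments = [s for s in path.split("/") if s]
    if segments and "." in segments[-1]:
        # Drop the .htm/.html extension from the last segment.
        segments[-1] = segments[-1].rsplit(".", 1)[0]
    return dict(zip(names, segments))


def query_to_clean_url(params, names=("section", "page")):
    """Build the clean path that replaces ?section=...&page=... in links."""
    parts = [params[n] for n in names if n in params]
    return "/" + "/".join(parts) + ".htm"


url = "http://www.example.com/tutorials/search_engine_optimization.htm"
params = clean_url_to_query(url)
print(params)
print(query_to_clean_url(params))
```

The same mapping, done in the other direction when you write links, keeps querystrings out of the URLs that robots see.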


Check Your Time!

  • It takes anywhere from 2 to 8 weeks for a newly submitted Web site to be registered in Search Engines. Google reacts fastest. Its crawling activity, in visits and bandwidth, is also the highest. Yahoo! and MSN come later.
  • It takes 1 to 2 weeks to win back an annoyed Google or MSN. Yahoo! may take months or half a year.
  • It takes half a year to a full year to get a Web site well linked across the Web.
  • The time needed to achieve an Alexa traffic rank may run to years.


Discussion for this tutorial

Search Engine Optimization is a hot topic. Why? Simply because not every Web site can be at the top of the Web! Please spare me if the techniques ( or theory ) discussed here do not work ( or work differently ) for you and you conclude that what is said here is wrong. Good luck, Web master!


Errata for this tutorial

None




This tutorial ends here.
