|
Every Web master ( like me! ) craves to get his ( or her ) Web site listed at the top of all the Search Engines.
He ( or she ) dreams of huge traffic bursting into his ( or her ) Web pages. In this tutorial I will show you the tricks to achieve that long-sought goal.
Below, you see a list of some of the robots ( also called spiders, bots or crawlers ) that visit and index your Web pages.
You can assume the three big ones are: Googlebot ( Google ),
MSNBot ( MSN ),
Inktomi Slurp ( Yahoo! ).
Beyond doubt, Google is the one most likely used by people to search the Web. It's the essential Search Engine, the one a Web master thinks about most.
Still, people use other Search Engines ( or directories ) to get to your Web page.
Alexa (IA Archiver)
Ask
AskJeeves
Cfetch
Googlebot
Inktomi Slurp
Lycos
MSNBot
MSNBot-media
Netcraft
The web archive (IA Archiver)
Unknown robot (identified by 'bot/' or 'bot-')
Unknown robot (identified by 'crawl')
Unknown robot (identified by 'robot')
Unknown robot (identified by 'spider')
Unknown robot (identified by hit on 'robots.txt')
Voila
Walhello appie
WISENutbot
Yahoo Slurp
You need to submit your Web site to Search Engines ( or directories ) before they crawl and index it.
If you look at your Web site logs ( the Apache log files ), you can figure out which robots or spiders are crawling your Web pages, and when.
It takes a few weeks before they list your Web site. The top three agents appear in the logs under the names given above.
You see, there are Search Engines and directories that base their results on the top guns; e.g., Alexa results are based on Google, and AltaVista results are based on Yahoo!.
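Reading the logs by hand gets tiring, so here is a minimal Python sketch of how you might count robot visits. It assumes the Apache "combined" log format, where the User-Agent is the last quoted field. The marker strings, the function name count_bot_hits and the sample log lines are made up for illustration; check them against your own logs.

```python
from collections import Counter

# Substrings used to recognise crawlers in the User-Agent field.
# These markers are illustrative assumptions based on the robot names above.
BOT_MARKERS = {
    "googlebot": "Googlebot",
    "msnbot": "MSNBot",
    "slurp": "Inktomi Slurp",
}

def count_bot_hits(log_lines):
    """Count hits per known crawler in Apache combined-format log lines."""
    hits = Counter()
    for line in log_lines:
        # In the combined format the User-Agent is the last quoted field.
        parts = line.rstrip().rsplit('"', 2)
        if len(parts) < 3:
            continue  # not a combined-format line, skip it
        user_agent = parts[1].lower()
        for marker, name in BOT_MARKERS.items():
            if marker in user_agent:
                hits[name] += 1
                break
    return hits

# Made-up sample lines, only to show the call.
sample_lines = [
    '10.0.0.1 - - [10/Oct/2007:13:55:36 +0530] "GET / HTTP/1.1" 200 2326 "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"',
    '10.0.0.2 - - [10/Oct/2007:14:01:02 +0530] "GET /robots.txt HTTP/1.1" 200 48 "-" "msnbot/1.0 (+http://search.msn.com/msnbot.htm)"',
    '10.0.0.1 - - [10/Oct/2007:14:10:11 +0530] "GET /seo.htm HTTP/1.1" 200 9120 "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"',
    '10.0.0.3 - - [10/Oct/2007:14:12:40 +0530] "GET / HTTP/1.1" 200 2326 "-" "Mozilla/5.0 (Windows; U) Firefox/2.0"',
]

print(count_bot_hits(sample_lines))
```

Lines from ordinary browsers are simply ignored, so the result shows only how often each robot came by.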
HTML META tags
You need to write META tags in the HEAD section of your Web page to provide information about the page to the robots or spiders that visit it.
META tags and the HEAD section are nothing but parts of HTML. Do you know HTML? Below is a list of META tags with examples.
Note that not all META tags are essential, but they provide information to Web page crawlers.
Remember to at least put a suitable title on your Web page; the name ( the name of the physical HTML, PHP or ASP file ) and the title of a Web page are really very important.
<title>Search Engine Optimization ( SEO ) tutorial</title>
<meta content="Comprehensive tutorial on Search Engine Optimization ( SEO )." name="description">
<meta content="Comprehensive tutorial on Search Engine Optimization ( SEO )." name="abstract">
<meta content="search engine optimization, seo" name="keywords">
<meta content="index, follow, noarchive" name="robots">
<meta content="anshul shrivastava" name="author">
<meta content="anshul_rsn@yahoo.com" name="email">
<meta content="© 2007 – mediasworks Group, India and worldwide" name="copyright">
<meta content="7 days" name="revisit-after">
<meta content="en" http-equiv="content-language">
<meta content="text/html; charset=iso-8859-1" http-equiv="content-type">
<meta content="no" http-equiv="imagetoolbar">
description and abstract provide a general idea or overview of your Web site.
As in the example above, both of these META tags can be the same. The distinction ( if made ) is that description should immediately and clearly tell what the Web site is all about.
In fact, Search Engines ( or directories ) often show the description of a Web site when it is listed in search results, so that users ( surfers ) can quickly understand the content of the Web site they may be about to visit.
description and abstract can be a few phrases or a few lines long. Search Engines will automatically strip off whatever is too long; we need not care.
For example, Yahoo! shows more information ( or description ) for a Web page than Google does.
In a typical Web site there are groups of Web pages; pages within a group share the same description, which differs from that of other groups.
What I suggest is to put an effective and relevant description on all your Web pages. If you can divide your Web site content into several groups, change the description of each group to reflect what its Web pages are all about.
As said above, the description META tag of a Web page is next in importance to its name ( the name of the physical file ) and TITLE.
Next, you put the keywords META tag. It's just a list of words and phrases that occur in the Web page content.
Put in the ones that are repeated most. Note that what is said in the name and TITLE of the Web page need not be copied into keywords.
Google is an amazing Search Engine; it reads all the content down to the bottom of your Web page, including images and even Flash animations, and it notes the patterns of repetition.
Other Search Engines behave similarly. Just don't write fake keywords, as they won't help you. And I suggest you not put a long ( and unjustified ) list of words and phrases in keywords.
Next is the robots META tag. Unlike the keywords META tag, robots works in a straightforward way.
<meta content="index, follow, noarchive" name="robots"> tells the agent to index the Web page and follow the hyperlinks in it, but not to archive ( cache! ) the Web page.
If you don't want the hyperlinks of your Web page to be crawled, write <meta content="index, nofollow, noarchive" name="robots">. This will index only that Web page.
What is meant by a cached Web page? The Search Engine keeps its own copy of the page and offers it to users next to the live one. If you think your Web page will need no ( or only infrequent ) updates in future, you may allow the agent to archive it.
I'd say very few people click the cached version of a Web page; but if they do, your Web site will save some bandwidth!
Just an opinion: if a Web page is complete and requires no update for months or years, allow it to be archived with <meta content="index, follow, archive" name="robots">.
Some Web masters use <meta name="robot" content="all">.
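To see how an agent might read the robots META tag, here is a minimal Python sketch using the standard html.parser module. The class name RobotsMetaParser and the function robots_directives are hypothetical names chosen for this example; real Search Engine robots surely do much more than this.

```python
from html.parser import HTMLParser

class RobotsMetaParser(HTMLParser):
    """Collect the directives found in <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        # Attribute names arrive lowercased; values are compared case-insensitively.
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            self.directives.update(
                part.strip().lower() for part in content.split(",") if part.strip()
            )

def robots_directives(html):
    """Return the set of robots directives declared in the given HTML."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives

page = '<head><meta content="index, nofollow, noarchive" name="robots"></head>'
found = robots_directives(page)
print("may follow links:", "nofollow" not in found)
print("may archive:", "noarchive" not in found)
```

A page without any robots META tag gives an empty set, so both checks come out true for it.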
author and email tell about the author of the Web page and the author's contact email.
Remember, the content of a Web page may be the contribution of many people, or of someone who is not the Web master of that Web site.
It is just that author usually contains the name of the Web master or the coder who wrote its HTML.
I suggest you put these METAs in every Web page of your Web site.
I have seen, though rarely, Search Engines pay regard to these META tags!
The revisit-after META is important. Be careful if you add this META tag to your Web page.
Many times I've troubled myself by giving this tag a large value, like <meta content="14 days" name="revisit-after"> or <meta content="21 days" name="revisit-after">.
Also, Search Engines may or may not obey this tag. For example, I've seen Google obey it strictly, while MSNBot does not.
The point is that if you're putting enough time and hard work into updating and publishing your Web pages, you should ensure that robots crawl them regularly.
Many times it has happened that I've updated ( or published ) a good Web page and Google did not visit it for hours, for days! ..and I was checking the logs every few hours!!
Moreover, I made the gross mistake of putting in links to planned Web pages; Google hunted for them every day, got HTTP 404 errors, got annoyed after a few weeks and stopped visiting my Web site for weeks!
Do you get what I said? I'll tell more about that, please read on ..
<meta content="en" http-equiv="content-language"> tells the agent that the Web page content is in English.
Likewise, use en-US for the U.S. version of English, or de if the page is in German.
Suppose you write a Web page in Hindi: you use hi as the content value of this META tag; sa for Sanskrit and ur for Urdu.
It enables language-dependent pronunciation rules if a browser is reading the Web page aloud.
<meta content="text/html; charset=iso-8859-1" http-equiv="content-type"> specifies the character encoding of the Web page. ISO-8859-1 is a standard of the ISO ( International Organization for Standardization ) for 8-bit single-byte coded graphic character sets.
You need not worry about it ( just use the tag above ), at least if you're writing in English. Note that you can write <META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=ISO-8859-1"> or <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1"> or similar.
Just note that upper case, lower case or mixed case won't matter. All are valid. As said before, since META tags are not obligatory, they may be skipped. However, at least put in description. It's really significant.
<meta content="no" http-equiv="imagetoolbar"> prevents the image toolbar from popping up in the MSIE browser ( in Internet Explorer this happens for images of sufficient dimensions ).
Put it in if the Web page has images. Sometimes your users, and sometimes you, won't like such a pop-up! I've got into the habit of simply writing it on every Web page.
Thus, we've discussed most of the META tags that are used to specify metainformation ( or meta data ) about a Web page document. As said already, some of these are really useful for Search Engine Optimization.
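Since title and description matter most, you may want to check your own pages for them before a robot does. Below is a minimal Python sketch, again with the standard html.parser module. The names HeadInfoParser and missing_essentials are hypothetical, and the choice of "essentials" simply follows the advice in this tutorial.

```python
from html.parser import HTMLParser

class HeadInfoParser(HTMLParser):
    """Collect the title text and all META name / http-equiv values of a page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            key = attr.get("name") or attr.get("http-equiv")
            if key:
                self.meta[key.lower()] = attr.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data

def missing_essentials(html):
    """Return which of 'title' and 'description' are absent or empty."""
    parser = HeadInfoParser()
    parser.feed(html)
    parser.close()
    missing = []
    if not parser.title.strip():
        missing.append("title")
    if not parser.meta.get("description", "").strip():
        missing.append("description")
    return missing

good_page = (
    "<head><title>Search Engine Optimization ( SEO ) tutorial</title>"
    '<meta content="Comprehensive tutorial on Search Engine Optimization ( SEO )." '
    'name="description"></head>'
)
print(missing_essentials(good_page))
print(missing_essentials('<head><meta content="seo" name="keywords"></head>'))
```

An empty list means the page carries both; anything else names what you still have to write.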
There are more of these that we've skipped, like copyright, expires, refresh, date, distribution and PICS-Label. If you hold the exclusive copyright to all the content, why not write a copyright META?
refresh is an interesting META tag; I'm tempted to tell you about it briefly, though it has nothing to do with Search Engine Optimization. See and understand the examples below for META refresh. The number is the delay in seconds before the browser loads the given url.
<meta content="9; url=/webstories/qhelp.htm" http-equiv="refresh">
<meta content="3; url=http://www.mediasworks.org/" http-equiv="refresh">
The last thing I want to tell you about METAs ( if you want to learn more ) is to read the W3C Web site ( W3C stands for World Wide Web Consortium ) for HTML, or to download the documentation and read it offline.
robots.txt
If you've read all the above, you've seen terms like robots, spiders, crawlers or Web wanderers.
Do you know what they're all about? I'm not here to give you definitions of them or to draw subtle distinctions among them.
Please note that they are automated programs ( obviously, complicated major projects! ) that visit your Web pages.
Search Engines use robots to get data for their Web search results.
As you can guess, the useful data collected by robots is stored in terabytes ( ! ) of disk space ( or main memory? ).
You may ask what we're discussing. Well, when robots visit your Web site, they look for a file named robots.txt in the root public directory.
I suggest you put that file in that place, so that Search Engines don't get a 404 File Not Found HTTP error.
Below you see suitable contents for the file robots.txt that you can use.
User-Agent: *
Allow: /
Disallow: /cgi-bin
Can you understand the above? Well, it allows all spiders ( or robot programs ) to crawl the ( public ) / directory and all the directories in it, except cgi-bin.
So, the file uses three terms, each of which may appear zero, one or more times. ( Some Web masters just put an empty, 0-byte robots.txt to prevent the HTTP 404 error; an empty file lets robots crawl everything. )
User-Agent names the robot that the following lines apply to; * stands for all robots. Allow names a directory that the User-Agent of the previous line is allowed to visit.
You see, you put a directory path or a file path in Allow. The last one, Disallow, is simply the inverse of Allow. Example values for User-Agent are the robot names I listed at the start of this tutorial.
For complete details, I suggest you visit http://www.robotstxt.org/ and related sites. Also see the documentation provided by Search Engines and directories.
Why would you keep out ( Disallow ) some robots? Are robots harmful? Do robots always obey robots.txt and the META tags? These are interesting questions.
You should guess the answers to these questions yourself ( powered by your curiosity and imagination! ).
I'd say there is no clear answer to these questions, and you would have to delve into a lot of documentation to find the answers, if you want. However, I don't see a strong reason for such research.
As for me, I allow all robots to crawl my Web pages. Yes, some files need to be kept from being indexed by Search Engines, like the scripts in the cgi-bin directory, which are not meant for the public.
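If you want to test your robots.txt rules before a robot reads them, Python ships a small parser in its standard library, urllib.robotparser. The sketch below is only an illustration under some assumptions: the helper name make_parser and the file paths are made up, and the Disallow line is placed before Allow because this particular parser applies the first rule that matches a path. Real Search Engine robots may resolve the two lines differently.

```python
from urllib.robotparser import RobotFileParser

# Disallow is listed before Allow here because this parser applies
# the first rule that matches the requested path.
ROBOTS_TXT = """\
User-Agent: *
Disallow: /cgi-bin
Allow: /
"""

def make_parser(text):
    """Build a parser from robots.txt text without fetching anything."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser

rules = make_parser(ROBOTS_TXT)

# Hypothetical paths, only to show the check.
print(rules.can_fetch("Googlebot", "/index.htm"))
print(rules.can_fetch("Googlebot", "/cgi-bin/form.pl"))

# An empty robots.txt has no rules at all, so nothing is kept out.
print(make_parser("").can_fetch("Googlebot", "/cgi-bin/form.pl"))
```

The first check should come out allowed and the second refused, while the empty file refuses nothing.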
How do Search Engines work?
As you ( now! ) know, Search Engines use sophisticated automated programs to gather information about your Web site and Web pages.
These programs in turn use complicated algorithms to estimate, index and rank your Web pages. Note that this is machine intelligence.
Imagine that a robot reads the HTML of the requested Web page and collects the anchor href links and the image src links.
Then it visits ( queries ) all these links and adds them to its knowledge base. Search Engines crawl all the files, like .htm, .html, .php, .txt, .css, .js, .gif, .png, .jpg, .jpeg, .mp3, .zip and .rar, that are linked from the Web page. Bots don't access the physical file system.
Bots can't request the files in the list above directly if hotlinking is prevented ( this causes the 403 status code, Forbidden ).
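The link-collecting step imagined above can be sketched in a few lines of Python. This is only a toy under stated assumptions: the class name LinkCollector, the function collect_links and the example.org address are made up, and a real robot also handles many more tags, errors and politeness rules.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkCollector(HTMLParser):
    """Collect anchor href and image src links as absolute URLs."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "a" and attr.get("href"):
            self.links.append(urljoin(self.base_url, attr["href"]))
        elif tag == "img" and attr.get("src"):
            self.links.append(urljoin(self.base_url, attr["src"]))

def collect_links(html, base_url):
    """Return every link a robot would find in the page, in document order."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector.links

# Hypothetical page and address, only to show the call.
page = '<a href="/webstories/qhelp.htm">Help</a> <img src="logo.png">'
for link in collect_links(page, "http://www.example.org/seo/"):
    print(link)
```

Note that the robot only ever sees URLs found in the HTML; it never looks into the directories on your disk.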
What do Search Engines dislike?
200 No Error ( success )
206 Partial Content
301 Moved permanently ( redirect )
302 Moved temporarily ( redirect )
303 See other document ( resource replaced )
304 Not modified ( document not modified )
400 Bad request
401 Unauthorized ( HTTP authentication required )
403 Forbidden
404 Document Not Found
408 Request Timeout
410 Gone ( resource removed )
500 Internal Server Error
503 Service Unavailable ( Server busy )
Your ISP ( Internet Service Provider ) or hosting company should take care of 503.
Don't use scripts ( or change the server configuration ) in ways that cause 500.
Don't link to planned, non-existent Web pages; prevent 404.
Next, 301, 302 and 303 are also disliked by Search Engines.
There is little sense in 302. If you have renamed or moved a document, you may notify robots that the original resource is now known by a new name and/or location, using 301 or 303.
Do this temporarily, say for a week or two; the point is to keep it until the resource gets registered under its new name and/or location in Search Engines.
In the case of 303 ( resource replaced ), since the original document is then unavailable, let the server say 410 to Search Engines.
The server should also say 410 for all deleted files and for moved files ( except those answered with 301 ). This can be done easily using the httpd.conf or .htaccess file.
If you don't understand what is said in the lines above, reread them and please see this tutorial.
If you see 206 in your logs, recheck your ( incomplete! ) Web page. In short, eliminate 404 and 206 plus all redirects.
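To find the status codes worth eliminating, you can let a small script go through the logs for you. The Python sketch below assumes Apache common or combined log lines, where the status code follows the quoted request. The set of unwanted codes simply follows the advice above; the function names and the sample lines are made up for illustration.

```python
from collections import Counter

# Codes this tutorial suggests eliminating: 404, 206 and the redirects.
REDIRECTS = {301, 302, 303}
UNWANTED = {404, 206} | REDIRECTS

def status_of(log_line):
    """Return the HTTP status code of an Apache log line, or None if not found."""
    parts = log_line.split('"')
    if len(parts) < 3:
        return None
    # The text right after the quoted request starts with the status code.
    fields = parts[2].split()
    if fields and fields[0].isdigit():
        return int(fields[0])
    return None

def unwanted_statuses(log_lines):
    """Count how often each unwanted status code was served."""
    counts = Counter()
    for line in log_lines:
        code = status_of(line)
        if code in UNWANTED:
            counts[code] += 1
    return counts

# Made-up sample lines, only to show the call.
sample_lines = [
    '10.0.0.1 - - [10/Oct/2007:13:55:36 +0530] "GET / HTTP/1.1" 200 2326 "-" "Googlebot/2.1"',
    '10.0.0.1 - - [10/Oct/2007:13:56:01 +0530] "GET /planned.htm HTTP/1.1" 404 512 "-" "Googlebot/2.1"',
    '10.0.0.2 - - [10/Oct/2007:13:57:10 +0530] "GET /old.htm HTTP/1.1" 301 310 "-" "msnbot/1.0"',
    '10.0.0.1 - - [10/Oct/2007:13:58:44 +0530] "GET /planned2.htm HTTP/1.1" 404 512 "-" "Googlebot/2.1"',
]

print(unwanted_statuses(sample_lines))
```

Successful 200 responses are left out of the count, so whatever shows up is a candidate for cleaning.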
Biased promotion
Google ( and maybe other Search Engines too ) reads everything in your Web page.
Some Web masters put text in an invisible color ( i.e., the background color ) and/or in a small font size at the top or bottom of a Web page to pass that information to Search Engines.
The results may even be welcoming! There are other means of unfair promotion also. If you want to give away your reputation, you can use such methods.
Google
Google is important, as a surfer is most likely to use it. I've seen Google robots show the most activity ( in both visits and bandwidth ) in log files.
More people come to your Web site through Google than through other Search Engines or directories.
Google even indexes the text content in Flash files! ( How! )
So ensure that Google is regularly visiting your Web site and updating its information about it.
Yahoo!
Yahoo! is the oldest of the class, and it is considered easier than Google. Note that the listings of different Search Engines may vary considerably.
It is possible to maintain top listings in Yahoo! if you target it, more so than in Google, as listings in the latter are so volatile! That's the contradiction. Personally, I've found it difficult to estimate how Yahoo! arrives at its listings.
The behaviour and search results of Yahoo! are very different ( and controversial! ) from those of the other toppers. In fact, Yahoo! is a Web directory for categorical searching and advertisement ideas.
You see, other juniors in the business also inherited that revolutionary Yellow Pages idea. Google is the one that took up the mega challenge of indexing the Web. It's like measuring the whole earth; that's the enterprise, that's the success!
But one clear thing I noticed is that Yahoo! values quality of content, and it may remove your Web site completely ( and harshly! ). Does it also use human editors or Web site reviewers ( Web site critics! )? Yes! Good luck with ( free? ) Yahoo!
MSN
I'd say MSN, now Live Search, is an easy one. In the beginning I was not happy with MSN, as it was not visiting my Web pages.
With time its activity increased, at one point even beyond Google's! I checked the keywords ( phrases ) and found it showing considerably higher listings than Google.
Surprised! One thing I made out is that MSN perhaps reads mostly the title while Google reads everything. I'm not saying here that MSN does not read the Web page. Surely it does, but perhaps it gives more importance to the title than to the Web page content, as compared to Google?
The number of links in MSN is always higher than in Google.
This is so despite Google's supremacy; many Web masters complain about the number of links to their Web sites shown in Google.
Alexa
Alexa claims to measure the traffic rankings of Web sites. Though Alexa results are based on Google results, it's really difficult for new Web masters to get a traffic rank for their Web site in Alexa.
Alexa pulls out many other sharp details about a Web site ( a non-human Web critic! ), such as speed, reach, page views, user reviews, a traffic graph, comparison with other Web sites, other Web sites linking to yours and so on.
It's a pleasure, and considered an achievement, to get a position in the Alexa traffic rank. Note that Alexa's research is based only on the Alexa Toolbar, Internet Explorer and Windows, and so may not be representative.
Many people use Firefox! Okay! Submit your Web site to Alexa and dream of getting a traffic rank there. However, as you know, Google is the most important, if not everything.
Re-submission of your Web site
If you find little activity from Google ( and other Search Engines ) even though you're adding Web pages to your Web site, you're in trouble.
In such cases, update your Web site, check the content ( quality ) and resubmit your Web site.
Interestingly, I once resubmitted my Web site to Google as http://www.mediasworks.org/ ( previously I had submitted it as http://mediasworks.org/ ).
A few days later I was disheartened when my Web site was no longer listed for the keywords it used to be listed for.
However, within one week of the re-submission I found it there again, with 'www' added to the url.
So I think re-submission is not a problem. The principal reasons for re-submission are that your Web site has changed very much in content or metainformation, that you have added content, or that Search Engine activity is low and you need a fresh flush of Search Engine visits to your Web pages.
Avoid unnecessary ( and frequent ) resubmission.
Link popularity
Is your Web site popular? How do you estimate popularity? How many Web sites link to your Web pages?
How many Web pages ( in 100s ) do you publish? Do your Web pages have enough text content, images and other media?
What about traffic ( in 1000s )? If you have positive answers to these questions, yes! your Web site is a popular place on the WWW ( World Wide Web ).
Page Rank ( PR ) value
Search Engines provide toolbars that install in the browser, mostly MSIE, and make searching and browsing comfortable for the surfer.
In fact, Alexa uses the data sent from its toolbar to estimate traffic ranks. These toolbars also display the PR value, 0 to 10, of a Web page.
Page Rank is the importance of a Web page, estimated differently by different Search Engines.
Google PR is popular and universal. The higher the PR value, the more important the Web page is. A PR value of 0 or 1 shows that a Web page is of little importance. A PR value of 2 or 3 means the Search Engine considers that something is there!
A PR value of 4 means the Web page is important. PR 5 is a good one. PR 6 or 7 is an achievement. A still higher PR is a very, very difficult ( herculean! ) task.
Apart from the toolbars provided by the top Search Engines like Google, Yahoo!, MSN and Alexa, many Web sites help to calculate the PR value; they ask for the URL of a Web page and output the result for it. Do this if the toolbar does not install in your browser because you're using something other than MSIE.
Life as a Web master!
A Web master is a busy creature, whose closest friend is his ( or her ) machine! In fact, it is his ( or her ) darling!
Other people wonder why his ( or her ) machine is so different, so important, and why he ( or she ) is so affected by a dead thing!
He ( or she ) is busy with his ( or her ) Web pages, e-mail and surfing all the time! His ( or her ) family and friends consider him ( or her ) a fool!
Everyone of this class has great dreams, and I'm just trying to help the new ones in this role here!
Search Engine Optimization ( SEO ) Techniques
Now let's discuss what extra can be done to tame the Web!
Please believe that here I show you all the great tricks.
Check Your Time!
Discussion for this tutorial
Search Engine Optimization is a hot topic. Why? Simply because not every Web site is at the top of the Web!
Please spare me if the techniques ( or theory ) discussed here don't work ( or work differently ) for you and you conclude that what is said is wrong. Good luck, Web master!
Errata for this tutorial
None
User comments/feedback for this tutorial
None
This tutorial ends here.
|