2.3.4 Make a robots.txt file


Like a traveler arriving in a new country, the Robot needs a guide, and that guide is the robots.txt file. It's a specific kind of guide, though: it only tells the Robot which cities he shouldn't see.

Like, if you're at the crossroads and one of the roads leads to a private place you shouldn't visit, there will be a NO TRESPASSING sign on the way.

No Trespassing Sign

A robots.txt file keeps the Robot away from pages with sensitive material, pages that you don't want to be found through Google search (for instance, the "shopping cart"), and pages that are unimportant or could hurt your rankings. That way, the Robot spends its visit on your other, keyword-rich pages instead.

So, if there's something to hide, a robots.txt file is a must for your website. It helps you keep the Robot away from anything that's not good for your Search Engine rankings. Yep, just tell him not to go here or there — and he'll believe you.

You can make a robots.txt file yourself, though it's usually the webmaster's job. So either ask your webmaster to write one for your site, or do it yourself.

Remember, if you rewrote the dynamic URLs as we discussed above, use robots.txt to forbid the old URLs like this one:
http://www.myshop.com/showgood.php?category=56&good=54146
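
A minimal sketch of such a rule, checked with Python's standard urllib.robotparser module. The Disallow path comes from the example URL above; the rewritten URL is a made-up assumption, since your own rewrite rules decide what the new addresses look like.

```python
from urllib.robotparser import RobotFileParser

# robots.txt content: ask all robots to stay away from the old dynamic script.
ROBOTS_TXT = """\
User-agent: *
Disallow: /showgood.php
"""

def load_rules(text):
    """Parse robots.txt text into a rule set that can be queried."""
    rules = RobotFileParser()
    rules.parse(text.splitlines())
    return rules

rules = load_rules(ROBOTS_TXT)

old_url = "http://www.myshop.com/showgood.php?category=56&good=54146"
new_url = "http://www.myshop.com/goods/54146.html"  # hypothetical rewritten URL

old_allowed = rules.can_fetch("Googlebot", old_url)
new_allowed = rules.can_fetch("Googlebot", new_url)
print("old URL allowed:", old_allowed)
print("new URL allowed:", new_allowed)
```

Because Disallow matches by prefix, one line covers every category and good number served by the old script.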

After you have the robots.txt file, run it through a validator to ensure it's written correctly.
Hundreds of robots.txt validators can be found on the web. You can use Google's tool, or this one, for instance:
http://www.invision-graphics.com/robotstxt_validator.html

Once the robots.txt file is correct, you needn't worry: it will do you a lot of good and no harm.

DO IT NOW! Make a robots.txt file and validate it. Add it to the root directory of your website.

2.3.5 Make different paths to reach a page

Think of our map and the Robot driver again. If you can get to a city from two other places, chances are good that the Robot will get there. But if 6 roads from 6 different places lead to one city, the chances are much better.

Road Map
Different Roads Leading to the Same Place

So let's make as many paths as possible.

When you create a page, try to make links to other pages from it, where possible.

DO IT NOW! Make different pages on your website link to each other.
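
To see which pages have too few roads leading to them, you can count inbound internal links. The sketch below is only an illustration: the page names and the link map are invented, and the threshold of two inbound links is an arbitrary assumption.

```python
from collections import Counter

# Hypothetical site: each page and the internal pages it links to.
SITE_LINKS = {
    "/": ["/products", "/about", "/contact"],
    "/products": ["/", "/products/tables", "/contact"],
    "/products/tables": ["/products"],
    "/about": ["/"],
    "/contact": ["/"],
    "/old-promo": [],  # no page links here
}

def inbound_link_counts(site_links):
    """Count how many different pages link to each page."""
    counts = Counter({page: 0 for page in site_links})
    for source, targets in site_links.items():
        for target in set(targets):
            if target != source:  # a page linking to itself is not a new road
                counts[target] += 1
    return dict(counts)

counts = inbound_link_counts(SITE_LINKS)

# Pages reachable from fewer than two other pages (assumed threshold).
hard_to_reach = sorted(page for page, n in counts.items() if n < 2)
print(hard_to_reach)
```

Pages that show up in the list are the ones worth linking to from a few more places.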

2.3.6 Fix broken links

Do you know what a broken link is, and how bad it can be? Well, I'll tell you.

About a month ago, I was driving to Walker River, NV. I just wanted to see the place, as my granddad came from there. I only had a 20-year-old map, and it turned out to be too old: on the route I chose, the bridge had been out for a long time. I was a bit disappointed and had to take another road.

But what if it were the Robot at the broken bridge instead of me? He would try to find another way, but he's not that determined to visit all your pages. When he hits a broken link, the Robot may simply leave the page uncrawled.

And what is actually a broken link?

A broken link is a link whose HTML code has incorrect or missing elements, or a link that leads to a web page that doesn't exist.

Now what you have to do is check your pages for broken links — and fix them.

DO IT NOW! Find and fix broken links on your web pages. Upload the changed pages to your web server.

Here's a free online tool to help you:
http://validator.w3.org/checklink

Like any other tool you might find on the web, this one has instructions and tells you where those links are and what you have to do to fix them.
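
Both kinds of broken link can be illustrated with a small offline check. This is a rough sketch, not how the tool above works: it only looks at internal links (those starting with "/"), and the HTML snippet and the list of existing pages are invented for the example. A real checker requests each URL over the network.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag (None when href is missing)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.hrefs.append(dict(attrs).get("href"))

def find_broken_links(html, existing_pages):
    """Return (target, reason) pairs for links that look broken."""
    collector = LinkCollector()
    collector.feed(html)
    broken = []
    for href in collector.hrefs:
        if not href:
            broken.append(("(missing href)", "link has no target"))
        elif href.startswith("/") and href not in existing_pages:
            broken.append((href, "page does not exist"))
    return broken

# Hypothetical page content and site inventory.
PAGE_HTML = """
<a href="/products.html">Products</a>
<a href="/prodcuts.html">Products (mistyped)</a>
<a>Contact us</a>
"""
EXISTING_PAGES = {"/index.html", "/products.html", "/contact.html"}

broken = find_broken_links(PAGE_HTML, EXISTING_PAGES)
for target, reason in broken:
    print(target, "->", reason)
```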

50 comments


2008-08-07 17:28:23: Wade Weston

About the robots.txt file: is it better to do that or to use nofollow links? Or does it matter? Thanks

Answer
2008-08-08 09:19:28: Dan Richmond

Hi Wade,

there's a big difference here: robots.txt prevents your page from being crawled and indexed by search engines, and with robots.txt there's no way your page can appear in Google's index.

As for nofollow, it only tells search engines not to follow that one particular link. They may reach the page through another link you never even suspected, and index it.

Therefore, it's definitely better to make a robots.txt file.

Answer
2008-10-03 11:16:23: Sascha Hillmann

I just did a mod_rewrite. Isn't it better to 301-redirect the old URLs? Why do you prefer using robots.txt in this case?

Thanks!

Answer
2008-10-06 11:45:07: Dan Richmond

Hi Sascha,

seems like there's some misunderstanding here...

Mod_rewrite and robots.txt are totally different. In fact, mod_rewrite is just one of the ways to set up a 301 redirect, and that's done in .htaccess, not in robots.txt.

You can google for more info about mod_rewrite or check this page for example:
http://www.phatz.com/301redirect.php
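
To make the idea of a 301 redirect concrete, here is a small Python sketch of the mapping such a redirect performs. On an Apache server this logic would normally live in .htaccess rewrite rules; the function name and the new URL pattern below are assumptions made up for the example.

```python
from urllib.parse import urlsplit, parse_qs

def redirect_for(old_url):
    """Return (status, location) for an old dynamic URL, or None if no redirect applies."""
    parts = urlsplit(old_url)
    if parts.path != "/showgood.php":
        return None
    query = parse_qs(parts.query)
    category = query.get("category", [None])[0]
    good = query.get("good", [None])[0]
    if not (category and good and category.isdigit() and good.isdigit()):
        return None
    # Hypothetical new URL scheme; use whatever your own rewrite rules produce.
    return 301, f"/category-{category}/good-{good}.html"

moved = redirect_for("http://www.myshop.com/showgood.php?category=56&good=54146")
untouched = redirect_for("http://www.myshop.com/about.html")
print(moved)
print(untouched)
```

Status 301 means "moved permanently", so a visitor or robot asking for the old address is sent on to the new one.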

Answer
2009-01-30 23:12:21: n c

Hi Dan,

thanks for the great info.

Somewhere in Google's guidance I read something like: internal links should not be set up so that "every page links to every other page". This seems to contradict the advice here?

Answer
2009-02-02 10:11:29: Dan Richmond

@n c

Hi,
and thanks for the question.

Like any other recommendation, the one I give here ("Make different pages on your website link to each other") should be applied smartly.

The first reason why it's good to make links from pages A, B of your site to pages C, D is to speed up indexing of pages C, D when you just created them.

Secondly, links from the-same-site's pages are one of the factors that help Google determine which page(s) of your site are more important compared to the rest. For example, if B is your most important landing page, and C is just the shopping cart, you will hardly want Google to show page C (shopping cart) higher than B (landing page). To avoid the problem, you should make a lot of internal links to page B and fewer links to page C.

Also, if you frequently update content on some pages, say on page D, you most likely need search engine robots to crawl this page regularly and index the new content. If you have a lot of internal links to page D, Google will assume it's more important than the rest, and will "re-index" it more frequently.

So it makes good sense to interlink your most important pages and to link to them from the other pages you have. This will give these pages more weight with Google, and it will make Google crawl them faster and/or more often.

Hope this makes sense now.

Answer
2009-04-03 16:01:51: Investigator Jobs

I'm getting so many good tips. Thanks, Dan and the others who are providing the great info.

A question about which is better:

Should I use a robots.txt file and robots meta tags together?
Or should I use only one instead of the other? If so, which one, and why?

Any insight would be greatly appreciated.

Thank you

Andrew

Answer
2009-04-06 10:33:04: Dan Richmond

@Andrew Collins

To the search engines it doesn't really matter whether you use a robots.txt or robots Meta Tags because they basically perform the same function. From the point of view of usability a robots.txt file is much easier to set up and maintain because you have all the data in one place and don't need to configure each of your pages separately. Therefore I'd recommend using a robots.txt file.

Answer
2009-09-14 10:54:55: Oscar Del Santo

Thank you Dan for putting together this great resource, I feel as if I am learning so much useful stuff. Cheers.

Answer
2009-10-13 08:07:16: vichitra singh

hi dan,
http://www.robotstxt.org this link is not working properly

Answer
2009-10-15 16:54:55: Jivko Georgiev

Google says that they don't read robots.txt ?

Answer
2009-10-16 09:12:32: Dan Richmond

Jivko Georgiev
Sorry if I'm not getting you right, but if you follow the link you'll see what Google says.

http://www.google.com/support/webmasters/bin/answer.py?hl=en&answer=156449

"Google won't crawl or index the content of pages blocked by robots.txt"

Answer
2009-10-23 03:08:25: Watch Naruto Online

Broken links can sometimes damage a website's reputation. Maybe certain tools can fix this problem.

But I still need help from my visitors. I ask them to report any broken link they find on my website so that I can fix it as soon as possible.

What I did on my website was add a report form so that visitors can help us solve broken-link problems.

They are kind and they will help us.

Answer
2009-12-08 04:11:33: Ken Taylor

This is getting better all the time. I just want to make sure I've got this right. If I have a 7-page website. Homepage based on KW-1, p-2 on KW-2, p-3 on KW-3, p4 on KW-4, p5 on KW-5, p6 on About, p7 on Sitemap; then I would make sure that "About" and "Sitemap" pages get blocked.

And,

All pages should have navigation links to all other pages. There's a navigation menu in the side bar.

A question: Have you ever heard or seen a meta-tag that gives partial value to a page? The reason I ask is that my 'Sitemap' plugin for RapidWeaver lets me set up different values for my webpages. Does Google care about values? I've been trying to find an answer to this in the RW forums.

Answer
2009-12-09 11:22:34: Dan Richmond

@Ken Taylor

> then I would make sure that "About" and "Sitemap" pages get blocked.

There's no point "blocking" them.

> Have you ever heard or seen a meta-tag that gives partial value to a page? The reason I ask is that my 'Sitemap' plugin for RapidWeaver lets me set up different values for my webpages. Does Google care about values? I've been trying to find an answer to this in the RW forums.

You're more likely to find an answer in Google Webmaster Tools.

Answer
2009-12-13 19:10:08: Ken Taylor

@Dan Richmond

Thanks Dan. In fact, they're not. My mistake in posting. Sorry.

I've also just spent over an hour searching for some more insight about "partial value to a page".

First, I'm using the wrong terminology, it's "Priority". With my Sitemap plug-in I can assign different "priorities" to different pages. Here's what I've learned and would like to pass on for what it's worth:

1. Priorities range from 0.0 to 1.0, with 0.5 being the default (which I infer is what Google assumes for all pages it has indexed with no priority set)

2. Our most important pages should have a priority of 1. This doesn't mean that the web page you give a "priority 1" is going to rank higher in the SE, it just means that "relative" to all other pages on your site, "Please Dear Mr. Google, give my Priority 1 pages more scan and indexing time."

3. Priority is a tag in the sitemap file (<priority>), not a meta-tag on the page

4. An example: My main page is priority 1. My "About," "Contact," pages are Priority 0.5
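
For illustration, here is a short Python sketch that writes such priorities into an XML sitemap. In the Sitemaps protocol, priority is a <priority> element inside each <url> entry of the sitemap file rather than a tag on the page itself. The example.com addresses are placeholders, and the values simply mirror the example above.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

# Hypothetical pages with the priorities from the example above.
PAGES = [
    ("http://www.example.com/", "1.0"),
    ("http://www.example.com/about.html", "0.5"),
    ("http://www.example.com/contact.html", "0.5"),
]

def build_sitemap(pages):
    """Build a sitemap string with one <url> entry per page."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, priority in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "priority").text = priority
    return ET.tostring(urlset, encoding="unicode")

sitemap_xml = build_sitemap(PAGES)
print(sitemap_xml)
```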

And a closing note about the robots.txt file: if you want Google to crawl "every pixel of your website", everything, then you don't need a robots.txt.

You can kind of infer that from Dan's lesson above but it was nice to hear it from the "Big G" itself.

Here's the link. It's the second paragraph.

http://www.google.com/support/webmasters/bin/answer.py?answer=156449&cbid=1p87nufo7j7jz&src=cb&lev=answer


Answer
2009-12-21 10:44:10: Dan Richmond

@Ken Taylor

Thanks, Ken. I guess anyone will find the info about priority useful, as well as Google's comments on robots.txt.

Answer
2009-12-30 05:34:31: Paul Watchorn

That was another great lesson.
I have a couple of questions:

I followed the link to the validator, ran the check, and this was the result:

Found 66 anchors.

Valid anchors!

Checked 1 document in 31.22 seconds
I take it that this means that I don't have any broken links?

The other thing is, I have heard about W3C-compliant web sites, and I am not sure what that means exactly.

I use a WYSIWYG web site making program, and the latest upgrade claimed to be W3C compliant, so I hope I am OK with it. Is there a quick article that I can read about this W3C?

Thanks

Answer
2010-01-05 06:09:42: Dan Richmond

@Paul Watchorn

Yes, I guess the check means no errors. As for the w3c validation, I see here that you already figured it out:
http://www.seoinpractice.com/crawlability-flash.html#3_10

Answer
2010-01-27 16:30:59: Raleigh Makarechian

Hello,

Would you confirm that this robots.txt file is good.

# robots.txt generated at http://www.mcanerin.com
User-agent: *
Disallow: /cgi-bin/
Sitemap: http://www.technicalanalysismentor.com/sitemap.xml


I have no idea what elements it should contain. I want all robots to visit.

Thank you.

Raleigh
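
One way to see what the file in this comment actually tells robots is to load it into Python's standard robots.txt parser. This is only a sketch: the /cgi-bin/form.pl path is a made-up example, and site_maps() needs Python 3.8 or newer.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
# robots.txt generated at http://www.mcanerin.com
User-agent: *
Disallow: /cgi-bin/
Sitemap: http://www.technicalanalysismentor.com/sitemap.xml
"""

rules = RobotFileParser()
rules.parse(ROBOTS_TXT.splitlines())

site = "http://www.technicalanalysismentor.com"
home_allowed = rules.can_fetch("Googlebot", site + "/")
cgi_allowed = rules.can_fetch("Googlebot", site + "/cgi-bin/form.pl")  # hypothetical script
sitemaps = rules.site_maps()  # list of Sitemap: lines, or None if there are none

print("home page allowed:", home_allowed)
print("cgi-bin allowed:", cgi_allowed)
print("sitemaps:", sitemaps)
```

Everything outside /cgi-bin/ stays open to robots, and the Sitemap line is read independently of the User-agent group.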

Answer
2010-01-28 12:13:52: Dan Richmond

@ Raleigh Makarechian

Firstly, you do not need a robots.txt file if you are not going to restrict any pages in it.

Secondly, the sitemap isn't working right. Looks like you need to remove this line from the sitemap:

<?xml-stylesheet type="text/xsl" href="http://technicalanalysismentor.com/wp-content/plugins/google-sitemap-generator/sitemap.xsl"?>

Answer
2010-01-28 14:05:58: Raleigh Makarechian

Hello Dan,

Thank you so much for your help. I removed the robots.txt file and deleted that line of code as you instructed.

I am really enjoying this course. Thank you for putting it together and lending your time to help people with SEO.

Raleigh :)

Answer
2010-02-02 21:05:54: mike kikker

So should I use robots.txt to disallow bots from my "about" page, my "privacy" page, and pages like these that aren't related to my keywords? If so, should I ALSO nofollow the links to these pages to direct the "link juice" to the other, more important internal pages of my site?

Answer
2010-03-30 08:55:38: sharon phillips

Hi Dan, I went on to the validator site to check my webpages and got the message: status 500 cant connect to uniquewildwoodfurniture.co.uk:80(connect:timeout)
This is a server side problem, check the URL, could this be because I only have a dongle and the connection is slow?

Answer
2010-04-02 11:25:42: Dan Richmond

Sharon,

I looked at your website and couldn't find anything that might cause such a problem. I am sorry to admit, but I will not be of any help here :(((

Answer
2010-04-02 11:37:26: sharon phillips

Lol, thanks for trying anyway. I also have a small issue about changing my landing page. We make bespoke free-form, natural-edge furniture. The keywords I have looked at are "handcrafted", "solid wood" and "coffee table designs". At the moment we rank on the first page for "unique", "unusual", "free form" and "natural edge". The auditor is telling me to change my web name. Obviously this is a small niche market, so it will not attract that many people unless they are looking for our particular product. Should I change my whole website to get higher traffic?

Thanks sharon

Answer
2010-04-07 11:03:18: Dan Richmond

What do you mean by "web name"? If this is the page's title, then there might be some changes needed for your current title. Once you've figured out what your most important keywords are, you should place them closer to the beginning of the title. So if you want to rank for "wood furniture", there's no point writing it in CAPS; just put it at the beginning of the title.

Answer
2010-04-04 21:17:13: Derek Giesbrecht

Hi Dan and others gathered here. First thanks for putting this course together, it is a really big help.

Regarding the robots.txt files... I do not have a robots.txt file and am using joomla with a members area secured by a user login/password. I would assume that even though google may crawl these pages - those not registered will not have access.

They will however still find my site and hopefully be interested enough to follow the menu options on the restricted page and see what else we have to offer.

Is this a fair assumption?

By the way, let me say the SEO Power Tools Suite has already been a great asset to my business.

Thanks Derek

Answer
2010-04-07 12:41:25: Dan Richmond

Yes, you are right.
And I'm glad that the recommended software has been helpful!

Answer
2010-06-05 18:49:19: Nina Spring

Hi Dan,

Recently I’ve launched five bilingual sites.

Three days ago I was exploring Google Analytics and realised that one of my sites was blocked because I didn't have a robots.txt file. What can I say: I learned as fast as I could, created and submitted the file for all the other sites, and started to watch the crawler go through them.

For two days it has been going through my Russian pages: not .com or .co.uk, only the _ru ones.

Despite the fact that I have HTML/CSS sitemaps, the crawler still can't see them.

So far, 15 pages out of 98 have been checked without problems (11 of them _ru), and I see the meta name verification in the files matching the Diagnostics info 1:1, but the Blocked URL report is still there.

What should I do to improve the situation?

Thanks,

Nina

Answer
2010-06-10 07:47:06: Dan Richmond

What does the "blocked URL report" say? Are you sure it was blocked because of the absence of a robots.txt file (seems unlikely)?

You should try to create an XML sitemap and submit it to Google Webmaster Tools. That should help your site get indexed. Also put some links to your English pages in frequently crawled locations (high-traffic social media).

Answer
2010-08-16 23:21:12: Gerhard Bayer

Dan,
Some of our competitors use the following meta tags:
<meta content="index, follow" name="robots">
<meta content="index, follow" name="googlebot">
Can you explain what it means and would you recommend this?
Thanks.

Answer
2010-08-18 07:56:16: Dan Richmond

A "noindex, nofollow" tag would make sense, but the ones you have mentioned are not really useful. Basically, they say a page can be indexed, although any page can be indexed as long as there is no restriction. There is no point in adding these to your site; most probably they are default tags of some CMS your competition is using.
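
The point can be checked with a small Python sketch that reads the robots meta tags of a page and reports what they permit. The function and class names are invented for this example, and the rule that a missing directive means "allowed" is the usual default rather than anything specific to this course.

```python
from html.parser import HTMLParser

class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots"> and <meta name="googlebot"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        if (attrs.get("name") or "").lower() in ("robots", "googlebot"):
            content = attrs.get("content") or ""
            for part in content.split(","):
                if part.strip():
                    self.directives.add(part.strip().lower())

def indexing_policy(html):
    """Indexing and link following are allowed unless a tag forbids them."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return {
        "index": "noindex" not in parser.directives,
        "follow": "nofollow" not in parser.directives,
    }

with_tag = indexing_policy('<meta content="index, follow" name="robots">')
without_tag = indexing_policy("<title>No robots tag at all</title>")
blocked = indexing_policy('<meta content="noindex, nofollow" name="robots">')

print(with_tag == without_tag)  # the "index, follow" tag changes nothing
print(blocked)
```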

Answer
2010-09-01 20:05:45: Make Money at Home

I used: http://validator.w3.org/checklink

The results are often:

>> Status: (N/A) Forbidden by robots.txt
The link was not checked due to robots exclusion rules. Check the link manually.

Answer
2010-09-02 06:47:15: Dan Richmond

Then it seems like the URLs are not allowed by the robots.txt file: http://www.robotstxt.org/robotstxt.html

Answer
2010-09-05 15:08:34: Make Money at Home

I'm not exactly sure what that means. I guess this means that robots can (should) follow this link?
In my case these are Anchor texts that lead to:
- Google, FaceBook, YouTube...
- future links to internal pages. (ahead)
Does this mean that robots do not follow (track) them? Right? Or...?

1). Is this good for me or not?
I want to have links to such large sites (Google, FaceBook, YouTube), but I'm not sure. 2). If robots do not follow links to my (future) internal pages, I think that is not good... or is it?

3). If I'm right how can I do most easily all or certain links to be tracked by robots?

Thanks

Answer
2010-09-06 11:04:32: Dan Richmond

> I'm not exactly sure what that means. I guess this means that robots can (should) follow this link?

Well, if they are restricted (disallowed), then it means that robots do not follow these links.

> - Google, FaceBook, YouTube...

Most probably the external links are not analyzed by the validator, so it's about these links:

"- future links to internal pages. (ahead)" - the links that lead to non-existing or restricted pages will be indicated by the validator.

> 1). Is this good for me or not?

Broken site-wide links are never good.

> 3). If I'm right how can I do most easily all or certain links to be tracked by robots?

Make sure no links point to the restricted folders or simply allow these folders in your robots file:

http://www.moneyonlinetreasure.com/robots.txt

You might want to contact Link-Assistant.Com customer support - they'll help you out with these issues:

www.link-assistant.com/support

Answer
2010-09-11 15:44:33: Keith Rejino

Dan - SEO Powersuite and this course have made a big difference for my website rankings.

As I'm always trying to improve my content and keep it up-to-date, how do you handle fixing the broken links to articles that you delete over time?

Thanks,
Keith

Answer
2010-10-27 21:07:21: GARY BRYANT

Can someone please run the validator on my site, http://www.diabetespatientsite.com

Do I need to worry, or do these errors have to do with Google Sites?

Answer
2010-11-16 13:02:21: GARY BRYANT

Hi Dan,

Can you help me with this? I ran the validator, but I can't change the Javascript code in my site created with Google Sites. How bad are these numerous errors found, or can I fix some or all of them somehow? Since the code is created by Google Sites, can I assume that they know what they are doing or not in terms of coding correctly? Aren't they experts at this? Confused?

Gary

Answer
2010-11-27 03:28:21: khairul Amry

Hi Dan. I think I have a problem here, about the sitemap. My blog already had a sitemap, but not one from a Google script. Instead I use a script created by my friend.

Using that script, I tried to add my sitemap to Google. I think I did it right, but... I got robot.txt at my homepage URL! http://amry85.blogspot.com/robot.txt

What should I do?

Help...

Answer
2010-12-01 08:56:27: Dan Richmond

Not sure I get your question right... Your sitemap is here:

http://amry85.blogspot.com/feeds/posts/default?orderby=updated you can submit it to GWT;

Your robots.txt file is here and it also looks valid:

http://amry85.blogspot.com/robots.txt

So please let me know what exactly the problem was.

Answer
2011-01-26 08:51:21: locksmith cannock

Does Google penalise you for using robots.txt?
I use it to hide any bad links people have posted elsewhere,
i.e. if someone links to my site with an error in the link, it shows up on my Google Webmaster Tools error checklist.
As there is no direct way to correct someone else's bad link, I tell Google not to go there.

Answer
2011-01-27 06:08:26: Dan Richmond

Do you mean that people post links to non-existent pages and you would like to hide those pages? Restricting such pages would not carry more practical value than having 404 pages. What you could do is redirect any of those non-existent URLs to your actual promoted pages with permanent 301 redirects — that way you'll be still getting some SEO value from these links.

Answer
2011-02-21 12:15:46: Bruce Wood

Thanks for all this information. I have spent a great deal of time finding most of it in hundreds of places. To have it in one location, laid out from beginning to end, makes me wish I had found you first.

I added the sitemap line to my robots.txt file as stated on the previous page. Comment below.

Right now, MSN.com doesn't have a similar submission form. So to submit your sitemap, simply add the following line to your robots.txt file (you'll read about robots.txt just a bit later):

Sitemap: http://www.yoursite.com/sitemap.xml

Then I ran it through the validator on this page. See the comment below.

After you have the robots.txt file, run it through a validator to ensure it's written correctly.
Hundreds of robots.txt validators can be found on the web. You can apply Google's tool, or this one, for instance:
http://www.invision-graphics.com/robotstxt_validator.html


I received the following error

Sitemap: http://www.bowequipped.com/sitemapindex.xml - Error! You used an invalid syntax

Can you see why this is invalid?




Answer
2011-02-25 13:05:00: syahroman syahroman

Hi, thanks for the info!

Maybe this could be the knowledge I need to understand more about the world of blogging!

Answer
2011-04-09 09:46:03: Front Doors

That link checker is cool. It found 40 broken links. I'm going to have some fun fixing them now! Thanks for the heads up.

Answer
2011-06-12 10:44:12: Jack Peters

Hi Dan,
I am so new to this that I haven't found my knees yet never mind crawling. The detail you go into is great and I am trying to absorb it all.

We have just had a new site built, and I ran the site auditor and found loads of broken links. In Google Webmaster it shows that a lot of these are internal links, but I can't find them in the pages. I am at a loss as to how to find and correct these broken links.

Answer
2011-06-17 01:45:37: Travel Club

Wow, I found a lot of broken links.
I better get to work.

Answer
2011-07-13 05:58:00: thu vinh

Hi Dan,

I'm doing SEO for a company that doesn't have an IT department, and the partner that created the company's website refuses to modify the website's content, so I have no choice but to rewrite the URLs!

Can Google penalize me for doing it this way?

Answer