New Message: Enhancement request: robots.txt improvements

webmaster at webmaster at
Tue Dec 20 23:44:41 CST 2005

A new message was posted:


By: Matt Deatherage (frontier at

I decided to search Google for a message on my own Weblog. There's only one message in question, from 2005.06.15, and it has no comments and no trackbacks. You can find it here [1].

When I search Google for the two defining words in the title, it returns /1,210/ results [2] from the site.

Manila has so many ways of returning the same messages that the robots don't know how to deal with it. Google clearly read that message (or pages linking to it) over 1,000 times off my server to create that list, and it's just not necessary. It found all the print-friendly links, all the discussion group topic pages, archive pages, and the other variants. It found the different URLs for the same message (like the discussion group message page URL and the permalink URL).

So far, this hasn't angered any search engines into blacklisting Manila sites for "gaming the system" by posting "thousands of self-links," though I've heard of it happening to some Web sites. More importantly to me, I just don't need Google and 12 other search engines requesting 1,000 different pages for every message.

Can Manila *please* come with a default robots.txt setup that would prevent search engines from doing this? It should let them find every message and comment and story once, by preferred URL, and not 30 variants by day, archive, print-friendly, and so on. I don't have the few days to spend constructing such a monster, but if it were posted here, I'd happily use it. I'd like to cut down on useless Internet traffic as much as the next guy.
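
For illustration only, here is a minimal sketch of the kind of rules being asked for, checked with Python's standard urllib.robotparser. The path prefixes (/print/, /discuss/, /archive/) and the example.com URLs are assumptions made up for this example; Manila's real URL layout may well differ, so treat it as a starting point rather than a working default.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the three prefixes below are assumed names for
# print-friendly, discussion group, and archive variants, not Manila's
# confirmed URL scheme.
ROBOTS_TXT = """\
User-agent: *
Disallow: /print/
Disallow: /discuss/
Disallow: /archive/
"""


def build_parser(text):
    """Parse robots.txt text from a string instead of fetching it."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# The preferred (permalink-style) URL stays crawlable...
allowed = parser.can_fetch("*", "http://example.com/2005/06/15/some-message")

# ...while a duplicate variant of the same message is blocked.
blocked = parser.can_fetch("*", "http://example.com/print/2005/06/15/some-message")
```

Plain Disallow prefixes are used because prefix matching is the part of robots.txt that every crawler honors; wildcard patterns are supported by some search engines but not by the standard-library parser used for this check.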

Even so, I think this should be a default feature in Manila: it would reduce bandwidth by keeping every crawler that visits a site from fetching the same page dozens of times. I don't object to it being optional, but I'd really, really like to have it at least as an option.


This is a Manila site...
