Optimizing Robots.txt for Better Google Indexing on DLE Sites

When you manage several sites on DLE, a few problems come up again and again. For instance, Google may index the print versions of pages (URLs containing print:…), which hold little useful content, while the actual articles remain under-indexed; Google generally favors cleaner pages with more relevant content. You may also find URLs like “my-dle-site/user/ya-spammer” in the index, even though such pages contain nothing but irrelevant external links and no real content.

One important thing to keep in mind is robots.txt. This file can help you manage what Google and other search engines index, including submitting a Sitemap directly to Google without needing to use the Google Search Console.

So, how should you configure your robots.txt for optimal results? After comparing advice from forums and official search engine documentation, it’s clear that many people overlook important instructions, like the “Allow” directive in the file (instead, they only use “Disallow”).

Let’s start by outlining the goals for our robots.txt:

  • Ensure proper indexing of important pages by Google
  • Prevent unnecessary pages from appearing in search results (they’ll eventually drop out of the index anyway)
  • Help Google index your content efficiently

Let’s go step by step:

Block Printed Pages

To prevent search engines from indexing printed page versions:

User-agent: *
Disallow: /*print
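
To sanity-check a wildcard rule like this before deploying it, you can approximate Google-style matching in a few lines of Python. The sketch below is a simplified, hypothetical checker (real crawlers also handle longest-match precedence and the $ anchor), and the sample URLs are made-up DLE-style paths used only for illustration.

import re

def is_blocked(disallow_pattern: str, url_path: str) -> bool:
    # Rough approximation of Google-style matching: the pattern is
    # anchored at the start of the path and '*' matches any characters.
    regex = "^" + re.escape(disallow_pattern).replace(r"\*", ".*")
    return re.match(regex, url_path) is not None

# Hypothetical DLE-style URLs, for illustration only.
print(is_blocked("/*print", "/engine/print.php?newsid=123"))    # True  (blocked)
print(is_blocked("/*print", "/print:page,1,123-article.html"))  # True  (blocked)
print(is_blocked("/*print", "/123-some-article.html"))          # False (allowed)

One caveat: a pattern as broad as /*print also blocks any article whose URL happens to contain “print” (say, a news item about 3D printers), so depending on how your DLE install builds print URLs, a narrower pattern may be safer.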

Remove Unnecessary Pages from Indexing

These pages have no valuable content and should be excluded:

Disallow: /autobackup.php
Disallow: /admin.php
Disallow: /user/
Disallow: /favorites/
Disallow: /index.php?do=register
Disallow: /?do=lastcomments
Disallow: /statistics.html
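
If you want to verify these rules before uploading the file, Python's standard-library robots.txt parser handles simple prefix rules like the ones above (it does not understand the * wildcard, so the print rule is left out here). The snippet is a minimal sketch that parses the rules from a string; the URLs at the end are hypothetical examples.

from urllib.robotparser import RobotFileParser

rules = """\
User-agent: *
Disallow: /admin.php
Disallow: /user/
Disallow: /favorites/
Disallow: /statistics.html
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# Hypothetical URLs on a DLE site, for illustration only.
base = "http://my-dle-site"
for path in ("/user/ya-spammer", "/statistics.html", "/123-some-article.html"):
    verdict = "allowed" if parser.can_fetch("*", base + path) else "blocked"
    print(path, "->", verdict)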

Optional (for Increased Precision)

Depending on your preferences, you may want to block additional pages:

Disallow: /index.php?do=pm
Disallow: /index.php?do=stats
Disallow: /index.php?do=search
Disallow: /index.php?do=addnews
Disallow: /index.php?do=register
Disallow: /index.php?do=feedback
Disallow: /index.php?do=lostpassword
Disallow: /index.php?subaction=newposts

Add Your Sitemap

Your Sitemap helps Google index your site more effectively. To create it, navigate to the admin panel, go to “Other sections,” and click “Google Sitemap.” Leave the default settings and click “Create/Update.” You’ll get a message like:
“01/25/2009 10:37 – The index file for Google Sitemap was created and is available at: http://my-dle-site/sitemap.xml”

Now, include this in your robots.txt file:

Sitemap: http://my-dle-site/sitemap.xml
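
Before relying on the directive, it's also worth checking that the sitemap is reachable and actually lists your pages. The snippet below is a minimal sketch that downloads the sitemap and prints the first few <loc> entries; http://my-dle-site/sitemap.xml is the placeholder address used throughout this article, so substitute your real domain. If DLE generated a sitemap index that points to further sitemap files, this sketch will list those files rather than individual pages.

import urllib.request
import xml.etree.ElementTree as ET

# Placeholder address from this article; replace with your real domain.
SITEMAP_URL = "http://my-dle-site/sitemap.xml"

with urllib.request.urlopen(SITEMAP_URL) as response:
    root = ET.fromstring(response.read())

# Sitemap files use the sitemaps.org namespace; each <loc> holds one URL.
ns = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
locations = [loc.text for loc in root.findall(".//sm:loc", ns)]

print(len(locations), "entries found")
for url in locations[:5]:
    print(url)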

Set the Preferred Host (www vs. non-www)

Finally, decide whether your site should be indexed with or without “www.” Some guides suggest declaring this with the Host directive:

User-agent: Googlebot
Host: my-dle-site

If you prefer the version with “www,” specify that instead. Keep in mind that Host is a non-standard directive that has historically been honored by Yandex rather than Google; Google picks the preferred www/non-www version through canonical URLs and Search Console settings, so the line is harmless, but Googlebot does not act on it.

Example of a Well-Configured Robots.txt for Google:

User-agent: *
Disallow: /*print
Disallow: /autobackup.php
Disallow: /admin.php
Disallow: /user/
Disallow: /favorites/
Disallow: /index.php?do=register
Disallow: /?do=lastcomments
Disallow: /statistics.html
Sitemap: http://my-dle-site/sitemap.xml

User-agent: Googlebot
Host: my-dle-site
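
Once the finished file is uploaded, a quick check confirms that crawlers will see it as intended. The sketch below assumes the file is already live at the placeholder domain used in this article; it reads robots.txt over HTTP and prints the Sitemap entries it declares (site_maps() is available from Python 3.8 onward and returns None if no Sitemap line is found).

from urllib.robotparser import RobotFileParser

# Placeholder domain from this article; replace with your own.
parser = RobotFileParser("http://my-dle-site/robots.txt")
parser.read()

# Sitemap lines declared in the live file, if any (Python 3.8+).
print("Sitemaps:", parser.site_maps())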

By spending just a few minutes configuring your robots.txt, you’ll help Google (and other search engines) crawl your site more efficiently and avoid unnecessary indexing.
