How to block indexing of certain file types using robots.txt

Neo
Site Admin
Posts: 2642
Joined: Wed Jul 15, 2009 2:07 am
Location: Colombo

Post by Neo » Sat Mar 13, 2010 6:38 am

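In this example, I will configure the robots.txt file so that search engine bots do not crawl and index PDF files. The file must be placed at the root of your site (for example, https://example.com/robots.txt), because that is the only location crawlers check.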
There are two methods.
  1. Store all your PDF files in one directory and block bots from crawling that directory.

    Code:

    User-agent: *
    Disallow: /pdf-directory/
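  2. Block the PDF file type with a pattern. The * matches any sequence of characters and the trailing $ marks the end of the URL, so this rule blocks any URL that ends with .pdf. These wildcards are supported by major crawlers such as Googlebot and Bingbot (and are standardized in RFC 9309), but simpler bots may ignore them. Matching is also case-sensitive, so a file named report.PDF is not covered.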

    Code:

    User-agent: *
    Disallow: /*.pdf$
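To see how the two rules above match actual paths, here is a simplified Python sketch of the * and $ matching described in RFC 9309. It is an illustration only: the function names rule_to_regex and is_blocked are hypothetical, and the sketch ignores Allow lines, longest-match precedence and percent-encoding normalization, which real crawlers also handle.

```python
import re


def rule_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a robots.txt path pattern into a compiled regex.

    Simplified RFC 9309 semantics: '*' matches any run of characters,
    a trailing '$' anchors the pattern to the end of the path, and
    everything else is matched literally as a prefix.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal piece so characters like '.' lose their regex
    # meaning, then rejoin the pieces with '.*' wherever '*' appeared.
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile(body + (r"\Z" if anchored else ""))


def is_blocked(pattern: str, path: str) -> bool:
    # re.match anchors only at the start of the path, giving prefix matching.
    return rule_to_regex(pattern).match(path) is not None


paths = [
    "/docs/report.pdf",
    "/docs/report.pdf?download=1",
    "/docs/REPORT.PDF",
    "/pdf-directory/manual.html",
    "/about.html",
]
for path in paths:
    print(path, is_blocked("/*.pdf$", path), is_blocked("/pdf-directory/", path))
```

The second and third paths show two limits of the $ rule: a query string after .pdf stops the match, and an upper-case extension does not match. The directory rule from method 1 has neither problem, since it blocks everything under /pdf-directory/ by prefix.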
The User-agent: * line makes these rules apply to all bots. To target one specific bot instead (Googlebot, for example), put its name in the User-agent line and list its rules under it, as below.

Code:

User-agent: Googlebot
Disallow: /*.pdf$
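If you want to check which bots a User-agent group applies to, Python's standard library includes urllib.robotparser. Below is a minimal sketch, assuming the directory rule from method 1 and the placeholder host example.com. This parser does plain prefix matching and does not interpret * or $, so it is not suitable for testing the /*.pdf$ rule itself.

```python
from urllib.robotparser import RobotFileParser

# A group that applies only to Googlebot, using the directory rule from
# method 1 (urllib.robotparser does not understand * or $ wildcards).
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /pdf-directory/
"""

parser = RobotFileParser()
# parse() takes the file's lines directly, so nothing is fetched over HTTP.
parser.parse(ROBOTS_TXT.splitlines())

url = "https://example.com/pdf-directory/manual.pdf"
for agent in ("Googlebot", "Bingbot"):
    # Googlebot matches the group and is disallowed; Bingbot matches no
    # group and there is no "User-agent: *" group, so it is allowed.
    print(agent, parser.can_fetch(agent, url))
```

Because the rules are scoped to Googlebot, any other bot is still free to crawl the directory unless you also add a User-agent: * group.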
See https://robot.lk/viewtopic.php?f=74&t=1474 for more information.