How to block indexing of certain file types using robots.txt

Posted: Sat Mar 13, 2010 6:38 am
by Neo
In this example, I will configure the robots.txt file to stop bots from indexing PDF files.
There are two methods.
  1. Store all your PDF files in one directory and block bots from accessing that directory.

    User-agent: *
    Disallow: /pdf-directory/
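  2. Block the PDF file type with a URL pattern. The * matches any sequence of characters and the trailing $ anchors the match to the end of the URL, so this rule blocks any URL ending in .pdf. Major crawlers such as Googlebot and Bingbot support these wildcards, but some simpler bots may ignore them.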

    User-agent: *
    Disallow: /*.pdf$
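To see how the two rules behave, here is a minimal Python sketch of wildcard matching, assuming a crawler that follows the RFC 9309 rules (* matches any run of characters, a trailing $ anchors the end, otherwise the rule is a prefix match). The helper names pattern_to_regex and is_disallowed and the sample paths are hypothetical, and the sketch ignores Allow lines and longest-match precedence.

```python
import re
from typing import Iterable


def pattern_to_regex(pattern: str):
    """Translate a robots.txt Disallow pattern into a compiled regex.

    Assumption: RFC 9309 semantics, where '*' matches any sequence of
    characters and a trailing '$' anchors the end of the URL path.
    Without a trailing '$', the rule is a plain prefix match.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape every literal character, then turn each escaped '*' into '.*'.
    regex = re.escape(pattern).replace(r"\*", ".*")
    if anchored:
        regex += r"\Z"  # the match must reach the very end of the path
    return re.compile(regex)


def is_disallowed(path: str, patterns: Iterable[str]) -> bool:
    """Return True if any pattern matches from the start of the path."""
    return any(pattern_to_regex(p).match(path) for p in patterns)


# The two Disallow rules from the post.
rules = ["/pdf-directory/", "/*.pdf$"]

for path in ["/pdf-directory/guide.html", "/docs/report.pdf",
             "/docs/report.pdf?download=1", "/docs/report.html"]:
    print(path, is_disallowed(path, rules))
```

Note that /docs/report.pdf?download=1 is not matched by /*.pdf$, because the $ anchor requires the URL (including any query string) to end in .pdf.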
These examples apply to all bots, since User-agent: * matches every crawler that honors robots.txt. To block a specific bot instead (Googlebot, for example), set User-agent to that bot's name:

User-agent: Googlebot
Disallow: /*.pdf$
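Python's standard-library urllib.robotparser offers a quick way to check how a group for one bot behaves. A minimal sketch, using a hypothetical robots.txt and example.com URLs: it uses the directory rule because urllib.robotparser does plain prefix matching and does not interpret the * and $ wildcards.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt with a group that applies only to Googlebot.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /pdf-directory/
"""

parser = RobotFileParser()
# parse() accepts the file's lines directly, so no network fetch is needed.
parser.parse(ROBOTS_TXT.splitlines())

# Googlebot matches the group, so the directory is blocked for it.
print(parser.can_fetch("Googlebot", "https://example.com/pdf-directory/manual.pdf"))  # False
# No group matches Bingbot, so it is allowed by default.
print(parser.can_fetch("Bingbot", "https://example.com/pdf-directory/manual.pdf"))  # True
# Paths outside the directory stay crawlable for Googlebot.
print(parser.can_fetch("Googlebot", "https://example.com/index.html"))  # True
```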
See https://robot.lk/viewtopic.php?f=74&t=1474 for more information.