Information Gathering / Robots.txt File
Robots.txt is a plain-text file placed at the root of a website to tell web crawlers which parts of the site should not be crawled or indexed by search engine bots. It is purely advisory: it does not enforce any access control. The weakness arises when the file is misconfigured, most commonly by listing sensitive directories or pages in it, because the file itself is publicly readable and therefore discloses those paths to anyone who requests it (CWE-200: Exposure of Sensitive Information to an Unauthorized Actor). This supports information gathering, the reconnaissance phase in which an attacker collects data such as usernames, internal paths, and other details that enable further attacks against sensitive assets like credentials and payment data (OWASP Testing Guide).
Because robots.txt is readable by anyone, a misconfigured file hands an attacker a ready-made map of the directories and files the site owner wanted kept out of search results, such as admin panels, backup folders, or private file directories. Relying on robots.txt alone to hide those locations therefore exposes them, and the resulting reconnaissance can enable more serious attacks against the website. The risk assessment of this vulnerability is high.
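As a minimal sketch of why this matters, the snippet below parses a robots.txt body and lists every path it discloses. The sample file contents and the helper name `disclosed_paths` are illustrative, not from the original text; in practice an attacker would apply the same parsing to the response body of `GET /robots.txt`.

```python
# Hypothetical sketch: enumerate the paths a robots.txt file discloses.
# The parser runs on an inline sample string here, but works unchanged
# on the body fetched from a live site's /robots.txt.

def disclosed_paths(robots_txt: str) -> list[str]:
    """Return every path named in an Allow or Disallow directive."""
    paths = []
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        field, _, value = line.partition(":")
        if field.strip().lower() in ("allow", "disallow") and value.strip():
            paths.append(value.strip())
    return paths

sample = """\
User-agent: *
Disallow: /admin/
Disallow: /backup/
Allow: /public/
"""

print(disclosed_paths(sample))  # ['/admin/', '/backup/', '/public/']
```

Note that every Disallow entry intended to "hide" a directory is itself a disclosure of that directory's existence.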
The remediation is twofold: keep robots.txt limited to genuinely non-sensitive crawl directives, and never use it as a substitute for access control. Sensitive directories and pages should be protected with server-side authentication and authorization; if a page must also be kept out of search indexes, prefer mechanisms that do not reveal its path in a public file, such as the X-Robots-Tag response header or a noindex meta tag.
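A safer configuration, sketched below, names only non-sensitive areas (the paths /search and /cart are hypothetical placeholders) and leaves protected areas out of the file entirely, relying on authentication rather than obscurity:

```
# Only non-sensitive crawl directives appear here; sensitive areas
# are protected by server-side access control and are not listed.
User-agent: *
Disallow: /search
Disallow: /cart
```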
The following example, reported as CVE-2020-15044, illustrates a misconfigured robots.txt file that explicitly points crawlers, and any attacker who reads the file, at a directory containing sensitive files:
User-agent: *
Disallow:
Allow: /sensitive-directory