
Commit f30758c (parent: 2246280)

Enable robots.txt handling by default for new projects. Fixes GH-1668.
For backwards compatibility reasons the default value is not changed.
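Conceptually, enabling robots.txt handling means every URL is checked against the target site's published crawl rules before it is downloaded. Below is a minimal sketch of that check using Python's standard library; the domain and user-agent string are placeholders, and Scrapy's RobotsTxtMiddleware applies roughly the same check per domain inside the downloader rather than this exact code.

# Sketch of the check that ROBOTSTXT_OBEY turns on, using the stdlib
# parser; Scrapy's middleware caches parsed rules per domain instead.
from urllib.robotparser import RobotFileParser

parser = RobotFileParser()
parser.set_url("https://example.com/robots.txt")  # placeholder domain
parser.read()  # fetch and parse the rules

# Only proceed with a request if the crawler's user agent is allowed.
if parser.can_fetch("mybot", "https://example.com/some/page"):
    print("allowed: download the page")
else:
    print("disallowed: drop the request")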

2 files changed: 13 additions, 4 deletions

docs/topics/settings.rst
Lines changed: 10 additions & 4 deletions

@@ -750,8 +750,8 @@ Default: ``60.0``
 Scope: ``scrapy.extensions.memusage``

 The :ref:`Memory usage extension <topics-extensions-ref-memusage>`
-checks the current memory usage, versus the limits set by
-:setting:`MEMUSAGE_LIMIT_MB` and :setting:`MEMUSAGE_WARNING_MB`,
+checks the current memory usage, versus the limits set by
+:setting:`MEMUSAGE_LIMIT_MB` and :setting:`MEMUSAGE_WARNING_MB`,
 at fixed time intervals.

 This sets the length of these intervals, in seconds.

@@ -877,7 +877,13 @@ Default: ``False``
 Scope: ``scrapy.downloadermiddlewares.robotstxt``

 If enabled, Scrapy will respect robots.txt policies. For more information see
-:ref:`topics-dlmw-robots`
+:ref:`topics-dlmw-robots`.
+
+.. note::
+
+    While the default value is ``False`` for historical reasons,
+    this option is enabled by default in settings.py file generated
+    by ``scrapy startproject`` command.

 .. setting:: SCHEDULER

@@ -1036,7 +1042,7 @@ TEMPLATES_DIR
 Default: ``templates`` dir inside scrapy module

 The directory where to look for templates when creating new projects with
-:command:`startproject` command and new spiders with :command:`genspider`
+:command:`startproject` command and new spiders with :command:`genspider`
 command.

 The project name must not conflict with the name of custom files or directories
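Because the library-level default stays ``False``, existing projects are unaffected; they can opt in by adding ``ROBOTSTXT_OBEY = True`` to their project's settings.py, or per spider. A minimal per-spider sketch follows; the spider name and URL are hypothetical.

import scrapy

class ExampleSpider(scrapy.Spider):
    # Hypothetical spider, shown only to illustrate the override.
    name = "example"
    start_urls = ["https://example.com/"]

    # custom_settings takes precedence over the project settings, so this
    # spider obeys robots.txt even in a project that leaves the default off.
    custom_settings = {"ROBOTSTXT_OBEY": True}

    def parse(self, response):
        yield {"url": response.url}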

scrapy/templates/project/module/settings.py.tmpl
Lines changed: 3 additions & 0 deletions

@@ -18,6 +18,9 @@ NEWSPIDER_MODULE = '$project_name.spiders'
 # Crawl responsibly by identifying yourself (and your website) on the user-agent
 #USER_AGENT = '$project_name (+http://www.yourdomain.com)'

+# Obey robots.txt rules
+ROBOTSTXT_OBEY = True
+
 # Configure maximum concurrent requests performed by Scrapy (default: 16)
 #CONCURRENT_REQUESTS = 32
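With this template change, every project generated by ``scrapy startproject`` starts out obeying robots.txt. For a hypothetical project named ``tutorial``, the relevant part of the generated settings.py would look roughly like this (``$project_name`` in the template expands to the project name):

# tutorial/settings.py (excerpt), generated by: scrapy startproject tutorial

BOT_NAME = 'tutorial'

SPIDER_MODULES = ['tutorial.spiders']
NEWSPIDER_MODULE = 'tutorial.spiders'

# Crawl responsibly by identifying yourself (and your website) on the user-agent
#USER_AGENT = 'tutorial (+http://www.yourdomain.com)'

# Obey robots.txt rules
ROBOTSTXT_OBEY = True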
