
Commit 0797680

Merge branch 'release-1.0.4' into 1.0
2 parents: 55f7104 + 7dfa979

37 files changed: +217 / -62 lines

.travis-workarounds.sh

Lines changed: 0 additions & 15 deletions
This file was deleted.

.travis.yml

Lines changed: 8 additions & 1 deletion
@@ -1,12 +1,16 @@
 language: python
 python: 2.7
+sudo: false
+branches:
+  only:
+    - master
+    - /^\d\.\d+$/
 env:
 - TOXENV=py27
 - TOXENV=precise
 - TOXENV=py33
 - TOXENV=docs
 install:
-    - "./.travis-workarounds.sh"
     - pip install -U tox twine wheel
 script: tox
 notifications:
@@ -15,6 +19,9 @@ notifications:
     skip_join: true
     channels:
      - irc.freenode.org#scrapy
+cache:
+  directories:
+    - $HOME/.cache/pip
 deploy:
   provider: pypi
   distributions: "sdist bdist_wheel"
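
The new ``branches`` whitelist means CI now runs only for ``master`` and release branches whose names look like ``1.0``. A quick illustrative check of that pattern in Python (the branch names below are made-up examples)::

    import re

    RELEASE_BRANCH = re.compile(r"^\d\.\d+$")  # same pattern as in .travis.yml

    for branch in ["master", "1.0", "release-1.0.4", "feature/foo"]:
        built = branch == "master" or bool(RELEASE_BRANCH.match(branch))
        print("%s -> %s" % (branch, "built" if built else "skipped"))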

README.rst

Lines changed: 9 additions & 0 deletions
@@ -6,13 +6,22 @@ Scrapy
    :target: https://pypi.python.org/pypi/Scrapy
    :alt: PyPI Version

+.. image:: https://img.shields.io/pypi/dm/Scrapy.svg
+   :target: https://pypi.python.org/pypi/Scrapy
+   :alt: PyPI Monthly downloads
+
 .. image:: https://img.shields.io/travis/scrapy/scrapy/master.svg
    :target: http://travis-ci.org/scrapy/scrapy
    :alt: Build Status

 .. image:: https://img.shields.io/badge/wheel-yes-brightgreen.svg
    :target: https://pypi.python.org/pypi/Scrapy
    :alt: Wheel Status
+
+.. image:: http://static.scrapy.org/py3progress/badge.svg
+   :target: https://github.com/scrapy/scrapy/wiki/Python-3-Porting
+   :alt: Python 3 Porting Status
+

 Overview
 ========

conftest.py

Lines changed: 4 additions & 0 deletions
@@ -1,6 +1,7 @@
 import glob
 import six
 import pytest
+from twisted import version as twisted_version


 def _py_files(folder):
@@ -21,6 +22,9 @@ def _py_files(folder):
     "scrapy/spider.py",
 ] + _py_files("scrapy/contrib") + _py_files("scrapy/contrib_exp")

+if (twisted_version.major, twisted_version.minor, twisted_version.micro) >= (15, 5, 0):
+    collect_ignore += _py_files("scrapy/xlib/tx")
+

 if six.PY3:
     for line in open('tests/py3-ignores.txt'):
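
The version gate added above can be read in isolation; a minimal self-contained sketch of the same check (the glob stands in for conftest.py's ``_py_files`` helper)::

    import glob
    from twisted import version as twisted_version

    collect_ignore = []  # pytest reads this module-level list from conftest.py

    # Same gate as in the diff: on Twisted >= 15.5.0 the bundled
    # scrapy/xlib/tx modules are excluded from test collection.
    if (twisted_version.major, twisted_version.minor, twisted_version.micro) >= (15, 5, 0):
        collect_ignore += glob.glob("scrapy/xlib/tx/*.py")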

docs/conf.py

Lines changed: 4 additions & 1 deletion
@@ -108,7 +108,10 @@
 #html_theme_options = {}

 # Add any paths that contain custom themes here, relative to this directory.
-#html_theme_path = []
+# Add path to the RTD explicitly to robustify builds (otherwise might
+# fail in a clean Debian build env)
+import sphinx_rtd_theme
+html_theme_path = [sphinx_rtd_theme.get_html_theme_path()]


 # The style sheet to use for HTML and HTML Help pages. A file of that name
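
To reproduce this theme setup in another Sphinx project, a minimal sketch (only the ``html_theme_path`` line is taken from the diff; ``html_theme`` is an assumption and is presumably set elsewhere in conf.py)::

    # conf.py (sketch)
    import sphinx_rtd_theme

    html_theme = "sphinx_rtd_theme"  # assumed; not part of this commit's diff
    # Register the theme's path explicitly so a clean build environment
    # (e.g. a bare Debian chroot) can still locate it.
    html_theme_path = [sphinx_rtd_theme.get_html_theme_path()]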

docs/contributing.rst

Lines changed: 8 additions & 0 deletions
@@ -146,6 +146,14 @@ tests requires `tox`_.
 Running tests
 -------------

+Make sure you have a recent enough `tox`_ installation:
+
+``tox --version``
+
+If your version is older than 1.7.0, please update it first:
+
+``pip install -U tox``
+
 To run all tests go to the root directory of Scrapy source code and run:

 ``tox``

docs/faq.rst

Lines changed: 1 addition & 1 deletion
@@ -144,7 +144,7 @@ I get "Filtered offsite request" messages. How can I fix them?
 Those messages (logged with ``DEBUG`` level) don't necessarily mean there is a
 problem, so you may not need to fix them.

-Those message are thrown by the Offsite Spider Middleware, which is a spider
+Those messages are thrown by the Offsite Spider Middleware, which is a spider
 middleware (enabled by default) whose purpose is to filter out requests to
 domains outside the ones covered by the spider.
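
The filtering described in this FAQ entry is driven by the spider's ``allowed_domains`` attribute; a minimal illustrative spider (name and URLs are made up)::

    import scrapy

    class ExampleSpider(scrapy.Spider):
        name = "example"
        # Requests to hosts outside these domains are dropped by the
        # offsite middleware and logged as "Filtered offsite request".
        allowed_domains = ["example.com"]
        start_urls = ["http://example.com/"]

        def parse(self, response):
            pass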

docs/index.rst

Lines changed: 6 additions & 0 deletions
@@ -28,6 +28,7 @@ First steps
 ===========

 .. toctree::
+   :caption: First steps
    :hidden:

    intro/overview
@@ -53,6 +54,7 @@ Basic concepts
 ==============

 .. toctree::
+   :caption: Basic concepts
    :hidden:

    topics/commands
@@ -110,6 +112,7 @@ Built-in services
 =================

 .. toctree::
+   :caption: Built-in services
    :hidden:

    topics/logging
@@ -138,6 +141,7 @@ Solving specific problems
 =========================

 .. toctree::
+   :caption: Solving specific problems
    :hidden:

    faq
@@ -203,6 +207,7 @@ Extending Scrapy
 ================

 .. toctree::
+   :caption: Extending Scrapy
    :hidden:

    topics/architecture
@@ -240,6 +245,7 @@ All the rest
 ============

 .. toctree::
+   :caption: All the rest
    :hidden:

    news

docs/intro/install.rst

Lines changed: 94 additions & 5 deletions
@@ -14,7 +14,8 @@ The installation steps assume that you have the following things installed:
 * `Python`_ 2.7

 * `pip`_ and `setuptools`_ Python packages. Nowadays `pip`_ requires and
-  installs `setuptools`_ if not installed.
+  installs `setuptools`_ if not installed. Python 2.7.9 and later include
+  `pip`_ by default, so you may have it already.

 * `lxml`_. Most Linux distributions ships prepackaged versions of lxml.
   Otherwise refer to http://lxml.de/installation.html
@@ -23,9 +24,7 @@ The installation steps assume that you have the following things installed:
   where the Python installer ships it bundled.

 You can install Scrapy using pip (which is the canonical way to install Python
-packages).
-
-To install using pip::
+packages). To install using ``pip`` run::

     pip install Scrapy

@@ -34,6 +33,22 @@ To install using pip::
 Platform specific installation notes
 ====================================

+Anaconda
+--------
+
+.. note::
+
+    For Windows users, or if you have issues installing through `pip`, this is
+    the recommended way to install Scrapy.
+
+If you already have installed `Anaconda`_ or `Miniconda`_, the company
+`Scrapinghub`_ maintains official conda packages for Linux, Windows and OS X.
+
+To install Scrapy using ``conda``, run::
+
+    conda install -c scrapinghub scrapy
+
+
 Windows
 -------

@@ -58,7 +73,8 @@ Windows

   Be sure you download the architecture (win32 or amd64) that matches your system

-* Install `pip`_ from https://pip.pypa.io/en/latest/installing.html
+* *(Only required for Python<2.7.9)* Install `pip`_ from
+  https://pip.pypa.io/en/latest/installing.html

 Now open a Command prompt to check ``pip`` is installed correctly::

@@ -79,13 +95,80 @@ Instead, use the official :ref:`Ubuntu Packages <topics-ubuntu>`, which already
 solve all dependencies for you and are continuously updated with the latest bug
 fixes.

+If you prefer to build the python dependencies locally instead of relying on
+system packages you'll need to install their required non-python dependencies
+first::
+
+    sudo apt-get install python-dev python-pip libxml2-dev libxslt1-dev zlib1g-dev libffi-dev libssl-dev
+
+You can install Scrapy with ``pip`` after that::
+
+    pip install Scrapy
+
+.. note::
+
+    The same non-python dependencies can be used to install Scrapy in Debian
+    Wheezy (7.0) and above.
+
 Archlinux
 ---------

 You can follow the generic instructions or install Scrapy from `AUR Scrapy package`::

     yaourt -S scrapy

+Mac OS X
+--------
+
+Building Scrapy's dependencies requires the presence of a C compiler and
+development headers. On OS X this is typically provided by Apple’s Xcode
+development tools. To install the Xcode command line tools open a terminal
+window and run::
+
+    xcode-select --install
+
+There's a `known issue <https://github.com/pypa/pip/issues/2468>`_ that
+prevents ``pip`` from updating system packages. This has to be addressed to
+successfully install Scrapy and its dependencies. Here are some proposed
+solutions:
+
+* *(Recommended)* **Don't** use system python, install a new, updated version
+  that doesn't conflict with the rest of your system. Here's how to do it using
+  the `homebrew`_ package manager:
+
+  * Install `homebrew`_ following the instructions in http://brew.sh/
+
+  * Update your ``PATH`` variable to state that homebrew packages should be
+    used before system packages (Change ``.bashrc`` to ``.zshrc`` accordantly
+    if you're using `zsh`_ as default shell)::
+
+        echo "export PATH=/usr/local/bin:/usr/local/sbin:$PATH" >> ~/.bashrc
+
+  * Reload ``.bashrc`` to ensure the changes have taken place::
+
+        source ~/.bashrc
+
+  * Install python::
+
+        brew install python
+
+  * Latest versions of python have ``pip`` bundled with them so you won't need
+    to install it separately. If this is not the case, upgrade python::
+
+        brew update; brew upgrade python
+
+* *(Optional)* Install Scrapy inside an isolated python environment.
+
+  This method is a workaround for the above OS X issue, but it's an overall
+  good practice for managing dependencies and can complement the first method.
+
+  `virtualenv`_ is a tool you can use to create virtual environments in python.
+  We recommended reading a tutorial like
+  http://docs.python-guide.org/en/latest/dev/virtualenvs/ to get started.
+
+After any of these workarounds you should be able to install Scrapy::
+
+    pip install Scrapy

 .. _Python: https://www.python.org/
 .. _pip: https://pip.pypa.io/en/latest/installing.html
@@ -95,3 +178,9 @@ You can follow the generic instructions or install Scrapy from `AUR Scrapy packa
 .. _OpenSSL: https://pypi.python.org/pypi/pyOpenSSL
 .. _setuptools: https://pypi.python.org/pypi/setuptools
 .. _AUR Scrapy package: https://aur.archlinux.org/packages/scrapy/
+.. _homebrew: http://brew.sh/
+.. _zsh: http://www.zsh.org/
+.. _virtualenv: https://virtualenv.pypa.io/en/latest/
+.. _Scrapinghub: http://scrapinghub.com
+.. _Anaconda: http://docs.continuum.io/anaconda/index
+.. _Miniconda: http://conda.pydata.org/docs/install/quick.html

docs/topics/broad-crawls.rst

Lines changed: 2 additions & 2 deletions
@@ -34,7 +34,7 @@ These are some common properties often found in broad crawls:

 As said above, Scrapy default settings are optimized for focused crawls, not
 broad crawls. However, due to its asynchronous architecture, Scrapy is very
-well suited for performing fast broad crawls. This page summarize some things
+well suited for performing fast broad crawls. This page summarizes some things
 you need to keep in mind when using Scrapy for doing broad crawls, along with
 concrete suggestions of Scrapy settings to tune in order to achieve an
 efficient broad crawl.
@@ -46,7 +46,7 @@ Concurrency is the number of requests that are processed in parallel. There is
 a global limit and a per-domain limit.

 The default global concurrency limit in Scrapy is not suitable for crawling
-many different domains in parallel, so you will want to increase it.  How much
+many different domains in parallel, so you will want to increase it. How much
 to increase it will depend on how much CPU you crawler will have available. A
 good starting point is ``100``, but the best way to find out is by doing some
 trials and identifying at what concurrency your Scrapy process gets CPU
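
The concurrency advice in this page maps to a one-line settings change; an illustrative ``settings.py`` snippet (the value is the suggested starting point, not a fixed recommendation)::

    # settings.py
    CONCURRENT_REQUESTS = 100  # raise the global limit, then tune by trial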

docs/topics/deploy.rst

Lines changed: 1 addition & 1 deletion
@@ -8,7 +8,7 @@ This section describes the different options you have for deploying your Scrapy
 spiders to run them on a regular basis. Running Scrapy spiders in your local
 machine is very convenient for the (early) development stage, but not so much
 when you need to execute long-running spiders or move spiders to run in
-production continously. This is where the solutions for deploying Scrapy
+production continuously. This is where the solutions for deploying Scrapy
 spiders come in.

 Popular choices for deploying Scrapy spiders are:

docs/topics/downloader-middleware.rst

Lines changed: 1 addition & 1 deletion
@@ -736,7 +736,7 @@ RetryMiddleware

 .. class:: RetryMiddleware

-   A middlware to retry failed requests that are potentially caused by
+   A middleware to retry failed requests that are potentially caused by
    temporary problems such as a connection timeout or HTTP 500 error.

    Failed pages are collected on the scraping process and rescheduled at the
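
For context, this middleware's behaviour is controlled through settings; an illustrative ``settings.py`` snippet (the setting names are standard Scrapy settings, the values here are just examples)::

    RETRY_ENABLED = True
    RETRY_TIMES = 2                        # retries per failed request
    RETRY_HTTP_CODES = [500, 502, 503, 504, 408]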

docs/topics/exceptions.rst

Lines changed: 1 addition & 1 deletion
@@ -57,7 +57,7 @@ remain disabled. Those components include:

 * Extensions
 * Item pipelines
-* Downloader middlwares
+* Downloader middlewares
 * Spider middlewares

 The exception must be raised in the component constructor.
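
A minimal sketch of the pattern described in this doc: a component that disables itself from its constructor (the class and setting names below are made up)::

    from scrapy.exceptions import NotConfigured

    class MyExtension(object):

        def __init__(self, settings):
            if not settings.getbool("MYEXT_ENABLED"):
                # Raising here keeps the component disabled; Scrapy logs it
                # and carries on without the extension.
                raise NotConfigured

        @classmethod
        def from_crawler(cls, crawler):
            return cls(crawler.settings)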

docs/topics/extensions.rst

Lines changed: 2 additions & 2 deletions
@@ -17,7 +17,7 @@ Extensions use the :ref:`Scrapy settings <topics-settings>` to manage their
 settings, just like any other Scrapy code.

 It is customary for extensions to prefix their settings with their own name, to
-avoid collision with existing (and future) extensions. For example, an
+avoid collision with existing (and future) extensions. For example, a
 hypothetic extension to handle `Google Sitemaps`_ would use settings like
 `GOOGLESITEMAP_ENABLED`, `GOOGLESITEMAP_DEPTH`, and so on.

@@ -145,7 +145,7 @@ Here is the code of such extension::
         self.items_scraped += 1
         if self.items_scraped % self.item_count == 0:
             logger.info("scraped %d items", self.items_scraped)
-
+

 .. _topics-extensions-ref:
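
Following the naming convention described in that first hunk, such a hypothetical extension might read its own prefixed settings like this (the ``GOOGLESITEMAP_*`` names are the docs' own example; the code is only a sketch)::

    class GoogleSitemapExtension(object):

        def __init__(self, enabled, depth):
            self.enabled = enabled
            self.depth = depth

        @classmethod
        def from_crawler(cls, crawler):
            settings = crawler.settings
            return cls(settings.getbool("GOOGLESITEMAP_ENABLED"),
                       settings.getint("GOOGLESITEMAP_DEPTH"))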

docs/topics/firebug.rst

Lines changed: 1 addition & 1 deletion
@@ -118,7 +118,7 @@ they work as we expect.

 As you can see, the page markup is not very descriptive: the elements don't
 contain ``id``, ``class`` or any attribute that clearly identifies them, so
-we''ll use the ranking bars as a reference point to select the data to extract
+we'll use the ranking bars as a reference point to select the data to extract
 when we construct our XPaths.

 After using FireBug, we can see that each link is inside a ``td`` tag, which is

docs/topics/item-pipeline.rst

Lines changed: 1 addition & 1 deletion
@@ -95,7 +95,7 @@ contain a price::
 Write items to a JSON file
 --------------------------

-The following pipeline stores all scraped items (from all spiders) into a a
+The following pipeline stores all scraped items (from all spiders) into a
 single ``items.jl`` file, containing one item per line serialized in JSON
 format::

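
The hunk ends right where the docs' code listing begins (the listing itself falls outside the diff context). For reference, a sketch of such a JSON-lines pipeline, close to the documented one but illustrative here::

    import json

    class JsonWriterPipeline(object):

        def open_spider(self, spider):
            self.file = open("items.jl", "w")

        def close_spider(self, spider):
            self.file.close()

        def process_item(self, item, spider):
            # One JSON object per line, as described above.
            self.file.write(json.dumps(dict(item)) + "\n")
            return item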
