
Commit 16d9a33

added FAQ entry about working with big data feeds
1 parent 3dc6777 commit 16d9a33


1 file changed: +12 -0 lines changed


docs/faq.rst

Lines changed: 12 additions & 0 deletions
@@ -228,3 +228,15 @@ which scrapes one of these sites.
 
 .. _this page: http://search.cpan.org/~ecarroll/HTML-TreeBuilderX-ASP_NET-0.09/lib/HTML/TreeBuilderX/ASP_NET.pm
 .. _example spider: http://github.com/AmbientLighter/rpn-fas/blob/master/fas/spiders/rnp.py
+
+What's the best way to parse big XML/CSV data feeds?
+----------------------------------------------------
+
+Parsing big feeds with XPath selectors can be problematic, since they need to
+build the DOM of the entire feed in memory, which can be quite slow and
+consume a lot of memory.
+
+To avoid parsing the entire feed in memory at once, you can use the
+``xmliter`` and ``csviter`` functions from the ``scrapy.utils.iterators``
+module. In fact, this is what the feed spiders (see :ref:`topics-spiders`)
+use under the hood.
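For illustration, here is a minimal sketch of how these iterators might be used in a spider callback. It assumes a Scrapy version where ``xmliter`` yields one selector object per matching node and ``csviter`` yields one dict per row; the feed structure and the node and column names (``product``, ``name``, ``price``) are made up for the example::

    from scrapy.utils.iterators import csviter, xmliter

    # In a real spider these would be callback methods receiving the
    # downloaded feed as a Response object.

    def parse_xml_feed(response):
        # xmliter yields one selector per <product> node, so the DOM of
        # the whole feed is never built in memory at once.
        for product in xmliter(response, 'product'):
            yield {
                'name': product.xpath('name/text()').extract(),
                'price': product.xpath('price/text()').extract(),
            }

    def parse_csv_feed(response):
        # csviter yields one dict per row, keyed by the CSV header row
        # (or by an explicit headers argument, if one is passed).
        for row in csviter(response, delimiter=','):
            yield {'name': row.get('name'), 'price': row.get('price')}

Each item is produced as soon as its node or row has been read, which keeps memory usage roughly constant regardless of the size of the feed.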

0 commit comments
