Fix the bug where ConnectionPool cannot be used with multiprocessing #314
base: master
couchdb/tests/client.py
Outdated
@@ -19,6 +21,10 @@
from couchdb.tests import testutil


def _current_pid():
    return multiprocessing.current_process().pid
Would os.getpid() make more sense here?
Done.
I reported this (or at least a very similar) issue back in 2011: #205. I ended up solving my issue with application level code like:
Not pretty, but gets the job done. It would be great if this could get into the library proper.
@djc Thoughts?
Sorry, I've been very busy recently. I think it looks okay. Can we do
@djc Done.
couchdb/http.py
Outdated
        self.lock = Lock()

    def get(self, url):
    @property
    def _current_process_id(self):
Not sure what this property buys us? Just using os.getpid() is actually shorter to write, and it's used in just two places.
It looks like this change breaks something on Python 3.4. Would you be able to investigate?
As discussed in #313:
When couchdb-python is used with multiprocessing, you get TypeError: 'ResponseBody' object is not iterable.
This happens in the couchdb.http.Session:request method:
In this particular case, the resp object fails to match either condition and falls through to the else clause, which causes a raw ResponseBody object to be returned upstream to the client code. When the client code does response['row'], it fails because a ResponseBody object does not support item indexing.
Adding a print on the resp object reveals that resp.getheader('content-length') is None, and hence the second elif is skipped.
The reason for content-length to be None is in httplib.HTTPResponse.begin, lines 470~475:
So httplib thinks it's talking to an HTTP/0.9 server, even though the couchdb response was HTTP/1.1.
Tracing further, in order for self.version == 9, the version returned by HTTPResponse._read_status must be 9:
Putting a print statement after the line is read, and rerunning the bug script:
The process got a garbled status line, even though the response from couchdb is fine. With the garbled status line, the _read_status method assumes the response must be HTTP/0.9, so it returns version == 9.
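To make the fallback concrete, here is a rough model of the version check described above. It is a simplified sketch, not the actual httplib source: the function name guess_http_version and the garbled sample line are made up for illustration, and the real parsing in the standard library differs in detail.

```python
def guess_http_version(status_line):
    """Simplified model of the status-line version check (hypothetical helper).

    Returns 9, 10 or 11, mirroring the integer versions mentioned above.
    """
    parts = status_line.split(None, 2)
    version = parts[0] if parts else ""
    if not version.startswith("HTTP/"):
        # Anything that does not look like a status line is treated as an
        # HTTP/0.9 "simple response", which carries no headers at all.
        return 9
    if version == "HTTP/1.0":
        return 10
    if version.startswith("HTTP/1."):
        return 11
    raise ValueError("unknown HTTP version: %r" % version)


# A clean status line is recognised as HTTP/1.1.
clean = guess_http_version("HTTP/1.1 200 OK\r\n")

# Hypothetical garbled line: bytes meant for another process arrive first,
# so the line no longer starts with "HTTP/" and the fallback kicks in.
garbled = guess_http_version('ows":[]}HTTP/1.1 200 OK\r\n')
```

Since an HTTP/0.9 response has no headers, content-length ends up as None, which matches what the print on resp showed.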
So basically what we have here is a race condition where all three processes are talking to the server over the same socket at the same time. This is because couchdb-python's ConnectionPool is created in the parent process by the Session object, which is in turn only created once per Database object. So in essence, all three subprocesses share the same session object and the same connection pool object. Because they all talk to the same host/port combination, they all check out the same connection object from the pool, so the same underlying socket is used across all three subprocesses, hence the bug.
The fix here is to make ConnectionPool process-aware, in that connections are keyed by the current pid in addition to scheme and netloc. This way, we make sure that subprocesses get their own separate connections.
TBH, I'm not sure this is a good implementation. Having the ConnectionPool know about the process it's running in seems to violate its responsibility. Feel free to suggest a better solution.