Skip to content

Commit c798c5c

Browse files
committed
BUG#35710145: Bad MySQLCursor.statement and result when query text contains code comments
Wrong query results provided by the connector, when the query contains code comments while using `cursor.execute(query, multi=True)`. The regex expression that splits a multi-statement query produces as many statements as semicolon-separated operations are found in the query. End-of-line (EOL) comments are treated as if they were normal statements by the connector but that leads to inconsistencies when reading results from the server and trying to map each result to a statement since it's assumed every statement will produce a result, but isn't always the case - e.g. for comments. With this patch, comments are simply ignored to guarantee results and executed statements are matched correctly. Change-Id: I728222d626bb562d23066e38924e8007faff5bc1
1 parent dd88b60 commit c798c5c

File tree

5 files changed

+366
-57
lines changed

5 files changed

+366
-57
lines changed

CHANGES.txt

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -17,6 +17,7 @@ v8.3.0
1717
- WL#15942: Improve type hints and standardize byte type handling
1818
- WL#15523: Support Python DB API asynchronous execution
1919
- BUG#35912790: Binary strings are converted when using prepared statements
20+
- BUG#35710145: Bad MySQLCursor.statement and result when query text contains code comments
2021

2122
v8.2.0
2223
======

lib/mysql/connector/aio/cursor.py

Lines changed: 56 additions & 15 deletions
Original file line numberDiff line numberDiff line change
@@ -62,7 +62,8 @@
6262
RE_SQL_ON_DUPLICATE,
6363
RE_SQL_PYTHON_CAPTURE_PARAM_NAME,
6464
RE_SQL_PYTHON_REPLACE_PARAM,
65-
RE_SQL_SPLIT_STMTS,
65+
is_eol_comment,
66+
parse_multi_statement_query,
6667
)
6768
from ..errors import (
6869
Error,
@@ -217,26 +218,66 @@ def _process_params(
217218
async def _execute_iter(
218219
self, query_iter: AsyncGenerator[ResultType, None]
219220
) -> AsyncGenerator[MySQLCursorAbstract, None]:
220-
"""Return generator of MySQLCursor objects for multiple statements.
221-
222-
This method is only used when multiple statements are executed by the execute()
223-
method. It uses zip() to make an iterator from the given query_iter
224-
(result of MySQLConnection.cmd_query_iter()) and the list of statements that
225-
were executed.
221+
"""Generator returns MySQLCursor objects for multiple statements
222+
223+
This method is only used when multiple statements are executed
224+
by the `cursor.execute(multi_stmt_query, multi=True)` method.
225+
226+
It matches the given `query_iter` (result of `MySQLConnection.cmd_query_iter()`)
227+
and the list of statements that were executed.
228+
229+
How does this method work? To properly map each statement (stmt) to a result,
230+
the following facts must be considered:
231+
232+
1. Read operations such as `SELECT` produce a non-empty result
233+
(calling `next(query_iter)` gets a result that includes at least one column).
234+
2. Write operatios such as `INSERT` produce an empty result
235+
(calling `next(query_iter)` gets a result with no columns - aka empty).
236+
3. End-of-line (EOL) comments do not produce a result, unless is the last stmt
237+
in which case produces an empty result.
238+
4. Calling procedures such as `CALL my_proc` produce a sequence `(1)*0` which
239+
means it may produce zero or more non-empty results followed by just one
240+
empty result. In other words, a callproc stmt always terminates with an
241+
empty result. E.g., `my_proc` includes an update + select + select + update,
242+
then the result sequence will be `110` - note how the write ops results get
243+
annulated, just the read ops results are produced. Other examples:
244+
* insert + insert -> 0
245+
* select + select + insert + select -> 1110
246+
* select -> 10
247+
Observe how 0 indicates the end of the result sequence. This property is
248+
vital to know what result corresponds to what callproc stmt.
249+
250+
In this regard, the implementation is composed of:
251+
1. Parsing: the multi-statement is broken down into single statements, and then
252+
for each of these, leading white spaces are removed (including
253+
jumping line, vertical line, tab, etc.). Also, EOL comments are removed from
254+
the stream, except when the comment is the last statement of the
255+
multi-statement string.
256+
2. Mapping: the facts described above as used as "game rules" to properly match
257+
statements and results. In case, if we run out of statements before running out
258+
of results we use a sentinel named "stmt_overflow!" to indicate that the mapping
259+
went wrong.
260+
261+
Acronyms
262+
1: a non-empty result
263+
2: an empty result
226264
"""
227-
executed_list = RE_SQL_SPLIT_STMTS.split(self._executed)
228-
229-
i = 0
265+
executed_list = parse_multi_statement_query(multi_stmt=self._executed)
266+
self._executed = None
267+
stmt = executed_list.popleft() if executed_list else b"stmt_overflow!"
230268
async for result in query_iter:
231269
await self._reset_result()
232270
await self._handle_result(result)
233-
try:
234-
self._executed = executed_list[i].strip()
235-
i += 1
236-
except IndexError:
237-
self._executed = executed_list[0]
271+
272+
if is_eol_comment(stmt):
273+
continue
274+
275+
self._executed = stmt.rstrip()
238276
yield self
239277

278+
if not stmt.upper().startswith(b"CALL") or "columns" not in result:
279+
stmt = executed_list.popleft() if executed_list else b"stmt_overflow!"
280+
240281
async def _fetch_row(self, raw: bool = False) -> Optional[RowType]:
241282
"""Return the next row in the result set."""
242283
if not self._connection.unread_result:

lib/mysql/connector/cursor.py

Lines changed: 124 additions & 22 deletions
Original file line numberDiff line numberDiff line change
@@ -32,14 +32,16 @@
3232
from __future__ import annotations
3333

3434
import re
35+
import unicodedata
3536
import warnings
3637
import weakref
3738

38-
from collections import namedtuple
39+
from collections import deque, namedtuple
3940
from decimal import Decimal
4041
from typing import (
4142
TYPE_CHECKING,
4243
Any,
44+
Deque,
4345
Dict,
4446
Generator,
4547
Iterator,
@@ -115,6 +117,71 @@
115117
MAX_RESULTS = 4294967295
116118

117119

120+
def is_eol_comment(stmt: bytes) -> bool:
121+
"""Checks if statement is an end-of-line comment.
122+
123+
Double-dash comment style requires the second dash to be
124+
followed by at least one whitespace (Z) or control character (C) such
125+
as a space, tab, newline, and so on.
126+
127+
Hash comment simply requires start from `#` and nothing else.
128+
129+
Args:
130+
stmt: MySQL statement.
131+
132+
Returns:
133+
Whether or not the statement is an end-of-line comment.
134+
135+
References:
136+
[1]: https://dev.mysql.com/doc/refman/8.0/en/comments.html
137+
"""
138+
is_double_dash_comment = (
139+
len(stmt) >= 3
140+
and stmt.startswith(b"--")
141+
and unicodedata.category(chr(stmt[2]))[0] in {"Z", "C"}
142+
)
143+
is_hash_comment = len(stmt) >= 2 and stmt.startswith(b"#")
144+
145+
return is_double_dash_comment or is_hash_comment
146+
147+
148+
def parse_multi_statement_query(multi_stmt: bytes) -> Deque[bytes]:
149+
"""Parses a multi-statement query/operation.
150+
151+
Parsing consists of removing empty (which includes just whitespaces and/or control
152+
characters) statements and EOL (end-of-line) comments.
153+
154+
However, there's a caveat, by rule, the last EOL comment found in the stream isn't
155+
removed if and only if it's the last statement.
156+
157+
Why? EOL comments do not generate results, however, when the last statement is an
158+
EOL comment the server returns an empty result. So, in other to match statements
159+
and results correctly we need to keep the last EOL comment statement.
160+
161+
Args:
162+
multi_stmt: Query representing multi-statement operations separated by semicolons.
163+
164+
Returns:
165+
A list of statements that aren't empty and don't contain leading
166+
ASCII whitespaces. Also, they aren't EOL comments except
167+
perhaps for the last one.
168+
"""
169+
executed_list: Deque[bytes] = deque(RE_SQL_SPLIT_STMTS.split(multi_stmt))
170+
stmt, num_stms = b"", len(executed_list)
171+
while num_stms > 0:
172+
num_stms -= 1
173+
stmt_next = executed_list.popleft().lstrip()
174+
if stmt_next:
175+
stmt = stmt_next
176+
if not is_eol_comment(stmt):
177+
executed_list.append(stmt)
178+
179+
if is_eol_comment(stmt):
180+
executed_list.append(stmt)
181+
182+
return executed_list
183+
184+
118185
class _ParamSubstitutor:
119186
"""
120187
Substitutes parameters into SQL statement.
@@ -554,27 +621,62 @@ def _execute_iter(
554621
"""Generator returns MySQLCursor objects for multiple statements
555622
556623
This method is only used when multiple statements are executed
557-
by the execute() method. It uses zip() to make an iterator from the
558-
given query_iter (result of MySQLConnection.cmd_query_iter()) and
559-
the list of statements that were executed.
560-
"""
561-
executed_list = RE_SQL_SPLIT_STMTS.split(self._executed)
562-
563-
i = 0
564-
while True:
565-
try:
566-
result = next(query_iter)
567-
self._reset_result()
568-
self._handle_result(result)
569-
try:
570-
self._executed = executed_list[i].strip()
571-
i += 1
572-
except IndexError:
573-
self._executed = executed_list[0]
574-
575-
yield self
576-
except StopIteration:
577-
return
624+
by the `cursor.execute(multi_stmt_query, multi=True)` method.
625+
626+
It matches the given `query_iter` (result of `MySQLConnection.cmd_query_iter()`)
627+
and the list of statements that were executed.
628+
629+
How does this method work? To properly map each statement (stmt) to a result,
630+
the following facts must be considered:
631+
632+
1. Read operations such as `SELECT` produce a non-empty result
633+
(calling `next(query_iter)` gets a result that includes at least one column).
634+
2. Write operatios such as `INSERT` produce an empty result
635+
(calling `next(query_iter)` gets a result with no columns - aka empty).
636+
3. End-of-line (EOL) comments do not produce a result, unless is the last stmt
637+
in which case produces an empty result.
638+
4. Calling procedures such as `CALL my_proc` produce a sequence `(1)*0` which
639+
means it may produce zero or more non-empty results followed by just one
640+
empty result. In other words, a callproc stmt always terminates with an
641+
empty result. E.g., `my_proc` includes an update + select + select + update,
642+
then the result sequence will be `110` - note how the write ops results get
643+
annulated, just the read ops results are produced. Other examples:
644+
* insert + insert -> 0
645+
* select + select + insert + select -> 1110
646+
* select -> 10
647+
Observe how 0 indicates the end of the result sequence. This property is
648+
vital to know what result corresponds to what callproc stmt.
649+
650+
In this regard, the implementation is composed of:
651+
1. Parsing: the multi-statement is broken down into single statements, and then
652+
for each of these, leading white spaces are removed (including
653+
jumping line, vertical line, tab, etc.). Also, EOL comments are removed from
654+
the stream, except when the comment is the last statement of the
655+
multi-statement string.
656+
2. Mapping: the facts described above as used as "game rules" to properly match
657+
statements and results. In case, if we run out of statements before running out
658+
of results we use a sentinel named "stmt_overflow!" to indicate that the mapping
659+
went wrong.
660+
661+
Acronyms
662+
1: a non-empty result
663+
2: an empty result
664+
"""
665+
executed_list = parse_multi_statement_query(multi_stmt=self._executed)
666+
self._executed = None
667+
stmt = executed_list.popleft() if executed_list else b"stmt_overflow!"
668+
for result in query_iter:
669+
self._reset_result()
670+
self._handle_result(result)
671+
672+
if is_eol_comment(stmt):
673+
continue
674+
675+
self._executed = stmt.rstrip()
676+
yield self
677+
678+
if not stmt.upper().startswith(b"CALL") or "columns" not in result:
679+
stmt = executed_list.popleft() if executed_list else b"stmt_overflow!"
578680

579681
def execute(
580682
self,

lib/mysql/connector/cursor_cext.py

Lines changed: 58 additions & 19 deletions
Original file line numberDiff line numberDiff line change
@@ -83,7 +83,8 @@
8383
RE_SQL_ON_DUPLICATE,
8484
RE_SQL_PYTHON_CAPTURE_PARAM_NAME,
8585
RE_SQL_PYTHON_REPLACE_PARAM,
86-
RE_SQL_SPLIT_STMTS,
86+
is_eol_comment,
87+
parse_multi_statement_query,
8788
)
8889
from .errorcode import CR_NO_RESULT_SET
8990
from .errors import (
@@ -254,22 +255,67 @@ def _handle_eof(self) -> None:
254255
self._cnx.free_result()
255256

256257
def _execute_iter(self) -> Generator[CMySQLCursor, None, None]:
257-
"""Generator returns MySQLCursor objects for multiple statements
258-
259-
Deprecated: use nextset() method directly.
258+
"""Generator returns MySQLCursor objects for multiple statements.
260259
261260
This method is only used when multiple statements are executed
262-
by the execute() method. It uses zip() to make an iterator from the
263-
given query_iter (result of MySQLConnection.cmd_query_iter()) and
264-
the list of statements that were executed.
261+
by the `cursor.execute(multi_stmt_query, multi=True)` method.
262+
263+
How does this method work? To properly map each statement (stmt) to a result,
264+
the following facts must be considered:
265+
266+
1. Read operations such as `SELECT` produce a non-empty result
267+
(calling `next(query_iter)` gets a result that includes at least one column).
268+
2. Write operatios such as `INSERT` produce an empty result
269+
(calling `next(query_iter)` gets a result with no columns - aka empty).
270+
3. End-of-line (EOL) comments do not produce a result, unless is the last stmt
271+
in which case produces an empty result.
272+
4. Calling procedures such as `CALL my_proc` produce a sequence `(1)*0` which
273+
means it may produce zero or more non-empty results followed by just one
274+
empty result. In other words, a callproc stmt always terminates with an
275+
empty result. E.g., `my_proc` includes an update + select + select + update,
276+
then the result sequence will be `110` - note how the write ops results get
277+
annulated, just the read ops results are produced. Other examples:
278+
* insert + insert -> 0
279+
* select + select + insert + select -> 1110
280+
* select -> 10
281+
Observe how 0 indicates the end of the result sequence. This property is
282+
vital to know what result corresponds to what callproc stmt.
283+
284+
In this regard, the implementation is composed of:
285+
1. Parsing: the multi-statement is broken down into single statements, and then
286+
for each of these, leading white spaces are removed (including
287+
jumping line, vertical line, tab, etc.). Also, EOL comments are removed from
288+
the stream, except when the comment is the last statement of the
289+
multi-statement string.
290+
2. Mapping: the facts described above as used as "game rules" to properly match
291+
statements and results. In case, if we run out of statements before running out
292+
of results we use a sentinel named "stmt_overflow!" to indicate that the mapping
293+
went wrong.
294+
295+
Acronyms
296+
1: a non-empty result
297+
2: an empty result
265298
"""
266-
executed_list: List[bytes] = RE_SQL_SPLIT_STMTS.split(self._executed)
267-
i = 0
268-
self._executed = executed_list[i]
269-
yield self
270-
299+
executed_list = parse_multi_statement_query(multi_stmt=self._executed)
300+
self._executed = None
301+
stmt = b""
302+
result: bool = False
271303
while True:
272304
try:
305+
if not stmt.upper().startswith(b"CALL") or not result:
306+
stmt = (
307+
executed_list.popleft() if executed_list else b"stmt_overflow!"
308+
)
309+
310+
# at this point the result has been fetched already
311+
result = self._cnx.result_set_available
312+
313+
if is_eol_comment(stmt):
314+
continue
315+
316+
self._executed = stmt.rstrip()
317+
yield self
318+
273319
if not self.nextset():
274320
raise StopIteration
275321
except InterfaceError as err:
@@ -278,13 +324,6 @@ def _execute_iter(self) -> Generator[CMySQLCursor, None, None]:
278324
raise
279325
except StopIteration:
280326
return
281-
i += 1
282-
try:
283-
self._executed = executed_list[i].strip()
284-
except IndexError:
285-
self._executed = executed_list[0]
286-
yield self
287-
return
288327

289328
def execute(
290329
self,

0 commit comments

Comments
 (0)