
Make auth_socket more flexible. #1


Closed
wants to merge 1 commit

Conversation

dveeden

@dveeden dveeden commented Oct 27, 2014

Now supports "AS '<user>'" so it doesn't have to be a 1-on-1 mapping.

Example:
CREATE USER 'foo'@'localhost' IDENTIFIED WITH auth_socket AS 'myuser';

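For context, a minimal sketch of both forms (hedged; assumes the auth_socket plugin is installed and that "myuser" is the operating-system account that should map to the MySQL account):

CREATE USER 'foo'@'localhost' IDENTIFIED WITH auth_socket;
-- 1-on-1 mapping: only the OS user "foo" can log in as 'foo'@'localhost'

CREATE USER 'bar'@'localhost' IDENTIFIED WITH auth_socket AS 'myuser';
-- with this patch: the OS user "myuser" can log in as 'bar'@'localhost'
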
@roel666

roel666 commented Oct 27, 2014

Thanks

@dveeden
Author

dveeden commented Oct 27, 2014

The MySQL Bug ID for this:
http://bugs.mysql.com/74586

@MyDanny
Member

MyDanny commented Oct 29, 2014

Thank you. Unfortunately, we can't accept pull requests on GitHub. I see you've already submitted your patch via the bug tracker, so I'm closing this pull request.

@MyDanny MyDanny closed this Oct 29, 2014
MyDanny pushed a commit that referenced this pull request Mar 10, 2015
…rows.

Background 1:

When binlog_format = row, CREATE ... SELECT is logged in two pieces,
like:

Anonymous_Gtid
query_log_event(CREATE TABLE without SELECT)
Anonymous_Gtid
query_log_event(BEGIN)
...row events...
query_log_event(COMMIT) (or Xid_log_event)

Internally, there is a call to MYSQL_BIN_LOG::commit after the table has
been created and before the rows are selected.
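
A minimal sketch of a statement with this shape (table t1 is assumed to exist
and the GTID settings are assumed to permit CREATE ... SELECT):

  SET SESSION binlog_format = ROW;
  CREATE TABLE t2 SELECT * FROM t1;
  -- SHOW BINLOG EVENTS then shows the Query event for the CREATE TABLE part
  -- first, followed by a separate BEGIN ... row events ... COMMIT/Xid group.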

When gtid_next='ANONYMOUS', we must not release anonymous ownership for
the commit occurring in the middle of the statement (since that would
allow a concurrent client to set gtid_mode=on, making it impossible to
commit the rest of the statement). Also, the commit in the middle of the
statement should not decrease the counter of ongoing GTID-violating
transactions, since that would allow a concurrent client to set
ENFORCE_GTID_CONSISTENCY=ON even if there is an ongoing transaction that
violates GTID-consistency.

The logic to skip releasing anonymous ownership and skip decreasing the
counter is as follows. Before calling mysql_bin_log.commit, it sets the
flag thd->is_commit_in_middle_of_statement. Eventually,
mysql_bin_log.commit calls gtid_state->update_on_commit, which calls
gtid_state->update_gtids_impl, which reads the
thd->is_commit_in_middle_of_statement and accordingly decides to skip
releasing anonymous ownership and/or skips decreasing the counter.

Problem 1:

When thd->is_commit_in_middle_of_statement has been set, it is crucial
that there is another call to update_gtids_impl when the transaction
ends (otherwise the session will keep holding anonymous ownership and
will not decrease the counters). Normally, this happens because
mysql_bin_log.commit is always called, and mysql_bin_log.commit normally
invokes ordered_commit, which calls update_gtids_impl. However, in case
the SELECT part of the statement does not find any rows,
mysql_bin_log.commit skips the call to ordered_commit, so
update_gtids_impl does not get called. This is the first problem we fix
in this commit.

Fix 1:

We fix this problem as follows. After calling mysql_bin_log.commit to
log the CREATE part of CREATE...SELECT, the CREATE...SELECT code sets
thd->pending_gtid_state_update=true (this is a new flag that we
introduce in this patch). If the flag is set, update_gtids_impl clears
it. At the end of mysql_bin_log.commit, we check the flag to see if
update_gtids_impl has been called by any function invoked by
mysql_bin_log.commit. If not, i.e., if the flag is still true at the end
of mysql_bin_log.commit, it means we have reached the corner case where
update_gtids_impl was skipped. Thus we call it explicitly from
mysql_bin_log.commit.

Background 2:

GTID-violating DDL (CREATE...SELECT and CREATE TEMPORARY) is detected in
is_ddl_gtid_compatible, called from gtid_pre_statement_checks, which is
called from mysql_parse just before the implicit pre-commit.
is_ddl_gtid_compatible determines whether an error or warning or nothing
is to be generated, and whether to increment the counters of GTID-
violating transactions. In case an error is generated, it is important
that the error happens before the implicit commit, so that the statement
fails before it commits the ongoing transaction.

Problem 2:

In case a warning is to be generated, and there is an ongoing
transaction, the implicit commit will write to the binlog, and thus it
will call gtid_state->update_gtids_impl, which will decrease the
counters of GTID-violating transactions. Thus, the counters will be zero
for the duration of the transaction.

Fix 2:

We call is_ddl_gtid_compatible *both* before the implicit commit and
after the implicit commit. If an error is to be generated, the error is
generated before the commit. If a warning is to be generated and/or the
counter of GTID-violating transactions is to be increased, then this
happens after the commit.
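
A hypothetical sequence that exercises the warning path above (assumes tables
t1 and t3 exist, binary logging is enabled, and ENFORCE_GTID_CONSISTENCY
supports the WARN value):

  SET GLOBAL ENFORCE_GTID_CONSISTENCY = WARN;
  BEGIN;
  INSERT INTO t1 VALUES (1);          -- ongoing transaction
  CREATE TABLE t2 SELECT * FROM t3;   -- implicit commit first; the warning and
                                      -- the counter increment happen after it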

Code changes #1:

@sql/binlog.cc
- Move MYSQL_BIN_LOG::commit to a new function
MYSQL_BIN_LOG::write_binlog_and_commit_engine. Make
MYSQL_BIN_LOG::commit call this function, and after the return check
thd->pending_gtid_state_update to see if another call to
gtid_state->update_on_[commit|rollback] is needed.
- Simplify MYSQL_BIN_LOG::write_binlog_and_commit_engine; remove
useless local variable 'error' that would never change its value.

@sql/binlog.h:
- Declaration of new function.

@sql/rpl_gtid_state.cc:
- Set thd->pending_gtid_state_update to false at the end of
update_gtids_impl.

Code changes #2:
@sql/binlog.cc:
- Add two parameters to handle_gtid_consistency and is_ddl_gtid_compatible:
handle_error is true in the call to is_ddl_gtid_compatible that happens
*before* the implicit commit and false in the call to
is_ddl_gtid_compatible that happens *after* the implicit commit. It
tells the function to generate the error, if an error is to be
generated. The other parameter, handle_nonerror, is true in the call to
is_ddl_gtid_compatible that happens *after* the implicit commit and
false in the call that happens *before* the implicit commit. It tells
the function to generate the warnings and increment the counter, if that
needs to be done.

@sql/rpl_gtid_execution.cc:
- Call is_ddl_gtid_compatible after the implicit commit. Pass the two
new parameters to the function.

@sql/sql_class.h:
- Update prototype for is_ddl_gtid_compatible.

@sql/sql_insert.cc:
- Set thd->pending_gtid_state_update = true after committing the CREATE
part of a CREATE...SELECT.

Misc changes:

@sql/binlog.cc
- Add DEBUG_SYNC symbol end_decide_logging_format used in tests.

@sql/rpl_gtid_state.cc:
- For modularity, move out parts of update_gtids_impl to a new function,
end_gtid_violating_transaction.
- Move the lock/unlock of global_sid_lock into update_gtids_impl.
- Make update_gtids_impl release global_sid_lock before the call to
end_gtid_violating_transaction, so as to hold it for as short a time as possible.

@sql/rpl_gtid.h
- Because we release the locks earlier in update_gtids_impl in
rpl_gtid_state.cc, we need to acquire the lock again in
end_[anonymous|automatic]_gtid_violating_transaction, in order to do
some debug assertions.
- Add DBUG_PRINT for the counters.

Test changes:
- Split binlog_enforce_gtid_consistency into six tests, depending on the
type of scenarios it tests:
Three classes of GTID-violation:
    *_create_select_*: test CREATE ... SELECT.
    *_tmp_*: test CREATE TEMPORARY/DROP TEMPORARY inside a transaction.
    *_trx_nontrx_*: test combinations of transactional and
    non-transactional updates in the same statement or in the same
    transaction.
For each class of GTID-violation, one positive and one negative test:
    *_consistent.test: Cases which are *not* GTID-violating
    *_violation.test: Cases which *are* GTID-violating.
- The general logic of these tests is:
- extra/binlog_tests/enforce_gtid_consistency.test iterates over all
    values of GTID_MODE, ENFORCE_GTID_CONSISTENCY, and GTID_NEXT. For
    each case, it sources a file; the name of the sourced file is
    specified by the file that sources
    extra/binlog_tests/enforce_gtid_consistency.test
- The top-level file in suite/binlog/t invokes
    extra/binlog_tests/enforce_gtid_consistency.test, specifying one
    of the filenames
    extra/binlog_tests/enforce_gtid_consistency_[create_select|tmp|
    trx_nontrx]_[consistent|violation].test.
- Each of the files
    extra/binlog_tests/enforce_gtid_consistency_[create_select|tmp|
    trx_nontrx]_[consistent|violation].test
    sets up a number of test scenarios. Each test scenario is executed
    by sourcing
    extra/binlog_tests/enforce_gtid_consistency_statement.inc.
- extra/binlog_tests/enforce_gtid_consistency_statement.inc executes
    the specified statement, checks that warnings are generated and
    counters incremented/decremented as specified by the caller.
- Since the tests set GTID_MODE explicitly, it doesn't make sense to run
the test in both combinations GTID_MODE=ON/OFF. However, for the
*_trx_nontrx_* cases, it is important to test that it works both with
binlog_direct_non_transactional_updates=on and off. The suite is never
run with those combinations, so to make use of the existing GTID_MODE=ON/OFF
combinations, we run the test with
binlog_direct_non_transactional_updates=on if GTID_MODE=ON, and with
binlog_direct_non_transactional_updates=off if GTID_MODE=OFF.
bkandasa pushed a commit that referenced this pull request Aug 4, 2015
Added the extra scope argument to the status variables.
bkandasa pushed a commit that referenced this pull request Aug 4, 2015
Two places in replication code were causing Valgrind errors:

 1. Memory leak in Relay_log_info::wait_for_gtid_set, since it passed
    the return value from Gtid_set::to_string() directly to
    DBUG_PRINT, without storing it anywhere so that it can be freed.

 2. MYSQL_BIN_LOG::init_gtid_sets would pass a bad pointer to
    DBUG_PRINT in some cases.

In problem #1, an underlying problem was that to_string returns newly
allocated memory and this was easy to miss when reading the code that
calls the function.  It would be better to return the value through a
parameter, since that forces the caller to store it in a variable, and
then it is more obvious that the value must be freed.  And in fact
such a function existed already, so we fix the problem by removing the
(redundant) no-args version of Gtid_set::to_string and using the one-
or two-argument function instead.

In problem #2, print an empty string if we detect that the pointer
will be bad.

These bugs were found when adding some debug printouts to
read_gtids_from_binlog.  These debug printouts never made it to the
server code through any other bug report, but would be useful to have
for future debugging, so they are included in this patch.

Additionally, removed the call to global_sid_lock->rdlock() used
before Previous_gtids_log_event::get_str().  This is not needed since
Previous_gtids_log_event doesn't use shared resources.
bjornmu pushed a commit that referenced this pull request Oct 21, 2015
…TO SELF

Problem: If a multi-column update statement fails when updating one of
the columns in a row, it will go on and update the remaining columns
in that row before it stops and reports an error. If the failure
happens when updating a JSON column, and the JSON column is also
referenced later in the update statement, new and more serious errors
can happen when the update statement attempts to read the JSON column,
as it may contain garbage at this point.
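
A hypothetical statement of the problematic shape (table t1, its columns
j, k, c, and the failing expression are illustrative only):

  UPDATE t1
     SET j = CAST(c AS JSON),          -- may fail mid-row (invalid JSON in c)
         k = JSON_EXTRACT(j, '$.id');  -- reads the JSON column again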

The fix is twofold:

1) Field_json::val_str() currently returns NULL if an error happens.
This is correct for val_str() functions in the Item class hierarchy,
but not for val_str() functions in the Field class hierarchy. The
val_str() functions in the Field classes instead return a pointer to
an empty String object on error. Since callers don't expect it to
return NULL, this caused a crash when a caller unconditionally
dereferenced the returned pointer. The patch makes
Field_json::val_str() return a pointer to an empty String on error to
avoid such crashes.

2) Whereas #1 fixes the immediate crash, Field_json::val_str() may
still read garbage when this situation occurs. This could lead to
unreliable behaviour, and both valgrind and ASAN warn about it. The
patch therefore also makes Field_json::store() start by clearing the
field, so that it will hold an empty value rather than garbage after
an error has happened.

Fix #2 is sufficient to fix the reported problems. Fix #1 is included
for consistency, so that Field_json::val_str() behaves the same way as
the other Field::val_str() functions.

The query in the bug report didn't always crash. Since the root cause
was that it had read garbage, it could be lucky and read something
that looked like a valid value. In that case, Field_json::val_str()
didn't return NULL, and the crash was avoided.

The patch also makes these changes:

- It removes the Field_json::store_dom() function, since it is only
  called from one place. It is now inlined instead.

- It corrects information about return values in the comment that
  describes the ensure_utf8mb4() function.
bjornmu pushed a commit that referenced this pull request Oct 21, 2015
Background:

WAIT_FOR_EXECUTED_GTID_SET waits until a specified set of GTIDs is
included in GTID_EXECUTED. SET GTID_PURGED adds GTIDs to
GTID_EXECUTED. RESET MASTER clears GTID_EXECUTED.
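
A hypothetical two-session sketch of this interaction (the UUID is a
placeholder):

  -- session 1: blocks until the GTID set is contained in GTID_EXECUTED
  SELECT WAIT_FOR_EXECUTED_GTID_SET('aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa:1');

  -- session 2: adds the GTID to GTID_EXECUTED; before the fix below, this did
  -- not wake up the waiter in session 1
  SET GLOBAL GTID_PURGED = 'aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa:1';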

There were multiple issues:

 1. Problem:

    The change in GTID_EXECUTED implied by SET GTID_PURGED did
    not cause WAIT_FOR_EXECUTED_GTID_SET to stop waiting.

    Analysis:

    WAIT_FOR_EXECUTED_GTID_SET waits for a signal to be sent.
    But SET GTID_PURGED never sent the signal.

    Fix:

    Make GTID_PURGED send the signal.

    Changes:
    - sql/rpl_gtid_state.cc:Gtid_state::add_lost_gtids
    - sql/rpl_gtid_state.cc: removal of #ifdef HAVE_GTID_NEXT_LIST
    - sql/rpl_gtid.h: removal of #ifdef HAVE_GTID_NEXT_LIST

 2. Problem:

    There was a race condition where WAIT_FOR_EXECUTED_GTID_SET
    could miss the signal from a commit and go into an infinite
    wait even if GTID_EXECUTED contains all the waited-for GTIDs.

    Analysis:

    In the bug, WAIT_FOR_EXECUTED_GTID_SET took a lock while
    taking a copy of the global state. Then it released the lock,
    analyzed the copy of the global state, and decided whether it
    should wait.  But if the GTID to wait for was committed after
    the lock was released, WAIT_FOR_EXECUTED_GTID_SET would miss
    the signal and go to an infinite wait even if GTID_EXECUTED
    contains all the waited-for GTIDs.

    Fix:

    Refactor the code so that it holds the lock all the way from
    before it reads the global state until it goes to the wait.

    Changes:

    - sql/rpl_gtid_state.cc:Gtid_state::wait_for_gtid_set:
      Most of the changes in this function are to fix this bug.

    Note:

    When the bug existed, it was possible to create a test case
    for this by placing a debug sync point in the section where
    it does not hold the lock.  However, after the bug has been
    fixed this section does not exist, so there is no way to test
    it deterministically.  The bug would also cause the test to
    fail rarely, so a way to test this is to run the test case
    1000 times.

 3. Problem:

    The function would take global_sid_lock.wrlock every time it has
    to wait, and while holding it takes a copy of the entire
    gtid_executed (which implies allocating memory).  This is not very
    optimal: it may process the entire set each time it waits, and it
    may wait once for each member of the set, so in the worst case it
    is O(N^2) where N is the size of the set.

    Fix:

    This is fixed by the same refactoring that fixes problem #2.  In
    particular, it does not re-process the entire Gtid_set for each
    committed transaction. It only removes all intervals of
    gtid_executed for the current sidno from the remainder of the
    wait-for-set.

    Changes:
    - sql/rpl_gtid_set.cc: Add function remove_intervals_for_sidno.
    - sql/rpl_gtid_state.cc: Use remove_intervals_for_sidno and remove
      only intervals for the current sidno. Remove intervals
      incrementally in the innermost while loop, rather than recompute
      the entire set each iteration.

 4. Problem:

    If the client that executes WAIT_FOR_EXECUTED_GTID_SET owns a
    GTID that is included in the set, then there is no chance for
    another thread to commit it, so it will wait forever.  In
    effect, it deadlocks with itself.

    Fix:

    Detect the situation and generate an error.

    Changes:
    - sql/share/errmsg-utf8.txt: new error code
      ER_CANT_WAIT_FOR_EXECUTED_GTID_SET_WHILE_OWNING_A_GTID
    - sql/item_func.cc: check the condition and generate the new error

 5. Various simplifications.

    - sql/item_func.cc:Item_wait_for_executed_gtid_set::val_int:
      - Pointless to set null_value when generating an error.
      - add DBUG_ENTER
      - Improve the prototype for Gtid_state::wait_for_gtid_set so
        that it takes a Gtid_set instead of a string, and also so that
        it requires global_sid_lock.
    - sql/rpl_gtid.h:Mutex_cond_array
      - combine wait functions into one and make it return bool
      - improve some comments
    - sql/rpl_gtid_set.cc:Gtid_set::remove_gno_intervals:
      - Optimize so that it returns early if this set becomes empty

@mysql-test/extra/rpl_tests/rpl_wait_for_executed_gtid_set.inc
- Move all wait_for_executed_gtid_set tests into
  mysql-test/suite/rpl/t/rpl_wait_for_executed_gtid_set.test

@mysql-test/include/kill_wait_for_executed_gtid_set.inc
@mysql-test/include/wait_for_wait_for_executed_gtid_set.inc
- New auxiliary scripts.

@mysql-test/include/rpl_init.inc
- Document undocumented side effect.

@mysql-test/suite/rpl/r/rpl_wait_for_executed_gtid_set.result
- Update result file.

@mysql-test/suite/rpl/t/rpl_wait_for_executed_gtid_set.test
- Rewrote the test to improve coverage and cover all parts of this bug.

@sql/item_func.cc
- Add DBUG_ENTER
- No point in setting null_value when generating an error.
- Do the decoding from text to Gtid_set here rather than in Gtid_state.
- Check for the new error
  ER_CANT_WAIT_FOR_EXECUTED_GTID_SET_WHILE_OWNING_A_GTID

@sql/rpl_gtid.h
- Simplify the Mutex_cond_array::wait functions in the following ways:
  - Make them one function since they share most code. This also allows
    calling the three-argument function with NULL as the last
    parameter, which simplifies the caller.
  - Make it return bool rather than 0/ETIME/ETIMEOUT, to make it more
    easy to use.
- Make is_thd_killed private.
- Add prototype for new Gtid_set::remove_intervals_for_sidno.
- Add prototype for Gtid_state::wait_for_sidno.
- Un-ifdef-out lock_sidnos/unlock_sidnos/broadcast_sidnos since we now
  need them.
- Make wait_for_gtid_set return bool.

@sql/rpl_gtid_mutex_cond_array.cc
- Remove the now unused check_thd_killed.

@sql/rpl_gtid_set.cc
- Optimize Gtid_set::remove_gno_intervals, so that it returns early
  if the Interval list becomes empty.
- Add Gtid_set::remove_intervals_for_sidno. This is just a wrapper
  around the already existing private member function
  Gtid_set::remove_gno_intervals.

@sql/rpl_gtid_state.cc
- Rewrite wait_for_gtid_set to fix problems 2 and 3. See code
  comment for details.
- Factor out wait_for_sidno from wait_for_gtid.
- Enable broadcast_sidnos/lock_sidnos/unlock_sidnos, which were ifdef'ed out.
- Call broadcast_sidnos after updating the state, to fix issue #1.

@sql/share/errmsg-utf8.txt
- Add error message used to fix issue #4.
bjornmu pushed a commit that referenced this pull request Oct 21, 2015
An assert failure is seen in some queries which have a semijoin and
use the materialization strategy.

The assertion fails if either the length of the key is zero or the
number of key parts is zero. This could indicate two different
problems.

1) If the length is zero, there may not be a problem, as it can
legitimately be zero if, for example, the key is a zero-length string.

2) If the number of key parts is zero, there is a bug, as a key must
have at least one part.

The patch fixes issue #1 by removing the length check in the
assertion.

Issue #2 happens if JOIN::update_equalities_for_sjm() doesn't
recognize the expression selected from a subquery, and fails to
replace it with a reference to a column in a temporary table that
holds the materialized result. This causes it to not recognize it as a
part of the key later, and keyparts could end up as zero. The patch
fixes it by calling real_item() on the expression in order to see
through Item_refs that may wrap the expression if the subquery reads
from a view.
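
A hypothetical query shape for issue #2 (table, view and column names are
illustrative only):

  CREATE VIEW v1 AS SELECT b FROM t2;
  -- semijoin with materialization; the subquery reads from a view, so the
  -- selected expression may be wrapped in an Item_ref
  SELECT * FROM t1 WHERE t1.a IN (SELECT b FROM v1);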
bjornmu pushed a commit that referenced this pull request Feb 5, 2016
Problem:

The binary log group commit sync is failing when committing a group of
transactions into a non-transactional storage engine while another thread
is rotating the binary log.

Analysis:

The binary log group commit procedure (ordered_commit) acquires LOCK_log
during the #1 stage (flush). As it holds the LOCK_log, a binary log
rotation will have to wait until this flush stage to finish before
actually rotating the binary log.

For the #2 stage (sync), the binary log group commit only holds the
LOCK_log if sync_binlog=1. In this case, the rotation has to wait also
for the sync stage to finish.

When sync_binlog>1, the sync stage releases the LOCK_log (to let other
groups enter the flush stage), holding only the LOCK_sync. In this
case, the rotation can acquire the LOCK_log in parallel with the sync
stage.

For commits into a transactional storage engine, the binary log rotation
checks a counter of "flushed but not yet committed" transactions,
waiting for this counter to reach zero before closing the current
binary log file.  As the commit of the transactions happens in the #3
stage of the binary log group commit, the sync of the binary log in
stage #2 always succeeds.

For commits into a non-transactional storage engine, the binary log
rotation checks the "flushed but not yet committed" transactions
counter, but it is zero because it only counts transactions that
contain XIDs. So, the rotation is allowed to take place in parallel
with the #2 stage of the binary log group commit. When the sync is
called at the same time that the rotation has closed the old binary log
file but has not yet opened the new file, the sync fails with the
following error: 'Can't sync file 'UNOPENED' to disk (Errcode: 9 - Bad
file descriptor)'.

Fix:

For a non-transactional-only workload, binary log group commit will keep
the LOCK_log when entering #2 stage (sync) if the current group is
supposed to be synced to the binary log file.
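
A rough reproduction outline, as a sketch (assumes a MyISAM table t_myisam,
i.e. a non-transactional storage engine, and sync_binlog > 1):

  SET GLOBAL sync_binlog = 2;
  -- session 1: non-transactional commits in a loop
  UPDATE t_myisam SET c = c + 1;
  -- session 2: rotate the binary log concurrently
  FLUSH BINARY LOGS;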
hramilison pushed a commit that referenced this pull request Jun 2, 2016
Description: 
For ngshell query
Crud.Find({ name:$.name, count:count(*) }).GroupBy($.name);
an error has been raised:
"Expression #1 of SELECT list is not in GROUP BY clause and contains
nonaggregated column 'test.coll.doc' which is not functionally dependent
on columns in GROUP BY clause; this is incompatible with
sql_mode=only_full_group_by"

Reviewed-by: Lukasz Kotula <[email protected]>
RB:12283
bjornmu pushed a commit that referenced this pull request Apr 10, 2017
Author: Andrei Elkin <[email protected]>
Date:   Fri Nov 25 15:17:17 2016 +0200

    WL#9175 Correct recovery of DDL statements/transactions by binary log

    The patch consists of two parts implementing the WL agenda, which is
    to provide crash-safety for DDL.
    That is, a server (a general one, master or slave) must be able to recover
    from a crash and commit or roll back every DDL command that was in progress
    at the time of the crash.

    The commit decision is made for commands that had reached
    engine-prepared status and were successfully logged into the binary log.
    Otherwise they are rolled back.

    In order to achieve this goal, some refinements are made to the binlogging
    mechanism, a minor addition is made to the server recovery module, and some
    changes are applied to the slave side.

    The binary log part includes Query_log_event, which is made to contain the
    xid, a key item at server recovery. Recovery is now concerned with it along
    with its standard location in Xid_log_event.

    The first part deals with the ACL DDL sub-class and TRIGGER-related
    queries, which are made fully 2pc-capable. It constructs
    the WL's framework, which is proven on these subclasses.
    It also specifies how to cover the rest of the DDLs with the WL's framework.
    For the DDL cases that are not yet 2pc-ready, "stub" tests are sometimes
    prepared, to be refined by the responsible worklogs.

    Take a few notes to the low-level details of implementation.

    Note #1.

    Tagging by xid number is done only for the exact 2pc-capable DDL subclass.
    For DDLs that will become ready for xiding in the future, there is a tech
    specification for how to do so.

    Note #2.

    By virtue of existing mechanisms, the slave applier augments the DDL
    transaction, incorporating the slave info table update and the
    Gtid-executed table update (either one optionally) at the time the DDL is
    ready for the final commit.
    When, for filtering reasons, the DDL skips committing at its regular
    time, the augmented transaction would still be non-empty, consisting of
    only the added statements, and it would have to be committed by
    top-level slave-specific functions through Log_event::do_update_pos().

    To aid this process, Query_log_event::has_committed is introduced.

    Note #3 (QA, please read this.)

    The replication System_table interface that is employed by the handler of
    the TABLE-type slave info had to be refined in a few places.

    Note #4 (runtime code).

    While trying to lessen the footprint on the runtime server code, a few
    concessions had to be made. These include changes to
    ha_commit_trans()
    to invoke the new pre_commit(), post_commit(), and post_rollback() hooks
    due to the extra slave statements.

    -------------------------------------------------------------------

    The second part of the patch extends the basic framework and
    xidifies the rest of the DDL commands that are
    (at least) committable at recovery. At the moment those include
    all Data Definition Statements except the ones related to
    VIEWs, STORED Functions and Procedures.

    A DDL query is recoverable for these subclasses when it has been
    recorded into the binary log and is discovered there at server
    restart, quite compatible with the DML algorithm.
    However, a clean automatic rollback can't be provided for some
    of the commands, and the user would have to complete recovery
    manually.
pobrzut pushed a commit that referenced this pull request May 8, 2017
… WITH DEVELOPER STUDIO

When we do a release type build of the server (with both optimized and
debug enabled server/plugins) with Developer Studio, some MTR tests
when run with --debug-server will fail in one of two ways:

1. Tests which try to load a plugin into the mysql client fail with
   missing symbols. This is caused by the plugin having references to
   functions which do not exist in the non-debug client.

2. Some tests on sparc fail with Thread stack overrun.

Fix for issue #1: mtr will have appended /debug to the plugin dir part
when running with --debug-server and if there actually is such a
directory. The fix is to remove any trailing /debug from the
env. variable within the test. This will affect the client only, not
the server. Developer builds will not have put the plugins in a
subdirectory /debug so it makes no difference to those.

Fix for issue #2: apparently this thread stack overrun is not feasible
to avoid, so just skip the test if running with debug server on sparc;
there is already an include file to do that.

Also added not_sparc_debug.inc to the "white list" so the tests are
skipped even when running mtr --no-skip.

(cherry picked from commit 9c79e477261ab252e38def436bca3336ef597603)
pobrzut pushed a commit that referenced this pull request May 8, 2017
akopytov pushed a commit to akopytov/mysql-server that referenced this pull request Aug 25, 2017
In builds that include the WL, an ASAN run witnessed a missed ~Query_log_event invocation.
The destructor was not called due to the WL's changes in the error propagation
that specifically affect LC MTS.
The failure is exposed in particular by rpl_trigger as the following
stack:

  #0 0x9ecd98 in __interceptor_malloc (/export/home/pb2/test/sb_2-22611026-1489061390.32/mysql-commercial-8.0.1-dmr-linux-x86_64-asan/bin/mysqld+0x9ecd98)
  #1 0x2b1a245 in my_raw_malloc(unsigned long, int) obj/mysys/../../mysqlcom-pro-8.0.1-dmr/mysys/my_malloc.cc:209:12
  #2 0x2b1a245 in my_malloc obj/mysys/../../mysqlcom-pro-8.0.1-dmr/mysys/my_malloc.cc:72
  #3 0x2940590 in Query_log_event::Query_log_event(char const*, unsigned int, binary_log::Format_description_event const*, binary_log::Log_event_type) obj/sql/../../mysqlcom-pro-8.0.1-dmr/sql/log_event.cc:4343:46
  #4 0x293d235 in Log_event::read_log_event(char const*, unsigned int, char const**, Format_description_log_event const*, bool) obj/sql/../../mysqlcom-pro-8.0.1-dmr/sql/log_event.cc:1686:17
  #5 0x293b96f in Log_event::read_log_event()
  #6 0x2a2a1c9 in next_event(Relay_log_info*)

Previously, before the WL,
Mts_submode_logical_clock::wait_for_workers_to_finish() did not
return any error even when the Coordinator thread was killed.

The WL patch needed to refine that behavior, and in doing so
it also had to adjust log_event.cc::schedule_next_event() to register
an error, following an existing pattern.
Since my_error() was not called, the killed Coordinator continued
scheduling, though ineffectively - no Worker gets engaged (a legal case
of deferred scheduling) - and without noticing its killed status up to
the point when it resets the event pointer in
apply_event_and_update_pos():

  *ptr_ev= NULL; // announcing the event is passed to w-worker

The reset was intended to let an assigned Worker perform the event
destruction, or the Coordinator itself when the event is deferred.
As neither happens in this case, the event is never destroyed.

In contrast, in the pre-WL sources the killed Coordinator does find a Worker.
However, such a Worker could already be down (errored out and exited), in
which case apply_event_and_update_pos() reasonably returns an error and executes

  delete ev

in exec_relay_log_event() error branch.

**Fixed** by deploying a my_error() call in the log_event.cc::schedule_next_event()
error branch, which fits the existing pattern.
THD::is_error() has always been checked by the Coordinator before any attempt to
reset *ptr_ev= NULL. In the errored case the Coordinator does not reset it and
destroys the event itself in the exec_relay_log_event() error branch, pretty similarly to
how the pre-WL sources do.

Tested against rpl_trigger and rpl suites to pass.

Approved on rb#15667.
akopytov pushed a commit to akopytov/mysql-server that referenced this pull request Aug 25, 2017
Some character sets are designated as MY_CS_STRNXFRM, meaning that sorting
needs to go through my_strnxfrm() (implemented by the charset), and some are
not, meaning that a client can do the strnxfrm itself based on
cs->sort_order. However, most of the logic related to the latter has been
removed already (e.g. filesort always uses my_strnxfrm() since 2003), and now
it's mostly in the way. The three main uses left are:

 1. A microoptimization for constructing sort keys in filesort.
 2. A home-grown implementation of Boyer-Moore for accelerating certain
    LIKE patterns that should probably be handled through FTS.
 3. Some optimizations to MyISAM prefix keys.

Given that our default collation (utf8mb4_0900_ai_ci) now is a strnxfrm-based
collation, the benefits of keeping these around for a narrow range of
single-byte locales (like latin1_swedish_ci, cp850 and a bunch of more
obscure locales) seem dubious. We can't remove the flag entirely,
since #3 seemingly affects the on-disk MyISAM structure, but we can remove
the code for #1 and #2.

Change-Id: If974e490d451b7278355e33ab1fca993f446b792
bjornmu pushed a commit that referenced this pull request Sep 21, 2017
Patch #1:

When find_child_doms() evaluates an ellipsis path leg, it adds each
matching child twice. First once for every immediate child in the
top-level call to find_child_doms(), and then again in the recursive
call to find_child_doms() on each of the children. This doesn't cause
any wrong results, since duplicate elimination prevents them from
being added to the result, but it is unnecessary work. This patch
makes find_child_doms() stop calling add_if_missing() on the children
before it recurses into them, so that they are only added once.
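
For reference, an example of the kind of ellipsis path this code evaluates:

  SELECT JSON_EXTRACT('{"a": {"x": 1}, "b": {"x": 2}}', '$**.x');
  -- returns [1, 2]; before this patch each match was added (and rejected as a
  -- duplicate) more than once while building that result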

Additionally:

Make find_child_doms() a free, static function instead of a member of
Json_dom. It is a helper function for Json_dom::seek(), and doesn't
use any non-public members of Json_dom, so it doesn't have to be part
of Json_dom's interface.

Remove unnecessary checks for only_need_one in the ellipsis
processing. Json_dom::seek() sets the flag to true only in the last
path leg, and the JSON path parser rejects paths that end with an
ellipsis, so this cannot happen. Added an assert instead.

Microbenchmarks (64-bit, Intel Core i7-4770 3.4 GHz, GCC 6.3):

BM_JsonDomSearchEllipsis              79880 ns/iter [+32.0%]
BM_JsonDomSearchEllipsis_OnlyOne      75872 ns/iter [+35.3%]
BM_JsonDomSearchKey                     129 ns/iter [ -1.6%]
BM_JsonBinarySearchEllipsis          320920 ns/iter [+11.8%]
BM_JsonBinarySearchEllipsis_OnlyOne  315458 ns/iter [+12.6%]
BM_JsonBinarySearchKey                   86 ns/iter [ -2.3%]

Change-Id: I865af789b90b820a6e180ad822f2fb68f411516b
bjornmu pushed a commit that referenced this pull request Jan 23, 2018
…_ROW()' FAILED

Analysis:

When a window with buffering follows an equijoin on a unique index
(JT_EQ_REF), we can get into trouble because windowing modifies the
input record, presuming that once the windowing has been handed the
record, the next time control passes back to the join code a (new) record
will be read into the input record.

However, this does not hold with JT_EQ_REF, cf. the caching done in
join_read_key:

From its Doxygen:

  "Since the eq_ref access method will always return the same row, it
   is not necessary to read the row more than once, regardless of how
   many times it is needed in execution.  This cache element is used
   when a row is needed after it has been read once, unless a key
   conversion error has occurred, or the cache has been disabled."
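
A hypothetical query shape that hits this path (assumes t2.pk is the primary
key, so the join uses JT_EQ_REF, and the two-sided frame requires buffering):

  SELECT t1.a,
         AVG(t2.v) OVER (ORDER BY t1.a
                         ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS s
  FROM t1 JOIN t2 ON t2.pk = t1.a;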

Fix:

We solve this problem by reinstating the input record before handing
control back from end_write_wf. We optimize: only do this if the
window in question follows after such a JOIN, i.e. window #1, and it
has actually clobbered the input record. This can only happen if
the last qep_tab has type JT_EQ_REF.

Another, perhaps better approach, is to refactor to never touch the
input record but keep the copying between the out record and the frame
table record instead. Left for future refactoring.

Added some missing Window method "const"s, and folded a couple of
one-liners into window.h (from .cc).

Repro added.

Change-Id: I33bc43cd99ff79303b17d181abc3805ce226fb85
bjornmu pushed a commit that referenced this pull request Jan 23, 2018
…TABLE_UPGRADE_GUARD

To repeat: cmake -DWITH_ASAN=1 -DWITH_ASAN_SCOPE=1
./mtr --mem --sanitize main.dd_upgrade_error

A few dd tests fail with:
==26861==ERROR: AddressSanitizer: stack-use-after-scope on address 0x7000063bf5e8 at pc 0x00010d4dbe8b bp 0x7000063bda40 sp 0x7000063bda38
READ of size 8 at 0x7000063bf5e8 thread T2
    #0 0x10d4dbe8a in Prealloced_array<st_plugin_int**, 16ul>::empty() const prealloced_array.h:186
    #1 0x10d406a8b in lex_end(LEX*) sql_lex.cc:560
    #2 0x10dae4b6d in dd::upgrade::Table_upgrade_guard::~Table_upgrade_guard() (mysqld:x86_64+0x100f87b6d)
    #3 0x10dadc557 in dd::upgrade::migrate_table_to_dd(THD*, std::__1::basic_string<char, std::__1::char_traits<char>, Stateless_allocator<char, dd::String_type_alloc, My_free_functor> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, Stateless_allocator<char, dd::String_type_alloc, My_free_functor> > const&, bool) (mysqld:x86_64+0x100f7f557)
    #4 0x10dad7e85 in dd::upgrade::migrate_plugin_table_to_dd(THD*) (mysqld:x86_64+0x100f7ae85)
    #5 0x10daec6a1 in dd::upgrade::do_pre_checks_and_initialize_dd(THD*) upgrade.cc:1216
    #6 0x10cd0a5c0 in bootstrap::handle_bootstrap(void*) bootstrap.cc:336

Change-Id: I265ec6dd97ee8076aaf03763840c0cdf9e20325b
Fix: increase lifetime of 'LEX lex;' which is used by 'table_guard'
dveeden pushed a commit to dveeden/mysql-server that referenced this pull request Feb 4, 2018
Fix misc UBSAN warnings in unit tests.
To repeat:
export UBSAN_OPTIONS="print_stacktrace=1"

./runtime_output_directory/merge_large_tests-t --gtest_filter='-*DeathTest*' > /dev/null

unittest/gunit/gis_algos-t.cc:78:70:
runtime error: downcast of address 0x000012dc0be8 which does not point to an object of type 'Gis_polygon_ring'

include/sql_string.h:683:35: runtime error: null pointer passed as argument 2, which is declared to never be null
    #1 0x373e7af in histograms::Value_map<String>::add_values(String const&, unsigned long long) sql/histograms/value_map.cc:149
    #2 0x294fcf2 in dd_column_statistics_unittest::add_values(histograms::Value_map<String>&) unittest/gunit/dd_column_statistics-t.cc:62

runtime_output_directory/merge_keyring_file_tests-t --gtest_filter='-*DeathTest*' > /dev/null

plugin/keyring/common/keyring_key.cc:82:57: runtime error: null pointer passed as argument 2, which is declared to never be null

Change-Id: I2651362e3373244b72e6893f0e22e67402b49a52
(cherry picked from commit 1fe3f72561994da1d912a257689e1b18106f8828)
hramilison pushed a commit that referenced this pull request Apr 19, 2018
              AUTH_COMMON.H

Description:- Server crashes due to a NULL pointer
de-reference.

Analysis:- Server encounters a NULL pointer de-reference
during "acl_load()".

Fix:- A check is introduced to avoid the NULL pointer
de-reference.
This issue is already prevented in 8.0 through Bug#27225806
fix. Therefore, this patch is applicable only for 5.7.
sreedhars pushed a commit that referenced this pull request Jul 27, 2018
Group Replication implements conflict detection in
multi-primary mode to avoid write errors on parallel operations.
Conflict detection is also engaged in single-primary mode in the
particular case of a primary change where the new primary still has a
backlog to apply. Until the backlog is flushed, conflict detection
is enabled to prevent write errors between the backlog and incoming
transactions.

The conflict detection data, which we name certification info, is
also used to detect dependencies between accepted transactions,
dependencies which determine how the transactions are scheduled on the
parallel applier.

In order to avoid the certification info growing forever,
all members periodically exchange their GTID_EXECUTED sets, whose
full intersection provides the set of transactions that are
applied on all members. Future transactions cannot conflict with
this set since all members are operating on top of it, so we can
safely remove from the certification info all write-sets that
belong to those transactions.
More details at WL#6833: Group Replication: Read-set free
Certification Module (DBSM Snapshot Isolation).

However, a corner case was found in which the garbage collection was
purging more data than it should.
The scenario is:
 1) Group with 2 members;
 2) Member1 executes:
      CREATE TABLE t1(a INT, b INT, PRIMARY KEY(a));
      INSERT INTO t1 VALUE(1, 1);
    Both members have a GTID_EXECUTED= UUID:1-4
    Both members certification info has:
       Hash of item in Writeset     snapshot version (Gtid_set)
       #1                           UUID1:1-4
 3) member1 executes TA
      UPDATE t1 SET b=10 WHERE a=1;
    and blocks immediately before send the transaction to the group.
    This transaction has snapshot_version: UUID:1-4
 4) member2 executes TB
      UPDATE t1 SET b=10 WHERE a=1;
    This transaction has snapshot_version: UUID:1-4
    It goes through the complete path and it is committed.
    This transaction has GTID: UUID:1000002
    Both members have a GTID_EXECUTED= UUID:1-4:1000002
    Both members certification info has:
       Hash of item in Writeset     snapshot version (Gtid_set)
       #1                           UUID1:1-4:1000002
 5) member2 becomes extremely slow in processing transactions, we
    simulate that by holding the transaction queue to the GR
    pipeline.
    Transaction delivery is still working, but the transaction will
    be block before certification.
 6) member1 is able to send its TA transaction; let's recall that
    this transaction has snapshot_version: UUID:1-4.
    On conflict detection on member1, it will conflict with #1,
    since this snapshot_version does not contain the snapshot_version
    of #1, that is, TA was executed on an earlier version than TB.
    On member2 the transaction will be delivered and will be put on
    hold before conflict detection.
 7) meanwhile the certification info garbage collection kicks in.
    Both members have a GTID_EXECUTED= UUID:1-4:1000002
    Its intersection is UUID:1-4:1000002
    Both members certification info has:
       Hash of item in Writeset     snapshot version (Gtid_set)
       #1                           UUID1:1-4:1000002
    The condition to purge write-sets is:
       snapshot_version.is_subset(intersection)
    We have
       "UUID:1-4:1000002".is_subset("UUID:1-4:1000002)
    which is true, so we remove #1.
    Both members certification info has:
       Hash of item in Writeset     snapshot version (Gtid_set)
       <empty>
 8) member2 gets back to normal and we release transaction TA; let's
    recall that this transaction has snapshot_version: UUID:1-4.
    On conflict detection, since the certification info is empty,
    the transaction will be allowed to proceed, which is incorrect:
    it must roll back (as on member1) since it conflicts with TB.

The problem is in the certification garbage collection, more
precisely in the condition used to purge data: we cannot leave the
certification info empty, otherwise this situation can happen.
The condition must be changed to
       snapshot_version.is_subset_not_equals(intersection)
which will always leave a placeholder to detect delayed conflicting
transactions.

So a trace of the solution is (starting on step 7):
 7) meanwhile the certification info garbage collection kicks in.
    Both members have a GTID_EXECUTED= UUID:1-4:1000002
    Its intersection is UUID:1-4:1000002
    Both members certification info has:
       Hash of item in Writeset     snapshot version (Gtid_set)
       #1                           UUID1:1-4:1000002
    The condition to purge write-sets is:
       snapshot_version.is_subset_not_equals(intersection)
    We have
       "UUID:1-4:1000002".is_subset_not_equals("UUID:1-4:1000002)
    which is false, so we do not remove #1.
    Both members certification info has:
       Hash of item in Writeset     snapshot version (Gtid_set)
       #1                           UUID1:1-4:1000002
 8) member2 gets back to normal and we release transaction TA; let's
    recall that this transaction has snapshot_version: UUID:1-4.
    On conflict detection on member2, it will conflict with #1,
    since this snapshot_version does not contain the snapshot_version
    of #1, that is, TA was executed on an earlier version than TB.

This is the same scenario that we see in this bug, though here the
pipeline is being blocked by the distributed recovery procedure,
that is, while the joining member is applying the missing data
through the recovery channel, the incoming data is being queued.
Meanwhile the certification info garbage collection kicks in and
purges more data than it should, with the result that conflicts are
not detected.
bjornmu pushed a commit that referenced this pull request Jul 27, 2018
Use void* for function arguments, and cast Item_field* in function body.

compare_fields_by_table_order(Item_field*, Item_field*, void*) through
pointer to incorrect function type 'int (*)(void *, void *, void *)'
sql/sql_optimizer.cc:3904: note: compare_fields_by_table_order(Item_field*,
Item_field*, void*) defined here
    #0 0x34ac57d in base_list::sort(int (*)(void*, void*, void*), void*)
sql/sql_list.h:278:13
    #1 0x49ad303 in substitute_for_best_equal_field(Item*, COND_EQUAL*,

Change-Id: I4f5417304a24201682f32fc7631034de7aa62589
bjornmu pushed a commit that referenced this pull request Jul 27, 2018
…ENERATED_READ_FIELDS

It's a SELECT with WHERE "(-1) minus 0x4d".
This operation has a result type of "unsigned" (because 0x4d is an unsigned
integer) and the result (-78) doesn't fit in an unsigned type.
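
The essential arithmetic in isolation (the actual WHERE clause presumably also
references an indexed column, which is why it is pushed down):

  SELECT -1 - 0x4d;
  -- 0x4d acts as an unsigned integer, so the subtraction uses unsigned
  -- arithmetic and the negative result raises a numeric overflow error
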
This WHERE is evaluated by InnoDB in index condition pushdown:
#0 my_error
#1 Item_func::raise_numeric_overflow
...
#7 Item_cond_and::val_int
#8 innobase_index_cond
...
#12 handler::index_read_map
...
#15 handler::multi_range_read_next
...
#20 rr_quick
#21 join_init_read_record
As val_int() has no "error" return code, the execution continues until
frame #12; there we call update_generated_read_fields(), which has
an assertion about thd->is_error() which fails.

Fix: it would be nice to detect the error as soon as it happens, i.e. in InnoDB
code right after it calls val_bool(). But InnoDB's index condition
pushdown functions only have found / not found return codes, so they cannot
signal "error" to the upper layers. The same is true for MyISAM. Moreover,
"thd" isn't easily accessible there.
Adding detection a bit higher up in the stack (the handler::* functions which
do index reads) is also possible but would require fixing ~20
functions.
The chosen fix here is to change update_generated_*_fields()
to return error if thd->is_error() is true.
Note that the removed assertion was already one cause of
bug 27041382.
bjornmu pushed a commit that referenced this pull request Oct 22, 2018
Post push fix

sql/opt_range.cc:14196:22: runtime error: -256 is outside the range of representable values of type 'unsigned int'
    #0 0x4248a9d in cost_skip_scan(TABLE*, unsigned int, unsigned int, unsigned long long, Cost_estimate*, unsigned long long*, Item*, Opt_trace_object*) sql/opt_range.cc:14196:22
    #1 0x41c524c in get_best_skip_scan(PARAM*, SEL_TREE*, bool) sql/opt_range.cc:14086:5
    #2 0x41b7b65 in test_quick_select(THD*, Bitmap<64u>, unsigned long long, unsigned long long, bool, enum_order, QEP_shared_owner const*, Item*, Bitmap<64u>*, QUICK_SELECT_I**) sql/opt_range.cc:3352:23
    #3 0x458fc08 in get_quick_record_count(THD*, JOIN_TAB*, unsigned long long) sql/sql_optimizer.cc:5542:17
    #4 0x458a0cd in JOIN::estimate_rowcount() sql/sql_optimizer.cc:5290:25

The fix is to handle REC_PER_KEY_UNKNOWN explicitly, to avoid using
-1.0 in computations later.

Change-Id: Ie8a81bdf7323e4f66abcad0a9aca776de8acd945
sjmudd pushed a commit to sjmudd/mysql-server that referenced this pull request Jan 24, 2019
dveeden pushed a commit to dveeden/mysql-server that referenced this pull request May 5, 2023
Introduce class NdbSocket, which includes both an ndb_socket_t
and an SSL *, and wraps all socket operations that might use
TLS.

Change-Id: I20d7aeb4854cdb11cfd0b256270ab3648b067efa
dveeden pushed a commit to dveeden/mysql-server that referenced this pull request May 5, 2023
The patch for
    WL#15130 Socket-level TLS patch mysql#1: class NdbSocket
re-introduced a -Wcast-qual warning.

Use const_cast, and reinterpret_cast to fix it, since the corresponding
posix version of ndb_socket_writev() is const-correct.

Change-Id: Ib446a926b4108edf51eda7d8fd27ada560b67a24
venkatesh-prasad-v pushed a commit to venkatesh-prasad-v/mysql-server that referenced this pull request Aug 3, 2023
…etwork

https://bugs.mysql.com/bug.php?id=109668

Description
-----------
GR suffered from problems caused by security probes and network scanner
processes connecting to the group replication communication port. This usually
is not a problem, but it poses a serious threat when another member tries to join
the cluster by initiating a connection to a member that is affected by
external processes occupying the port dedicated to group communication for long
durations.

During such activity by external processes, the SSL-enabled server stalled forever
on the SSL_accept() call waiting for handshake data. Below is the stacktrace:

    Thread 55 (Thread 0x7f7bb77ff700 (LWP 2198598)):
    #0 in read ()
    #1 in sock_read ()
    #2 in BIO_read ()
    #3 in ssl23_read_bytes ()
    #4 in ssl23_get_client_hello ()
    #5 in ssl23_accept ()
    #6 in xcom_tcp_server_startup(Xcom_network_provider*) ()

When the server stalled forever in the above path, it prevented other members
from joining the cluster, resulting in the following messages in the joining
server's logs.

    [ERROR] [MY-011640] [Repl] Plugin group_replication reported: 'Timeout on wait for view after joining group'
    [ERROR] [MY-011735] [Repl] Plugin group_replication reported: '[GCS] The member is already leaving or joining a group.'

Solution
--------
This patch adds two new variables

1. group_replication_xcom_ssl_socket_timeout

   It is a file-descriptor level timeout in seconds for both accept() and
   SSL_accept() calls when group replication is listening on the xcom port.
   When set to a valid value, say for example 5 seconds, both accept() and
   SSL_accept() return after 5 seconds. The default value has been set to 0
   (waits infinitely) for backward compatibility. This variable is effective
   only when GR is configured with SSL.

2. group_replication_xcom_ssl_accept_retries

   It defines the number of retries to be performed before closing the socket.
   For each retry the server thread calls SSL_accept() with the timeout defined
   by group_replication_xcom_ssl_socket_timeout for the SSL handshake process
   once the connection has been accepted by the first accept() call. The
   default value has been set to 10. This variable is effective only when GR is
   configured with SSL.

Note:
- Both of the above variables are dynamically configurable, but will become
  effective only on START GROUP_REPLICATION.
- This patch is only for Linux systems.
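
A hedged usage sketch of the two variables described above (values are
illustrative only):

  SET GLOBAL group_replication_xcom_ssl_socket_timeout = 5;
  SET GLOBAL group_replication_xcom_ssl_accept_retries = 10;
  -- the new values only take effect on the next start
  STOP GROUP_REPLICATION;
  START GROUP_REPLICATION;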
bjornmu pushed a commit that referenced this pull request Oct 25, 2023
Part of WL#15135 Certificate Architecture

This patch introduces a set of C++ classes to implement the creation of
private keys, PKCS#10 signing requests, and X.509 certificates for NDB
cluster.

The TlsSearchPath class provides searching for files over a delimited
list of directories.

The PrivateKey and Certificate classes provide simple wrappers over the
OpenSSL routines to create, free, save, and open keys and certificates.

Classes PendingPrivateKey and PendingCertificate implement file naming
conventions and life cycle for pending key pairs; ActivePrivateKey
and ActiveCertificate implement them for active key pairs. Class
SigningRequest provides the naming conventions and life cycle for
PKCS#10 CSRs.

Class NodeCertificate is the primary in-memory representation of a
node's TLS credentials.

A unit test, NodeCertificate-t, is intended to thoroughly test the
whole suite of classes. It should be possible to run this test under
valgrind with no reported leaks.

Change-Id: I76bf719375ab2a9b6a97245e326158a49dde28c2
bjornmu pushed a commit that referenced this pull request Oct 25, 2023
This is a complete implementation of ndb_sign_keys.

It searches --ndb-tls-search-path for node certificate and key
files, and additionally searches in --CA-search-path for CA-related
key and certificate files.

It includes three methods for remote key signing:
  With --remote-CA-host, run ndb_sign_keys remotely, using ssh.
  With --remote-openssl, run openssl on the remote host, using ssh.
  With --CA-tool, run a local signing helper tool.

Change-Id: I5d93b702a667fa98d820ed150631a91e8444b8d7
bjornmu pushed a commit that referenced this pull request Oct 25, 2023
Post-push fix for: WL#15166 patch #1 ndb_sign_keys

DWORD is 'unsigned long' not int
Remove an unused local variable.
C-style cast (LPSTR) drops const qualifier [-Wcast-qual]

Change-Id: I059ad8a5a5f6b1cc644456576a8acff9a78331e3
bjornmu pushed a commit that referenced this pull request Oct 25, 2023
Add boolean parameter "RequireCertificate" to [DB] section.
Default is false. If true, node will fail at startup time unless
it finds a TLS key and a current valid certificate.

Add boolean parameter "RequireTls" to [DB] section. Default is
false. If true, every transporter link involving the data node
must use TLS.

Add boolean parameter "RequireTls" to [TCP] sections. This is
computed, and not user-settable. If either endpoint of a link
has RequireTls set to true, RequireTls for the link will be
set to true.

Add some clarifying comments to ndbinfo_plans test.

Change-Id: I889d9b7563022e2ebb2eaae92c3b26b557180d40
bjornmu pushed a commit that referenced this pull request Oct 25, 2023
Add an MGM protocol command to turn a plaintext mgm api session
into a TLS session. Add three new MGM API functions:
  ndb_mgm_set_ssl_ctx()
  ndb_mgm_start_tls()
  ndb_mgm_connect_tls()

Define two client TLS requirement levels:
 CLIENT_TLS_RELAXED, CLIENT_TLS_STRICT

This adds a new test: testMgmd -n StartTls

Change-Id: Ib46faacd9198c474558e46c3fa0538c7e759f3fb
bjornmu pushed a commit that referenced this pull request Oct 25, 2023
Post push fix. Remove added C++ dependencies in C header mgmapi.h.

 - forward declare SSL_CTX.
 - add missing struct keyword with ndb_mgm_cert_table and
   ndb_mgm_tls_stats
 - make ndb_mgm_set_ssl_ctx return int instead of bool as other mgmapi
   functions do.

Change-Id: I493b4c4fb1272974e1bb72e35abb08c8cef1a534
bjornmu pushed a commit that referenced this pull request Jan 16, 2024
Post push fix.

Do not allow ndb_mgm_listen_event to return a socket that uses TLS since
the user cannot access the corresponding SSL object through the public
MgmAPI.

Change-Id: I2a741efe4f80db750419101ecabb03fb5e025346
bjornmu pushed a commit that referenced this pull request Jan 16, 2024
Post push fix.

Make NdbSocket::ssl_readln return 0 on timeout.

Change-Id: I4cad95abd319883c16f2c28eff5cf2b6761731d6
bjornmu pushed a commit that referenced this pull request Jan 16, 2024
Post push fix.

Add missing socket close in testMgmd -n StartTls.

Change-Id: Ia446b522ad2698f63d588d3c52122df8735765c7
bjornmu pushed a commit that referenced this pull request Jan 16, 2024
Problem
================================

Group Replication ASAN run failing without any symptom of a
leak, but with shutdown issues:

worker[6] Shutdown report from
/dev/shm/mtr-3771884/var-gr-debug/6/log/mysqld.1.err after tests:
 group_replication.gr_flush_logs
group_replication.gr_delayed_initialization_thread_handler_error
group_replication.gr_sbr_verifications
group_replication.gr_server_uuid_matches_group_name_bootstrap
group_replication.gr_stop_async_on_stop_gr
group_replication.gr_certifier_message_same_member
group_replication.gr_ssl_mode_verify_identity_error_xcom

Analysis and Fix
================================

It ended up being a leak on gr_ssl_mode_verify_identity_error_xcom test:
Direct leak of 24 byte(s) in 1 object(s) allocated from:
    #0 0x7f1709fbe1c7 in operator new(unsigned long)
      ../../../../src/libsanitizer/asan/asan_new_delete.cpp:99
    #1 0x7f16ea0df799 in xcom_tcp_server_startup(Xcom_network_provider*)
      (/export/home/tmp/BUG35594709/mysql-trunk/BIN-ASAN/plugin_output_directory
        /group_replication.so+0x65d799)
    #2 0x7f170751e2b2  (/lib/x86_64-linux-gnu/libstdc++.so.6+0xdc2b2)

This happens because we delegated the cleanup of incoming connections
to the external consumer in incoming_connection_task.
Since it calls incoming_connection() from
Network_provider_manager, in case of a concurrent stop
a connection could be left orphaned in the shared atomic
due to the lack of an Active Provider, thus creating a
memory leak.

The solution is to perform this cleanup in
Network_provider_manager, in both the stop_provider() and
stop_all_providers() methods, thus ensuring that no
incoming connection leaks.

Change-Id: I2367c37608ad075dee63785e9f908af5e81374ca
bjornmu pushed a commit that referenced this pull request Jan 16, 2024
Post push fix.

Make NdbSocket::ssl_readln return 0 on timeout.

Change-Id: I4cad95abd319883c16f2c28eff5cf2b6761731d6
bjornmu pushed a commit that referenced this pull request Jan 16, 2024
BUG#35949017 Schema dist setup lockup
Bug#35948153 Problem setting up events due to stale NdbApi dictionary cache [#2]
Bug#35948153 Problem setting up events due to stale NdbApi dictionary cache [#1]
Bug#32550019 Missing check for ndb_schema_result leads to schema dist timeout

Change-Id: I4a32197992bf8b6899892f21587580788f828f34
msprajap pushed a commit that referenced this pull request Jan 16, 2024
… cache [#1]

Problem:
A MySQL Server which has been disconnected from schema distribution
fails to set up event operations since the columns of the table can't be
found in the event.

Analysis:
The ndbcluster plugin uses NDB table definitions which are cached by the
NdbApi. These cached objects are reference counted and there can be
multiple versions of the same table in the cache; the intention is that
it should be possible to continue using the table even though it
changes in NDB. When changing a table in NDB this cache needs to be
invalidated, both on the local MySQL Server and on all other MySQL
Servers connected to the same cluster. Such invalidation is especially
important before installing in DD and setting up event subscriptions.

The local MySQL Server cache is invalidated directly when releasing the
reference from the NdbApi after having modified the table.

The other MySQL Servers are primarily invalidated by using schema
distribution. Since schema distribution is event driven the invalidation
will happen promptly but as with all things in a distributed system
there is a possibility that these events are not handled for some
reason. This means there must be a fallback mechanism which
invalidates stale cache objects.

The reported problem occurs since there is a stale NDB table definition
in the NdbApi: it has the same name but different columns than the
current table in NDB. In most cases the NdbApi continues
to operate on a cached NDB table definition, but when setting up events
the "mismatch on version" will be detected inside the NdbApi (due to the
relation between the event and the table); this causes the cache to be
invalidated and the current version to be loaded from NDB. However the
caller is still using the "old" cached table definition and thus, when
trying to subscribe the columns, they cannot be found.

Solution:

1) Invalidate NDB table definition in schema event handler that handles
new table created. This covers the case where table is dropped directly
in NDB using for example ndb_drop_table or ndb_restore and then
subsequently created using SQL. This scenario is covered by the existing
metadata_sync test cases which will be detected by 4) before this part of
the fix.

2) Invalidate NDB table definition before table schema synchronization
installs tables in DD and sets up event subscriptions. This function
handles the case when schema distribution is reconnecting to the cluster
and a table it knew about earlier has changed while schema distribution
event handlers have not been active. This scenario is tested by the
drop_util_table test case.

3) Invalidate NDB table definition when the schema distribution event
handler used for drop table and cluster failure occurs. At this
time it's well known that the table does not exist or its status is
unknown. Earlier this invalidation was only performed if there was a
version mismatch in the event vs. table relation.

4) Detect when the problem occurs by checking that the NDB table definition
has not been invalidated (by NdbApi event functions) in the function that
sets up the event subscription. It's currently not possible to handle the
problem this low down, but at least it can be detected and a fix added to
the callers. This detection is only done in debug builds.

Change-Id: I4ed6efb9308be0022e99c51eb23ecf583805b1f4
venkatesh-prasad-v pushed a commit to venkatesh-prasad-v/mysql-server that referenced this pull request Jan 26, 2024
…and a local

             DDL executed

https://bugs.mysql.com/bug.php?id=113727

Problem
-------
In high concurrency scenarios, MySQL replica can enter into a deadlock due to a
race condition between the replica applier thread and the client thread
performing a binlog group commit.

Analysis
--------
It needs at least 3 threads for this deadlock to happen

1. One client thread
2. Two replica applier threads

How this deadlock happens?
--------------------------
0. Binlog is enabled on replica, but log_replica_updates is disabled.

1. Initially, both "Commit Order" and "Binlog Flush" queues are empty.

2. Replica applier thread 1 enters the group commit pipeline to register in the
   "Commit Order" queue since `log-replica-updates` is disabled on the replica
   node.

3. Since both "Commit Order" and "Binlog Flush" queues are empty, the applier
   thread 1

   3.1. Becomes leader (In Commit_stage_manager::enroll_for()).

   3.2. Registers in the commit order queue.

   3.3. Acquires the lock MYSQL_BIN_LOG::LOCK_log.

   3.4. Commit Order queue is emptied, but the lock MYSQL_BIN_LOG::LOCK_log is
        not yet released.

   NOTE: SE commit for applier thread is already done by the time it reaches
         here.

4. Replica applier thread 2 enters the group commit pipeline to register in the
   "Commit Order" queue since `log-replica-updates` is disabled on the replica
   node.

5. Since the "Commit Order" queue is empty (emptied by applier thread 1 in 3.4), the
   applier thread 2

   5.1. Becomes leader (In Commit_stage_manager::enroll_for())

   5.2. Registers in the commit order queue.

   5.3. Tries to acquire the lock MYSQL_BIN_LOG::LOCK_log. Since it is held by applier
        thread 1 it will wait until the lock is released.

6. Client thread enters the group commit pipeline to register in the
   "Binlog Flush" queue.

7. Since "Commit Order" queue is not empty (there is applier thread 2 in the
   queue), it enters the conditional wait `m_stage_cond_leader` with an
   intention to become the leader for both the "Binlog Flush" and
   "Commit Order" queues.

8. Applier thread 1 releases the lock MYSQL_BIN_LOG::LOCK_log and proceeds to update
   the GTID by calling gtid_state->update_commit_group() from
   Commit_order_manager::flush_engine_and_signal_threads().

9. Applier thread 2 acquires the lock MYSQL_BIN_LOG::LOCK_log.

   9.1. It checks if there is any thread waiting in the "Binlog Flush" queue
        to become the leader. Here it finds the client thread waiting to be
        the leader.

   9.2. It releases the lock MYSQL_BIN_LOG::LOCK_log and signals on the
        cond_var `m_stage_cond_leader` and enters a conditional wait until the
        thread's `tx_commit_pending` is set to false by the client thread
       (will be done in the
       Commit_stage_manager::process_final_stage_for_ordered_commit_group()
       called by client thread from fetch_and_process_flush_stage_queue()).

10. The client thread wakes up from the cond_var `m_stage_cond_leader`.  The
    thread has now become a leader and it is its responsibility to update GTID
    of applier thread 2.

    10.1. It acquires the lock MYSQL_BIN_LOG::LOCK_log.

    10.2. Returns from `enroll_for()` and proceeds to process the
          "Commit Order" and "Binlog Flush" queues.

    10.3. Fetches the "Commit Order" and "Binlog Flush" queues.

    10.4. Performs the storage engine flush by calling ha_flush_logs() from
          fetch_and_process_flush_stage_queue().

    10.5. Proceeds to update the GTID of threads in "Commit Order" queue by
          calling gtid_state->update_commit_group() from
          Commit_stage_manager::process_final_stage_for_ordered_commit_group().

11. At this point, we will have

    - Client thread performing GTID update on behalf of applier thread 2 (from step 10.5), and
    - Applier thread 1 performing GTID update for itself (from step 8).

    Due to the lack of proper synchronization between the above two threads,
    there exists a time window where both threads can call
    gtid_state->update_commit_group() concurrently.

    In subsequent steps, both threads simultaneously try to modify the contents
    of the array `commit_group_sidnos` which is used to track the lock status of
    sidnos. This concurrent access to `update_commit_group()` can cause a
    lock-leak resulting in one thread acquiring the sidno lock and not
    releasing it at all.

-----------------------------------------------------------------------------------------------------------
Client thread                                           Applier Thread 1
-----------------------------------------------------------------------------------------------------------
update_commit_group() => global_sid_lock->rdlock();     update_commit_group() => global_sid_lock->rdlock();

calls update_gtids_impl_lock_sidnos()                   calls update_gtids_impl_lock_sidnos()

set commit_group_sidno[2] = true                        set commit_group_sidno[2] = true

                                                        lock_sidno(2) -> successful

lock_sidno(2) -> waits

                                                        update_gtids_impl_own_gtid() -> Add the thd->owned_gtid in `executed_gtids()`

                                                        if (commit_group_sidnos[2]) {
                                                          unlock_sidno(2);
                                                          commit_group_sidnos[2] = false;
                                                        }

                                                        Applier thread continues..

lock_sidno(2) -> successful

update_gtids_impl_own_gtid() -> Add the thd->owned_gtid in `executed_gtids()`

if (commit_group_sidnos[2]) { <=== this check fails and lock is not released.
  unlock_sidno(2);
  commit_group_sidnos[2] = false;
}

Client thread continues without releasing the lock
-----------------------------------------------------------------------------------------------------------

12. As the above lock-leak can also happen the other way round, i.e., the
    applier thread fails to unlock, there can be different consequences
    hereafter.

13. If the client thread continues without releasing the lock, then at a later
    stage, it can enter into a deadlock with the applier thread performing a
    GTID update, with the following stack traces:

    Client_thread
    -------------
    mysql#1  __GI___lll_lock_wait
    mysql#2  ___pthread_mutex_lock
    mysql#3  native_mutex_lock                                       <= waits for commit lock while holding sidno lock
    mysql#4  Commit_stage_manager::enroll_for
    mysql#5  MYSQL_BIN_LOG::change_stage
    mysql#6  MYSQL_BIN_LOG::ordered_commit
    mysql#7  MYSQL_BIN_LOG::commit
    mysql#8  ha_commit_trans
    mysql#9  trans_commit_implicit
    mysql#10 mysql_create_like_table
    mysql#11 Sql_cmd_create_table::execute
    mysql#12 mysql_execute_command
    mysql#13 dispatch_sql_command

    Applier thread
    --------------
    mysql#1  ___pthread_mutex_lock
    mysql#2  native_mutex_lock
    mysql#3  safe_mutex_lock
    mysql#4  Gtid_state::update_gtids_impl_lock_sidnos               <= waits for sidno lock
    mysql#5  Gtid_state::update_commit_group
    mysql#6  Commit_order_manager::flush_engine_and_signal_threads   <= acquires commit lock here
    mysql#7  Commit_order_manager::finish
    mysql#8  Commit_order_manager::wait_and_finish
    mysql#9  ha_commit_low
    mysql#10 trx_coordinator::commit_in_engines
    mysql#11 MYSQL_BIN_LOG::commit
    mysql#12 ha_commit_trans
    mysql#13 trans_commit
    mysql#14 Xid_log_event::do_commit
    mysql#15 Xid_apply_log_event::do_apply_event_worker
    mysql#16 Slave_worker::slave_worker_exec_event
    mysql#17 slave_worker_exec_job_group
    mysql#18 handle_slave_worker

14. If the applier thread continues without releasing the lock, then at a later
    stage, it can perform recursive locking while setting the GTID for the next
    transaction (in set_gtid_next()).

    In debug builds the above case hits the assertion
    `safe_mutex_assert_not_owner()` meaning the lock is already acquired by the
    replica applier thread when it tries to re-acquire the lock.

Solution
--------
In the above problematic example, when seen from each thread
individually, we can conclude that there is no problem in the order of lock
acquisition, thus there is no need to change the lock order.

However, the root cause for this problem is that multiple threads can
concurrently access the array `Gtid_state::commit_group_sidnos`.

In its initial implementation, it was expected that threads should
hold the `MYSQL_BIN_LOG::LOCK_commit` before modifying its contents. But it
was not considered when upstream implemented WL#7846 (MTS:
slave-preserve-commit-order when log-slave-updates/binlog is disabled).

With this patch, we now ensure that `MYSQL_BIN_LOG::LOCK_commit` is acquired
by the client thread (binlog flush leader) when it tries to perform the GTID
update on behalf of threads waiting in the "Commit Order" queue, thus
guaranteeing that the `Gtid_state::commit_group_sidnos` array is never
accessed without the protection of `MYSQL_BIN_LOG::LOCK_commit`.
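A stripped-down sketch of the guarantee this patch adds, with a std::mutex standing in for MYSQL_BIN_LOG::LOCK_commit and a plain vector standing in for Gtid_state::commit_group_sidnos (all names here are placeholders, not the server's real code): both the applier leader and the binlog flush leader must take the commit lock around the set / own-GTID / unset sequence, so one thread can no longer observe the other's half-finished bookkeeping and leak a sidno lock.

```
#include <mutex>
#include <vector>

// Placeholder stand-ins for MYSQL_BIN_LOG::LOCK_commit and
// Gtid_state::commit_group_sidnos; not the server's real declarations.
std::mutex commit_lock;
std::vector<bool> commit_group_sidnos(16, false);

// Both the applier leader and the binlog flush leader go through this path;
// holding commit_lock makes the set / own-GTID / unset sequence atomic, so
// neither thread can miss the final unset and leak the sidno lock.
void update_commit_group_for_sidno(int sidno) {
  std::lock_guard<std::mutex> guard(commit_lock);
  commit_group_sidnos[sidno] = true;
  // ... lock_sidno(sidno) and update the owned GTID in the real code ...
  if (commit_group_sidnos[sidno]) {
    // ... unlock_sidno(sidno) in the real code ...
    commit_group_sidnos[sidno] = false;
  }
}
```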
bjornmu pushed a commit that referenced this pull request Apr 30, 2024
When built with ASAN, a use-after-free is reported for the TcpPortPool.

AddressSanitizer: heap-use-after-free on address 0x60200019f190 at pc
0x00000076a18d bp 0x7fff51e7d1d0 sp 0x7fff51e7d1c0

    #4 0x770b73 in UniqueId::ProcessUniqueIds::erase(unsigned int)
       ../router/tests/helpers/tcp_port_pool.h:112
    #5 0x770c48 in UniqueId::~UniqueId()
       ../router/tests/helpers/tcp_port_pool.cc:234
    ...
    #12 0x82faa3 in testing::UnitTest::~UnitTest()
	../extra/googletest/googletest-release-1.12.0/googletest/src/gtest.cc:5496
    #13 0x7f5fe085ace8 in __run_exit_handlers (/lib64/libc.so.6+0x39ce8)

0x60200019f190 is located 0 bytes inside of 16-byte region
[0x60200019f190,0x60200019f1a0)
freed by thread T0 here:
    #0 0x7f5fe3cbd10f in operator delete(void*, unsigned long)
       (/lib64/libasan.so.6+0xb710f)
    #1 0x7f5fe085ace8 in __run_exit_handlers (/lib64/libc.so.6+0x39ce8)

Background
==========

__run_exit_handlers destroys "static" and "global" variables in reverse
order of their creation.

googletest's UnitTest is a static, and the TcpPortPool also has
ProcessUniqueIds, which contains the process-wide unique-ids.

At construct: unittest -> tcp-port-pool -> process-unique-ids
At destruct : process-unique-ids -> tcp-port-pool -> 💥

The use-after-free happens as the process-unique-ids static is
destructed before the tcp-port-pool, which tries to erase its Ids from the
process-unique-ids.

Change
======

- extend the lifetime of the process-unique-ids to after the last use of
  the tcp-port-pool via a std::shared_ptr<>

Change-Id: I75b8b781e1d240f18ca72f2c86182639a7699f06
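A self-contained sketch of the lifetime fix (hypothetical stand-ins, not the actual router test helpers): because the pool holds a std::shared_ptr to the process-wide id registry, the registry stays alive until the last pool is destroyed, regardless of the order in which __run_exit_handlers tears down statics.

```
#include <memory>
#include <set>

// Hypothetical stand-ins for ProcessUniqueIds and TcpPortPool.
struct ProcessUniqueIds {
  std::set<unsigned> ids;
  static std::shared_ptr<ProcessUniqueIds> instance() {
    // The shared_ptr lives in a function-local static; every pool that
    // grabs a copy keeps the registry alive past its own destruction.
    static auto self = std::make_shared<ProcessUniqueIds>();
    return self;
  }
};

class TcpPortPool {
 public:
  TcpPortPool() : ids_(ProcessUniqueIds::instance()) { ids_->ids.insert(42); }
  ~TcpPortPool() {
    // Safe even if the registry's own static pointer was already destroyed:
    // our shared_ptr copy keeps the object itself alive.
    ids_->ids.erase(42);
  }

 private:
  std::shared_ptr<ProcessUniqueIds> ids_;
};

int main() {
  static TcpPortPool pool;  // destroyed by exit handlers, after main()
  return 0;
}
```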
bjornmu pushed a commit that referenced this pull request Apr 30, 2024
…nt on Windows and posix [#1]

When passing arguments to NdbProcess::create it will become important,
when introducing quoting, to distinguish spaces that are part of the
argument value from spaces acting as an argument separator.

This patch removes current uses of space as a separator in arguments to
NdbProcess::create.

Change-Id: I1d1bab27e183fc33632bfd9974010129a8970365
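A tiny generic illustration of the convention this patch moves to (not the NdbProcess API itself): each argument becomes its own element, so a space inside a value can never be confused with an argument separator, and quoting can later be added in one place.

```
#include <iostream>
#include <string>
#include <vector>

// Before: "--defaults-file=/tmp/my dir/my.cnf --ndb-nodeid=1" as one string,
// where the space inside "my dir" is ambiguous.
// After: one vector element per argument, so there is no separator ambiguity.
int main() {
  std::vector<std::string> args = {
      "--defaults-file=/tmp/my dir/my.cnf",  // embedded space stays intact
      "--ndb-nodeid=1",
  };
  for (const auto &a : args) std::cout << "[" << a << "]\n";
  return 0;
}
```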
bjornmu pushed a commit that referenced this pull request Apr 30, 2024
Problem:
Starting `ndb_mgmd --bind-address` may potentially cause abnormal
program termination in the MgmtSrvr destructor when ndb_mgmd restarts itself.

  Core was generated by `ndb_mgmd --defa'.
  Program terminated with signal SIGABRT,   Aborted.
  #0  0x00007f8ce4066b8f in raise () from /lib64/libc.so.6
  #1  0x00007f8ce4039ea5 in abort () from /lib64/libc.so.6
  #2  0x00007f8ce40a7d97 in __libc_message () from /lib64/libc.so.6
  #3  0x00007f8ce40af08c in malloc_printerr () from /lib64/libc.so.6
  #4  0x00007f8ce40b132d in _int_free () from /lib64/libc.so.6
  #5  0x00000000006e9ffe in MgmtSrvr::~MgmtSrvr (this=0x28de4b0) at
mysql/8.0/storage/ndb/src/mgmsrv/MgmtSrvr.cpp:
890
  #6  0x00000000006ea09e in MgmtSrvr::~MgmtSrvr (this=0x2) at mysql/8.0/
storage/ndb/src/mgmsrv/MgmtSrvr.cpp:849
  #7  0x0000000000700d94 in mgmd_run () at
mysql/8.0/storage/ndb/src/mgmsrv/main.cpp:260
  #8  0x0000000000700775 in mgmd_main (argc=<optimized out>,
argv=0x28041d0) at mysql/8.0/storage/ndb/src/
mgmsrv/main.cpp:479

Analysis:
While starting up, ndb_mgmd will allocate memory for bind_address in
order to potentially rewrite the parameter. When ndb_mgmd restarts itself
the memory is released, leaving a dangling pointer and causing a double free.

Fix:
Drop support for bind_address=[::]; it is not documented anywhere, is
not useful and doesn't work.
This means the need to rewrite bind_address is gone and the bind_address
argument needs neither alloc nor free.

Change-Id: I7797109b9d8391394587188d64d4b1f398887e94
bjornmu pushed a commit that referenced this pull request Apr 30, 2024
This worklog introduces dynamic offload of queries to RAPID in the
following ways:

When the system variable rapid_use_dynamic_offload is 0/false, we fall
back to the normal cost threshold classifier, which also implies that
when use secondary engine is set to forced, eligible queries will go to
the secondary engine, regardless of cost threshold or this classifier.

When rapid_use_dynamic_offload is 1/true, we proceed to look for the
optimal execution engine for the query; if the secondary engine is
found to be more optimal, the query is offloaded, otherwise it is sent
back to mysql. This is handled in the following scenarios:

1. Static Scenario: When there's no Change Propagation or queue on the
RAPID side, this introduces a decision tree which has > 85 % precision
in training at predicting which queries should be faster on mysql and
which should be faster on RAPID, and accepts or rejects queries. The
decision tree takes around 20-100 microseconds for fast queries, hence
minimal overhead; for bigger queries this introduces an overhead of up
to a maximum observed 700 microseconds, but these end up with long
execution time anyway, hence not a problem. For very fast queries,
defined here as having cost < 10 and being of the point-select form,
dynamic offload is not applied, since 100 % of these queries (out of
16667 samples) are faster on MySQL. Additionally, routing these "very
fast queries" through dynamic offload leads to performance regressions
due to the 3-phase optimisation.

2. Dynamic Scenario: When there's CP or queuing on RAPID, this worklog
 introduces dynamic feature normalization to take into account the
 extra catch-up time RAPID needs and, factoring that in, attempts to
 verify whether RAPID is still the best engine for execution. If the
 queue is too long or CP is too long, this mechanism progressively
 starts shifting queries to mysql, moving gradually towards the heavier
 queries.

The steps in this worklog with respect to query lifecycle in server with
secondary_engine = ON, are described below:

query
   |
Primary Tentatively optimisation -> mysql optimises for Innodb
   |
secondary_engine_pre_prepare_hook -> following Rapid function called:
   |  RapidCachePrimaryInfoAtPrimaryTentativelyStep
   |  If dynamic offload is enabled and query is not "very fast":
   |   This caches features from mysql plan in rapid_statement_context
   |   to be used for dynamic offload.
   |  If dynamic offload is disabled or the query is "very fast":
   |   This function invokes the standard mysql cost threshold classifier,
   |   which decides if query needs further RAPID optimisation.
   |
   |
   |-> if returns False, then query proceeds to Innodb for execution
   |-> if returns true, step below is called
   |
 Secondary optimisation -> mysql optimises for RAPID
   |
prepare_secondary_engine -> following Rapid function is called:
   |   RapidPrepareEstimateQueryCosts
   |     In this function, Dynamic offload combines mysql plan features
   |      retrieved from rapid_statement_context
   |     and RAPID info such as rapid base table cardinality,
   |     dict encoding projection, varlen projection size, rapid queue
   |     size in to decide if query should be offloaded to RAPID.
   |
   |->if returns True, then query proceeds to Innodb for execution
   |->if returns False, step below is called
   |
optimize_secondary_engine -> following Rapid function is called
   |    RapidOptimize
   |     In this function, Dynamic offload retrieves info from
   |     rapid_statement_context and additionally looks at Change
   |     propagation lag to decide if query should be offloaded to rapid
   |
   |->if returns True, then query proceeds to Innodb for execution
   |->if returns False, then query goes to Rapid Execution.

Following new MYSQL ERR log messages are printed with this WL, when
dynamic offload is enabled, and query is not a "very fast query".

1. SelOffload allow decision 1 : as secondary not forced 1 and enable
 var value 1 and transactional enabled 1 and( big shape detected 0
  or small shape detected 1 ) inno: 10737418240 , rpd: 4294967296 ,
   no lh table: 1

   A message such as this shows whether dynamic offload is used to
   classify this query or not and, if not, why not, using each of the
   conditions. 1 = pass, 0 = not pass.

2. myqid=65 Selective offload classifier #1#1#1
    f_mysql_total_ts_nrows <= 2105.5 : 0.173916, f_MySQLCost <=
    68.3899040222168 : 0.028218, f_count_all_base_tables = 0 ,
    f_count_ref_index_ts = 0 ,f_BaseTableSumNrows <= 278177.5 :
    0.173916 are_all_ts_index_ref = true outcome=0

   A line such as this serialises which leg of the decision tree
   decided the outcome of this query: 0 -> back to mysql, 1 -> keep on
   rapid. Each leg is uniquely searchable via an identifier such as
   #1#1#1 here.

This worklog additionally introduces python scripts to run queries on
mysql client with multiple queries and multiple dmls at once, in
various modes such as simulator mode and standard benchmark modes.

By Default this WL is enabled, but before release it will be disabled.
This is tracked via BUG#36343189 #no-close.

Perf mode unittests will be enabled on jenkins after this wl.
Further cleanup will be done via BUG#36368437 #no-close.

Bugs tackled via this WL: 	BUG#35738194, Enh#34132523, Bug#36343208

Unrelated bugs fixed: BUG#35987975

Old gerrit review : 25567 (abandoned due to 1000 update limit reached)

Change-Id: Ie5f9fdcd8b55a669d04b389d3aec5f6b33f0fe2e
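A compact sketch of the decision flow described above. All names, features, and thresholds below are placeholders for illustration; the shipped classifier is a trained decision tree inside the RAPID plugin, not this hand-written rule set.

```
#include <cstdint>

// Placeholder plan and load features; in the server these come from the
// MySQL plan cached in rapid_statement_context and from RAPID runtime state.
struct PlanFeatures {
  double mysql_cost = 0.0;
  bool is_point_select = false;
  std::uint64_t base_table_rows = 0;
};

struct RapidLoad {
  double queue_length = 0.0;            // pending work queued on RAPID
  double change_propagation_lag = 0.0;  // catch-up time RAPID still needs
};

enum class Engine { kInnoDB, kRapid };

// Hand-written stand-in for the trained decision tree; thresholds are
// arbitrary illustration values, not the shipped classifier.
Engine choose_engine(const PlanFeatures &f, const RapidLoad &load,
                     bool dynamic_offload_enabled) {
  // "Very fast" queries (cost < 10, point select) always stay on MySQL.
  if (f.is_point_select && f.mysql_cost < 10.0) return Engine::kInnoDB;

  if (!dynamic_offload_enabled) {
    // Fall back to a plain cost-threshold classifier.
    return f.mysql_cost > 100000.0 ? Engine::kRapid : Engine::kInnoDB;
  }

  // Dynamic scenario: penalise RAPID by its queue and CP catch-up time, so
  // queries gradually shift back to MySQL as RAPID falls behind.
  const double penalty = load.queue_length + load.change_propagation_lag;
  const double rapid_benefit =
      static_cast<double>(f.base_table_rows) / (1.0 + penalty);
  return rapid_benefit > f.mysql_cost ? Engine::kRapid : Engine::kInnoDB;
}
```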
bjornmu pushed a commit that referenced this pull request Jul 1, 2024
… for connection xxx'.

The new iterator based explains are not impacted.

The issue here is a race condition. More than one thread is using the
query term iterator at the same time (which is neither thread safe nor
reentrant), and part of its state is in the query terms being visited,
which leads to interference/race conditions.

a) the explain thread

uses an iterator here:

   Sql_cmd_explain_other_thread::execute

is inspecting the Query_expression of the running query
calling master_query_expression()->find_blocks_query_term which uses
an iterator over the query terms in the query expression:

   for (auto qt : query_terms<>()) {
       if (qt->query_block() == qb) {
           return qt;
       }
   }

the above search fails to find qb due to the interference of
thread b), see below, and then tries to access a null pointer:

    * thread #36, name = ‘connection’, stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
  frame #0: 0x000000010bb3cf0d mysqld`Query_block::type(this=0x00007f8f82719088) const at sql_lex.cc:4441:11
  frame #1: 0x000000010b83763e mysqld`(anonymous namespace)::Explain::explain_select_type(this=0x00007000020611b8) at opt_explain.cc:792:50
  frame #2: 0x000000010b83cc4d mysqld`(anonymous namespace)::Explain_join::explain_select_type(this=0x00007000020611b8) at opt_explain.cc:1487:21
  frame #3: 0x000000010b837c34 mysqld`(anonymous namespace)::Explain::prepare_columns(this=0x00007000020611b8) at opt_explain.cc:744:26
  frame #4: 0x000000010b83ea0e mysqld`(anonymous namespace)::Explain_join::explain_qep_tab(this=0x00007000020611b8, tabnum=0) at opt_explain.cc:1415:32
  frame #5: 0x000000010b83ca0a mysqld`(anonymous namespace)::Explain_join::shallow_explain(this=0x00007000020611b8) at opt_explain.cc:1364:9
  frame #6: 0x000000010b83379b mysqld`(anonymous namespace)::Explain::send(this=0x00007000020611b8) at opt_explain.cc:770:14
  frame #7: 0x000000010b834147 mysqld`explain_query_specification(explain_thd=0x00007f8fbb111e00, query_thd=0x00007f8fbb919c00, query_term=0x00007f8f82719088, ctx=CTX_JOIN) at opt_explain.cc:2088:20
  frame #8: 0x000000010bd36b91 mysqld`Query_expression::explain_query_term(this=0x00007f8f7a090360, explain_thd=0x00007f8fbb111e00, query_thd=0x00007f8fbb919c00, qt=0x00007f8f82719088) at sql_union.cc:1519:11
  frame #9: 0x000000010bd36c68 mysqld`Query_expression::explain_query_term(this=0x00007f8f7a090360, explain_thd=0x00007f8fbb111e00, query_thd=0x00007f8fbb919c00, qt=0x00007f8f8271d748) at sql_union.cc:1526:13
  frame #10: 0x000000010bd373f7 mysqld`Query_expression::explain(this=0x00007f8f7a090360, explain_thd=0x00007f8fbb111e00, query_thd=0x00007f8fbb919c00) at sql_union.cc:1591:7
  frame #11: 0x000000010b835820 mysqld`mysql_explain_query_expression(explain_thd=0x00007f8fbb111e00, query_thd=0x00007f8fbb919c00, unit=0x00007f8f7a090360) at opt_explain.cc:2392:17
  frame #12: 0x000000010b835400 mysqld`explain_query(explain_thd=0x00007f8fbb111e00, query_thd=0x00007f8fbb919c00, unit=0x00007f8f7a090360) at opt_explain.cc:2353:13
 * frame #13: 0x000000010b8363e4 mysqld`Sql_cmd_explain_other_thread::execute(this=0x00007f8fba585b68, thd=0x00007f8fbb111e00) at opt_explain.cc:2531:11
  frame #14: 0x000000010bba7d8b mysqld`mysql_execute_command(thd=0x00007f8fbb111e00, first_level=true) at sql_parse.cc:4648:29
  frame #15: 0x000000010bb9e230 mysqld`dispatch_sql_command(thd=0x00007f8fbb111e00, parser_state=0x0000700002065de8) at sql_parse.cc:5303:19
  frame #16: 0x000000010bb9a4cb mysqld`dispatch_command(thd=0x00007f8fbb111e00, com_data=0x0000700002066e38, command=COM_QUERY) at sql_parse.cc:2135:7
  frame #17: 0x000000010bb9c846 mysqld`do_command(thd=0x00007f8fbb111e00) at sql_parse.cc:1464:18
  frame #18: 0x000000010b2f2574 mysqld`handle_connection(arg=0x0000600000e34200) at connection_handler_per_thread.cc:304:13
  frame #19: 0x000000010e072fc4 mysqld`pfs_spawn_thread(arg=0x00007f8fba8160b0) at pfs.cc:3051:3
  frame #20: 0x00007ff806c2b202 libsystem_pthread.dylib`_pthread_start + 99
  frame #21: 0x00007ff806c26bab libsystem_pthread.dylib`thread_start + 15

b) the query thread being explained is itself performing LEX::cleanup
and as part of that iterates over the query terms, but still allows
EXPLAIN of the query plan since

   thd->query_plan.set_query_plan(SQLCOM_END, ...)

hasn't been called yet.

     20:frame: Query_terms<(Visit_order)1, (Visit_leaves)0>::Query_term_iterator::operator++() (in mysqld) (query_term.h:613)
     21:frame: Query_expression::cleanup(bool) (in mysqld) (sql_union.cc:1861)
     22:frame: LEX::cleanup(bool) (in mysqld) (sql_lex.h:4286)
     30:frame: Sql_cmd_dml::execute(THD*) (in mysqld) (sql_select.cc:799)
     31:frame: mysql_execute_command(THD*, bool) (in mysqld) (sql_parse.cc:4648)
     32:frame: dispatch_sql_command(THD*, Parser_state*) (in mysqld) (sql_parse.cc:5303)
     33:frame: dispatch_command(THD*, COM_DATA const*, enum_server_command) (in mysqld) (sql_parse.cc:2135)
     34:frame: do_command(THD*) (in mysqld) (sql_parse.cc:1464)
     57:frame: handle_connection(void*) (in mysqld) (connection_handler_per_thread.cc:304)
     58:frame: pfs_spawn_thread(void*) (in mysqld) (pfs.cc:3053)
     65:frame: _pthread_start (in libsystem_pthread.dylib) + 99
     66:frame: thread_start (in libsystem_pthread.dylib) + 15

Solution:

This patch solves the issue by removing iterator state from
Query_term, making the query_term iterators thread safe. This solution
labels every child query_term with its index in its parent's
m_children vector.  The iterator can therefore easily compute the next
child to visit based on Query_term::m_sibling_idx.

A unit test case is added to check reentrancy.

One can also manually verify that we have no remaining race condition
by running two client connection files (with \. <file>) with a big
number of copies of the repro query in one connection and a big number
of EXPLAIN format=json FOR <connection>, e.g.

    EXPLAIN FORMAT=json FOR CONNECTION 8\G

in the other. The actual connection number would need to be verified
in connection one, of course.

Change-Id: Ie7d56610914738ccbbecf399ccc4f465f7d26ea7
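A minimal sketch of the technique (generic tree code, not the server's Query_term classes): each node records only its index in its parent's child vector, and all traversal state lives in the iterator object, so two threads can walk the same tree at the same time without interfering.

```
#include <cstddef>
#include <iostream>
#include <vector>

struct Node {
  std::vector<Node *> children;
  std::size_t sibling_idx = 0;  // index of this node in parent->children
  Node *parent = nullptr;

  void add_child(Node *c) {
    c->parent = this;
    c->sibling_idx = children.size();
    children.push_back(c);
  }
};

// Depth-first cursor: all traversal state lives in the cursor itself,
// never in the nodes, so concurrent walks over the same tree do not clash.
struct DfsCursor {
  explicit DfsCursor(Node *root) : current(root) {}
  Node *current;

  Node *next() {
    if (!current->children.empty()) {  // descend to the first child
      current = current->children.front();
      return current;
    }
    Node *n = current;
    while (n->parent != nullptr) {     // climb until a right sibling exists
      std::size_t idx = n->sibling_idx + 1;
      if (idx < n->parent->children.size()) {
        current = n->parent->children[idx];
        return current;
      }
      n = n->parent;
    }
    return current = nullptr;          // traversal done
  }
};

int main() {
  Node root, a, b, c;
  root.add_child(&a);
  root.add_child(&b);
  a.add_child(&c);
  for (DfsCursor it(&root); it.current != nullptr; it.next())
    std::cout << it.current->sibling_idx << '\n';  // pre-order: 0 0 0 1
  return 0;
}
```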
msprajap pushed a commit that referenced this pull request Jul 1, 2024
laurynas-biveinis pushed a commit to laurynas-biveinis/mysql-server that referenced this pull request Jul 9, 2024
gipulla pushed a commit that referenced this pull request Oct 15, 2024
… and .6node3rpl

Issue #1
 Problem:
   Test fails in 4node4rpl (1 node group).
 Solution:
   Skip test when there is only one NG.

Issue #2
  Problem:
    Test fails in 6node3rpl (2 node groups) with timeout.
    The test idea is to restart, with the nostart option, *ALL* nodes
    in the same node group to check if QMGR wrongly handles it as
    "node group is missing".
    In the test only two nodes in the same node group are restarted;
    it works for 2-replica setups but, for 4 replicas, the test
    hangs waiting for the cluster to enter a noStart state.
  Solution:
   Instead of restarting exactly 2 nodes, restart ALL nodes in a
   given node group.

Change-Id: Iafb0511992a553723013e73593ea10540cd03661
dbussink added a commit to planetscale/mysql-server that referenced this pull request Nov 21, 2024
In case `with_ndb_home` is set, `buf` is allocated with `PATH_MAX` and
the home is already written into the buffer.

The additional path is written using `snprintf` and it starts off at
`len`. It can still write up to `PATH_MAX` bytes though, which is wrong:
since we already have the home written into the buffer, only
`PATH_MAX - len` bytes remain available.

On Ubuntu 24.04 with debug builds this is caught and it crashes:

```
*** buffer overflow detected ***: terminated
Signal 6 thrown, attempting backtrace.
stack_bottom = 0 thread_stack 0x0
 #0 0x604895341cb6 <unknown>
 mysql#1 0x7ff22524531f <unknown> at sysdeps/unix/sysv/linux/x86_64/libc_sigaction.c:0
 mysql#2 0x7ff22529eb1c __pthread_kill_implementation at ./nptl/pthread_kill.c:44
 mysql#3 0x7ff22529eb1c __pthread_kill_internal at ./nptl/pthread_kill.c:78
 mysql#4 0x7ff22529eb1c __GI___pthread_kill at ./nptl/pthread_kill.c:89
 mysql#5 0x7ff22524526d __GI_raise at sysdeps/posix/raise.c:26
 mysql#6 0x7ff2252288fe __GI_abort at ./stdlib/abort.c:79
 mysql#7 0x7ff2252297b5 __libc_message_impl at sysdeps/posix/libc_fatal.c:132
 mysql#8 0x7ff225336c18 __GI___fortify_fail at ./debug/fortify_fail.c:24
 mysql#9 0x7ff2253365d3 __GI___chk_fail at ./debug/chk_fail.c:28
 mysql#10 0x7ff225337db4 ___snprintf_chk at ./debug/snprintf_chk.c:29
 mysql#11 0x6048953593ba <unknown>
 mysql#12 0x604895331a3d <unknown>
 mysql#13 0x6048953206e7 <unknown>
 mysql#14 0x60489531f4b1 <unknown>
 mysql#15 0x60489531e8e6 <unknown>
 mysql#16 0x7ff22522a1c9 __libc_start_call_main at sysdeps/nptl/libc_start_call_main.h:58
 mysql#17 0x7ff22522a28a __libc_start_main_impl at csu/libc-start.c:360
 mysql#18 0x60489531ed54 <unknown>
 mysql#19 0xffffffffffffffff <unknown>
```

In practice this buffer overflow would only happen with very long paths.

Signed-off-by: Dirkjan Bussink <[email protected]>
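A minimal sketch of the bound fix (illustrative names, not the actual NDB code): once `len` bytes of the home prefix are in `buf`, the second `snprintf` may only be given the remaining `PATH_MAX - len` bytes.

```
#include <climits>  // PATH_MAX on Linux
#include <cstdio>

// Illustrative only: append `name` to an optional `home` prefix in `buf`,
// which has room for PATH_MAX bytes.
static void build_path(char *buf, const char *home, const char *name) {
  int len = 0;
  if (home != nullptr) {
    len = std::snprintf(buf, PATH_MAX, "%s/", home);
    if (len < 0 || len >= PATH_MAX) return;  // prefix alone filled the buffer
  }
  // Wrong (the bug): std::snprintf(buf + len, PATH_MAX, "%s", name);
  // Right: only PATH_MAX - len bytes remain after the prefix.
  std::snprintf(buf + len, PATH_MAX - len, "%s", name);
}

int main() {
  char buf[PATH_MAX];
  build_path(buf, "/var/lib/mysql-cluster", "config.ini");
  std::printf("%s\n", buf);
  return 0;
}
```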
VarunNagaraju added a commit to VarunNagaraju/mysql-server that referenced this pull request Jan 3, 2025
bjornmu pushed a commit that referenced this pull request Jan 21, 2025
Description:
============
Dropping a primary key and adding a new auto-increment column as a
primary key in descending order using the "inplace" algorithm fails.

Analysis:
=========
Dropping an existing primary key and adding a new auto-increment key in
descending order requires arranging the records in reverse order,
which necessitates a file sort. However, this scenario was not detected
in the method innobase_pk_order_preserved(), causing it to return false.

As a result, the ALTER INPLACE operation, which calls this method,
skips the file sort. Instead, it processes the primary key as usual
in batches, a method known as bulk mode. In bulk mode, records are
inserted into a sort buffer (in descending order in this case).
When the sort buffer becomes full, records are directly inserted into
the B-tree.

Consider a case where we have 2000 records, and the sort buffer can
hold 1000 records in a batch:

Batch #1 inserted: Records 1000 to 1 (in descending order)
Batch #2 inserted: Records 2000 to 1001 (in descending order)

If the records from both batches happen to be in the same page,
the record order is violated.

It's important to note that this record order violation would still
exist even if the sort buffer were skipped when file sort was skipped.
Therefore, enabling file sort is essential to ensure correct record
order across batches.

Fix:
====
Enable file sort when adding an auto-increment column in descending order.

This patch is based on the contribution from Shaohua Wang at Alibaba
Group. We thank you for contributing to MySQL.

Change-Id: I398173bbd27db7f5e29218d217bf11c30297c242
lurkingryuu pushed a commit to lurkingryuu/asql that referenced this pull request Apr 13, 2025
all working, super remaining
bjornmu pushed a commit that referenced this pull request Apr 15, 2025
Post-push fix for broken unit test mdl-t

In Debug mode:
mdl-t: sql/mdl.h:481: void MDL_key::mdl_key_init(enum_mdl_namespace, const char *, const char *):
    Assertion `!use_normalized_object_name()' failed.
unit test got signal 6
stack_bottom = 0 thread_stack 0x0
 #0 0x67f2b7 _ZL14signal_handleri at unittest/gunit/gunit_test_main.cc:62
 #1 0x7f9fd3e4fcff <unknown> at sysdeps/unix/sysv/linux/x86_64/libc_sigaction.c:0
 ....
 #7 0x645bda _ZN7MDL_key12mdl_key_initENS_18enum_mdl_namespaceEPKcS2_ at sql/mdl.h:481
 #8 0x65f6d1 _ZN11MDL_request16init_with_sourceEN7MDL_key18enum_mdl_namespaceEPKcS3_13enum_mdl_type17enum_mdl_durationS3_j at sql/mdl.cc:1520
 #9 0x624783 _ZN12mdl_unittest39MDLHtonNotifyTest_NotifyNamespaces_Test8TestBodyEv at unittest/gunit/mdl-t.cc:3893

In RelWithDebInfo mode:
unittest/gunit/mdl-t.cc:3902: Failure
Expected equality of these values:
  1U
    Which is: 1
  pre_acquire_count()
    Which is: 0

Change-Id: I155ef98b40fb521a1721ba4f34e3a315ef847626