| Lists: | pgsql-committerspgsql-hackers | 
|---|
| From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> | 
|---|---|
| To: | pgsql-committers(at)postgresql(dot)org | 
| Subject: | pgsql: Get rid of USE_WIDE_UPPER_LOWER dependency in trigram constructi | 
| Date: | 2013-04-07 18:47:07 | 
| Message-ID: | [email protected] | 
| Views: | Whole Thread | Raw Message | Download mbox | Resend email | 
| Lists: | pgsql-committers pgsql-hackers | 
Get rid of USE_WIDE_UPPER_LOWER dependency in trigram construction.
contrib/pg_trgm's make_trigrams() was coded to ignore multibyte character
boundaries and just make trigrams from bytes if USE_WIDE_UPPER_LOWER wasn't
defined.  This is a bit odd, since there's no obvious reason why trigram
compaction rules should depend on the presence of towlower() and friends.
What's more, there was an Assert() that would fail if that code path was
fed any multibyte characters.
We need to do something about this since the pending regex-indexing patch
has an assumption that you get just one "trgm" from any three characters.
The best solution seems to be to remove the USE_WIDE_UPPER_LOWER
dependency, which shouldn't really have been there in the first place.
The second loop in make_trigrams() is now just a fast path and not a
potentially incompatible algorithm.
If there is anybody still using Postgres on machines without wcstombs() or
towlower(), and they have non-ASCII data indexed by pg_trgm, they'll need
to REINDEX those indexes after pg_upgrade to 9.3, else searches may fail
incorrectly. It seems likely that there are no such installations, though.
In passing, rename cnt_trigram to compact_trigram, which seems to better
describe its functionality, and improve make_trigrams' test for whether it
has to use the slow path or not (per a suggestion from Alexander Korotkov).
Branch
------
master
Details
-------
http://git.postgresql.org/pg/commitdiff/7844608e54a3a2e3dee461b00fd6ef028a845d7c
Modified Files
--------------
contrib/pg_trgm/trgm_op.c |   17 ++++++++++-------
1 files changed, 10 insertions(+), 7 deletions(-)
| From: | Dimitri Fontaine <dimitri(at)2ndQuadrant(dot)fr> | 
|---|---|
| To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> | 
| Cc: | pgsql-committers(at)postgresql(dot)org | 
| Subject: | Re: pgsql: Get rid of USE_WIDE_UPPER_LOWER dependency in trigram constructi | 
| Date: | 2013-04-08 08:11:30 | 
| Message-ID: | [email protected] | 
| Views: | Whole Thread | Raw Message | Download mbox | Resend email | 
| Lists: | pgsql-committers pgsql-hackers | 
Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:
> If there is anybody still using Postgres on machines without wcstombs() or
> towlower(), and they have non-ASCII data indexed by pg_trgm, they'll need
> to REINDEX those indexes after pg_upgrade to 9.3, else searches may fail
> incorrectly. It seems likely that there are no such installations, though.
Those conditions seem just complex enough to require a test script that
will check that for you. What if we created a new binary responsible for
auto checking all those release-note items that are possible to machine
check, then issue a WARNING containing the URL to the release notes you
should be reading, and a SQL script (ala pg_upgrade) to run after
upgrade?
  $ pg_checkupgrade -d "connection=string" > upgrade.sql
  NOTICE: checking 9.3 upgrade release notes
  WARNING: RN-93-0001 index idx_trgm_abc is not on-disk compatible with 9.3
  WARNING: TN-93-0012 …
  WARNING: This script is NOT comprehensive, read release notes at …
The target version would be hard coded on the binary itself for easier
maintaining of it, and that proposal includes a unique identifier for
any release note worthy warning that we know about, that would be
included in the output of the program.
I think most of the checks would only have to be SQL code, and some of
them should include running some binary code the server side. When
that's possible, we could maybe expose that binary code in a server side
extension so as to make the client side binary life's easier. That would
also be an excuse for the project to install some upgrade material on
the old server, which has been discussed in the past for preparing
pg_upgrade when we have a page format change.
What do you think?
-- 
Dimitri Fontaine
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support
| From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> | 
|---|---|
| To: | pgsql-hackers(at)postgresql(dot)org, Dimitri Fontaine <dimitri(at)2ndQuadrant(dot)fr> | 
| Subject: | Re: [COMMITTERS] pgsql: Get rid of USE_WIDE_UPPER_LOWER dependency in trigram constructi | 
| Date: | 2013-04-08 14:36:26 | 
| Message-ID: | [email protected] | 
| Views: | Whole Thread | Raw Message | Download mbox | Resend email | 
| Lists: | pgsql-committers pgsql-hackers | 
Dimitri Fontaine <dimitri(at)2ndQuadrant(dot)fr> writes:
> Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:
>> If there is anybody still using Postgres on machines without wcstombs() or
>> towlower(), and they have non-ASCII data indexed by pg_trgm, they'll need
>> to REINDEX those indexes after pg_upgrade to 9.3, else searches may fail
>> incorrectly. It seems likely that there are no such installations, though.
> Those conditions seem just complex enough to require a test script that
> will check that for you. What if we created a new binary responsible for
> auto checking all those release-note items that are possible to machine
> check, then issue a WARNING containing the URL to the release notes you
> should be reading, and a SQL script (ala pg_upgrade) to run after
> upgrade?
How exactly would you know whether the previous installation was built
without HAVE_WCSTOMBS/HAVE_TOWLOWER?  That's not exposed anywhere
reliable.  And it's not out of the question that somebody upgrading to
a newer PG version might upgrade his OS too, so I would not think that
checking what configure says now is trustworthy.
In general, though, this suggestion seems about two orders of magnitude
more work than the particular case justifies.  We might want to think
about something like it for future releases, but it's not likely to
happen for 9.3.
regards, tom lane
| From: | Dimitri Fontaine <dimitri(at)2ndQuadrant(dot)fr> | 
|---|---|
| To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> | 
| Cc: | pgsql-hackers(at)postgresql(dot)org | 
| Subject: | Re: [COMMITTERS] pgsql: Get rid of USE_WIDE_UPPER_LOWER dependency in trigram constructi | 
| Date: | 2013-04-08 14:58:23 | 
| Message-ID: | [email protected] | 
| Views: | Whole Thread | Raw Message | Download mbox | Resend email | 
| Lists: | pgsql-committers pgsql-hackers | 
Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:
> How exactly would you know whether the previous installation was built
> without HAVE_WCSTOMBS/HAVE_TOWLOWER?  That's not exposed anywhere
> reliable.  And it's not out of the question that somebody upgrading to
> a newer PG version might upgrade his OS too, so I would not think that
> checking what configure says now is trustworthy.
Oh, it's even worse than I imagined then. If you don't know where to
find the information, maybe the release notes should just propose to
REINDEX in any case?
> In general, though, this suggestion seems about two orders of magnitude
> more work than the particular case justifies.  We might want to think
> about something like it for future releases, but it's not likely to
> happen for 9.3.
Yeah, I've been thinking that it's the case after sending the email
proposal. It might just be some accumulative effect that leads me to
that proposal. I still think that a tool able to asses if you're
concerned by "reindex gist indexes on that datatype" or suchlike would
be really good to have.
Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support
| From: | Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc> | 
|---|---|
| To: | Dimitri Fontaine <dimitri(at)2ndQuadrant(dot)fr> | 
| Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-committers(at)postgresql(dot)org, pgsql-hackers(at)postgreSQL(dot)org | 
| Subject: | Re: [COMMITTERS] pgsql: Get rid of USE_WIDE_UPPER_LOWER dependency in trigram constructi | 
| Date: | 2013-04-10 18:46:17 | 
| Message-ID: | [email protected] | 
| Views: | Whole Thread | Raw Message | Download mbox | Resend email | 
| Lists: | pgsql-committers pgsql-hackers | 
On 04/08/2013 10:11 AM, Dimitri Fontaine wrote:
> Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:
>> If there is anybody still using Postgres on machines without wcstombs() or
>> towlower(), and they have non-ASCII data indexed by pg_trgm, they'll need
>> to REINDEX those indexes after pg_upgrade to 9.3, else searches may fail
>> incorrectly. It seems likely that there are no such installations, though.
> 
> Those conditions seem just complex enough to require a test script that
> will check that for you. What if we created a new binary responsible for
> auto checking all those release-note items that are possible to machine
> check, then issue a WARNING containing the URL to the release notes you
> should be reading, and a SQL script (ala pg_upgrade) to run after
> upgrade?
> 
>   $ pg_checkupgrade -d "connection=string" > upgrade.sql
>   NOTICE: checking 9.3 upgrade release notes
>   WARNING: RN-93-0001 index idx_trgm_abc is not on-disk compatible with 9.3
>   WARNING: TN-93-0012 …
>   WARNING: This script is NOT comprehensive, read release notes at …
> 
> The target version would be hard coded on the binary itself for easier
> maintaining of it, and that proposal includes a unique identifier for
> any release note worthy warning that we know about, that would be
> included in the output of the program.
> 
> I think most of the checks would only have to be SQL code, and some of
> them should include running some binary code the server side. When
> that's possible, we could maybe expose that binary code in a server side
> extension so as to make the client side binary life's easier. That would
> also be an excuse for the project to install some upgrade material on
> the old server, which has been discussed in the past for preparing
> pg_upgrade when we have a page format change.
given something like this also will have to be dealt with by pg_upgrade,
why not fold it into that (like into -c) completly and recommend running
that? on the flipside if people don't read the release notes they will
also not run any kind of binary/script mentioned there...
Stefan