Quick Links

Re: Create on insert a unique random number

Lists:	pgsql-sql

From:	Steve Midgley <public(at)misuse(dot)org>
To:	lance(at)uiuc(dot)edu,pgsql-sql(at)postgresql(dot)org
Subject:	Re: Create on insert a unique random number
Date:	2008-03-18 19:23:35
Message-ID:	[email protected]
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-sql

At 11:58 AM 3/18/2008, pgsql-sql-owner(at)postgresql(dot)org wrote:
>Date: Tue, 18 Mar 2008 13:40:42 -0500
>From: "Campbell, Lance" <lance(at)uiuc(dot)edu>
>To: "Vivek Khera" <vivek(at)khera(dot)org>,
> <pgsql-sql(at)postgresql(dot)org>
>Subject: Re: Create on insert a unique random number
>Message-ID:
><B10E6810AC2A2F4EA7550D072CDE8760CDDC34(at)SAB-FENWICK(dot)sab(dot)uiuc(dot)edu>
>
>Thanks for all of your input. It appears that the best way to do this
>is to create a default random number in the primary id field in the
>table definition and then return that value after insert. If an
>exception occurs because of duplicates I will simple perform the same
>insert statement again. I doubt there would be many duplicate hits if
>I
>use a really large number.
>[snip]
>I built a web application called the Form Builder. It allows
>individuals to create web forms. After a user is done building their
>web form the tool provides a URL for the user to access the form.
>Obviously the URL has the random ID of the form in it. Most of the
>forms created with this tool can be accessed and filled out by the
>general public.
[snip]

Hi Lance,

I think I "get you" as a fellow web systems (aka middleware) guy. My
opinion is that the use of a "sparse index" is totally reasonable for
the purpose you describe. But I would argue that you could take it a
little further in implementation that might keep your db design sane
while still giving you the sparse index function on the front-end.

1) Create a second field (as someone recommend on this list) that is an
MD5 of your primary key. Use that as your "accessor" index from the web
application. But keep the primary key as an integer serial, so that it
works as expected, and you can build relations normally. I think in the
end you'll be happier with this method than messing around with a
custom primary key system.. You can build a trigger that generates the
MD5 hash every time a record is created (or you can do it in your ORM
layer in the web app).

2) Also, (but OT) put a monitor on your weblogs to look for "404"
errors ("page not found" for the sql-only people here). This will
supplement your sparse index by detecting people who are scanning your
sparse index space and generating lots of "misses."

Hope that helps,

Steve

From:	"D'Arcy J(dot)M(dot) Cain" <darcy(at)druid(dot)net>
To:	Steve Midgley <public(at)misuse(dot)org>
Cc:	lance(at)uiuc(dot)edu, pgsql-sql(at)postgresql(dot)org
Subject:	Re: Create on insert a unique random number
Date:	2008-03-18 19:36:13
Message-ID:	[email protected]
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-sql

On Tue, 18 Mar 2008 12:23:35 -0700
Steve Midgley <public(at)misuse(dot)org> wrote:
> 1) Create a second field (as someone recommend on this list) that is an
> MD5 of your primary key. Use that as your "accessor" index from the web

I strongly disagree for three reasons. First, if you are going to
generate a key then don't store it. Just generate it every time.
Second, don't generate it based on a known field. You may think that
it is secure but what if you private key is compromised? Do you then
change everyone's security code? Third, what if one person's
code is compromised? If it is based on a calculation then you
can't change that one person's security code.

Generate a random number and store that. You will be much happier when
something goes wrong and something always goes wrong.

--
D'Arcy J.M. Cain <darcy(at)druid(dot)net> | Democracy is three wolves
http://www.druid.net/darcy/ | and a sheep voting on
+1 416 425 1212 (DoD#0082) (eNTP) | what's for dinner.

From:	Steve Midgley <public(at)misuse(dot)org>
To:	"D'Arcy J(dot)M(dot) Cain" <darcy(at)druid(dot)net>,lance(at)uiuc(dot)edu
Cc:	lance(at)uiuc(dot)edu,pgsql-sql(at)postgresql(dot)org
Subject:	Re: Create on insert a unique random number
Date:	2008-03-18 20:57:39
Message-ID:	[email protected]
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-sql

At 12:36 PM 3/18/2008, D'Arcy J.M. Cain wrote:
>On Tue, 18 Mar 2008 12:23:35 -0700
>Steve Midgley <public(at)misuse(dot)org> wrote:
> > 1) Create a second field (as someone recommend on this list) that
> is an
> > MD5 of your primary key. Use that as your "accessor" index from the
> web
>
>I strongly disagree for three reasons. First, if you are going to
>generate a key then don't store it. Just generate it every time.
>Second, don't generate it based on a known field. You may think that
>it is secure but what if you private key is compromised? Do you then
>change everyone's security code? Third, what if one person's
>code is compromised? If it is based on a calculation then you
>can't change that one person's security code.
>
>Generate a random number and store that. You will be much happier
>when
>something goes wrong and something always goes wrong.
>
>--
>D'Arcy J.M. Cain <darcy(at)druid(dot)net> | Democracy is three
>wolves

Hi D'Arcy,

I'm not clear on your concern here - an MD5 hash doesn't have a private
key that can be compromised, afaik. It's a one way hash. I don't see
much difference between making an MD5 of the primary key and generating
a random number for the "public primary key", except that you shouldn't
get index collisions with the MD5 method (whereas eventually you will
with a random number, though of course using a GUID would eliminate
that concern for practical purposes).

The issue raised by the OP, I believe, is not about security of the
primary key # itself or its ability to provide unauthorized access to
the underlying records. The system in question protects its records
from unauthorized access already.

The issue is about creating an index into a sparse hash so that each
record is somewhat randomly located in a sparse hash "index space".
(One valid reason to do this would be if you wanted to hide the total
number of records in your table from competitors or customers). (Just
for reference of my view on the problem:
http://en.wikipedia.org/wiki/Hash_table)

Whether SHA-1 or MD5, I think the point is that if you don't care about
speed in generating the hash index (which the OP doesn't apparently),
hash indexing via an encryption algorithm will ensure that the hash
index is relatively free of "clustering" - which as I understand it, is
the point of this exercise. Encryption as a hash index generator is
imperfect for sure, as the Wikipedia article goes at length to discuss,
but from my perspective it "does the job" - at least as far as the OP
describes it (or I understood it!). [smile]

I may be way off here of course, and I appreciate the input - any
thoughts?

Steve

From:	"D'Arcy J(dot)M(dot) Cain" <darcy(at)druid(dot)net>
To:	Steve Midgley <public(at)misuse(dot)org>
Cc:	lance(at)uiuc(dot)edu, pgsql-sql(at)postgresql(dot)org
Subject:	Re: Create on insert a unique random number
Date:	2008-03-19 13:47:53
Message-ID:	[email protected]
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-sql

On Tue, 18 Mar 2008 13:57:39 -0700
Steve Midgley <public(at)misuse(dot)org> wrote:
> At 12:36 PM 3/18/2008, D'Arcy J.M. Cain wrote:
> >On Tue, 18 Mar 2008 12:23:35 -0700
> >Steve Midgley <public(at)misuse(dot)org> wrote:
> > > 1) Create a second field (as someone recommend on this list) that
> > is an
> > > MD5 of your primary key. Use that as your "accessor" index from the
> > web
> >
> >I strongly disagree for three reasons. First, if you are going to
> >generate a key then don't store it. Just generate it every time.
> >Second, don't generate it based on a known field. You may think that
> >it is secure but what if you private key is compromised? Do you then
> >change everyone's security code? Third, what if one person's
> >code is compromised? If it is based on a calculation then you
> >can't change that one person's security code.

> I'm not clear on your concern here - an MD5 hash doesn't have a private
> key that can be compromised, afaik. It's a one way hash. I don't see

Right so it is even less useful than I implied. It can never be
changed so why store it when it can be re-generated at any time.

> much difference between making an MD5 of the primary key and generating
> a random number for the "public primary key", except that you shouldn't
> get index collisions with the MD5 method (whereas eventually you will
> with a random number, though of course using a GUID would eliminate
> that concern for practical purposes).

But your suggestion was to base this key on the serial primary key so
where is your index collision protection? You are going to get
collisions on both the serial key and, to a lesser extent, your
generated one. Besides, has anyone ever demonstrated a real issue with
lookups using serial primary keys? I think you are trying to second
guess the database engine with this and I don't think that that is a
great idea.

> The issue is about creating an index into a sparse hash so that each
> record is somewhat randomly located in a sparse hash "index space".
> (One valid reason to do this would be if you wanted to hide the total
> number of records in your table from competitors or customers). (Just

If that is your goal then start your serial at something other than 1.
Start at 1,000,000 for example and your first user will think that
you already have one million clients. Actually, he will think that
you started elsewhere than 1 but he won't know where.

--
D'Arcy J.M. Cain <darcy(at)druid(dot)net> | Democracy is three wolves
http://www.druid.net/darcy/ | and a sheep voting on
+1 416 425 1212 (DoD#0082) (eNTP) | what's for dinner.

From:	Steve Midgley <public(at)misuse(dot)org>
To:	"D'Arcy J(dot)M(dot) Cain" <darcy(at)druid(dot)net>
Cc:	lance(at)uiuc(dot)edu,pgsql-sql(at)postgresql(dot)org
Subject:	Re: Create on insert a unique random number
Date:	2008-03-19 15:28:28
Message-ID:	[email protected]
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-sql

At 06:47 AM 3/19/2008, D'Arcy J.M. Cain wrote:
>But your suggestion was to base this key on the serial primary key so
>where is your index collision protection? You are going to get
>collisions on both the serial key and, to a lesser extent, your
>generated one. Besides, has anyone ever demonstrated a real issue with
>lookups using serial primary keys? I think you are trying to second
>guess the database engine with this and I don't think that that is a
>great idea.
Hi D'Arcy,

I'm not following this line. Maybe we're talking about two different
things here.. I don't know if Lance is using "CRUD" methodology per se,
but that's a well accepted web approach and uses (generally) serial
primary keys in the URL structure as (where numbers are serial pk's):

[website]/contact/12345
[website]/property/45678
[and the client sends GET, POST, PUT, DELETE http requests, or
mimics, to activate various functions]

Whether CRUD of otherwise, in the model I was promoting, there would be
two index columns in the table along with other data, a public index
and a serial primary key. The public index is based on the primary key:

pk | public_pk
1 | md5(1 + fixed salt)
2 | md5(2 + fixed salt)
...

AFAIK, an MD5 hash is guaranteed to generate a unique output for any
unique input, so the serial key and fixed salt would guarantee no hash
index collisions on the MD5 output. Of course if a competitor knows
you're using MD5 and they know your salt, they could calculate all the
md5 integer hashes and see which ones exist..

But I could care less if he uses md5 or sha-1 or Guids! (I just picked
MD5 because another poster recommended it and it's very easy to
implement in Pg). The point I care about is that there would be a
public_pk that associates to one-and-only-one serial pk. Also that
public_pk should be 1) not easily guessable, 2) non-clustering (and
therefore non-serial). Then his url's would look like something like:

[website]/contact/c4ca4238a0b923820dcc509a6f75849b
[website]/property/c81e728d9d4c2f636f067f89cc14862c

> > The issue is about creating an index into a sparse hash so that
> each
> > record is somewhat randomly located in a sparse hash "index space".
>
> > (One valid reason to do this would be if you wanted to hide the
> total
> > number of records in your table from competitors or customers).
> (Just
>
>If that is your goal then start your serial at something other than 1.
>Start at 1,000,000 for example and your first user will think that
>you already have one million clients. Actually, he will think that
>you started elsewhere than 1 but he won't know where.

The original post did not want users to be able to type in random
integers like:

/contact/343

And find out if that record #343 exists or not (regardless of whether
they can get access to the record - the error generated on
no-authorization may be different from record-not-found). So starting
at a million does not fix the OP's issue.

From my perspective, wherever you start your serial index, competitors
can watch it grow over time, if it's a numeric serial. That could be
more valuable in many businesses than knowing the initial size of the
table.

Anyway, I hope that clears up what I was recommending! I didn't
anticipate it would stir up this much analysis and I hope the OP finds
your input and mine useful in coming up with a final answer to his
issue. Thanks for taking the time to consider the issue and I'll look
forward to any additional ideas or comments you have on this too!

Sincerely,

Steve

From:	"D'Arcy J(dot)M(dot) Cain" <darcy(at)druid(dot)net>
To:	Steve Midgley <public(at)misuse(dot)org>
Cc:	lance(at)uiuc(dot)edu, pgsql-sql(at)postgresql(dot)org
Subject:	Re: Create on insert a unique random number
Date:	2008-03-19 16:37:42
Message-ID:	[email protected]
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-sql

On Wed, 19 Mar 2008 08:28:28 -0700
Steve Midgley <public(at)misuse(dot)org> wrote:
> I'm not following this line. Maybe we're talking about two different
> things here.. I don't know if Lance is using "CRUD" methodology per se,
> but that's a well accepted web approach and uses (generally) serial
> primary keys in the URL structure as (where numbers are serial pk's):
>
> [website]/contact/12345
> [website]/property/45678
> [and the client sends GET, POST, PUT, DELETE http requests, or
> mimics, to activate various functions]

Yes, I do this all the time.

> Whether CRUD of otherwise, in the model I was promoting, there would be
> two index columns in the table along with other data, a public index
> and a serial primary key. The public index is based on the primary key:
>
> pk | public_pk
> 1 | md5(1 + fixed salt)
> 2 | md5(2 + fixed salt)
> ...
>
> AFAIK, an MD5 hash is guaranteed to generate a unique output for any
> unique input, so the serial key and fixed salt would guarantee no hash
> index collisions on the MD5 output. Of course if a competitor knows
> you're using MD5 and they know your salt, they could calculate all the
> md5 integer hashes and see which ones exist..
>
> But I could care less if he uses md5 or sha-1 or Guids! (I just picked
> MD5 because another poster recommended it and it's very easy to
> implement in Pg). The point I care about is that there would be a
> public_pk that associates to one-and-only-one serial pk. Also that
> public_pk should be 1) not easily guessable, 2) non-clustering (and
> therefore non-serial). Then his url's would look like something like:
>
> [website]/contact/c4ca4238a0b923820dcc509a6f75849b
> [website]/property/c81e728d9d4c2f636f067f89cc14862c

Right and, as you state above, they could be guessable if someone gets
their hands on a relatively small amount of information. If you simply
generate a random string of n characters where n is based on the amount
of security you need, you can store that and store it in a separate
field in the record. You don't even need to make them unique. Just
incorporate the serial number as well as the random string. There may
conceivably be two records with "1ed6f54e5636837ddae4ef33397ee2cb" as
the key but only one that looks like
"021857.1ed6f54e5636837ddae4ef33397ee2cb". In fact, you could md5 the
serial key and just string the two together if you really wanted
security through more obscurity but that's probably overkill.

The point here is that no one can guess what someone's URL is, even if
they know the ID, administrators can call up records by ID and
individual secret keys can be changed if compromised without affecting
anyone else. Also, it's a normalized table. Storing a value that you
can generate is unnormalized.

> The original post did not want users to be able to type in random
> integers like:
>
> /contact/343

See above. That's not what I was suggesting.

> And find out if that record #343 exists or not (regardless of whether
> they can get access to the record - the error generated on
> no-authorization may be different from record-not-found). So starting
> at a million does not fix the OP's issue.

Certainly you would generate the same error to the web user, even if
you differentiate in your internal error log.

> From my perspective, wherever you start your serial index, competitors
> can watch it grow over time, if it's a numeric serial. That could be
> more valuable in many businesses than knowing the initial size of the
> table.

I guess it depends on the business case. Certainly we can always find
a use case where a specific solution fails but that's not how we really
work. We get all the details of the requirements and then code what
solves them. See above for the "overkill" method that solves that
issue if it really is one.

> Anyway, I hope that clears up what I was recommending! I didn't
> anticipate it would stir up this much analysis and I hope the OP finds
> your input and mine useful in coming up with a final answer to his
> issue. Thanks for taking the time to consider the issue and I'll look
> forward to any additional ideas or comments you have on this too!

Yes, discussion is always useful, and fun. :-)

--
D'Arcy J.M. Cain <darcy(at)druid(dot)net> | Democracy is three wolves
http://www.druid.net/darcy/ | and a sheep voting on
+1 416 425 1212 (DoD#0082) (eNTP) | what's for dinner.