Re: Fwd: little request :)

From: Date: Thu, 06 Feb 2014 14:24:48 +0000
Subject: Re: Fwd: little request :)
References: 1 2 3  Groups: php.internals 
Request: Send a blank email to [email protected] to get a copy of this message
On Wed, Feb 05, 2014 at 11:20:27AM +0100, Pierre Joye wrote:
> About the timing attack RFC, I have asked for some review and advice
> and here is a useful one already, thanks Alex :)
> 
> Please keep him as CC as I do not know if he is on this list.

Thanks.  I am in fact not subscribed, and I appreciate the CC's.
(I also find the "little request" message Subject weird for use on a
public mailing list, but now it's stuck.)

On Thu, Feb 06, 2014 at 07:57:17AM +0900, Yasuo Ohgaki wrote:
> Another way to protect from timing attack is have a random sleep, but
> this is tricky also. Too short or too long sleep doesn't help.

Let's not even consider this.  Adding random noise doesn't necessarily
prevent the signal from being heard, even if the noise is louder than
the signal.

There's a related approach, though: blinding.  Instead of comparing the
two strings directly, they may be hashed along with some secret input
(may be a newly generated random number) and the hashes then compared.
There are some arguments in favor of this approach when it's implemented
in a scripting language - e.g., if someone were writing code in PHP -
but when we're talking about native implementation (to be part of PHP),
those arguments don't apply.  Some references to this approach were in
yesterday's and today's Twitter discussion between @zooko and others,
including me.  There were lots of tweets.  You may take a look at:

https://www.isecpartners.com/blog/2011/february/double-hmac-verification.aspx
http://openwall.info/wiki/people/solar/algorithms/challenge-response-authentication#Timing-attacks
https://tahoe-lafs.org/trac/tahoe-lafs/browser/trunk/src/allmydata/util/hashutil.py?annotate=blame&rev=a4a6c02ef8ae2e0edb30bb0051873ffca6af6fc0#L205
https://tahoe-lafs.org/trac/tahoe-lafs/ticket/2165

When the secret component is very low entropy, this attack may work:

http://lists.randombit.net/pipermail/cryptography/2011-December/001866.html
http://lists.randombit.net/pipermail/cryptography/2011-December/001874.html

@mik235 pointed out that Python 3.4+ already does something similar for
the == operator all the time, by using SipHash and, if I understood
correctly, caching SipHash values along with the strings (just like it's
also done for the lengths).  There's no attempt at not leaking the
length.  I haven't looked at the code, so this is merely my current
understanding based on @mik235's tweets.

On Wed, Feb 05, 2014 at 03:00:10PM -0800, Stas Malyshev wrote:
> I'm not sure the issue of hiding length is a real one here. Most crypto
> algorithms would have fixed-length or one-of-the-very-short-list length
> results, and in this scenario it is usually obvious from the app code
> what the hash length would be, at least within very small number of
> options, so I'm not sure how big a deal it is.

You're right, for most uses it should not matter.

On Thu, Feb 06, 2014 at 01:06:26AM +0100, Rouven We?ling wrote:
> I may just be blind, because it's getting late. I don't see any integer division
> occurring.

As Tjerk Meesters pointed out, the % operator involves integer division.
This function can easily spend 10x more time doing the (variable time)
integer division than anything else, thereby exacerbating timing leaks.

> > Is MAX() implemented without branching?  Hardly.
>
> It's not (for reference the code is below). I think the impact of this is reduced by the
> fact, that unless the length is 0 always the same branch is used.

Yes, it will most likely only leak whether the secret's length is 0 or not.

> I think we should just document this as potentially leaking information about the length.

I agree.

> If we do find ways to reduce this, by all means, we should take them.

Some of them may involve tradeoffs or in some cases actually make the
timing leaks worse (while reducing them in other cases), so it's not so
simple.  For example, this suggestion:

On Thu, Feb 06, 2014 at 12:06:35PM +0900, Yasuo Ohgaki wrote:
> For example, read few random bytes and add them to get number of
> iterations, then iterate something for the number. Although it's not
> perfect, it makes much harder than nothing.

This may potentially make the leak even worse: the random noise may be
filtered out given enough measurements, whereas the multiple iterations
may be amplifying the signal.  Let's not do something so questionable.

On Thu, Feb 06, 2014 at 11:51:55AM +0800, Tjerk Meesters wrote:
> Btw, could we use the name hash_equals() instead? This function returns a
> boolean whereas a standard comparison functions return -1, 0, 1.

Makes sense to me.  In the Twitter discussion, it was pointed out that
Tor project actually has a memcmp()-like constant time comparison
function, but ends up using it to compare only for equal/not-equal
anyway.  So the extra complexity of supporting 3 possible return values
is not needed.

Also, I recommend an approach-neutral, purpose-centric function name.
This operation doesn't have to be constant-time to achieve the end goal.
It may as well be e.g. random time, as long as it does not leak info
about the secret via timings.  And the approach taken may change in a
future version, as long as the function's properties are retained (or
even improved upon).  e.g. Tahoe-LAFS went with timing_safe_compare() today:

https://github.com/zooko/tahoe-lafs/commit/a79b6927af57dc701af2f9037f11accc324a9cfb

but you may use a name more consistent with PHP's other related function
names, and also emphasizing that the return value is TRUE if the strings
match (if this is how you define the function to behave).  e.g.
timing_safe_eq() or hash_eq().

On Thu, Feb 06, 2014 at 01:53:29PM +0900, Yasuo Ohgaki wrote:
> hash_compare() may only compare hashed values as the function
> name imply. If length differs, it may return FALSE simply. (As well as
> wrong type)
>
> This way, we don't have to worry about length leak and constant operation.
> Hash string length is not a secret.

Right.

> If user have 'raw data' (e.g. raw password,
> etc) to compare, make them apply hash function first.

... which will leak the length via timing of the hash function.

More from Twitter, on trying to hide the length (no, that discussion was
not about PHP, but in general):

<@SteveBellovin> @marshray @tqbf You could, I suppose, "compare" dummy data to reach
a constant length, making sure you match cache line alignment.

As you can see, after getting rid of the modulo division, we quickly get
to consider how our memory reads pattern differs from what we'd have
when comparing against a sufficiently long secret string.  It's tricky.

Alexander


Thread (42 messages)

« previous php.internals (#72341) next »