Re: [VOTE] Timing attack safe string comparison function

From: Date: Mon, 10 Feb 2014 09:29:31 +0000
Subject: Re: [VOTE] Timing attack safe string comparison function
References: 1 2 3 4 5  Groups: php.internals 
Request: Send a blank email to [email protected] to get a copy of this message
Hi,

On Mon, Feb 10, 2014 at 9:15 AM, Yasuo Ohgaki <[email protected]> wrote:

> Hi all,
>
> On Mon, Feb 10, 2014 at 9:24 AM, Yasuo Ohgaki <[email protected]> wrote:
>
> > On Fri, Feb 7, 2014 at 10:39 AM, Yasuo Ohgaki <[email protected]>
> wrote:
> >
> >> On Fri, Feb 7, 2014 at 8:05 AM, Yasuo Ohgaki <[email protected]>
> wrote:
> >>
> >>> I made SipHash version of str_compare() as a sample.
> >>> There is timing safe php_compare(), which is stolen from BSD.
> >>>
> >>> https://github.com/yohgaki/php-src/compare/PHP-5.6-rfc-hash-compare
> >>>
> >>> [yohgaki@dev github-php-src]$ ./php-bin -r
> 'var_dump(str_compare("abc",
> >>> "abc"));'
> >>> bool(true)
> >>> [yohgaki@dev github-php-src]$ ./php-bin -r
> >>> 'var_dump(str_compare("asfasdf", "slkjojoeiwrj"));'
> >>> bool(false)
> >>>
> >>> It's quick patch made less than 30 min.
> >>> So it can be improved, I suppose.
> >>>
> >>
> >> I thought it would be better to compare performance difference.
> >> Added more functions to play with.
> >> There are
> >>
> >>  bool str_siphash_compare(str, str) - siphash. timing safe. (64bit)
> >>   bool str_xxhash32_compare(str, str) - xxhash. timing safe. (32bit)
> >>  bool str_md5_compare(str, str) - md5. Timing safe (128bit)
> >>   bool str_byte_compare(str, str) - Byte compare. Timing safe. No
> >> division.
> >>  bool str_byte_compare2(str, str) - Byte compare. Timing safe. With
> >> division. (Modulo as this RFC)
> >>  bool str_compare(str, str) - plain strncmp(). Not timing safe.
> >>
> >> I didn't took bench mark and did minimum tests.
> >> I appreciate if anyone take benchmark.
> >>
> >
> > Added yet another function to compare suggested by Lester.
> >
> > bool str_word_compare(str, str)
> >
> > This function compares data word by word rather than byte by byte.
> > It supposed to be faster for large data.
> >
>
> I took a benchmark. str_compare() is not timing safe. It's there for
> reference.
>
> str_siphash_compare  Elapsed: 1.389824   Iterations: 1000000 DataSize: 8
> str_xxhash32_compare Elapsed: 1.241737   Iterations: 1000000 DataSize: 8
> str_md5_compare      Elapsed: 3.029127   Iterations: 1000000 DataSize: 8
> str_byte_compare     Elapsed: 1.236183   Iterations: 1000000 DataSize: 8
> str_byte_compare2    Elapsed: 1.269901   Iterations: 1000000 DataSize: 8
> str_word_compare     Elapsed: 1.273266   Iterations: 1000000 DataSize: 8
> str_compare          Elapsed: 1.181425   Iterations: 1000000 DataSize: 8
>
> str_byte_compare() is the winner for small data.
> I'm a little surprised that str_xxhash32_compare() is the second.
> str_word_compare() is marginally slower.
>
> str_siphash_compare  Elapsed: 2.341025   Iterations: 1000000 DataSize: 128
> str_xxhash32_compare Elapsed: 1.560131   Iterations: 1000000 DataSize: 128
> str_md5_compare      Elapsed: 6.055007   Iterations: 1000000 DataSize: 128
> str_byte_compare     Elapsed: 1.799050   Iterations: 1000000 DataSize: 128
> str_byte_compare2    Elapsed: 2.163229   Iterations: 1000000 DataSize: 128
> str_word_compare     Elapsed: 1.337508   Iterations: 1000000 DataSize: 128
> str_compare          Elapsed: 1.194582   Iterations: 1000000 DataSize: 128
>
> str_word_compare() is the winner for relatively large data.
>
> It seems str_word_compare() is the way to go.
>

Quite honestly, I don't think that an absolute difference of 0.46 μs per
function call should be used to pick the "winner" from the line-up you're
made.

That said, the implementation of str_word_compare introduces a MIN()
branch and the second loop should loop until MAX(b1_len, b2_len) instead
of b2_len alone; imo you've just made it harder to prove that it actually
prevents timing attacks.

From what has been discussed so far, to me, the question that needs an
answer is:

"Should we have a function that attempts to provide a constant comparison
for strings of different lengths?"

If not, reduce the function to just a length mismatch branch and a loop;
length information is leaked as long as both strings are of difference
lengths only.

If so, then we need to create and run a simulation program to prove that a
particular algorithm is indeed suitable with a certain amount of confidence
for the most common scenarios.


>
> Regards,
>
> --
> Yasuo Ohgaki
> [email protected]
>



-- 
--
Tjerk


Thread (54 messages)

« previous php.internals (#72434) next »