Significantly improve performance of row decoding. #130

MrMage · 2019-03-07T18:34:09Z

Motivation:

PostgreSQLRowDecoder: _KeyedDecodingContainer.decode is called once for each key on the decoded object. With the current implementation, this performs a linear search on the row dictionary, which results in a total runtime (per object) of O(keys * columns), i.e. quadratic runtime ~O(keys^2). We replace the current linear search with one or two dictionary lookups if tableOID != 0, resulting in linear runtime (per object) in the number of keys (provided dictionary lookups can be assumed to take roughly constant time).
PostgreSQLConnection.TableNameCache: Most lookups are tableName -> OID. We accelerate that lookup by preparing a dictionary for that kind of lookup ahead of time, again replacing linear search.

Effect:

The time required for decoding ~5k objects with 9 fields each drops from ~0.4s on a Core i7-6700k (Release build) to ~0.2s, effectively doubling throughput. Optimization 1 contributes ~130 ms, Optimization 2 contributes ~70ms.

Motivation: 1. `PostgreSQLRowDecoder`: `_KeyedDecodingContainer.decode` is called once for each key on the decoded object. With the current implementation, this performs a linear search on the `row` dictionary, which results in a total runtime (per object) of O(keys * columns), i.e. quadratic runtime ~O(keys^2). We replace the current linear search with one or two dictionary lookups if `tableOID != 0`, resulting in linear runtime (per object) in the number of keys (provided dictionary lookups can be assumed to take roughly constant time). 2. `PostgreSQLConnection.TableNameCache`: Most lookups are `tableName -> OID`. We accelerate that lookup by preparing a dictionary for that kind of lookup ahead of time, again replacing linear search. Effect: The time required for decoding ~5k objects with 9 fields each drops from ~0.4s on a Core i7-6700k (Release build) to ~0.2s, effectively doubling throughput. Optimization 1 contributes ~130 ms, Optimization 2 contributes ~70ms.

jdmcd · 2019-03-07T18:42:38Z

Can this be applied to MySQL as well or is it Postgres specific?

MrMage · 2019-03-07T18:45:56Z

@mcdappdev looks like https://github.com/vapor/mysql/blob/master/Sources/MySQL/Codable/MySQLRowDecoder.swift is doing a linear search as well (firstValue(forColumn:), so Optimization 1. Not sure if a table name cache is being used in the MySQL driver, so Optimization 2 might not apply.

jdmcd · 2019-03-07T18:47:27Z

Cool... @tanner0101 thoughts?

twof · 2019-03-07T23:05:00Z

@MrMage Are you able to include the performance tests you're running?

Great work btw!

MrMage · 2019-03-08T07:32:03Z

@MrMage Are you able to include the performance tests you're running?

My tests were run with production code and data, so I'm afraid I can't share them. Also, I'm not sure whether Fluent has a good way to observe and record performance improvements/regressions, anyway. (There is https://github.com/vapor/fluent/tree/3/Sources/FluentBenchmark/Benchmarks, but I don't see how changes in these benchmarks' performance would be exposed and recorded.)

MrMage · 2019-03-08T08:06:58Z

Added another micro-optimization (implementing decodeIfPresent to avoid two dictionary lookups), gaining another 15ms (i.e. ~8% in my case). Further gains might be had from overriding all the methods in KeyedDecodingContainerProtocol for the types which have specializations (Bool, String, Double, Int, etc.), provided PostgreSQLDataDecoder.decoded is being specialized in the same fashion.

twof · 2019-03-08T10:17:11Z

@MrMage Working on something right now to track performance metrics
https://github.com/vapor/bot-github
vapor/routing-kit#66 (comment)

Hoping to have the bot redeployed tomorrow and then integrated with this repo so we can get you some charts :)

MrMage · 2019-03-11T08:22:15Z

@twof sounds good; let me know how to use that bot when its done (and where to add tests to showcase the performance gains; they would be most apparent for DBs with many tables and/or tables with many columns). Also, I assume this provides no way of measuring encode and decode time separately, given that both are being done in the same test?

tanner0101

These changes look great, thanks @MrMage!

tanner0101 · 2019-03-11T23:26:02Z

they would be most apparent for DBs with many tables and/or tables with many columns

We should add some performance tests for specifically this use case. Most of them end up being against dummy models with very few columns which gives unrealistic results.

Generally, this should be less of a problem in Fluent 4 since we will no longer be decoding all model properties by default. Properties will be decoded lazily on an as needed basis. (i.e., user.name.get()). However, it might be worth optimizing PostgresRow's value fetch method. Maybe we can use a dictionary here? Here's the relevant code in NIOPostgres which we will be using for PostgresKit 2.0:

https://github.com/vapor/nio-postgres/blob/master/Sources/NIOPostgres/Data/PostgresRow.swift#L16

MrMage · 2019-03-12T09:00:18Z

We should add some performance tests for specifically this use case. Most of them end up being against dummy models with very few columns which gives unrealistic results.

Agreed. I would propose a model with five fields each of types String, String?, Int, Int?, Date, Date?, Data, Data?. That should provide us a good baseline to optimize in the future, as well as let us catch future regressions.

Generally, this should be less of a problem in Fluent 4 since we will no longer be decoding all model properties by default. Properties will be decoded lazily on an as needed basis. (i.e., user.name.get()). However, it might be worth optimizing PostgresRow's value fetch method. Maybe we can use a dictionary here? Here's the relevant code in NIOPostgres which we will be using for PostgresKit 2.0:

https://github.com/vapor/nio-postgres/blob/master/Sources/NIOPostgres/Data/PostgresRow.swift#L16

We could probably add a "column name -> [(tableOID: Int, fieldIndex: Int)]" lookup dictionary to PostgresRow.LookupTable or PostgresMessage.RowDescription. Is PostgresMessage.RowDescription different for each row, or could we cache and reuse it for all rows in a single result set returned by Postgres?

tanner0101 · 2019-03-12T18:14:29Z

Postgres sends back one RowDescription per query. That's why I made PostgresRow.LookupTable a class type. That way we can share it between rows without worrying about copying.

So we should be able to create a lookup dictionary once (we can even do it lazily), then re-use for all rows. That should be an even bigger performance win than is possible in the current version.

MrMage · 2019-03-12T18:23:12Z

Sounds great! Is there any possibility of backporting the re-use of the column description to Fluent 3 as well?

…

On 12. Mar 2019, at 19:14, Tanner ***@***.***> wrote: Postgres sends back one RowDescription per query. That's why I made PostgresRow.LookupTable a class type. That way we can share it between rows without worrying about copying. So we should be able to create a lookup dictionary once (we can even do it lazily), then re-use for all rows. That should be an even bigger performance win than is possible in the current version. — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or mute the thread.

tanner0101 · 2019-03-12T18:55:24Z

I'm not sure, it might be.

Is this PR ready to merge? I've reverted #117 so the 1 branch should be good to go.

MrMage · 2019-03-12T19:03:05Z

This PR is ready to merge; I had just been waiting for better benchmark infrastructure to demonstrate the improvements with hard numbers (and possibly add a benchmark case for this first). It the benchmark infrastructure is already working, please let me know (a pointer to the instructions would be appreciated).

…

On 12. Mar 2019, at 19:55, Tanner ***@***.***> wrote: I'm not sure, it might be. Is this PR ready to merge? I've reverted #117 so the 1 branch should be good to go. — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or mute the thread.

tanner0101 · 2019-03-12T20:53:57Z

I looked into adding the performance testing bot to 1 and it looks like it won't be possible without a significant amount of work. @testable is not usable with swift test -c release on Linux for some reason, and a lot of our tests use internal types. We'll need to find a way around this. But I don't want that to hold up this PR, so I think we should go ahead and merge if you all agree.

MrMage · 2019-03-12T22:27:29Z

I'm fine with merging now; however we still have two options left that we could discuss first: 1. Run benchmarks in Debug mode. This won't give us 100% accurate data, but trends and changes in algorithmic complexity (eg from quadraric to linear should become apparent). 2. Run benchmarks in Release mode, as part of a separate target. We should be able to create realistic testing scenarios without having to rely on internal APIs, after all.

…

On 12. Mar 2019, at 21:53, Tanner ***@***.***> wrote: I looked into adding the performance testing bot to 1 and it looks like it won't be possible without a significant amount of work. @testable is not usable with swift build -c release on Linux for some reason, and a lot of our tests use internal types. We'll need to find a way around this. But I don't want that to hold up this PR, so I think we should go ahead and merge if you all agree. — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or mute the thread.

tanner0101 · 2019-03-13T15:10:52Z

Run benchmarks in Release mode, as part of a separate target. We should be able to create realistic testing scenarios without having to rely on internal APIs, after all.

I like this option, but I'm not sure how we'd do it. Any use of @testable causes build errors when using swift test -c release, and AFAIK you must compile the entire package with each invocation of swift build or swift test.

I think the real solution here is to make @testable work in release mode on Linux.

Edit: Oh, looks like swift test -c release -Xswiftc -enable-testing may work. Let me dig in.

MrMage · 2019-03-13T15:21:34Z

I like this option, but I'm not sure how we'd do it. Any use of @testable causes build errors when using swift test -c release, and AFAIK you must compile the entire package with each invocation of swift build or swift test.

I think the real solution here is to make @testable work in release mode on Linux.

If meant creating "benchmarks" target with different tests that don't use any @testable APIs. How about that?

tanner0101 · 2019-03-13T15:22:38Z

@MrMage see my edit, I think swift test -c release -Xswiftc -enable-testing might work

tanner0101 · 2019-03-13T15:23:19Z

Our bot is built on XCTest.measure so it would be a significant amount of work to start doing testing in executable targets.

tanner0101 · 2019-03-13T15:26:25Z

Merging this into a local branch so that I can merge latest changes.

penny-coin · 2019-03-13T15:26:31Z

Hey @MrMage, you just merged a pull request, have a coin!

You now have 13 coins.

MrMage · 2019-03-13T15:35:20Z

Our bot is built on XCTest.measure so it would be a significant amount of work to start doing testing in executable targets.

But couldn't we use XCTest.measure without @testable? I.e. have an extra "benchmarks" test target with custom benchmark code that does not use @testable? (Trying -Xswiftc -enable-testing is also an option, of course, but I feel like we should be able to get by without @testable, anyway.

* Significantly improve performance of row decoding. Motivation: 1. `PostgreSQLRowDecoder`: `_KeyedDecodingContainer.decode` is called once for each key on the decoded object. With the current implementation, this performs a linear search on the `row` dictionary, which results in a total runtime (per object) of O(keys * columns), i.e. quadratic runtime ~O(keys^2). We replace the current linear search with one or two dictionary lookups if `tableOID != 0`, resulting in linear runtime (per object) in the number of keys (provided dictionary lookups can be assumed to take roughly constant time). 2. `PostgreSQLConnection.TableNameCache`: Most lookups are `tableName -> OID`. We accelerate that lookup by preparing a dictionary for that kind of lookup ahead of time, again replacing linear search. Effect: The time required for decoding ~5k objects with 9 fields each drops from ~0.4s on a Core i7-6700k (Release build) to ~0.2s, effectively doubling throughput. Optimization 1 contributes ~130 ms, Optimization 2 contributes ~70ms. * Whitespace fixes. * Comment fix. * More whitespace, sorry. * Implement `decodeIfPresent` to avoid two dictionary lookups per call. * Minor code simplification.

MrMage added 4 commits March 7, 2019 19:33

Whitespace fixes.

616c5da

Comment fix.

fa6229c

More whitespace, sorry.

ea2e31b

Implement decodeIfPresent to avoid two dictionary lookups per call.

466968c

Minor code simplification.

11fe423

calebkleveter added the enhancement New feature or request label Mar 8, 2019

tanner0101 approved these changes Mar 11, 2019

View reviewed changes

tanner0101 changed the base branch from 1 to mrmage-decode-performance March 13, 2019 15:25

tanner0101 merged commit 345da4f into vapor:mrmage-decode-performance Mar 13, 2019

tanner0101 mentioned this pull request Mar 13, 2019

Significantly improve performance of row decoding. (#130) #134

Merged

Uh oh!

Significantly improve performance of row decoding. #130

Significantly improve performance of row decoding. #130

Uh oh!

Conversation

MrMage commented Mar 7, 2019

Uh oh!

jdmcd commented Mar 7, 2019

Uh oh!

MrMage commented Mar 7, 2019

Uh oh!

jdmcd commented Mar 7, 2019

Uh oh!

twof commented Mar 7, 2019

Uh oh!

MrMage commented Mar 8, 2019

Uh oh!

MrMage commented Mar 8, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

twof commented Mar 8, 2019

Uh oh!

MrMage commented Mar 11, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

tanner0101 left a comment

Choose a reason for hiding this comment

Uh oh!

tanner0101 commented Mar 11, 2019

Uh oh!

MrMage commented Mar 12, 2019

Uh oh!

tanner0101 commented Mar 12, 2019

Uh oh!

MrMage commented Mar 12, 2019 via email

Uh oh!

tanner0101 commented Mar 12, 2019

Uh oh!

MrMage commented Mar 12, 2019 via email • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

tanner0101 commented Mar 12, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

MrMage commented Mar 12, 2019 via email

Uh oh!

tanner0101 commented Mar 13, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

MrMage commented Mar 13, 2019

Uh oh!

tanner0101 commented Mar 13, 2019

Uh oh!

tanner0101 commented Mar 13, 2019

Uh oh!

tanner0101 commented Mar 13, 2019

Uh oh!

penny-coin commented Mar 13, 2019

Uh oh!

MrMage commented Mar 13, 2019

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants

MrMage commented Mar 8, 2019 •

edited

Loading

MrMage commented Mar 11, 2019 •

edited

Loading

MrMage commented Mar 12, 2019 via email •

edited

Loading

tanner0101 commented Mar 12, 2019 •

edited

Loading

tanner0101 commented Mar 13, 2019 •

edited

Loading