Significantly improve performance of row decoding. #130
Conversation
Motivation:

1. `PostgreSQLRowDecoder`: `_KeyedDecodingContainer.decode` is called once for each key on the decoded object. With the current implementation, this performs a linear search on the `row` dictionary, which results in a total runtime (per object) of O(keys * columns), i.e. quadratic runtime ~O(keys^2). We replace the current linear search with one or two dictionary lookups if `tableOID != 0`, resulting in linear runtime (per object) in the number of keys (provided dictionary lookups can be assumed to take roughly constant time).
2. `PostgreSQLConnection.TableNameCache`: Most lookups are `tableName -> OID`. We accelerate that lookup by preparing a dictionary for that kind of lookup ahead of time, again replacing linear search.

Effect:

The time required for decoding ~5k objects with 9 fields each drops from ~0.4s on a Core i7-6700k (Release build) to ~0.2s, effectively doubling throughput. Optimization 1 contributes ~130 ms, Optimization 2 contributes ~70 ms.
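For illustration only, a minimal sketch of both optimizations; all type and member names below are assumptions, not the actual `PostgreSQLRowDecoder`/`TableNameCache` internals:

```swift
// Hypothetical stand-ins; the real types live in vapor/postgresql.
enum PostgreSQLData { case null, text(String) }

struct PostgreSQLColumn: Hashable {
    let tableOID: UInt32
    let name: String
}

// Optimization 1: instead of linearly scanning all columns for every decoded
// key (O(keys * columns) per object), hit the row dictionary directly —
// at most two lookups per key.
func value(in row: [PostgreSQLColumn: PostgreSQLData],
           forKey key: String,
           tableOID: UInt32) -> PostgreSQLData? {
    if tableOID != 0,
       let data = row[PostgreSQLColumn(tableOID: tableOID, name: key)] {
        return data  // lookup #1: fully qualified column
    }
    return row[PostgreSQLColumn(tableOID: 0, name: key)]  // lookup #2: unqualified
}

// Optimization 2: precompute the reverse `tableName -> OID` map once,
// instead of linearly searching the `OID -> tableName` entries per lookup.
struct TableNameCache {
    let nameByOID: [UInt32: String]
    let oidByName: [String: UInt32]

    init(nameByOID: [UInt32: String]) {
        self.nameByOID = nameByOID
        // Keep the first OID on (unlikely) duplicate table names.
        self.oidByName = Dictionary(
            nameByOID.map { ($0.value, $0.key) },
            uniquingKeysWith: { first, _ in first }
        )
    }
}
```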
Can this be applied to MySQL as well, or is it Postgres-specific?
@mcdappdev looks like https://github.com/vapor/mysql/blob/master/Sources/MySQL/Codable/MySQLRowDecoder.swift is doing a linear search as well.
Cool... @tanner0101 thoughts?
@MrMage Are you able to include the performance tests you're running? Great work btw!
My tests were run with production code and data, so I'm afraid I can't share them. Also, I'm not sure whether Fluent has a good way to observe and record performance improvements/regressions, anyway. (There is https://github.com/vapor/fluent/tree/3/Sources/FluentBenchmark/Benchmarks, but I don't see how changes in these benchmarks' performance would be exposed and recorded.)
Added another micro-optimization (implementing `decodeIfPresent` to avoid two dictionary lookups per call).
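A tiny self-contained illustration of why overriding `decodeIfPresent` helps (a plain dictionary stands in for the row storage here; this is not the PR's actual code):

```swift
let row: [String: String] = ["name": "vapor"]

// Default-style decodeIfPresent: `contains` followed by `decode`,
// i.e. two dictionary lookups per key.
func decodeDefault(_ key: String) -> String? {
    guard row.keys.contains(key) else { return nil }  // lookup #1
    return row[key]                                   // lookup #2
}

// Custom decodeIfPresent: a single lookup; a missing key simply yields nil.
func decodeCustom(_ key: String) -> String? {
    return row[key]
}

print(decodeDefault("name") ?? "-", decodeCustom("missing") ?? "-")  // vapor -
```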
@MrMage Working on something right now to track performance metrics. Hoping to have the bot redeployed tomorrow and then integrated with this repo so we can get you some charts :)
@twof sounds good; let me know how to use that bot when it's done (and where to add tests to showcase the performance gains; they would be most apparent for DBs with many tables and/or tables with many columns). Also, I assume this provides no way of measuring encode and decode time separately, given that both are being done in the same test?
These changes look great, thanks @MrMage!
We should add some performance tests for specifically this use case. Most of them end up being run against dummy models with very few columns, which gives unrealistic results. Generally, this should be less of a problem in Fluent 4, since we will no longer be decoding all model properties by default. Properties will be decoded lazily on an as-needed basis (i.e., https://github.com/vapor/nio-postgres/blob/master/Sources/NIOPostgres/Data/PostgresRow.swift#L16).
Agreed. I would propose a model with five fields each of types …

We could probably add a "column name -> [(tableOID: Int, fieldIndex: Int)]" lookup dictionary to …
Postgres sends back one `RowDescription` per query. That's why I made `PostgresRow.LookupTable` a class type. That way we can share it between rows without worrying about copying.

So we should be able to create a lookup dictionary once (we can even do it lazily), then re-use it for all rows. That should be an even bigger performance win than is possible in the current version.
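A rough sketch of that sharing pattern, inferred from this thread (the `RowDescription`/`LookupTable` shapes below are assumptions, not the actual nio-postgres implementation):

```swift
// Assumed shape of the per-query field metadata.
struct RowDescription {
    struct Field { let name: String; let tableOID: Int }
    let fields: [Field]
}

// Built once per query; shared by reference across all rows of the result,
// so nothing is copied per row.
final class LookupTable {
    // column name -> all matching (tableOID, fieldIndex) pairs
    let indices: [String: [(tableOID: Int, fieldIndex: Int)]]

    init(description: RowDescription) {
        var indices: [String: [(tableOID: Int, fieldIndex: Int)]] = [:]
        for (i, field) in description.fields.enumerated() {
            indices[field.name, default: []].append((field.tableOID, i))
        }
        self.indices = indices
    }
}

struct Row {
    let lookup: LookupTable   // shared class reference
    let values: [String?]     // this row's raw values (stand-in type)

    func value(column: String, tableOID: Int? = nil) -> String? {
        guard let candidates = lookup.indices[column] else { return nil }
        let match: (tableOID: Int, fieldIndex: Int)?
        if let oid = tableOID {
            match = candidates.first { $0.tableOID == oid }
        } else {
            match = candidates.first
        }
        return match.flatMap { values[$0.fieldIndex] }
    }
}
```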
Sounds great! Is there any possibility of backporting the re-use of the column description to Fluent 3 as well?
I'm not sure, it might be. Is this PR ready to merge? I've reverted #117 so the `1` branch should be good to go.
This PR is ready to merge; I had just been waiting for better benchmark infrastructure to demonstrate the improvements with hard numbers (and possibly add a benchmark case for this first). If the benchmark infrastructure is already working, please let me know (a pointer to the instructions would be appreciated).
I looked into adding the performance testing bot to `1` and it looks like it won't be possible without a significant amount of work. `@testable` is not usable with `swift build -c release` on Linux for some reason, and a lot of our tests use internal types. We'll need to find a way around this. But I don't want that to hold up this PR, so I think we should go ahead and merge if you all agree.
I'm fine with merging now; however, we still have two options left that we could discuss first:

1. Run benchmarks in Debug mode. This won't give us 100% accurate data, but trends and changes in algorithmic complexity (e.g. from quadratic to linear) should become apparent.
2. Run benchmarks in Release mode, as part of a separate target; a sketch of what that could look like follows below. We should be able to create realistic testing scenarios without having to rely on internal APIs, after all.
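For option 2, something along these lines might work in SwiftPM — a plain executable benchmark target that links only the library's public API, so it builds with `swift build -c release` and needs no `@testable` (all names here are made up for illustration):

```swift
// swift-tools-version:4.2
// Hypothetical Package.swift excerpt; target and product names are illustrative.
import PackageDescription

let package = Package(
    name: "PostgreSQL",
    products: [
        .library(name: "PostgreSQL", targets: ["PostgreSQL"]),
        // Built and run via `swift build -c release`, outside XCTest.
        .executable(name: "PostgreSQLBenchmarks", targets: ["PostgreSQLBenchmarks"]),
    ],
    targets: [
        .target(name: "PostgreSQL"),
        // Depends only on the library's public API — no @testable import.
        .target(name: "PostgreSQLBenchmarks", dependencies: ["PostgreSQL"]),
        .testTarget(name: "PostgreSQLTests", dependencies: ["PostgreSQL"]),
    ]
)
```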
I like this option, but I'm not sure how we'd do it. Any use of … I think the real solution here is to make … Edit: Oh, looks like …
I meant creating a "benchmarks" target with different tests that don't use any internal APIs.
@MrMage see my edit, I think …
Our bot is built on …
Merging this into a local branch so that I can merge the latest changes.
Hey @MrMage, you just merged a pull request, have a coin! You now have 13 coins. |
But couldn't we use …
* Significantly improve performance of row decoding. (Motivation and effect: see the pull request description above.)
* Whitespace fixes.
* Comment fix.
* More whitespace, sorry.
* Implement `decodeIfPresent` to avoid two dictionary lookups per call.
* Minor code simplification.