Skip to content

Conversation

@MrMage
Copy link
Contributor

@MrMage MrMage commented Mar 7, 2019

Motivation:

  1. PostgreSQLRowDecoder: _KeyedDecodingContainer.decode is called once for each key on the decoded object. With the current implementation, this performs a linear search on the row dictionary, which results in a total runtime (per object) of O(keys * columns), i.e. quadratic runtime ~O(keys^2). We replace the current linear search with one or two dictionary lookups if tableOID != 0, resulting in linear runtime (per object) in the number of keys (provided dictionary lookups can be assumed to take roughly constant time).
  2. PostgreSQLConnection.TableNameCache: Most lookups are tableName -> OID. We accelerate that lookup by preparing a dictionary for that kind of lookup ahead of time, again replacing linear search.

Effect:

The time required for decoding ~5k objects with 9 fields each drops from ~0.4s on a Core i7-6700k (Release build) to ~0.2s, effectively doubling throughput. Optimization 1 contributes ~130 ms, Optimization 2 contributes ~70ms.

MrMage added 4 commits March 7, 2019 19:33
Motivation:

1. `PostgreSQLRowDecoder`: `_KeyedDecodingContainer.decode` is called once for each key on the decoded object. With the current implementation, this performs a linear search on the `row` dictionary, which results in a total runtime (per object) of O(keys * columns), i.e. quadratic runtime ~O(keys^2). We replace the current linear search with one or two dictionary lookups if `tableOID != 0`, resulting in linear runtime (per object) in the number of keys (provided dictionary lookups can be assumed to take roughly constant time).
2. `PostgreSQLConnection.TableNameCache`: Most lookups are `tableName -> OID`. We accelerate that lookup by preparing a dictionary for that kind of lookup ahead of time, again replacing linear search.

Effect:

The time required for decoding ~5k objects with 9 fields each drops from ~0.4s on a Core i7-6700k (Release build) to ~0.2s, effectively doubling throughput. Optimization 1 contributes ~130 ms, Optimization 2 contributes ~70ms.
@jdmcd
Copy link
Member

jdmcd commented Mar 7, 2019

Can this be applied to MySQL as well or is it Postgres specific?

@MrMage
Copy link
Contributor Author

MrMage commented Mar 7, 2019

@mcdappdev looks like https://github.com/vapor/mysql/blob/master/Sources/MySQL/Codable/MySQLRowDecoder.swift is doing a linear search as well (firstValue(forColumn:), so Optimization 1. Not sure if a table name cache is being used in the MySQL driver, so Optimization 2 might not apply.

@jdmcd
Copy link
Member

jdmcd commented Mar 7, 2019

Cool... @tanner0101 thoughts?

@twof
Copy link
Member

twof commented Mar 7, 2019

@MrMage Are you able to include the performance tests you're running?

Great work btw!

@MrMage
Copy link
Contributor Author

MrMage commented Mar 8, 2019

@MrMage Are you able to include the performance tests you're running?

My tests were run with production code and data, so I'm afraid I can't share them. Also, I'm not sure whether Fluent has a good way to observe and record performance improvements/regressions, anyway. (There is https://github.com/vapor/fluent/tree/3/Sources/FluentBenchmark/Benchmarks, but I don't see how changes in these benchmarks' performance would be exposed and recorded.)

@MrMage
Copy link
Contributor Author

MrMage commented Mar 8, 2019

Added another micro-optimization (implementing decodeIfPresent to avoid two dictionary lookups), gaining another 15ms (i.e. ~8% in my case). Further gains might be had from overriding all the methods in KeyedDecodingContainerProtocol for the types which have specializations (Bool, String, Double, Int, etc.), provided PostgreSQLDataDecoder.decoded is being specialized in the same fashion.

@twof
Copy link
Member

twof commented Mar 8, 2019

@MrMage Working on something right now to track performance metrics
https://github.com/vapor/bot-github
vapor/routing-kit#66 (comment)

Hoping to have the bot redeployed tomorrow and then integrated with this repo so we can get you some charts :)

@calebkleveter calebkleveter added the enhancement New feature or request label Mar 8, 2019
@MrMage
Copy link
Contributor Author

MrMage commented Mar 11, 2019

@twof sounds good; let me know how to use that bot when its done (and where to add tests to showcase the performance gains; they would be most apparent for DBs with many tables and/or tables with many columns). Also, I assume this provides no way of measuring encode and decode time separately, given that both are being done in the same test?

Copy link
Member

@tanner0101 tanner0101 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These changes look great, thanks @MrMage!

@tanner0101
Copy link
Member

they would be most apparent for DBs with many tables and/or tables with many columns

We should add some performance tests for specifically this use case. Most of them end up being against dummy models with very few columns which gives unrealistic results.

Generally, this should be less of a problem in Fluent 4 since we will no longer be decoding all model properties by default. Properties will be decoded lazily on an as needed basis. (i.e., user.name.get()). However, it might be worth optimizing PostgresRow's value fetch method. Maybe we can use a dictionary here? Here's the relevant code in NIOPostgres which we will be using for PostgresKit 2.0:

https://github.com/vapor/nio-postgres/blob/master/Sources/NIOPostgres/Data/PostgresRow.swift#L16

@MrMage
Copy link
Contributor Author

MrMage commented Mar 12, 2019

We should add some performance tests for specifically this use case. Most of them end up being against dummy models with very few columns which gives unrealistic results.

Agreed. I would propose a model with five fields each of types String, String?, Int, Int?, Date, Date?, Data, Data?. That should provide us a good baseline to optimize in the future, as well as let us catch future regressions.

Generally, this should be less of a problem in Fluent 4 since we will no longer be decoding all model properties by default. Properties will be decoded lazily on an as needed basis. (i.e., user.name.get()). However, it might be worth optimizing PostgresRow's value fetch method. Maybe we can use a dictionary here? Here's the relevant code in NIOPostgres which we will be using for PostgresKit 2.0:

https://github.com/vapor/nio-postgres/blob/master/Sources/NIOPostgres/Data/PostgresRow.swift#L16

We could probably add a "column name -> [(tableOID: Int, fieldIndex: Int)]" lookup dictionary to PostgresRow.LookupTable or PostgresMessage.RowDescription. Is PostgresMessage.RowDescription different for each row, or could we cache and reuse it for all rows in a single result set returned by Postgres?

@tanner0101
Copy link
Member

Postgres sends back one RowDescription per query. That's why I made PostgresRow.LookupTable a class type. That way we can share it between rows without worrying about copying.

So we should be able to create a lookup dictionary once (we can even do it lazily), then re-use for all rows. That should be an even bigger performance win than is possible in the current version.

@MrMage
Copy link
Contributor Author

MrMage commented Mar 12, 2019 via email

@tanner0101
Copy link
Member

I'm not sure, it might be.

Is this PR ready to merge? I've reverted #117 so the 1 branch should be good to go.

@MrMage
Copy link
Contributor Author

MrMage commented Mar 12, 2019 via email

@tanner0101
Copy link
Member

tanner0101 commented Mar 12, 2019

I looked into adding the performance testing bot to 1 and it looks like it won't be possible without a significant amount of work. @testable is not usable with swift test -c release on Linux for some reason, and a lot of our tests use internal types. We'll need to find a way around this. But I don't want that to hold up this PR, so I think we should go ahead and merge if you all agree.

@MrMage
Copy link
Contributor Author

MrMage commented Mar 12, 2019 via email

@tanner0101
Copy link
Member

tanner0101 commented Mar 13, 2019

Run benchmarks in Release mode, as part of a separate target. We should be able to create realistic testing scenarios without having to rely on internal APIs, after all.

I like this option, but I'm not sure how we'd do it. Any use of @testable causes build errors when using swift test -c release, and AFAIK you must compile the entire package with each invocation of swift build or swift test.

I think the real solution here is to make @testable work in release mode on Linux.

Edit: Oh, looks like swift test -c release -Xswiftc -enable-testing may work. Let me dig in.

@MrMage
Copy link
Contributor Author

MrMage commented Mar 13, 2019

I like this option, but I'm not sure how we'd do it. Any use of @testable causes build errors when using swift test -c release, and AFAIK you must compile the entire package with each invocation of swift build or swift test.

I think the real solution here is to make @testable work in release mode on Linux.

If meant creating "benchmarks" target with different tests that don't use any @testable APIs. How about that?

@tanner0101
Copy link
Member

@MrMage see my edit, I think swift test -c release -Xswiftc -enable-testing might work

@tanner0101
Copy link
Member

Our bot is built on XCTest.measure so it would be a significant amount of work to start doing testing in executable targets.

@tanner0101 tanner0101 changed the base branch from 1 to mrmage-decode-performance March 13, 2019 15:25
@tanner0101
Copy link
Member

Merging this into a local branch so that I can merge latest changes.

@tanner0101 tanner0101 merged commit 345da4f into vapor:mrmage-decode-performance Mar 13, 2019
@penny-coin
Copy link

Hey @MrMage, you just merged a pull request, have a coin!

You now have 13 coins.

@MrMage
Copy link
Contributor Author

MrMage commented Mar 13, 2019

Our bot is built on XCTest.measure so it would be a significant amount of work to start doing testing in executable targets.

But couldn't we use XCTest.measure without @testable? I.e. have an extra "benchmarks" test target with custom benchmark code that does not use @testable? (Trying -Xswiftc -enable-testing is also an option, of course, but I feel like we should be able to get by without @testable, anyway.

tanner0101 added a commit that referenced this pull request Mar 14, 2019
* Significantly improve performance of row decoding.

Motivation:

1. `PostgreSQLRowDecoder`: `_KeyedDecodingContainer.decode` is called once for each key on the decoded object. With the current implementation, this performs a linear search on the `row` dictionary, which results in a total runtime (per object) of O(keys * columns), i.e. quadratic runtime ~O(keys^2). We replace the current linear search with one or two dictionary lookups if `tableOID != 0`, resulting in linear runtime (per object) in the number of keys (provided dictionary lookups can be assumed to take roughly constant time).
2. `PostgreSQLConnection.TableNameCache`: Most lookups are `tableName -> OID`. We accelerate that lookup by preparing a dictionary for that kind of lookup ahead of time, again replacing linear search.

Effect:

The time required for decoding ~5k objects with 9 fields each drops from ~0.4s on a Core i7-6700k (Release build) to ~0.2s, effectively doubling throughput. Optimization 1 contributes ~130 ms, Optimization 2 contributes ~70ms.

* Whitespace fixes.

* Comment fix.

* More whitespace, sorry.

* Implement `decodeIfPresent` to avoid two dictionary lookups per call.

* Minor code simplification.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants