Marshall Pierce
2018-10-18 22:10:45 UTC
I'm using jOOQ (3.11.5) with PostgreSQL, and I've managed to find a
situation where jOOQ's overhead on top of JDBC seems to be rather
significant. tl;dr it was easy enough to simply avoid using the data that
required the problematic query, so I didn't dive too deeply into it, but
I'm curious for future reference if I'm doing something egregiously wrong,
so here goes anyway.
The query in question is rather fat: it joins almost 30 tables together
assembling a complex entity that has a number of one-to-many collections.
All those joins produce not only a large query, but also a large number of
result rows for each top-level entity (could be hundreds), and each row is
pretty wide (~100 columns). (I'm not carefully selecting only a few fields
because the point of this query is to load the complete representation of
the entity, so I really do need almost every column in each table.)
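To give a sense of the shape, it looks roughly like this (table names here are hypothetical stand-ins for my actual schema; `ctx` is a `DSLContext` and `ids` is the set of top-level entity ids):

```java
// Hypothetical stand-in tables; the real query joins nearly 30 of them.
Result<Record> result = ctx
    .select()                          // no explicit field list: project every column
    .from(ORDERS)
    .join(CUSTOMERS).on(ORDERS.CUSTOMER_ID.eq(CUSTOMERS.ID))
    .join(ORDER_LINES).on(ORDER_LINES.ORDER_ID.eq(ORDERS.ID))   // one-to-many
    .join(SHIPMENTS).on(SHIPMENTS.ORDER_ID.eq(ORDERS.ID))       // one-to-many
    // ... ~25 more joins along these lines ...
    .where(ORDERS.ID.in(ids))
    .fetch();
```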
In a coarse synthetic benchmark of preparing some test data and loading it
repeatedly, a combination of p6spy for jdbc timing, basic logging, and
IntelliJ's handy async profiler yields the following characteristics:
- Just executing the query takes JDBC about 900ms.
- Creating a `Result` (via `fetch()`) takes about 3000ms -- this includes
the above query time.
- Turning the `Result` into the `List<Foo>` I want takes another 1300ms.
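In case the methodology matters, the fetch vs. mapping split was measured roughly like this (`query` and `mapToFoos` are placeholders for my actual code; p6spy accounted for the JDBC-level 900ms separately):

```java
long t0 = System.nanoTime();
Result<Record> result = query.fetch();   // JDBC execute + Result materialization
long t1 = System.nanoTime();
List<Foo> foos = mapToFoos(result);      // Record.into(...) + grouping
long t2 = System.nanoTime();

System.out.printf("fetch: %d ms, map: %d ms%n",
    (t1 - t0) / 1_000_000,
    (t2 - t1) / 1_000_000);
```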
The profiler indicates that creating the Result is almost entirely spent in
`CursorRecordInitializer#setValue()`, and almost 60% of that time goes to
setting `OffsetDateTime` values. (That's not jOOQ's fault -- the `OffsetDateTime` parser in the JDK is
just slow. I actually want `Instant`, so I wonder if parsing those could be
any faster?)
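For what it's worth, the only approach I know of for getting `Instant` is a `Converter` registered as a forced type in codegen, but since it converts *after* the `OffsetDateTime` has been parsed, I wouldn't expect it to change the parse cost. A minimal sketch, if I have the `Converter` API right:

```java
import java.time.Instant;
import java.time.OffsetDateTime;
import java.time.ZoneOffset;
import org.jooq.Converter;

// Convert OffsetDateTime -> Instant after jOOQ/JDBC have parsed it.
// The expensive OffsetDateTime parse still happens underneath.
Converter<OffsetDateTime, Instant> offsetDateTimeToInstant = Converter.ofNullable(
    OffsetDateTime.class,
    Instant.class,
    OffsetDateTime::toInstant,
    instant -> instant.atOffset(ZoneOffset.UTC));
```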
When transforming the `Result` into the type I want, that time is almost
entirely spent on `Record.into()` hydrating `BlahRecord` instances for each
row. (I create tuples of the various `*Record` types, and then traverse
over them grouping, etc, as needed to assemble the types the rest of the
system uses.) `into()` spends most of its time in `RowImpl#indexOf()` -- perhaps
caching field offsets would help?
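By "caching field offsets" I mean something along these lines on my side: resolve the `indexOf()` lookups once per result, then read by position per row (`BLAH.ID`/`BLAH.NAME` stand in for my generated fields):

```java
// Resolve index lookups once, outside the per-row loop
// (assuming a non-empty result).
Row fieldsRow = result.get(0).fieldsRow();
int idIdx   = fieldsRow.indexOf(BLAH.ID);
int nameIdx = fieldsRow.indexOf(BLAH.NAME);

// Then read by position for every row, skipping RowImpl#indexOf().
for (Record r : result) {
    Long id     = r.get(idIdx, Long.class);
    String name = r.get(nameIdx, String.class);
    // assemble the domain object directly instead of via Record.into()
}
```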
Anyway, while there are clearly some optimizations I could perform (relying
on sort order to eliminate some grouping, for instance): am I doing
anything obviously wrong, or do I simply need to slice the query up into
separate pieces so the combinatorial explosion of joined rows isn't so
egregious?
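For clarity, the slicing I have in mind is one query per one-to-many collection, keyed by parent id, then stitching in memory -- roughly (hypothetical `FOO`/`BAR` tables and generated records again):

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;

List<Long> parentIds = Arrays.asList(1L, 2L, 3L);  // however these arrive

// One query for the parents...
Map<Long, FooRecord> parents = ctx
    .selectFrom(FOO)
    .where(FOO.ID.in(parentIds))
    .fetchMap(FOO.ID);

// ...and one per child collection, grouped by the foreign key.
Map<Long, Result<BarRecord>> barsByParent = ctx
    .selectFrom(BAR)
    .where(BAR.FOO_ID.in(parentIds))
    .fetchGroups(BAR.FOO_ID);

// repeat for each one-to-many, then assemble Foo instances in memory
```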
Thanks,
Marshall