Discussion:
jOOQ Result query overhead
Marshall Pierce
2018-10-18 22:10:45 UTC
I'm using jOOQ (3.11.5) with PostgreSQL, and I've managed to find a
situation where jOOQ's overhead on top of JDBC seems to be rather
significant. tl;dr it was easy enough to simply avoid using the data that
required the problematic query, so I didn't dive too deeply into it, but
I'm curious for future reference if I'm doing something egregiously wrong,
so here goes anyway.

The query in question is rather fat: it joins almost 30 tables together
assembling a complex entity that has a number of one-to-many collections.
All those joins produce not only a large query, but also a large number of
result rows for each top-level entity (could be hundreds), and each row is
pretty wide (~100 columns). (I'm not carefully selecting only a few fields
because the point of this query is to load the complete representation of
the entity, so I really do need almost every column in each table.)

In a coarse synthetic benchmark of preparing some test data and loading it
repeatedly, a combination of p6spy for jdbc timing, basic logging, and
IntelliJ's handy async profiler yields the following characteristics:

- just executing the query takes JDBC about 900ms
- Creating a `Result` (via `fetch()`) takes about 3000ms -- this includes
the above query time
- Turning the `Result` into the `List<Foo>` I want takes another 1300ms.

The profiler indicates that creating the Result is almost entirely spent in
`CursorRecordInitializer#setValue()`, and that is almost 60% setting
`OffsetDateTimes`. (That's not jOOQ's fault -- that parser in the JDK is
just slow. I actually want `Instant`, so I wonder if parsing those could be
any faster?)

When transforming the `Result` into the type I want, that time is almost
entirely spent on `Record.into()` hydrating `BlahRecord` instances for each
row. (I create tuples of the various `*Record` types, and then traverse
over them grouping, etc, as needed to assemble the types the rest of the
system uses.) `into()` is mostly spent in `RowImpl#indexOf()` -- perhaps
caching field offsets might help?

Anyway, while there are clearly some optimizations I could perform (rely on
sorting to eliminate some grouping, for instance), I'm wondering if there's
anything obviously wrong I'm doing, or do I simply need to slice the query
up into different pieces so the combinatorial explosion of joined rows
isn't so egregious?

Thanks,
Marshall
--
You received this message because you are subscribed to the Google Groups "jOOQ User Group" group.
To unsubscribe from this group and stop receiving emails from it, send an email to jooq-user+***@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Lukas Eder
2018-10-19 08:47:05 UTC
Hi Marshall,

Thank you very much for your detailed report.

Sounds like you're loading tons of rows if the last step takes so long
(turning the Result into the List<Foo>). In that case, it might be
generally useful to use fetchLazy() instead of fetch(), because that will
not materialise all the rows in the client before further processing them.
Also, you might want to specify a ResultQuery.fetchSize(), which translates
to JDBC's Statement.setFetchSize(). This may help with both JDBC and jOOQ.
Some info about lazy fetching here:
https://www.jooq.org/doc/latest/manual/sql-execution/fetching/lazy-fetching

Regarding your individual findings
Post by Marshall Pierce
The profiler indicates that creating the Result is almost entirely spent
in `CursorRecordInitializer#setValue()`, and that is almost 60% setting
`OffsetDateTimes`. (That's not jOOQ's fault -- that parser in the JDK is
just slow. I actually want `Instant`, so I wonder if parsing those could be
any faster?)
Oh wow, that's an interesting catch! I should check if the PostgreSQL JDBC
driver finally implemented native support for JSR 310 types, in case of
which we shouldn't pass through an intermediate String representation on
the client side.

I can't tell (yet); I'd have to see the relevant parts of your query.
Probably, however, you could get best results by using a custom data type
binding for Instant:
https://www.jooq.org/doc/latest/manual/sql-building/queryparts/custom-bindings

I do wonder, however, if jOOQ should support the Instant type out of the
box. The mapping is straightforward, and we could bypass that parsing that
you've measured:
https://github.com/jOOQ/jOOQ/issues/7952
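Either way, a binding or converter for Instant boils down to the two conversions below, shown as plain java.time code (the jOOQ Converter/Binding wiring is omitted here; the class and method names are invented for illustration, and the UTC offset on the write path is an assumption):

```java
import java.time.Instant;
import java.time.OffsetDateTime;
import java.time.ZoneOffset;

public class InstantConversions {

    // Read path: the driver materialises an OffsetDateTime directly
    // (no string parsing), so converting it is cheap arithmetic.
    public static Instant fromDb(OffsetDateTime odt) {
        return odt == null ? null : odt.toInstant();
    }

    // Write path: bind the Instant as an OffsetDateTime. The offset chosen
    // here (UTC) is arbitrary; the point on the timeline is the same
    // regardless of which offset is attached.
    public static OffsetDateTime toDb(Instant instant) {
        return instant == null ? null : instant.atOffset(ZoneOffset.UTC);
    }
}
```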
Post by Marshall Pierce
When transforming the `Result` into the type I want, that time is almost
entirely spent on `Record.into()` hydrating `BlahRecord` instances for each
row. (I create tuples of the various `*Record` types, and then traverse
over them grouping, etc, as needed to assemble the types the rest of the
system uses.) `into()` is mostly spent in `RowImpl#indexOf()` -- perhaps
caching field offsets might help?
There are some caches, but they only work under specific conditions.
Perhaps your case could be cached as well... Could you provide an example
piece of code that resembles your real code and helps reproduce this
particular part of the problem?
Post by Marshall Pierce
Anyway, while there are clearly some optimizations I could perform (rely
on sorting to eliminate some grouping, for instance), I'm wondering if
there's anything obviously wrong I'm doing, or do I simply need to slice
the query up into different pieces so the combinatorial explosion of joined
rows isn't so egregious?
Yes, that definitely helps. Mapping to-many relationships using joins is
not always the best solution. In particular, it can also be simply wrong.
Consider:

class Book {
    List<Author> authors;
    List<Category> categories;
}

If you now join both the authors to-many relationship and the categories
to-many relationship in one single query, you will get a cartesian product
between authors and categories, and that could lead to wrong results later
on, especially when you transitively join to-many relationships inside of
authors and/or categories.

The SQL solution here would be the MULTISET operator, which allows for
ad-hoc nesting of tables:
https://github.com/jOOQ/jOOQ/issues/3884

Unfortunately, it's not yet supported by jOOQ, nor by most databases (it
can be emulated with XML or JSON, though).
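For reference, a hypothetical sketch of what such a JSON emulation could look like in PostgreSQL (schema and names invented; json_agg does the per-book nesting):

```sql
SELECT b.id,
       b.title,
       (SELECT json_agg(a.*)
          FROM authors a
          JOIN book_authors ba ON ba.author_id = a.id
         WHERE ba.book_id = b.id) AS authors,
       (SELECT json_agg(c.*)
          FROM categories c
          JOIN book_categories bc ON bc.category_id = c.id
         WHERE bc.book_id = b.id) AS categories
FROM books b;
```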

Another option which would probably be best in your case, and even bring
the query time down from 900ms to something more reasonable would be to run
more than one query, as you've mentioned. If done correctly, this will not
lead to N+1 problems, but maybe to 3-4 queries.

I hope this helps.
Lukas
Marshall Pierce
2018-10-19 17:11:59 UTC
Very helpful, thanks. Responses inline.
Post by Lukas Eder
Sounds like you're loading tons of rows if the last step takes so long
(turning the Result into the List<Foo>). In that case, it might be
generally useful to use fetchLazy() instead of fetch(), because that will
not materialise all the rows in the client before further processing them.
Also, you might want to specify a ResultQuery.fetchSize(), which translates
to JDBC's Statement.setFetchSize(). This may help with both JDBC and jOOQ.
https://www.jooq.org/doc/latest/manual/sql-execution/fetching/lazy-fetching
If I'm understanding correctly, lazy fetching would (if suitably used)
lower peak heap usage as well as allowing me to offload turning `Record`s
into tuples of `FooRecord`, `BarRecord`, etc to another thread, but
wouldn't help with the overhead of turning a JDBC `ResultSet` into a
`Record`. Correct?

Oh wow, that's an interesting catch! I should check if the PostgreSQL JDBC
Post by Lukas Eder
driver finally implemented native support for JSR 310 types, in case of
which we shouldn't pass through an intermediate String representation on
the client side.
Yes, they do:
https://jdbc.postgresql.org/documentation/head/java8-date-time.html, but
only for OffsetDateTime, not Instant. There's an open issue for Instant
(https://github.com/pgjdbc/pgjdbc/issues/833) but converting the resulting
ODT to an Instant is still way cheaper than parsing.

I can't tell (yet), I'd have to see the relevant parts of your query.
Post by Lukas Eder
Probably, however, you could get best results by using a custom data type
https://www.jooq.org/doc/latest/manual/sql-building/queryparts/custom-bindings
The timestamps are `created_at` columns sprinkled on pretty much every
table. Any specific query details you'd like to find out about? I've
cobbled one together (happy to put that up somewhere if it's helpful), and
it took the ~3000ms step down to ~1700ms. In other words, ResultSet ->
Record went from 2100ms to 800ms, so that's a decent win.

A few questions about that process:

- JDBC41ResultSet's getObject() implementations throw (and I need the (int,
Class) overload to let the PG JDBC driver get the ODT), and while
DefaultResultSet overrides them, CursorResultSet does not, so I had to add
those overloads. Well, really I only needed one overload, but I added both
for good measure. Was it just an oversight that CRS didn't already have
those overloads?
- In the forcedType `types` element, what should I be using for a Postgres
TIMESTAMP WITH TIME ZONE? After fiddling around with a few guesses, I found
that using "timestamp.*with.*time.*zone", the forcedType matches and I get
the desired Instant fields, but "timestampwithtimezone" and "timestamp with
time zone" don't match and of course the fields end up as OffsetDateTime
again. Is there some mystery separator that's not a space between those
words?
- I was using DSL.currentOffsetDateTime() in a certain UPDATE statement.
Now that the fields are `Instant`s, that doesn't typecheck, so I'm using
`DSL.currentTimestamp().cast(SQLDataType.INSTANT)` (mimicking
currentOffsetDateTime()). This necessitated the creation of
SQLDataType.INSTANT as follows:
public static final DataType<Instant> INSTANT =
    new DefaultDataType<Instant>(null, Instant.class, "timestamp with time zone");
Is this the right way to handle `... set foo = now()` with Instant columns?
Post by Lukas Eder
I do wonder, however, if jOOQ should support the Instant type out of the
box. The mapping is straightforward, and we could bypass that parsing that
https://github.com/jOOQ/jOOQ/issues/7952
Please do! :) I think I've already implemented a (sloppy) version of at
least part of that to get the above prototype working.
Post by Lukas Eder
When transforming the `Result` into the type I want, that time is almost
Post by Marshall Pierce
entirely spent on `Record.into()` hydrating `BlahRecord` instances for each
row. (I create tuples of the various `*Record` types, and then traverse
over them grouping, etc, as needed to assemble the types the rest of the
system uses.) `into()` is mostly spent in `RowImpl#indexOf()` -- perhaps
caching field offsets might help?
There are some caches, but they only work under specific conditions.
Perhaps your case could be cached as well... Could you provide an example
piece of code that resembles your real code, that helps reproduce this
particular part of the problem?
Sure, I'm basically doing this to get the Result<Record>:

txnContext.select()
    .from(WIDGETS)
    .join(FLAVORS).on(trivial key match)
    .join(COLORS).on(...)
    .leftOuterJoin(THINGS).on(..)
    .join(PARTS_OF_THINGS)
    .fetch()

I then map each Record into a simple tuple class:
WidgetTuple(record.into(WIDGETS), record.into(FLAVORS), record.into(COLORS),
...). Some of the Tables I'm passing to `.into()` are table aliases
(because the same table is joined into a few different ways). If that's not
enough detail we can look at getting you access to the code or I can make a
synthetic repro.

Once the tuples are created, it's then easy to traverse across the
List<WidgetTuple>, grouping, filtering, etc as needed, and according to the
profiler, that logic takes very very little time.
Post by Lukas Eder
Post by Marshall Pierce
Anyway, while there are clearly some optimizations I could perform (rely
on sorting to eliminate some grouping, for instance), I'm wondering if
there's anything obviously wrong I'm doing, or do I simply need to slice
the query up into different pieces so the combinatorial explosion of joined
rows isn't so egregious?
Yes, that definitely helps. Mapping to-many relationships using joins is
not always the best solution. In particular, it can also be simply wrong.
class Book {
List<Author> authors;
List<Category> categories;
}
If you now join both the authors to-many relationship and the categories
to-many relationship in one single query, you will get a cartesian product
between authors and categories, and that could lead to wrong results later
on, especially when you transitively join to-many relationships inside of
authors and/or categories.
I *think* I'm cleaning up that mess after the fact by (in this example)
filtering my tuples for ones where the contained AuthorsRecord.id != null,
grouping by that id, and arbitrarily choosing the first of those tuples to
be the one to use when instantiating the actual Author type (not a
codegen'd but rather the one that's exposed to the rest of the system). I'm
open to better ways though!
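With hypothetical tuple and id types (names invented; the real code uses jOOQ *Record types), that first-per-author-id cleanup could be sketched as:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.stream.Collectors;

public class TupleDedup {

    // Stand-in for a tuple of *Record instances mapped from one joined row.
    public record BookAuthorTuple(long bookId, Long authorId, String authorName) {}

    // Drop rows where the left join produced no author, group by author id,
    // and arbitrarily keep the first tuple per id (the duplicates only differ
    // in the columns multiplied out by the cartesian product).
    public static List<BookAuthorTuple> distinctAuthors(List<BookAuthorTuple> tuples) {
        var byId = tuples.stream()
                .filter(t -> t.authorId() != null)
                .collect(Collectors.toMap(
                        BookAuthorTuple::authorId,
                        t -> t,
                        (first, duplicate) -> first,
                        LinkedHashMap::new));
        return List.copyOf(byId.values());
    }
}
```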
Post by Lukas Eder
The SQL solution here would be the MULTISET operator, which allows for
https://github.com/jOOQ/jOOQ/issues/3884
Unfortunately, it's not yet supported by jOOQ, nor by most databases (it
can be emulated with XML or JSON, though).
Another option which would probably be best in your case, and even bring
the query time down from 900ms to something more reasonable would be to run
more than one query, as you've mentioned. If done correctly, this will not
lead to N+1 problems, but maybe to 3-4 queries.
OK, that's what I suspected -- basically avoiding the "massive number of
rows via cartesian product" issue the brute force way. Is there some clever
way to parallelize the subsequent queries that won't force me to open
separate transactions?
Lukas Eder
2018-10-20 14:36:13 UTC
Post by Marshall Pierce
Very helpful, thanks. Responses inline.
Post by Lukas Eder
Sounds like you're loading tons of rows if the last step takes so long
(turning the Result into the List<Foo>). In that case, it might be
generally useful to use fetchLazy() instead of fetch(), because that will
not materialise all the rows in the client before further processing them.
Also, you might want to specify a ResultQuery.fetchSize(), which translates
to JDBC's Statement.setFetchSize(). This may help with both JDBC and jOOQ.
https://www.jooq.org/doc/latest/manual/sql-execution/fetching/lazy-fetching
If I'm understanding correctly, lazy fetching would (if suitably used)
lower peak heap usage
Yes. In fact, chances are many intermediary objects could be stack
allocated, and if not, they at least won't make it to old gen space, as
they would be quickly discarded again.
Post by Marshall Pierce
as well as allowing me to offload turning `Record`s into tuples of
`FooRecord`, `BarRecord`, etc to another thread,
You could do that, but it shouldn't be necessary. I'm sure these 3 seconds
can be brought down to less than 50ms, if the fetching is improved and the
queries don't produce as much duplicate data from the joins.
Post by Marshall Pierce
but wouldn't help with the overhead of turning a JDBC `ResultSet` into a
`Record`. Correct?
No. But there is a pending feature request (and related changes) to allow
bypassing the RecordListener SPI, which would let jOOQ skip some of the
mapping that is unnecessary in your case:
https://github.com/jOOQ/jOOQ/issues/6544

It's non-trivial to implement, but it would definitely be worth it in cases
like yours.

I can't tell (yet), I'd have to see the relevant parts of your query.
Post by Marshall Pierce
Post by Lukas Eder
Probably, however, you could get best results by using a custom data type
https://www.jooq.org/doc/latest/manual/sql-building/queryparts/custom-bindings
The timestamps are `created_at` columns sprinkled on pretty much every
table. Any specific query details you'd like to find out about?
No, I'll get back to you if I cannot reproduce this in a trivial case,
which should probably be possible. I just need to generate enough data,
e.g. using generate_series().
Post by Marshall Pierce
I've cobbled one together (happy to put that up somewhere if it's
helpful), and it took the ~3000ms step down to ~1700ms. In other words,
ResultSet -> Record went from 2100ms to 800ms, so that's a decent win.
Great start!
Post by Marshall Pierce
- JDBC41ResultSet's getObject() implementations throw (and I need the
(int, Class) overload to let the PG JDBC driver get the ODT), and while
DefaultResultSet overrides them, CursorResultSet does not, so I had to add
those overloads. Well, really I only needed one overload, but I added both
for good measure. Was it just an oversight that CRS didn't already have
those overloads?
That's a nice catch, as well! Definitely an oversight. Will fix ASAP:
https://github.com/jOOQ/jOOQ/issues/7959
Post by Marshall Pierce
- In the forcedType `types` element, what should I be using for a Postgres
TIMESTAMP WITH TIME ZONE? After fiddling around with a few guesses, I found
that using "timestamp.*with.*time.*zone", the forcedType matches and I get
the desired Instant fields, but "timestampwithtimezone" and "timestamp with
time zone" don't match and of course the fields end up as OffsetDateTime
again. Is there some mystery separator that's not a space between those
words?
Yeah, the default regular expression flags (<regexFlags>) have
Pattern.COMMENTS turned on. That flag allows adding comments to more
complex regexes and formatting them freely, because it makes the engine
ignore unescaped whitespace in the regex. You will have to escape the
spaces: timestamp\ with\ time\ zone. See, e.g.:
https://www.jooq.org/doc/latest/manual/code-generation/codegen-advanced/codegen-config-database/codegen-database-regex-flags
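The effect is easy to reproduce with plain java.util.regex: with Pattern.COMMENTS set, unescaped whitespace in the pattern is ignored, so literal spaces must be escaped (a minimal demo of the flag's behaviour, not jOOQ's actual matching code):

```java
import java.util.regex.Pattern;

public class CommentsFlagDemo {

    // jOOQ's code generator compiles forcedType expressions with
    // Pattern.COMMENTS (and CASE_INSENSITIVE) by default; COMMENTS strips
    // unescaped whitespace from the pattern before matching.
    public static boolean matches(String regex, String input) {
        return Pattern.compile(regex, Pattern.COMMENTS | Pattern.CASE_INSENSITIVE)
                      .matcher(input).matches();
    }
}
```

So "timestamp with time zone" effectively becomes "timestampwithtimezone", which can no longer match the column type name containing spaces, whereas "timestamp.*with.*time.*zone" and the escaped form still do.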

- I was using DSL.currentOffsetDateTime() in a certain UPDATE statement.
Post by Marshall Pierce
Now that the fields are `Instant`s, that doesn't typecheck, so I'm using
`DSL.currentTimestamp().cast(SQLDataType.INSTANT)` (mimicking
currentOffsetDateTime()). This necessitated the creation of
public static final DataType<Instant> INSTANT = new
DefaultDataType<Instant>(null, Instant.class, "timestamp with time zone");
Is this the right way to handle `... set foo = now()` with Instant columns?
Well, you're using internal API (DefaultDataType), which may behave in
unexpected ways. It's much better to create a new data type from
SQLDataType.TIMESTAMPWITHTIMEZONE by calling asConvertedDataType(new
YourBinding()).

That data type is then reusable. Also, the cast is probably not what you
want. It may result in an actual SQL CAST(...) expression. Field.coerce()
is one way to switch data types on an expression. Or, you can just use
plain SQL templating to re-create your now() function as an Instant data
type:
https://www.jooq.org/doc/latest/manual/sql-building/plain-sql-templating
Post by Marshall Pierce
When transforming the `Result` into the type I want, that time is almost
Post by Lukas Eder
Post by Marshall Pierce
entirely spent on `Record.into()` hydrating `BlahRecord` instances for each
row. (I create tuples of the various `*Record` types, and then traverse
over them grouping, etc, as needed to assemble the types the rest of the
system uses.) `into()` is mostly spent in `RowImpl#indexOf()` -- perhaps
caching field offsets might help?
There are some caches, but they only work under specific conditions.
Perhaps your case could be cached as well... Could you provide an example
piece of code that resembles your real code, that helps reproduce this
particular part of the problem?
txnContext.select().from(WIDGETS).join(FLAVORS).on(trivial key
match).join(COLORS).on(...).leftOuterJoin(THINGS).on(..).join(PARTS_OF_THINGS).fetch().
WidgetTuple(record.into(WIDGETS), record.into(FLAVORS), record.into(COLORS),
...). Some of the Tables I'm passing to `.into()` are table aliases
(because the same table is joined into a few different ways). If that's not
enough detail we can look at getting you access to the code or I can make a
synthetic repro.
Once the tuples are created, it's then easy to traverse across the
List<WidgetTuple>, grouping, filtering, etc as needed, and according to the
profiler, that logic takes very very little time.
I'll look into this, thanks. Specifically, this indexOf() call, which
you've noticed as well, seems like it should be cached:
https://github.com/jOOQ/jOOQ/blob/version-3.11.5/jOOQ/src/main/java/org/jooq/impl/AbstractRecord.java#L750

The question is: where should that cache live, given that you're looping
over your records in your own code? I could see some per-loop caching in
ResultImpl.into(Table).

Anyway, while there are clearly some optimizations I could perform (rely on
Post by Marshall Pierce
Post by Lukas Eder
Post by Marshall Pierce
sorting to eliminate some grouping, for instance), I'm wondering if there's
anything obviously wrong I'm doing, or do I simply need to slice the query
up into different pieces so the combinatorial explosion of joined rows
isn't so egregious?
Yes, that definitely helps. Mapping to-many relationships using joins is
not always the best solution. In particular, it can also be simply wrong.
class Book {
List<Author> authors;
List<Category> categories;
}
If you now join both the authors to-many relationship and the categories
to-many relationship in one single query, you will get a cartesian product
between authors and categories, and that could lead to wrong results later
on, especially when you transitively join to-many relationships inside of
authors and/or categories.
I *think* I'm cleaning up that mess after the fact by (in this example)
filtering my tuples for ones where the contained AuthorsRecord.id != null,
grouping by that id, and arbitrarily choosing the first of those tuples to
be the one to use when instantiating the actual Author type (not a
codegen'd but rather the one that's exposed to the rest of the system). I'm
open to better ways though!
Well, there would need to be one query per branch, e.g.

- 1 query fetching all the books
- 1 query fetching all the authors for these books (preferably using an IN
(SELECT ..) predicate, not an IN (?, ?, ? ..) predicate)
- 1 query fetching all the categories for these books (preferably... see
above)

The first two could be combined with a join, but then you'd again
duplicate books. There's no perfect solution here. ORMs like JPA do
similar things to what I'm suggesting behind the scenes and let you
override their defaults (e.g. JPQL with its JOIN FETCH feature)
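With the Book example (table and column names invented for illustration), the per-branch queries might look like:

```sql
-- 1. The root entities
SELECT * FROM books WHERE genre = 'fantasy';

-- 2. Their authors, correlated via IN (SELECT ..) rather than IN (?, ?, ..)
SELECT a.*, ba.book_id
FROM authors a
JOIN book_authors ba ON ba.author_id = a.id
WHERE ba.book_id IN (SELECT id FROM books WHERE genre = 'fantasy');

-- 3. Their categories, same pattern
SELECT c.*, bc.book_id
FROM categories c
JOIN book_categories bc ON bc.category_id = c.id
WHERE bc.book_id IN (SELECT id FROM books WHERE genre = 'fantasy');
```

Each collection then arrives exactly once per element, and the client stitches them together by book id.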

The SQL solution here would be the MULTISET operator, which allows for
Post by Marshall Pierce
Post by Lukas Eder
https://github.com/jOOQ/jOOQ/issues/3884
Unfortunately, it's not yet supported by jOOQ, nor by most databases (it
can be emulated with XML or JSON, though).
Another option which would probably be best in your case, and even bring
the query time down from 900ms to something more reasonable would be to run
more than one query, as you've mentioned. If done correctly, this will not
lead to N+1 problems, but maybe to 3-4 queries.
OK, that's what I suspected -- basically avoiding the "massive number of
rows via cartesian product" issue the brute force way. Is there some clever
way to parallelize the subsequent queries that won't force me to open
separate transactions?
I don't see any such way using standard JDBC-based transactions, but I
think the sequential queries will be much faster than the single query with
duplication. If you're using a database that can run queries asynchronously
(e.g. PostgreSQL), you could try your luck with that. It should be possible
to share a transaction among client threads and run several statements in
parallel (they might still be sequenced by the database) - I have no
experience with that yet.
Lukas Eder
2018-10-20 14:37:38 UTC
... of course, I'm assuming that you really *do* need all that data in the
client. I haven't asked yet, but what are you doing with it? Exporting it?
Or some aggregation?
Post by Lukas Eder
Post by Marshall Pierce
Very helpful, thanks. Responses inline.
Post by Lukas Eder
Sounds like you're loading tons of rows if the last step takes so long
(turning the Result into the List<Foo>). In that case, it might be
generally useful to use fetchLazy() instead of fetch(), because that will
not materialise all the rows in the client before further processing them.
Also, you might want to specify a ResultQuery.fetchSize(), which translates
to JDBC's Statement.setFetchSize(). This may help with both JDBC and jOOQ.
https://www.jooq.org/doc/latest/manual/sql-execution/fetching/lazy-fetching
If I'm understanding correctly, lazy fetching would (if suitably used)
lower peak heap usage
Yes. In fact, chances are many intermediary objects could be stack
allocated and if not, at least don't make it to old gen space, as they
would be quickly discarded again.
Post by Marshall Pierce
as well as allowing me to offload turning `Record`s into tuples of
`FooRecord`, `BarRecord`, etc to another thread,
You could do that, but it shouldn't be necessary. I'm sure these 3 seconds
can be brought down to less than 50ms, if the fetching is improved and the
queries don't produce as much duplicate data from the joins.
Post by Marshall Pierce
but wouldn't help with the overhead of turning a JDBC `ResultSet` into a
`Record`. Correct?
No. There is a pending feature request (and related changes) to allow for
bypassing the RecordListener SPI, which would allow for skipping some of
https://github.com/jOOQ/jOOQ/issues/6544
It's non trivial to implement but it would definitely be worth it in cases
like yours.
I can't tell (yet), I'd have to see the relevant parts of your query.
Post by Marshall Pierce
Post by Lukas Eder
Probably, however, you could get best results by using a custom data type
https://www.jooq.org/doc/latest/manual/sql-building/queryparts/custom-bindings
The timestamps are `created_at` columns sprinkled on pretty much every
table. Any specific query details you'd like to find out about?
No, I'll get back to you if I cannot reproduce this in a trivial case,
which should probably be possible. Just need to generate enough data, e.g.
using generate_series()
Post by Marshall Pierce
I've cobbled one together (happy to put that up somewhere if it's
helpful), and it took the ~3000ms step down to ~1700ms. In other words,
ResultSet -> Record went from 2100ms to 800ms, so that's a decent win.
Great start!
Post by Marshall Pierce
- JDBC41ResultSet's getObject() implementations throw (and I need the
(int, Class) overload to let the PG JDBC driver get the ODT), and while
DefaultResultSet overrides them, CursorResultSet does not, so I had to add
those overloads. Well, really I only needed one overload, but I added both
for good measure. Was it just an oversight that CRS didn't already have
those overloads?
https://github.com/jOOQ/jOOQ/issues/7959
Post by Marshall Pierce
- In the forcedType `types` element, what should I be using for a
Postgres TIMESTAMP WITH TIME ZONE? After fiddling around with a few
guesses, I found that using "timestamp.*with.*time.*zone", the forcedType
matches and I get the desired Instant fields, but "timestampwithtimezone"
and "timestamp with time zone" don't match and of course the fields end up
as OffsetDateTime again. Is there some mystery separator that's not a space
between those words?
Yeah, the default regular expression flags (<regexFlags>) is having
Pattern.COMMENTS turned on. This allows for adding comments to more complex
regexes, as well as formatting them as this flag ignores whitespace in the
https://www.jooq.org/doc/latest/manual/code-generation/codegen-advanced/codegen-config-database/codegen-database-regex-flags
- I was using DSL.currentOffsetDateTime() in a certain UPDATE statement.
Post by Marshall Pierce
Now that the fields are `Instant`s, that doesn't typecheck, so I'm using
`DSL.currentTimestamp().cast(SQLDataType.INSTANT)` (mimicking
currentOffsetDateTime()). This necessitated the creation of
public static final DataType<Instant> INSTANT = new
DefaultDataType<Instant>(null, Instant.class, "timestamp with time zone");
Is this the right way to handle `... set foo = now()` with Instant columns?
Well, you're using internal API (DefaultDataType). It may work in
unexpected way. Much better to create a new data type from
SQLDataType.TIMESTAMPWITHTIMEZONE by calling asConvertedDataType(new
YourBinding()).
That data type is then reusable. Also, the cast is probably not what you
want. It may result in an actual SQL CAST(...) expression. Field.coerce()
is one way to switch data types on an expression. Or, you can just use
plain SQL templating to re-create your now() function as an Instant data
https://www.jooq.org/doc/latest/manual/sql-building/plain-sql-templating
Post by Marshall Pierce
When transforming the `Result` into the type I want, that time is almost
Post by Lukas Eder
Post by Marshall Pierce
entirely spent on `Record.into()` hydrating `BlahRecord` instances for each
row. (I create tuples of the various `*Record` types, and then traverse
over them grouping, etc, as needed to assemble the types the rest of the
system uses.) `into()` is mostly spent in `RowImpl#indexOf()` -- perhaps
caching field offsets might help?
There are some caches, but they only work under specific conditions.
Perhaps your case could be cached as well... Could you provide an example
piece of code that resembles your real code, that helps reproduce this
particular part of the problem?
txnContext.select().from(WIDGETS).join(FLAVORS).on(trivial key
match).join(COLORS).on(...).leftOuterJoin(THINGS).on(..).join(PARTS_OF_THINGS).fetch().
WidgetTuple(record.into(WIGETS), record.into(FLAVORS), record.into(COLORS),
...). Some of the Tables I'm passing to `.into()` are table aliases
(because the same table is joined into a few different ways). If that's not
enough detail we can look at getting you access to the code or I can make a
synthetic repro.
Once the tuples are created, it's then easy to traverse across the
List<WidgetTuple>, grouping, filtering, etc as needed, and according to the
profiler, that logic takes very very little time.
I'll look into this, thanks. Specifically, this indexOf() call, which
https://github.com/jOOQ/jOOQ/blob/version-3.11.5/jOOQ/src/main/java/org/jooq/impl/AbstractRecord.java#L750
The question is: Where should that cache be as you're looping over your
records in your own code. I could see some per-loop caching in
ResultImpl.into(Table)
Anyway, while there are clearly some optimizations I could perform (rely
Post by Marshall Pierce
Post by Lukas Eder
Post by Marshall Pierce
on sorting to eliminate some grouping, for instance), I'm wondering if
there's anything obviously wrong I'm doing, or do I simply need to slice
the query up into different pieces so the combinatorial explosion of joined
rows isn't so egregious?
Yes, that definitely helps. Mapping to-many relationships using joins is
not always the best solution. In particular, it can also be simply wrong.
class Book {
    List<Author> authors;
    List<Category> categories;
}
If you now join both the authors to-many relationship and the categories
to-many relationship in one single query, you will get a cartesian product
between authors and categories, and that could lead to wrong results later
on, especially when you transitively join to-many relationships inside of
authors and/or categories.
I *think* I'm cleaning up that mess after the fact by (in this example)
filtering my tuples for ones where the contained AuthorsRecord.id != null,
grouping by that id, and arbitrarily choosing the first of those tuples to
be the one to use when instantiating the actual Author type (not a codegen'd one, but rather the one that's exposed to the rest of the system). I'm open to better ways, though!
Well, there would need to be one query per branch, e.g.
- 1 query fetching all the books
- 1 query fetching all the authors for these books (preferably using an IN
(SELECT ..) predicate, not an IN (?, ?, ? ..) predicate)
- 1 query fetching all the categories for these books (preferably... see
above)
The first two could be combined with a join, but then you'd again duplicate books. There's no perfect solution here. ORMs like JPA do
similar things to what I'm suggesting behind the scenes and let you
override their defaults (e.g. JPQL with its JOIN FETCH feature)
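Once the per-branch queries have run, the assembly step is plain grouping by foreign key. A minimal sketch, with plain lists standing in for the three result sets (Book and the helper here are hypothetical stand-ins, not generated jOOQ types):

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class ThreeQueryAssembly {

    record Book(int id, String title, List<String> authors, List<String> categories) {}

    // Group child rows (bookId -> value) by their foreign key.
    static Map<Integer, List<String>> byBookId(List<Map.Entry<Integer, String>> rows) {
        return rows.stream().collect(Collectors.groupingBy(
                Map.Entry::getKey,
                Collectors.mapping(Map.Entry::getValue, Collectors.toList())));
    }

    public static void main(String[] args) {
        // Pretend these are the results of the three queries described above:
        Map<Integer, String> books = Map.of(1, "SQL", 2, "Joins");
        var authors = byBookId(List.of(Map.entry(1, "Ada"), Map.entry(1, "Edsger")));
        var categories = byBookId(List.of(Map.entry(2, "Databases")));

        List<Book> result = books.entrySet().stream()
                .map(e -> new Book(e.getKey(), e.getValue(),
                        authors.getOrDefault(e.getKey(), List.of()),
                        categories.getOrDefault(e.getKey(), List.of())))
                .collect(Collectors.toList());

        System.out.println(result.size()); // 2 books, no duplicated rows anywhere
    }
}
```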
Post by Marshall Pierce
Post by Lukas Eder
The SQL solution here would be the MULTISET operator, which allows for nesting collections directly in SQL:
https://github.com/jOOQ/jOOQ/issues/3884
Unfortunately, it's not yet supported by jOOQ, nor by most databases (it can be emulated with XML or JSON, though).
Another option which would probably be best in your case, and even bring
the query time down from 900ms to something more reasonable would be to run
more than one query, as you've mentioned. If done correctly, this will not
lead to N+1 problems, but maybe to 3-4 queries.
OK, that's what I suspected -- basically avoiding the "massive number of
rows via cartesian product" issue the brute force way. Is there some clever
way to parallelize the subsequent queries that won't force me to open
separate transactions?
I don't see any such way using standard JDBC based transactions, but I
think the sequential queries will be much faster than the single query with
duplication. If you're using a database that can run queries asynchronously
(e.g. PostgreSQL), you could try your luck with that. It should be possible
to share a transaction among client threads and run several statements in
parallel (they might still be sequenced by the database) - I have no
experience with that yet.
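For what it's worth, the client-side part of that is straightforward with CompletableFuture, as long as each branch gets its own connection (a single JDBC connection must not run statements concurrently). The suppliers below are hypothetical stand-ins for the per-branch queries:

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;

public class ParallelFetch {

    public static void main(String[] args) {
        // In real code, each supplier would open its OWN connection/transaction:
        CompletableFuture<List<String>> authors =
                CompletableFuture.supplyAsync(() -> List.of("Ada", "Edsger"));
        CompletableFuture<List<String>> categories =
                CompletableFuture.supplyAsync(() -> List.of("Databases"));

        // Wait for both branches, then assemble the entities on one thread:
        CompletableFuture.allOf(authors, categories).join();
        System.out.println(authors.join().size() + categories.join().size()); // 3
    }
}
```

Since each branch is then its own transaction, a consistent view across branches isn't automatic; PostgreSQL's exported snapshots (pg_export_snapshot() plus SET TRANSACTION SNAPSHOT) can share one snapshot across connections if that matters.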
--
You received this message because you are subscribed to the Google Groups "jOOQ User Group" group.
To unsubscribe from this group and stop receiving emails from it, send an email to jooq-user+***@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Marshall Pierce
2018-10-20 15:43:37 UTC
Permalink
Post by Lukas Eder
Post by Marshall Pierce
- In the forcedType `types` element, what should I be using for a
Postgres TIMESTAMP WITH TIME ZONE? After fiddling around with a few
guesses, I found that using "timestamp.*with.*time.*zone", the forcedType
matches and I get the desired Instant fields, but "timestampwithtimezone"
and "timestamp with time zone" don't match and of course the fields end up
as OffsetDateTime again. Is there some mystery separator that's not a space
between those words?
Yeah, the default regular expression flags (<regexFlags>) have Pattern.COMMENTS turned on. This allows for adding comments to more complex regexes, as well as formatting them, as this flag ignores whitespace in the pattern:
https://www.jooq.org/doc/latest/manual/code-generation/codegen-advanced/codegen-config-database/codegen-database-regex-flags
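The effect is easy to see outside of jOOQ; this little check compiles the patterns with the flags the generator uses by default per the linked manual page (COMMENTS and CASE_INSENSITIVE):

```java
import java.util.regex.Pattern;

public class RegexFlagsDemo {

    // Pattern.COMMENTS ignores unescaped whitespace in the pattern itself.
    static boolean matches(String regex, String input) {
        return Pattern.compile(regex, Pattern.COMMENTS | Pattern.CASE_INSENSITIVE)
                .matcher(input).matches();
    }

    public static void main(String[] args) {
        String type = "timestamp with time zone";
        // COMMENTS strips the spaces, so the literal pattern no longer matches:
        System.out.println(matches("timestamp with time zone", type)); // false
        // ".*" eats the spaces:
        System.out.println(matches("timestamp.*with.*time.*zone", type)); // true
        // Escaped whitespace survives COMMENTS:
        System.out.println(matches("timestamp\\ with\\ time\\ zone", type)); // true
    }
}
```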
Excellent, that worked, thanks.
Post by Lukas Eder
Post by Marshall Pierce
- I was using DSL.currentOffsetDateTime() in a certain UPDATE statement. Now that the fields are `Instant`s, that doesn't typecheck, so I'm using `DSL.currentTimestamp().cast(SQLDataType.INSTANT)` (mimicking currentOffsetDateTime()). This necessitated the creation of:

    public static final DataType<Instant> INSTANT =
        new DefaultDataType<Instant>(null, Instant.class, "timestamp with time zone");

Is this the right way to handle `... set foo = now()` with Instant columns?
Well, you're using internal API (DefaultDataType). It may work in unexpected ways. Much better to create a new data type from SQLDataType.TIMESTAMPWITHTIMEZONE by calling asConvertedDataType(new YourBinding()).
That data type is then reusable. Also, the cast is probably not what you want. It may result in an actual SQL CAST(...) expression. Field.coerce() is one way to switch data types on an expression. Or, you can just use plain SQL templating to re-create your now() function as an Instant data type:
https://www.jooq.org/doc/latest/manual/sql-building/plain-sql-templating
Aha. Field.coerce() with asConvertedDataType() works fine. No shortage of
API to learn in jOOQ...
Post by Lukas Eder
Post by Marshall Pierce
WidgetTuple(record.into(WIDGETS), record.into(FLAVORS), record.into(COLORS), ...). Some of the Tables I'm passing to `.into()` are table aliases (because the same table is joined in a few different ways). If that's not enough detail, we can look at getting you access to the code, or I can make a synthetic repro.
Once the tuples are created, it's then easy to traverse across the
List<WidgetTuple>, grouping, filtering, etc as needed, and according to the
profiler, that logic takes very very little time.
I'll look into this, thanks. Specifically, this indexOf() call:
https://github.com/jOOQ/jOOQ/blob/version-3.11.5/jOOQ/src/main/java/org/jooq/impl/AbstractRecord.java#L750
The question is: Where should that cache be as you're looping over your records in your own code? I could see some per-loop caching in ResultImpl.into(Table).
Perhaps each Record might keep a reference to its parent Result and access
some shared cache of field data? If that's not convenient, though, I'd be
totally fine if there was some other way to make caching explicit (though
it would be a shame to lose caching for all users transparently), like
having Result provide a CachingRecordWidget that had some
recordInto(Record, Table) method that would then make it trivial to cache
the necessary metadata.
Post by Lukas Eder
Well, there would need to be one query per branch, e.g.
- 1 query fetching all the books
- 1 query fetching all the authors for these books (preferably using an IN
(SELECT ..) predicate, not an IN (?, ?, ? ..) predicate)
- 1 query fetching all the categories for these books (preferably... see
above)
The first two could be combined with a join, but then you'd again duplicate books. There's no perfect solution here. ORMs like JPA do
similar things to what I'm suggesting behind the scenes and let you
override their defaults (e.g. JPQL with its JOIN FETCH feature)
To be clear, if we had a BookTuple(book: BooksRecord, author: AuthorsRecord, category: CategoriesRecord) type, and we populated that with the obvious joins between books, authors, and categories, this is the type of "cleaning up the mess" logic I was referring to, in the following (mostly Kotlin-esque) syntax:
tuples
    .groupBy { it.book.id }
    .values // now it's a collection of List<BookTuple>, where each list corresponds to one book
    .map { bookTuples ->
        val bookRecord = bookTuples.first().book
        val authors = bookTuples
            .filter { it.author.id != null }
            .groupBy { it.author.id }
            .values.map { it.first().authorNameEtc }
        val categories = bookTuples
            .filter { it.category.id != null }
            .groupBy { it.category.id }
            .values.map { it.first().categoryDataHere }
        // construct your Book type appropriately from the above locals
    }
AFAICT this takes care of all the deduplication.
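For reference, the same deduplication in plain Java streams -- BookRec, AuthorRec, and BookTuple are hypothetical stand-ins for the generated record types:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class Dedup {

    // Hypothetical stand-ins for the generated jOOQ record types.
    record BookRec(int id, String title) {}
    record AuthorRec(Integer id, String name) {}
    record BookTuple(BookRec book, AuthorRec author) {}

    // book id -> deduplicated author names for that book.
    static Map<Integer, List<String>> authorsByBook(List<BookTuple> tuples) {
        return tuples.stream()
                .collect(Collectors.groupingBy(t -> t.book().id()))
                .entrySet().stream()
                .collect(Collectors.toMap(Map.Entry::getKey, e ->
                        e.getValue().stream()
                                // a left join may produce null-id authors
                                .filter(t -> t.author().id() != null)
                                // rows are duplicated by the other to-many joins
                                .collect(Collectors.groupingBy(t -> t.author().id(),
                                        LinkedHashMap::new, Collectors.toList()))
                                .values().stream()
                                .map(dups -> dups.get(0).author().name())
                                .collect(Collectors.toList())));
    }

    public static void main(String[] args) {
        BookRec b = new BookRec(1, "SQL");
        List<BookTuple> tuples = List.of(
                new BookTuple(b, new AuthorRec(10, "Ada")),
                new BookTuple(b, new AuthorRec(10, "Ada")),   // duplicated row
                new BookTuple(b, new AuthorRec(11, "Edsger")));
        System.out.println(authorsByBook(tuples)); // {1=[Ada, Edsger]}
    }
}
```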
Post by Lukas Eder
I don't see any such way using standard JDBC based transactions, but I
think the sequential queries will be much faster than the single query with
duplication. If you're using a database that can run queries asynchronously
(e.g. PostgreSQL), you could try your luck with that. It should be possible
to share a transaction among client threads and run several statements in
parallel (they might still be sequenced by the database) - I have no
experience with that yet.
OK, I will look into it.
Also, for your other question about whether we really do need all the data... yes, it's for a good cause: exporting everything to CSV for people to slice'n'dice in a spreadsheet, populating the "show me everything about this entity" view, etc.
Lukas Eder
2018-10-24 17:08:32 UTC
Permalink
Post by Lukas Eder
Post by Lukas Eder
Well, you're using internal API (DefaultDataType). It may work in unexpected ways. Much better to create a new data type from SQLDataType.TIMESTAMPWITHTIMEZONE by calling asConvertedDataType(new YourBinding()).
That data type is then reusable. Also, the cast is probably not what you want. It may result in an actual SQL CAST(...) expression. Field.coerce() is one way to switch data types on an expression. Or, you can just use plain SQL templating to re-create your now() function as an Instant data type:
https://www.jooq.org/doc/latest/manual/sql-building/plain-sql-templating
Aha. Field.coerce() with asConvertedDataType() works fine. No shortage of
API to learn in jOOQ...
Well, this definitely isn't everyday API :)
Post by Lukas Eder
Post by Lukas Eder
I'll look into this, thanks. Specifically, this indexOf() call:
https://github.com/jOOQ/jOOQ/blob/version-3.11.5/jOOQ/src/main/java/org/jooq/impl/AbstractRecord.java#L750
The question is: Where should that cache be as you're looping over your records in your own code? I could see some per-loop caching in ResultImpl.into(Table).
Perhaps each Record might keep a reference to its parent Result and access
some shared cache of field data? If that's not convenient, though, I'd be
totally fine if there was some other way to make caching explicit (though
it would be a shame to lose caching for all users transparently), like
having Result provide a CachingRecordWidget that had some
recordInto(Record, Table) method that would then make it trivial to cache
the necessary metadata.
That would scratch your immediate itch, but it's generally not a good idea, as records are independent entities and could be placed in several results. One example is the Result.intoGroups() method, which tends to wrap sub-results in a Map. Other examples are methods like DSL.values(), which have overloads accepting a Result, which might be constructed using DSLContext.newResult().

This definitely looks easier at first than it really is. Any record can be mapped into any other record type, including a Table reference, which may or may not be generated. Blindly caching all these calls would blow up cache sizes in some systems.

Of course, for large data sets, the built-in jOOQ mapping methods might just not be good enough, and you can easily do better, knowing exactly what you're doing given a particular use case.
Post by Lukas Eder
To be clear, if we had a BookTuple(book: BooksRecord, author: AuthorsRecord, category: CategoriesRecord) type, and we populated that with the obvious joins between books, authors, and categories, this is the type of "cleaning up the mess" logic I was referring to, in the following (mostly Kotlin-esque) syntax:

tuples
    .groupBy { it.book.id }
    .values // now it's a collection of List<BookTuple>, where each list corresponds to one book
    .map { bookTuples ->
        val bookRecord = bookTuples.first().book
        val authors = bookTuples
            .filter { it.author.id != null }
            .groupBy { it.author.id }
            .values.map { it.first().authorNameEtc }
        val categories = bookTuples
            .filter { it.category.id != null }
            .groupBy { it.category.id }
            .values.map { it.first().categoryDataHere }
        // construct your Book type appropriately from the above locals
    }

AFAICT this takes care of all the deduplication.
Yes, without going through it thoroughly, that looks reasonable. But again,
the duplication is part of the performance problem, so surely there's a
better solution from that perspective.
Thanks,
Lukas
Marshall Pierce
2018-10-24 17:27:26 UTC
Permalink
Post by Marshall Pierce
Perhaps each Record might keep a reference to its parent Result and
access some shared cache of field data? If that's not convenient,
though, I'd be totally fine if there was some other way to make
caching explicit (though it would be a shame to lose caching for all
users transparently), like having Result provide a
CachingRecordWidget that had some recordInto(Record, Table) method
that would then make it trivial to cache the necessary metadata.
That would scratch your immediate itch, but it's generally not a good idea, as records are independent entities and could be placed in several
results. One example is the Result.intoGroups() method, which tends to
wrap sub-results in a Map. Other examples are methods like DSL.values(),
which have overloads accepting a Result, which might be constructed
using DSLContext.newResult().
This definitely looks easier at first than it really is. Any record can
be mapped into any other record type, including a Table reference, which
may or may not be generated. Blindly caching all these calls would blow
up cache sizes in some systems.
Of course, for large data sets, the built-in jOOQ mapping methods might just not be good enough, and you can easily do better, knowing exactly what you're doing given a particular use case.
Gotcha. Not surprising that there are 470 overloads I don't know about yet in jOOQ that make it complex to cache automatically, but is it at least feasible to opt in to caching, so that a consumer could use it when they're always doing the same lookup (as I am)? Perhaps a .cachedSomethingOrOther() that returns another Result -- that transient object would be convenient to GC if it's only used in a loop, for instance.
Lukas Eder
2018-10-24 17:33:23 UTC
Permalink
Post by Marshall Pierce
Gotcha. Not surprising that there are 470 overloads I don't know about
yet in jOOQ that make it complex to automatically cache, but is it at
least feasible to opt in to caching so that a consumer could use it when
they're always looking up the same way (as I am)? Perhaps a
.cachedSomethingOrOther() that returns another Result -- that transient
object would be convenient to GC if it's only used in a loop, for instance.
But how would that work? By the time this caching mechanism has identified
the cache key (e.g. composed of all the fields of the source record type
and all the fields of the target record type), the actual mapping would
have executed many times...
Marshall Pierce
2018-10-24 17:46:03 UTC
Permalink
Post by Marshall Pierce
Perhaps a .cachedSomethingOrOther() that returns another Result -- that transient object would be convenient to GC if it's only used in a loop, for instance.
But how would that work? By the time this caching mechanism has
identified the cache key (e.g. composed of all the fields of the source
record type and all the fields of the target record type), the actual
mapping would have executed many times...
I'm not sure I follow, but I'm also not sure my proposal was thought
through very well. :)
What about this? Instead of:

    result.map { SomeTuple(it.into(BOOKS), it.into(AUTHORS)) }

something more like:

    val metadataCache = result.helpfulCommunicativeNameForThisMethod()

    result.map {
        SomeTuple(metadataCache.into(it, BOOKS),
                  metadataCache.into(it, AUTHORS))
    }

Since the cache is explicitly re-used, presumably the work to map BOOKS columns into a BooksRecord would only be done once.

That could be cleaned up some to make misuse harder (provide the transient cache and allow iteration via some callback, perhaps, to make accidental leakage of the cache harder?), but hopefully that gets the idea across.
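For what it's worth, here is a toy model of that shape (nothing here is jOOQ API; Row stands in for Record, and the "table" is just a list of wanted field names). The point is that indexOf runs once per distinct projection instead of once per row:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class FieldIndexCache {

    // Stand-ins: a "row" is field names plus values; a "table" is the
    // subset of field names we want projected out.
    record Row(List<String> fields, List<Object> values) {}

    private final Map<List<String>, int[]> cache = new HashMap<>();

    // Resolve the wanted fields to positions once, then reuse for every row.
    Object[] into(Row row, List<String> wanted) {
        int[] positions = cache.computeIfAbsent(wanted,
                w -> w.stream().mapToInt(row.fields()::indexOf).toArray());
        Object[] out = new Object[positions.length];
        for (int i = 0; i < positions.length; i++)
            out[i] = row.values().get(positions[i]);
        return out;
    }

    public static void main(String[] args) {
        FieldIndexCache c = new FieldIndexCache();
        Row r = new Row(List.of("book.id", "author.id"), List.of(1, 10));
        System.out.println(c.into(r, List.of("author.id"))[0]); // 10
    }
}
```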
Lukas Eder
2018-10-26 11:24:15 UTC
Permalink
The discussion has somehow gotten off-list. Back to the list now...
but you’re right I’d much rather have #1 above. Also, just the
OffsetDateTime -> Instant improvement that came out of this thread will be
a nice win for *all* my queries when the next patch is released. :)
Yes, that's what I plan to work on these days
I can reproduce this. In a trivial case where I run 10000 queries producing
10000 timestamps each, a profiling session almost exclusively runs through
this parsing logic. I've created an issue to track this:
https://github.com/jOOQ/jOOQ/issues/7986
Further comments on the issue directly.
Thanks again for reporting,
Lukas
Lukas Eder
2018-10-26 15:10:04 UTC
Permalink
For the record, #7986 is now fixed for jOOQ 3.12 and will be backported. I
could achieve significant speedups when parsing OffsetDateTime manually,
rather than through the OffsetDateTime.parse() method
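The gist of the manual approach, as a hedged sketch only (this assumes a fixed "yyyy-MM-dd HH:mm:ss±HH" layout; the actual fix in #7986 handles fractional seconds and more offset shapes):

```java
import java.time.OffsetDateTime;
import java.time.ZoneOffset;

public class FastTimestampParse {

    // Slice fixed-width fields out of the string instead of going through
    // the general-purpose DateTimeFormatter machinery.
    static OffsetDateTime parse(String s) {
        int year   = Integer.parseInt(s, 0, 4, 10);
        int month  = Integer.parseInt(s, 5, 7, 10);
        int day    = Integer.parseInt(s, 8, 10, 10);
        int hour   = Integer.parseInt(s, 11, 13, 10);
        int minute = Integer.parseInt(s, 14, 16, 10);
        int second = Integer.parseInt(s, 17, 19, 10);
        int offsetHours = Integer.parseInt(s, 19, 22, 10); // e.g. "+02" or "-05"
        return OffsetDateTime.of(year, month, day, hour, minute, second, 0,
                ZoneOffset.ofHours(offsetHours));
    }

    public static void main(String[] args) {
        OffsetDateTime odt = parse("2018-10-26 15:10:04+02");
        // Same value as the ISO parser produces for the equivalent input:
        System.out.println(odt.equals(OffsetDateTime.parse("2018-10-26T15:10:04+02:00")));
    }
}
```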
Post by Lukas Eder
The discussion has somehow gotten off-list. Back to the list now...
but you’re right I’d much rather have #1 above. Also, just the
OffsetDateTime -> Instant improvement that came out of this thread will be
a nice win for *all* my queries when the next patch is released. :)
Yes, that's what I plan to work on these days
I can reproduce this. In a trivial case where I run 10000 queries producing 10000 timestamps each, a profiling session almost exclusively runs through this parsing logic. I've created an issue to track this:
https://github.com/jOOQ/jOOQ/issues/7986
Further comments on the issue directly
Thanks again for reporting,
Lukas
Lukas Eder
2018-11-01 15:19:58 UTC
Permalink
Also for the record, the JDK issue is now reported:
https://bugs.openjdk.java.net/browse/JDK-8213243
Post by Lukas Eder
For the record, #7986 is now fixed for jOOQ 3.12 and will be backported. I
could achieve significant speedups when parsing OffsetDateTime manually,
rather than through the OffsetDateTime.parse() method