Hi JC,
Thank you very much for your message. That's a very interesting question,
and I am very happy to review the existing implementation. In fact,
incidentally, I have recently noticed indexOf() to be a "hot spot" for some
queries at a customer site (where they were projecting huge results with
100s of columns and fetched 10000s of rows). The solution there was to move
more logic into SQL and reduce the projection. Not all columns were needed,
and by using aggregation and window functions, not all rows were needed
either.
Caching is very hard. While a HashMap provides O(1) access complexity (O(N)
in rare cases, when hash codes are ill chosen), it still has quite some
high "setup" cost:
- hash code calculation
- equals calculation
- several heap jumps following the internal data structures to the Entry
In fact, I frequently encounter ill chosen HashMaps as the main bottleneck
of some algorithm, such as [1] and [2]
Iterating an array is O(N), but individual array access is much cheaper.
Also, HashMaps consume quite a bit more memory than arrays. Both are O(N),
but a HashMap stores several auxiliary values for each key / value. Notice,
we don't gain anything by using HashSet, because it's just delegating
everything to a HashMap
So, by hypothesis, arrays tend to be better for small results (few columns)
whereas HashMaps tend to be better for larger results (many columns).
One thing that's definitely not optimal is how field equality is currently
resolved through the various methods. You may have noticed that
Fields.indexOf(Field) [3] first delegates to Fields.fields(Field) [4],
which does more linear searching. Both are optimised for identity
comparisons (which usually applies when using generated code), but we
should definitely be able to avoid a few loops by providing more
specialised implementations on a per-method case. I have created an issue
on GitHub for this [5]
Notice that we cannot use JDK's HashMap for such a cache, because while ...
(field(name("a", "b", "c")) == field(name("c"))) == false
... we can definitely retrieve a field(name("a", "b", "c)) from a record by
querying for field(name("c")) for a variety of convenience reasons. So,
what we would need is a HashMap with an external hashCode() and equals()
provider, such as the commons collections AbstractHashedMap:
https://stackoverflow.com/a/20030782/521799
I must admit, I have never benchmarked that AbstractHashedMap option, as it
hasn't occurred to me as an option until right now. I will definitely
benchmark this and post results to GitHub [5] and this list.
Regarding your questions and assuming a cache would actually provide
significant help:
1. Should it be used for every type of Result? I don't think so. For
Post by JC Mannexample, I don't think it makes sense to apply an optimization of a Result
which contains only 1 column and 1 row.
Exactly. We would have to find the sweet spot. Let's say (random guess)
with 16 columns or more. There's no distinction between 1 row results and
100000 row results. The number of columns is what makes a difference
because...
2. Where should this cache live? It would make sense to store it on the
Post by JC MannResult so it is optimized for each row, but is there a better place? What
if it were stored with the query?
It would live in org.jooq.impl.Fields, which is already a shared instance
for queries, results, and records. Its main purpose is to abstract over any
kind of org.jooq.RecordType access in all of jOOQ's internals.
Notice that in the first versions of jOOQ, a Record was really a Map (i.e.
it contained a HashMap). That was a major source of allocation trouble,
which is why we now have that array inside of the record.
Thanks again for bringing this up. I will definitely follow up in this
area. If you find anything additional that is noteworthy, please do post it
here. I'm very happy to discuss this.
[1]: https://bugs.openjdk.java.net/browse/JDK-8213243
[2]: https://bugs.openjdk.java.net/browse/JDK-8213280
[3]:
https://github.com/jOOQ/jOOQ/blob/version-3.11.7/jOOQ/src/main/java/org/jooq/impl/Fields.java#L267
[4]:
https://github.com/jOOQ/jOOQ/blob/version-3.11.7/jOOQ/src/main/java/org/jooq/impl/Fields.java#L86
[5]: https://github.com/jOOQ/jOOQ/issues/8040
--
You received this message because you are subscribed to the Google Groups "jOOQ User Group" group.
To unsubscribe from this group and stop receiving emails from it, send an email to jooq-user+***@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.