No Extra MultipleBagFetchException Due to Multiset Nested Collections – Java, SQL and jOOQ.

[ad_1]

I’ve not too long ago stumbled upon this fascinating Stack Overflow query about Hibernate’s widespread MultipleBagFetchException. The query is tremendous widespread, and the solutions are a lot. The assorted limitations are mentioned all through the query, all of it boils right down to a easy truth:

Joins are the flawed instrument to nest collections.

Given a schema just like the Sakila database:

There are numerous to many relationships between:

  • ACTOR and FILM
  • FILM and CATEGORY

Nothing extraordinary. The issue is that when utilizing an ORM, given the character of the O (Object, Object Orientation), we need to characterize this knowledge in a hierarchy, as a graph, or at the very least a tree. The identical holds true after we need to characterize this in JSON or XML.

For instance, in Java, the next DTOs are a pure illustration of the above schema:

document Actor(
    String firstName, 
    String lastName
) {}

document Class(
    String identify
) {}

document Movie(
    String title,
    Listing<Actor> actors,
    Listing<Class> classes
) {}

In JSON, the information would possibly look one thing like this:

[
  {
    "title": "ACADEMY DINOSAUR",
    "actors": [
      {
        "first_name": "PENELOPE",
        "last_name": "GUINESS"
      },
      {
        "first_name": "CHRISTIAN",
        "last_name": "GABLE"
      },
      {
        "first_name": "LUCILLE",
        "last_name": "TRACY"
      },
      {
        "first_name": "SANDRA",
        "last_name": "PECK"
      },

      ...
    ],
    "classes": [
      { "name": "Documentary" }
    ]
  },
  {
    "title": "ACE GOLDFINGER",
    "actors": [
      {
        "first_name": "BOB",
        "last_name": "FAWCETT"
      },
  ...

Emulated nesting using joins

But the problem in Hibernate and in SQL in general is the fact that joins create cartesian products. It’s not actually a problem. It’s a feature of SQL and relational algebra. We have an entire blog post about how our industry has been teaching joins wrong, using Venn Diagrams.

Joins are filtered cartesian products. Here’s an example of a cartesian product (without filter):

venn-cross-product

Now, if you want to produce the previous nested collection representation using joins only, then you’ll probably write something like this:

SELECT *
FROM film AS f
  JOIN film_actor AS fa USING (film_id)
    JOIN actor AS a USING (actor_id)
  JOIN film_category AS fc USING (film_id)
    JOIN category AS c USING (category_id)

I’ve indented the joins deliberately to illustrate the tree structure of this denormalisation. For each film, we join:

  • Many actors (e.g. M)
  • Many categories (e.g. N)

This means, that we duplicate the film M * N times due to the nature of joins being cartesian products. Not just that, but worse, we also duplicate:

  • Each actor N times (once per category)
  • Each category M times (once per actor)

Eventually, this might even lead to wrong results, e.g. when aggregating, as some combinations shouldn’t be combined.

Apart from the potential correctness problem, this is a very big performance problem. As the ubiquitous Vlad has explained in his answer, JOIN FETCH syntax is being suggested along with DISTINCT and multiple queries as a workaround. You then have to re-assemble the results manually (I stand corrected, Hibernate takes care of re-assembling, see Vlad’s comment below) and take proper care about eager and lazy loading (while you generally have to be careful with these, they don’t apply here, see again Vlad’s comment below). Quite the chore if you ask me!

This is my most favourite Google search on the subject matter:

To be fair, the chore used to be also present with jOOQ, in the past – at least you couldn’t shoot yourself in the foot with accidentally loading all the entire database.

Actual nesting

Ever since ORDBMS were introduced (e.g. Informix, Oracle, PostgreSQL), and the more popular SQL/XML and SQL/JSON extensions were added, it is possible to perform actual nesting directly in SQL. I’ve blogged about this many times now on this blog:

The correct way to nest collections is with SQL via one of the above 3 serialisation formats (native, JSON, XML).

With the above techniques, you can nest your data to any nested DTO structure in Java, or to any nested JSON format. This is possible with native SQL or with jOOQ. It might also be possible with Hibernate in the future, or with other ORMs that follow jOOQ’s lead in this area.

Given the popularity of this Stack Overflow question, it is hard to ignore how important of a problem the nesting of multiple to-many relationships is, and how both SQL (the language) and ORMs have ignored this problem for so long, offering only quirky workarounds that leave users to implement their serialisation manually, when jOOQ has shown how simple and transparent it could be.

Try jOOQ’s MULTISET operator today, no need to wait. It’s as simple as this:

List<Film> result =
dsl.select(
      FILM.TITLE,
      multiset(
        select(
          FILM_ACTOR.actor().FIRST_NAME, 
          FILM_ACTOR.actor().LAST_NAME)
        .from(FILM_ACTOR)
        .where(FILM_ACTOR.FILM_ID.eq(FILM.FILM_ID))
      ).as("actors").convertFrom(r -> r.map(mapping(Actor::new))),
      multiset(
        select(FILM_CATEGORY.category().NAME)
        .from(FILM_CATEGORY)
        .where(FILM_CATEGORY.FILM_ID.eq(FILM.FILM_ID))
      ).as("categories").convertFrom(r -> r.map(Record1::value1))
   )
   .from(FILM)
   .orderBy(FILM.TITLE)
   .fetch(mapping(Film::new));

And the above query is type safe! As soon as you modify your DTO, the query no longer compiles. Not just that! jOOQ also has a parser, so you can pretend your favourite SQL dialect already has MULTISET support today. Try this query here: https://www.jooq.org/translate/

SELECT
  f.title,
  MULTISET(
    SELECT a.first_name, a.last_name
    FROM film_actor AS fa
    JOIN actor AS a USING (actor_id)
    WHERE fa.film_id = f.film_id
  ) AS actors,
  MULTISET(
    SELECT c.name
    FROM film_category AS fc
    JOIN category AS c USING (category_id)
    WHERE fc.film_id = f.film_id
  ) AS categories
FROM film AS f
ORDER BY f.title

jOOQ’s translator will translate this to the following on PostgreSQL:

SELECT
  f.title,
  (
    SELECT coalesce(
      jsonb_agg(jsonb_build_array("v0", "v1")),
      jsonb_build_array()
    )
    FROM (
      SELECT
        a.first_name AS "v0",
        a.last_name AS "v1"
      FROM film_actor AS fa
        JOIN actor AS a
          USING (actor_id)
      WHERE fa.film_id = f.film_id
    ) AS "t"
  ) AS actors,
  (
    SELECT coalesce(
      jsonb_agg(jsonb_build_array("v0")),
      jsonb_build_array()
    )
    FROM (
      SELECT c.name AS "v0"
      FROM film_category AS fc
        JOIN category AS c
          USING (category_id)
      WHERE fc.film_id = f.film_id
    ) AS "t"
  ) AS categories
FROM film AS f
ORDER BY f.title

[ad_2]

Leave a Reply

Your email address will not be published. Required fields are marked *