Handling Large Collection Data – Hibernate Pagination, ScrollableResults and Native SQL

In Hibernate it is common to fetch a collection of data like this:

List l = session.createQuery(“select * from … “).list()

However this solution may not scale well if the query returns large amount of data(such as in a batch job etc):

-  Consuming a lot of memory

-  Possible slow speed as large data being moved and processed between middleware and database

Here we discuss a number of options to improve the performance of handling large data set.

(1). Hibernate Pagination

The idea here is to divide the large result set into a number of pages and fetch only one page at a time.  You need to specify the start position and the page size in the query parameter. If not in original query yet, you also need to add an “order by” clause to ensure the result data set is sorted and kept in a consistent view of data between pages.

The code itself is very simple:

Query q = session.createQuery(select * from ... order by ...");
q.setFirstResult(startPosition);
q.setMaxResults(pageSize); List l = q.list();

This solution resolves the memory issue nicely and works with most major databases via different dialects offered by Hiberante. For instance in a search-like application, this works well as an user may click through different search result pages.

However if an application does need to process all the result set data this approach may significantly increase query execution time as previously a single query becomes a large number of queries. For example, if the original query returns 100k rows and the page size is 100, the total queries to be run in the pagination would be 1000. The “order by” clause may also impact the query time as well. You also need to figure out total number of pages in which another query “select count(*) from … ” is added.

(2) Hibernate ScrollableResults

This approach loops through the entire results set in a stream fashion, thus will not incur significant slow-down in query execution time.

      Query q = session.createQuery("select * from ...");
      ScrollableResults results = q.scroll();
       while (results.next() )
       {
         Long id = results.getLong(0); ...
      }
      results.close();

Moreover if session cache is enabled, you need to add explicit code to clear the session cache, such as a code snippet here to clear cache every 100 rows:

 readCount++;
if ( readCount % 100 == 0) {
session.flush();
session.clear();
}

Hibernate has different level of caching and be careful about it. If session cache is not cleared periodically and the results set data is large, you will soon get out-of-memory error.

This approach has some drawbacks as well: business logic may be moved to dao tier; the connection remains open during entire operation; some databases may not support this feature(need to check the related jdbc driver)


(3) Native SQL

If operation speed is a major issue and none of above solutions is fast enough, you may consider to move the entire operation to database side via native jdbc and/or stored procedure. This approach can leverage database specific features to improve performance, and may also reduce unnecessary data transportation between middleware and database. In some real applications, we have some very I/O intensive operations and execution time is reduced significantly(3~5 times) via direct jdbc sql.

However this approach makes your application database specific and actual performance improvement will be based on application’s specific behaviors. It also moves the cpu and I/O load to database server and all needs to be considered before hand.

Summary

When dealing with large data, we list three solutions to improve performance.


Solution Memory usage Speed Portablity across databases Simplicity
Hibernate pagination Bound Slow Most Simplest
Hibernate ScrollableResults Conditional bound (explicit clear session cache needed) Fair Somewhat(depends on driver) More work(biz logic moved to dao tier)
Nativce SQL/Stored Procedure Bound Fast No Most amount of work

Each solution has its advantages and disadvantages. Based on each application’s specific requirements, you may choose a proper solution here.

     

About these ads

Tags:

3 Responses to “Handling Large Collection Data – Hibernate Pagination, ScrollableResults and Native SQL”

  1. Shashank Says:

    How about using 1 and 3 together ie pagination and native sql query together should give you added performance improvement?

  2. herbertwu Says:

    Yes. It may work in some cases. For example in a search application, most time users only view first or second pages and native sql can speed up a single page loading speed.
    However if your application needs to loop through all the result pages, option 1 may be still too slow as number of pages could be very large and each page requires a single native sql execution.

  3. Robin Says:

    the native sql can specify a range (using between) on a indexed column (i.e., id) to be used for paging. so if the loop through operation requires to process each record with certain cpu time, this ranged native sql solution may be beneficial for a multi-threaded solution.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s


Follow

Get every new post delivered to your Inbox.

%d bloggers like this: