Best SQL approach for populating composite Business Objects

Well, I didn't get much response out of CFCDev, so I figured I would move the discussion over to my blog.
(So apologies to anyone this sounds really familiar to.)

I always wonder what the best way is to get the results from a database when dealing with composite Business Objects.

I was reading Matt Woodward's blog post on DAOs and composition,

where he talks about composition. To quote:

We're using composition in our beans, so why not use composition of
sorts (this isn't strictly composition, but bear with me …) in our
DAO as well? When we instantiate the Person DAO, why not just
instantiate an Address DAO inside the Person DAO so we can call things
that way?

Now, in my mind, there are two options for getting the PersonBO information and the AddressBO information (I'll sketch both out below).

1) Separate SQL queries for both BOs – the AddressDAO gets the results for an Address, and the PersonDAO gets the results for the Person BO – that's 2 query hits on the db.
Pros: It's clean, concise and really reusable.
Cons: For complex objects, that's a lot of database hits – is that a good idea?

2) JOIN between tables – a Gateway object (maybe?) gets a joined result by pulling the relevant data from both the Person table and the Address table, and the DAOs intelligently create the BOs they need from that single query's results.
Pros: It's a lot more efficient than the above.
Cons: It doesn't exactly promote code reuse.
It can also get tricky, as a JOIN can return duplicate data, and you have to avoid creating the same BO twice or more (unless you want to join up the data using UNION or something similar).
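
To make that a bit more concrete, here's roughly what I mean by the two options. This is just a sketch; the table and column names (and the variables.dsn datasource) are made up, and I'm assuming the queries live inside DAO/gateway methods that take a personID argument.

    <!--- Option (1): each DAO runs its own single-table query --->
    <cfquery name="qPerson" datasource="#variables.dsn#">
        SELECT personID, firstName, lastName
        FROM person
        WHERE personID = <cfqueryparam value="#arguments.personID#" cfsqltype="cf_sql_integer">
    </cfquery>

    <cfquery name="qAddress" datasource="#variables.dsn#">
        SELECT addressID, street, city, postcode
        FROM address
        WHERE personID = <cfqueryparam value="#arguments.personID#" cfsqltype="cf_sql_integer">
    </cfquery>

    <!--- Option (2): one JOIN, and the DAOs pull the columns they need out of the single result --->
    <cfquery name="qPersonAndAddress" datasource="#variables.dsn#">
        SELECT p.personID, p.firstName, p.lastName,
               a.addressID, a.street, a.city, a.postcode
        FROM person p
            INNER JOIN address a ON a.personID = p.personID
        WHERE p.personID = <cfqueryparam value="#arguments.personID#" cfsqltype="cf_sql_integer">
    </cfquery>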

Personally I've been running with option (2) for a while, simply from a database performance perspective, but it does leave me with a lot of pretty ugly code, which makes me lean more these days towards option (1).

But sometimes I write BOs that are composites of more than 20 objects (and yes, all are required), so querying the database with 20 small hits would, I think, take a lot longer.

So, which way do you do it, and why?

Comments

  • Bill Rawlinson | January 27, 2005

    run with the join. 20 hits on the database is A LOT considering that will probably happen a lot of the time. The project I am working on now does one big join which, sadly, is different for each database I support due to some outer joins.

    I'm a big proponent of easily managed and maintainable code, but I'm an even bigger fan of performance in this scenario. Plus, I think having one query do all this work makes a lot more sense than having twenty, and then having some sort of composition logic go on to bring it all together.

    In the end it's all a trade-off of course, but don't break up your query. In fact, if I know I ALWAYS need to get data from multiple tables for one full object definition, I will typically go the route of one query with JOINs instead of multiple trips to the database.

  • Mark | January 27, 2005

    Bill –

    Where would you put your query?
    a) PersonDAO
    b) AddressDAO (which makes no sense to me)
    c) PersonGateway

    I would lean more towards putting it into the PersonGateway, but really you could put it into the PersonDAO if you don't mind queries in your DAO.

  • Bill Rawlinson | January 27, 2005

    personally, I would put it in the PersonDAO. The Address information is, as you stated, a required part of the Person. So, as long as every time I get all of the Person data from the database I also need to get the address, I would have it in the DAO.

    However, if most of the time when I get the person I don't need extended info like the Address, then I would probably have two queries in the DAO: read and readWithAddress (or extendedRead, or something). But I would still put both in the PersonDAO.
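
    Something along these lines is what I have in mind; just a rough sketch with made-up table names, handing the raw queries back so the BOs can be built from them:

        <!--- PersonDAO.cfc sketch: a plain read plus an extended read with the address JOIN --->
        <cfcomponent displayname="PersonDAO">

            <cffunction name="init" access="public" returntype="PersonDAO" output="false">
                <cfargument name="dsn" type="string" required="true">
                <cfset variables.dsn = arguments.dsn>
                <cfreturn this>
            </cffunction>

            <cffunction name="read" access="public" returntype="query" output="false">
                <cfargument name="personID" type="numeric" required="true">
                <cfset var qPerson = "">
                <cfquery name="qPerson" datasource="#variables.dsn#">
                    SELECT *
                    FROM person
                    WHERE personID = <cfqueryparam value="#arguments.personID#" cfsqltype="cf_sql_integer">
                </cfquery>
                <cfreturn qPerson>
            </cffunction>

            <cffunction name="readWithAddress" access="public" returntype="query" output="false">
                <cfargument name="personID" type="numeric" required="true">
                <cfset var qPersonAddress = "">
                <cfquery name="qPersonAddress" datasource="#variables.dsn#">
                    SELECT p.*, a.*
                    FROM person p
                        INNER JOIN address a ON a.personID = p.personID
                    WHERE p.personID = <cfqueryparam value="#arguments.personID#" cfsqltype="cf_sql_integer">
                </cfquery>
                <cfreturn qPersonAddress>
            </cffunction>

        </cfcomponent>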

  • Greg | January 28, 2005

    As the complexity of your domain model increases, it will get more and more difficult to join all of your SQL into one query. Plus, you may end up including data that you will not need (like your read and readWithAddress sample).

    This is a place where the Proxy pattern comes in handy. When your PersonDAO creates a Person object, instead of populating it with an Address it would populate it with an AddressProxy. From the outside the AddressProxy looks exactly like an Address, but it hasn't been populated with any data. All it contains is an instance of AddressDAO and the primary key fields. The first time you access one of the methods on the AddressProxy object, it uses the AddressDAO to populate itself with the real data.

    So in this case you would still execute two SQL statements, but the second one would not execute until you actually needed it. Overkill for this situation, but handy when the domain model gets big (where loading one object could pull back a large chunk of the database).
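
    Here's a bare-bones sketch of what I mean. The getter names are made up, Address is the real bean the proxy extends, and I'm assuming a (made-up) addressDAO.read(bean, id) method that fills the bean in:

        <!--- AddressProxy.cfc: looks like an Address, but loads itself lazily via the AddressDAO --->
        <cfcomponent extends="Address">

            <cffunction name="init" access="public" returntype="AddressProxy" output="false">
                <cfargument name="addressID" type="numeric" required="true">
                <cfargument name="addressDAO" type="AddressDAO" required="true">
                <cfset variables.addressID = arguments.addressID>
                <cfset variables.addressDAO = arguments.addressDAO>
                <cfset variables.isLoaded = false>
                <cfreturn this>
            </cffunction>

            <!--- every getter loads the real data the first time it is called --->
            <cffunction name="getStreet" access="public" returntype="string" output="false">
                <cfset loadIfNeeded()>
                <cfreturn super.getStreet()>
            </cffunction>

            <cffunction name="getCity" access="public" returntype="string" output="false">
                <cfset loadIfNeeded()>
                <cfreturn super.getCity()>
            </cffunction>

            <cffunction name="loadIfNeeded" access="private" returntype="void" output="false">
                <cfif NOT variables.isLoaded>
                    <!--- assumes the DAO has a read(bean, id) method that sets this bean's properties --->
                    <cfset variables.addressDAO.read(this, variables.addressID)>
                    <cfset variables.isLoaded = true>
                </cfif>
            </cffunction>

        </cfcomponent>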

  • Bill Rawlinson | January 28, 2005

    the proxy pattern is a pretty cool idea and I hadn’t encountered that one yet.

    thanks for the heads up Greg. Very cool.

  • Mike Rankin | January 28, 2005

    How would you deal with a flexible number of addresses per person, i.e. home address, work address, shipping address, billing address, etc.?

  • Michael Dawson | March 29, 2005

    I, too, am interested in hearing how you handle 1:M issues.

  • Bill Rawlinson | March 30, 2005

    I would go with the proxy idea Greg mentioned for the 1:M scenario, especially if you don't normally need all of that data.

    However, if you know all of that data is required for every user in the system, I would still stick with one big query.
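
    For the 1:M case with one big query, whatever builds the objects just has to watch for the repeating person columns. A rough sketch (the bean methods are made up, and qPersonAndAddress is the joined query sketched in the post above):

        <!--- build one Person plus an array of Address objects from the joined person/address query --->
        <cfset person = createObject("component", "Person").init()>
        <cfset addresses = arrayNew(1)>

        <cfloop query="qPersonAndAddress">
            <cfif qPersonAndAddress.currentRow EQ 1>
                <!--- the person columns repeat on every row of the join, so only read them once --->
                <cfset person.setFirstName(qPersonAndAddress.firstName)>
                <cfset person.setLastName(qPersonAndAddress.lastName)>
            </cfif>
            <cfset address = createObject("component", "Address").init()>
            <cfset address.setStreet(qPersonAndAddress.street)>
            <cfset address.setCity(qPersonAndAddress.city)>
            <cfset arrayAppend(addresses, address)>
        </cfloop>

        <cfset person.setAddresses(addresses)>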

  • enigment | May 4, 2005

    One big query seems to have several major issues:

    – Bad match for complex objects. For instance, take an object composed of user data, plus variable numbers of address, phone, email and website objects. That’s going to be very awkward as one large query.

    – In a simple object-and-one-set-of-children scenario, where a join could make some sense, the "one" object's data is returned redundantly with each "many" object row.

    – Even if a join can read a complex object, it can't write all of its tables at once in any SQL implementation I know of. That leads to an asymmetry, where there's one monolithic read, but a compound write composed of individual writes for the component objects.
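
    To illustrate that last point, the write side ends up looking something like this regardless of how you read; a rough sketch with made-up columns and getters, assuming "person" is an already-populated Person bean and variables.dsn is the datasource:

        <!--- even with one monolithic read, saving the composite means one statement per table --->
        <cftransaction>
            <cfquery datasource="#variables.dsn#">
                UPDATE person
                SET firstName = <cfqueryparam value="#person.getFirstName()#" cfsqltype="cf_sql_varchar">,
                    lastName  = <cfqueryparam value="#person.getLastName()#" cfsqltype="cf_sql_varchar">
                WHERE personID = <cfqueryparam value="#person.getPersonID()#" cfsqltype="cf_sql_integer">
            </cfquery>

            <cfset addresses = person.getAddresses()>
            <cfloop from="1" to="#arrayLen(addresses)#" index="i">
                <cfset address = addresses[i]>
                <cfquery datasource="#variables.dsn#">
                    UPDATE address
                    SET street = <cfqueryparam value="#address.getStreet()#" cfsqltype="cf_sql_varchar">,
                        city   = <cfqueryparam value="#address.getCity()#" cfsqltype="cf_sql_varchar">
                    WHERE addressID = <cfqueryparam value="#address.getAddressID()#" cfsqltype="cf_sql_integer">
                </cfquery>
            </cfloop>
        </cftransaction>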