Quantcast
Channel: Sam's Code » nhibernate
Viewing all articles
Browse latest Browse all 4

The Entity Framework (v1 and v4) Deal Breaker: TPT Inheritance

$
0
0
9/10/2010 Update #3: It appears that MS has deleted the bug on Microsoft Connect. Here is a cached version indexed from Google where you can see it has 34 upvotes, and 22 users saying they can reproduce. I blogged about this in January 2010, and the bug was opened on 3/17/2010. MS didn’t respond until 6/29/2010, when they said:
Sorry for not responding here. We are aware of performance issues with TPT hierarchies. We are currently investigating solutions and expect to make improvements in this area in a future release. This is definitely at the top of our query/perf optimizations list. I know this isn’t a great answer, and is not something that helps you today. Will keep you posted as we find out more, however. Hope this helps. Entity Framework Team
Apparently “keep you posted” means “we’ll delete your posts”. I don’t want to jump to any conclusions, but it is certainly very odd. Especially when you consider that they have mentioned the problem on their own blog (though they don’t nearly describe how bad it actually is).


3/17/2010 Update #2: I have submitted this problem as a bug on Microsoft Connect. If you’ve been bitten by this bug, vote it up (click the “I can too” link indicating that you can reproduce the bug), and leave some helpful comments.


3/17/2010 Update: I’ve put together a quick sample program that illustrates this problem, if you want to try it for yourself.
Download EFInheritanceTest.zip.
The data project has the included SQL scripts to create a simple EFTest database, with a base class table, some sub class tables, and a bit of dummy data.


I’ll keep the intro short and sweet. Everyone says you’ll eventually get burned by Microsoft data access strategies, because they eventually get abandoned. Well, my turn finally came. And it’s not that Entity Framework is being abandoned, it’s just that it DOESN’T WORK.
</intro>

Regarding inheritance mapping, EF advertises that it offers you 3 solutions:
  • TPT: Table per type
  • TPH: Table per hierarchy
  • TPC: Table per concrete class
I’ll be focusing on TPT: Table per type.

Let’s say I have a line of business application where users answer a bunch of Yes/No questions, and depending on their answers, fill out a series of forms, each form having a unique set of fields that we want to capture end-user data for.

Q: Did you open a bank account?
Yes
Name of bank:
Date account opened:
Purpose of account:
etc.

EVERY form has a common set of fields as well: The end-user that the particular instance belongs to, the date it was created, whatever. Now, when I have a User object, and I want to see all the forms he’s filled out, I don’t want to have to access 20 different collections of the User object, one collection for each form type:
User user = User.GetByKey(20);
user.NewAccountForms;
user.TaxForms;
user.ReplacedLimbsForms;
user.CentralServicesForms;

I want an IList<FormBase> or something similar. This means I need to use inheritance, and each form needs to be derived from my FormBase class (actually named FormInstance in this case).

If you need this type of design, or think that your software at some point in the future will require this type of design, DO NOT use the Entity Framework. The Table-Per-Type inheritance is 100% unusable. If you feel that one of the other forms of inheritance suits you, well, you may be crazy, but just the same….tread lightly. (And please don’t have 50 different discriminator values in a table that’s 200 columns wide…the SQL gods will strike you down).

So what’s wrong with EF’s TPT inheritance? The SQL that EF generates. With some simple testing (2 or 3 subclasses) everything may appear to be fine. In fact, you may even be impressed that EF manages to fetch all the data you need in a single trip, indeed, in a single query. Wow, their SQL generation must be top notch… But once you start adding some more subclasses, surprise! Your application grinds to a halt. So how bad is it? I decided to find out definitively.

Results

The setup is very basic. Here’s the Linq query I’m executing against my model:
	var q =
		from insts in context.FormInstances
		select insts;
	
	List<FormInstances> instances = q.ToList();
Very basic. Give me the FormInstances. FormInstance is the base class. I started with just a single derived class, ran my test 500 times, averaging the time it took to execute my simple query. Then I added a 2nd derived class, a 3rd one, etc etc. The base class has less than 10 fields. Each derived class (to keep things consistent) is exactly the same, with 3 simple fields. Also, the tables are empty. Yes, empty. No data! So the performance numbers here strictly represent the time it takes EF to generate its abhorrent SQL, and the amount of time it takes SQL Server to grind it out.

Most of these are averaged over 500 executions. Some of the later ones were fewer, because it started to take over an hour to execute.
Num Derived Classes Avg ms for query generation and execution Lines of SQL Generated
1 16 80
2 35 106
3 67 152
4 119 214
5 203 292
6 340 386
7 523 496
8 784 622
9 1137 764
10 1608 922
11 2198 1096
12 2958 1286
13 3900 1492
14 4828 1686
15 5962 1892
20 17412 3302
25 41206 5112
30 108807 7862
So what happens when your application has 30 subclasses? It takes 108 seconds to generate and execute the SQL to retrieve the records in them! That’s 3.6 seconds PER FORM! And that’s when the query returns ZERO RECORDS, because the tables are EMPTY. Why does it take so long? Well, look at the 7,862 lines of SQL it generates, to start with (I won’t post it here; I’ll spare you that).
Here’s what it looks like graphed:

Ouch! That sure doesn’t look linear folks… And guess what, my “I got burned” experience doesn’t have 30 form tables. It has 89. Based on the trend outlined above, it would take 97 minutes to generate and execute the SQL to select the data from those 89 tables. Funny, because 30 SQL statements querying those tables, even with WHERE fk IN(SELECT id FROM BaseTable) clauses on each one only take 132ms on the same computer…And it returns the same data.

Pragmatists vs Idealists

Joel Spolsky has some great words (scroll down to “The Two Forces at Microsoft”) regarding two developer camps at Microsoft; the Pragmatists, and the Idealists. Windows XP was a product of the Pragmatists; Vista, of the Idealists. Enough said. While the SQL generated by EF in cases such as these may call to the inner relational mathematician idealist in us, the result is impossible to read or easily understand, takes a long time to be generated, and even longer to execute. In other words, it’s 100% unusable.

Now, as a matter of research, I have to create this same test in NHibernate, to see what the generated SQL looks like, and to see how well it performs. I’ll update this post when I have that data. Until then, if you decide to use EF’s inheritance “features”, here is a list of movies that are all 97 minutes long, so that your end users will have something to do while your site grates off your SQL Server’s face from the comfort of their browsers.







Viewing all articles
Browse latest Browse all 4

Trending Articles