10/4/10

10+ ways to screw up your database design

Database developers often fall into certain design traps — and it’s generally the users who pay the price. See how you can improve your design process and build a more flexible, reliable, and efficient database.

Database developers are people too — you make mistakes just like everyone else. The problem is, users are the ones who pay for those mistakes when they have to work with slow, clunky applications. If you’re part of your company’s IT team, a mistake could cost you your job. If you’re a freelance developer, a bad database usually means a lost client — and even more lost business, if word gets around. Here are some common design pitfalls to watch out for.

1: The data’s unimportant; it’s the architecture that matters

It doesn’t matter how well your code sings if you don’t know the data. You want the two to work in harmony, and that means spending some time with the people who use and manipulate all that data. This is probably the most important rule: Before you do anything, you absolutely must get intimate with the data. Without that firsthand knowledge, you might make errors in judgment that have far-reaching consequences, like dragging the whole application to a halt.

A nonchalant attitude toward the data isn’t a sign of laziness. It’s the mistaken perception that anything that doesn’t work quite right early on can be fixed later. I just don’t agree. Doing it right from the bottom up produces a foundation that can grow and accommodate change quickly. Without that foundation, any database is just a few Band-Aids away from disaster.

2: I can do anything with a little code

Some developers are so skilled that they can make just about anything happen with a bit of code. But you can take a good thing too far. One of my biggest complaints about the developer psyche is the urge to solve everything with code, even when a system feature exists to handle the need. They claim it’s just easier — easier for them, maybe, but not necessarily for those maintaining the database. My recommendation is to use the built-in features when they’re available, unless you don’t know the RDBMS well enough to use them. If that’s the case, see #3.
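For example, a rule like "every order starts out open, with today’s date" doesn’t need a line of application code; the table definition itself can enforce it. Here’s a minimal sketch against a hypothetical Orders table, in generic SQL (default and check syntax varies slightly by product):

    -- Let the engine enforce the rules instead of writing application code.
    -- Hypothetical Orders table; generic SQL syntax.
    CREATE TABLE Orders (
        OrderID   INTEGER PRIMARY KEY,
        OrderDate DATE NOT NULL DEFAULT CURRENT_DATE,       -- built-in default
        Status    VARCHAR(10) NOT NULL DEFAULT 'Open',
        CONSTRAINT chk_order_status
            CHECK (Status IN ('Open', 'Shipped', 'Closed')) -- built-in validation
    );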

3: I can use whatever RDBMS you have

This brings me to the next point: developers who think the system is unimportant because their coding ability is the only magic they need. Wrong! Unless your hands are tied, choose the best system for the job. You’ll save your client time and money and build a reputation as an honest and conscientious developer. You might not have a choice, of course. If you find yourself stuck with a system you’re not familiar with or that’s just not right for the job, you might want to excuse yourself from the project. You’re going to take the fall for that decision eventually, even if you didn’t make it.

4: That doesn’t need an index

Few things affect performance like failing to apply an index or applying an index incorrectly. It isn’t rocket science, and there are guidelines to help, but many developers still avoid the task altogether. Without proper indexing, your database will eventually slow down and irritate users. Perhaps the only thing that causes as much trouble as no indexing is too much indexing.

There’s a lot of free information on indexing, but here are my simplest recommendations on the subject:

  • Learn about the index construction the RDBMS uses.
  • Learn when to apply those indexes for optimum coverage.
  • Run the RDBMS’s index analysis or tuning tool for hints.
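As a concrete starting point, index the columns you join and search on. A sketch against a hypothetical Customers/Orders schema, in standard SQL:

    -- Index the foreign key used in joins and the column used in lookups.
    -- Hypothetical schema; substitute your own table and column names.
    CREATE INDEX idx_orders_customer ON Orders (CustomerID);  -- speeds joins to Customers
    CREATE INDEX idx_customers_name  ON Customers (LastName); -- speeds name searches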

5: This database doesn’t require referential integrity

Enforcing referential integrity protects the validity of your data by eliminating orphans (foreign keys that have no related primary key entity). For instance, in a sales database, you might have an ordered item that doesn’t point to a customer — not a good idea. If your RDBMS supports referential integrity, I recommend that you use it.
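Here’s what that looks like in practice: a minimal sketch with hypothetical Customers and Orders tables, in standard SQL.

    -- The foreign key constraint makes orphaned orders impossible.
    -- Hypothetical schema; the names are illustrative.
    CREATE TABLE Customers (
        CustomerID INTEGER PRIMARY KEY,
        Name       VARCHAR(50) NOT NULL
    );

    CREATE TABLE Orders (
        OrderID    INTEGER PRIMARY KEY,
        CustomerID INTEGER NOT NULL,
        CONSTRAINT fk_orders_customers
            FOREIGN KEY (CustomerID) REFERENCES Customers (CustomerID)
    );

    -- An order pointing to a nonexistent customer is now rejected, and a
    -- customer with outstanding orders can't be deleted out from under them.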

6: Natural keys are best

Relational database theory relies on keys, primary and foreign. Natural keys are based on the data itself, which of course has meaning within the context of the database’s purpose. Natural keys are obsolete now that systems can generate sequential values, known as surrogate keys. A surrogate has no meaning beyond identifying the entity; it’s usually an auto-incrementing data type.

Whether natural or surrogate keys are superior is a hotly debated topic. Just bring it up in your favorite development list or forum, sit back, and watch the show. Here’s the nitty-gritty, though:

  • Natural keys can be unwieldy and awkward to maintain. It might take several columns to create a unique key for each record. It’s doable, but do you really want to accommodate that kind of structure if you don’t have to?
  • Primary keys are supposed to be stable; nothing about data is stable. It changes all the time. In contrast, there’s no reason to ever change a surrogate key. You might delete one, but if you have to change a surrogate key, something’s wrong with your design.

The biggest argument for natural keys is one of association. Proponents insist that you need to be able to associate the key with the actual record. Why? Keys are used by the RDBMS, not users. The other argument you’ll hear most often is that surrogate keys allow duplicate records. My response is to apply unique indexes appropriately to prevent duplicates, as the sketch below shows.
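In other words, let a surrogate be the primary key and let a unique constraint enforce the natural rule. A sketch with a hypothetical Employees table (identity syntax varies by product; shown here in standard SQL):

    -- Surrogate primary key plus a unique constraint on the natural column:
    -- the key never has to change, and duplicate records are still impossible.
    CREATE TABLE Employees (
        EmployeeID INTEGER GENERATED ALWAYS AS IDENTITY PRIMARY KEY, -- surrogate
        SSN        CHAR(11) NOT NULL,
        LastName   VARCHAR(50) NOT NULL,
        CONSTRAINT uq_employees_ssn UNIQUE (SSN) -- the natural uniqueness rule
    );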

I recommend surrogate keys — always. That’s an invitation to hate mail, but it’s my recommendation just the same. I can think of no circumstance where a natural key would be preferable to a surrogate.

7: Normalization is a waste of time

Just writing that title hurts. Unfortunately, I do run into developers who don’t take normalization seriously enough. Normalization is the process of eliminating repeating groups and redundant data by moving them into separate, related tables. This process supports the RDBMS in both theory and design. Without normalization, an RDBMS is doomed. Despite its importance, many developers make a cursory pass through the data and normalize very little, and that’s a mistake you should avoid. Take the time to break down your data, normalizing at least to Second or Third Normal Form.
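The classic before-and-after makes the idea concrete. Suppose a hypothetical Contacts table carries Phone1, Phone2, and Phone3 columns; that repeating group belongs in its own table:

    -- Before (unnormalized): Contacts(ContactID, Name, Phone1, Phone2, Phone3)
    -- After: the repeating group moves to a related table.
    CREATE TABLE Contacts (
        ContactID INTEGER PRIMARY KEY,
        Name      VARCHAR(50) NOT NULL
    );

    CREATE TABLE ContactPhones (
        PhoneID   INTEGER PRIMARY KEY,
        ContactID INTEGER NOT NULL REFERENCES Contacts (ContactID),
        Phone     VARCHAR(20) NOT NULL
    );

    -- Each contact can now have any number of phone numbers,
    -- and each number is stored exactly once.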

8: You can’t normalize enough

The previous point may seem to imply that normalization is the panacea of database design. But like code, too much of a good thing can slow things down. The more tables and joins involved in pulling data together into meaningful information, the slower the database will perform. Don’t overdo it — be thorough without being obsessed.

If your normalization scheme requires several tables to generate a common view, you’ve probably gone too far. In short, if performance slows and there’s nothing wrong with the connection, the query, and so on, excessive normalization might be the culprit.
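For instance, a design that puts streets, cities, states, and ZIP codes in four separate lookup tables forces four joins just to print an address. A hypothetical sketch of the smell:

    -- Four joins to reassemble one address is a sign of over-normalization.
    -- Hypothetical schema.
    CREATE VIEW CustomerAddresses AS
    SELECT c.Name, s.Street, ci.City, st.State, z.ZipCode
    FROM Customers c
    JOIN Streets   s  ON s.StreetID = c.StreetID
    JOIN Cities    ci ON ci.CityID  = c.CityID
    JOIN States    st ON st.StateID = c.StateID
    JOIN ZipCodes  z  ON z.ZipID    = c.ZipID;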

9: It’ll perform just as well with real data

Failing to test a database for scalability is a huge mistake. During the development stage, it’s acceptable to work with a scant amount of data, but a few rows of test data can’t provide a realistic view of how the database will perform in a production environment. Before going live, be sure to test your database with real data, and lots of it. Doing so will expose bottlenecks and vulnerabilities.
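If real data isn’t available yet, generate volume. One fairly portable trick is a recursive CTE; here’s a sketch that loads the hypothetical Orders table (SQL Server omits the RECURSIVE keyword and needs OPTION (MAXRECURSION 0); PostgreSQL offers generate_series as a shortcut):

    -- Insert 100,000 dummy rows to see how the design behaves at volume.
    WITH RECURSIVE numbers (n) AS (
        SELECT 1
        UNION ALL
        SELECT n + 1 FROM numbers WHERE n < 100000
    )
    INSERT INTO Orders (OrderID, CustomerID, OrderDate)
    SELECT n, (n % 500) + 1, CURRENT_DATE
    FROM numbers;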

You can blame the database engine for choking on real data — nice work if you can get it (and the client believes you).

10: Only the most elegant code is good enough for my clients

This attitude is another example of how too much of a good thing can be bad. We all want to write the best code possible, but sometimes, good enough is, well, good enough. Time spent optimizing routines that already perform well and accurately can be money down the drain for your client. If the database runs great with a bit of ugly code, so what? Is the trade-off worth the extra time and money you’ll spend to optimize the code to its fullest? I’m betting your client would answer in the negative. I’m not saying write clunky code. Nor am I suggesting that you write code that performs poorly because doing so makes your job easier. I’m saying, don’t put your client’s money into optimizing something that works fine as is. Put that time into good design and a solid foundation — that’s what will support the best performance.

11: You can back it up later

If the data is important enough to store, it’s important enough to protect. Hardware breaks. Mistakes happen. A backup plan should be part of your development process, not an afterthought ("I meant to do that..."). How often you’ll back up the database, where you’ll store those backups, and so on are questions to answer up front, not after your client loses important data.
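The mechanics depend on the product. In SQL Server syntax, for example, a nightly full backup of a hypothetical SalesDB can be as simple as this (PostgreSQL and MySQL have pg_dump and mysqldump instead):

    -- SQL Server syntax; schedule it nightly, and copy the file somewhere
    -- other than the server that holds the database.
    BACKUP DATABASE SalesDB
    TO DISK = 'E:\Backups\SalesDB_Full.bak'
    WITH INIT, CHECKSUM;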

12: You promised that wouldn’t change

The client promised that a specific business rule would never change, and you believed it. Never believe them! Don’t take the easy way out on this one; apply the best design and logic so that change is easy. The truth is, once users become accustomed to the database, they’ll want more — and that means change. It’s just about the only part of the whole development process you can depend on.
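One defensive habit: store volatile business rules as data instead of baking them into code or constraints. A sketch with a hypothetical tax-rate table:

    -- When the "permanent" rate changes (it will), it's an UPDATE, not a redesign.
    CREATE TABLE TaxRates (
        Region VARCHAR(20) PRIMARY KEY,
        Rate   DECIMAL(5,4) NOT NULL
    );

    -- The application reads the rate instead of hard-coding 0.0825.
    SELECT Rate FROM TaxRates WHERE Region = 'TX';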

13: Yes, I can give you the moon

Some developers are just too ambitious. Wanting to give users everything they want in the first version is a nice sentiment, but it’s also impractical. Unless the project is small with a specific focus, producing a foundation version that can go into production quickly is preferable. Users won’t get everything they asked for, but they’ll have a production database much sooner. You can add features in subsequent versions. The client gets a working database quickly, and you get job security.

http://blogs.techrepublic.com.com/10things/?p=1855
