Database Design for Mere Mortals

Michael J. Hernandez

Mentioned 2

A guide to relational database design covers such topics as setting objectives, establishing table structures, and identifying and establishing business rules.

More on Amazon.com

Mentioned in questions and answers.

I am fairly new to SQL so appologies if this questions sounds odd.

I keep coming accross the issue of poor data. As an example London could be stored as LON, London UK, London England etc. Prior to using SQL I had lots of Excel lookup tables where in the first column I would have the original then the 2n the corrected version. As an example:

Name             Name_1
London, UK       London
Lon              London
LON              London
London           London
London, England  London
LND              London

Is there a straightforward way of doing this in SQL, I am currently trying to create lookup tables and then use joins. This is becoming tricky as I do not always have corrections for every instance, so in some scenarions (most) my lookup tables have fewer items than what I'm joining them to.

I've been teaching myself about stored procedures and I wondered if this could fit what I need. Trouble is my searching on the subject of lookup tables is coming up empty.

Any advice or pointers would be gratefully received. Even if it is just this cannot be done.

Thanks as always for you help and appologies for the long post.

The bottom line... bad data is bad data and it takes a lot of work to either use bad data or clean bad data or both.

UPDATE AFTER CLARIFICATION

Build your own ETL (Extract, Transform, Load) process to handle all the variant incoming data. Your ETL process will most likely get modified with each new batch of data you receive because you will have to trap for new "Bad Data" variants.

Import the data into an ALL VARCHAR table
Run the ETL Process

  • Good Data goes into real data tables
  • Bad data goes into an exception table

Repeat
Modify ETL Process
Run ETL Process
Until No More Exceptions

-- End Update

If you use LEFT JOIN you can identify the missing values fairly easily.

SELECT
t1.FirstName,
t1.LookupField,
t2.Name_1
FROM People as t1
LEFT INNER JOIN TableLookupCities as t2
ON t1.LookupField = t2.Name

Anywhere t2.Name_1 returns a NULL you know you need to add that "LookupField" to your lookup table. Here is a good book for learning database design Database Design for Mere Mortals

-- Group By to Find Missing Unique Values
t1.LookupField,
t2.Name_1
FROM People as t1
LEFT INNER JOIN TableLookupCities as t2
ON t1.LookupField = t2.Name
GROUP BY 
t1.LookupField,
t2.Name_1

Some people tell me null is evil within a relational database is this true or just subjective?

Answer:

I believe in black and white answers... Yes, Null values in a relational database are bad. Some individuals (including senior level DBAs) will argue that some NULL values are fine for this that and other, but this is like saying some water in a gas tank is OK. Yes, technically a car will still run, albeit poorly, with some water in the tank, but wouldn't you avoid it if you could? The more NULL values in a database the more SQL you will write for your reports.

If data is optional then the business model obviously does not "need" it, and so you're better off either leaving it out or enforcing it so that it must be captured.

Recomendation

For individuals looking for answers to this and similar questions, I cannot recommend Database Design for Mere Mortals: second edition strong enough. It was recommended to me by a software architect and I have yet to meet anyone who recommends anything else. I actually purchased two copies just in case.

There is also a newer Database Design for Mere Mortals: third edition by the same author. I have not purchased this yet, so I can't say better or worse.

Personal Note: As far as this Question & answer being "primarily opinion-based" goes, please bear in mind that an opinion without logical evidence to the contrary is not opinion at all- It's a fact. If another SSRS or Crystal Reports Analyst/Developer comes in here and says "NULL values are no problem at all" then I will remove my answer.