I am fairly new to SQL so appologies if this questions sounds odd.
I keep coming accross the issue of poor data. As an example London could be stored as LON, London UK, London England etc. Prior to using SQL I had lots of Excel lookup tables where in the first column I would have the original then the 2n the corrected version. As an example:
Name Name_1 London, UK London Lon London LON London London London London, England London LND London
Is there a straightforward way of doing this in SQL, I am currently trying to create lookup tables and then use joins. This is becoming tricky as I do not always have corrections for every instance, so in some scenarions (most) my lookup tables have fewer items than what I'm joining them to.
I've been teaching myself about stored procedures and I wondered if this could fit what I need. Trouble is my searching on the subject of lookup tables is coming up empty.
Any advice or pointers would be gratefully received. Even if it is just this cannot be done.
Thanks as always for you help and appologies for the long post.
The bottom line... bad data is bad data and it takes a lot of work to either use bad data or clean bad data or both.
UPDATE AFTER CLARIFICATION
Build your own ETL (Extract, Transform, Load) process to handle all the variant incoming data. Your ETL process will most likely get modified with each new batch of data you receive because you will have to trap for new "Bad Data" variants.
Import the data into an ALL VARCHAR table
Run the ETL Process
Modify ETL Process
Run ETL Process
Until No More Exceptions
-- End Update
If you use LEFT JOIN you can identify the missing values fairly easily.
SELECT t1.FirstName, t1.LookupField, t2.Name_1 FROM People as t1 LEFT INNER JOIN TableLookupCities as t2 ON t1.LookupField = t2.Name
Anywhere t2.Name_1 returns a NULL you know you need to add that "LookupField" to your lookup table. Here is a good book for learning database design Database Design for Mere Mortals
-- Group By to Find Missing Unique Values t1.LookupField, t2.Name_1 FROM People as t1 LEFT INNER JOIN TableLookupCities as t2 ON t1.LookupField = t2.Name GROUP BY t1.LookupField, t2.Name_1
Some people tell me null is evil within a relational database is this true or just subjective?
I believe in black and white answers... Yes, Null values in a relational database are bad. Some individuals (including senior level DBAs) will argue that some NULL values are fine for this that and other, but this is like saying some water in a gas tank is OK. Yes, technically a car will still run, albeit poorly, with some water in the tank, but wouldn't you avoid it if you could? The more NULL values in a database the more SQL you will write for your reports.
If data is optional then the business model obviously does not "need" it, and so you're better off either leaving it out or enforcing it so that it must be captured.
For individuals looking for answers to this and similar questions, I cannot recommend Database Design for Mere Mortals: second edition strong enough. It was recommended to me by a software architect and I have yet to meet anyone who recommends anything else. I actually purchased two copies just in case.
There is also a newer Database Design for Mere Mortals: third edition by the same author. I have not purchased this yet, so I can't say better or worse.
Personal Note: As far as this Question & answer being "primarily opinion-based" goes, please bear in mind that an opinion without logical evidence to the contrary is not opinion at all- It's a fact. If another SSRS or Crystal Reports Analyst/Developer comes in here and says "NULL values are no problem at all" then I will remove my answer.