Monday, November 03, 2014

Removing Superfluous Spaces

We all can relate to dealing with systems that have data integrity problems. But some data integrity problems can be cleaned up using a touch of SQL. Consider the common data entry problem of extraneous spaces (or blanks) inserted into a name field. Not only is it annoying, sometimes it can cause the system to ignore relationships between data elements because the names do not match. For example, “Craig  Mullins” is not equivalent to “Craig Mullins”; the first one has two spaces between the first and last name whereas the second one only has one.

You can write an SQL UPDATE statement to clean up the problem, if you know how to use the REPLACE function. REPLACE does what it sounds like it would do: it reviews a source string and replaces all occurrences of a one string with another. For example, to replace all occurrences of Z with A in the string BZNZNZ you would code:

  REPLACE(‘BZNZNZ’,’Z’,’A’)

And the result would be BANANA. So, let’s create a SQL statement using REPLACE to get rid of any unwanted spaces in the NAME column of our EMPLOYEE table:

  UPDATE EMPLOYEE
     SET NAME = REPLACE(
                 REPLACE(
                  REPLACE(NAME, SPACE(1), '<>')
                  '><', SPACE(0))
                 '<>', SPACE(1));

"Wait-a-minute," you might be saying. "What are all of those left and right carats and why do I need them?"

OK, fair enough. Let’s explain how this works starting from the inside out. The inside REPLACE statement takes the NAME column and converts every occurrence of a single space into a left/right carat. The next REPLACE (working outward), takes the string we just created, and removes every occurrence of a right/left carat combination by replacing it with a zero length string. The final REPLACE function takes that string and replaces any left/right carats with a single space. The reversal of the carats is the key to removing all spaces except one – remember, we want to retain a single space anywhere there was a single space as well as anywhere that had multiple spaces. Try it, it works.

Of course, you can use any two characters you like, but the left and right carat characters work well visually. Be sure that you do not choose to use characters that occur naturally in the string that you are acting upon.


Finally, the SPACE function was used for clarity. You could have used strings encased in single quotes, but the SPACE function is easier to read. It simply returns a string of spaces the length of which is specified as the integer argument. So, SPACE(12) would return a string of twelve spaces.

3 comments:

Gwyn Pritchard, UK said...

Interesting thanks Craig - it is always good to see demonstrated what use could be made of seldom used SQL features, stretches the imagination. PS There are 2 commas missing from the SQL - easily fixed (",)

Gwyn

pricolatino said...

This was very helpful, thanks!

Craig S. Mullins said...

Yes, I am on Twitter. You can follow me at https://twitter.com/craigmullins