A close-up view of a yellow rotary telephone.
Data Governance, SQL Scripting, The Museum System

Standardizing your Constituent Phone Numbers

Did you ever notice that American and Canadian phone numbers are both prefaced with the international calling code of “1”? This is because twenty countries, primarily in North America and the Caribbean, participate in the North American Numbering Plan (NANP). Let’s examine how these numbers are structured, establish standards for data entry and then apply those standards to our existing records.

The NANP number format consists of three parts: a three digit area code, a three digit exchange code, and a four digit subscriber number. Traditionally, if you were going to place a call to a number within the same area code, you only needed to dial the exchange code and subscriber number, referred to as 7-digit dialing. In metropolitan areas, there is a greater demand for numbers, especially with the proliferation of cellular telephones. These demands were met with the introduction of new area codes, but necessitated the need for 10-digit dialing.

Since area codes where only required when placing long distance calls, the traditional formatting convention for phone numbers is “(NXX) NXX-XXXX”, wrapping the area codes in parathenses. The new formatting convention of “NXX-NXX-XXXX” better supports the mandatory 10-digit dialing.

Profiling the Data

So what format do our phone numbers us in TMS? We can answer that question by profiling our data. We will run the following script against our database. The script changes all the integers in our phone numbers to “9” so that we can examine the existing patterns, without changing any records.

WITH Tally AS (
	SELECT TOP (50) ROW_NUMBER() OVER (ORDER BY @@SPID) AS N FROM sys.all_columns
), NumberPattern AS (
    SELECT CONVERT(NVARCHAR(MAX), CONVERT(XML,
		STUFF((SELECT CASE WHEN PATINDEX('%[0-9]%',CHR) = 1 THEN '9' ELSE UPPER(CHR) END
			FROM (
				SELECT N, SUBSTRING(PhoneNumber, N, 1) AS CHR 
				FROM Tally
				WHERE N <= DATALENGTH(PhoneNumber)) AS [1]
			ORDER BY N
			FOR XML PATH('')), 1, 0, ''))) AS Pattern
    FROM ConPhones 
)
SELECT Pattern, COUNT(*) AS Quantity
FROM NumberPattern
GROUP BY Pattern
ORDER BY COUNT(*) DESC

The script will return results, similar to the following:

Pattern                   Quantity
------------------------- -----------
(999) 999-9999            51
999-999-9999              37
999 999 9999              28
999.999.9999              13
999 999-9999              6
+9.999.999.9999           6
999-9999                  3
FAX: 999 999 9999         3
FAX: 999-9999             2
BUS FAX:999 999 9999      2

Looking over these results, we see that some of our phone numbers include text (“FAX:” or “BUS FAX”) that belong in either the Description or the new Phone Type field. Fortunately, there are very few of these, and we can probably manually correct them. We also see that a number of our records were probably copied from business cards or email signature lines that were stylized for print (“999 999 9999” or “999.999.9999”).

After profiling is completed, we can determine a standard for phone number formats. This is an institutional decision, but given that 10-digit dialing is becoming more norm than exception, I lean towards the 10-digit “NXX-NXX-XXXX”. After codifying your decision in the data entry manual, we will begin modifying the existing numbers to reflect our newly adopted standard.

Apply the Standard

There are sophisticated ways to do this, but most database administrators are familiar with the REPLACE function. Here, the LIKE statement uses simple regular expressions (REGEX) to target only patterns of integers.

UPDATE ConPhones SET
    PhoneNumber = REPLACE(REPLACE(PhoneNumber,') ','-'),'(','')
WHERE PhoneNumber LIKE '([0-9][0-9][0-9]) [0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]'

Alternately, we can use single character wildcards instead of the REGEX. However, depending upon your profiles, this may not have the desired effect. For instance, “(FAX) 999-9999” would be transformed into “FAX-999-9999”.

UPDATE ConPhones SET
     PhoneNumber = REPLACE(REPLACE(PhoneNumber,') ','-'),'(','')
WHERE PhoneNumber LIKE '(___) ___-____'

Work through all your profiles, using either human middleware or SQL scripting, to implement your new data standard.

Thanks for taking the time to read my post – I hope you found it helpful.
Leave feedback in the comments below.

If this post save you time, then consider sharing it with your friends and colleagues. If it saved you some money, then please consider buying me a coffee. If you want to know when I publish new content, then subscribe to my feed. Your support encourages me to write more. Thanks!

Standardizing your Constituent Phone Numbers

Leave a Comment