Oracle SQL: Normalize the UnNormalized

What is the problem?

Sometimes you are faced with a bad (poorly normalized) schema design. For example:

But, you may need the output to be in the following normalized form:

The correct solution may be to fix the schema design, but in most cases this will not be feasible.

Test Case 1:

We need to write our query to present the data as it should be stored, in spite of how it is being actually stored.

Let us build the following test case:

CREATE TABLE Cust_Phones
(
CustomerID INT PRIMARY KEY,
Phone1 VARCHAR(32),
Phone2 VARCHAR(32),
Phone3 VARCHAR(32)
);
INSERT into Cust_Phones (CustomerID, Phone1, Phone2, Phone3)
VALUES   (1,’705-491-1111′, ‘705-491-1110’, NULL);
INSERT into Cust_Phones   (CustomerID, Phone1, Phone2, Phone3)
VALUES (2,’613-492-2222′, NULL, NULL);
INSERT into Cust_Phones   (CustomerID, Phone1, Phone2, Phone3)
VALUES (3,’416-493-3333′, ‘416-493-3330’, ‘416-493-3339’);
Commit;

Select * from cust_phones;

Now to normalize the output, we will write our query to use the unpivot operator as follows:

SELECT CustomerID, Phone
FROM
( SELECT CustomerID, Phone1, Phone2, Phone3 FROM Cust_Phones )
UNPIVOT ( Phone FOR Phones IN (Phone1, Phone2, Phone3));

The magic operator here is the “unpivot”

The “Phones” alias is saying: “extract a new row for every value you find in the columns Phone1, Phone2, and Phone3.

Test Case 2:

What if we have a more complicated case of 3 phone numbers with different types, like:

And we need to normalize the output to be as follows:

Let us build the following test case:

CREATE TABLE Cust_Phones2
(
CustomerID INT PRIMARY KEY,
Phone1 VARCHAR(32),
PhoneType1 CHAR(4),
Phone2 VARCHAR(32),
PhoneType2 CHAR(4),
Phone3 VARCHAR(32),
PhoneType3 CHAR(4)
);
INSERT into Cust_Phones2 VALUES (1,’705-491-1111′, ‘cell’, ‘705-491-1110’, ‘home’, NULL,NULL);
INSERT into Cust_Phones2 VALUES (2,’613-492-2222′, ‘home’, NULL, NULL, NULL, NULL);
INSERT into Cust_Phones2 VALUES (3,’416-493-3333′, ‘work’, ‘416-493-3330’, ‘cell’,’416-493-3339′, ‘home’);
commit;

select * from Cust_Phones2;

Now to normalize the output, we will write our query to use the unpivot operator as follows:

SELECT CustomerID, Phone, PhoneType
FROM
( SELECT CustomerID, Phone, PhoneType, Phones , PhoneTypes
, SUBSTR(Phones, LENGTH(Phones) – REGEXP_INSTR( REVERSE(Phones) , ‘[^0-9]’) + 2, 32) as idp,
SUBSTR(PhoneTypes, LENGTH(PhoneTypes) – REGEXP_INSTR(REVERSE(PhoneTypes) , ‘[^0-9]’) + 2, 32) as idpt
FROM ( SELECT CustomerID, Phone1, Phone2, Phone3, PhoneType1, PhoneType2, PhoneType3 FROM Cust_Phones2 )
UNPIVOT (Phone FOR Phones IN (PHONE1, PHONE2, PHONE3))
UNPIVOT (PhoneType FOR PhoneTypes IN (PhoneType1, PhoneType2, PhoneType3)) )
WHERE idp = idpt;

We used one “unpivot” operator to normalize the “phone” è produced number of rows = number of phones of each customer.

We used another “unpivot” operator to normalize the “PhoneType” è produced number of rows = number of phone types of each customer.

Note that if you run only the middle select query:

SELECT CustomerID, Phone, PhoneType, Phones , PhoneTypes
, SUBSTR(Phones, LENGTH(Phones) – REGEXP_INSTR( REVERSE(Phones) , ‘[^0-9]’) + 2, 32) as idp,
SUBSTR(PhoneTypes, LENGTH(PhoneTypes) – REGEXP_INSTR(REVERSE(PhoneTypes) , ‘[^0-9]’) + 2, 32) as idpt
FROM ( SELECT CustomerID, Phone1, Phone2, Phone3, PhoneType1, PhoneType2, PhoneType3 FROM Cust_Phones2 )
UNPIVOT (Phone FOR Phones IN (PHONE1, PHONE2, PHONE3))
UNPIVOT (PhoneType FOR PhoneTypes IN (PhoneType1, PhoneType2, PhoneType3))

You will get the following output:

Which is like a Cartesian product of the number of phones * the number of corresponding phone types

To get only the relevant rows, we added the last where condition:

WHERE idp = idpt;

The following part of the query:

SUBSTR(Phones, LENGTH(Phones) – REGEXP_INSTR( REVERSE(Phones) , ‘[^0-9]’) + 2, 32) as idp,

Tills us it is phone1 or phone2 or phone3. For example:

select substr(‘PHONE1’,LENGTH(‘PHONE1’) – REGEXP_INSTR(REVERSE(‘PHONE1′),'[^0-9]’)+2,32) from dual;

Will produce 1

While:

select substr(‘PHONE2’,LENGTH(‘PHONE2’) – REGEXP_INSTR(REVERSE(‘PHONE2′),'[^0-9]’)+2,32) from dual;

Will produce 2

and so on.

The following part of the query:

SUBSTR(PhoneTypes, LENGTH(PhoneTypes) – REGEXP_INSTR(REVERSE(PhoneTypes) , ‘[^0-9]’) + 2, 32) as idpt

Tills us it is phonetype1 or phonetype2 or phonetype3. For example:

select substr(‘PHONETYPE1’,LENGTH(‘PHONETYPE1’) – REGEXP_INSTR(REVERSE(‘PHONETYPE1′),'[^0-9]’)+2,32) from dual;

Will produce 1

Select REGEXP_INSTR(REVERSE(‘PHONETYPE1′),'[^0-9]’) from dual;

Will return 2, which is the position of the first non-numeric character from the end of the string.

Test Case 3:

What if you add a new phone number to the above table “Phone4” & “PhoneType4”,in this case you have to modify your queries. But is there a more dynamic way to build the UNPIVOT queries without advanced knowledge of the number of Phone/PhoneTypes? . This may be a subject of another future blog !

References:

A similar SQL Server case:

https://www.mssqltips.com/sqlservertip/3000/use-sql-servers-unpivot-operator-to-help-normalize-output/

Donate to support our blogging work

$1.00

Thanks
Ahmed

Oracle SQL: Normalize the UnNormalized

What is the problem?

Test Case 1:

Test Case 2:

Like this:

Leave a ReplyCancel reply

What is the problem?

Test Case 1:

Test Case 2:

Share this:

Like this:

Leave a ReplyCancel reply

Discover more from Ahmed AbdelFattah Blog !