Transform Data Using Transact‐SQL – Transform, Manage, and Prepare Data

Transact‐SQL (T‐SQL), as mentioned previously, is an extension of the SQL language developed by Microsoft. In this chapter and the preceding chapters, you have read about and used T‐SQL statements, functions, and commands. Any time you ran queries against a dedicated or serverless SQL pool, you were using T‐SQL. In many of those scenarios you transformed data, so you already have hands‐on experience transforming data with T‐SQL. This section reviews and expands on what you have already learned.

Chapter 2, “CREATE DATABASE dbName; GO,” provided examples of the following two SELECT statements, one of which is SQL and the other T‐SQL. Although they produce the same output, only one of them works on a dedicated SQL pool. Do you remember which statement is T‐SQL?

SELECT * FROM READINGS ORDER BY VALUE LIMIT 10;
SELECT TOP (10) * FROM READINGS ORDER BY VALUE;

The second statement is T‐SQL. The point is that SQL and T‐SQL generally do the same things, with some exceptions, but achieving the desired outcome requires the proper syntax for the platform you are targeting. You can find the full documentation for T‐SQL at https://docs.microsoft.com/sql/t-sql, which also covers the DBCC, DDL, and DML topics that are important for the exam.
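For example, one DBCC command that applies specifically to dedicated SQL pools is PDW_SHOWSPACEUSED, which reports row counts and space consumption for a table. The following is a minimal sketch, run here against the fact table you will see created later in this section:

-- Report row counts and space usage for a dedicated SQL pool table
DBCC PDW_SHOWSPACEUSED("brainwaves.FactREADING");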

In Exercise 2.1, you executed the following T‐SQL query against an Azure SQL database. This was one of your first exposures in this book to an Azure database and to the brainjammer brain wave data. The query performs a JOIN between the READING table and the FREQUENCY table and returns the friendly name of the FREQUENCY with a primary key of 1, along with the reading values. The FREQUENCY with a primary key of 1 is THETA, which is associated with creative brain activity.

SELECT F.FREQUENCY, R.VALUE
FROM READING R JOIN FREQUENCY F
ON R.FREQUENCY_ID = F.FREQUENCY_ID
WHERE R.FREQUENCY_ID = 1
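If you want to summarize rather than list every reading, a small variation on the same query aggregates the THETA values. The following is a minimal sketch of that aggregation:

-- Average the THETA readings instead of returning every row
SELECT F.FREQUENCY, AVG(R.VALUE) AS AVG_VALUE, COUNT(*) AS READING_COUNT
FROM READING R JOIN FREQUENCY F
ON R.FREQUENCY_ID = F.FREQUENCY_ID
WHERE R.FREQUENCY_ID = 1
GROUP BY F.FREQUENCY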

In Exercise 3.7, you executed the following T‐SQL query on a dedicated SQL pool. The query has been reused in many examples; it transforms the raw data into business data that is ready for analysis and insight gathering. Exercise 3.7 used only a small amount of data, approximately 450,000 rows across 15 sessions. As of Exercise 5.1, you have access to all 150 sessions, which include over 4.5 million rows of data.

SELECT SCENARIO, ELECTRODE, FREQUENCY, VALUE
FROM SCENARIO sc, [SESSION] s, ELECTRODE e, FREQUENCY f, READING r
WHERE sc.SCENARIO_ID = s.SCENARIO_ID
  AND s.SESSION_ID = r.SESSION_ID
  AND e.ELECTRODE_ID = r.ELECTRODE_ID
  AND f.FREQUENCY_ID = r.FREQUENCY_ID
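The same transformation can also be written with explicit JOIN syntax, which many people find easier to read. The following sketch is equivalent to the comma‐separated form shown above:

-- Equivalent query using explicit INNER JOIN syntax
SELECT sc.SCENARIO, e.ELECTRODE, f.FREQUENCY, r.VALUE
FROM READING r
  JOIN [SESSION] s ON s.SESSION_ID = r.SESSION_ID
  JOIN SCENARIO sc ON sc.SCENARIO_ID = s.SCENARIO_ID
  JOIN ELECTRODE e ON e.ELECTRODE_ID = r.ELECTRODE_ID
  JOIN FREQUENCY f ON f.FREQUENCY_ID = r.FREQUENCY_ID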

In Exercise 4.8, you added the first of the following two lines of syntax to a Derived Column transformation as part of a data flow. The first line is not T‐SQL, but the second is. Both lines produce the same output when transforming an epoch timestamp into a T‐SQL datetime value. The example shows that different tools can achieve the same outcome but require different syntax.

toTimestamp(toLong(toDecimal(Timestamp, 14, 4) * (1000l)))
DATEADD(S, CAST([TIMESTAMP] AS DECIMAL(14, 4)), '19700101') AS Timestamp
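To see the T‐SQL version in context, the following sketch applies it in a SELECT statement; the source table and column names are illustrative, and keep in mind that DATEADD implicitly truncates the fractional seconds because its second argument resolves to an integer:

-- Convert an epoch value to a datetime while querying a staging table
-- (the table and column names here are illustrative)
SELECT DATEADD(S, CAST([TIMESTAMP] AS DECIMAL(14, 4)), '19700101') AS READING_DATETIME,
       [VALUE]
FROM [brainwaves].[TmpREADING];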

Most recently, you created a dedicated SQL pool stored procedure that builds a fact table by using a CTAS (CREATE TABLE AS SELECT) statement. The following is the SELECT statement that performs the transformation from the raw TmpREADING table into the FactREADING table:

SELECT  se.SESSION_DATETIME, r.READING_DATETIME,
            s.SCENARIO, e.ELECTRODE, f.FREQUENCY, r.[VALUE]
    FROM    [brainwaves].[DimSESSION] se, [brainwaves].[TmpREADING] r,
            [brainwaves].[DimSCENARIO] s, [brainwaves].[DimELECTRODE] e,
            [brainwaves].[DimFREQUENCY] f
    WHERE   r.SESSION_ID = se.SESSION_ID AND se.SCENARIO_ID = s.SCENARIO_ID
            AND r.ELECTRODE_ID = e.ELECTRODE_ID
            AND r.FREQUENCY_ID = f.FREQUENCY_ID;
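For reference, that SELECT sits inside a CTAS along the lines of the following sketch; the distribution and index options shown here are assumptions for illustration, not necessarily the exact ones used in the exercise:

-- A sketch of the surrounding CTAS; the WITH options are assumed for illustration
CREATE TABLE [brainwaves].[FactREADING]
WITH
(
    DISTRIBUTION = ROUND_ROBIN,
    CLUSTERED COLUMNSTORE INDEX
)
AS
SELECT  se.SESSION_DATETIME, r.READING_DATETIME,
        s.SCENARIO, e.ELECTRODE, f.FREQUENCY, r.[VALUE]
FROM    [brainwaves].[DimSESSION] se, [brainwaves].[TmpREADING] r,
        [brainwaves].[DimSCENARIO] s, [brainwaves].[DimELECTRODE] e,
        [brainwaves].[DimFREQUENCY] f
WHERE   r.SESSION_ID = se.SESSION_ID AND se.SCENARIO_ID = s.SCENARIO_ID
        AND r.ELECTRODE_ID = e.ELECTRODE_ID
        AND r.FREQUENCY_ID = f.FREQUENCY_ID;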

The point is that when you are working with dedicated SQL pools in Azure Synapse Analytics (previously called Azure SQL Data Warehouse), you are using T‐SQL to create, read, update, and delete (CRUD) data. The same is true for data stored on serverless SQL pools.
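On a serverless SQL pool the same T‐SQL skills apply, but the data is typically read directly from files in the data lake by using OPENROWSET. The following is a minimal sketch, with a hypothetical storage account, container, and file path:

-- Query Parquet files from a serverless SQL pool
-- (the storage account, container, and path are hypothetical)
SELECT TOP 10 *
FROM OPENROWSET(
    BULK 'https://<account>.dfs.core.windows.net/<container>/brainjammer/*.parquet',
    FORMAT = 'PARQUET'
) AS [reading];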
