Now that you have performed some data transformation exercises, it is a good time to read about some applicable transformation and data management concepts.
Transformation
As you progressed through the exercises that transformed the...
Configure Error Handling for the Transformation – Transform, Manage, and Prepare Data
As you transform data using Azure Synapse Analytics, there may be some failures when writing to the sink. The failures might happen due to data truncation, such as when the data type is defined...
Transform Data by Using Scala – Transform, Manage, and Prepare Data
In Exercise 5.4 you used the Scala language to perform data transformation. You received a data file in Parquet format, transformed it to a more queryable form, and stored it in a Delta Lake....
Split Data – Transform, Manage, and Prepare Data
FIGURE 5.21 Splitting the data source—Projection tab
FIGURE 5.22 Splitting the data sink—Optimize tab
In Exercise 5.6 you created a data flow that contains a source to import a large CSV file from ADLS....
Transform Data Using Transact‐SQL – Transform, Manage, and Prepare Data
Transact‐SQL (T‐SQL), as mentioned previously, is an extension of the SQL language developed by Microsoft. In this chapter and preceding chapters, you have read about and used T‐SQL statements, functions, and commands. Any time...
Cleanse Data – Transform, Manage, and Prepare Data
%%pyspark
df = spark.read \
    .load('abfss://*@*.dfs.core.windows.net/SessionCSV/BRAINWAVES_WITH_NULLS.csv',
          format='csv', header=True)
The final action to take after cleansing the data is perhaps to save it to a temporary table, using the saveAsTable(tableName) method, or into the Parquet file format....
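The cleansing excerpt above loads a CSV file containing NULL values with PySpark. As a minimal plain-Python sketch of the same cleansing idea (not the Spark API itself, which would use DataFrame.na.drop()), the following drops any record with a missing value; the CSV content and column names are hypothetical stand-ins for the brainwave data.

```python
import csv
import io

# Hypothetical CSV content standing in for BRAINWAVES_WITH_NULLS.csv;
# empty fields represent NULL readings.
raw = """scenario,electrode,frequency,value
ClassicalMusic,AF3,THETA,44.254
ClassicalMusic,AF3,,
ClassicalMusic,T7,ALPHA,5.479
"""

rows = list(csv.DictReader(io.StringIO(raw)))

# Cleanse: keep only rows in which every column has a non-empty value.
clean = [r for r in rows if all(v not in (None, "") for v in r.values())]

print(len(rows), len(clean))  # 3 rows in, 2 survive cleansing
```

In Spark, the surviving rows could then be persisted with saveAsTable() or written out as Parquet, as the excerpt suggests.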
Flatten, Explode, and Shred JSON – Transform, Manage, and Prepare Data
The first snippet of code imports the explode() and col() functions from the pyspark.sql.functions module. Then the JSON file is loaded into a DataFrame with an option stipulating that the file is multiline as...
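The excerpt describes pyspark.sql.functions.explode(), which turns each element of an array column into its own row. A small plain-Python sketch of that flattening behavior (not the Spark API itself), using a hypothetical multiline JSON document shaped like nested session data:

```python
import json

# Hypothetical multiline JSON document with a nested array,
# similar in shape to what explode() operates on in Spark.
doc = """
{
  "Session": {
    "Scenario": "ClassicalMusic",
    "POWReading": [
      {"AF3": [5.913, 1.444]},
      {"AF4": [2.809, 1.195]}
    ]
  }
}
"""

session = json.loads(doc)["Session"]

# "Explode" the POWReading array: emit one output row per array
# element, each paired with the parent Scenario value.
rows = [
    {"Scenario": session["Scenario"], "Reading": reading}
    for reading in session["POWReading"]
]

print(len(rows))  # 2
```

Each nested array element becomes its own flat record, which is the essence of flattening and shredding JSON before querying it.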
Transform Data Using Azure Synapse Pipelines – Transform, Manage, and Prepare Data (Part 3)
One action you may have noticed in Exercise 5.1 is that you used the existing pipeline that you created in Exercise 4.13. That pipeline performed one activity, which was to copy data from the...
Transform Data Using Azure Synapse Pipelines – Transform, Manage, and Prepare Data
There is some benefit in understanding the structure of the data you must ingest, transform, and progress through the other Big Data pipeline stages. This is helpful to know because as you make...
Transform Data Using Apache Spark—Azure Databricks – Transform, Manage, and Prepare Data
The Azure Databricks workspace should resemble Figure 5.12.
FIGURE 5.12 Transforming data using an Apache Spark Azure Databricks workspace
The first important point for Exercise 5.4 has to do with the location of the...