Remember, as you begin this section, that Chapter 7, “Design and Implement a Data Stream Processing Solution,” is devoted to data streaming. The content in this section will therefore target the Azure Stream Analytics...
Azure HDInsight – Transform, Manage, and Prepare Data
If you provision an Azure HDInsight Apache Spark cluster, it includes an interactive Jupyter notebook environment that is accessible by URL. If your HDInsight cluster is named brainjammer, for example, the...
Jupyter Notebooks – Transform, Manage, and Prepare Data
Throughout the exercises in this book, you have created numerous notebooks. The notebooks are web‐based and consist of a series of ordered cells that can contain code. The code within these cells is what...
Transform Data Using Transact‐SQL – Transform, Manage, and Prepare Data
Transact‐SQL (T‐SQL), as mentioned previously, is an extension of the SQL language developed by Microsoft. In this chapter and preceding chapters, you have read about and used T‐SQL statements, functions, and commands. Any time...
Cleanse Data – Transform, Manage, and Prepare Data
%%pyspark
df = spark.read.load('abfss://*@*.dfs.core.windows.net/SessionCSV/BRAINWAVES_WITH_NULLS.csv', format='csv', header=True)
After cleansing the data, a final step might be to save it to a temporary table, using the saveAsTable(tableName) method, or in the Parquet file format....
Shred JSON – Transform, Manage, and Prepare Data
When you shred something, the object being shredded is torn into small pieces. In many respects, it means that the resulting pieces are the smallest possible size. In this...
Flatten, Explode, and Shred JSON – Transform, Manage, and Prepare Data
The first snippet of code imports the explode() and col() functions from the pyspark.sql.functions module. Then the JSON file is loaded into a DataFrame with an option stipulating that the file is multiline as...
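The explode-and-flatten idea in the excerpt above can be sketched in plain Python, without a Spark cluster. This is only an analogue of what PySpark's explode() does to an array column, not the code from the post: the sample document, its field names (Session, Scenario, Readings, Electrode, Value), and the helper functions explode() and flatten() are all assumptions made for illustration.

```python
import json

# Hypothetical sample document; every field name here is an assumption.
raw = """
{
  "Session": {
    "Scenario": "ClassicalMusic",
    "Readings": [
      {"Electrode": "AF3", "Value": 4.2},
      {"Electrode": "T7", "Value": 3.1}
    ]
  }
}
"""


def explode(record, array_key):
    """Yield one copy of `record` per element of the list at `array_key`."""
    for element in record[array_key]:
        row = {k: v for k, v in record.items() if k != array_key}
        row[array_key] = element
        yield row


def flatten(record, prefix=""):
    """Collapse nested dictionaries into a single level with dotted keys."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat


session = json.loads(raw)["Session"]

# Explode the array first (one row per reading), then flatten each row.
rows = [flatten(row) for row in explode(session, "Readings")]

for row in rows:
    print(row)
```

Each reading in the array becomes its own row, and the nested reading fields are promoted to top-level columns, which is the shape a relational table or a flat DataFrame expects.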
Encode and Decode Data – Transform, Manage, and Prepare Data
The output is SQL_Latin1_General_CP1_CI_AS, which is the default (refer to Figure 3.28). GO INSERT INTO [dbo].[ENCODE] ([ENCODE_ID], [ENCODE]) VALUES (1, '殽') INSERT INTO [dbo].[ENCODE] ([ENCODE_ID], [ENCODE]) VALUES (2, 'Ž') INSERT INTO [dbo].[ENCODE] ([ENCODE_ID], [ENCODE]) VALUES (3,...
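Why the two inserted characters in the excerpt above behave differently can be illustrated in Python. The SQL_Latin1_General_CP1_CI_AS collation uses code page 1252 for non-Unicode character data, so this sketch round-trips each character through Python's cp1252 codec. It assumes the ENCODE column is a non-Unicode type such as VARCHAR, which the excerpt does not state; the helper store_as_cp1252() is hypothetical and only mimics the lossy conversion.

```python
def store_as_cp1252(text):
    """Mimic storing text in a code page 1252 column.

    Characters that the code page cannot represent are replaced with '?'.
    """
    return text.encode("cp1252", errors="replace").decode("cp1252")


def store_as_utf8(text):
    """Round-trip through UTF-8, which can represent every Unicode character."""
    return text.encode("utf-8").decode("utf-8")


for character in ["殽", "Ž"]:
    print(
        repr(character),
        "cp1252 ->", repr(store_as_cp1252(character)),
        "| utf-8 ->", repr(store_as_utf8(character)),
    )
```

The character 'Ž' exists in code page 1252 and survives the round trip, while '殽' does not and comes back as a question mark. Both characters survive a UTF-8 round trip.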
Transform Data Using Apache Spark—Azure Synapse Analytics – Transform, Manage, and Prepare Data-1
Transform Data Using Apache Spark. Apache Spark can be used in a few products running on Azure: Azure Synapse Analytics Spark pools, Azure Databricks Spark clusters, Azure HDInsight Spark clusters, and Azure Data Factory. The...
Transform Data Using Azure Synapse Pipelines – Transform, Manage, and Prepare Data-3
One action you may have noticed in Exercise 5.1 is that you used the existing pipeline that you created in Exercise 4.13. That pipeline performed one activity, which was to copy data from the...