There is a lot of history surrounding the encoding and decoding of data. Fundamentally, the concept revolves around how characters are stored and rendered. As you know, all things that are computed must...
Azure Cosmos DB: Shred JSON – Transform, Manage, and Prepare Data
FIGURE 5.23 Shredding JSON with Azure Cosmos DB
The query you executed in step 4 begins with a SELECT, which is followed by the OPENROWSET that contains information about the PROVIDER, CONNECTION, and OBJECT. SELECT...
Split Data – Transform, Manage, and Prepare Data
FIGURE 5.21 Splitting the data source – Projection tab
FIGURE 5.22 Splitting the data sink – Optimize tab
In Exercise 5.6 you created a data flow that contains a source to import a large CSV file from ADLS....
Transform Data Using Stream Analytics – Transform, Manage, and Prepare Data
Remember, as you begin this section, that Chapter 7, “Design and Implement a Data Stream Processing Solution,” is devoted to data streaming. The content in this section will therefore target the Azure Stream Analytics...
Azure HDInsight – Transform, Manage, and Prepare Data
If you provision an Azure HDInsight Apache Spark cluster, an interactive Jupyter notebook environment is available and is accessible by URL. If your HDInsight cluster is named brainjammer, for example, the...
Jupyter Notebooks – Transform, Manage, and Prepare Data
Throughout the exercises in this book, you have created numerous notebooks. The notebooks are web‐based and consist of a series of ordered cells that can contain code. The code within these cells is what...
Transform Data Using Transact‐SQL – Transform, Manage, and Prepare Data
Transact‐SQL (T‐SQL), as mentioned previously, is an extension of the SQL language developed by Microsoft. In this chapter and preceding chapters, you have read about and used T‐SQL statements, functions, and commands. Any time...
Cleanse Data – Transform, Manage, and Prepare Data
%%pyspark
df = spark.read \
    .load('abfss://*@*.dfs.core.windows.net/SessionCSV/BRAINWAVES_WITH_NULLS.csv',
          format='csv', header=True)

The final action to take after cleansing the data is perhaps to save it to a temporary table, using the saveAsTable(tableName) method, or into the Parquet file format....
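The cleansing step the excerpt describes can be illustrated without a Spark session. This is a minimal plain-Python sketch, using hypothetical sample rows standing in for BRAINWAVES_WITH_NULLS.csv; it mirrors the idea behind PySpark's DataFrame.na.drop(), which removes rows containing null values.

```python
import csv
import io

# Hypothetical sample data standing in for BRAINWAVES_WITH_NULLS.csv;
# empty fields play the role of the NULL values to be cleansed.
raw = """SESSION_ID,READING,VALUE
1,ALPHA,3.2
2,,4.1
3,BETA,
4,GAMMA,2.7
"""

reader = csv.DictReader(io.StringIO(raw))

# Keep only rows where every column has a non-empty value,
# mirroring the effect of df.na.drop() on a PySpark DataFrame.
clean = [row for row in reader if all(v.strip() for v in row.values())]

for row in clean:
    print(row["SESSION_ID"], row["READING"], row["VALUE"])
```

Rows 2 and 3 are discarded because each is missing a value; the surviving rows could then be persisted, as the excerpt notes, via saveAsTable() or as Parquet.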
Shred JSON – Transform, Manage, and Prepare Data
When you shred something, you tear it into small pieces. In many respects, the pieces that result are as small as they can possibly be. In this...
Flatten, Explode, and Shred JSON – Transform, Manage, and Prepare Data
The first snippet of code imports the explode() and col() functions from the pyspark.sql.functions module. Then the JSON file is loaded into a DataFrame with an option stipulating that the file is multiline as...
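What explode() produces can be sketched in plain Python, with no Spark dependency. The document and field names below are hypothetical stand-ins for the brainwave JSON the book loads; the list comprehension plays the role of select(col("session"), explode(col("readings"))), emitting one output row per element of the nested array.

```python
import json

# Hypothetical multiline JSON document; the nested "readings" array
# is what explode() would expand into individual rows.
doc = json.loads("""
{
  "session": "brainjammer-1",
  "readings": [
    {"frequency": "ALPHA", "value": 3.2},
    {"frequency": "BETA",  "value": 1.8}
  ]
}
""")

# One flattened row per array element, pairing the parent "session"
# value with each nested reading -- the essence of exploding JSON.
rows = [(doc["session"], r["frequency"], r["value"]) for r in doc["readings"]]

for row in rows:
    print(row)
```

Each nested reading becomes its own row while the parent-level session value is repeated, which is exactly the shape a shredded, relational result set needs.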