Remember, as you begin this section, that Chapter 7, “Design and Implement a Data Stream Processing Solution,” is devoted to data streaming. The content in this section will therefore target the Azure Stream Analytics feature that performs data transformation. Consider reviewing Exercise 3.17, where you provisioned an Azure Stream Analytics job. That exercise is followed by a thorough overview of the capabilities and features of Azure Stream Analytics. Also review Exercise 3.16, where you provisioned an Azure Event Hub. You will use that Azure Event Hub to manage the stream input into the Azure Stream Analytics job.
When you look at the existing brainjammer brain wave data, you will notice that the session (for example, ClassicalMusic) in which it was collected is included in the file. The objective of the data analysis on these 4.5 million rows of collected data is to determine, in real time or near real time, which session the individual sending the brain waves is engaged in. Therefore, the streamed data will not include any session information. A sample JSON document that contains a single reading is available on GitHub at https://github.com/benperk/ADE, in the BrainwaveData/SessionJson directory. The JSON document named POWReading.json is structured in the following summarized format:
{"ReadingDate": "2021-07-30T09:26:25.54", "Counter": 0, "AF3": {"THETA": 17.368, "ALPHA": 2.809, "BETA_L": 2.72, "BETA_H": 2.725, "GAMMA": 1.014}, "T7": { … }, …
Figure 5.20 illustrates an example of an Azure Stream Analytics query. The query transforms the JSON document, which contains a brain wave reading, in real time as it arrives from an event hub and passes the result to Azure Synapse Analytics.
FIGURE 5.20 Transforming data using Azure Stream Analytics JSON
What is and is not considered a transformation is in the eye of the beholder. For example, the JSON document includes a Counter attribute, which is not passed along to Azure Synapse. As you learned in Chapter 3, “Data Sources and Ingestion,” the Azure Stream Analytics query language has many capabilities that can be useful in additional transformations, such as aggregate functions (e.g., AVG, SUM, and VAR), conversion functions (e.g., CAST and TRY_CAST), and windowing functions (e.g., tumbling and sliding). At this moment, it is not clear what the final query here will be, because the brainjammer brain wave data has yet to be analyzed thoroughly. That point is getting close, as you have now learned how to ingest the data and transform it into a structure that is amenable to insight gathering and pattern recognition.
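To make the idea concrete outside of Azure, the following Python sketch imitates the two kinds of transformation described above: flattening a reading while leaving out the Counter attribute, and averaging a value over tumbling (fixed, non-overlapping) windows. It is not the query from Figure 5.20 and does not use the Stream Analytics query language. The function names, the `AF3_THETA` style column names, and the 5-second window size are assumptions chosen for illustration only.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Assumed timestamp layout, based on the sample value "2021-07-30T09:26:25.54".
TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%S.%f"


def flatten_reading(reading):
    """Flatten one nested reading into a single-level row.

    Nested electrode objects (such as AF3) become columns named
    <electrode>_<band>. The Counter attribute is not copied.
    """
    row = {"ReadingDate": reading["ReadingDate"]}
    for key, value in reading.items():
        if isinstance(value, dict):  # an electrode with its frequency bands
            for band, power in value.items():
                row[f"{key}_{band}"] = float(power)
    return row


def tumbling_average(rows, column, seconds):
    """Average one column over fixed, non-overlapping time windows.

    Returns a dict that maps each window start time to the average of
    the column for the rows whose ReadingDate falls in that window.
    """
    width = timedelta(seconds=seconds)
    epoch = datetime(1970, 1, 1)
    buckets = defaultdict(list)
    for row in rows:
        ts = datetime.strptime(row["ReadingDate"], TIMESTAMP_FORMAT)
        # Round the timestamp down to the start of its window.
        window_start = epoch + ((ts - epoch) // width) * width
        buckets[window_start].append(row[column])
    return {start: sum(values) / len(values)
            for start, values in sorted(buckets.items())}


if __name__ == "__main__":
    # Only the AF3 values come from the sample document; the T7 content
    # is elided there, so it is left out of this example.
    reading = {
        "ReadingDate": "2021-07-30T09:26:25.54",
        "Counter": 0,
        "AF3": {"THETA": 17.368, "ALPHA": 2.809, "BETA_L": 2.72,
                "BETA_H": 2.725, "GAMMA": 1.014},
    }
    row = flatten_reading(reading)
    print(row)
    print(tumbling_average([row], "AF3_THETA", seconds=5))
```

In a real job, the window size and the columns to aggregate would depend on what the analysis of the brain wave data reveals, which is why this remains only a sketch.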
Cleanse Data
In addition to managing the platform on which the analysis of your data runs, you must manage the data itself. There can be scenarios where portions of your data are missing, where there is unnecessary or corrupted data, or even where a whole data file or table is no longer available. The point is that it takes more time than you might expect to make sure the data used for gathering business insights is and remains valid. If the data has lost its integrity, then any analytics performed on it, and the resulting findings, could lead to wrong decisions. Perform Exercise 5.5 to cleanse some brain wave data.
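As a rough illustration of what a cleansing rule can look like, the following Python sketch filters out readings that are missing values or contain corrupted ones. It is not the approach taken in Exercise 5.5. The function names are hypothetical, and the validity rules (every band must be present, numeric, finite, and not negative) are assumptions made for this example rather than rules stated for the brainjammer data.

```python
import math

# Frequency bands taken from the sample reading shown earlier.
BANDS = ("THETA", "ALPHA", "BETA_L", "BETA_H", "GAMMA")


def is_valid_reading(reading, electrodes):
    """Return True when a reading passes the assumed validity rules."""
    if not reading.get("ReadingDate"):
        return False  # a reading without a timestamp cannot be ordered
    for electrode in electrodes:
        bands = reading.get(electrode)
        if not isinstance(bands, dict):
            return False  # the whole electrode is missing or malformed
        for band in BANDS:
            value = bands.get(band)
            # Reject missing values and non-numeric values (bool is
            # excluded explicitly because it is a subclass of int).
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                return False
            # Assumption: a valid value is finite and not negative.
            if math.isnan(value) or math.isinf(value) or value < 0:
                return False
    return True


def cleanse(readings, electrodes):
    """Split readings into those kept and a count of those dropped."""
    kept = [r for r in readings if is_valid_reading(r, electrodes)]
    return kept, len(readings) - len(kept)
```

Calling `cleanse(readings, ["AF3"])` returns the readings that passed the checks together with the number that were dropped. Tracking the dropped count is a deliberate choice, because a sudden increase in rejected readings is often the first sign that something upstream has gone wrong.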