Azure HDInsight – Transform, Manage, and Prepare Data

When you provision an Azure HDInsight Apache Spark cluster, it includes an interactive Jupyter notebook environment that you access by URL. If your HDInsight cluster is named brainjammer, for example, the Jupyter notebook environment is available at a web address similar to the following:

https://brainjammer.azurehdinsight.net/jupyter
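The address pattern above can be sketched as a small Python helper. This is only an illustration: the function name `jupyter_url` and the name check are assumptions made for the example, not part of any Azure SDK, and the exact naming rules for your cluster should be confirmed in the Azure documentation.

```python
import re


def jupyter_url(cluster_name: str) -> str:
    """Build the Jupyter address for an HDInsight cluster name (hypothetical helper)."""
    # Host names are case-insensitive, so normalize to lowercase.
    name = cluster_name.strip().lower()
    # Assumption for this sketch: letters, digits, and hyphens only,
    # starting and ending with a letter or digit.
    if not re.fullmatch(r"[a-z0-9](?:[a-z0-9-]*[a-z0-9])?", name):
        raise ValueError(f"unexpected cluster name: {cluster_name!r}")
    return f"https://{name}.azurehdinsight.net/jupyter"


url = jupyter_url("brainjammer")
print(url)  # https://brainjammer.azurehdinsight.net/jupyter
```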

Once you access the web‐based environment, you can upload an existing notebook or create a new Jupyter notebook. The default page resembles Figure 5.16.

FIGURE 5.16 Transforming data using Apache Spark Jupyter notebooks: Azure HDInsight default

Notice that the notebook exported in Exercise 5.3 has already been uploaded. To do this, click the Upload button, select the exported Azure Synapse Analytics notebook (Ch05Ex03.ipynb), and upload it. Figure 5.17 illustrates how the uploaded Jupyter notebook renders in an Azure HDInsight Apache Spark cluster.

FIGURE 5.17 Transforming data using Apache Spark Jupyter notebooks: Azure HDInsight Jupyter notebook

Azure Data Studio

Azure Data Studio has a Jupyter extension that enables you to work with Jupyter notebooks hosted on a public repository. After installing the extension, you can add and create Jupyter notebooks, as shown in Figure 5.18.

Once a notebook is downloaded from a remote source or opened locally, you can perform analytics on your local workstation, regardless of which Big Data analytics product or service the notebook was created in. The Jupyter notebook in Azure Data Studio resembles Figure 5.19.

A Jupyter notebook is useful for creating and sharing code written in many languages, which can then be run on numerous Big Data analytics products. Remember that a small amount of configuration is necessary on each platform to grant permissions to the data that the notebook references. In every case, though, the notebook gives you a starting point for analyzing the data, so you get to results faster than you would by starting from scratch.
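One practical check when moving a notebook between platforms is to see which kernel and language it was saved with, because an .ipynb file is a JSON document that records both in its metadata. The following is a minimal sketch: the helper name `summarize_notebook` is hypothetical, and the sample notebook, including its kernel name, is an illustrative stand-in and not the actual contents of Ch05Ex03.ipynb.

```python
import json


def summarize_notebook(nb: dict) -> dict:
    """Summarize kernel, language, and cell counts of a parsed .ipynb document."""
    metadata = nb.get("metadata", {})
    kernelspec = metadata.get("kernelspec", {})
    # Prefer language_info, fall back to the kernelspec language if present.
    language = metadata.get("language_info", {}).get("name") or kernelspec.get("language")
    counts = {}
    for cell in nb.get("cells", []):
        cell_type = cell.get("cell_type", "unknown")
        counts[cell_type] = counts.get(cell_type, 0) + 1
    return {
        "nbformat": nb.get("nbformat"),
        "kernel": kernelspec.get("name"),
        "language": language,
        "cells": counts,
    }


# Illustrative stand-in for an exported notebook (values are assumptions).
sample_json = """
{
  "nbformat": 4,
  "nbformat_minor": 2,
  "metadata": {
    "kernelspec": {"name": "synapse_pyspark", "display_name": "Synapse PySpark"},
    "language_info": {"name": "python"}
  },
  "cells": [
    {"cell_type": "markdown", "metadata": {}, "source": ["# Exercise notes"]},
    {"cell_type": "code", "metadata": {}, "source": ["df.show(5)"],
     "outputs": [], "execution_count": null}
  ]
}
"""

summary = summarize_notebook(json.loads(sample_json))
print(summary)
```

To inspect a real file, you could replace the sample with `json.load(open("Ch05Ex03.ipynb", encoding="utf-8"))`. If the recorded kernel does not exist on the target platform, you typically select an available kernel after opening the notebook there.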

FIGURE 5.18 Transforming data using Apache Spark Jupyter notebooks: Azure Data Studio open Jupyter notebook

FIGURE 5.19 Transforming data using Apache Spark Jupyter notebooks: Azure Data Studio Jupyter notebook

Raymond Gallardo
