As you transform data using Azure Synapse Analytics, failures can occur when writing to the sink. One cause is data truncation, such as when a column is defined as VARCHAR(50) but the value being inserted is 51 characters long. Inserting a null into a column that does not allow nulls throws an error as well. A third cause is conversion: a column defined as decimal(7,3) can hold a number like 1234.567, but if the transformation logic attempts to insert a value like 12345.678, the value exceeds the column's precision and the write fails. There are two approaches for handling these kinds of errors. The first is to configure the Error Output Settings for the Sink activity in the data flow, as shown in Figure 5.27.
FIGURE 5.27 Configuring error handling for the transformation
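To make the three failure modes concrete, the following T-SQL sketch reproduces each of them against a hypothetical table. The table name, column names, and values are illustrative assumptions, not part of the data flow itself.

```sql
-- Hypothetical table used only to demonstrate the three failure modes.
CREATE TABLE dbo.ErrorDemo
(
    Label  VARCHAR(50)  NOT NULL,
    Amount DECIMAL(7,3) NOT NULL
);

-- Truncation: the string is 51 characters, one more than VARCHAR(50) allows.
INSERT INTO dbo.ErrorDemo (Label, Amount)
VALUES (REPLICATE('x', 51), 1234.567);

-- Null violation: Label is declared NOT NULL.
INSERT INTO dbo.ErrorDemo (Label, Amount)
VALUES (NULL, 1234.567);

-- Conversion overflow: 12345.678 requires 8 digits of precision, but
-- decimal(7,3) allows only 7, so the insert fails with an arithmetic overflow.
INSERT INTO dbo.ErrorDemo (Label, Amount)
VALUES ('ok', 12345.678);
```

Each of the three INSERT statements fails for the corresponding reason described in the preceding paragraph.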
The configuration requires a linked service that identifies the location where the error log file will be written; an integration runtime provides the compute used to write the log. The Error Rows section includes an Error Row Handling drop-down list box, whose default value is Stop on First Error (Default). With the default selected, any error stops the data flow, which in turn stops the pipeline. This may or may not be desired. The alternative is to select the Continue on Error option, as shown in Figure 5.27, which does what its name implies. The Transaction Commit drop-down has values of Single or Batch, which determine whether the error log is written after each transaction or once the transformation completes. Single has better performance but is not optimal for large datasets; if you have a large dataset, consider Batch as the Transaction Commit setting instead. By default, the Output to a Separate File check box is disabled, which means the error file is appended to during each run; when the setting is enabled, a new file is created for each run. The Report Success on Error check box is also disabled by default, meaning the pipeline will show as errored but will run to completion and produce an error log for review. If the check box is enabled, the pipeline will show as successful, even though some errors happened, because they were handled by this error handling configuration.
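The Continue on Error behavior can be reasoned about with a rough T-SQL analogy: each row is attempted individually, failures are recorded in an error log, and processing continues rather than stopping at the first problem. This is only a sketch of the semantics, not how the data flow is implemented, and all table and column names are hypothetical.

```sql
-- Rough analogy of Continue on Error: attempt each row, log failures to a
-- hypothetical error table, and keep going instead of aborting the run.
DECLARE @RowId INT;

DECLARE row_cursor CURSOR FOR
    SELECT RowId FROM dbo.StagedRows;

OPEN row_cursor;
FETCH NEXT FROM row_cursor INTO @RowId;

WHILE @@FETCH_STATUS = 0
BEGIN
    BEGIN TRY
        -- Attempt to write a single row to the sink table.
        INSERT INTO dbo.SinkTable (RowId, Label, Amount)
        SELECT RowId, Label, Amount
        FROM dbo.StagedRows
        WHERE RowId = @RowId;
    END TRY
    BEGIN CATCH
        -- Continue on Error: record the failure and move to the next row.
        INSERT INTO dbo.RowErrors (RowId, ErrorMessage, ErrorTime)
        VALUES (@RowId, ERROR_MESSAGE(), SYSDATETIME());
    END CATCH;
    FETCH NEXT FROM row_cursor INTO @RowId;
END;

CLOSE row_cursor;
DEALLOCATE row_cursor;
```

The Stop on First Error default corresponds to letting the first failed INSERT abort the whole batch; the CATCH block is what Continue on Error adds, with dbo.RowErrors playing the role of the error log file.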
Figure 5.27 also shows the other option for handling errors: an additional activity, NoNulls, that checks for nulls. If no null is found, the row is passed on to the sink for transformation and storage. If a null is found, the row does not flow to the sink and is discarded. This is a preventative action, i.e., data cleaning, rather than a response to an error. However, it is an option you might consider when implementing some kind of error handling solution.
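Expressed in the same T-SQL terms, the preventative approach simply filters out offending rows before they ever reach the sink. Again, the table and column names are assumptions for illustration only.

```sql
-- Preventative filtering: only rows without nulls in the required column
-- reach the sink table, mirroring the NoNulls check in the data flow.
INSERT INTO dbo.SinkTable (RowId, Label, Amount)
SELECT RowId, Label, Amount
FROM dbo.StagedRows
WHERE Label IS NOT NULL;
```

The trade-off is the same as with the NoNulls activity: no error is raised and no error log is produced, so rows that fail the check disappear silently unless you route them somewhere for review.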