Azure Database

This is not exhaustive documentation of every Azure service. These are summarized notes for the Azure certifications.
To see the complete documentation, please go to: Azure documentation

Evolution of Databases

Traditionally, databases focused on transaction processing. However, the need for efficient data analysis led to the creation of data warehouses, specifically designed for reporting, utilizing SQL for querying.

Enter Data Lakes and Big Data

As unstructured data gained prominence, data lakes became pivotal for comprehensive data storage. Big data analytics, often powered by Apache Spark, provided insights into diverse data types like documents and images.

Azure Synapse Analytics

Microsoft Azure Synapse Analytics bridges the gap between structured and unstructured data analytics. This single service offers:

Azure Synapse Pipelines

Azure Synapse Analytics extends its capabilities with Azure Synapse Pipelines, a robust tool for data extraction and transformation.

For example, a pipeline could:

Building Pipelines: Code or No-Code

There are two approaches to building a pipeline: one uses code and the other does not.

Code Method

  1. Create a Synapse Workspace which will be used for secure analytics.

  2. Create a Linked Service inside the Synapse workspace.

  3. Use the Linked Service to connect to data sources such as Cosmos DB.

  4. Set up Spark and SQL pools.

  5. Create a Notebook containing code that copies data from Cosmos DB to the Spark cluster.

  6. Add analytics code to the notebook to create statistics about the data.

  7. Add code to store the results in the SQL pool.

  8. Schedule notebook execution for recurring tasks.
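
The notebook logic in steps 5 to 7 can be sketched in plain Python. This is a minimal illustration, not Synapse code: the sample records, the field names `category` and `amount`, and the functions `load_source`, `compute_statistics`, `store_results`, and `run_notebook` are all hypothetical. In a real notebook, the read and write steps would go through Spark connectors for Cosmos DB and the SQL pool, and the statistics would be computed on a Spark DataFrame.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical sample records standing in for documents read from Cosmos DB.
SAMPLE_ORDERS = [
    {"category": "books", "amount": 12.0},
    {"category": "books", "amount": 18.0},
    {"category": "games", "amount": 60.0},
]


def load_source():
    """Step 5 stand-in: a real notebook would read from Cosmos DB via the Linked Service."""
    return list(SAMPLE_ORDERS)


def compute_statistics(records):
    """Step 6: group records by category and compute count, total, and mean amount."""
    amounts = defaultdict(list)
    for record in records:
        amounts[record["category"]].append(record["amount"])
    return [
        {
            "category": category,
            "count": len(values),
            "total": sum(values),
            "mean": mean(values),
        }
        for category, values in sorted(amounts.items())
    ]


def store_results(rows, sink):
    """Step 7 stand-in: a real notebook would write the rows to a table in the SQL pool."""
    sink.extend(rows)
    return len(rows)


def run_notebook(sink):
    """Chain steps 5 to 7, as one scheduled notebook run (step 8) would."""
    return store_results(compute_statistics(load_source()), sink)
```

Calling `run_notebook(results)` with an empty list `results` fills it with one summary row per category. Keeping the three steps as separate functions mirrors the notebook's cell structure and lets the statistics step be checked without any cloud resources.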

No-Code Method

  1. Create a Synapse Workspace which will be used for secure analytics.

  2. Create a Linked Service inside the Synapse workspace.

  3. Use the Linked Service to establish connections to Cosmos DB and the SQL pool.

  4. Create a graphical pipeline with a data flow for activities like data loading, transformation, and storage.

  5. Configure activities without writing code.

  6. Set up pipelines for automatic execution at specified intervals.
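
Although the pipeline is built graphically, the designer stores it as a JSON definition. The sketch below builds an approximate version of such a definition, plus a schedule trigger, as Python dictionaries. The field names follow the Data Factory style JSON layout and should be treated as an assumption; compare them against a definition exported from your own workspace. The functions `build_pipeline` and `build_schedule_trigger` and all resource names are hypothetical.

```python
import json


def build_pipeline(name, data_flow_name):
    """Return an approximate pipeline definition with one data flow activity (steps 4 and 5)."""
    return {
        "name": name,
        "properties": {
            "activities": [
                {
                    "name": "LoadTransformStore",  # hypothetical activity name
                    "type": "ExecuteDataFlow",     # assumed activity type for data flows
                    "typeProperties": {
                        "dataflow": {
                            "referenceName": data_flow_name,
                            "type": "DataFlowReference",
                        }
                    },
                }
            ]
        },
    }


def build_schedule_trigger(name, pipeline_name, frequency, interval):
    """Return an approximate trigger that runs the pipeline at a fixed interval (step 6)."""
    return {
        "name": name,
        "properties": {
            "type": "ScheduleTrigger",
            "typeProperties": {
                "recurrence": {"frequency": frequency, "interval": interval}
            },
            "pipelines": [
                {
                    "pipelineReference": {
                        "referenceName": pipeline_name,
                        "type": "PipelineReference",
                    }
                }
            ],
        },
    }


def to_json(definition):
    """Serialize a definition the way it might appear in a JSON code view."""
    return json.dumps(definition, indent=2)
```

For example, `build_schedule_trigger("HourlyRun", "CosmosToSql", "Hour", 1)` describes a trigger that would run a pipeline named `CosmosToSql` once an hour, and `to_json` renders either dictionary for inspection.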

Cost Management for Spark and SQL Pools

Both Spark and SQL pools operate as clusters of virtual machines, impacting costs. Azure provides mechanisms for cost control:

Azure Synapse Analytics emerges as a unified solution, eliminating the silos between structured and unstructured data analytics, and offering flexibility in data orchestration through pipelines.

For more information: Azure Synapse Analytics documentation.

Resources