🎯 What You’ll Learn
In this module, you’ll:

- Understand what Data Flows are in Synapse
- Create a new Data Flow and link it to a pipeline
- Add transformations like filters, derived columns, and joins
- Test and monitor the transformation step
🧠 What Are Data Flows in Synapse?
Data Flows are like the "Power Query" of Azure Synapse. They let you:

- Clean, shape, and enrich data visually (no code needed)
- Apply logic like filters, joins, and conditional columns
- Transform big data at scale using Spark behind the scenes
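To make "clean, shape, and enrich" concrete, here is the kind of logic a Data Flow applies, sketched in plain Python over made-up customer rows. This is only a conceptual stand-in: in Synapse you configure these steps visually, and they execute on Spark, not in Python.

```python
# Hypothetical customer rows, as a Data Flow source might see them
customers = [
    {"id": 1, "name": "Ada", "country": "UK", "spend": 120.0},
    {"id": 2, "name": "Bob", "country": "US", "spend": 0.0},
    {"id": 3, "name": "Cleo", "country": "UK", "spend": 45.5},
]

# Filter transformation: keep only customers who have spent something
active = [c for c in customers if c["spend"] > 0]

# Derived Column transformation: add a calculated field
for c in active:
    c["spend_band"] = "high" if c["spend"] >= 100 else "low"

print([c["id"] for c in active])          # rows that survived the filter
print([c["spend_band"] for c in active])  # the derived column
```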
🛠️ Step-by-Step: Build Your First Data Flow
🔹 Step 1: Go to Synapse Studio → Orchestration
- Navigate to "Integrate" → + New → Data Flow
- Name it TransformCustomerData
📸 Image Tip: Show blank data flow canvas
🔹 Step 2: Add a Source
- Click + Add Source
- Choose or create a dataset (e.g., Blob, SQL Table)
- Configure schema and sampling
🔹 Step 3: Add Transformations
- From the top bar, click ➕ Add transformation
- Choose one of the following:
| Transformation | Use Case |
|---|---|
| Filter | Remove unwanted rows |
| Derived Column | Add a calculated field |
| Select | Drop columns |
| Join | Merge with another dataset |
| Conditional Split | Apply logic like IF-ELSE |
| Aggregate | Group by and summarize data |
📸 Image Tip: Transformation path visual (source → filter → sink)
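To see what two of the table's entries do to actual rows, here is a Join followed by an Aggregate sketched in Python over hypothetical order data. Again, this only illustrates the logic; in a real Data Flow you wire these up visually and Spark runs them at scale.

```python
# Hypothetical rows from two sources
orders = [
    {"customer_id": 1, "amount": 50.0},
    {"customer_id": 1, "amount": 30.0},
    {"customer_id": 2, "amount": 70.0},
]
customers = [
    {"customer_id": 1, "region": "EMEA"},
    {"customer_id": 2, "region": "AMER"},
]

# Join transformation: merge each order with its customer's attributes
lookup = {c["customer_id"]: c for c in customers}
joined = [{**o, **lookup[o["customer_id"]]} for o in orders]

# Aggregate transformation: group by region and sum the amounts
totals = {}
for row in joined:
    totals[row["region"]] = totals.get(row["region"], 0.0) + row["amount"]

print(totals)  # order totals per region
```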
🔹 Step 4: Add a Sink (Destination)
- Choose or create a new dataset for the destination (e.g., SQL table, CSV)
- Map columns from source to sink
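Conceptually, column mapping is a select-and-rename from the source schema to the sink schema. A minimal Python sketch (all column names here are hypothetical):

```python
# Hypothetical mapping: source column name -> sink column name
mapping = {"cust_id": "CustomerId", "full_name": "Name"}

source_row = {"cust_id": 42, "full_name": "Ada Lovelace", "debug_flag": True}

# Keep only the mapped columns, renamed for the sink;
# unmapped columns (like debug_flag) are dropped
sink_row = {sink_col: source_row[src_col] for src_col, sink_col in mapping.items()}

print(sink_row)
```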
🔹 Step 5: Debug and Preview
- Use the Debug button to run and preview rows
- Check how transformations affect your data
🔹 Step 6: Add This Data Flow to Your Pipeline
- Go back to your existing pipeline
- Drag in the Data Flow Activity
- Link it to the data flow you just created
✅ Now your pipeline includes transformation logic before loading data!
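Behind the visual designer, the pipeline stores this step as JSON. Its rough shape is sketched below as a Python dict (names are placeholders; check your own pipeline's JSON view in Synapse Studio for the exact schema):

```python
import json

# Approximate shape of a Data Flow activity inside a pipeline definition.
# "TransformCustomerData" is the example name used in this module.
activity = {
    "name": "TransformCustomerData",
    "type": "ExecuteDataFlow",
    "typeProperties": {
        "dataFlow": {
            "referenceName": "TransformCustomerData",
            "type": "DataFlowReference",
        }
    },
}

print(json.dumps(activity, indent=2))
```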
💡 Pro Tips
- You can chain multiple transformations
- Use expressions like iif(condition, result1, result2) for custom logic
- Use caching to test small batches without rerunning the full flow
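The iif() expression works like a conditional (ternary) expression: if the condition is true it returns the first result, otherwise the second. A Python stand-in makes the behavior explicit:

```python
def iif(condition, result1, result2):
    """Python stand-in for the Data Flow iif() expression."""
    return result1 if condition else result2

# Hypothetical use: band customers by spend
spend = 150
band = iif(spend >= 100, "high", "low")
print(band)
```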