
Transform Data Visually in Azure Synapse Using Data Flows (No-Code Guide)

 

🎯 What You’ll Learn

In this module, you’ll:

  • Understand what Data Flows are in Synapse

  • Create a new Data Flow and link it to a pipeline

  • Add transformations such as filters, derived columns, and joins

  • Test and monitor the transformation step




🧠 What Are Data Flows in Synapse?

Data Flows are like the "Power Query" of Azure Synapse. They let you:

  • Clean, shape, and enrich data visually (no code needed)

  • Apply logic like filters, joins, conditional columns

  • Transform big data at scale (Synapse runs each data flow on managed Spark clusters behind the scenes)


🛠️ Step-by-Step: Build Your First Data Flow


🔹 Step 1: Go to Synapse Studio → Develop

  • Select + → Data flow (data flows are created in the Develop hub; pipelines live in the Integrate hub)

  • Name it TransformCustomerData

📸 Image Tip: Show blank data flow canvas


🔹 Step 2: Add a Source

  • Click + Add Source

  • Choose or create a dataset (e.g., Blob, SQL Table)

  • Review the projected schema and, for large sources, enable sampling to limit the rows read while you test


🔹 Step 3: Add Transformations

  • On the canvas, click the ➕ icon next to the source to add a transformation, then choose one of the following:

Transformation      | Use Case
Filter              | Remove unwanted rows
Derived Column      | Add a calculated field
Select              | Rename, reorder, or drop columns
Join                | Merge with another dataset
Conditional Split   | Route rows to different streams based on conditions (like IF-ELSE)
Aggregate           | Group by and summarize data
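To make the table concrete, here is a rough Python sketch of what each transformation does to a handful of rows. This is not how Synapse runs a data flow (that happens on Spark); it is only an in-memory analogy, and the sample data, column names, and the 100 spend threshold are all made up for illustration.

```python
from collections import defaultdict

# Hypothetical sample rows standing in for a customer source dataset.
customers = [
    {"id": 1, "name": "Ada", "country": "UK", "spend": 120.0},
    {"id": 2, "name": "Bo", "country": "US", "spend": 40.0},
    {"id": 3, "name": "Cy", "country": "US", "spend": 310.0},
    {"id": 4, "name": "Di", "country": None, "spend": 75.0},
]
# Hypothetical second dataset used for the join.
regions = [
    {"country": "UK", "region": "EMEA"},
    {"country": "US", "region": "AMER"},
]

# Filter: remove rows that have no country.
filtered = [r for r in customers if r["country"] is not None]

# Derived Column: add a calculated field.
derived = [
    {**r, "tier": "high" if r["spend"] >= 100 else "standard"} for r in filtered
]

# Select: keep only the columns needed downstream (this drops "name").
selected = [{k: r[k] for k in ("id", "country", "spend", "tier")} for r in derived]

# Join: inner join with the regions dataset on "country".
region_by_country = {r["country"]: r["region"] for r in regions}
joined = [
    {**r, "region": region_by_country[r["country"]]}
    for r in selected
    if r["country"] in region_by_country
]

# Conditional Split: route each row to one of two output streams.
high_value = [r for r in joined if r["tier"] == "high"]
standard = [r for r in joined if r["tier"] != "high"]

# Aggregate: group by region and sum spend.
totals = defaultdict(float)
for r in joined:
    totals[r["region"]] += r["spend"]
spend_by_region = dict(totals)

print(spend_by_region)
```

Each step takes the previous step's output as its input, which is the same idea as wiring transformations left to right on the data flow canvas.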

📸 Image Tip: Transformation path visual (source → filter → sink)


🔹 Step 4: Add a Sink (Destination)

  • Choose or create a dataset (e.g., SQL table, CSV file)

  • On the Mapping tab, map columns from source to sink (auto mapping is on by default; turn it off to map columns manually)
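If column mapping feels abstract, think of it as a rename table applied to every row. The sketch below assumes made-up source and sink column names and is only an analogy for what the sink mapping does, not Synapse code.

```python
# Hypothetical mapping from source column names to sink column names.
column_mapping = {
    "id": "CustomerId",
    "country": "CountryCode",
    "spend": "TotalSpend",
}


def map_to_sink(row, mapping):
    """Return a sink row holding only the mapped columns, renamed for the sink."""
    missing = [src for src in mapping if src not in row]
    if missing:
        raise KeyError(f"source row is missing mapped columns: {missing}")
    return {sink: row[src] for src, sink in mapping.items()}


source_row = {"id": 1, "country": "UK", "spend": 120.0, "tier": "high"}
sink_row = map_to_sink(source_row, column_mapping)
print(sink_row)
```

Columns that are not in the mapping (here "tier") are simply not written, which mirrors leaving a column unmapped in a manual mapping.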


🔹 Step 5: Debug and Preview

  • Turn on the Data flow debug toggle, then open the Data preview tab on any transformation to preview its rows

  • Check how transformations affect your data
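The idea behind previewing is to push a small sample through each step and see what comes out. Here is a minimal Python sketch of that idea; the preview helper, the stage functions, and the 100-row limit are hypothetical and only mimic the concept, not the actual debug session.

```python
from itertools import islice


def preview(rows, stages, limit=100):
    """Run each stage on a limited sample and report the row count per stage."""
    sample = list(islice(rows, limit))
    report = [("source", len(sample))]
    for name, stage in stages:
        sample = stage(sample)
        report.append((name, len(sample)))
    return sample, report


# Hypothetical source of 1,000 rows; only the first 100 are sampled.
rows = [{"id": i, "spend": i * 10.0} for i in range(1, 1001)]
stages = [
    ("filter", lambda rs: [r for r in rs if r["spend"] >= 500]),
    ("derive", lambda rs: [{**r, "tier": "high"} for r in rs]),
]

sample, report = preview(rows, stages, limit=100)
for name, count in report:
    print(f"{name}: {count} rows")
```

Comparing the row count before and after a step is a quick way to spot a filter or join that drops more rows than you expected.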


🔹 Step 6: Add This Data Flow to Your Pipeline

  • Go back to your existing pipeline

  • Drag in the Data flow activity (listed under Move & transform in the Activities pane)

  • Link it to the data flow you just created

✅ Now your pipeline includes transformation logic before loading data!
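Behind the visual designer, a pipeline is stored as JSON. The sketch below builds an approximate version of that JSON in Python so you can see how the activity points at the data flow. The pipeline name, activity name, and compute values are assumptions; open the code view of your own pipeline in Synapse Studio to see the exact definition it generated.

```python
import json

# Hypothetical pipeline containing a single Data flow activity.
# "TransformCustomerData" is the data flow created in Step 1; every other
# name and value here is an illustrative assumption.
data_flow_activity = {
    "name": "RunTransformCustomerData",
    "type": "ExecuteDataFlow",
    "dependsOn": [],
    "typeProperties": {
        "dataflow": {
            "referenceName": "TransformCustomerData",
            "type": "DataFlowReference",
        },
        "compute": {"coreCount": 8, "computeType": "General"},
    },
}

pipeline = {
    "name": "CustomerPipeline",
    "properties": {"activities": [data_flow_activity]},
}

pipeline_json = json.dumps(pipeline, indent=2)
print(pipeline_json)
```

The key link is referenceName: it must match the name of the data flow, which is why renaming a data flow outside the designer can break the pipeline that calls it.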


💡 Pro Tips

  • You can chain multiple transformations

  • Use expressions (like iif(condition, result1, result2)) for custom logic

  • Use debug mode with a row limit (set in Debug settings) to test on small batches without running the full flow
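For the iif tip, a Derived Column expression might look like iif(spend >= 100, 'high', 'standard'), where spend is a hypothetical column. If you think in Python, the same logic can be sketched like this (an analogy only, not data flow syntax):

```python
def iif(condition, true_value, false_value):
    """Return true_value when condition holds, otherwise false_value."""
    return true_value if condition else false_value


def tier_for(spend):
    # Nested iif calls behave like an IF / ELSE IF / ELSE chain.
    # The 300 and 100 thresholds are made up for illustration.
    return iif(spend >= 300, "premium", iif(spend >= 100, "high", "standard"))


for spend in (40.0, 120.0, 310.0):
    print(spend, tier_for(spend))
```

One difference to keep in mind: this Python helper evaluates both result arguments before choosing one, so it is best kept to simple values like the ones shown here.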
