
Dataflows in SAP Data Warehouse Cloud

20 December 2024


SAP Data Warehouse Cloud (DWC) is a cloud-based data warehousing solution that combines efficient data management with advanced analytics.

Dataflows have been introduced in SAP DWC as an easy-to-use data modeling experience for ETL requirements. They allow us to load and combine structured and semi-structured data from different data sources (SAP and non-SAP) such as cloud file storage, database management systems (DBMS), or SAP S/4HANA, and they assist with standard data transformation capabilities as well as scripting for advanced requirements.

Dataflow builder architecture

The Dataflow builder leverages parts of the SAP HANA Cloud-powered SAP Data Intelligence Cloud (DIC). SAP Data Warehouse Cloud is built on top of SAP HANA Cloud, and SAP Data Intelligence Cloud is embedded into DWC in the form of the Dataflow builder, where it provides the ETL functionality. When the Dataflow builder is used in DWC, it uses a dedicated subset of the Data Intelligence Cloud functionality: triggering a Dataflow execution generates a Data Intelligence pipeline in a side-by-side Data Intelligence cluster.

Figure 1: SAP HANA Cloud services

Data Views vs Dataflows

How is a Dataflow different from a Data view? This is detailed in the table below.

| Data View Builder | Dataflow |
| --- | --- |
| The main aim is data federation | The main aim is to persist data |
| Makes data outside DWC accessible as one integrated dataset | Enables working with large data sources such as data lakes, where federation would cause slow response times |
| Supports Graphical and SQL builder views and a standard set of data transformations | Supports a Graphical view and a standard set of transformations; also provides Python scripting functionality |
| Supports connections that in turn support federation, real-time replication, or momentary data snapshots | Draws from a richer network of connections, including non-SAP sources, cloud file storage, or APIs |
| Produces a single output structure, in an inherited form; the target is also federated | Produces multiple, definable output structures: you can add to or replace the data in an existing table, or create a new output; the target is persisted |

One strategy is to use the Data view builder and Dataflows so that they complement each other: use Dataflows to move data from multiple sources into DWC, and then use the view builder to build quick insights on top.


Data operations in Dataflows

Dataflows offer several standard data operations (similar to those available in the Data view builder) that can be used to model data, such as unions, joins, projections, filters, and aggregations. One major advantage of Dataflows in DWC is the 'Script' operator, which can be used to perform more advanced transformations in Python.

Figure 2: Data operators in DWC, marked in red (left to right: Join, Union, Projection, Aggregation, and Script)
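For readers who think in code, the following pandas sketch approximates what these graphical operators compute. This is an illustration only: the sample tables and column names (orders, customers, archive, REGION, AMOUNT) are invented for the example, and in DWC the operators are configured graphically in the Dataflow builder rather than written as code.

```python
import pandas as pd

# Illustrative input tables; names and columns are invented for this sketch.
orders = pd.DataFrame({"ORDER_ID": [1, 2, 3], "CUST_ID": [10, 10, 20], "AMOUNT": [100.0, 250.0, 75.0]})
customers = pd.DataFrame({"CUST_ID": [10, 20], "REGION": ["EMEA", "APAC"]})
archive = pd.DataFrame({"ORDER_ID": [4], "CUST_ID": [20], "AMOUNT": [60.0]})

# Join: enrich orders with customer attributes.
joined = orders.merge(customers, on="CUST_ID", how="inner")

# Union: append archived orders that share the same structure.
unioned = pd.concat([orders, archive], ignore_index=True)

# Projection: keep only a subset of columns.
projected = joined[["ORDER_ID", "REGION", "AMOUNT"]]

# Filter: restrict rows by a condition.
filtered = projected[projected["AMOUNT"] > 80.0]

# Aggregation: group and summarize.
aggregated = projected.groupby("REGION", as_index=False)["AMOUNT"].sum()
```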

Python scripting in Dataflows

The Script operator currently runs on Python 3.6.x. It allows for data manipulation and vector operations in Python by providing support for the NumPy and Pandas modules. NumPy and Pandas functions can be referenced through the aliases np and pd directly within the transform function, without any explicit imports.

The incoming data is fed into the data parameter of the transform function of the Script node. It is accessible within the function as a pandas DataFrame for further data transformations. The return value of this function is sent to the output.

It is important to note that the DataFrame returned from the transform function must have the same column names and types as specified in the output schema of the operator; otherwise, the execution fails.
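As a minimal sketch, the body of a Script operator might look like the following. The column names (REVENUE, COST, MARGIN) and the derivation are invented for illustration; what the operator contract described above guarantees is that data arrives as a pandas DataFrame and that the returned DataFrame must match the declared output schema.

```python
# Minimal Script-operator sketch. Note: no imports here, because the operator
# injects the aliases pd (Pandas) and np (NumPy) into the scripting environment.
def transform(data):
    # 'data' is a pandas DataFrame carrying the operator's input schema.
    # Hypothetical columns REVENUE and COST are assumed for this example.
    data["MARGIN"] = data["REVENUE"] - data["COST"]

    # Vectorized NumPy call via the preloaded np alias: clamp negatives to zero.
    data["MARGIN"] = np.maximum(data["MARGIN"], 0.0)

    # The returned DataFrame must expose exactly the column names and types
    # declared in the operator's output schema, or the execution fails.
    return data[["REVENUE", "COST", "MARGIN"]]
```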

NOTE: The operator is executed in sandbox mode; accessing the file system or the network and importing additional Python modules are restricted, as are defining classes and using coroutines. Restricted Pandas and NumPy functions are listed in the help section (in the Properties pane of the Script node). Any updates to the Python scripting documentation are also published there.

Dataflow execution with Python scripting will be discussed in detail in an upcoming blog.
