June 22, 2022 - Data Analytics, Data Engineering, Manufacturing

4 Steps to Overcome SAP Integration Challenges

Michal Zawadzki - June 22, 2022

Finding a way to overcome SAP integration challenges can give you a holistic overview of all your organization's data and, therefore, improve the quality of your insights. This article explains how dyvenia's data engineering team solved this challenge in a 4-step process, which enabled us to create a working connector prototype within two hours and then deliver the complete Minimum Viable Product (MVP) in three months.

The Problem

Integration challenges

Organizations are, on average, using 976 individual applications (compared to 843 a year ago). Yet only 28% of these applications are integrated, indicating there is still an enormous opportunity to improve the digital experience.

The main culprit behind this low number is legacy applications. For historical reasons, many older applications implemented custom interfaces before open-source software and open standards became widespread. With the mass adoption of open-source software over the last 10 years and the resulting standardization across the field, most modern data applications are pluggable by default: they expose common, highly standardized, and very popular interfaces such as REST APIs, JDBC, and ANSI SQL. They also expose data in standard formats, typically JSON. Many high-quality, free tools exist that help upload, download, and work with data coming from these interfaces in those formats.

This was not the case 20+ years ago, when the first applications such as SAP or the Oracle database were being built. In those dark ages of the data world, vendors used custom interfaces and often exposed data in custom or obscure formats. To work with this data, you either had to purchase a targeted, dedicated solution or build your own from scratch, and you could not rely on any of the standard tooling or building blocks of today. There were no giants yet on whose shoulders you could stand. Maintenance was also challenging, as each version of the software could include changes to its interface.

Client’s SAP Connectivity Challenge

One of the most widely used applications in the enterprise landscape is SAP. Its ecosystem consists of dozens of products that offer similar functionality, yet require very different connectivity setups. Many of them lock you into webs of applications that are tightly integrated or even dependent on each other, which makes it hard to integrate them with the rest of the IT infrastructure (or even with other SAP products) and requires you to purchase additional solutions.

When coupled with the vendor’s traditional book-format documentation, this creates a situation where it may take weeks or even months to properly research which products are the best fit for the company’s overall technology stack and IT strategy. Integrating these products into the client’s infrastructure might be more complicated than developing an integration solution from scratch.

Our client struggled with the same challenge. Finance wanted to:

  • analyze data from SAP; and
  • then combine that data with data coming from other, non-SAP sources (i.e., eliminate the SAP data silo problem).

When we came in, the client had been evaluating several options. We committed to providing the complete Minimum Viable Product (MVP), including several data extracts, a data lake implementation, data modeling, and visualization within three months.


The solution consisted of several modern technologies and techniques:

  1. Orchestration layer (ELT pipelines etc.) written in Python;
  2. A data lake to serve as the company’s centralized data storage;
  3. A database to enable fast dashboarding.

1. Orchestration layer

We use Python for the orchestration layer for several reasons:

  • Automation: it follows the IaC (Infrastructure as Code) principle;
  • Flexibility: it allows expressing any possible business logic;
  • Usability: it’s very abstract (natural language-like), requiring only basic knowledge of computer science or programming;
  • Learnability: many analysts already use it, which makes it much easier to implement self-service ELT.

2. Data lake

As for the data lake, it’s the standard solution for getting rid of data silos. The main benefits are summarized below:

  • Capacity: data lakes offer practically unlimited storage;
  • Pluggability: data lakes integrate with a vast range of software. There are many ways to get data in or out, or to work with the data inside the lake;
  • Flexibility: data lakes can store data in any format – be it standard data formats such as CSV or Parquet or non-structured data, e.g., PDF or HTML documents;
  • Reliability: well set up data lakes offer essentially 100% reliability.

3. Analytical database

While data lakes can store any kind of data for multiple purposes, fast handling of analytical data requires purpose-built solutions. Modern lakehouse solutions allow you to visualize data without moving it outside the data lake. However, implementing a lakehouse requires several additional components. It also takes significant engineering effort (even with popular solutions such as Databricks). If you build your data infrastructure and related processes from scratch, it’s probably a good idea to use the lakehouse architecture. In many cases, though, simply adding an OLAP database for the final (filtered and aggregated) data will be much faster and cheaper, and will require far less maintenance.
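The way the three layers fit together can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the architecture used in the project: a temporary local directory stands in for the data lake, an in-memory SQLite database stands in for the analytical database, and the extract is a hard-coded list of made-up rows. All function, file, and table names are hypothetical.

```python
import csv
import sqlite3
import tempfile
from pathlib import Path


def extract():
    """Stand-in for a source extract (e.g. a connector query). Made-up data."""
    return [
        {"region": "EU", "amount": 120.0},
        {"region": "EU", "amount": 80.0},
        {"region": "US", "amount": 50.0},
    ]


def load_to_lake(rows, lake_dir: Path) -> Path:
    """Land the raw, unmodified extract in the 'lake' as a CSV file."""
    path = lake_dir / "sales_raw.csv"
    with path.open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["region", "amount"])
        writer.writeheader()
        writer.writerows(rows)
    return path


def transform_to_db(raw_path: Path, db: sqlite3.Connection) -> None:
    """Aggregate the raw data and store only the final result for dashboards."""
    totals = {}
    with raw_path.open(newline="") as f:
        for row in csv.DictReader(f):
            totals[row["region"]] = totals.get(row["region"], 0.0) + float(row["amount"])
    db.execute("CREATE TABLE sales_by_region (region TEXT PRIMARY KEY, total REAL)")
    db.executemany("INSERT INTO sales_by_region VALUES (?, ?)", sorted(totals.items()))
    db.commit()


def run_pipeline():
    """Extract, load the raw data to the lake, then transform into the database."""
    with tempfile.TemporaryDirectory() as tmp:
        raw_path = load_to_lake(extract(), Path(tmp))
        db = sqlite3.connect(":memory:")
        transform_to_db(raw_path, db)
        query = "SELECT region, total FROM sales_by_region ORDER BY region"
        return db.execute(query).fetchall()


print(run_pipeline())
```

The point of the ordering (load first, transform later) is that the raw extract stays available in the lake for other uses, while the database only holds the small, aggregated tables that dashboards query.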

Our journey to the solution


We approach our projects in an agile manner. We focus on building a Minimum Viable Product (MVP) first and utilizing our existing architecture and processes.

This way, we can do two things:

  • Build bold and robust data products quickly. We work in close collaboration with the client, allowing us to incorporate their feedback whenever needed.
  • Make opinionated choices & use recipes, which allows us to build off of our existing solutions and processes to move efficiently and effectively.


Recipes are detailed, step-by-step instructions for performing tasks. They allow us to tap into our collective experience and move faster than we would otherwise. We estimate that following a recipe is on the order of 10-100x quicker than following information- or explanation-oriented documentation. Following recipes also allows for easier automation, since a big part of automation is documenting the step-by-step, repeatable process required to execute a task. This recipe approach appears in different places:

  • Process-oriented mindset.
  • Documentation.
  • Infrastructure.

Whenever dealing with any repeatable activity, we aim to create a process and then codify that process into a recipe. This recipe exists either as a document or as a piece of code. This process is part of why we build our infrastructure – including our data infrastructure – as code.


We split tasks in two: finding a connectivity method to build our prototype and ensuring that this method was at least relatively viable. Having the correct contact points on the client’s side (for business logic and technical support) was also crucial in ensuring that we could move quickly.

The following SAP blog article was an important starting point.

Within an hour, we started prototyping the solution. Since we had already built an in-house Python connector library, it was easy to incorporate the new connector into it. It took us two hours to implement the connector and add a simple SQL interface on top of it. We then continued writing the first data extracts and models, adding functionality and tests, and fixing bugs as they came up.


During the first few days, we discovered some limitations of the RFC solution. For example, it does not allow filters longer than 75 characters. Another limitation is that no row in the result can contain more than 512 characters.

However, thanks to our experience handling data programmatically, it was easy to overcome both limitations, not with workarounds but with general and reliable solutions. In particular, by implementing client-side filtering, we fixed the filtering issue. This implementation requires sending as many filters as possible to the pyRFC connector. We apply any filters that don’t fit to the partly filtered data after the client receives it.
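The client-side filtering idea can be sketched as follows. This is a minimal illustration, not the code of our library: the `Filter` structure, the function names, the row format (plain dicts), and the column names `PLANT` and `MATERIAL` are all hypothetical. It also assumes that the 75-character limit applies to each filter condition individually, and that `fetch` is a stand-in for the real connector call.

```python
from dataclasses import dataclass
from typing import Callable

MAX_FILTER_CHARS = 75  # filter length limit mentioned above


@dataclass
class Filter:
    """One condition in two equivalent forms (hypothetical structure)."""
    text: str                          # sent to the server, e.g. "PLANT = '1000'"
    predicate: Callable[[dict], bool]  # same condition, evaluated on a row dict


def split_filters(filters):
    """Separate filters that fit the server-side limit from those that do not."""
    server_side = [f for f in filters if len(f.text) <= MAX_FILTER_CHARS]
    client_side = [f for f in filters if len(f.text) > MAX_FILTER_CHARS]
    return server_side, client_side


def read_filtered(fetch, filters):
    """Push down what fits, then apply the remaining filters locally.

    `fetch` stands in for the connector: it takes a list of filter strings
    and returns an iterable of row dicts.
    """
    server_side, client_side = split_filters(filters)
    rows = fetch([f.text for f in server_side])
    return [row for row in rows if all(f.predicate(row) for f in client_side)]


# Example with an in-memory stand-in for the connector.
wanted = {f"M{i:04d}" for i in range(20)}
filters = [
    Filter("PLANT = '1000'", lambda row: row["PLANT"] == "1000"),
    # A long IN list that exceeds the limit, so it is applied on the client.
    Filter(
        "MATERIAL IN (" + ", ".join(f"'{m}'" for m in sorted(wanted)) + ")",
        lambda row: row["MATERIAL"] in wanted,
    ),
]


def fake_fetch(filter_texts):
    # A real server would apply filter_texts; this stub returns rows that
    # already satisfy the short PLANT filter.
    return [
        {"PLANT": "1000", "MATERIAL": "M0003"},
        {"PLANT": "1000", "MATERIAL": "M9999"},
    ]


result = read_filtered(fake_fetch, filters)
print(result)
```

Pushing down as many filters as possible keeps the amount of transferred data small; the client only has to finish the job on rows that already passed the server-side conditions.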

For the 512-character limit, we use table metadata to divide the table into blocks and internally create a separate query for each block. The blocks are then concatenated on the client side. In our solution, we have abstracted this entire process away from the analyst. They specify their SQL query as expected, and the library does any special handling of the request/data internally.
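One way to picture the block approach is the sketch below. It is an illustration under assumptions rather than our actual implementation: it assumes the metadata provides a character width per column, that blocks are groups of columns, that every per-block query returns the same rows in the same order, and that delimiters do not count toward the limit. The function names and the `fetch_columns` callable (a stand-in for the connector) are hypothetical.

```python
MAX_ROW_CHARS = 512  # row width limit mentioned above


def plan_column_blocks(column_widths, max_chars=MAX_ROW_CHARS):
    """Greedily group columns so that each group's total width fits the limit."""
    blocks, current, used = [], [], 0
    for name, width in column_widths.items():
        if width > max_chars:
            raise ValueError(f"column {name!r} alone exceeds {max_chars} characters")
        if current and used + width > max_chars:
            # Adding this column would overflow the row, so start a new block.
            blocks.append(current)
            current, used = [], 0
        current.append(name)
        used += width
    if current:
        blocks.append(current)
    return blocks


def read_wide_table(fetch_columns, column_widths):
    """Run one query per block and stitch the partial rows back together.

    `fetch_columns` stands in for the connector: given a list of column
    names, it returns a list of row dicts containing only those columns.
    """
    merged = None
    for block in plan_column_blocks(column_widths):
        part = fetch_columns(block)
        if merged is None:
            merged = [dict(row) for row in part]
            continue
        if len(part) != len(merged):
            raise RuntimeError("blocks returned different row counts")
        for full_row, partial_row in zip(merged, part):
            full_row.update(partial_row)
    return merged or []


# Example with an in-memory table standing in for the source system.
widths = {"ID": 10, "TEXT1": 300, "TEXT2": 300, "FLAG": 1}
table = [
    {"ID": "1", "TEXT1": "a", "TEXT2": "b", "FLAG": "X"},
    {"ID": "2", "TEXT1": "c", "TEXT2": "d", "FLAG": ""},
]


def fake_fetch_columns(columns):
    return [{c: row[c] for c in columns} for row in table]


print(plan_column_blocks(widths))
print(read_wide_table(fake_fetch_columns, widths) == table)
```

In a real setting, relying on row order alone is fragile; including a key column in every block and joining on it would be a safer way to concatenate the parts.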


After building the prototype, we researched the space further and came across a fantastic summary of SAP’s products and connectivity methods. Thank you, Kai Waehner! Thanks to the rapid prototyping of the pyRFC solution and the tremendous research done by Kai, we were able to assess the landscape quickly and conclude that for this use case, the pyRFC approach would be far quicker and cheaper to implement than other solutions.

True, it is sometimes considered a legacy solution. It can be hard to debug and has several poorly documented limitations. The code execution environment also has to be specifically customized to run the RFC connector: SAP RFC requires the installation of a custom proprietary driver into the environment. However, it gets the basic job of reading data from SAP done, and it does so fast and reliably enough that we decided we could add the missing functionality on our end. We filled the gaps in functionality, codified and automated the environment, and abstracted the interface away behind simple Python and SQL, making it an analyst-friendly solution that is also easy to deploy and operate.


With our practical approach, we had a working connector prototype within two hours, and we could spend the rest of the time iterating on it for the full MVP solution. We spent another few weeks adding tests and features, getting it production-ready, and producing analytical insights and data models. These activities all happened concurrently from the moment we were able to establish connectivity. Though the MVP infrastructure and analytics ultimately took three months to deliver, we could provide a robust and feature-rich analytics solution in that time, including infrastructure, code, data models, dashboards, and documentation.


Follow these 4 steps to overcome SAP integration challenges.

  1. Find a way to prototype a solution
  2. Build the prototype, evaluating it continuously against real-world scenarios, and improving usability for end-users as you go
  3. Evaluate the prototype against other possible solutions and determine whether it’s worth it to pursue them (usually they offer marginal improvement for a lot of extra work)
  4. Integrate the prototype into your MVP by adding components such as security, monitoring, etc.


Michal Zawadzki

The Data Platform Team Lead. With his passion for data engineering and Python programming, he designs and implements data platforms for clients using DataOps principles. Given his commitment to delivering fast and efficient solutions, he works together with a team of data engineers to build our own tools that give our clients the control, ownership, and flexibility to customize these tools.
