
A Lakehouse for Consistent Net Sales Reporting

  • Michelle Schulz
  • Mar 20
  • 5 min read

How a global pharma company (API producer) consolidated its Net Sales reporting using an Azure Databricks Lakehouse – including centralized KPI definitions and a controlled Excel upload.

Waleed Saleem

Project Lead and Senior Analytics Engineer

The objective was to bring together Net Sales KPIs, budget and forecast data, as well as additional steering metrics on a single platform.

The result: faster monthly closing cycles, fewer discussions about numbers, and a data foundation that enables reliable analysis across countries, business units, and sales structures.


The company is an internationally operating development and manufacturing partner for leading pharma and biotech companies, and this very international setup makes steering and reporting inherently complex. When business units (BUs), country organizations, sales teams, and markets operate in parallel, data quickly emerges in different structures, definitions, and levels of granularity.

At the same time, management, finance/controlling, as well as BU and sales leadership require reliable monthly figures: Net Sales at group and BU level, budget vs. actuals, forecast positions, variance analyses, and of course the ability to quickly identify issues when something does not add up.

In a recent project, we therefore built a standardized, DWH-based reporting solution that consistently provides Net Sales KPIs and key steering metrics (including CAPEX, FTE, and budget/planning figures). The goal was not simply to create another dashboard, but to establish a platform that processes monthly data efficiently, stabilizes the overall process structure, and measurably accelerates decision-making.  


Why did we choose Azure Databricks?

The key architectural decision was to adopt a Lakehouse approach based on Azure Databricks.

Not because of trends, but because Databricks brings together data ingestion, transformation, and delivery in a scalable stack, while integrating seamlessly into the Azure ecosystem—covering areas such as security, identity, and monitoring.

This combination is particularly important for reporting platforms that continuously evolve, whether through new KPIs, additional data sources, or new analytical dimensions. It ensures scalability and control, without turning future enhancements into a collection of isolated solutions.


Technically, the solution is built on a modern Databricks Lakehouse architecture:


  • Delta Lake for reliable, transactional tables in the data lake

  • Medallion architecture with clearly separated layers (Bronze, Silver, Gold)

  • Unity Catalog for governance and role-based access control

  • Databricks Jobs/Workflows for orchestration, including logging and error handling

  • Power BI as the frontend, connected via Databricks SQL Warehouse


The key principle: KPI logic should not be distributed across multiple Power BI files, but centralized where data quality, governance, and traceability can be controlled—within the Gold layer of the Lakehouse.


The setup: Bronze, Silver, Gold – and why this separation matters  

The Medallion architecture is not a theoretical concept, but a practical response to common data challenges: manual Excel sources, inconsistent definitions, changing data deliveries, and missing identifiers.


 

A data processing pipeline with Streamlit allows users to upload data via a user-friendly interface, storing it in the Bronze layer. The Silver layer ensures cleaned and standardized data, while the Gold layer provides curated analytics.

Bronze: raw data with traceability 

Excel files are uploaded monthly by business users via a Streamlit application. The application enforces a controlled upload process, including metadata (e.g., period and source), and stores the original files in a standardized way in a Databricks Volume as a landing zone.

From there, the files are loaded into the Bronze layer as Delta raw data, with clear traceability back to the source file.
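A landing-zone convention like the one described can be captured in a small helper. The sketch below is illustrative: the volume path, source name, and filename pattern are hypothetical, not the project's actual convention.

```python
from datetime import datetime, timezone

def landing_path(volume_root: str, source: str, period: str, filename: str) -> str:
    """Build a standardized landing-zone path inside a Databricks Volume.

    Hypothetical convention: <volume_root>/<source>/<period>/<upload_ts>__<filename>,
    so every Bronze row can be traced back to exactly one original file.
    """
    ts = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    return f"{volume_root}/{source}/{period}/{ts}__{filename}"

# Example: a monthly Net Sales file uploaded for period 2024-03
path = landing_path("/Volumes/finance/bronze/uploads", "net_sales",
                    "2024-03", "net_sales_de.xlsx")
```

Because the timestamp is part of the filename, repeated uploads for the same period never overwrite each other, which is what makes the "original files are fully retained" guarantee cheap to keep.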

The original files are fully retained. While this may seem trivial, it becomes critical when it is necessary to understand why a figure has changed at a later point in time.


Silver: cleansing, typing, and data quality rules 

In the Silver layer, data is cleaned and operationalized through typing, standardization, mapping logic, and harmonization. Validation and quality rules are applied to detect inconsistencies early and handle them transparently.

These include, for example:

  • schema checks

  • mandatory field validation

  • duplicate detection logic

  • reference matching against dimension tables

  • period consistency checks

Especially for Excel-based sources, this step is essential.
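To make the rule types above concrete, here is a minimal sketch of such checks over plain Python rows. Column names, rule set, and the findings format are assumptions for illustration, not the project's actual schema.

```python
def validate_rows(rows, required, valid_periods, key_fields):
    """Apply simple Silver-layer quality rules and collect findings per row index:
    mandatory fields, period consistency, and duplicate-key detection."""
    findings = []
    seen_keys = set()
    for i, row in enumerate(rows):
        for col in required:
            if row.get(col) in (None, ""):
                findings.append((i, f"missing mandatory field: {col}"))
        if row.get("period") not in valid_periods:
            findings.append((i, f"period outside expected range: {row.get('period')}"))
        key = tuple(row.get(k) for k in key_fields)
        if key in seen_keys:
            findings.append((i, "duplicate key"))
        seen_keys.add(key)
    return findings

rows = [
    {"customer": "C1", "period": "2024-03", "net_sales": 100.0},
    {"customer": "C1", "period": "2024-03", "net_sales": 100.0},  # duplicate
    {"customer": "",   "period": "2024-13", "net_sales": 50.0},   # missing field, bad period
]
issues = validate_rows(rows, required=["customer", "net_sales"],
                       valid_periods={"2024-03"},
                       key_fields=["customer", "period"])
```

The point of returning findings instead of raising on the first error is transparency: all issues in a delivery surface at once, which is what allows them to be "handled transparently" rather than fixed one upload at a time.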

  

Gold: business-ready tables and KPIs

In the Gold layer, curated tables and views are created to serve as a stable foundation for reporting and analytics. This is where business definitions of key metrics—such as Net Sales and other steering KPIs—are centrally implemented and standardized.

Power BI accesses these structures directly and builds the semantic model and visualizations on top. As a result, analyses across group, BU, and country levels are based on a shared, consistent data foundation.

Power BI (via Power Query / M code) connects to the Databricks SQL Warehouse and addresses tables via Catalog → Schema → Table.


Excel upload without chaos: Streamlit as a controlled entry point

As outlined above, many reporting projects fail not because of complex data platforms, but due to seemingly small issues like Excel files with inconsistent templates, manual adjustments, or missing metadata.

For this reason, the upload process was intentionally designed as part of the solution.

The Streamlit application provides:

  • a simple, guided upload process for business users

  • standardized storage in a Databricks Volume

  • mandatory metadata (period, source) and validation checks for missing values and data quality issues to ensure proper loading and historization

  • full traceability of which file populated which tables


This significantly reduces later discussions about “which file was the correct one” and stabilizes the overall data ingestion process.
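A pure validation helper is one way such an upload app can enforce mandatory metadata before a file is accepted; the Streamlit form would call it and only write to the Volume when no errors remain. Field names and the period format rule below are hypothetical.

```python
REQUIRED_METADATA = {"period", "source"}  # hypothetical minimal set

def check_upload_metadata(metadata: dict) -> list:
    """Return a list of human-readable errors; an empty list means the upload
    may proceed. Keeping this logic out of the UI makes it easy to test."""
    errors = []
    for field in sorted(REQUIRED_METADATA):
        if not metadata.get(field):
            errors.append(f"missing required metadata: {field}")
    period = metadata.get("period", "")
    # Expect periods like '2024-03' (YYYY-MM); purely illustrative format rule.
    if period and not (len(period) == 7 and period[4] == "-"
                       and period[:4].isdigit() and period[5:].isdigit()):
        errors.append(f"period not in YYYY-MM format: {period}")
    return errors

ok = check_upload_metadata({"period": "2024-03", "source": "BU EMEA"})
bad = check_upload_metadata({"period": "03/2024"})
```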


Orchestration: notebooks for logic, workflows for operations

Data processing is implemented in Python and SQL notebooks, orchestrated via Databricks Workflows (Jobs). This setup clearly separates development and operations: notebooks are used to implement and evolve transformation logic, while workflows handle scheduling, dependencies between processing steps, as well as monitoring and error handling.
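The dependency behavior that Workflows provides can be sketched as a tiny scheduler: a task runs only after everything it depends on has finished. Task names below are illustrative, not the project's actual job definition.

```python
def run_order(tasks: dict) -> list:
    """Resolve a simple task dependency graph into a valid execution order,
    mirroring how a Workflows job respects 'depends_on' between tasks."""
    order, done = [], set()
    while len(done) < len(tasks):
        progressed = False
        for name, deps in tasks.items():
            if name not in done and all(d in done for d in deps):
                order.append(name)
                done.add(name)
                progressed = True
        if not progressed:
            raise ValueError("cyclic dependency")
    return order

# Bronze → Silver → Gold, with a data-quality gate before Gold
tasks = {
    "load_bronze": [],
    "transform_silver": ["load_bronze"],
    "quality_checks": ["transform_silver"],
    "build_gold": ["quality_checks"],
}
order = run_order(tasks)
```

Placing the quality gate as its own task is a design choice worth noting: a failed check stops the job before Gold is rebuilt, so reporting never silently consumes a bad load.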

For recurring processes, such as monthly data loads, it is not only important that pipelines run successfully, but also that errors and data states are reproducible. A stable data foundation and a transparent loading process are therefore essential for reliable reporting and consistent KPIs.

  

Multi-dimensional market analysis

Analyses can be performed across:

  • geographies (countries and regions)

  • sales organizations

  • sales teams

  • business units

This makes it possible to answer questions that previously required significant manual effort: Which regions are actually growing? Which organization delivers what level of performance? Where are opportunities, and where are risks emerging?


Integrated planning and forecasting

Actuals, budget, and multiple forecast versions are consolidated into a single structure. This enables consistent and historized variance analyses and budget vs. actual comparisons, including clean performance tracking over time.
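A budget-vs-actual comparison of this kind reduces to a small, central calculation. The figures and field names below are illustrative only.

```python
def variance(actual: float, budget: float) -> dict:
    """Variance record as it might appear in the Gold layer: absolute variance
    and variance in percent of budget (None when budget is zero)."""
    abs_var = actual - budget
    pct_var = (abs_var / budget * 100) if budget else None
    return {"actual": actual, "budget": budget, "abs": abs_var, "pct": pct_var}

# One BU, one month — illustrative figures only
v = variance(actual=1_150_000.0, budget=1_000_000.0)
```

Defining this once in Gold, rather than per Power BI file, is exactly what keeps variance figures identical across group, BU, and country views.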


Global standardization

Unified country and regional hierarchies ensure that analyses remain internationally comparable, without each entity introducing its own definitions.


Price and margin transparency

In addition to gross and net sales, price components and deductions (e.g., logistics, bonuses, discounts) are included. This enables much clearer analysis of margin drivers, discount effects, and pricing logic.
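The gross-to-net bridge implied here is simple to state explicitly. The deduction categories mirror the ones named in the text; the amounts are made up.

```python
def net_sales(gross: float, deductions: dict) -> float:
    """Gross-to-net bridge: net sales = gross sales minus all deduction
    components (e.g., logistics, bonuses, discounts)."""
    return gross - sum(deductions.values())

bridge = {"logistics": 20.0, "bonuses": 35.0, "discounts": 45.0}
net = net_sales(gross=1_000.0, deductions=bridge)
```

Keeping each deduction as its own component, rather than a single net figure, is what makes margin drivers and discount effects analyzable afterwards.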


Typical challenges – and how they were addressed

The complexity of such platforms rarely comes from dashboards, but from real-world sales and organizational structures.


Key aspects in the project included:

  • customer segments and hierarchies (direct customers vs. distributors)

  • multi-currency transactions and FX effects

  • delivery terms (Incoterms) as cost and risk drivers

  • consistent business unit assignments

  • heterogeneous data sources and Excel-based inputs


A clearly defined validation and data quality approach in the Silver layer—combined with mapping logic and structured correction processes—ensures that data issues are identified early and not only discovered during management reporting.

  

Operations, cost, and governance

A practical advantage of the platform is its operating model: compute resources are used on demand (pay-as-you-go), rather than running continuously.

The Medallion structure also reduces rework, as raw data, cleaned data, and curated KPIs are clearly separated. Governance is managed via Unity Catalog with role-based access: Bronze and Silver layers are more restricted, while the Gold layer is more broadly accessible for reporting purposes.

Together with original files, load logs, and metadata, this creates strong traceability—an essential foundation for ensuring that numbers are not only available, but also trusted.


Conclusion

The platform delivers more than just reports—it provides a stable foundation for decision-making: scalable, traceable, and consistent in its KPI logic.

In day-to-day operations, the biggest impact is visible in two areas: monthly data is automatically available in reporting after upload, and international structures can be analyzed transparently.

As a result, reporting shifts from a recurring operational task to a tool for market steering, pricing decisions, and reliable forecasting.

 
 
 
