Databricks vs. Microsoft Fabric

Paula Guerra Toni
Jan Ole Munstermann
18.06.2024

Modern data platforms put to the test

Databricks and Microsoft Fabric are currently among the most important players in the data technology landscape. Microsoft Fabric in particular has caused quite a stir with its new approaches. In this blog post, we take a look behind the scenes and briefly compare the core functionalities of the two tools.  

Databricks has been making a name for itself among data platforms for over 10 years. Many companies use it as a unified analytics platform that is particularly well suited to big data workloads and integrates smoothly with Apache Spark. Databricks is known for its cloud adaptability and scalability as well as for making collaboration easier, and it has set high standards in these areas.

But the world of technology is constantly evolving, and the next big change came in 2023: Microsoft unveiled Microsoft Fabric, a solution designed to greatly simplify data processing for companies. This new tool combines functions from well-known services such as Power BI, Azure Synapse and Azure Data Factory in a unified platform. It has confidently entered the data platform stage and quickly secured a strong position among its competitors. It is therefore worth looking at how it compares to Databricks. To do this, we have taken a closer look at the core functionalities of both tools:

Integration:

  • Databricks is designed to work seamlessly with leading cloud providers such as AWS, Microsoft Azure and Google Cloud. It provides connections to a variety of data sources and BI tools.  
  • Microsoft Fabric can currently be connected to 135 different sources and is characterized by its seamless integration with Microsoft's native services. At its core, OneLake serves as a central data repository that brings together data from various sources.  

Compute:

  • Databricks uses Spark clusters with dynamic scaling.
  • Microsoft Fabric has introduced the concept of on-demand Spark compute via the "Spark Compute Platform" and also offers T-SQL and KQL as additional compute engines.

Security:

  • Databricks prioritizes security at every level of its platform, offering data encryption at rest and in transit, role-based access control and network isolation via a virtual private cloud (VPC).
  • Microsoft Fabric pursues the "define-once, enforce-everywhere" strategy for consistency across all compute engines. It also relies on data lake-integrated security with hierarchical authorizations.

Costs:

  • In Databricks, billing takes place via "Databricks Units" (DBU) according to the pay-per-use principle. In addition, there are the costs of the respective cloud provider and storage costs.
  • Microsoft Fabric bills on the basis of "Capacity Units" (CU) in a pay-as-you-go model, which essentially means reserving a certain number of CUs for a certain period of time. The costs for storage in OneLake are added on top.
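
To make the difference between the two billing models more tangible, here is a minimal sketch of the arithmetic described above. It is only an illustration: the class names, field names and all rates are hypothetical placeholders we chose for this example, not real Databricks or Microsoft prices, and real invoices depend on workload type, region and pricing tier.

```python
from dataclasses import dataclass


@dataclass
class UsageBasedCost:
    """Pay-per-use model: consumed units plus cloud provider and storage costs."""
    dbu_consumed: float      # DBUs consumed in the billing period
    price_per_dbu: float     # hypothetical rate, not a real list price
    cloud_infra_cost: float  # cost charged by the cloud provider (e.g. VMs)
    storage_cost: float      # cost of the underlying storage

    def total(self) -> float:
        return (self.dbu_consumed * self.price_per_dbu
                + self.cloud_infra_cost
                + self.storage_cost)


@dataclass
class CapacityBasedCost:
    """Capacity model: a fixed number of units reserved for a period, plus storage."""
    capacity_units: int       # number of CUs reserved
    price_per_cu_hour: float  # hypothetical rate, not a real list price
    hours_reserved: float     # how long the capacity is kept running
    storage_cost: float       # cost of storage (OneLake in the Fabric case)

    def total(self) -> float:
        return (self.capacity_units * self.price_per_cu_hour * self.hours_reserved
                + self.storage_cost)


if __name__ == "__main__":
    # All numbers below are made up purely to show the calculation.
    usage = UsageBasedCost(dbu_consumed=1000, price_per_dbu=0.5,
                           cloud_infra_cost=300.0, storage_cost=50.0)
    capacity = CapacityBasedCost(capacity_units=64, price_per_cu_hour=0.25,
                                 hours_reserved=100, storage_cost=50.0)
    print(f"Usage-based total:    {usage.total():.2f}")
    print(f"Capacity-based total: {capacity.total():.2f}")
```

The point of the sketch is the shape of the two formulas: in the usage-based model the bill follows what was actually consumed, whereas in the capacity model it follows how much capacity was reserved and for how long, regardless of utilization.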

Continuous Integration & Continuous Deployment (CI/CD):

  • Databricks integrates robustly with a range of Git providers and supports versioning of notebooks.
  • Microsoft Fabric currently only has Git integration with Azure DevOps Services, which allows resources such as notebooks, reports and datasets to be versioned.

You can find details on the individual points in our current white paper.

Conclusion:

Both Databricks and Microsoft Fabric are developing into leading unified data platforms that efficiently support the entire data journey. Microsoft Fabric stands out by offering additional compute engines such as T-SQL and KQL, even though Spark clusters in Microsoft Fabric are comparatively less configurable.

Microsoft Fabric streamlines the use of notebooks and minimizes the waiting time for compute resources to be provisioned. Databricks, on the other hand, is a leader in data visualization capabilities within its notebooks. The most outstanding feature of Microsoft Fabric is its seamless integration with Power BI directly on OneLake, which enhances data analysis capabilities. Microsoft Fabric also excels at fostering collaboration and offers multiple ways of working, such as Dataflow Gen2 and/or Data Pipelines.

Databricks takes a more code-centric approach and is primarily aimed at data experts, while Microsoft Fabric, with its user-friendly interface, is also suitable for a broader group of people who want to take on the challenges of a modern data landscape.

To help you choose the right platform, we recommend our white paper with extensive details on the core functionalities listed as well as further information on the various data expert roles and a real data engineering use case.


Blog post author

Paula Guerra Toni
Data Engineer
celver AG

Paula is a data engineer at celver and, in addition to the classic tasks of a data engineer, is particularly involved in data analysis and the architecture and provision of cloud environments. She is constantly acquiring new knowledge in these areas and actively implements it in her work. Paula has extensive project experience with various tools and technologies - especially within Databricks.

Jan Ole Munstermann
Data Engineer
celver AG

Jan Ole works as a Data Engineer at celver and focuses mainly on cloud environments in Databricks and Azure Synapse. He is responsible for extracting customers' source data and making it available in various data marts within these tools using data pipelines. In this context, he is constantly on the lookout for innovative technologies that enable him to make his day-to-day work even more productive.
