BIDA0059 – BIDA Multi Level Data Model Example in Power BI

Please watch the video in 1080p because it is recorded in high resolution.
Please note the script of the presentation is below so you can read along with the demonstration if you like.


Hello and welcome to our latest blog post.

I am Sofie, your BIDA AI Assistant.

I will be reading this blog post for you today on behalf of Mihai.

I am really pleased to be able to bring you some news that I think will really interest you.

If you are a Power BI user, you will be interested in this blog post.

In our prior blog post we showed you how to get Power BI to read a multi level data model.

Of course, because of the limited screen space we needed to limit what we put on the video and into the blog post.

Today we are pleased to announce the release of an actual Power BI report file that contains a very good example of a multi level data model being read in Power BI.

You can download the example report using the button below.

Power BI Report

Of course you do not have access to our test database.

So, when you open the Power BI report you will get a message saying it cannot find the database.

However, you will still be able to open and study the data model.

In this example we have set all tables to be read directly from the database, as we do not want to distribute any of the test data.

This data model is built over the top of our Business Central data warehouse that remains under development.

We are going to release Business Central DirectQuery Power BI reports prior to releasing the updated Business Central data warehouse.

So, as we say: on with the demonstration.

Demonstration

On the video at the moment you can see the top left-hand portion of the data model.

You can see that to get the table names onto the screen we have collapsed all the tables.

Obviously, you can open them and look at the tables.

Unfortunately, the columns are in alphabetical order and not the order they appear in the tables themselves.

However, that is a small limitation.

To give you an idea of how this model was created, we will explain briefly.

One. We created a second test database from our development data warehouse.

Two. We copied a limited amount of data from our development data warehouse to the test database, based on views.

The views were replicated to the target test database as tables.
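
As a rough illustration only, here is the shape of that replication in T-SQL, using hypothetical database, view and table names rather than our actual objects:

```sql
-- A minimal sketch of step two, with hypothetical object names:
-- DevDW is the development warehouse, TestDB the second test database.
-- Each limiting view is materialized as a table in the test database.
SELECT *
INTO   TestDB.dbo.dim_product      -- created as a real table in TestDB
FROM   DevDW.dbo.v_dim_product;    -- view restricting the rows and columns copied
```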

Three. We then removed columns that were not wanted in various tables.

This includes removing columns that belong to a different level of the hierarchy.

For example, we removed the week, month, quarter and year data from the day table.

We removed the product group and product category data from the product table.

We removed the city, state, country and continent data from the geography table.
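
Again as a rough illustration with hypothetical column names, the removal itself is just a series of statements like these:

```sql
-- A sketch of step three: drop the higher level attributes so each
-- dimension table carries only its own level (hypothetical columns).
ALTER TABLE TestDB.dbo.dim_day
    DROP COLUMN week_name, month_name, quarter_name, year_name;

ALTER TABLE TestDB.dbo.dim_product
    DROP COLUMN product_group_name, product_category_name;
```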

If you have been involved in any level of data modeling at any time, you will immediately understand that this is what is commonly known as a snowflake schema.

But remember, the actual data warehouse that it queries is a star schema, not a snowflake schema.

So the actual data warehouse retains the performance of a star schema, albeit just slightly degraded.

Four. We added all the extra rows of dummy data needed to each of these new dimension tables.

Five. We added all the referential integrity constraints.

SQL Server checks that the existing data satisfies a referential integrity constraint at the time the constraint is created.

This meant we had to use realistic test data and not just generated data.
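
A minimal sketch of this step, with hypothetical table and column names:

```sql
USE TestDB;
GO

-- WITH CHECK (the SQL Server default) validates every existing row
-- against the new constraint as it is created, which is why the test
-- data had to be realistic rather than randomly generated.
ALTER TABLE dbo.fact_sales_transaction WITH CHECK
    ADD CONSTRAINT fk_sales_product
        FOREIGN KEY (product_key)
        REFERENCES dbo.dim_product (product_key);
```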

Six. We went through the model turning off the default summarization option on fields that should not be summarized.

Seven. We went through and hid the fields in the model that we want hidden.

This then creates the sort of data model you can see on the screen now.

Because you have the actual report and you can open this data model for yourself we will just give a brief overview here.

This sort of data model is very standard across MicroStrategy accounts.

Anyone who has used MicroStrategy will immediately recognize this sort of model.

The only difference between how we implement our data warehouses and how MicroStrategy implements their data warehouses is that we use Ralph Kimball's idea of multi level tables.

We do not claim to have invented this idea ourselves, since it is widely known that Ralph Kimball invented it in the early nineteen eighties.

We have other demonstrations on how to navigate multi level data models in Ralph Kimball's Meta5 software.

We had believed it was not possible to navigate these models in Power BI.

But you never know until you try something.

And we are very pleased we have come up with this way of putting Power BI right on top of our multi level data models.

In the top left-hand corner you can see the time hierarchy.

The only table that might not be obvious to you is the total table.

This is a common feature in MicroStrategy models to give the starting point for drilling.

It contains one field with the text 'Total' in it.

It is not needed for Power BI but someone might find a way to use it.
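
For the curious, the whole table can be sketched in two statements, again with hypothetical names:

```sql
-- A sketch of the total table: one row, one text field, serving as
-- the top of every drill hierarchy.
CREATE TABLE dbo.dim_total
(
    total_key  int         NOT NULL PRIMARY KEY,
    total_name varchar(10) NOT NULL
);

INSERT INTO dbo.dim_total (total_key, total_name)
VALUES (1, 'Total');
```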

You can see the product hierarchy.

You can see the geography hierarchy.

And then, as we scroll down, you can see many dimension tables that you would expect to see for sales transactions.

The currency, the account, the status, the units of measure and similar.

The bottom line of the model is the campaign hierarchy.

Mostly campaigns are used for marketing campaigns, but they can be used for other things.

Now I will scroll up and to the right.

At the top left you can see the sales transaction table.

This is a very normal sales transaction record that you would find in any data warehouse for any retailer.

It is linked to everything we can manage to link it to.

It contains the name of the product, the timestamp of the sale, the unit and extended amounts, the number of units and all the usual fields for a retail sales record.

We have included five summary level tables that are all at different levels of summary.

On the right hand side we have put all the demographic information tables.

This includes demographic information for companies in the case where it is a company buying the retail product and not an individual person.

There are many retailers who will issue loyalty cards to a company.

In these cases any authorized person from the company can come into the retailer and use the company loyalty card when making purchases.

I do not believe it is necessary to explain to you what all the demographic analysis tables are.

If you have been in retail data analysis for any length of time you have seen that retailers who have loyalty cards want to analyze sales by as many personal attributes of the customer as possible.

In the middle of the screen you can see the all demographics table.

You can see the keys on it.

They have obvious names.

In the middle at the top you can also see that the individual customer record is available as well.

When we scroll down a little you can see the tables further down.

You can see that we have five summary level fact tables over the top of the sales transactions.

In this particular demonstration we have not created partitions and incremental updates of the summary fact tables.

From the prior blog post it will be obvious that this could be done.

I will just open up the Manage aggregations window on the number three summary table.

It is obvious that they are all set up in a similar way.

You can see that the aggregate navigation has been created for such fields as cost extended amount and discount extended amount.

We will scroll down.

You can see that the aggregate navigation has been created for sale extended amount, sale units and taxes.

Now I will close the Manage aggregations window.

Where possible, all joins were defined with referential integrity assumed, using the 'Assume referential integrity' relationship setting.

This means that Power BI will generate the joins as inner joins.
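
To illustrate what that means for the generated SQL, here is a rough sketch with hypothetical table and column names; it shows only the shape, not Power BI's exact output:

```sql
-- Illustrative shape only. Without assumed integrity this would be a
-- LEFT OUTER JOIN; with it, an INNER JOIN, which the database engine
-- can optimize more aggressively.
SELECT     p.product_name,
           SUM(s.sale_extended_amount) AS sale_extended_amount
FROM       dbo.fact_sales_transaction AS s
INNER JOIN dbo.dim_product AS p
           ON p.product_key = s.product_key
GROUP BY   p.product_name;
```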

The precedence of the fact tables has also been set.

We would suggest that the precedence be the same as the view numbers.

We have tested this.

Power BI really does generate the SQL to get the data from the highest precedence summary table that satisfies the query.

So as you drill down in the data and need more detailed data, Power BI will go to the level of summary required.
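
As a rough sketch of what that looks like, with hypothetical table names such as fact_sales_summary_3 and dim_month:

```sql
-- Illustrative shape only. A visual asking for sales by month is
-- rewritten against the highest precedence summary table whose columns
-- satisfy the query, instead of the detailed transaction table.
SELECT     m.month_name,
           SUM(f.sale_extended_amount) AS sale_extended_amount
FROM       dbo.fact_sales_summary_3 AS f   -- summary fact table
INNER JOIN dbo.dim_month AS m
           ON m.month_key = f.month_key
GROUP BY   m.month_name;
```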

One thing we wanted to note for you is this.

In our experiments we found that the dimension tables needed to be either all dual or all database only.

We found that if we had some dimension tables as database only and some as dual, we got invalid results.

Now we are not saying that this is a bug in Power BI.

There could very well have been some mistake we were making.

Of course you do not want to load your whole customer table and your all demographics table into Power BI.

These two tables are very large at large retailers.

You want them to be in the database only.

You can see that the marital status field is in summary table number four.

So if you wanted to do analysis by marital status you can use that summary table.

You will notice that fields like marital status or age band are not in the sales transaction table.

This is because on the sales transaction table we have a field called customer demographics detailed key.

This customer demographics detailed key links to the all demographics table.

The all demographics table does not contain one row for each customer.

The all demographics table contains one row for each combination of demographics that exist in the customer database.

In the past we used to generate things like the all demographics table to be all possible combinations of all attributes, regardless of whether anyone in the customer database matched that combination.

One of the many features of how we write ETL is that we only generate customer analysis dimensions based on what combinations actually exist.

We then check those combinations every day to detect if a new combination has come into existence.

Obviously that is a very compute-intensive operation, which is why it was not done years ago.
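
To make that concrete, here is a minimal sketch of the daily check, with hypothetical table and column names, and assuming the dimension key is an identity column:

```sql
-- Insert only the combinations of demographic attributes that occur in
-- the customer table and are not yet present in the all demographics
-- table.
INSERT INTO dbo.dim_all_demographics (marital_status, age_band, gender)
SELECT DISTINCT c.marital_status, c.age_band, c.gender
FROM   dbo.customer AS c
WHERE  NOT EXISTS
       (SELECT 1
        FROM   dbo.dim_all_demographics AS d
        WHERE  d.marital_status = c.marital_status
          AND  d.age_band       = c.age_band
          AND  d.gender         = c.gender);
```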

One last thing we would like to show you in this demonstration is the report view.

So we will just go to report view now.

As you can see in the top right-hand corner there is only one table for sales transactions.

The other five tables are hidden.

When I open up the sales transactions you can see that the fields from the detailed level sales transactions are listed.

The aggregation manager will manage the querying of fields that are in the summary level tables.

So if the field sale extended amount is selected then it can go to the summary table.

But if the sale unit amount is selected then it must go to the detailed level transaction records.

The business user has no way to know this as far as we know.

He or she will just see varying response times.

Next, I would like to make just a few more closing comments.

Our BIDA ETL software manages these multi level summary tables with pretty much zero extra work.

Adding a new summary into production takes less than an hour's work.

The summaries are all kept in the same table and so there are no extra objects to create.

We just like to use hand-written SQL for summaries, and that needs to be updated.

The time to create this ETL is so little there is no real sense in trying to automate it.

There is a control table to update as well.

In these days of SQL Server having data compression and running on SSDs, there is no real limit to the number of summaries you could have for a detailed transaction table.

This leaves us with the question of how much data should be put into Power BI data sets and how much should be left in the data warehouse.

You will have to decide that for yourself.

However, Microsoft is pushing Fabric and Power BI can talk directly to Fabric.

Fabric will be so fast that if multi level tables like this are stored in Fabric, and not also moved to Power BI datasets, then the performance difference will be very small.

This will be especially true where the report is set to open up at a high summary level and then the drill down is through user interactions.

In the past the issue has been that most people have had only one level of summary in the Power BI dataset.

So the report had to be opened and refreshed at the high level it opens at.

And this refresh was from data stored at the lowest level of detail the report would ever require.

This makes for slow opening times of Power BI dashboards.

By having multiple levels of summaries the highest level of summary can be the level that the report opens at.

In this case the reading of data from Fabric, or even from SQL Server, may be the same as, or even faster than, opening the same dashboard from a low level summary inside the Power BI dataset.

The obvious advantage is that if Power BI is reading data from the data warehouse directly then there is no need for the Power BI dataset to be refreshed and stored in yet another location.

This level of performance analysis has to be investigated on an individual basis for each company.

The developers of each dashboard will ultimately decide what data will be loaded into the Power BI datasets and how that refresh will occur.

From our perspective it looks more and more like the longer term future is for Power BI to read data directly from both column-based databases and SQL Server data warehouses with column store indexes.

The reason we think SQL Server will still be very widely used is that we expect standard edition to remain much cheaper than the same volumes and workload on Fabric.

However, we are not experts in that area so please consult Microsoft or better informed partners on the relative costs of SQL Server and Fabric performing the same workload.

Just in case you have not heard: in SQL Server 2025 we are promised that SQL Server can replicate to Fabric.

We don’t know what editions of SQL Server will have this feature.

But with this feature it makes a great deal of sense to keep ETL on SQL Server and to just replicate the finished data to Fabric.

This keeps your entire ETL load on SQL Server and provides you an automatic backup of your Fabric data warehouse.

Given your SQL Server can then be on premises or in Azure, you have the best range of options to choose from.

In Summary

What we have given you today is an actual Power BI report that contains a fully working example of multi level data model navigation by Power BI.

It is a simple example of sales transactions with five levels of summaries defined over them.

It is taken from our work on our Business Central Data Warehouse.

You are welcome to learn from this example.

You are very welcome to share it with all your friends.

The more people who see this demonstration the better.

Over the years we have found these sorts of models to be quite a lot easier and cheaper to manage than the equivalent models created by MicroStrategy.

And with that, I would like to say thank you very much for watching our video today.

I really appreciate your time and attention.

We hope you found this blog post interesting and informative.

We would love to hear your comments so please feel free to send us comments on our contact page.

I wish you a great day.

Thank you.

Sofie.

Your BIDA AI Assistant.

We will be emailing our subscribers useful and valuable information regarding business intelligence. Please subscribe here to be on our general emailing list.