BIDA0009 – Navision BI Sales Category


Hello and welcome to our latest blog post. We are pleased you have come back to see our latest news.

Our latest news is that we are now able to release details of our BIDA Navision BI Sales Category data models.

We continue to work on the development of the BIDA Navision BI Solution. As we have news of how that is progressing we will be releasing portions of the data models.

So please check back regularly or subscribe to our mailing list to be informed when new posts are put on our web site.

We use our BIDA ETL software to populate our BIDA Navision BI Data Models, and the portions we are releasing are “production ready”, meaning that you could take these portions of the models we are publishing and put them into production at your site.

We have written a detailed white paper on the BIDA Navision Sales Category data models and how we have implemented them.

If you would like to download a copy of this detailed white paper showing you exactly how we have designed the BIDA Navision Sales Category Sub Models?

Please just put your best email in the form below and we will email you a link to the download.

BIDA0008 – Navision Staging Area Access Details


Hello and welcome to our latest blog post. We are pleased you have come back to see our latest news.

We recently had to build an entirely new staging area for our Navision BI development partner. The reasons we needed to do this are not particularly relevant. It had to be done.

What is very relevant to you as a Navision user is that we were able to build a completely new instance of the staging area in just 4 working days. This was 1,867 tables and 35,591 fields from a Navision 2009 instance.

This new staging area has now been in production for more than a month and it is running smoothly. With the smaller Navision instance to scan for changes, the extract processing time is down from around 30 minutes to 5-6 minutes.

We were rather surprised that the extract times are so fast, but without 7 years of transaction history these are the extract times we are seeing. It takes about another 10 minutes to apply the updates to the staging area. So we are seeing extract and update times to the staging area in the 15-20 minute range, depending on the volume of transactions of course.

We have written a short white paper on how such staging areas can be accessed by a product we happen to like called Meta5. We hope you like it!

If you would like to download a copy of this detailed white paper showing you exactly how we are able to access the Navision Staging Areas we build with Meta5?

Please just put your best email in the form below and we will email you a link to the download.

BIDA0007 – Navision Staging Area Construction Details


Hello and welcome to our latest blog post. We are pleased you have come back to see our latest news.

Our latest news is that we recently had to build an entirely new staging area for our Navision BI development partner. The reasons we needed to do this are not particularly relevant. It had to be done.

What is very relevant to you as a Navision user is that we were able to build a completely new instance of the staging area in just 4 working days. This was 1,867 tables and 35,591 fields from a Navision 2009 instance.

The first time we did this, 3 years ago, it took us about 10 days of work. Then again, that was the first time we had ever undertaken building such a large staging area with the new techniques we had developed. And we had to write two functions for field and table name mappings that took quite some time.

This time it went much more smoothly. All work was completed and the staging area was ready to go in just 4 working days. We did have to make a few other small changes and fix problems we found as we tested, so you could call it 5 working days, or a week's work.

We have written a detailed white paper on exactly how we did this.

If you would like to download a copy of this detailed white paper showing you exactly how we were able to build a proper, production-ready, incremental-extract Navision staging area in just 4 working days?

Please just put your best email in the form below and we will email you a link to the download.

BIDA0006 – Microsoft Navision BI Solution


In this blog post we wish to provide a short explanation of our Microsoft Navision BI Solution, which is under development.

For the last 3+ years we have been providing 24×7 support for our Romanian development partner.

Today this development partner has 280+ retail stores and sells 300,000 to 400,000 items in their stores each day.

The data warehouse is on SQL Server Standard Edition, running on a 16-core server that holds both the staging area and the data warehouse.

The data goes back to 2012, and we are able to support sales and finance analytics back to 2012 based on this data warehouse.

The ERP they use is Microsoft's Navision.

We have also implemented a full staging area, taking all the data from their Navision ERP and placing it into the staging area on the data warehouse machine.

We take incremental extracts from the production system twice per day and run these through a delta detection process to apply inserts and updates to the staging area, keeping the staging area in sync with the production ERP system.

This means the customer has 100% of their ERP production data available in the staging area to query with any tool they please, and therefore no extra load is placed on the production system for ad-hoc and “one time” queries.

The ETL system was developed by another Romanian based company and was written with SQL Server stored procedures.

We have not replaced these stored procedures with our BIDA ETL. However, we have done new development using our BIDA ETL.

We migrated the processing schedules from purely stored procedures across to our BIDA Scheduler to be able to run portions of the ETL in parallel and reduce the overall ETL elapsed time.

Given these three years of experience we now have very good skills with Navision.

As a result we have decided to bring to market a Navision Business Intelligence Solution.

Our Navision BI Solution is under development on Navision Version 2009.

We have completed the development of the staging area and the main business entities for retailers.

For example we have the following fact tables completed in development:

  • Customer Ledger Header
  • Customer Ledger Txn
  • Fixed Asset Ledger Txn
  • GL Account Balance
  • GL Account Txn
  • Outlet Sale Header
  • Outlet Sale Txn
  • PO Receipt Header
  • PO Receipt Line
  • Return Header
  • Return Line
  • Sale Txn
  • Store Payment Component Txn
  • Vendor Ledger Header
  • Vendor Ledger Txn

We have also implemented the mechanisms for managing summaries for sales with incremental updates of the summaries.

The proposal for the Navision BI Solution is to move ALL DATA from the Navision ERP rather than to move a limited amount of data.

We will deliver the initial suite of reports in Excel using the Meta5 product, because Excel is pervasively used by the Navision customers we have talked with.

During our start up phase we are going to limit the number of customers for our Navision BI Solution to five.

We will focus on our first five customers to get them implemented and supported properly while we hire and train some of the “best and brightest” graduates to support our local Romanian customers.

We will soon be releasing marketing materials on our Navision BI Solution.

If you are running any version of Navision and you would like to ask us any questions about our planned Navision BI Solution?

Please put your best email address in the form below.

You have space to ask any question you would like on the form as well.

BIDA0005 – Data Analysis for Business Improvements


One of the most common complaints that we hear about Business Intelligence systems is:

“We are sure there is more value we can get from the data warehouse but we just can’t seem to get that value out of it. We are not sure why that is.”

One of the problems that we have seen over the years is that there is much more focus on producing “more reports” and less focus on “how do we make more profit using the information we have”.

The reason for this is that the vendors make more revenue by having you buy more client licenses and running more reports against the data warehouse.

So the marketing from vendors is all about increasing the number of users of the data warehouse, not improving your profit.

With our many years of experience we understand many areas of businesses which can be optimised to improve the profitability of companies.

In particular we have a great deal of experience in marketing data warehouses and helping our customers cross sell and upsell products and services to their existing customers.

We are able to use a variety of tools to perform this sort of analysis.

One tool we would mention for your consideration is Meta5 (www.meta5.com).

This is a very useful suite of tools for “power analysts”. Recently Meta5 released a single-user version that is affordable for Romanian companies.

The Meta5 tools are very good for performing ad-hoc exploratory analysis to find new business opportunities.

If you are interested in discussing our experience in data analysis that actually leads to business improvement and more profit?

Please put your best email address in the form below.

You have space to ask any question you would like on the form as well.

BIDA0004 – ETL Support – Migration


In this blog post we wish to explain our position today as to what we can offer for ETL support or even possible migration of your ETL system to our BIDA ETL software.

As we have stated, we are currently in start up mode and we are now actively searching for the “best and brightest” graduates to hire and train.

However, we have provided 24×7 ETL support for our Romanian development partner for the last 3+ years.

Today this development partner has 280+ retail stores and sells 300,000 to 400,000 items in their stores each day.

The data warehouse is on SQL Server Standard Edition running on a 16 core server for the data in the staging area and data warehouse.

The time span of the data goes back to 2012 and we are able to support analytics for sales and finance back to 2012 based on this data warehouse.

The ERP they use is Microsoft's Navision.

Today, even in our start-up mode, we are in a position to offer 24×7 support for Romanian companies that run the Navision ERP and have their existing data warehouse on SQL Server.

Over the 3+ years of our support relationship with this customer, our work has included 2 hardware upgrades and one SQL Server version upgrade.

We have also implemented a full staging area, taking all the data from their Navision ERP and placing it into the staging area on the data warehouse machine.

We take incremental extracts from the production system twice per day and run these through a delta detection process to apply inserts and updates to the staging area, keeping the staging area in sync with the production ERP system.

This means the customer has 100% of their ERP production data available in the staging area to query with any tool they please, and therefore no extra load is placed on the production system for ad-hoc and “one time” queries.

The ETL system was developed by another Romanian based company and was written with SQL Server stored procedures.

We have not replaced these stored procedures with our BIDA ETL. However, we have done new development using our BIDA ETL.

We migrated the processing schedules from purely stored procedures across to our BIDA Scheduler to be able to run portions of the ETL in parallel and reduce the overall ETL elapsed time.

We wish to make it public that we are in a position to offer 24×7 support for Navision data warehouses, because we have already been doing this for 3 years.

We have very good skills and experience with Navision to back up that offer.

Of course, before you would consider any such proposal you will want to get to know us and work with us on some of our earlier service offerings.

We would be pleased to have the opportunity to establish with you and your peers in your company that we can provide you 24×7 Navision data warehousing support.

If you are running your ETL system for Navision with such tools as SSIS, DataStage, Informatica or stored procedures, then we are also in a position to migrate your current ETL system across to our BIDA ETL system.

BIDA ETL will reduce your daily ETL elapsed time, improve the reliability of your ETL, and reduce the total cost of ownership of your data warehouse.

BIDA ETL will do this by reducing the amount of effort in maintenance and support of your ETL system.

If you are running any version of Navision and you would like to ask us any questions about what we are able to offer in this area?

Please put your best email address in the form below.

You have space to ask any question you would like on the form as well.

BIDA0003 – Data Model Improvements


The data warehouse data model is the foundation of the BI Solution for any company.

In our many years of experience we consistently see poor data models implemented even by large companies that really should know better.

Poor data models can cause issues for BI systems for many years. And once large numbers of reports and applications have been built on top of the data models, they are very difficult and expensive to migrate to a better set of models.

Poor data models can live on in production for many, many years causing frustration and lost time and money for the users of the system and the developers who support the system.

It is much better to deliver the first version of the data warehouse a little later with better data models than to deliver sooner with poor data models.

With our many years of experience we have developed our own suite of data models which have a large number of specific design features to make them excellent data models.

Here in Romania we do not expect many companies to wish to adopt our data models because most companies that need a data warehouse already have one in production.

Migration to our models would be a large undertaking that only a few companies would be willing to go ahead with.

However, if you have a data warehouse in production and you have questions about what improvements could be made to your data models?

Then we would be happy to provide you the free service of reviewing your data models and preparing a list of recommended improvements.

Of course, this service would be provided at no charge to prospective customers as part of our process of establishing our credibility to you and your peers in your company in the area of data models.

In the near future we will be updating this post to talk about “common data model mistakes”.

If you are on our email list you will be notified by email when we update this post.

If you would like to ask us questions about your data models and your thoughts on what you believe might be issues in your data models?

Then please feel free to put your best email address in the form below.

You also have a space for asking us any question you would like in the form.

BIDA0002 – ETL Performance & Reliability Improvement


The two most common descriptions of ETL Systems are:

  • It takes too long
  • It costs too much

These two descriptions apply equally when describing the development of ETL as well as the run times of the ETL subsystem.

If you have described your ETL system as “it takes too long” or “it costs too much”?

Then this blog post is for you.

In this blog post we describe some of the common issues that we have seen with ETL subsystems.

Firstly we will discuss some of the run time issues.

Run Time Issues for ETL

The number one reason why ETL systems are slow in production is redundant processing.

This means that processing is being performed that is not necessary.

The usual cause of redundant processing is that the developers of the ETL system did not know how to write the more complicated code to reduce, or eliminate, redundant processing.

Some examples of redundant processing are as follows.

1. Extracting data that has not changed.

In many cases it is hard for developers to determine what data has changed in an operational system. So, during the development period, the developers simply extract all the data for those extracts where it is difficult to tell what has changed.

Under the pressure of moving the data warehouse into production, these full extracts often go into production and remain there until they cause too many problems and are fixed.

At the very least, the ETL subsystem should be built to perform “delta detection” prior to data being placed into the staging area, so that the only rows that go into the staging area are rows that have been inserted or updated. If a row has been deleted, the “row deleted flag” should be set on the staging area row. The row itself should not be deleted.

This “delta detection” code can be difficult to write in such tools as DataStage and Informatica, which is one more reason why it is very often not written.
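
To make this concrete, here is a minimal sketch of a delta detection step written in plain T-SQL. All of the names (ext.Customer for the incoming extract, stg.Customer for the staging table, RowHash, RowDeletedFlag) are illustrative assumptions for the sketch, not taken from any real Navision schema:

    -- 1. Insert rows that are new to the staging area.
    INSERT INTO stg.Customer (CustomerNo, Name, City, RowHash, RowDeletedFlag)
    SELECT e.CustomerNo, e.Name, e.City,
           HASHBYTES('SHA2_256', CONCAT(e.Name, '|', e.City)), 0
    FROM   ext.Customer e
    WHERE  NOT EXISTS (SELECT 1 FROM stg.Customer s
                       WHERE  s.CustomerNo = e.CustomerNo);

    -- 2. Update only the rows whose non-key columns actually changed.
    UPDATE s
    SET    s.Name = e.Name,
           s.City = e.City,
           s.RowHash = HASHBYTES('SHA2_256', CONCAT(e.Name, '|', e.City))
    FROM   stg.Customer s
    JOIN   ext.Customer e ON e.CustomerNo = s.CustomerNo
    WHERE  s.RowHash <> HASHBYTES('SHA2_256', CONCAT(e.Name, '|', e.City));

    -- 3. Flag, but never delete, rows that have disappeared from the source.
    UPDATE s
    SET    s.RowDeletedFlag = 1
    FROM   stg.Customer s
    WHERE  s.RowDeletedFlag = 0
    AND    NOT EXISTS (SELECT 1 FROM ext.Customer e
                       WHERE  e.CustomerNo = s.CustomerNo);

Comparing a stored hash of the non-key columns, rather than comparing every column individually, keeps the change test cheap even on wide Navision tables.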

2. Applying changes to data that has not changed

The mirror image of extracting data that has not changed is the ETL system is applying changes to data in the data warehouse where the data has not changed.

This very often happens when there is no ability to detect changes in rows properly, so more data than necessary is extracted, and then this data is applied to the target tables without performing a delta detection process either before the staging area or against the target data warehouse tables.

In these cases the updates are logged. If the rows are deleted and then re-inserted the deletes are logged and the indexes are updated.

In all cases the work required adds a large and unnecessary processing load.
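
The cure is the same delta detection idea applied on the way into the warehouse. As a hedged sketch, using the same invented names as above plus dw.DimCustomer for the target table, a T-SQL MERGE can be written so that rows which match but have not changed are simply left alone:

    -- Apply changes only where a compared column really differs.
    -- (NULL-safe comparisons are omitted here for brevity.)
    MERGE dw.DimCustomer AS t
    USING stg.Customer   AS s
          ON t.CustomerNo = s.CustomerNo
    WHEN MATCHED AND (t.Name <> s.Name OR t.City <> s.City) THEN
        UPDATE SET t.Name = s.Name,
                   t.City = s.City
    WHEN NOT MATCHED BY TARGET THEN
        INSERT (CustomerNo, Name, City)
        VALUES (s.CustomerNo, s.Name, s.City);

Unchanged rows fall through the MERGE untouched, so nothing is written to the log and no indexes are updated for them.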

3. Resummarising data for summary tables

The #1 thing that can be done to improve query performance in a data warehouse is to create summaries to support the queries that can be answered using aggregated data. The larger the volume of the transaction records the larger the benefit of maintaining summaries to answer some queries.

However, maintaining incremental updates of summaries is not a simple thing to do. And many companies develop ETL systems where the updates of the summary tables or cubes perform a lot of redundant processing.
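
One common pattern, sketched below with invented names (dw.SaleTxn for the detail, dw.SaleDailySummary for the summary, BatchId to identify the latest load), is to re-aggregate only the days touched by the latest batch rather than rebuilding the whole summary:

    DECLARE @BatchId INT = 12345;  -- the batch that was just loaded

    -- Remove the summary rows for the affected days only...
    DELETE d
    FROM   dw.SaleDailySummary d
    WHERE  d.SaleDate IN (SELECT DISTINCT SaleDate
                          FROM   dw.SaleTxn
                          WHERE  BatchId = @BatchId);

    -- ...and rebuild just those days from the detail.
    INSERT INTO dw.SaleDailySummary (SaleDate, StoreKey, ItemKey, Qty, Amount)
    SELECT t.SaleDate, t.StoreKey, t.ItemKey, SUM(t.Qty), SUM(t.Amount)
    FROM   dw.SaleTxn t
    WHERE  t.SaleDate IN (SELECT DISTINCT SaleDate
                          FROM   dw.SaleTxn
                          WHERE  BatchId = @BatchId)
    GROUP BY t.SaleDate, t.StoreKey, t.ItemKey;

At retail volumes like those described above, re-aggregating a day or two of sales is a very different workload from re-scanning years of history.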

4. Lack of proper parallelism of ETL processing.

Everyone wants their data delivered to their data warehouse, and cubes and reports, by 9am when the business day starts. One of the biggest complaints we see in ETL run time systems is “our data is not updated by 9am, we have to wait.”

Most ETL tools make it very difficult to parallelize the processing of batches effectively, so that the machine running the ETL can be run at high utilization during ETL processing.

Because it is difficult it is generally not well implemented.

Further, in batches that contain parallel steps, when one command in the batch fails it generally requires an ETL developer to make some form of manual change to the “code” that runs the batch in order to complete the batch once the problem is fixed.

This introduces the opportunity for manual errors to be introduced into that ETL processing, especially because these changes might be made in the middle of the night when the developer is not as mentally sharp as he might otherwise be.

Development Issues With ETL

We have had many years of ETL development experience across many tools, most notably DataStage and Informatica.

As good as these tools are, the development times of the ETL mappings are relatively slow.

The answer for most companies has been to reduce the cost of development by hiring cheaper people, especially from the Indian outsourcers such as Tata, Wipro, InfoSys and others. The booming IT developer market in India has been largely a result of companies sending development work to India and ETL development work is no different.

As a result of this trend the ETL tool vendors responded by creating graphical user interfaces that made it possible for relatively low-skilled, low-paid developers to develop ETL mappings.

This has reduced the cost of ETL development but it did not particularly reduce the time required to perform ETL development. It did not improve the quality of ETL development either.

During our start up phase we will support ETL developed in DataStage or Informatica. However, we would openly admit we are no faster than any other good DataStage / Informatica developers. This is because the nature of the tools drives the development time, not the skill of the user of the tools.

We took a different approach and we have our own BIDA ETL software.

We took the approach of “make highly skilled people more productive writing ETL” rather than “enable less skilled people to write the ETL.”

Not only does this reduce the cost of ETL development, it improves the quality of the ETL subsystem because the people doing the development work are more skilled and experienced. It is normal for someone with much more skill and much more experience to do higher quality work than a less experienced person. It’s why we pay them more.

So our approach was to find a way to enable highly skilled, highly paid developers to develop ETL faster, cheaper, better than less skilled people who were paid less. We wanted to find a way where the overall cost of such development was considerably less than hiring the “Indian Outsourcing Company”.

And this is what we have done.

The #1 feature of the ETL tools is that they have a graphical user interface to make the development of the ETL possible by less skilled, lower paid developers.

The #1 feature of our BIDA ETL tool is that we avoided having a graphical user interface so as to make the more highly skilled, more highly paid, developers more productive.

It really is as simple as that.

We know that we can develop and support ETL systems for much less cost than anyone else in Romania can. Indeed, we can build ETL systems so quickly and so cost effectively that we can migrate the finished BIDA ETL system to Informatica or DataStage, if that is what you want, and still perform that development work faster than anyone else.


Collapse of Data Structures Plus Processing Algorithms

Because the ETL tools rely on a graphical user interface in order to develop their mappings, it is necessary to collapse the data structures and the processing algorithms into one “program object” which we will call a “mapping”.

Many people do not understand what we mean by that so please allow us to explain.

When you write, say, a fact table mapping, you must define the source, you must define the target, and you must define the lookups that translate the dimension string keys from the source into the integer keys that will be put on the fact table. All fact table processing with dimensional models looks like this.

In ETL tools you must code those lookups inside the mapping along with any other processing you wish to do for the fact table processing. This means your algorithms for your lookup processing are inside the mapping.

Although this seems obvious and is therefore accepted by everyone, it is a serious issue that slows down development considerably.

Why? Because those same lookups, such as the lookups for dates, times, customers, products, geography, company structure and so on, are common across MANY fact tables. So the lookups must be coded by the ETL developer each time, for each lookup, for each fact table.

In a large data warehouse with 50+ fact tables you could be coding upwards of 500 such lookups, all of which have to be tested individually for correctness. And that takes developer time.

The far better way to develop ETL is to separate the data structures from the processing algorithm. This is, in fact, one of the most fundamental development strategies that has been taught for decades.

Always separate the data structures from the algorithms, so that the algorithm is coded only once but can be given many data structures to perform its processing on.

And this is the approach we have taken. In BIDA ETL the developer does not need to concern himself with the algorithm that will be applied to the data structures he is creating in mappings. The algorithm is separated from the data structures.

This provides a very large productivity boost, and that boost is linear in the number of mappings to be developed.

To explain simply?

With standard ETL tools using a graphical interface the ETL developer must code the algorithm for the ETL processing in to the graphical user interface as well as code the sources and target data structures for processing.

With BIDA ETL the ETL developer only codes the sources and target data structures and processing that is unique to that mapping. He does not code the algorithm for the processing of the data. That is separated and not in the hands of the ETL developer.

This is the fundamental difference that makes BIDA ETL development faster, cheaper, better and more reliable.
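
To show what we mean, here is a minimal, hypothetical sketch of the idea in T-SQL. The meta.FactLookup table and all of its columns are invented for the illustration; the point is that the lookup algorithm exists exactly once, and each new lookup on each new fact table is just one more metadata row rather than more hand-drawn mapping logic:

    -- One row of metadata per dimension lookup per fact table.
    -- (Assumes an existing "meta" schema; all names are illustrative.)
    CREATE TABLE meta.FactLookup (
        FactTable     SYSNAME,   -- e.g. 'dw.FactSaleTxn'
        StageColumn   SYSNAME,   -- natural key column arriving from staging
        DimTable      SYSNAME,   -- e.g. 'dw.DimCustomer'
        DimNaturalKey SYSNAME,   -- natural key column in the dimension
        DimSurrogate  SYSNAME    -- surrogate key column to set on the fact
    );

    -- The generic lookup algorithm, written once: build one UPDATE
    -- per metadata row for the fact table being processed.
    DECLARE @sql NVARCHAR(MAX) = N'';

    SELECT @sql = @sql
        + N'UPDATE f SET f.' + QUOTENAME(DimSurrogate)
        + N' = d.'           + QUOTENAME(DimSurrogate)
        + N' FROM ' + FactTable + N' f JOIN ' + DimTable + N' d ON d.'
        + QUOTENAME(DimNaturalKey) + N' = f.' + QUOTENAME(StageColumn)
        + N';' + CHAR(13)
    FROM   meta.FactLookup
    WHERE  FactTable = 'dw.FactSaleTxn';

    EXEC sp_executesql @sql;

With 50+ fact tables and hundreds of lookups, the saving comes from testing this one algorithm once instead of testing every hand-coded lookup in every mapping.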

Summary

In this blog post we have talked about some of the issues with ETL development.  The topics covered included:

  • Extracting data that has not changed
  • Applying changes to data that has not changed
  • Resummarizing data for summary tables
  • Lack of proper parallelism of ETL processing
  • Collapse of Data Structures Plus Processing Algorithms

For a sizable ETL subsystem these things can add up to significant elapsed time per ETL batch cycle and result in the BI system not being up to date at 9am.

Obviously, we have long experience at solving all these problems and many more.

We have implemented many of these solutions in the data warehouse project of our development partner here in Bucharest. We are able to describe them in great detail to you if you would like us to do that.

Best Regards

The BIDA Team!

If you have any of these ETL problems and would like to talk with us about how we solve them in the ETL systems we have implemented?

Please feel free to put your best email address in the form below. You also have a space to ask us any question you would like in the form.

BIDA0001 – SQL Server Manual Partitioning


If you are not familiar with the benefits of manually partitioning a data warehouse when using SQL Server, then this post is intended to give you a brief introduction as to the benefits of manual partitioning.

Because we were building SQL Server data warehouses before the database-supported partitioning feature was introduced by Microsoft in SQL Server 2005, we had to implement manual partitioning.

When Microsoft implemented database-supported partitioning in SQL Server 2005, everyone moved to it because there was very little extra cost involved.

Enterprise Edition could be purchased as a Client Access License and this was suitable for data warehouses.

All this changed in SQL Server 2012, when Enterprise Edition, the only edition with the partitioning feature, was changed to be available only as a per-core license, at USD17,000+ per core.

USD17,000+ per core for the data warehouse database license is not affordable for a lot of Romanian companies.

So Romanian companies that wanted to use SQL Server for their data warehouse and did not know how to implement manual partitioning were faced with a decision:

  • Buy SQL Server Enterprise Edition for USD17,000+ per core.
  • Implement the data warehouse on Standard Edition without partitioning the large tables.

In talking with a number of Romanian companies who implemented SQL Server as their data warehouse we have found that many of them have implemented Standard Edition with no partitioning of the largest fact tables.

Further, we have often found that the tables are stored in a single file group, often “Primary”, and often with clustered indexes.

These three things all serve to slow down the performance of your data warehouse.

In SQL Server it is best to implement the large fact tables as follows (a minimal sketch follows the list):

  • Partition the tables, usually by month.
  • Place the different partitions for both indexes and data on different disks to spread the workload of queries for recent data over a number of disk drives.
  • Avoid the use of clustered indexes and place the data into separate files, using file groups to direct the placement of the data.
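
As a rough illustration of the general approach (the step-by-step recipe is in the white paper), one common way to partition manually on Standard Edition is one physical table per month plus a view that unions them. Every name and file group below is invented for the sketch:

    -- One table per month, each on its own file group / set of disks.
    -- (The file group FG_2017_01 must already exist.)
    CREATE TABLE dw.FactSale_201701 (
        SaleDate DATE  NOT NULL
            CONSTRAINT CK_FactSale_201701
            CHECK (SaleDate >= '2017-01-01' AND SaleDate < '2017-02-01'),
        StoreKey INT   NOT NULL,
        Amount   MONEY NOT NULL
    ) ON FG_2017_01;
    GO

    -- A view unions the monthly tables back into one logical fact table.
    -- (dw.FactSale_201702 and later months are created the same way.)
    CREATE VIEW dw.FactSale AS
    SELECT SaleDate, StoreKey, Amount FROM dw.FactSale_201701
    UNION ALL
    SELECT SaleDate, StoreKey, Amount FROM dw.FactSale_201702;
    GO

Because of the CHECK constraints, the optimizer can skip the monthly tables that cannot satisfy a query's date predicate, which is where the “almost the same” performance on Standard Edition comes from.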

This does take some time and effort to set up correctly and to maintain over the years, but the major benefits are:

  • Superior performance.
  • Ease of maintenance of the much smaller tables and indexes.
  • The much lower cost of the Standard Edition License either as a per core purchase or a Client Access License purchase.

Obviously, here in Romania, most companies would prefer to pay the license fee for Standard Edition if they can get “nearly the same” performance as they could get from the Enterprise Edition.

Table partitioning is the single largest performance improvement feature available in Enterprise Edition that is not available in Standard Edition.

Partitioning also improves the ability to manage the growth of the data warehouse over time. As years and years of data build up in the data warehouse, partitioning means only the newer tables are updated. Tables and files are smaller. Indexes are smaller.

The benefits of implementing partitioning are widely documented.

What is not widely documented is how to implement manual partitioning on SQL Server. Certainly Microsoft has no interest in explaining to you how this is done!

We have written a very detailed white paper to show you how to manually partition tables in SQL Server Standard Edition to get “almost the same” performance as the Enterprise Edition partitioning feature provides.

If you would like a copy of this detailed white paper please just put your best email address in to the form below and we will send you a link where you can download this document.