Development Issues With ETL
We have had many years of ETL development experience across many tools, most notably DataStage and Informatica.
As good as these tools are, the development times of the ETL mappings are relatively slow.
The answer for most companies has been to reduce the cost of development by hiring cheaper people, especially from the Indian outsourcers such as Tata, Wipro, InfoSys and others. The booming IT developer market in India has been largely a result of companies sending development work to India and ETL development work is no different.
As a result of this trend the ETL tools developers responded with creating graphical user interfaces that made it possible for a relatively low skilled, low paid developers to develop ETL mappings.
This has reduced the cost of ETL development but it did not particularly reduce the time required to perform ETL development. It did not improve the quality of ETL development either.
During our start up phase we will support ETL developed in DataStage or Informatica. However, we would openly admit we are no faster than any other good DataStage / Informatica developers. This is because the nature of the tools drives the development time, not the skill of the user of the tools.
We took a different approach and we have our own BIDA ETL software.
We took the approach of “make highly skilled people more productive writing ETL” rather than “enable less skilled people to write the ETL.”
Not only does this reduce the cost of ETL development, it improves the quality of the ETL subsystem because the people doing the development work are more skilled and experienced. It is normal for someone with much more skill and much more experience to do higher quality work than a less experienced person. It’s why we pay them more.
So our approach was to find a way to enable highly skilled, highly paid developers to develop ETL faster, cheaper, better than less skilled people who were paid less. We wanted to find a way where the overall cost of such development was considerably less than hiring the “Indian Outsourcing Company”.
And this is what we have done.
The #1 feature of the ETL tools is that they have a graphical user interface to make the development of the ETL possible by less skilled, lower paid developers.
The #1 feature of our BIDA ETL tool is that we avoided having a graphical user interface so as to make the more highly skilled, more highly paid, developers more productive.
It really is as simple as that.
We know that we can develop and support ETL systems for much less cost than anyone else in Romania can. Indeed, we can build ETL systems so quickly and so cost effectively we can migrate the finished BIDA ETL system to Informatica or DataStage if that is what you want and still perform that development work fast than anyone else.
Collapse of Data Structures Plus Processing Algorithms
Because the ETL tools rely on a graphical user interface in order to develop their mappings it is necessary to collapse the data structure and the processing algorithms in to one “program object” which we will call a “mapping”.
Many people do not understand what we mean by that so please allow us to explain.
When you write, say, a fact table mapping, you must define the source, you must define the target, and you must define the lookups for the dimension string keys from the source to the integer keys that will be put on the fact table. All fact table processing with dimensional models looks like this.
In ETL tools you must code those lookups inside the mapping along with any other processing you wish to do for the fact table processing. This means your algorithms for your lookup processing are inside the mapping.
Although this seems obvious and is therefore accepted by everyone, it is a serious issue that slows down development considerably.
Why? Because those same lookups, such as the lookup for dates, times, customers, products, geography, company structure etc are common across MANY fact tables. So the lookups must be coded by the ETL developer each time for each lookup for each fact table.
In a large data warehouse with 50+ fact tables you could be coding upwards of 500 such lookups, all of which have to be tested individually for correctness. And that takes developer time.
The far better way to develop ETL is to separate the data structures from the processing algorithm. This is, in fact, one of the most fundamental development strategies that has been taught for decades.
Always separate the data structures from the algorithms so that you have the algorithm coded only once but you can give the algorithm many data structures to perform it’s processing on.
And this is the approach we have taken. In BIDA ETL the developer does not need to concern himself with the algorithm that will be applied to the data structures he is creating in mappings. The algorithm is separated from the data structures.
This provides a very large productivity boost and that boost is linear to the amount of mappings to be developed.
To explain simply?
With standard ETL tools using a graphical interface the ETL developer must code the algorithm for the ETL processing in to the graphical user interface as well as code the sources and target data structures for processing.
With BIDA ETL the ETL developer only codes the sources and target data structures and processing that is unique to that mapping. He does not code the algorithm for the processing of the data. That is separated and not in the hands of the ETL developer.
This is the fundamental difference that makes BIDA ETL development faster, cheaper, better and more reliable.