To borrow a metaphor from Winston Churchill, for the mere mortal, the selection of application/platform/enterprise integration tools is often “a riddle wrapped in a mystery inside an enigma”. If your conversation with your enterprise architect left you disoriented and you feel like the right choice of enterprise integration tools requires mastery of ancient Sumerian incantations, then you’ve come to the right place. Sit back and relax. In this post, I’ll try to demystify the black art of choosing between ETL and ESB middleware technologies, discuss their operational requisites, and touch on the advantages of each.
ESB and ETL under the Microscope
ETL (Extract-Transform-Load) and ESB (Enterprise Service Bus) share some conceptual and operational similarities, but they serve structurally different purposes. An ESB is typically used as the mediation layer in SOA implementations, whereas ETL is used as a data transfer/transformation engine from one database or data repository to another.
A good analogy for choosing the type of service bus (ESB, ETL, messaging, queue, etc.) is the motherboard architecture in PCs and laptops. Your ability to add new services to your platform (e.g. a new data source) is similar to the strategy you would use to plug future hardware into your computer. Two common methods are PCI and USB.
Because PCI sits on the main bus and operates close to the on-board controllers and processor, it can replace existing hardware (like the default video card, or even the service bus itself). PCI gives you tremendous performance advantages, but to use it you have to master many OS and hardware nuances and deal with some low-level plumbing to boot. On the other hand, if you have a laptop, the motherboard architecture doesn’t support PCI (that’s what makes the laptop portable), so the most common way to expand it is via USB. Due to its ‘universal’ nature, USB doesn’t provide the same low-level system integration as PCI, but it does solve most common peripheral integration needs.
To the end user, the end result of the PCI and USB architectures is almost identical. Both allow them to add new hardware to their computer, but the implementations and core missions of the two are significantly different. The same applies to ESBs (products like OSB or WebSphere) and ETL/business process integration tools (products like Talend and Ab Initio). Whereas an ESB is measured by its deep coupling, ETL/BPI tools are measured by their ability to add new data sources and create faster data mappings and transformations.
Vive la petite différence
The differences between ESB and ETL are important when choosing an enterprise integration strategy. If your platform is SOA-ready, has well-defined interfaces, and you know what the operational and physical boundaries of your solution are (inside a firewall, on the same network, etc.), then a full-fledged ESB (the PCI equivalent) is the most efficient way to go, because it will give you high-volume data flow, synchronous messaging, orchestration, and reliable pub/sub capability. But if you are just looking to solve simple integration issues like connecting several legacy applications to your backend, an ESB is overkill and a solution like ETL (the USB equivalent) would do.
ETL and ESB Use Cases
A good use case for an ESB is where you have various web services that you want to expose internally and externally. You can host these web services on an ESB as an external interface to allow other systems/services to access them. If for some reason the interface of the web services changes, you can handle it in the ESB itself by using transformation, so that external services/systems don’t have to change the way they call the service. Your ESB will still be able to receive the data required by the previous interface, transform it, and call the new interface, so that no other system/service is impacted by the interface change.
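To make the transformation idea concrete, here is a minimal sketch in plain Python rather than in any real ESB product. Everything in it is hypothetical: the `get_customer_v2` service, the old `full_name` request shape, and the `mediate` helper are names I made up for illustration. The point is only that the caller keeps sending the old request while the mediation layer maps it to the new interface.

```python
from typing import Callable, Dict

Request = Dict[str, str]


def get_customer_v2(request: Request) -> Request:
    """Hypothetical NEW service interface: expects separate name fields."""
    return {"display_name": f"{request['last_name']}, {request['first_name']}"}


def v1_to_v2(old_request: Request) -> Request:
    """Transformation step: map the OLD request shape onto the new one."""
    first, _, last = old_request["full_name"].partition(" ")
    return {"first_name": first, "last_name": last}


def mediate(
    old_request: Request,
    transform: Callable[[Request], Request],
    service: Callable[[Request], Request],
) -> Request:
    """Stand-in for the bus: transform the message, then call the new service."""
    return service(transform(old_request))


# The external caller still uses the old interface and is not impacted.
response = mediate({"full_name": "Ada Lovelace"}, v1_to_v2, get_customer_v2)
print(response)  # {'display_name': 'Lovelace, Ada'}
```

In a real ESB the mapping would usually be declared in the product's own transformation tooling rather than hand-written, but the division of labor is the same.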
ETL, on the other hand, has relatively simple selection criteria. If all you want to do is mostly the following, it should be your tool of choice:
- Move and transform data across multiple data sources
- Normalize data (you can also achieve this via an ESB, but it won’t be as powerful as via an ETL unless you write some heavy-duty wrapper code)
- Improve or replace your batch processing engines
- Improve efficiency and scale of your mass data loading operations
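The functions listed above boil down to three steps, which can be sketched with nothing but the Python standard library. This is a toy under stated assumptions, not how Talend or Ab Initio work internally: the CSV content, the `orders` table, and the `extract`/`transform`/`load` function names are all invented for the example.

```python
import csv
import io
import sqlite3

# Hypothetical source data with inconsistent spacing and casing.
RAW_CSV = """customer,country,amount
 alice ,us,10.50
Bob,US ,3
carol,de,7.25
"""


def extract(text):
    """Extract: read rows from the source (here, an in-memory CSV)."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: normalize names, country codes, and numeric types."""
    return [
        (
            row["customer"].strip().title(),
            row["country"].strip().upper(),
            float(row["amount"]),
        )
        for row in rows
    ]


def load(conn, records):
    """Load: bulk-insert the normalized records into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders (customer TEXT, country TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", records)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, transform(extract(RAW_CSV)))

totals = conn.execute(
    "SELECT country, SUM(amount) FROM orders GROUP BY country ORDER BY country"
).fetchall()
print(totals)  # [('DE', 7.25), ('US', 13.5)]
```

Note that the whole job runs as a batch over a data set, which is exactly the shape of work where ETL tools shine.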
ESB Selection Process
There are no hard and fast rules for rolling out an ESB, but it helps if your systems/platforms already have the following capabilities and needs:
- Most of your components speak or understand XML
- You have abstracted interfaces
- The platform components have basic SOA capabilities (services adhere to agreements, are loosely coupled, hide logic from the outside world, are stateless, and are discoverable)
- The majority of manual and hard-wired business processes, like batch jobs, have been wrapped in web services (preferably using REST)
- You need to orchestrate communications between internal applications (e.g. ERP to CRM) or to build composite applications (e.g. a web portal presenting data from various applications)
- You need to integrate information sources (not only data) from heterogeneous environments (intranet, extranet, line-of-business applications)
- You operate on real-time data and service many messages with small data sizes (e.g. a single bank transaction, a single order, etc.)
Now, assuming that you do have many of the above-mentioned capabilities and you know that in the future you will need to achieve some of the following objectives, an ESB is a good fit for your needs:
- Integrate many more applications (from inside and outside of your enterprise)
- Plug more services into your platform (your solution will constantly evolve)
- Add more core capabilities (communication protocols, real-time support, transactional capabilities, etc.)
- Handle different types of messages (content based routing, message forking, aggregation, etc.)
- Work with SLA based services, (provide audit, tracking, reconciliation, etc.)
- Leverage/coexist with unknown technology stacks and architectures
- Develop a data supply chain or business solutions that require complex orchestrations
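Some of the message-handling objectives above (content-based routing, message forking, aggregation) are easier to picture with a toy example. The sketch below is an in-memory stand-in, not a real ESB API: `MiniBus`, its `route` and `publish` methods, and the message fields are all hypothetical names chosen for illustration.

```python
from collections import defaultdict


class MiniBus:
    """Toy bus: routes each message by inspecting its content."""

    def __init__(self):
        self._routes = []  # list of (predicate, handler) pairs

    def route(self, predicate, handler):
        """Register a handler for messages whose content matches predicate."""
        self._routes.append((predicate, handler))

    def publish(self, message):
        """Fork the message to every matching handler and collect the replies."""
        return [
            handler(message)
            for predicate, handler in self._routes
            if predicate(message)
        ]


audit_log = []                # every message id, for tracking/audit
totals = defaultdict(float)   # running aggregate of payments per account


def audit(message):
    audit_log.append(message["id"])
    return "audited"


def book_payment(message):
    totals[message["account"]] += message["amount"]
    return "booked"


bus = MiniBus()
bus.route(lambda m: True, audit)                           # all messages
bus.route(lambda m: m["type"] == "payment", book_payment)  # payments only

messages = [
    {"id": 1, "type": "payment", "account": "A", "amount": 25.0},
    {"id": 2, "type": "inquiry", "account": "A"},
    {"id": 3, "type": "payment", "account": "A", "amount": 5.0},
]
results = [bus.publish(m) for m in messages]
print(results)    # [['audited', 'booked'], ['audited'], ['audited', 'booked']]
print(totals["A"])  # 30.0
```

Each message here is small and handled as it arrives, which is the opposite of the batch-oriented ETL style and a reasonable hint that an ESB is the better fit.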
A Final Note
One of the biggest challenges of forcing an ESB into an existing environment is that an ESB functions optimally in an SOA-based architecture, yet SOA isn’t a prerequisite for deploying one. Because most ESBs also provide integration tools, they can encourage architects to cut corners: instead of taking the time to tear down application silos and rebuild them according to SOA guidelines, they just use the ESB to patch up these silos and continue to operate them indefinitely. Obviously, this can become an attractive proposition when a project is late and under pressure to integrate fast.