Thursday, November 29, 2007

Low Cost CDI from Infosolve, Pentaho and StrikeIron

As I’ve mentioned in a couple of previous posts, QlikView doesn’t have the built-in matching functions needed for customer data integration (CDI). This has left me looking for other ways to provide that service, preferably at a low cost. The problem is that the major CDI products like Harte-Hanks Trillium, DataMentors DataFuse and SAS DataFlux are fairly expensive.

One intriguing alternative is Infosolve Technologies. Looking at the Infosolve Web site, it’s clear they offer something relevant, since two flagship products are ‘OpenDQ’ and ‘OpenCDI’ and their tag line is ‘The Power of Zero Based Data Solutions’. But I couldn't figure out exactly what they were selling since they stress that there are ‘never any licenses, hardware requirements or term contracts’. So I broke down and asked them.

It turns out that Infosolve is a consulting firm that uses free open source technology, specifically the Pentaho platform for data integration and business intelligence. A Certified Development partner of Pentaho, Infosolve has developed its own data quality and CDI components on the platform and simply sells the consulting needed to deploy them. Interesting.

Infosolve Vice President Subbu Manchiraju and Director of Alliances Richard Romanik spent some time going over the details and gave me a brief demonstration of the platform. Basically, Pentaho lets users build graphical workflows that link components for data extracts, transformation, profiling, matching, enhancement, and reporting. It looked every bit as good as similar commercial products.

Two particular points were worth noting:

- the matching approach itself seems acceptable. Users build rules that specify which fields to compare, the methods used to measure similarity, and the similarity score required for each field. This is less sophisticated than the best commercial products, but field comparisons are probably adequate for most situations. Although setting up and tuning such rules can be time-consuming, Infosolve told me they can build a typical set of match routines in about half a day. More experienced or adventurous users could even do it themselves; the user interface makes the mechanics very simple. A half-day of consulting might cost $1,000, which is not bad at all when you consider that the software itself is free. The price for a full implementation would be higher, since it would involve additional consulting to set up data extracts, standardization, enhancement and other processes, but the cost should still be very reasonable. You’d probably need just as much consulting with other CDI systems, where you’d pay for the software too.

- data verification and enhancement are done by calls to StrikeIron, which provides a slew of on-demand data services. StrikeIron is worth knowing about in its own right: it lets users access Web services including global address verification and correction; consumer and business data lookups using D&B and Gale Group data; telephone verification, appends and reverse appends; geocoding and distance calculations; MapQuest mapping and directions; name/address parsing; sales tax lookups; local weather forecasts; securities prices; real-time fraud detection; and message delivery for text (SMS) and voice (IVR). Everything is priced on a per-use basis. This opens up all sorts of interesting possibilities.
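
To make the rule-based matching in the first point concrete, here is a minimal Python sketch of the general idea: each rule names a field, a comparison method, and a minimum score, and two records match only if every rule passes. This is not Infosolve's or Pentaho's actual implementation. The field names, the thresholds, and the use of the standard library's difflib as the similarity measure are all my own assumptions for illustration.

```python
from difflib import SequenceMatcher


def fuzzy(a: str, b: str) -> float:
    """Return a 0..1 similarity ratio, ignoring case and outer whitespace."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def exact(a: str, b: str) -> float:
    """Return 1.0 when the trimmed values are identical, otherwise 0.0."""
    return 1.0 if a.strip() == b.strip() else 0.0


# Hypothetical rule set: field name -> (comparison method, minimum score).
RULES = {
    "last_name": (fuzzy, 0.85),
    "street": (fuzzy, 0.80),
    "zip": (exact, 1.0),
}


def is_match(rec_a: dict, rec_b: dict, rules: dict = RULES) -> bool:
    """Two records match only if every field meets its rule's threshold."""
    for field, (method, threshold) in rules.items():
        if method(rec_a[field], rec_b[field]) < threshold:
            return False
    return True


# Illustrative records (made up for this sketch).
a = {"last_name": "Johnson", "street": "12 Main Street", "zip": "10001"}
b = {"last_name": "Jonson", "street": "12 Main Street", "zip": "10001"}
c = {"last_name": "Johnson", "street": "12 Main Street", "zip": "94105"}

same_person = is_match(a, b)      # misspelled surname, same address
different_zip = is_match(a, c)    # same name, different zip code
```

Tuning then amounts to adjusting the thresholds and swapping comparison methods per field, which is presumably where the half day of consulting goes.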

The Infosolve software can be installed on any platform that can run Java, which is just about everything. Users can also run it within the Sun Grid utility network, which has a pay-as-you-go business model of $1 per CPU hour.

I’m a bit concerned about speed with Infosolve: the company said it takes 8 to 12 hours to run a million record match on a typical PC. But that assumes you compare every record against every other record, which usually isn’t necessary. Of course, where smaller volumes are concerned, this is not an issue.
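
To put numbers on that: comparing every record against every other record in a million-record file means roughly 500 billion pairs. A common way to avoid this is "blocking", where records are grouped by a cheap key and compared only within each group. The Python sketch below illustrates that general technique; I don't know whether Infosolve does it this way, and the zip-code key and sample records are assumptions of mine.

```python
from collections import defaultdict
from itertools import combinations


def all_pairs_count(n: int) -> int:
    """Number of comparisons when every record is paired with every other."""
    return n * (n - 1) // 2


def candidate_pairs(records, block_key):
    """Yield only the pairs of records that share the same blocking key."""
    blocks = defaultdict(list)
    for rec in records:
        blocks[block_key(rec)].append(rec)
    for group in blocks.values():
        yield from combinations(group, 2)


# Illustrative records (made up for this sketch).
records = [
    {"id": 1, "zip": "10001"},
    {"id": 2, "zip": "10001"},
    {"id": 3, "zip": "94105"},
    {"id": 4, "zip": "94105"},
]

# Hypothetical blocking key: compare only records in the same zip code.
pairs = list(candidate_pairs(records, block_key=lambda r: r["zip"]))

full_comparisons = all_pairs_count(len(records))   # every pair
blocked_comparisons = len(pairs)                   # same-zip pairs only
million_record_pairs = all_pairs_count(1_000_000)  # 499,999,500,000
```

The trade-off is that two records for the same customer will never be compared if their blocking keys differ, for example when one zip code is mistyped, so the key has to be chosen with some care.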

Bottom line: Infosolve and Pentaho may not meet the most extreme CDI requirements, but they could be a very attractive option when low cost and quick deployment are essential. I’ll certainly keep them in mind for my own clients.


Unknown said...

Hi David,
To clarify the relationship, Subbu & Rich are executives of Infosolve, not Pentaho. Infosolve is a Pentaho Certified Systems Integration partner. We often work together to bring the value and benefits of Pentaho Data Integration open source BI software, their OpenDQ solution, and our mutual services to customers looking for these types of solutions.

Best Regards,
Dave Mohr
Pentaho Corporation

Unknown said...

Thank you for the post. I was recently looking at CDI solutions as well and have come across a number of vendors that use StrikeIron services as the power behind some of their offerings. Two of interest are crmfusion and ratchetsoft. Crmfusion has SaaS offerings for dupe checking and data supplementation. Ratchetsoft's RatchetX allows users to integrate CDI functions into any software without requiring changes. Both work a look and

Al Goodwin

David Raab said...

Regarding the David Mohr comment: I confused things a bit in the original version of this post. I have since corrected the body text. Sorry for the confusion.

Talend-Community Manager said...

Hi David. I am actually on the lookout for a new Business Intelligence solution and have looked through a few so far.

Pentaho seems to be a really good open source application: I have tried it and enjoyed it a lot. I have also heard about a few other companies, such as Jaspersoft and Talend. Both seem quite interesting, as they are also good open source applications.

So far I have been using Talend a little. I enjoy their Talend Open Studio: it is user-friendly (I especially like the tmap component in Open Studio), has a good GUI, and makes it possible to perform data profiling and integration. The tool also has a good debugging system and an active community (the forum is good!) that has helped me with the application.

Thanks for the post!