Sunday, May 15, 2011

DIGIDAY:TARGET, or, Yogi Berra Meets Data in the Online World.

I was scheduled to attend the DIGIDAY:TARGET conference on May 4 but wasn't able to be there. (Download the conference agenda and presentations.) Happily, my colleague and big-data guru Matt Doering was able to take my place. Here are Matt's thoughts:

Yogi Berra meets data in the online world.

At the recent Digiday:Target conference (Park Central Hotel, NYC, May 4, 2011) a moderator posed the question “Which is better: More Data, Consistent Data or Data Expertise?” Not surprisingly, there was a wide variety of opinions, both from the panel and from the many attendees I talked to later in the day. Many of the people I listened to were intrigued and conflicted by this question. To understand the real answer, let us first review the pros and cons of the three possible answers.


More Data – Large volumes of data from varied sources.
• Richer data content from any given data source.
• Data sources tend to enrich each other, if properly managed.
• More likely to find the outliers, which are often the real profit makers.

• Many companies don’t have the resources to handle very large volumes of data.
• Lack of Metadata about data sources.
• No real experience with merging multiple data sources with different element codes and timeframes.
• Data hygiene can be an issue if you are working with a data set that is new to your organization.

Consistent Data – All data conforms to some industry standard. Any data not conforming to the model is discarded or reduced.
• All data is easily understood and documented in a Metadata stack.
• Data hygiene is easy to define and enforce.
• Data processing performance profiles are well understood. This makes it very easy to scope a system or project.

• Let’s admit it; all homogenized milk tastes the same. Where is the differentiation potential?
• In the process of conforming to a standard, more detailed data is lost. For example, if the industry standard requires that age elements be bucketed into 10-year breaks, what happens if your product offering needs 6.5-year breaks?
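
To make the age-bucket point concrete, here is a minimal sketch in Python. The ages and the `bucket` helper are made up for illustration and do not come from any industry standard. It shows two ages that share one 10-year bucket but fall into different 6.5-year buckets, so once only the 10-year bucket is stored, the finer break cannot be rebuilt.

```python
import math

def bucket(age, width):
    """Return the (low, high) range of the given width that contains age.

    Hypothetical helper: buckets start at 0 and are half-open, [low, high).
    """
    low = math.floor(age / width) * width
    return (low, low + width)

# Made-up ages, for illustration only.
ages = [24, 27]

for age in ages:
    standard = bucket(age, 10)   # the "industry standard" 10-year break
    custom = bucket(age, 6.5)    # the 6.5-year break the product needs
    print(age, "standard:", standard, "custom:", custom)

# Both ages land in the same 10-year bucket...
same_standard = bucket(24, 10) == bucket(27, 10)
# ...but in different 6.5-year buckets, so the standard bucket
# alone cannot tell you which custom bucket a record belongs to.
same_custom = bucket(24, 6.5) == bucket(27, 6.5)
print("same standard bucket:", same_standard)
print("same custom bucket:", same_custom)
```

The reverse direction is the safe one: if the raw age is kept, either set of breaks can be computed on demand.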

Data Expertise – Deep experience with very large data sets.
• Small data, large data, and inconsistent data are not a problem. Expertise can handle all of these issues.
• These resources understand the role that standardized data plays in data analysis (like a good coat of primer on a wall) but also know that the real value is in what is different.
• Most data experts love to teach, so the entire data IQ of your organization increases.
• Able to distinguish between dirty data and gold nuggets.

• These resources can be hard to find. It’s not a matter of having the right degree; it’s more a matter of who they are. Just as simply having a degree in fine arts doesn’t make you an artist, a degree in stats doesn’t make you a good data scientist. In fact, one of the best data scientists I know never took a stats course.

“It’s déjà vu all over again”

Yogi had it right. If, as I strongly believe, data expertise is of critical importance for the media world, ours is not the first industry where this is true. A number of industries over the past 25 years have had to deal with the “big data” problem. Early examples are the classic CPG scanner data, pharmaceutical detailing data and financial services direct marketing data sets. All these industries faced large and diverse data issues, and they all succeeded in overcoming the problem with technique, not CPU.

Now it might be tempting to claim that our space generates significantly higher volumes of data or more diverse data, but is that really true? At first this appears to be the case, but when you factor in the computing power available at the time, it is not that far-fetched to say the adjusted data volumes are actually very similar. Keep in mind that the data scientists of those days were working with computers that had less horsepower and memory than the average iPad used by the majority of attendees at Digiday:Target.

So where do you find this expertise? Look to the industries named above. The membership of the Direct Marketing Association and those who attended the NCDM (National Center for Database Marketing) are good places to start. Look for people from the telecommunications industry who helped build systems to analyze Call Detail Records (CDRs). Experience in genome sequencing and pairing should grab your attention. Do these people know clicks from conversions? Probably not, but on the other hand, for them, more data is the breath of life. We need to recruit the talent that is out there into the industry and avoid having to reinvent it “all over again”.
