Monday, June 28, 2010

Saffron Technology Organizes Data into Memories

Summary: Saffron Technology provides an analytical database that explores relationships among entities and their attributes. It can explore networks, find similarities and guide decisions. Saffron is considerably more flexible than standard semantic engines.

As I was preparing my June 3 review of link analysis vendor Centrifuge Systems, Centrifuge introduced me to their business partners at Saffron Technology. Saffron turns out to be an interesting product in its own right.

Saffron describes itself as an “associative memory base product,” a phrase that definitely takes some explaining. In simplest terms, Saffron organizes information into sets of three or four related items.

More specifically, it stores pairs of items in the context of an entity. For example, “Jack-climb-hill”, “Jack-plant-beanstalk”, “Jack-jump-candlestick” and “Jack-build-house” are all part of what Saffron calls Jack’s “memory”.

Related sets can be grouped into "matrices" that share another item. Thus, one matrix could contain “Jack-Jill-up-hill”, “Jack-Jill-fetch-water”, “Jack-Jill-fall-down” and “Jack-Jill-break-crown”, while “Jack-beanstalk-plant-seed”, “Jack-beanstalk-climb-up”, “Jack-beanstalk-steal-harp” and “Jack-beanstalk-kill-giant” form a separate matrix.

Saffron physically prejoins the data in Jack’s memory so it can be accessed easily and shared elements can be stored only once. Jill’s memory is stored separately from Jack’s even though some sets contain the same data. Jill’s memory may also contain sets without Jack (but let’s not tell him). Depending on how the system is configured, the hill could have its own memory as well.
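To make the layout a little more concrete, here is a minimal Python sketch of the idea: one memory per entity, with item pairs grouped into matrices by a shared item. This is my own toy illustration, not Saffron's implementation or API; the `MemoryStore` class and its `observe` and `memory` methods are hypothetical names.

```python
from collections import Counter, defaultdict


class MemoryStore:
    """Toy associative memory (hypothetical): entity -> shared item -> pair counts."""

    def __init__(self):
        # memories[entity][shared] is a Counter of (item_a, item_b) pairs.
        # shared=None holds pairs that belong to no particular matrix.
        self.memories = defaultdict(lambda: defaultdict(Counter))

    def observe(self, entity, item_a, item_b, shared=None):
        """Record one occurrence of a pair in the entity's memory."""
        self.memories[entity][shared][(item_a, item_b)] += 1

    def memory(self, entity):
        """Return every pair in the entity's memory, merged across matrices."""
        merged = Counter()
        for pairs in self.memories.get(entity, {}).values():
            merged.update(pairs)
        return merged


store = MemoryStore()
# Plain pairs in the context of Jack
store.observe("Jack", "climb", "hill")
store.observe("Jack", "plant", "beanstalk")
# Matrices: pairs grouped under a shared item
store.observe("Jack", "fetch", "water", shared="Jill")
store.observe("Jack", "fall", "down", shared="Jill")
store.observe("Jack", "climb", "up", shared="beanstalk")
store.observe("Jack", "kill", "giant", shared="beanstalk")
# Jill's memory is kept separately, even where the data overlaps
store.observe("Jill", "fetch", "water", shared="Jack")

print(sorted(store.memory("Jack")))
```

The sketch ignores the physical prejoining and the store-once sharing that Saffron describes; it only shows how the entity, matrix and pair levels nest.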

Saffron’s approach lets it handle the subject-verb-object triples used in semantic analysis. But unlike standard semantic triplestores, Saffron is not limited to this structure. Sets could contain all nouns (Jack-Jill-hill), which can be useful even if the precise relationship among the items isn’t known. Or one of the items could be a time dimension. The system also counts how often each item pair occurs in the source data, supporting statistical as well as semantic analysis.
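The pair counting is easy to picture with a small example. The sketch below is an assumption about the general technique (plain co-occurrence counting), not a description of Saffron's internals; `count_pairs` and the sample records are made up for illustration.

```python
from collections import Counter
from itertools import combinations


def count_pairs(records):
    """Count how often each unordered pair of items appears in the same record."""
    counts = Counter()
    for items in records:
        # set() drops repeats within a record; sorting makes the loop deterministic
        for a, b in combinations(sorted(set(items)), 2):
            counts[frozenset((a, b))] += 1
    return counts


# Records need not be subject-verb-object: these are all nouns
records = [
    ("Jack", "Jill", "hill"),
    ("Jack", "Jill", "water"),
    ("Jack", "beanstalk", "giant"),
]

counts = count_pairs(records)
print(counts[frozenset(("Jack", "Jill"))])   # 2
print(counts[frozenset(("Jack", "giant"))])  # 1
```

Keys are `frozenset` pairs so that a lookup does not depend on item order, which fits the case where the precise relationship between the items is unknown.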

These features make Saffron substantially more flexible than a semantic system. They also let it work with many more data sources, since reliable subject-verb-object relationships are often unavailable.

This may sound pretty dry, but Saffron's actual applications have been cloak-and-dagger stuff like looking for terrorists and finding roadside bombs. Remember that I was introduced to Saffron by Centrifuge, whose link analysis system is used primarily for law enforcement and security investigations. Saffron’s approach works particularly well with the Centrifuge front-end.

The fundamental advantage of Saffron is that associations among different items are directly accessible for analysis. This lets the system support different types of queries including:

- connections (identifying relationships among items)
- networks (showing how entities connect with each other)
- analogies (finding entities with similar connections)
- classifications (placing similar entities into groups)
- trends (reporting how connections change over time)
- episodes (finding patterns that repeat over time)
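As a rough illustration of two of these query types, the sketch below implements "connections" as a simple lookup and "analogies" as a ranking by Jaccard similarity over each entity's connected items. Jaccard similarity is my stand-in here; the source does not say which similarity measure Saffron uses. The function names and toy data are hypothetical.

```python
def connections(memories, entity):
    """Connections query: the items directly associated with an entity."""
    return memories.get(entity, set())


def jaccard(a, b):
    """Share of items two sets have in common (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def analogies(memories, entity, top_n=3):
    """Analogies query: other entities ranked by similarity of their connections."""
    target = connections(memories, entity)
    scored = [
        (other, jaccard(target, items))
        for other, items in memories.items()
        if other != entity
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Made-up toy data: entity -> set of connected items
memories = {
    "Jack": {"hill", "water", "beanstalk", "giant"},
    "Jill": {"hill", "water", "crown"},
    "Giant": {"beanstalk", "harp", "castle"},
}

print(analogies("Jack" and memories, "Jack"))
```

Here Jill ranks ahead of the Giant because she shares two of Jack's items while the Giant shares only one.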

Real-world applications extend beyond link analysis to classifications and decisions. For example, the system can select medical treatments for a particular patient or assign suspended loans to different collection processes. These are complex processes. The system must identify entities with similar characteristics, identify the treatments and outcomes for those entities and estimate the likely outcomes of applying different treatments to the current entity.

Saffron's advantage with such applications is that the characteristics, treatments and outcomes are all just items in the data store. There's no need to load them differently or build a causal model of how they interact. In other words, the system needs no inherent assumptions about how the world works. This lets it effortlessly incorporate new data, uncover hidden relationships and react to new situations.

Well, “effortlessly” is a bit of an exaggeration. Saffron does some pretty complicated calculations to decide which entities are most similar and which treatments have the highest expected value (i.e., outcome value x outcome probability). Users have to review results and make judgments to tune the system. But this is still much less work than conventional statistical modeling or rule-based systems.
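The expected value calculation can be sketched in a few lines. This assumes the similar entities have already been found and that outcome probability is estimated as a simple frequency among them; Saffron's actual calculations are surely more involved. The treatments, outcomes and dollar values below are invented for illustration.

```python
from collections import Counter, defaultdict


def expected_values(similar_cases, outcome_values):
    """Expected value per treatment: sum of outcome value x outcome probability.

    similar_cases is a list of (treatment, outcome) pairs observed for entities
    similar to the current one; probabilities are observed frequencies.
    """
    outcome_counts = defaultdict(Counter)
    for treatment, outcome in similar_cases:
        outcome_counts[treatment][outcome] += 1

    result = {}
    for treatment, counts in outcome_counts.items():
        total = sum(counts.values())
        result[treatment] = sum(
            outcome_values[outcome] * n / total for outcome, n in counts.items()
        )
    return result


# Hypothetical history for loans similar to the one being assigned
cases = [
    ("phone_call", "paid"), ("phone_call", "paid"),
    ("phone_call", "default"), ("phone_call", "default"),
    ("letter", "paid"), ("letter", "default"),
    ("letter", "default"), ("letter", "default"),
]
values = {"paid": 1000.0, "default": 0.0}

ev = expected_values(cases, values)
best = max(ev, key=ev.get)
print(ev, best)
```

With these numbers the phone call comes out ahead (500 versus 250), so the loan would go to that collection process.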

Saffron’s technology lets users define the types of data it will store, load data into the system, and execute API calls for the different types of analysis (connections, networks, analogies, etc.). Saffron generally relies on external systems to identify entities within source data, classify them into the specified categories, and report their associations to Saffron. Saffron can load structured data as well.

Saffron runs on a 64-bit “soft appliance” that distributes its data over clusters of server drives. The company claims world-record performance at ingesting, storing and accessing triplestore data, as well as compression to about 20 bytes per triple compared with 50 to 150 bytes per triple in other triplestore systems.

But let’s not get too carried away: Saffron works with large but not gigantic databases. Customer systems have been in the one-terabyte range.

Pricing for Saffron starts at $125,000 per server, where a standard server is an eight-core machine with 16 gigabytes of RAM per core. A trial version of the system is also available on the Amazon EC2 cloud as SaffronSierra.
