Thursday, August 27, 2020

Software Review: BigID for Privacy Data Discovery

Until recently, most marketers were content to leave privacy compliance in the hands of data and legal teams. But laws like GDPR and CCPA now require more prominent consent notifications and impose stricter limits on data use. This means marketers must become more involved with privacy systems to ensure a positive customer experience, gain access to the data they need, and make sure they use that data appropriately.

I feel your pain: it’s another chore for your already-full agenda.  But no one else can represent marketers’ perspectives as companies decide how to implement expanded privacy programs.  If you want to see what happens when marketers are not involved, just check out the customer-hostile consent notices and privacy policies on most Web sites.

To ease the burden a bit, I’m going to start reviewing privacy systems in this blog. The first step is to define a framework of the functions required for a privacy solution.   This gives a checklist of components so you know when you have a complete set. Of course, you’ll also need a more detailed checklist for each component so you can judge whether a particular system is adequate for the task. But let’s not get ahead of ourselves. 

At the highest level, the components of a privacy solution are:

  • Data discovery. This is searching company systems to build a catalog of sensitive data, including the type and location of each item. Discovery borders on data governance, quality, and identity resolution, although these are generally outside the scope of a privacy system. Identity resolution is on the border because responding to data subject requests (see the next item) requires assembling all data belonging to the same person. Some privacy systems include identity resolution to make this possible, but others rely on external systems to provide a personal ID to use as a link.

  • Data subject interactions.  These are interactions between the system and the people whose data it holds (“data subjects”).  The main interactions are to gather consent when the data is collected and to respond to subsequent “data subject access requests” (DSARs) to view, update, export, or delete their data. Consent collection and request processing are distinct processes.  But they are certainly related and both require customer interactions.  So it makes sense to consider them together. They are also where marketers are most likely to be directly involved in privacy programs.

  • Policy definition.  This specifies how each data type can be used.  There are often different rules based on location (usually where the data subject resides or is a citizen, but sometimes where the data is captured, where it’s stored, etc.), consent status, purpose, person or organization using the data, and other variables. Since regulations and company policies change frequently, this component includes processes to identify changes and either automatically adjust rules to reflect them or alert managers that adjustments may be needed.

  • Policy application. This monitors how data is actually used to ensure it complies with policies, sends alerts if something is not compliant, and keeps records of what’s done. Marketers may be heavily involved here but more as system users than system managers. Policy application is often limited to assessing data requests that are executed in other systems but it sometimes includes actions such as generating lists for marketing campaigns. It also includes security functions related specifically to data privacy, such as rules for masking sensitive data or practices to prevent and react to data breaches. Again, security features may be limited to checking that rules are followed or may include running the processes themselves. Security features in the privacy system are likely to work with corporate security systems in at least some areas, such as user access management. If general security systems are adequate, there may be no need for separate privacy security features.

Bear in mind that one system need not provide all these functions.  Companies may prefer to stitch together several “best of breed” components or to find a privacy solution within a larger system. They might even use different privacy components from several larger systems, for example using a consent manager built into a Customer Data Platform and a data access manager built into a database’s core security functions. 

Whew.

Now that we have a framework, let's apply it to a specific product.  We'll start with BigID.

Data Discovery

BigID is a specialist in data discovery. The system applies a particularly robust set of automated tools to examine and classify all types of data – structured, semi-structured, and unstructured; cloud and on-premise; in any language. For identified items, it builds a list showing the application, object name, data type, server, geographic location, and other details. 

Of course, an item list is table stakes for data discovery.  BigID goes beyond this to organize the items into clusters related to particular purposes, such as medical claims, invoices, and employee information. It also draws maps of relations across data sources, such as how the transaction ID in one table connects to the transaction ID in another table (even if the field names are not the same). Other features highlight data sources holding sensitive information, alert users if these are not properly secured from unauthorized access, and calculate privacy risk scores. 

The relationship maps provide a foundation for identity resolution, since BigID can compare values across systems to find matches and use the results to stitch together related records. The system supports fuzzy as well as exact matches and can compare combinations of items (such as street, city, and zip) in one rule.  But the matching is done by reading data from source systems for one person at a time, usually in response to an access request. This means that BigID could assemble a profile of an individual customer but won’t create the persistent profiles you’d see in a Customer Data Platform or other type of customer database. It also can’t pull the data together quickly enough to support real-time Web site personalization, although it might be fast enough for a call center. 
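To make the matching idea concrete, here is a minimal sketch of a rule that compares a combination of items (street, city, and zip) with either fuzzy or exact matching. It is an illustration only, not BigID's implementation: the function names, the sample records, and the 0.85 threshold are all assumptions, and the similarity measure is simply the one in Python's standard `difflib` module.

```python
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity ratio after trimming and lowercasing."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()

def records_match(left: dict, right: dict, fields: tuple, threshold: float = 0.85) -> bool:
    """Treat two records as the same person only if every field in the rule
    is at least `threshold` similar. A threshold of 1.0 means exact match
    (ignoring case and surrounding spaces)."""
    return all(
        similarity(left.get(f, ""), right.get(f, "")) >= threshold
        for f in fields
    )

# Hypothetical rule combining several items, as described above.
ADDRESS_RULE = ("street", "city", "zip")

# Hypothetical records read from two source systems for one request.
crm = {"street": "12 Main Street", "city": "Springfield", "zip": "01101"}
billing = {"street": "12 Main Stret", "city": "springfield", "zip": "01101"}
other = {"street": "98 Oak Avenue", "city": "Shelbyville", "zip": "40065"}

fuzzy_hit = records_match(crm, billing, ADDRESS_RULE)                  # typo tolerated
exact_hit = records_match(crm, billing, ADDRESS_RULE, threshold=1.0)   # typo rejected
unrelated = records_match(crm, other, ADDRESS_RULE)                    # different person
```

Note that nothing here is stored between calls: the comparison runs against whatever records are read for a single request, which is the same reason the approach suits access requests better than persistent profiles.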

In fact, BigID doesn’t store any data outside of the source systems except for metadata.  So there's no reason to confuse it with a data lake, data warehouse, CRM, or CDP.

Data Subject Interactions

BigID doesn’t offer interfaces to capture consent but does provide applications that let data subjects view, edit, and delete their data and update preferences. When a data access request is submitted, the system creates a case that is sent to other systems or people to execute. BigID provides a workflow to track the status of these cases but won’t directly change data in source systems. 

Policy Definition 

BigID doesn’t have an integrated policy management system that lets users define and enforce data privacy rules. But it does have several components to support the process:

  • “Agreements” let users document the consent terms and conditions associated with specific items. This does not extend to checking the status of consent for a particular individual but does create a way to check whether a consent-gathering option is available for an item.

  • “Business flows” map the movement of data through business processes such as reviewing a resume or onboarding a new customer. Users can document flows manually or let the system discover them in the data it collects during its scan of company systems. Users specify which items are used within a flow and the legal justification for using sensitive items. The system will compare this with the list of consent agreements and alert users if an item is not properly authorized. BigID will also alert process owners if a scan uncovers a sensitive new data item in a source system.  The owner can then indicate whether the business flow uses the new item and attach a justification. BigID also uses the business flows to create reports, required by some regulations, on how personal data is used and with whom it is shared. 

  • “Policies” let users define queries to find data in specified situations, such as EU citizen data stored outside the EU. The system runs these automatically each time it scans the company systems. Query results can create an alert or task for someone to investigate. Policies are not connected to agreements or business flows, although this may change in the future. 
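As a rough illustration of what a policy of this kind does, the sketch below runs a query over a small metadata catalog and turns each hit into an alert. Everything in it is hypothetical: the `CatalogItem` fields, the abbreviated country list, and the alert text are assumptions for the example, not BigID's data model or query language.

```python
from dataclasses import dataclass

# Abbreviated, illustrative set of EU country codes (not a complete list).
EU_COUNTRIES = {"DE", "FR", "IE", "NL"}

@dataclass(frozen=True)
class CatalogItem:
    """One row of discovery metadata (hypothetical shape)."""
    source: str            # system where the item was found
    field: str             # name of the data item
    subject_country: str   # country of the data subjects
    storage_country: str   # country where the data is stored

def eu_data_outside_eu(items):
    """Policy query: items about EU data subjects stored outside the EU."""
    return [
        item for item in items
        if item.subject_country in EU_COUNTRIES
        and item.storage_country not in EU_COUNTRIES
    ]

# Hypothetical results of one scan.
catalog = [
    CatalogItem("crm", "email", "DE", "DE"),
    CatalogItem("analytics", "email", "FR", "US"),
    CatalogItem("billing", "card_last4", "US", "US"),
]

# Each hit becomes an alert or task for someone to investigate.
violations = eu_data_outside_eu(catalog)
alerts = [
    f"Review {v.source}.{v.field}: {v.subject_country} data stored in {v.storage_country}"
    for v in violations
]
```

The query only reads metadata and flags items for review, in keeping with a product that reports on data rather than moving or changing it.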

Policy Application

BigID doesn’t directly control any data processing, so it can’t enforce privacy rules. But the alerts issued by the policy, agreement, and business flow components do help users to identify violations. Alerts can create tasks in workflow systems to ensure they are examined and resolved. The system also lets users define workflows to assess and manage a data breach should one occur. 

Technology 

As previously mentioned, BigID reads data from source systems without making its own copies or changing any data in those systems. Clients can run it in the cloud or on-premises. System functions are exposed via APIs, which let the company, clients, or third parties build apps on top of the core product. In fact, the data subject access request and preference portal functions are among the applications that BigID created for itself. It recently launched an app marketplace to make its own and third-party apps more easily available to its clients.

Business 

BigID has raised $146 million in venture funding and reports nearly 200 employees. Pricing is based on the number of data sources: the company doesn’t release details but it’s not cheap. It also doesn’t release the number of clients but says the count is “substantial” and that most are large enterprises.
