Posted: December 9, 2020

Hello and welcome back to our Data Protection blog.  Because of the complexity of the topic, we’re adding a bonus fourth installment to wrap things up, and we really wanted to get it posted before everyone gets too wrapped up in holiday shopping and gifts.  So, grab a nice mug of hot cocoa (hopefully with marshmallows) or mulled cider and let’s dig in!  We do, however, apologize in advance for the length of this installment.  There’s a lot of ground to cover.

First and foremost, if you’re just joining us or need a quick refresher, we introduced the topic of Data Protection in Part One, gave an overview of the pillars that make up the foundation of a Data Protection practice in Part Two, and covered some of the challenges we see customers encounter that can undermine a practice in Part Three.

If you’re like me and don’t want to bother opening more browser tabs, here are the main points to jog your memory:

Practice Pillars:

  • Discovering and categorizing existing data and its owners
  • Ensuring the right people have access to it and the wrong people don’t
  • Monitoring access to and movement of the data
  • Categorizing and protecting new data appropriately
  • Preventing unauthorized exfiltration

Challenges:

  • Executive sponsorship
  • Government regulations
  • Data categorization
  • Staffing

Given the set of goals and some of the pitfalls we need to plan for, the final question to answer is: “how do I get there?”  Well, I’m glad you asked because we’re here to help.

While all of these things could conceivably be done manually, it’s a bit unrealistic to expect staff to scurry around behind users making sure paper policies are adhered to.  With that in mind, there are several products on the market to facilitate, enforce, and report on compliance.  They’re generally broken down into these categories (which, naturally, don’t line up with the pillars):

  • Data Discovery
  • Data Categorization
  • Tagging
  • Endpoint DLP
  • Network DLP
  • Edge DLP

It should be noted that there is an entire set of products focused on cloud-based data protection (CASB / SASE); discussion of this topic is well outside the scope of this series.  It is, however, a deep rabbit-hole we could investigate if there is sufficient interest.

Before we dig into the details of each of these categories, I will warn you, gentle reader, that this is an area that represents a significant challenge (and time-sink) for us as Solutions Architects.  Beyond the usual question of going with a “best-of-breed” or “platform” approach, there’s also substantial overlap between products, even those that are best-of-breed.  For example, a best-of-breed tagging product really should do some data discovery as well.  That will, of course, overlap with a best-of-breed discovery product that may not do a good job of tagging.

Having said that, let’s dig into each of the product categories:

  • Data Discovery: Products in this category are generally installed in an environment and turned loose on your data. Of course, defining the term “data” in this context is pretty important.  Are we just talking about free-form (unstructured) documents?  PDF files?  Scanned images that require OCR?  What about structured data such as relational databases?  Encrypted data can be a challenge here as well.

Blurring the lines between categories, products in this category generally borrow from the Data Categorization realm: they work to recognize “interesting” data, identify and report on which users have access to it, and assist in securing it through filesystem or share permissions.

These products may or may not have a means to Tag data for use with the various DLP mechanisms.
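To make the discovery idea a little more concrete, here’s a minimal Python sketch of the simplest possible version: walking a directory tree and recording each file’s owner, extension, and whether it is world-readable. It assumes a POSIX filesystem and only looks at basic permission bits; real discovery products also deal with share ACLs, OCR, encrypted files, and databases. The `discover` function and `FileFinding` class are hypothetical names for illustration.

```python
import os
import stat
from dataclasses import dataclass


@dataclass
class FileFinding:
    path: str
    owner_uid: int        # numeric owner; map to a user name separately if needed
    world_readable: bool  # True if the POSIX "other" read bit is set
    extension: str        # lowercased, e.g. ".pdf"


def discover(root: str) -> list[FileFinding]:
    """Walk a directory tree and record ownership and exposure for every file."""
    findings = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except OSError:
                continue  # unreadable or vanished file: skip it rather than abort
            findings.append(
                FileFinding(
                    path=path,
                    owner_uid=st.st_uid,
                    world_readable=bool(st.st_mode & stat.S_IROTH),
                    extension=os.path.splitext(name)[1].lower(),
                )
            )
    return findings
```

Filtering the results on `world_readable` (or grouping by extension) gives a rough first pass at “where is our data and who can see it?” before any content inspection happens.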

  • Data Categorization: This is less of a product category and more of a feature that many products share. We do, however, break this out separately as the capabilities of products that carry this feature can vary greatly.

At its core, this really boils down to string matching and (hopefully) context.  Using tools like regular expressions, products will work to identify strings such as PII (personally identifiable information) or credit card numbers.  Formatting rules and (in some products) contextual clues can help reduce false positives.
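As an illustration of how formatting rules and context cut down on false positives, here’s a minimal sketch that finds candidate credit card numbers with a regular expression, discards any that fail the Luhn checksum (the check-digit scheme used by the major card networks), and then looks for nearby keywords as a crude context clue. The pattern, the keyword list, and the 40-character window are assumptions for the example, not what any particular product uses.

```python
import re

# Candidate card numbers: 13-19 digits, optionally separated by spaces or dashes.
CARD_CANDIDATE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")
# Hypothetical context keywords that make a match more believable.
CONTEXT_WORDS = ("card", "visa", "mastercard", "amex", "payment")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        n = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            n *= 2
            if n > 9:
                n -= 9
        total += n
    return total % 10 == 0


def find_card_numbers(text: str, window: int = 40) -> list[tuple[str, bool]]:
    """Return (digits, has_context) for each Luhn-valid candidate in the text."""
    results = []
    for m in CARD_CANDIDATE.finditer(text):
        digits = re.sub(r"[ -]", "", m.group())
        if not luhn_valid(digits):
            continue  # right shape, wrong checksum: most likely a false positive
        nearby = text[max(0, m.start() - window): m.end() + window].lower()
        has_context = any(word in nearby for word in CONTEXT_WORDS)
        results.append((digits, has_context))
    return results
```

A match with supporting context might be reported at high confidence, while a bare Luhn-valid number could simply be flagged for review; how those signals are weighted is up to the product and your policy.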

Additionally, most products that categorize data will let you define custom patterns to help categorize documents containing data like customer numbers or references to intellectual property, confidential information, or internal projects.
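Custom patterns are often just additional named rules layered on top of the built-in ones. The sketch below assumes a made-up customer-number format (`CUST-` followed by six digits), hypothetical project code names, and a “company confidential” marking; substitute your own.

```python
import re

# Hypothetical organization-specific rules; replace with your own formats and code names.
CUSTOM_PATTERNS = {
    "customer_number": re.compile(r"\bCUST-\d{6}\b"),
    "internal_project": re.compile(r"\bproject\s+(?:falcon|nimbus)\b", re.IGNORECASE),
    "confidential_marking": re.compile(r"\bcompany\s+confidential\b", re.IGNORECASE),
}


def categorize(text: str) -> set[str]:
    """Return the name of every custom pattern that matches somewhere in the text."""
    return {name for name, pattern in CUSTOM_PATTERNS.items() if pattern.search(text)}
```

Because each rule has a name, the result can feed straight into tagging or DLP policy decisions rather than being a simple yes/no.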

  • Tagging: Hand-in-hand with categorization and possibly discovery is tagging. In short, this is the process of “fingerprinting” a piece of data in such a way that its categorization can be identified in the future without going through the entire process again.  This can take place as part of the discovery process in an automated fashion (if the product supports it) or at the time of creation as an interactive process with the user.

It should be noted that tagging modifies the file.  This can take the form of document metadata (for example, properties stored in the file’s header) or a change to the visible contents of the file itself (such as a watermark or footer in a Word document).  This also means that a tagging product can only tag the document types it supports.  Because of this, the ability to tag structured data (such as a row in a MySQL database) will likely be limited or impossible without application changes.

Because tagging provides a quick way to identify a document’s categorization without having to process the document’s contents, it is a critical part of, and a nice segue to, the DLP product categories.
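Here’s a minimal sketch of the tag-then-check flow using the simplest kind of tag: a visible footer line appended to a plain-text document. The `[CLASSIFICATION: <label>]` format and the function names are hypothetical; real tagging products write format-specific metadata or watermarks. The `classify` function shows why tags matter for DLP: if a tag is present, it is trusted and the expensive content scan is skipped.

```python
import re
from typing import Callable, Optional

# Hypothetical visible footer format, e.g. "[CLASSIFICATION: Confidential]".
# Labels are assumed to be single words so they round-trip through the regex.
TAG_LINE = re.compile(r"^\[CLASSIFICATION: (\w+)\]$")


def apply_tag(text: str, label: str) -> str:
    """Append a footer tag to the document, replacing any existing footer tag."""
    lines = text.rstrip("\n").split("\n")
    if TAG_LINE.match(lines[-1]):
        lines.pop()  # re-tagging: drop the old footer first
    lines.append(f"[CLASSIFICATION: {label}]")
    return "\n".join(lines) + "\n"


def read_tag(text: str) -> Optional[str]:
    """Return the label from the footer, or None if the document is untagged."""
    last_line = text.rstrip("\n").split("\n")[-1]
    match = TAG_LINE.match(last_line)
    return match.group(1) if match else None


def classify(text: str, scan: Callable[[str], str]) -> str:
    """Tag-first lookup: trust an existing tag, otherwise fall back to a full content scan."""
    label = read_tag(text)
    return label if label is not None else scan(text)
```

The same trade-off applies with real metadata: the tag is only as trustworthy as the process that applied it, which is why discovery and categorization still matter.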

  • Endpoint DLP: When most people think of data protection, DLP is usually what comes to mind. With Endpoint DLP solutions, the focus is on wrapping a layer of protection around sensitive data found on the endpoint, which is arguably the primary tool with which business is conducted in your organization. An endpoint can be a laptop or desktop computer or a mobile device such as an iPhone, Android phone, or tablet. Generally speaking, endpoints are the main source of data loss and data theft.

Implementing an endpoint DLP solution can seem daunting to many organizations because of the wide reach and coverage needed to achieve proper functionality. These solutions typically require an agent to be installed on every endpoint in the organization. This agent will also require a certain level of care and feeding: regular updates may be required, and policies will need to be carefully tuned in order to achieve a state of “function, not hindrance” for the user.

  • DLP in Motion: A major benefit of endpoint DLP is that it generally does not depend on your organization’s network to function. Policies are applied to the endpoint from a centrally managed interface, and these policies will continue to protect the endpoint whether it’s in a company building or on the go anywhere in the world.
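One way an agent can keep enforcing policy off-network is to cache the last policy it received from the management server. The sketch below assumes a hypothetical `fetch` callable standing in for the agent’s real transport and a JSON cache file on the endpoint; it illustrates the idea rather than how any specific product works.

```python
import json
import os
from typing import Callable


def load_policy(fetch: Callable[[], dict], cache_path: str) -> dict:
    """Get the latest policy from the management server, or fall back to the local cache.

    `fetch` is a hypothetical stand-in for the agent's transport; it should raise
    OSError (for example ConnectionError) when the server is unreachable.
    """
    try:
        policy = fetch()
    except OSError:
        # Off-network: keep enforcing the last policy we successfully received.
        with open(cache_path, encoding="utf-8") as fh:
            return json.load(fh)
    # Online: refresh the cache via a temp file so a crash can't leave it half-written.
    tmp_path = cache_path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as fh:
        json.dump(policy, fh)
    os.replace(tmp_path, cache_path)
    return policy
```

A real agent would also need a sensible built-in default for the very first boot, when neither the server nor a cache is available (in this sketch, that case simply raises `FileNotFoundError`).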

With the growing remote workforce, it is imperative that your organization’s data remain safe no matter where it is physically located. As mentioned previously, it is crucial to identify data and assets to determine your risk landscape before applying any solutions or policies to protect them. Typically, solutions that protect data in motion inspect data as it moves, such as network traffic, for sensitive information that your organization has identified, and wrap a safeguard around it to prevent it from leaving the bounds of your organization. This type of protection can cover data traversing the network, data being transferred to a USB device, or data being uploaded to services like GitHub or Pastebin.
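To illustrate channel-aware checks, here’s a minimal sketch that blocks content containing a sensitive pattern when it is headed to a USB device or being uploaded to a code-hosting or paste site. The SSN-style pattern, the channel names, and the blocked-host list are illustrative assumptions; in practice these would come from the data your organization identified up front.

```python
import re
from dataclasses import dataclass

# Illustrative sensitive-data pattern (US SSN format) and blocked upload destinations.
SENSITIVE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
BLOCKED_UPLOAD_HOSTS = {"github.com", "gist.github.com", "pastebin.com"}


@dataclass
class TransferEvent:
    channel: str      # hypothetical channel names: "usb" or "upload"
    destination: str  # device identifier or hostname
    content: str


def should_block(event: TransferEvent) -> bool:
    """Block sensitive content copied to USB or uploaded to known paste/code-hosting sites."""
    if not SENSITIVE.search(event.content):
        return False  # nothing identified as sensitive: let it through
    if event.channel == "usb":
        return True
    if event.channel == "upload" and event.destination.lower() in BLOCKED_UPLOAD_HOSTS:
        return True
    return False
```

Checking for sensitive content first keeps the policy from getting in the way of everyday transfers, which is the “function, not hindrance” balance mentioned earlier.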

  • Edge DLP: Edge DLP and protecting data in motion often go hand-in-hand, and most of the same solutions can be applied to both approaches. With edge DLP, data is monitored, detected, and potentially blocked from leaving the bounds of your organization while it is in motion over a network connection. This technology is generally used to prevent sensitive data from being transferred outside of a corporate network, and policies are enforced in a similar fashion to many of the aforementioned solutions, the main difference being network traffic inspection.

Policies enforced at the edge rely on network inspection and can allow, prompt, block, encrypt, reroute, or quarantine sensitive data before it exits stage left into the wrong hands. Protecting data at the edge, along with data on the endpoint, is crucial to ensuring your sensitive data stays within your control. Policy enforcement can be as strict or as lax as your organization deems suitable. A conversation we routinely have with customers exploring their data protection strategy usually starts with exactly that: strategizing how we can help protect this data through policy enforcement while still letting your employees have a reasonably seamless workday. Security does not always mean convenience, but it does not have to mean hindering your employees to the point of intolerable inconvenience.
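The range of actions and the strict-versus-lax trade-off can be sketched as a simple lookup from classification to action. The classification names, the two strictness levels, and the particular mappings below are hypothetical choices for illustration; a real edge DLP policy would be built around your own categorization scheme.

```python
from enum import Enum


class Action(Enum):
    ALLOW = "allow"
    PROMPT = "prompt"          # warn the user and ask for confirmation or justification
    BLOCK = "block"
    ENCRYPT = "encrypt"        # let it through, but only encrypted
    REROUTE = "reroute"        # e.g. send via an approved secure-transfer gateway
    QUARANTINE = "quarantine"  # hold for review by the security team


# Hypothetical classification-to-action mappings for two strictness levels.
POLICIES = {
    "lax": {
        "public": Action.ALLOW,
        "internal": Action.ALLOW,
        "confidential": Action.PROMPT,
        "restricted": Action.REROUTE,
    },
    "strict": {
        "public": Action.ALLOW,
        "internal": Action.PROMPT,
        "confidential": Action.ENCRYPT,
        "restricted": Action.BLOCK,
    },
}

# What to do with untagged or unrecognized data at each strictness level.
UNKNOWN_DEFAULT = {"lax": Action.PROMPT, "strict": Action.QUARANTINE}


def decide(classification: str, strictness: str = "strict") -> Action:
    """Choose the edge action for outbound data with the given classification."""
    policy = POLICIES[strictness]
    return policy.get(classification.lower(), UNKNOWN_DEFAULT[strictness])
```

Note the fallback for untagged or unrecognized data: the strict policy quarantines it, while the lax one just prompts the user. That is exactly the kind of decision worth working through with stakeholders before turning enforcement on.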

Through this series of posts exploring the overall strategy of Data Protection, we have presented multiple means of securing your organization’s artifacts. Although there are many pillars to consider when building out your Data Protection program, we hope we’ve provided some clarity on each of them. Whether you’ve got a finely tuned Data Protection program humming away or your organization is exploring new and innovative ways to wrangle its data in today’s ever-changing workplace, please reach out to discuss how we can help you keep your data where it belongs: in your hands.


This blog was written by Nick DiPasquale and Brett Wyer, Senior Solutions Architects at Set Solutions.