Data Loss Prevention

Thursday, May 20, 2010

Quick Wins with DLP Webcast Next Week

By Rich

Next week I will be giving a webcast to complement my Quick Wins with Data Loss Prevention paper. This is a bit different from how I usually talk about DLP -- it's focused on showing immediate value, while also positioning for long-term success.

Like the paper it's sponsored by McAfee. We're holding it at 11am PT on May 25, and you can register by clicking here.

Here's the full description:

Quick Wins with DLP -- How to Make DLP Work for You
Date: May 25, 2010
Time: 11am PDT / 2pm EDT

When used properly, Data Loss Prevention (DLP) provides rapid identification and assessment of data security issues not available with any other technology. However, when not optimized, two common criticisms of DLP are 1) its complexity and 2) the fear of false positives. Security professionals often worry that DLP is expensive and will fail to deliver the expected value.

A little knowledge and some planning go a long way towards a fast, simple, and effective deployment. By taking some straightforward best practice steps, you can realize significant immediate value and security gains without negatively impacting your productivity or wasting valuable resources.

In this webcast you will learn how to:

  • Establish a flexible incident management process
  • Integrate with major infrastructure components
  • Assess broad information usage
  • Set a foundation for future focused efforts and policy tuning

You will also hear how Continuum Health Partners safeguards highly sensitive patient data with McAfee DLP 9. Join us for this informative presentation.

Presenters:

  • Rich Mogull, Analyst & CEO, Securosis, LLC
  • Mark Moroses, Assistant CIO, Continuum Health Partners
  • John Dasher, Senior Director, Data Protection, McAfee

–Rich

Monday, May 03, 2010

Optimism and Cautions on OpenDLP

By Rich

I'm starting to think I shouldn't take vacations. Aside from the Symantec acquisition of PGP and GuardianEdge last week, someone went off and released the first open source DLP tool.

It's called OpenDLP, and version 0.1 is currently available over Google Code. People have asked me for a long time why there aren't any FOSS DLP options out there, and it's nice to finally see someone put in the non-trivial effort and release a tool. DLP isn't easy to create, and Andrew Gavin deserves major credit for kicking off the project.

First, let's classify OpenDLP. It is an agent-based content discovery/data-at-rest tool. You install an agent on endpoints, which then scans local storage and sends results to a central management server. The agent is a C program, and the management server runs on Apache/MySQL. The tool supports regular expressions and scanning of plain text files.
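To make "regex scanning of plain text files" concrete, here is a minimal sketch of the general technique. It is not OpenDLP's code (the agent is a C program), and the policy names, patterns, and file suffixes are assumptions chosen purely for illustration.

```python
import re
from pathlib import Path

# Hypothetical policies: the names and patterns are illustrative, not OpenDLP's.
POLICIES = {
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "card_like": re.compile(r"\b(?:\d[ -]?){15}\d\b"),
}


def scan_file(path, policies=POLICIES):
    """Return (policy_name, line_number) pairs for each matching line in a text file."""
    hits = []
    try:
        text = path.read_text(encoding="utf-8", errors="ignore")
    except OSError:
        return hits  # unreadable file: skip it rather than abort the scan
    for line_number, line in enumerate(text.splitlines(), start=1):
        for name, pattern in policies.items():
            if pattern.search(line):
                hits.append((name, line_number))
    return hits


def scan_tree(root, suffixes=(".txt", ".csv", ".log")):
    """Walk a directory tree and scan plain-text files only, keyed by file path."""
    results = {}
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix.lower() in suffixes:
            hits = scan_file(path)
            if hits:
                results[str(path)] = hits
    return results
```

Anything that is not plain text (compressed files, newer Office formats) is simply skipped by the suffix filter, which mirrors the limitation described in the list that follows.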

Benefits

  • Free.
  • You can customize the code.
  • Communications are encrypted with SSL.
  • Supports any version of Windows you are likely to run.
  • Includes agent management, and the agent is designed to be non-intrusive.
  • Supports full regular expressions for building policies.

Limitations

  • Scans stored data on endpoints only. Might be usable on Windows servers, but I would test very carefully first.
  • Unable to scan non-plain-text or compressed files, including current versions of Office (the .XXXx XML formats).
  • No advanced content analysis -- regex only, which limits the types of content this will work for.
  • Requires NetBIOS... which some environments ban.
  • I have been told via email (not from a DLP vendor, for the record) that the code may be a bit messy... which I'd consider a security concern.

Thus this is a narrow implementation of DLP -- that's not a criticism, just a definition.

I don't have a large enough environment to give this a real test, but considering that it is a 0.1 version I think we should give it a little breathing space to improve. The to-do list already includes adding .zip file support, for example. I think it's safe to say that (assuming the project gathers support) we will see it improve over time.

In summary, this is too soon to deploy in any production capacity, but definitely worth checking out and contributing to. I really hope the project succeeds and matures.

–Rich

Thursday, April 22, 2010

Whitepaper Released: Quick Wins with Data Loss Prevention

By Rich

Two of the most common criticisms of Data Loss Prevention (DLP) that come up in user discussions are a) its complexity and b) the fear of false positives. Security professionals worry that DLP is an expensive widget that will fail to deliver the expected value -- turning into yet another black hole of productivity. But when used properly, DLP provides rapid assessment and identification of data security issues not available with any other technology.

We don't mean to play down the real complexities you might encounter as you roll out a complete data protection program. Business use of information is itself complicated, and no tool designed to protect that data can simplify or mask the underlying business processes. But there are steps you can take to obtain significant immediate value and security gains without blowing your productivity or wasting important resources.

In this paper we highlight the lowest hanging fruit for DLP, refined in conversations with hundreds of DLP users. These aren't meant to incorporate the entire DLP process, but to show you how to get real and immediate wins before you move on to more complex policies and use cases.

I like this paper, and not just because I wrote it. Short, to the point, with advice on deriving immediate value as opposed to kicking off some costly and complex process. This paper is the culmination of the Quick Wins in DLP blog series I posted, all compiled together with a pretty picture or two.

Special thanks to McAfee for licensing the report.

You can download the paper directly, or visit the landing page, where you can leave comments or criticism, and track revisions.

–Rich

Thursday, April 01, 2010

Hit the Snooze on Lancope’s Data Loss Alarms

By Rich

Update: Lancope posted some new information positioning this as a complement to DLP, not a substitute. Looks like the marketing folks might have gotten a little out of control.

I've been at this game for a while now, but sometimes I see a piece of idiocy that makes me wish I was drinking some chocolate milk so I could spew it out my nose in response to the sheer audacity of it all.

Today's winner is Lancope, who astounds us with their new "data loss prevention" solution that detects breaches using a Harry Potter-inspired technique that completely eliminates the need to understand the data. Actually, according to their extremely educational marketing paper, analyzing the content is bad, because it's really hard! Kind of like math. Or common sense.

Lancope's far superior alternative monitors your network for any unusual activity, such as a large file transfer, and generates an alert. You don't even need to look at packets! That's so cool! I thought the iPad was magical, but Lancope is totally kicking Apple's ass on the enchantment front. Rumor is your box is even delivered by a unicorn. With wings!

I'm all for netflow and anomaly detection. It's one of the more important tools for dealing with advanced attacks. But this Lancope release is ridiculous -- I can't even imagine the number of false positives. Without content analysis, or even metadata analysis, I'm not sure how this could possibly be useful. Maybe paired with real DLP, but they are marketing it as a stand-alone option, which is nuts. Especially when DLP vendors like Fidelis, McAfee, and Palisade are starting to add data traffic flow analysis (with content awareness) to their products.
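To see why volume-only alerting is prone to false positives, consider this rough sketch of a flow-volume anomaly check. It is an assumption-laden illustration of the general idea, not Lancope's method: the host names, byte counts, and the three-standard-deviation threshold are all made up.

```python
from statistics import mean, pstdev


def volume_alerts(history, today, threshold=3.0):
    """Flag hosts whose bytes sent today exceed their own historical mean
    by more than `threshold` population standard deviations."""
    alerts = []
    for host, sent in today.items():
        past = history.get(host, [])
        if len(past) < 2:
            continue  # not enough history to call anything unusual
        if sent > mean(past) + threshold * pstdev(past):
            alerts.append(host)
    return sorted(alerts)


# Hypothetical daily megabytes sent per host.
history = {"backup01": [100, 110, 90, 105], "laptop7": [5, 6, 4, 5]}
today = {"backup01": 900, "laptop7": 5}
flagged = volume_alerts(history, today)
```

The check flags `backup01`, but nothing in the flow record says whether that was an unscheduled full backup or someone walking off with the customer database. Telling those apart requires looking at the content.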

Maybe Lancope should partner with a DLP vendor. One of the weaknesses of many DLP products is that they do a crappy job of looking across all ports and protocols. Pretty much every product is capable of it, but most of them require a large number of boxes with severe traffic or analysis limitations, because they aren't overly speedy as network devices (with some exceptions). Combining one with something like Lancope where you could point the DLP at target traffic could be interesting... but damn, netflow alone clearly isn't a good option.

Lancope, thanks for a great DLP WTF with a side of BS. I'm glad I read it today -- that release is almost as good as the ThinkGeek April Fool's edition!

–Rich

Monday, March 22, 2010

Some DLP Metrics

By Rich

One of our readers, Jon Damratoski, is putting together a DLP program and asked me for some ideas on metrics to track the effectiveness of his deployment. By 'ask', I mean he sent me a great list of starting metrics that I completely failed to improve on.

Jon is looking for some feedback and suggestions, and agreed to let me post these. Here's his list:

  • Number of people/business groups contacted about incidents -- tie in somehow with user awareness training.
  • Remediation metrics to show trend results in reducing incidents -- at start of DLP we had X events, after talking to people for 30 days about incidents we now have Y events.
  • Trend analysis over 3, 6, & 9 month periods to show how the number of events has reduced as remediation efforts kick in.
  • Reduction in the average severity of an event per user, business group, etc.
  • Trend: number of broken business policies.
  • Trend: number of incidents related to automated business practices (automated emails).
  • Trend: number of incidents that generated automatic email.
  • Trend: number of incidents that were generated from service accounts -- (emails, batch files, etc.)

I thought this was a great start, and I've seen similar metrics on the dashboards of many of the DLP products.

The only one I have to add to Jon's list is:

  • Average number of incidents per user.
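A few of these metrics are simple enough to compute from an incident export. The sketch below assumes each incident is a record with hypothetical `user` and `date` fields. Real DLP products will have their own export formats, and whether you average over all monitored users or only those with incidents is a choice you have to make.

```python
from collections import Counter
from datetime import date


def average_incidents_per_user(incidents, total_users=None):
    """Average incidents per user. Divides by `total_users` if given,
    otherwise by the number of distinct users that appear in the incidents."""
    users = total_users if total_users is not None else len({i["user"] for i in incidents})
    return len(incidents) / users if users else 0.0


def monthly_trend(incidents):
    """Incident counts keyed by (year, month), in chronological order."""
    counts = Counter((i["date"].year, i["date"].month) for i in incidents)
    return dict(sorted(counts.items()))


def service_account_share(incidents, service_accounts):
    """Fraction of incidents generated by known service accounts."""
    if not incidents:
        return 0.0
    return sum(1 for i in incidents if i["user"] in service_accounts) / len(incidents)
```

Comparing `monthly_trend` output over 3, 6, and 9 month windows gives the remediation trend from Jon's list.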

Anyone have other suggestions?

–Rich

Wednesday, March 17, 2010

LHF: Quick Wins with DLP—the Conclusion

By Rich

In the last two posts we covered the main preparation you need to get quick wins with your DLP deployment. First you need to put a basic enforcement process in place, then you need to integrate with your directory servers and major infrastructure. With these two bits out of the way, it's time to roll up our sleeves, get to work, and start putting that shiny new appliance or server to use.

The differences between a long-term DLP deployment and our "Quick Wins" approach are goals and scope. With a traditional deployment we focus on comprehensive monitoring and protection of very specific data types. We know what we want to protect (at a granular level) and how we want to protect it, and we can focus on comprehensive policies with low false positives and a robust workflow. Every policy violation is reviewed to determine if it's an incident that requires a response.

In the Quick Wins approach we are concerned less about incident management, and more about gaining a rapid understanding of how information is used within our organization. There are two flavors to this approach -- one where we focus on a narrow data type, typically as an early step in a full enforcement process or to support a compliance need, and the other where we cast a wide net to help us understand general data usage to prioritize our efforts. Long-term deployments and Quick Wins are not mutually exclusive -- each targets a different goal and both can run concurrently or sequentially, depending on your resources.

Remember: even though we aren't talking about a full enforcement process, it is absolutely essential that your incident management workflow be ready to go when you encounter violations that demand immediate action!

Choose Your Flavor

The first step is to decide which of two general approaches to take:

  • Single Type: In some organizations the primary driver behind the DLP deployment is protection of a single data type, often due to compliance requirements. This approach focuses only on that data type.
  • Information Usage: This approach casts a wide net to help characterize how the organization uses information, and identify patterns of both legitimate use and abuse. This information is often very useful for prioritizing and informing additional data security efforts.

Choose Your Deployment Type

Depending on your DLP tool, it will be capable of monitoring and protecting information on the network, on endpoints, or in storage repositories -- or some combination of these. This gives us three pure deployment options and four possible combinations.

  • Network Focused: Deploying DLP on the network in monitoring mode provides the broadest coverage with the least effort. Network monitoring is typically the fastest to get up and running due to lighter integration requirements. You can often plug in a server or appliance over a few hours or less, and instantly start evaluating results.
  • Endpoint Focused: Starting with endpoints should give you a good idea of which employees are storing data locally or transferring it to portable storage. Some endpoint tools can also monitor network activity on the endpoint, but these capabilities vary widely. In terms of Quick Wins, endpoint deployments are generally focused on analyzing stored content on the endpoints.
  • Storage Focused: Content discovery is the analysis of data at rest in storage repositories. Since it often requires considerable integration (at minimum, knowing the username and password to access a file share), these deployments, like endpoints, involve more effort. That said, scanning major repositories is very useful, and in some organizations it's as important (or even more so) to understand stored data as it is to monitor information moving across the network.

Network deployments typically provide the most immediate information with the lowest effort, but depending on what tools you have available and your organization's priorities, it may make sense to start with endpoints or storage. Combinations are obviously possible, but we suggest you roll out multiple deployment types sequentially rather than in parallel to manage project scope.
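The count of three pure options and four combinations mentioned above is easy to verify by enumeration. A quick sketch, with the location labels chosen here for illustration:

```python
from itertools import combinations

LOCATIONS = ("network", "endpoint", "storage")


def deployment_options(locations=LOCATIONS):
    """Return (pure, combined): single-location deployments, and every
    combination of two or more locations."""
    pure = [(location,) for location in locations]
    combined = [
        combo
        for size in range(2, len(locations) + 1)
        for combo in combinations(locations, size)
    ]
    return pure, combined


pure, combined = deployment_options()
```

Here `combined` holds the three pairs plus the one deployment that covers all three locations.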

Define Your Policies

The last step before hitting the "on" switch is to configure your policies to match your deployment flavor.

In a single type deployment, either choose an existing category that matches the data type in your tool, or quickly build your own policy. In our experience, pre-built categories common in most DLP tools are almost always available for the data types that commonly drive a DLP project. Don't worry about tuning the policy -- right now we just want to toss it out there and get as many results as possible. Yes, this is the exact opposite of our recommendations for a traditional, focused DLP deployment.

In an information usage deployment, turn on all the policies or enable promiscuous monitoring mode. Most DLP tools only record activity when there are policy violations, which is why you must enable the policies. A few tools can monitor general activity without relying on a policy trigger (either full content or metadata only). In both cases our goal is to collect as much information as possible to identify usage patterns and potential issues.

Monitor

Now it's time to turn on your tool and start collecting results.

Don't be shocked -- in both deployment types you will see a lot more information than in a focused deployment, including more potential false positives. Remember, you aren't concerned with managing every single incident, but want a broad understanding of what's going on in your network, on endpoints, or in storage.

Analyze and PROFIT!

Now we get to the most important part of the process -- turning all that data into useful information.

Once we collect enough data, it's time to start the analysis process. Our goal is to identify broad patterns and any major issues. Here are some examples of what to look for:

  • A business unit sending out sensitive data unprotected as part of a regularly scheduled job.
  • Which data types broadly trigger the most violations.
  • The volume of usage of certain content or files, which may help identify valuable assets that don't cleanly match a pre-defined policy.
  • Particular users or business units with higher numbers of violations or unusual usage patterns.
  • False positive patterns, for tuning long-term policies later.

All DLP tools provide some level of reporting and analysis, but ideally your tool will allow you to set flexible criteria to support the analysis.
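If your tool can export raw violations, even a small script can surface the patterns listed above. This is a minimal sketch assuming a hypothetical export in which each violation carries `policy`, `business_unit`, `user`, and an optional `false_positive` flag. The "twice the average" cutoff for heavy users is an arbitrary starting point, not a recommendation.

```python
from collections import Counter
from statistics import mean


def summarize(violations, heavy_factor=2.0):
    """Aggregate violations by policy, business unit, and user."""
    by_policy = Counter(v["policy"] for v in violations)
    by_unit = Counter(v["business_unit"] for v in violations)
    by_user = Counter(v["user"] for v in violations)
    false_positives = Counter(
        v["policy"] for v in violations if v.get("false_positive")
    )
    # Users whose violation count exceeds heavy_factor times the per-user average.
    baseline = mean(by_user.values()) if by_user else 0
    heavy_users = sorted(u for u, n in by_user.items() if n > heavy_factor * baseline)
    return {
        "by_policy": by_policy.most_common(),
        "by_unit": by_unit.most_common(),
        "heavy_users": heavy_users,
        "false_positives_by_policy": dict(false_positives),
    }
```

The `false_positives_by_policy` counts are the ones to keep around for tuning long-term policies later.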

What Did We Achieve?

If you followed this process, by now you've created a base for your ongoing DLP usage while achieving valuable short-term goals. In a short amount of time you have:

  1. Established a flexible incident management process.
  2. Integrated with major infrastructure components.
  3. Assessed broad information usage.
  4. Set a foundation for later focused efforts and policy tuning to support long-term management.

Thus by following the Quick Wins process you can show immediate results while establishing the foundations of your program, all without overwhelming yourself by forcing unprepared action on all possible alerts before you understand information usage patterns.

Not bad, eh?

–Rich

Monday, March 15, 2010

LHF: Quick Wins in DLP, Part 2

By Rich

In Part 1 of this series on Low Hanging Fruit: Quick Wins with DLP, we covered how important it is to get your process in place, and the two kinds of violations you should be immediately prepared to handle. Trust us -- you will see violations once you turn your DLP tool on.

Today we'll talk about the last two pieces of prep work before you actually flip the 'on' switch.

Prepare Your Directory Servers

One of the single most consistent problems with DLP deployments has nothing to do with DLP, and everything to do with the supporting directory (AD, LDAP, or whatever) infrastructure. Since with DLP we are concerned with user actions across networks, files, and systems (and on the network with multiple protocols), it's important to know exactly who is committing all these violations. With a file or email it's usually a straightforward process to identify the user based on their mail or network logon ID, but once you start monitoring anything else, such as web traffic, you need to correlate the user's network (IP) address back to their name.

This is built into nearly every DLP tool, so they can track what network addresses are assigned to users when they log onto the network or a service.
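Conceptually, that correlation is a time-aware lookup: given an IP address and the time of a violation, find the most recent logon on that address. The sketch below is only an illustration of the idea. The class, its methods, and the sample data are hypothetical, and real tools pull this from directory and DHCP events rather than manual calls.

```python
from bisect import bisect_right
from collections import defaultdict
from datetime import datetime


class AddressMap:
    """Tracks which user was last seen logging on from each IP address."""

    def __init__(self):
        self._events = defaultdict(list)  # ip -> sorted list of (timestamp, user)

    def record_logon(self, ip, user, when):
        events = self._events[ip]
        events.append((when, user))
        events.sort()

    def user_at(self, ip, when):
        """Return the user with the latest logon at or before `when`, or None."""
        events = self._events.get(ip, [])
        times = [timestamp for timestamp, _ in events]
        index = bisect_right(times, when)
        return events[index - 1][1] if index else None


addresses = AddressMap()
addresses.record_logon("10.0.0.5", "alice", datetime(2010, 3, 15, 9, 0))
addresses.record_logon("10.0.0.5", "bob", datetime(2010, 3, 15, 13, 0))
```

A web upload seen from 10.0.0.5 at 10:00 would be attributed to `alice`, and the same address at 14:00 to `bob`, which is why the timestamp matters as much as the address.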

The more difficult problem tends to be the business process: correlating these technical IDs back to real human beings. Many organizations fail to keep their directory servers current, and as a result it can be hard to find the physical body behind a login. It gets even harder if you need to figure out their business unit, manager, and so on.

For a quick win, we suggest you focus predominantly on making sure you can track most users back to their real-world identities. Ideally your directory will also include role information so you can filter DLP policy violations based on business unit. Someone in HR or Legal usually has authorization for different sensitive information than people in IT and Customer Service, and if you have to manually figure all this out when a violation occurs, it will really hurt your efficiency later.
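Here is a small sketch of what that filtering might look like once the directory data is usable. Everything in it is hypothetical: the directory export, the policy names, and the mapping of which business units are authorized for which data.

```python
# Hypothetical directory export: login ID -> identity and business unit.
DIRECTORY = {
    "jsmith": {"name": "J. Smith", "business_unit": "HR"},
    "bwong": {"name": "B. Wong", "business_unit": "Customer Service"},
}

# Hypothetical mapping of business units to data they are authorized to handle.
AUTHORIZED_POLICIES = {
    "HR": {"employee_records"},
    "Legal": {"employee_records", "contracts"},
}


def enrich(violation, directory=DIRECTORY):
    """Attach real-world identity and business unit to a violation record."""
    entry = directory.get(violation["login"], {})
    return {
        **violation,
        "name": entry.get("name"),
        "business_unit": entry.get("business_unit"),
    }


def needs_review(violation, authorized=AUTHORIZED_POLICIES):
    """True unless the user's business unit is authorized for this data type.
    Logins missing from the directory always need review."""
    unit = violation.get("business_unit")
    return violation["policy"] not in authorized.get(unit, set())
```

Note the failure mode: a login that is missing from the directory comes back with no name or business unit, and someone has to chase it down by hand.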

Integrate with Your Infrastructure

The last bit of preparation is to integrate with the important parts of your infrastructure. How you do this will vary a bit depending on your initial focus (endpoint, network, or discovery). Remember, this all comes after you integrate with your directory servers.

The easiest deployments are typically on the network side, since you can run in monitoring mode without having to do too much integration. This might not be your top priority, but adding what's essentially an out of band network sniffer is very straightforward. Most organizations connect their DLP monitor to their network gateway using a SPAN or mirror port. If you have multiple locations, you'll probably need multiple DLP boxes and have to integrate them using the built-in multi-system management features common to most DLP tools.

Most organizations also integrate a bit more directly with email, since it is particularly effective without being especially difficult. The store-and-forward nature of email, compared to other real-time protocols, makes many types of analysis and blocking easier. Many DLP tools include an embedded mail server (MTA, or Mail Transport Agent) which you can simply add as another hop in the email chain, just like you probably deployed your spam filter.

Endpoint rollouts are a little tougher because you must deploy an agent onto every monitored system. The best way to do this (after testing) is to use whatever software deployment tool you currently use to push out updates and new software.

Content discovery -- scanning data at rest in storage -- can be a bit tougher, depending on how many servers you need to scan and who manages them. For quick wins, look for centralized storage where you can start scanning remotely through a file share, as opposed to widely distributed systems where you have to manually obtain access or install an agent. This reduces the political overhead and you only need an authorized user account for the file share to start the process.

You'll notice we haven't talked about all the possible DLP integration points, but instead focused on the main ones to get you up and running as quickly as possible. To recap:

  • For all deployments: Directory services (usually your Active Directory and DHCP servers).
  • For network deployments: Network gateways and mail servers.
  • For endpoint deployments: Software distribution tools.
  • For discovery/storage deployments: File shares on the key storage repositories (you generally only need a username/password pair to connect).

Now that we are done with all the prep work, in our next post we'll dig in and focus on what to do when you actually turn DLP on.

–Rich

Thursday, March 11, 2010

Low Hanging Fruit: Quick Wins with Data Loss Prevention

By Rich

Two of the most common criticisms of DLP that come up in user discussions are a) its complexity and b) the fear of false positives. Security professionals worry that DLP is an expensive widget that will fail to deliver the expected value -- turning into yet another black hole of productivity. But when used properly, DLP provides rapid assessment and identification of data security issues not available with any other technology.

I don't mean to play down the real complexities you might encounter as you roll out a complete data protection program. Business use of information is itself complicated, and no tool designed to protect that data can simplify or mask the underlying business processes. However, there are steps you can take to obtain significant immediate value and security gains without blowing your productivity or wasting important resources.

Over the next few posts I'll highlight the lowest hanging fruit for DLP, refined in conversations with hundreds of DLP users. These aren't meant to incorporate the entire DLP process, but to show you how to get real and immediate wins before you move on to more complex policies and use cases.

Establish Your Process

Nearly every DLP reference I've talked with has discovered actionable offenses committed by employees as soon as they turn the tool on. Some of these require little more than contacting a business unit to change a bad process, but quite a few result in security guards escorting people out of the building, or even legal action. One of my favorite stories is the time the DLP vendor plugged in the tool for a lunchtime demonstration on the same day a senior executive decided to send proprietary information to a competitor. Needless to say, the vendor lost their hard drives that day, but they didn't seem too unhappy.

Even if you aren't planning on moving straight to enforcement mode, you need to put a process in place to manage the issues that will crop up once you activate your tool. The kinds of issues you need to figure out how to address in advance fall into two categories:

  • Business Process Failures: Although you'll likely manage most business process issues as you roll out your sustained deployment, the odds are high some will be of such high concern they will require immediate remediation. These are often compliance related.
  • Egregious Employee Violations: Most employee-related issues can be dealt with as you gradually shift into enforcement mode, but as in the example above, you will encounter situations requiring immediate action.

In terms of process, I suggest two tracks based on the nature of the incident. Business process failures usually involve escalation within security or IT, possible involvement of compliance or risk management, and engagement with the business unit itself. You are less concerned with getting someone in trouble than stopping the problem.

Employee violations, due to their legal sensitivity, require a more formal process. Typically you'll need to open an investigation and immediately escalate to management while engaging legal and human resources (since this might be a firing offense). Contingencies need to be established in case law enforcement is engaged, including plans to provide forensic evidence to law enforcement without having them walk out the door with your nice new DLP box and hard drives. Essentially you want to implement whatever process you already have in place for internal employee investigations and potential termination.
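If it helps to see the two tracks side by side, they can be written down as a simple routing table. This is only a sketch of the process described above. The category labels and team names are placeholders, and your own escalation paths will differ.

```python
from dataclasses import dataclass

# Placeholder escalation paths for the two tracks described above.
ESCALATION = {
    "business_process": ["security", "it", "compliance", "business_unit"],
    "employee_violation": ["management", "legal", "human_resources"],
}


@dataclass
class Incident:
    summary: str
    category: str


def route(incident):
    """Return who to notify and whether to open a formal investigation."""
    if incident.category not in ESCALATION:
        raise ValueError(f"unknown incident category: {incident.category}")
    return {
        "notify": list(ESCALATION[incident.category]),
        # Employee violations follow the formal internal-investigation process.
        "open_investigation": incident.category == "employee_violation",
    }
```

The point of writing it down before turning the tool on is that nobody has to improvise the escalation path on the day something ugly shows up.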

In our next post we'll focus more on rolling out the tool, followed by how to configure it for those quick wins I keep teasing you with.

–Rich

Monday, February 01, 2010

Pragmatic Data Security: Discover

By Rich

In the Discovery phase we figure out where the heck our sensitive information is, how it's being used, and how well it's protected. If performed manually, or with too broad an approach, Discovery can be quite difficult and time consuming. In the pragmatic approach we stick with a very narrow scope and leverage automation for greater efficiency. A mid-sized organization can see immediate benefits in a matter of weeks to months, and usually finish a comprehensive review (including all endpoints) within a year or less.

Discover: The Process

Before we get into the process, be aware that your job will be infinitely harder if you don't have a reasonably up to date directory infrastructure. If you can't figure out your users, groups, and roles, it will be much harder to identify misuse of data or build enforcement policies. Take the time to clean up your directory before you start scanning and filtering for content. Also, the odds are very high that you will find something that requires disciplinary action. Make sure you have a process in place to handle policy violations, and work with HR and Legal before you start finding things that will get someone fired (trust me, those odds are pretty darn high).

You have a couple choices for where to start -- depending on your goals, you can begin with applications/databases, storage repositories (including endpoints), or the network. If you are dealing with something like PCI, stored data is usually the best place to start, since avoiding unencrypted card numbers on storage is an explicit requirement. For HIPAA, you might want to start on the network since most of the violations in organizations I talk to relate to policy violations over email/web/FTP due to bad business processes. For each area, here's how you do it:

  • Storage and Endpoints: Unless you have a heck of a lot of bodies, you will need a Data Loss Prevention tool with content discovery capabilities (I mention a few alternatives in the Tools section, but DLP is your best choice). Build a policy based on the content definition you built in the first phase. Remember, stick to a single data/content type to start. Unless you are in a smaller organization and plan on scanning everything, you need to identify your initial target range -- typically major repositories or endpoints grouped by business unit. Don't pick something too broad or you might end up with too many results to do anything with. Also, you'll need some sort of access to the server -- either by installing an agent or through access to a file share. Once you get your first results, tune your policy as needed and start expanding your scope to scan more systems.
  • Network: Again, a DLP tool is your friend here, although unlike with content discovery you have more options to leverage other tools for some sort of basic analysis. They won't be nearly as effective, and I really suggest using the right tool for the job. Put your network tool in monitoring mode and build a policy to generate alerts using the same data definition we talked about when scanning storage. You might focus on just a few key channels to start -- such as email, web, and FTP -- with a narrow IP range/subnet if you are in a larger organization. This will give you a good idea of how your data is being used, identify some bad business processes (like unencrypted FTP to a partner), and show which users or departments are the worst abusers. Based on your initial results you'll tune your policy as needed. Right now our goal is to figure out where we have problems -- we will get to fixing them in a different phase.
  • Applications & Databases: Your goal is to determine which applications and databases have sensitive data, and you have a few different approaches to choose from. This is the part of the process where a manual effort can be somewhat effective, although it's not as comprehensive as using automated tools. Simply reach out to different business units, especially the application support and database management teams, to create an inventory. Don't ask them which systems have sensitive data, ask them for an inventory of all systems. The odds are very high your data is stored in places you don't expect, so to check these systems perform a flat file dump and scan the output with a pattern matching tool. If you have the budget, I suggest using a database discovery tool -- preferably one with built in content discovery (there aren't many on the market, as we'll mention in the Tools section). Depending on the tool you use, it will either sniff the network for database connections and then identify those systems, or scan based on IP ranges. If the tool includes content discovery, you'll usually give it some level of administrative access to scan the internal database structures.
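For the manual route (flat file dump plus a pattern matching tool), here is a rough sketch of what the scan could look like for card numbers. It assumes the dump is a CSV with a header row, and it pairs a loose digit pattern with the Luhn checksum to cut down on random 16-digit matches. The pattern and the sample data are illustrative, not a complete card-number definition.

```python
import csv
import io
import re

# Loose candidate pattern: 13 to 19 digits, optionally separated by spaces or dashes.
CARD_CANDIDATE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(number):
    """Check the Luhn checksum over the digits in `number`."""
    digits = [int(c) for c in number if c.isdigit()]
    checksum = 0
    for position, digit in enumerate(reversed(digits)):
        if position % 2 == 1:  # double every second digit from the right
            digit *= 2
            if digit > 9:
                digit -= 9
        checksum += digit
    return len(digits) >= 13 and checksum % 10 == 0


def scan_dump(handle):
    """Yield (row_number, column_name) for cells holding a Luhn-valid candidate."""
    reader = csv.DictReader(handle)
    for row_number, row in enumerate(reader, start=2):  # the header is row 1
        for column, value in row.items():
            if not isinstance(value, str):
                continue  # missing or overflow fields from ragged rows
            if any(luhn_valid(m.group()) for m in CARD_CANDIDATE.finditer(value)):
                yield row_number, column


dump = io.StringIO(
    "id,notes\n"
    "1,card 4111 1111 1111 1111 on file\n"
    "2,order 1234567890123456\n"
)
findings = list(scan_dump(dump))
```

Only the first data row is reported. The order number in the second row has the right length but fails the checksum. Knowing which column the hit came from tells you which table and field to chase back in the source system.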

I just presented a lot of options, but remember we are taking the pragmatic approach. I don't expect you to try all this at once -- pick one area, with a narrow scope, knowing you will expand later. Focus on wherever you think you might have the greatest initial impact, or where you have known problems. I'm not an idealist -- some of this is hard work and takes time, but it isn't an endless process and you will have a positive impact.

We aren't necessarily done once we figure out where the data is -- for approved repositories, I really recommend you also re-check their security. Run at least a basic vulnerability scan, and for bigger repositories I recommend a focused penetration test. (Of course, if you already know it's insecure you probably don't need to beat the dead horse with another check.) Later, in the Secure phase, we'll need to lock down the approved repositories so it's important to know which security holes to plug.

Discover: Technologies

Unlike the Define phase, here we have a plethora of options. I'll break this into two parts: recommended tools that are best for the job, and ancillary tools in case you don't have a budget for anything new. Since we're focused on the process in this series, I'll skip definitions and descriptions of the technologies, most of which you can find in our Research Library.

Recommended Tools

  1. Data Loss Prevention (DLP): This is the best tool for storage, network, and endpoint discovery. Nothing else is nearly as effective.
  2. Database Discovery: While there are only a few tools on the market, they are extremely helpful for finding all the unexpected databases that tend to be floating around most organizations. Some offer content discovery, but it's usually limited to regular expressions/keywords (which is often totally fine for looking within a database).
  3. Database Activity Monitoring (DAM): A couple of the tools include content discovery (some also include database discovery). I only recommend DAM in the discover phase if you also intend on using it later for database monitoring -- otherwise it's not the right investment.

Ancillary Tools

  1. IDS/IPS/Deep Packet Inspection: There are a bunch of different deep packet inspection network tools -- including UTM, Web Application Firewalls, and web gateways -- that now include basic regular expression pattern matching for "poor man's" DLP functionality. They only help with data that fits a pattern, they don't include any workflow, and they usually have a ton of false positives. If the tool can't crack open file attachments/transfers it probably won't be very helpful.
  2. Electronic Discovery, Search, and Data Classification: Most of these tools perform some level of pattern matching or indexing that can help with discovery. They tend to have much higher false positive rates than DLP (and usually cost more if you're buying new), but if you already have one and budgets are tight they can help.
  3. Email Security Gateways: Most of the email security gateways on the market can scan for content, but they are obviously limited to only email, and aren't necessarily well suited to the discovery process.
  4. FOSS Discovery Tools: There are a couple of free/open source content discovery tools, mostly projects from higher education institutions that built their own tools to weed out improper use of Social Security numbers due to a regulatory change a few years back.

Discover: Case Study

Frank from Billy Bob's Bait Shop and Sushi Outlet decides to use a DLP tool to help figure out where any unencrypted credit card numbers might be stored. He decides to go with a full suite DLP tool since he knows he needs to scan his network, storage, servers in the retail outlets, and employee systems.

Before turning on the tool, he contacts Legal and HR to set up a process in case they find any employees illegally using these numbers, as opposed to the accidental or business-process leaks he also expects to manage. Although his directory servers are a little messy due to all the short-term employees endemic to retail operations, he's confident his core Active Directory server is relatively up to date, especially where systems/servers are concerned.

Since he's using a DLP tool, he develops a three-tier policy to base his discovery scans on:

  1. Using the one database with stored unencrypted numbers, he creates a database fingerprinting policy to alert on exact matches from that database (his DLP tool uses hashes, not the original values, so it isn't creating a new security exposure). These are critical alerts.
  2. His next policy uses database fingerprints of all customer names from the customer database, combined with a regular expression for generic credit card numbers. If a customer name appears with something that matches a credit card number (based on the regex pattern) it generates a medium alert.
  3. His lowest priority policy uses the default "PCI" category built into his DLP tool, which is predominantly basic pattern matching.
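To make the three tiers concrete, here is a minimal sketch of how such a policy stack might be evaluated. It is not how Frank's (fictional) DLP tool works internally; the keyed SHA-256 hash, the two-word name matching, the `SECRET_KEY` value, and the function names are all assumptions for illustration. The point is the ordering: exact fingerprint match first, then name plus pattern, then pattern alone.

```python
import hashlib
import hmac
import re

CARD_PATTERN = re.compile(r"(?<!\d)\d(?:[ -]?\d){12,15}(?!\d)")
SECRET_KEY = b"example-key-change-me"  # hypothetical; a real tool manages its own keys


def fingerprint(value: str) -> str:
    """Keyed hash of a normalized value, so raw values never sit in the policy."""
    normalized = " ".join(value.lower().split())
    return hmac.new(SECRET_KEY, normalized.encode(), hashlib.sha256).hexdigest()


def build_fingerprints(values) -> set[str]:
    """Hash every value from the source database once, up front."""
    return {fingerprint(value) for value in values}


def classify(text: str, card_prints: set[str], name_prints: set[str]):
    """Return 'critical', 'medium', 'low', or None for a piece of content."""
    candidates = [re.sub(r"[ -]", "", m.group()) for m in CARD_PATTERN.finditer(text)]
    if not candidates:
        return None
    # Tier 1: exact match against fingerprints of the known stored numbers.
    if any(fingerprint(number) in card_prints for number in candidates):
        return "critical"
    # Tier 2: a known customer name (assumed "First Last") near a card-like number.
    words = re.findall(r"[A-Za-z']+", text)
    pairs = (" ".join(words[i:i + 2]) for i in range(len(words) - 1))
    if any(fingerprint(pair) in name_prints for pair in pairs):
        return "medium"
    # Tier 3: generic pattern match only.
    return "low"
```

A call such as `classify(message_body, card_prints, name_prints)` then yields the alert level for one message or file.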

He breaks his project down into three phases, to run during overlapping periods:

  1. Using those three policies, he turns on network monitoring for email, web, and FTP.
  2. He begins scanning his storage repositories, starting in the data center. Once he finishes those, he will expand the scans into systems in the retail outlets. He expects his data center scan to go relatively quickly, but is planning on 6-12 months to cover the retail outlets.
  3. He is testing endpoint discovery in the lab, but since their workstation management is a bit messy he isn't planning on trying to install agents and beginning scans until the second year of the project.

It took Frank about two months to coordinate with other business/IT units before starting the project. Installing DLP on the network only took a few hours because everything ran through one main gateway, and he wasn't worried about installing any proxy/blocking technology.

Frank immediately saw network results, and found one serious business process problem where unencrypted numbers were included in files being FTPed to a business partner. The rest of his incidents involved individual accidents, and for the most part they weren't losing credit card numbers over the monitored channels.

The content discovery portion took a bit longer since there wasn't a consistent administrative account he could use to access and scan all the servers. Even though they are a relatively small operation, it took about 2 months of full time scanning to get through the data center due to all the manual coordination involved. They found a large number of old spreadsheets with credit card numbers in various directories, and a few in flat files -- especially database dumps from development.

The retail outlets actually took less time than he expected. Most of the servers, except at the largest regional locations, were remotely managed and well inventoried. He found that 20% of them were running on an older credit card transaction system that stored unencrypted credit card numbers.

Remember, this is a 1,000 person organization... if you work someplace with five or ten times the employees and infrastructure, your process will take longer. Don't assume it will take five or ten times longer, though -- it all depends on scope, infrastructure, and a variety of other factors.

–Rich

Monday, June 01, 2009

The State of Web Application and Data Security—Mid 2009

By Rich

One of the more difficult aspects of the analyst gig is sorting through all the information you get, and isolating out any inherent biases. The kinds of inquiries we get from clients can all too easily skew our perceptions of the industry, since people tend to come to us for specific reasons, and those reasons don't necessarily represent the mean of the industry. Aside from all the vendor updates (and customer references), our end user conversations usually involve helping someone with a specific problem -- ranging from vendor selection, to basic technology education, to strategy development/problem solving. People call us when they need help, not when things are running well, so it's all too easy to assume a particular technology is being used more widely than it really is, or a problem is bigger or smaller than it really is, because everyone calling us is asking about it. Countering this takes a lot of outreach to find out what people are really doing even when they aren't calling us.

Over the past few weeks I've had a series of opportunities to work with end users outside the context of normal inbound inquiries, and it's been fairly enlightening. These included direct client calls, executive roundtables such as one I participated in recently with IANS (with a mix from Fortune 50 to mid-size enterprises), and some outreach on our part. They reinforced some of what we've been thinking, while breaking other assumptions. I thought it would be good to compile these together into a "state of the industry" summary. Since I spend most of my time focused on web application and data security, I'll only cover those areas:


When it comes to web application and data security, if there isn't a compliance requirement, there isn't budget -- Nearly all of the security professionals we've spoken with recognize the importance of web application and data security, but they consistently tell us that unless there is a compliance requirement it's very difficult for them to get budget. That's not to say it's impossible, but non-compliance projects (however important) are way down the priority list in most organizations. In a room of a dozen high-level security managers of (mostly) large enterprises, they all reinforced that compliance drove nearly all of their new projects, and there was little support for non-compliance-related web application or data security initiatives. I doubt this surprises any of you.

"Compliance" may mean more than compliance -- Activities that are positioned as helping with compliance, even if they aren't a direct requirement, are more likely to gain funding. This is especially true for projects that could reduce compliance costs. They will have a longer approval cycle, often 9 months or so, compared to the 3-6 months for directly-required compliance activities. Initiatives directly tied to limiting potential data breach notifications are the most cited driver. Two technology examples are full disk encryption and portable device control.

PCI is the single biggest compliance driver for web application and data security -- I may not be thrilled with PCI, but it's driving more web application and data security improvements than anything else.

The term Data Loss Prevention has lost meaning -- I discussed this in a post last week. Even those who have gone through a DLP tool selection process often use the term to encompass more than the narrow definition we prefer.

It's easier to get resources to do some things manually than to buy a tool -- Although tools would be much more efficient and effective for some projects, in terms of costs and results, manual projects using existing resources are easier to get approval for. As one manager put it, "I already have the bodies, and I won't get any more money for new tools." The most common example cited was content discovery (we'll talk more about this a few points down).

Most people use DLP for network (primarily email) monitoring, not content discovery or endpoint protection -- Even though we tend to think discovery offers equal or greater value, most organizations with DLP use it for network monitoring.

Interest in content discovery, especially DLP-based, is high, but resources are hard to get for discovery projects -- Most security managers I talk with are very interested in content discovery, but they are less educated on the options and don't have the resources. They tell me that finding the data is the easy part -- getting resources to do anything about it is the limiting factor.

The Web Application Firewall (WAF) market and Security Source Code Tools markets are nearly equal in size, with more clients on WAFs, and more money spent on source code tools per client -- While it's hard to fully quantify, we think the source code tools cost more per implementation, but WAFs are in slightly wider use.

WAFs are a quicker hit for PCI compliance -- Most organizations deploying WAFs do so for PCI compliance, and they're seen as a quicker fix than secure source code projects.

Most WAF deployments are out of band, and false positives are a major problem for default deployments -- Customers are installing WAFs for compliance, but are generally unable to deploy them inline (initially) due to the tuning requirements.

Full drive encryption is mature, and well deployed in the early mainstream -- Full drive encryption, while not perfect, is deployable in even large enterprises. It's now considered a level-setting best practice in financial services, and usage is growing in healthcare and insurance. Other asset recovery options, such as remote data destruction and phone home applications, are now seen as little more than snake oil. As one CISO told us, "I don't care about the laptop, we just encrypt it and don't worry about it when it goes missing".

File and folder encryption is not in wide use -- Very few organizations are performing any wide scale file/folder encryption, outside of some targeted encryption of PII for compliance requirements.

Database encryption is hard, and not widely used -- Most organizations are dissatisfied with database encryption options, and do not deploy it widely. Within a large organization there is likely some DB encryption, with preference given to file/folder/media protection over column level encryption, but most organizations prefer to avoid it. Performance and key management are cited as the primary obstacles, even when using native tools. Current versions of database encryption (primarily native encryption) do perform better than older versions, but key management is still unsatisfactory. Large encryption projects, when initiated, take an average of 12-18 months.

Large enterprises prefer application-level encryption of credit card numbers, and tokenization -- When it comes to credit card numbers, security managers prefer to encrypt them at the application level, or consolidate numbers into a central source, using representative "tokens" throughout the rest of the application stack. These projects take a minimum of 12-18 months, similar to database encryption projects (the two are often tied together, with encryption used in the source database).

Email encryption and DRM tend to be workgroup-specific deployments -- Email encryption and DRM use is scattered throughout the industry, but is still generally limited to workgroup-level projects due to the complexity of management, or lack of demand/compliance from users.

Database Activity Monitoring usage continues to grow slowly, mostly for compliance, but not quickly enough to save lagging vendors -- Many DAM deployments are still tied to SOX auditing, and it's not as widely used for other data security initiatives. Performance is reasonable when you can use endpoint agents, which some DBAs still resist. Network monitoring is not seen as effective, but may still be used when local monitoring isn't an option. Network requirements, depending on the tool, may also inhibit deployments.

My main takeaway is that security managers know what they need to do to protect information assets, but they lack the time, resources, and management support for many initiatives. There is also broad dissatisfaction with security tools and vendors in general, in large part due to poor expectation setting during the sales process, and deliberately confusing marketing. It's not that the tools don't work, but that they're never quite as easy as promised.

It's an interesting dilemma, since there is clear and broad recognition that data security (and by extension, web application security) is likely our most pressing overall issue in terms of security, but due to a variety of factors (many of which we covered in our Business Justification for Data Security paper), the resources just aren't there to really tackle it head-on.

–Rich

Tuesday, February 10, 2009

Do You Use DLP? We Should Talk

By Rich

As an analyst, I've been covering DLP since before there was anything called DLP. I like to joke that I've talked with more people that have evaluated and deployed DLP than anyone else on the face of the planet. Yes, it's exactly as exciting as it sounds.

But all those references were fairly self-selected. They've either been Gartner clients, or our current enterprise clients, that were/are typically looking for help in product selection or dealing with some sort of problem. Many of the rest are vendor-supplied references. This combination skews the conversations towards people picking products, people with problems, or those a vendor thinks will make them look good.

I'm currently working on an article for Information Security magazine on "Real-World DLP", and I'm hunting for some new references to expand that field a bit. If you are using DLP, successfully or not, and are willing to talk confidentially, please drop me a line. I'm looking for real-world stories, good and bad. If you are willing to go on the record, we're also looking for good quote sources. The focus of the article is more on implementation than selection, and will be vendor-neutral.

To be honest, one reason I'm putting this out in the open is to see if my normal reference channels are skewed. It's time to see how our current positions and assumptions play out on the mean streets of reality.

Of course I'll be totally pissed if I've been wrong this entire time and have to retract everything I've ever written on DLP.

Update - Oh yeah, my email address is rmogull, that is with two 'L's, at securosis dot com. Please let me know.

–Rich

Thursday, December 04, 2008

Analysis Of The Microsoft/RSA Data Loss Prevention Partnership

By Rich

By the time I post this you won't be able to find a tech news site that isn't covering this one. I know, since my name was on the list of analysts the press could contact and I spent a few hours talking to everyone covering the story yesterday. Rather than just reciting the press release, I'd like to add some analysis, put things into context, and speculate wildly. For the record, this is a big deal in the long term, and will likely benefit all of the major DLP vendors, even though there's nothing earth shattering in the short term.

As you read this, Microsoft and RSA are announcing a partnership for Data Loss Prevention. Here are the nitty gritty details, not all of which will be apparent from the press release:

  • This month, the RSA DLP product (Tablus for you old folks) will be able to assign Microsoft RMS (what Microsoft calls DRM) rights to stored data based on content discovery. The way this works is that the RMS administrator will define a data protection template (what rights are assigned to what users). The RSA DLP administrator then creates a content detection policy, which can then apply the RMS rights automatically based on the content of files. The RSA DLP solution will then scan file repositories (including endpoints) and apply the RMS rights/controls to protect the content.
  • Microsoft has licensed the RSA DLP technology to embed into various Microsoft products. They aren't offering much detail at this time, nor any timelines, but we do know a few specifics. Microsoft will slowly begin adding the RSA DLP content analysis engine to various products. The non-NDA slides hint at everything from SQL Server, Exchange, and Sharepoint, to Windows and Office. Microsoft will also include basic DLP management into their other management tools.
  • Policies will work across both Microsoft and RSA in the future as the products evolve. Microsoft will be limiting itself to their environment, with RSA as the upgrade path for fuller DLP coverage.

And that's it for now. RSA DLP 6.5 will link into RMS, with Microsoft licensing the technology for future use in their products. Now for the analysis:

  • This is an extremely significant development in the long term future of DLP. Actually, it's a nail in the coffin of the term "DLP" and moves us clearly and directly to what we call "CMP"- Content Monitoring and Protection. It moves us closer and closer to the DLP engine being available everywhere (and somewhat commoditized), with the real value being in the central policy management, analysis, workflow, and incident management system. DLP/CMP vendors don't go away- but their focus changes as the agent technology is built more broadly into the IT infrastructure (this definitely won't be limited to just Microsoft).
  • It's not very exciting in the short term. RSA isn't the first to plug DLP into RMS (Workshare does it, but they aren't nearly as big in the DLP market). RSA is only enabling this for content discovery (data at rest) and rights won't be applied immediately as files are created/saved. It's really the next stages of this that are interesting.
  • This is good for all the major DLP vendors, although a bit better for RSA. It's big validation for the DLP/CMP market, and since Microsoft is licensing the technology to embed, it's reasonable to assume that down the road it may be accessible to other DLP vendors (be aware- that's major speculation on my part).
  • This partnership also highlights the tight relationship between DLP/CMP and identity management. Most of the DLP vendors plug into Microsoft Active Directory to determine users/groups/roles for the application of content protection policies. One of the biggest obstacles to a successful DLP deployment can be a poor directory infrastructure. If you don't know what users have what roles, it's awfully hard to create content-based policies that are enforced based on users and roles.
  • We don't know how much cash is involved, but financially this is likely good for RSA (the licensing part). I don't expect it to overly impact sales in the short term, and the other major DLP vendors shouldn't be too worried for now. DLP deals will still be competitive based on the capabilities of current products, more than what's coming in an indeterminate future.

Now just imagine a world where you run a query on a SQL database, and any sensitive results are appropriately protected as you place them into an Excel spreadsheet. You then drop that spreadsheet into a Powerpoint presentation and email it to the sales team. It's still quietly protected, and when one sales guy tries to email it to his Gmail account, it's blocked. When he transfers it to a USB device, it's encrypted using a company key so he can't put it on his home computer. If he accidentally sends it to someone in the call center, they can't read it. In the final PDF, he can't cut out the table and put it in another document. That's where we are headed- DLP/CMP is enmeshed into the background, protecting content through its lifecycle based on central policies and content and context awareness.

In summary, it's great in the long term, good but not exciting in the short term, and beneficial to the entire DLP market, with a slight edge for RSA. There are a ton of open questions and issues, and we'll be watching and analyzing this one for a while.

As always, feel free to email me if you have any questions.

–Rich

Wednesday, July 23, 2008

Best Practices For Endpoint DLP: Use Cases

By Rich

We've covered a lot of ground over the past few posts on endpoint DLP. Our last post finished our discussion of best practices and I'd like to close with a few short fictional use cases based on real deployments.

Endpoint Discovery and File Monitoring for PCI Compliance Support

BuyMore is a large regional home goods and grocery retailer in the southwest United States. In a previous PCI audit, credit card information was discovered on some employee laptops mixed in with loyalty program data and customer demographics. An expensive, manual audit and cleansing was performed within business units handling this content. To avoid similar issues in the future, BuyMore purchased an endpoint DLP solution with discovery and real time file monitoring support.

BuyMore has a highly distributed infrastructure due to multiple acquisitions and independently managed retail outlets (approximately 150 locations). During initial testing it was determined that database fingerprinting would be the best content analysis technique for the corporate headquarters, regional offices, and retail outlet servers, while rules-based analysis is the best fit for the systems used by store managers. The eventual goal is to transition all locations to database fingerprinting, once a database consolidation and cleansing program is complete.

During Phase 1, endpoint agents were deployed to corporate headquarters laptops for the customer relations and marketing team. An initial content discovery scan was performed, with policy violations reported to managers and the affected employees. For violations, a second scan was performed 30 days later to ensure that the data was removed. In Phase 2, the endpoint agents were switched into real time monitoring mode when the central management server was available (to support the database fingerprinting policy). Systems that leave the corporate network are then scanned monthly when they connect back in, with the tool tuned to only scan files modified since the last scan. All systems are scanned on a rotating quarterly basis, and reports generated and provided to the auditors.
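The "only scan files modified since the last scan" tuning is simple to picture. A minimal sketch, assuming the agent records the time of its last completed scan as a Unix timestamp and relies on file modification times (the function name and that bookkeeping are my assumptions; commercial agents may track changes differently):

```python
import os


def files_modified_since(root: str, last_scan_epoch: float) -> list[str]:
    """Return paths under root whose modification time is after the last scan."""
    changed = []
    for directory, _subdirs, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(directory, name)
            try:
                if os.stat(path).st_mtime > last_scan_epoch:
                    changed.append(path)
            except OSError:
                # File vanished or is unreadable; skip it rather than abort the scan.
                continue
    return sorted(changed)
```

Only the returned paths need to go through content analysis, which is what keeps the monthly reconnect scan short. Modification times can be reset by users or tools, so the rotating quarterly full scan described above remains a useful backstop.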

For Phase 3, agents were expanded to the rest of the corporate headquarters team over the course of 6 months, on a business unit by business unit basis.

For the final phase, agents were deployed to retail outlets on a store by store basis. Due to the lower quality of database data in these locations, a rules-based policy for credit cards was used. Policy violations automatically generate an email to the store manager, and are reported to the central policy server for followup by a compliance manager.

At the end of 18 months, corporate headquarters and 78% of retail outlets were covered. BuyMore is planning on adding USB blocking in their next year of deployment, and has already completed deployment of network filtering and content discovery for storage repositories.

Endpoint Enforcement for Intellectual Property Protection

EngineeringCo is a small contract engineering firm with 500 employees in the high tech manufacturing industry. They specialize in designing highly competitive mobile phones for major manufacturers. In 2006 they suffered a major theft of their intellectual property when a contractor transferred product description documents and CAD diagrams for a new design onto a USB device and sold them to a competitor in Asia, which beat their client to market by 3 months.

EngineeringCo purchased a full DLP suite in 2007 and completed deployment of partial document matching policies on the network, followed by network-scanning-based content discovery policies for corporate desktops. After 6 months they added network blocking for email, http, and ftp, and violations are at an acceptable level. In the first half of 2008 they began deployment of endpoint agents for engineering laptops (approximately 150 systems).

Because the information involved is so valuable, EngineeringCo decided to deploy full partial document matching policies on their endpoints. Testing determined performance is acceptable on current systems if the analysis signatures are limited to 500 MB in total size. To accommodate this limit, a special directory was established for each major project where managers drop key documents, rather than all project documents (which are still scanned and protected at the network). Engineers can work with documents, but the endpoint agent blocks network transmission (except for internal email and file sharing) and any transfer to portable storage. The network gateway prevents engineers from emailing documents externally using their corporate email, but since it's a gateway solution internal emails aren't scanned.
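For readers wondering why signature size matters, here is a minimal sketch of one common way partial document matching can be done: hashing overlapping runs of words ("shingles") from each protected document and checking how many reappear in outbound content. This is an assumption about the general technique, not a description of any vendor's engine; the window size, the truncated SHA-256 hashes, and the function names are illustrative.

```python
import hashlib


def shingle_hashes(text: str, window: int = 8) -> set[str]:
    """Hash every overlapping run of `window` words in the text."""
    words = text.lower().split()
    if len(words) < window:
        return set()
    return {
        hashlib.sha256(" ".join(words[i:i + window]).encode()).hexdigest()[:16]
        for i in range(len(words) - window + 1)
    }


def overlap_ratio(candidate: str, protected: set[str], window: int = 8) -> float:
    """Fraction of the candidate's shingles that also appear in protected content."""
    candidate_hashes = shingle_hashes(candidate, window)
    if not candidate_hashes:
        return 0.0
    return len(candidate_hashes & protected) / len(candidate_hashes)
```

Every protected document adds roughly one hash per word to the signature set, so registering only the key documents from each project directory is what keeps the set small enough to ship to a laptop.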

Engineering teams are typically 5-25 individuals, and agents were deployed on a team by team basis, taking approximately 6 months total.

These are, of course, fictional best practices examples, but they're drawn from discussions with dozens of DLP clients. The key takeaways are:

  1. Start small, with a few simple policies and a limited footprint.
  2. Grow deployments as you reduce incidents/violations to keep your incident queue under control and educate employees.
  3. Start with monitoring/alerting and employee education, then move on to enforcement.
  4. This is risk reduction, not risk elimination. Use the tool to identify and reduce exposure but don't expect it to magically solve all your data security problems.
  5. When you add new policies, test first with a limited audience before rolling them out to the entire scope, even if you are already covering the entire enterprise with other policies.

–Rich

Thursday, July 17, 2008

Best Practices for Endpoint DLP: Part 5, Deployment

By Rich

In our last post we talked about prepping for deployment- setting expectations, prioritizing, integrating with the infrastructure, and defining workflow. Now it's time to get out of the lab and get our hands dirty.

Today we're going to move beyond planning into deployment.

  1. Integrate with your infrastructure: Endpoint DLP tools require integration with a few different infrastructure elements. First, if you are using a full DLP suite, figure out if you need to perform any extra integration before moving to endpoint deployments. Some suites OEM the endpoint agent and you may need some additional components to get up and running. In other cases, you'll need to plan capacity and possibly deploy additional servers to handle the endpoint load. Next, integrate with your directory infrastructure if you haven't already. Determine if you need any additional information to tie users to devices (in most cases, this is built into the tool and its directory integration components).
  2. Integrate on the endpoint: In your preparatory steps you should have performed testing to be comfortable that the agent is compatible with your standard images and other workstation configurations. Now you need to add the agent to the production images and prepare deployment packages. Don't forget to configure the agent before deployment, especially the home server location and how much space and resources to use on the endpoint. Depending on your tool, this may be managed after initial deployment by your management server.
  3. Deploy agents to initial workgroups: You'll want to start with a limited deployment before rolling out to the larger enterprise. Pick a workgroup where you can test your initial policies.
  4. Build initial policies: For your first deployment, you should start with a small subset of policies, or even a single policy, in alert or content classification/discovery mode (where the tool reports on sensitive data, but doesn't generate policy violations).
  5. Baseline, then expand deployment: Deploy your initial policies to the starting workgroup. Try to roll the policies out one monitoring/enforcement mode at a time, e.g., start with endpoint discovery, then move to USB blocking, then add network alerting, then blocking, and so on. Once you have a good feel for the effectiveness of the policies, performance, and enterprise integration, you can expand into a wider deployment, covering more of the enterprise. After the first few you'll have a good understanding of how quickly, and how widely, you can roll out new policies.
  6. Tune policies: Even stable policies may require tuning over time. In some cases it's to improve effectiveness, in others to reduce false positives, and in still other cases to adapt to evolving business needs. You'll want to initially tune policies during baselining, but continue to tune them as the deployment expands. Most DLP clients report that they don't spend much time tuning policies after baselining, but it's always a good idea to keep your policies current with enterprise needs.
  7. Add enforcement/protection: By this point you should understand the effectiveness of your policies, and have educated users where you've found policy violations. You can now start switching to enforcement or protective actions, such as blocking, network filtering, or encryption of files. It's important to notify users of enforcement actions as they occur, otherwise you might frustrate them unnecessarily. If you're making a major change to established business process, consider scaling out enforcement options on a business unit by business unit basis (e.g., restricting access to a common content type to meet a new compliance need).
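The "one mode at a time" progression in steps 5 through 7 can be sketched as a simple gate: a workgroup only advances to the next mode once its incident queue is under control. The mode names follow the example in step 5; the `Workgroup` fields and the incident threshold are hypothetical and would come from your own baselining.

```python
from dataclasses import dataclass

# Order taken from the example in step 5; a real tool may name these differently.
MODES = ["endpoint_discovery", "usb_blocking", "network_alerting", "network_blocking"]


@dataclass
class Workgroup:
    name: str
    mode_index: int = 0        # position in MODES
    weekly_incidents: int = 0  # policy violations seen in the last week


def next_step(group: Workgroup, incident_threshold: int) -> str:
    """Advance one mode only when the incident queue is under control."""
    if group.weekly_incidents > incident_threshold:
        return f"hold {group.name} at {MODES[group.mode_index]} and tune policies"
    if group.mode_index + 1 < len(MODES):
        return f"move {group.name} to {MODES[group.mode_index + 1]}"
    return f"{group.name} has completed all modes"
```

For example, `next_step(Workgroup("marketing", weekly_incidents=2), incident_threshold=5)` recommends moving that group on to USB blocking, while a noisy group stays put until its policies are tuned.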

Deploying endpoint DLP isn't really very difficult; the most common mistake enterprises make is deploying agents and policies too widely, too quickly. When you combine a new endpoint agent with intrusive enforcement actions that interfere (positively or negatively) with people's work habits, you risk grumpy employees and political backlash. Most organizations find the most effective approach is a staged rollout of agents, followed by a staged rollout of policies, with each policy deployed in monitoring mode first before moving into enforcement.

–Rich

Wednesday, July 16, 2008

Upcoming Webcast- DLP and DAM Together

By Rich

On July 29th I'll be giving a webcast entitled Using Data Leakage Prevention and Database Activity Monitoring for Data Protection. It's a mix of my content on DLP, DAM and Information Centric security, designed to show you how to piece these technologies together.

It's sponsored by Tizor, and you can register here (the content, as always, is my independent stuff). Here's the description:

When it comes to data security, few things are certain, but there is one thing that very few security experts will dispute. Enterprises need a new way of thinking about data security, because traditional data security methods are just not working. Data Leakage Prevention (DLP) and Database Activity Monitoring (DAM) are two fundamental components of the new security landscape. Predicated on the need to "know" what is actually happening with sensitive data, DLP and DAM address pressing security issues. But despite the value that these two technologies offer, there is a great deal of confusion about what these technologies actually do and how they should be implemented. At this webinar, Rich Mogull, one of today's most well respected security experts, will clear up the confusion about DLP and DAM. Rich will discuss:

  • The business problems created by a lack of data centric security
  • How these problems relate to today's threats and technologies
  • What DLP and DAM do and how they fit into the enterprise security environment
  • Best practices for creating a data centric security model for your organization

–Rich