Data Security Lifecycle 2.0By Rich
We reference this content a lot, so I decided to compile it all into a single post. This is the original content, including internal links, and has not been re-edited.
Four years ago I wrote the initial Data Security Lifecycle and a series of posts covering the constituent technologies. In 2009 I updated it to better fit cloud computing, and it was incorporated into the Cloud Security Alliance Guidance, but I have never been happy with that work. It was rushed and didn’t address cloud specifics nearly sufficiently.
Adrian and I just spent a bunch of time updating the cycle and it is now a much better representation of the real world. Keep in mind that this is a high-level model to help guide your decisions, but we think this time around we were able to identify places where it can more specifically guide your data security endeavors.
(As a side note, you might notice I use “data security” and “information-centric security” interchangeably. I think infocentric is more accurate, but data security is more recognized, so that’s what I tend to use.)
If you are familiar with the previous model you will immediately notice that this one is much more complex. We hope it’s also much more useful. The old model really only listed controls for data in different phases of the lifecycle – and didn’t account for location, ownership, access methods, and other factors. This update should better reflect the more complex environments and use cases we tend to see these days.
Due to its complexity, we need to break the new Lifecycle into a series of posts. In this first post we will revisit the basic lifecycle, and in the next post we will add locations and access.
The lifecycle includes six phases from creation to destruction. Although we show it as a linear progression, once created, data can bounce between phases without restriction, and may not pass through all stages (for example, not all data is eventually destroyed).
- Create: This is probably better named Create/Update because it applies to creating or changing a data/content element, not just a document or database. Creation is the generation of new digital content, or the alteration/updating of existing content.
- Store: Storing is the act committing the digital data to some sort of storage repository, and typically occurs nearly simultaneously with creation.
- Use: Data is viewed, processed, or otherwise used in some sort of activity.
- Share: Data is exchanged between users, customers, and partners.
- Archive: Data leaves active use and enters long-term storage.
- Destroy: Data is permanently destroyed using physical or digital means (e.g., cryptoshredding).
These high-level activities describe the major phases of a datum’s life, and in a future post we will cover security controls for each phase. But before we discuss controls we need to incorporate two additional aspects: locations and access devices.
Locations and Access
In our last post we reviewed the Data Security Lifecycle, but other than some minor wording changes (and a prettier graphic thanks to PowerPoint SmartArt) it was the same as our four-year-old original version.
But as we mentioned, quite a bit has changed since then, exemplified by the emergence and adoption of cloud computing and increased mobility. Although the Lifecycle itself still applies to basic, traditional infrastructure, we will focus on these more complex use cases, which better reflect what most of you are dealing with on a day to day basis.
One gap in the original Lifecycle was that it failed to adequately address movement of data between repositories, environments, and organizations. A large amount of enterprise data now transitions between a variety of storage locations, applications, and operating environments. Even data created in a locked-down application may find itself backed up someplace else, replicated to alternative standby environments, or exported for processing by other applications. And all of this can happen at any phase of the Lifecycle.
We can illustrate this by thinking of the Lifecycle not as a single, linear operation, but as a series of smaller lifecycles running in different operating environments. At nearly any phase data can move into, out of, and between these environments – the key for data security is identifying these movements and applying the right controls at the right security boundaries.
As with cloud deployment models, these locations may be internal, external, public, private, hybrid, and so on. Some may be cloud providers, other traditional outsourcers, or perhaps multiple locations within a single data center.
For data security, at this point there are four things to understand:
- Where are the potential locations for my data?
- What are the lifecycles and controls in each of those locations?
- Where in each lifecycle can data move between locations?
- How does data move between locations (via what channel)?
Now that we know where our data lives and how it moves, we need to know who is accessing it and how. There are two factors here:
- Who accesses the data?
- How can they access it (device & channel)?
Data today is accessed from all sorts of different devices. The days of employees only accessing data through restrictive applications on locked-down desktops are quickly coming to an end (with a few exceptions). These devices have different security characteristics and may use different applications, especially with applications we’ve moved to SaaS providers – who often build custom applications for mobile devices, which offer different functionality than PCs.
Later in the model we will deal with who, but the diagram below shows how complex this can be – with a variety of data locations (and application environments), each with its own data lifecycle, all accessed by a variety of devices in different locations. Some data lives entirely within a single location, while other data moves in and out of various locations… and sometimes directly between external providers.
This completes our “topographic map” of the Lifecycle. In our next post we will dig into mapping data flow and controls. In the next few posts we will finish covering background material, and then show you how to use this to pragmatically evaluate and design security controls.
Functions, Actors, and Controls
In our last post we added location and access attributes to the Data Security Lifecycle. Now let’s start digging into the data flow and controls.
To review, so far we’ve completed our topographic map for data:
This illustrates, at a high level, how data moves in and out of different environments, and to and from different devices. It doesn’t yet tell us which controls to use or where to place them. That’s where the next layer comes in, as we specify locations, actors (‘who’), and functions:
There are three things we can do with a given datum:
- Access: View/access the data, including copying, file transfers, and other exchanges of information.
- Process: Perform a transaction on the data: update it, use it in a business processing transaction, etc.
- Store: Store the data (in a file, database, etc.).
The table below shows which functions map to which phases of the lifecycle:
Each of these functions is performed in a location, by an actor (person).
Essentially, a control is what we use to restrict a list of possible actions down to allowed actions. For example, encryption can be used to restrict access to data, application controls to restrict processing via authorization, and DRM storage to prevent unauthorized copies/accesses.
To determine the necessary controls; we first list out all possible functions, locations, and actors; and then which ones to allow. We then determine what controls we need to make that happen (technical or process). Controls can be either preventative or detective (monitoring), but keep in mind that monitoring controls that don’t tie back into some sort of alerting or analysis merely provide an audit log, not a functional control.
This might be a little clearer for some of you as a table:
Here you would list a function, the actor, and the location, and then check whether it is allowed or not. Any time you have a ‘no’ in the allowed box, you would implement and document a control.
Tying It together
In essence what we’ve produced is a high-level version of a data flow diagram (albeit not using standard programming taxonomy). We start by mapping the possible data flows, including devices and different physical and virtual locations, and at which phases in its lifecycle data can move between those locations. Then, for each phase of the lifecycle in a location, we determine which functions, people/systems, and more-granular locations for working with the data are possible. We then figure out which we want to restrict, and what controls we need to enforce those restrictions.
This looks complex, but keep in mind that you aren’t likely to do it for all data within an entire organization. For given data in a given application/implementation you’ll be working with a much more restrictive subset of possibilities. This clearly becomes more involved with bigger applications, but practically speaking you need to know where data flows, what’s possible, and what should be allowed, to design your security.
In a future post we’ll show you an example, and down the road we also plan to produce a controls matrix which will show you where the different data security controls fit in.