3 July 2014

How Do We Deal with a Flood of Data?

Line handlers await the arrival of the Los Angeles-class attack submarine USS Hartford. (Photo by MC2 Peter D. Blair/U.S. Navy)
by Isaac R. Porche III, Bradley Wilson, Erin-Elizabeth Johnson

U.S. Navy intelligence, surveillance, and reconnaissance (ISR) functions have become critical to national security over the past two decades. Within the Navy, there is a growing demand for ISR data from drones and other sources that provide situational awareness, which helps Navy vessels avoid collisions, pinpoint targets, and perform a host of other mission-critical tasks.

Despite the battle-tested value of ISR systems, however, the large amount of data they generate has become overwhelming to Navy analysts. As the Intelligence Science Board wrote in 2008, referring to the entire Department of Defense, “the number of images and signal intercepts are well beyond the capacity of the existing analyst community, so there are huge backlogs for translators and image interpreters, and much of the collected data are never reviewed.” In the coming years, as the Navy acquires and fields new sensors for collecting data, this “big data challenge” will continue to grow. Indeed, if the Navy continues to field sensors as planned but does not change the way it processes, exploits, and disseminates information, it will reach an ISR “tipping point”—the moment at which intelligence analysts are no longer able to complete a minimum number of exploitation tasks within given time constraints—as soon as 2016.
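The tipping point can be stated more concretely as a simple check. The sketch below is a hypothetical formalization, not RAND's model. It assumes that exploitation demand and analyst capacity can each be summarized as a task count per period, and that the "minimum number of exploitation tasks" is expressed as a fraction of incoming tasks. The function name and all figures in the usage line are invented for illustration.

```python
from typing import Optional, Sequence


def tipping_point(demand: Sequence[float],
                  capacity: Sequence[float],
                  min_fraction: float) -> Optional[int]:
    """Return the first period in which analysts complete less than
    `min_fraction` of the exploitation tasks that arrive, or None if that
    never happens. Assumes tasks that miss their period's time window are
    simply not exploited (no carry-over backlog), a deliberate simplification."""
    for period, (tasks, can_complete) in enumerate(zip(demand, capacity)):
        if tasks <= 0:
            continue  # nothing to exploit this period
        completed = min(tasks, can_complete)
        if completed / tasks < min_fraction:
            return period
    return None


# Invented numbers: demand grows each period while analyst capacity stays flat.
print(tipping_point(demand=[800, 1000, 1300, 1700],
                    capacity=[1000, 1000, 1000, 1000],
                    min_fraction=0.7))  # prints 3
```

Holding capacity flat while demand grows is what makes the tipping point inevitable in this toy; raising capacity or reducing the effort each task requires only pushes it to a later period.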

How Big Is Big?
To understand how big “big data” is, think about the volume of information contained in the Library of Congress, the world's largest library. All of the information in the Library of Congress could be digitized into 200 terabytes, or 200 trillion bytes. Now consider the fact that the Navy currently collects the equivalent of a Library of Congress' worth of data almost every other day.
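The comparison implies a rough collection rate. A back-of-the-envelope sketch, assuming that "almost every other day" means about one Library of Congress (200 TB) every two days and using decimal units (1 PB = 1,000 TB):

```python
# Rough estimate of Navy ISR collection volume from the article's figures.
LIBRARY_OF_CONGRESS_TB = 200  # from the article: ~200 TB when digitized
DAYS_PER_LIBRARY = 2          # assumption: "almost every other day" ~ 2 days


def daily_volume_tb(library_tb: float = LIBRARY_OF_CONGRESS_TB,
                    days: float = DAYS_PER_LIBRARY) -> float:
    """Approximate terabytes collected per day."""
    return library_tb / days


def annual_volume_pb(days_per_year: int = 365) -> float:
    """Approximate petabytes collected per year (decimal units)."""
    return daily_volume_tb() * days_per_year / 1_000


print(f"~{daily_volume_tb():.0f} TB/day")    # ~100 TB/day
print(f"~{annual_volume_pb():.1f} PB/year")  # ~36.5 PB/year
```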

In principle, traditional databases can store an unlimited amount of data. The more data being collected and shared, however, the more difficult it becomes to mine, fuse, and effectively use that data in a timely manner. This challenge is particularly troublesome in the Navy, where analysts use data to create information that informs decision making. All data and information collected by the Navy is potentially useful, but processing it and deriving useful knowledge from it is severely taxing the analytical capabilities of the Navy's personnel and networks, and the difficulty will only grow as the Navy acquires and fields new sensors.

Increasingly unable to process all of its own data, the Navy has little hope—if nothing changes—of exploiting all of the potentially useful data in the greater digital universe, which comprises billions of terabytes and is constantly growing. Commercial, government, and other sources, such as Twitter, GeoEye, and the National Oceanic and Atmospheric Administration, to name but a few, create hundreds of terabytes of potentially useful data every day. But how much of it can be made useful to the Navy?

A Big Data Opportunity

ISR systems are highly valued in the Navy—and across the military—for good reason. The data collected provide commanders with information on enemy positions and activities. They enable warfighters to locate targets with precision. They provide vital information about the location of friendly forces. Former Air Force Deputy Chief of Staff for ISR Lt. Gen. David A. Deptula (Ret.) has predicted that ISR will “lead in the fight” in 2020. He also has suggested that “ISR is currently moving from a supporting capability to the leading edge of national security operations.”

Like other services, the Navy sees data collected through ISR as essential to situational awareness—a vital technological advantage. The Navy hopes to realize the Office of the Director of National Intelligence's definition of big data: the enabling of “mass analytics within and across data…to enable information integration.” The Navy's ISR cycle (consisting of tasking, collection, processing, exploitation, and dissemination) is not undertaken for its own sake but with a clear, vital objective: providing the fleet with situational awareness. In military operations, knowledge is power. In the Navy, it is situational awareness—derived, in part, from ISR data—that gives commanders that power by helping them answer four critical questions: Where am I? Where are my friends? Where is the enemy? Where is everyone else?

An inability to answer any of these four questions can be disastrous. Consider the case of USS Hartford (SSN 768), a submarine that collided with USS New Orleans (LPD 18), an amphibious transport ship, in the Strait of Hormuz in 2009. The accident left 15 Sailors injured, thousands of gallons of diesel spilled, and $100 million in damage. In a Navy Times report on the incident, a senior Navy officer attributed part of the blame to analysts' inability to discern among a number of radar contacts: “There were a whole lot of watchstanders that failed to recognize the sensor data presented to them.”

As this example demonstrates, situational awareness is critical to naval operations, and the Navy needs to improve its ability to make sense of the data that growing numbers, and growing varieties, of sensors provide. Indeed, as the Intelligence Science Board reported in 2008, “integrating data from different sensors and platforms” could “dramatically enhance” geolocation and other important tasks. So what, exactly, is preventing the Navy from reaping the benefits of ISR-provided data?

We're going to find ourselves in the not too distant future swimming in sensors and drowning in data.

— Retired Air Force Lt. Gen. David A. Deptula

Barriers to Benefitting From Big Data

Today, as little as 5 percent of the data collected by ISR platforms actually reaches the Navy analysts who need to see it. For analysts working afloat on ships, much of the problem stems from extremely slow download times caused by bandwidth and connectivity limitations. Analysts face other obstacles to the timely consumption of data, including having to share communications pipelines with other organizations and having to download multiple large files (such as high-resolution images) to find exactly what they need. Most of the time, analysts do not have the luxury of receiving the “right” data in a timely fashion.

Today's analysts also face a wide variety of data streaming in from different platforms and sensors—data they must integrate (or fuse) to ensure accurate, comprehensive situational awareness. Their workstations comprise multiple screens, each showing different streams of data and each loaded with different suites of tools. In many cases, the applications, databases, and operating systems underlying these tools are produced by different vendors and are not interoperable. Sailors told us they are overwhelmed as they struggle to master the functions provided by each tool in the suite at their workstations. Another challenge is the existence of multiple and often mutually exclusive security domains (different classification levels). Some ISR platforms are designed to feed all of their data into a specific database that resides in a specific, isolated security domain, regardless of whether all the individual pieces of data collected by that platform really need to be classified at that particular level.

For analysts, this means that searching for a single piece of data can require multiple networks to access multiple databases—a dampener on productivity and a dangerous situation, given that achieving accurate situational awareness requires integrating data from multiple sources in a timely fashion. Common wisdom among analysts is that they spend 80 percent of their time looking for the right data and only 20 percent of their time looking at the right data.

One Option: Dynamically Managing Workloads

Despite the anticipated growth in incoming data, the Navy has no plans to increase the number of analysts it employs. One option for helping Navy analysts cope with big data is to manage their workloads dynamically. Today, the Navy's intelligence specialists work mostly on “local tasks,” because tasks tend to be allocated to nearby or statically assigned analysts rather than to whoever is available to accept new tasking. The main disadvantage of today's fixed, geographically based tasking arrangements is that intelligence specialists in one location can quickly become overwhelmed with tasks that need not be assigned to them but that, under the local tasking model, come their way by default.

What if the Navy were to consider implementing a regional or even global tasking model instead? In these models, tasks would be automatically shared and allocated within regions, or globally in the latter case, based on who is available to accept new tasking.
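The difference between the local, regional, and global models comes down to which analysts are eligible to receive a task. The sketch below is a hypothetical illustration, not the Navy's or RAND's allocation logic. It assumes each analyst belongs to one site and one region, and it uses "shortest queue" as a stand-in for "who is available to accept new tasking." The `Analyst` class, `assign` and `make_roster` functions, roster, and task surge are all invented.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Analyst:
    name: str
    site: str
    region: str
    queue: List[int] = field(default_factory=list)  # task IDs assigned so far


def assign(tasks: List[Tuple[int, str]], analysts: List[Analyst], model: str) -> int:
    """Assign each (task_id, origin_site) task under a tasking model and
    return the largest resulting queue length (the worst-case backlog).

    "local":    only analysts at the task's originating site are eligible
    "regional": any analyst in the same region as the originating site
    "global":   any analyst anywhere
    """
    site_region = {a.site: a.region for a in analysts}
    for task_id, origin in tasks:
        if model == "local":
            pool = [a for a in analysts if a.site == origin]
        elif model == "regional":
            pool = [a for a in analysts if a.region == site_region[origin]]
        elif model == "global":
            pool = analysts
        else:
            raise ValueError(f"unknown tasking model: {model}")
        # Give the task to the eligible analyst with the shortest queue
        # (ties go to the first analyst in roster order).
        chosen = min(pool, key=lambda a: len(a.queue))
        chosen.queue.append(task_id)
    return max(len(a.queue) for a in analysts)


def make_roster() -> List[Analyst]:
    # Hypothetical roster: sites A and B in one region, site C in another.
    return [
        Analyst("A1", site="A", region="East"),
        Analyst("A2", site="A", region="East"),
        Analyst("B1", site="B", region="East"),
        Analyst("C1", site="C", region="West"),
    ]


# A surge of 8 tasks, all originating at site A.
surge = [(i, "A") for i in range(8)]
for model in ("local", "regional", "global"):
    print(model, assign(surge, make_roster(), model))
# local 4
# regional 3
# global 2
```

Wider pools shrink the worst backlog, but rebalancing does not change the total number of tasks: once total demand exceeds what the whole roster can handle, every model falls behind.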

RAND researchers developed a model of intelligence specialist productivity and, using a year of operational data, found that the regional and global tasking models improve intelligence specialist productivity, but only up to a point. As the number of ISR sensors and platforms increases, productivity under all three models (local, regional, and global) eventually declines, revealing that imagery analysts simply will not be able to keep up with all of the imagery coming their way, no matter how their workloads are balanced.

Implementing a regional or global tasking model may buy the Navy a short-term improvement in analyst productivity, but changes to how workloads are managed are clearly not, on their own, a viable long-term solution. More comprehensive solutions to the big data challenge are therefore required.


Figure 1. This study looked at a baseline and three alternatives for handling ISR data: (1) the addition of applications; (2) consolidation using an existing Army architecture; and (3) a cloud-based solution that leverages GovCloud.

Alternatives for Dealing with Big Data

To be complete, a solution to the Navy's big data challenge must involve changes along all of the following four dimensions: people; tools and technology; data and data architectures; and demand and demand management. In conducting an analysis of alternatives for the Distributed Common Ground System–Navy Increment 2 (a system intended to help the Navy address the influx of data), we developed three potential alternatives (described above in Figure 1). Relative to the baseline, each improves the Navy's ability to manage and use the rising flood of ISR data. All three alternatives assume that the Navy begins to dynamically manage analysts' workloads and that sensors are cued smartly.

How Well Do the Alternatives Perform?

Modeling and simulation reveal that all three alternatives outperform the baseline when it comes to finding the greatest number of targets in the smallest amount of time—a performance metric that indicates how quickly a commander can be made aware of the targets in his or her area of command. The baseline results in the lowest percentage of targets found when using data of a single intelligence type. Alternatives 1, 2, and 3 outperform the baseline, with alternative 3 (cloud) resulting in the greatest number of targets found most quickly.

A similar result is found when looking at the percentage of targets found across time given data of multiple intelligence types. In this case, analysts are fusing data from two or more intelligence sources—a process that improves the accuracy or “veracity” of a commander's situational awareness. Once again, alternatives 1, 2, and 3 outperform the baseline, but alternatives 2 and 3 offer significant improvements over both the baseline and alternative 1.
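The metric used in both comparisons, the percentage of targets found across time, can be computed from per-target detection times. A minimal sketch, assuming each simulated target has either a detection time or no detection at all; the function name and the two runs below are invented and do not represent the study's alternatives or results.

```python
from typing import List, Optional, Sequence


def percent_found_over_time(detection_times: Sequence[Optional[float]],
                            checkpoints: Sequence[float]) -> List[float]:
    """For each checkpoint t, return the percentage of all targets detected
    at or before t. A detection time of None means the target was never found."""
    total = len(detection_times)
    if total == 0:
        raise ValueError("need at least one target")
    found = [t for t in detection_times if t is not None]
    return [100.0 * sum(1 for ft in found if ft <= t) / total for t in checkpoints]


# Invented detection times (hours) for four targets in two hypothetical runs.
run_a = [2.0, 5.0, None, None]
run_b = [1.0, 2.0, 3.0, None]
hours = [1, 2, 3, 4, 5]
print(percent_found_over_time(run_a, hours))  # [0.0, 25.0, 25.0, 25.0, 50.0]
print(percent_found_over_time(run_b, hours))  # [25.0, 50.0, 75.0, 75.0, 75.0]
```

A curve that rises earlier and higher, as `run_b` does here, corresponds to what the article describes as finding the greatest number of targets in the smallest amount of time.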

Recommendations

A solution to the Navy's big data challenge must involve changes along all four dimensions. This means that the Navy needs more than just new tools—it needs an approach to integrate them and make them more interoperable. The Navy also needs more than an adjustment in the number of analysts at each site—it needs to manage analyst workload dynamically. And the Navy should do more than just increase the number of distinct intelligence sources that are available—it needs a means to make them easy to find.

We recommend that the Navy pursue alternative 3—a cloud strategy similar to those adopted by Google, the intelligence community, and other large organizations grappling with big data's challenges and opportunities. This alternative offers significant potential performance improvements despite some technical and schedule risk. It is also (arguably) the alternative most amenable to future changes in information technology tools and applications.

We also recommend that the Navy adopt the intelligence community's cloud approach, designing its next generation of ISR tools and systems to work with the National Security Agency's distributed cloud concept (i.e., the Intelligence Community GovCloud). This information architecture should be sufficient to meet the growing volumes of data and thus enable viable tasking, collection, processing, exploitation, and dissemination operations in the future—even in a disconnected, interrupted, and low-bandwidth environment. Integrating and leveraging a distributed cloud architecture will enable some reach-back for analysis and help analysts cope with the increasing variety and volume of data, thereby improving their ability to help commanders make better decisions. Although alternative 3 involves an increased reliance on personnel and analytic capability “from the rear,” the Navy should embrace this dependency in order to reap the full benefits of the cloud solution.

Isaac R. Porche III is a senior engineer at the RAND Corporation. His areas of expertise include cybersecurity, network, and communication technology; ISR; information assurance; big data; cloud computing; and computer network defense. Bradley Wilson is an information systems analyst at the RAND Corporation. His areas of expertise include software engineering; software architecture; cybersecurity; modeling and simulation; network and communication technology; ISR; and biometrics. Erin-Elizabeth Johnson is a communications analyst at the RAND Corporation. Her areas of expertise include policy writing; strategic communications; communications planning; data storytelling and information visualization; and graphic design.

Note: This article is adapted from Isaac R. Porche III, Bradley Wilson, Erin-Elizabeth Johnson, Shane Tierney, and Evan Saltzman, Data Flood: Helping the Navy Address the Rising Tide of Sensor Information (Santa Monica, Calif.: RAND Corporation, 2014). It contains copyrighted material. The full report can be accessed at www.rand.org/t/RR315.

This article appeared on futureforce.navylive.dodlive.mil on June 16, 2014.
