Big Data: Need for Exploiting Potential for Indian Navy

Published in SP’s Military Year Book 2018, p. 97

“Data really powers everything that we do.”

Jeff Weiner, CEO, LinkedIn

The oceans are a complex medium whose nature gives an enemy ample opportunity to avoid detection: weather, sea state and coastal land masses all pose considerable challenges to modern sensors. The oceans are also the world’s foremost (and least regulated) highway, carrying a vast variety of neutral international shipping that poses no apparent threat. Safety and security at sea are the main pillars of maritime trade and transit. However, peacetime economic use of the seas is also exposed to piracy, accidents at sea, oil spills, illicit trafficking in drugs, arms and humans, and environmental damage, all of which are avoidable drains on the economies of seafaring nations. Maritime events that could directly affect India are not the only element of maritime domain awareness (MDA); it is equally essential to identify threats as they evolve in peacetime. Activities occurring overseas and in foreign ports are therefore very much a part of MDA. Its core is the application of vessel tracking to a layered defence model centred on India’s coastline, with the ultimate goal of detecting potential threats as early, and as far from the coast, as possible. The oceans thus demand a much higher level of MDA than that required in a conventional naval conflict.

The strategic aspects of MDA require a broad perspective and capabilities at the highest levels of analysis, intelligence, and policy. National-security operations at sea take place globally and often require continuous, near-real-time monitoring of the environment using tools such as autonomous sensors, targeted observations, and adaptive modelling. This calls for advanced sensor and technology capabilities, particularly for autonomous and persistent observation. Developing such a data network requires new methodologies that close gaps in data collection, sharing, and technological interoperability, and it should allow existing research to be integrated into operational systems.

The Asia-Pacific is a vast region, so generating and collecting data across it is an enormous and costly task. The coverage and resolution provided by manned platforms and satellites remain grossly deficient given the size of the area, the time needed, and the multitude of tasking requirements. This gap can be plugged by autonomous aerial, surface and underwater systems, which can provide persistence, mobility, and real-time data.

  “…[t]he main advantage of using drones is precisely that they are unmanned. With the operators safely tucked in air-conditioned rooms far away, there’s no pilot at risk of being killed or maimed in a crash. No pilot to be taken captive by enemy forces. No pilot to cause a diplomatic crisis if shot down in a “friendly country” while bombing or spying without official permission” 

-Medea Benjamin, 2013

In essence, autonomous unmanned systems offer large-area coverage, prolonged deployment, low risk, much lower acquisition and operating costs, direct tasking, and near-real-time data reporting. Surface and underwater systems, however, have longer transit times than aerial systems.

As an illustration, consider the Automatic Identification System (AIS), which is used to monitor vessel traffic, and its associated data structures: an estimated 20 million position reports per day are available from satellite and terrestrial radio reception. Bulk data analytics, however, has not yet matured enough to permit their complete exploitation. Further, in its current form, AIS data is susceptible to hacking, falsification, and manipulation. This is the domain of big data analytics, the process of examining large and varied data sets to uncover hidden patterns, unknown correlations, market trends, customer preferences and other useful information that can help organizations make better-informed decisions. The maritime domain thus offers extensive opportunities for big data analytics, which may turn out to be greater than those available on land.
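AIS falsification is one place where even simple analytics helps. The sketch below is a minimal illustration, not a description of any operational system. It assumes a hypothetical `AisFix` record (vessel MMSI, timestamp in seconds, latitude and longitude in degrees) and an assumed 50-knot ceiling on plausible vessel speed. It flags consecutive reports for the same vessel whose implied great-circle speed exceeds that ceiling, a common heuristic for spotting spoofed or corrupted positions.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_NM = 3440.065  # mean Earth radius in nautical miles


@dataclass
class AisFix:
    """Hypothetical, simplified AIS position report."""
    mmsi: int          # vessel identifier carried in AIS messages
    t_seconds: float   # report time, seconds since an arbitrary epoch
    lat: float         # degrees
    lon: float         # degrees


def haversine_nm(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in nautical miles."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_NM * math.asin(math.sqrt(a))


def flag_implausible_jumps(fixes, max_speed_knots=50.0):
    """Return (previous, current) pairs whose implied speed exceeds the assumed ceiling.

    Pairs with identical or out-of-order timestamps are also flagged,
    since no finite speed explains them.
    """
    by_vessel = {}
    for fix in sorted(fixes, key=lambda f: (f.mmsi, f.t_seconds)):
        by_vessel.setdefault(fix.mmsi, []).append(fix)

    suspicious = []
    for track in by_vessel.values():
        for prev, cur in zip(track, track[1:]):
            hours = (cur.t_seconds - prev.t_seconds) / 3600.0
            if hours <= 0:
                suspicious.append((prev, cur))
                continue
            speed = haversine_nm(prev.lat, prev.lon, cur.lat, cur.lon) / hours
            if speed > max_speed_knots:
                suspicious.append((prev, cur))
    return suspicious
```

The 50-knot threshold is only a placeholder; fast patrol craft can exceed it, so a practical filter would tune the ceiling per vessel class.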

Big Data Management

Currently, Intelligence, Surveillance, and Reconnaissance (ISR) programs fall into three major categories: national level, joint military intelligence level and tactical level. Tactical ISR provides direct support to military operations. In recent years, however, the distinctions between the three types of ISR effort have been blurring.

The importance of information and intelligence to the navy is apparent from the fact that they are fundamental to planning any naval operation in peace or war. Intelligence forms the core of the military kill chain, which comprises target detection and identification, despatch of a force or weapon to track the target, decision making, and the final command to destroy it. The kill chain is also known as the F2T2EA cycle: Find, Fix, Track, Target, Engage and Assess. ISR is the key determinant of target detection and identification. Mission success depends on correct analysis of the available information during the planning stage and on its onward dissemination. Methods of collecting information have changed today because high-quality, reliable combat information is rapidly available from a varied array of sensors and sources.
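The F2T2EA cycle named above can be modelled as an ordered sequence of stages. The sketch below is a simplification under stated assumptions: it treats each stage as strictly sequential, and when assessment shows the objective was not achieved it restarts at FIND (a modelling choice made here, not a doctrinal rule). The names `KillChainStage` and `next_stage` are hypothetical.

```python
from enum import Enum
from typing import Optional


class KillChainStage(Enum):
    """The six F2T2EA stages, in order."""
    FIND = 1
    FIX = 2
    TRACK = 3
    TARGET = 4
    ENGAGE = 5
    ASSESS = 6


def next_stage(stage: KillChainStage, objective_achieved: bool = True) -> Optional[KillChainStage]:
    """Return the stage that follows `stage`.

    After ASSESS the cycle ends (None) if the objective was achieved;
    otherwise, as a simplifying assumption, it restarts at FIND.
    """
    if stage is KillChainStage.ASSESS:
        return None if objective_achieved else KillChainStage.FIND
    return KillChainStage(stage.value + 1)


if __name__ == "__main__":
    stage = KillChainStage.FIND
    while stage is not None:
        print(stage.name)  # FIND, FIX, TRACK, TARGET, ENGAGE, ASSESS
        stage = next_stage(stage)
```

In this framing, ISR feeds chiefly the FIND and FIX stages, where detection and identification happen.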

The basic information required by any commander in today’s networked warfare is the accurate position of his own units, the location of the enemy and his reserves, the location of supporting units and the placement of other miscellaneous assets. Situational awareness is thus crucial to any mission, and it is built through the ISR process of tasking, collection, processing, exploitation, and dissemination. Embedded in ISR is communication, without which no mission can be accomplished. Digital communication systems, the internet, and mobile devices have revolutionized the volume of data generated. The term big data in the navy thus covers a whole gamut of information, ranging from sensor data, video imagery, mobile phone data, signals intelligence and electronic warfare intercepts to satellite images. Data is being collected at unprecedented levels.

To understand and react to real-time tactical situations, commanders have to manage and control a big data environment comprising historical or point-in-time data, transactional data, and ad hoc use of the system. The navy has been collecting data in enormous volumes since the induction of unmanned vehicles carrying sensors. Such data heaps cannot be analysed in the traditional manner; they require dedicated data scientists and new software tools to exploit the extracted information for mission planning. A major issue facing navies today is the ever-increasing volume of sensor data from integral sources such as UAVs and from other national assets. The US ARGUS ground surveillance system collects more than 40 gigabytes of information per second, and spy satellites deployed by various countries also generate gigabytes of geospatial data. It has become increasingly important for naval officials to make sense of the vast amount of data they produce. A single full-day UAV mission can yield upwards of 10 terabytes of data, of which only about 5 per cent is analysed and the rest stored.
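The figures quoted above translate into daily volumes with simple arithmetic. The sketch assumes that the 40 GB/s rate is sustained around the clock and uses decimal units (1 TB = 1,000 GB). Under those assumptions a single such sensor would produce about 3,456 TB, or roughly 3.5 PB, per day. It also shows that a 10 TB mission with 5 per cent analysed leaves 9.5 TB untouched. The helper names are hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def daily_volume_tb(rate_gb_per_s: float) -> float:
    """Convert a sustained collection rate in GB/s to TB/day (decimal units, 1 TB = 1,000 GB)."""
    return rate_gb_per_s * SECONDS_PER_DAY / 1000


def analysis_split_tb(collected_tb: float, analysed_fraction: float) -> tuple:
    """Split a collected volume into (analysed, stored-but-unanalysed) terabytes."""
    analysed = collected_tb * analysed_fraction
    return analysed, collected_tb - analysed


if __name__ == "__main__":
    print(daily_volume_tb(40))          # 3456.0 TB, roughly 3.5 PB per day
    print(analysis_split_tb(10, 0.05))  # (0.5, 9.5)
```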

Ward and Barker offer a comprehensive definition of big data: “Big data is a term describing the storage and analysis of large and or complex data sets using a series of techniques including, but not limited to NoSQL, MapReduce and machine learning”.
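To make one of the techniques in this definition concrete, the following is a single-process sketch of the MapReduce pattern, assuming a hypothetical input of (latitude, longitude) position reports. The map step emits a count of one for the 1-degree grid cell each report falls in. The shuffle step groups values by cell, and the reduce step sums them. Frameworks such as Hadoop run the same three phases distributed across many machines; this code only mimics their structure in memory.

```python
import math
from collections import defaultdict
from functools import reduce
from itertools import chain


def map_phase(report):
    """Map: emit (grid_cell, 1) for the 1-degree cell containing a (lat, lon) report."""
    lat, lon = report
    return [((math.floor(lat), math.floor(lon)), 1)]


def shuffle(pairs):
    """Shuffle: group emitted values by key, as a framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: combine each key's values, here by summing the counts."""
    return {key: reduce(lambda a, b: a + b, values) for key, values in groups.items()}


def contacts_per_cell(reports):
    """Run map, shuffle and reduce over an iterable of (lat, lon) reports."""
    mapped = chain.from_iterable(map_phase(r) for r in reports)
    return reduce_phase(shuffle(mapped))


if __name__ == "__main__":
    print(contacts_per_cell([(18.9, 72.8), (18.1, 72.2), (19.5, 72.9)]))
```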

What is clear is that big data refers to large-scale data collection, driven by falling costs of data collection and storage and by a surge in data sources such as cameras, sensors, and geospatial location technologies. Data is collected both in analogue form (images from cameras, mobile phones, video recorders and wearable devices, which can be converted to digital) and in digital form (emails, internet browsing and capture, location devices); it arrives in various formats and requires data fusion. Collection and processing increasingly demand near-real-time speeds, pushing data analysis to its current limits. Mapping services, medical care and machine operations, for example, require near-immediate responses to be safe and effective. Technologies for handling and managing big data are consequently in unprecedented demand.
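Data fusion begins by normalising each source into a common schema and then merging per entity. The sketch below assumes two hypothetical input formats: an AIS-derived dict and a radar contact tuple. Both are mapped into one dict layout. A single track per vessel is then kept, holding the latest position and the set of contributing sources. It assumes that radar contacts are already associated with a vessel identity. In practice that association is the hardest part of fusion, and it is not shown here.

```python
from dataclasses import dataclass, field


@dataclass
class Track:
    vessel_id: str
    time: float
    lat: float
    lon: float
    sources: set = field(default_factory=set)


def from_ais(msg: dict) -> dict:
    """Normalise a hypothetical AIS-derived dict {"mmsi", "ts", "lat", "lon"}."""
    return {"id": str(msg["mmsi"]), "time": msg["ts"], "lat": msg["lat"],
            "lon": msg["lon"], "source": "ais"}


def from_radar(contact: tuple) -> dict:
    """Normalise a hypothetical radar contact (vessel_id, time, lat, lon).

    Assumes the contact has already been associated with a vessel identity.
    """
    vessel_id, time, lat, lon = contact
    return {"id": vessel_id, "time": time, "lat": lat, "lon": lon, "source": "radar"}


def fuse(reports) -> dict:
    """Merge normalised reports into one track per vessel, keeping the latest position."""
    tracks = {}
    for r in reports:
        track = tracks.get(r["id"])
        if track is None:
            tracks[r["id"]] = Track(r["id"], r["time"], r["lat"], r["lon"], {r["source"]})
            continue
        track.sources.add(r["source"])
        if r["time"] > track.time:
            track.time, track.lat, track.lon = r["time"], r["lat"], r["lon"]
    return tracks
```

Keeping the source set on each track lets an analyst see at a glance whether a position is corroborated by more than one sensor.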

A complete perspective on big data requires pinning down a few characteristics peculiar to it. Since big data is collected from real-world sources with little prior filtering, a large part of it may be of no value. Further, data in its raw form may not be amenable to analysis. An information retrieval mechanism is required to extract relevant information and convert it into a structured form before it can be analysed. This is a demanding technology, as thoroughness and accuracy must be ensured at this stage. Data analysis is not simply identifying, locating or citing data; it is an interdisciplinary endeavour involving pure mathematics, statistics, computer science and other fields. For data to be of value to a decision maker, it has to be made understandable and interactive in near real time. The software must therefore be user-friendly, since the user may not have an in-depth understanding of big data systems and algorithms.

Predictive analytics is an important area of big data mining that deals with extracting information from data and using it to predict trends and behaviour patterns. Fundamentally, predictive analytics depends on capturing relationships between explanatory variables and predicted variables from past occurrences, and exploiting them to predict the unknown. Predictive analytics tools have become sophisticated enough to dissect data problems and present findings as simple charts, graphs, and numbers that indicate the likelihood of possible outcomes. The accuracy and usability of the results, however, depend on the depth of the data analysis and the quality of the assumptions.
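The simplest instance of "capturing relationships between explanatory variables and predicted variables from past occurrences" is an ordinary least-squares line with one explanatory variable. The sketch below fits y = slope·x + intercept and uses it to predict an unseen value. The numbers in the usage block are purely illustrative, not real data. The assumed linear relationship is exactly the kind of assumption on which the quality of the result depends.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept with one explanatory variable."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("explanatory variable has no variation")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply a fitted (slope, intercept) model to a new explanatory value."""
    slope, intercept = model
    return slope * x + intercept


if __name__ == "__main__":
    # Illustrative numbers only, not real data.
    model = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
    print(model)              # (2.0, 1.0)
    print(predict(model, 5))  # 11.0
```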

Rapid technological advances in sensor-based, smart, and networked combat systems are pushing navies to adopt commercially available emerging technologies and adapt them for their own use. The advent of big data is driving the armed forces to shift their integrated decision-support systems to big data architectures and analytics. The financial crunch faced by the navies of leading countries implies even greater dependence on technology to offset reduced manpower. This, in turn, has led other nations to adopt a wait-and-watch strategy, under which they intend to go in for the best solutions adopted by the leading navies.

Analysts are constrained by download speeds, which vary with their location. Because data is often untagged, analysts end up downloading similar data from other sources to firm up their conclusions. Communication lines are often shared or not continuously available, which further delays analysis. Comprehensive situational awareness depends on the accuracy and integration of data received from multiple types of sensors and intelligence sources, yet display screens and software tools are not yet interoperable. For security reasons, ISR data from different sources is stored in different locations with varying access levels, which leads to incomplete analysis. A single network domain providing access to data at multiple security classification levels is not yet available. Analysts currently spend only 20 per cent of their time examining the right data and 80 per cent searching for it.
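Two of the problems above, untagged data and scattered classification levels, can be illustrated together. The sketch below imagines a single catalogue in which every item carries metadata tags and a classification level. An analyst's query returns only items at or below their clearance, following the "no read up" simple-security rule of the Bell-LaPadula model. The level names, tags and functions are illustrative assumptions, not an official scheme or any fielded system.

```python
from dataclasses import dataclass
from enum import IntEnum


class Classification(IntEnum):
    """Illustrative ordered classification levels (an assumption, not an official scheme)."""
    UNCLASSIFIED = 0
    RESTRICTED = 1
    CONFIDENTIAL = 2
    SECRET = 3
    TOP_SECRET = 4


@dataclass(frozen=True)
class DataItem:
    item_id: str
    level: Classification
    tags: frozenset  # metadata tags, e.g. sensor type or area of interest


def readable(items, clearance, tag=None):
    """Apply the 'no read up' rule, then an optional tag filter."""
    return [i for i in items
            if i.level <= clearance and (tag is None or tag in i.tags)]
```

Tagging at ingest is what makes the tag filter possible; without it, analysts fall back to re-downloading data to confirm what it contains.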

Some of the companies working with the US military in this field to provide a common operating picture (COP) are described below.

-Modus Operandi takes big data, infuses it with expert knowledge and creates a common framework for easy identification of patterns. The data is embedded in an underlying graph structure that supports complex queries, and the system can detect patterns and output different types of visualizations, such as maps and timelines.

-Palantir Technologies is known for its software Palantir Gotham, which is used by counter-terrorism analysts in the United States Intelligence Community and the United States Department of Defense, by fraud investigators at the Recovery Accountability and Transparency Board, and by cyber analysts at the Information Warfare Monitor (responsible for the GhostNet and Shadow Network investigations).

-SAP’s HANA platform provides real-time analytics and an applications platform for big data, with varying layers of security. It offers predictive, forecasting and calculation solutions and can stitch together maintenance failure codes and document notes.

-To tackle the problem of analysing data in real time, Oracle has created an engineered system for big data operations. The company combined its hardware with Cloudera’s distribution of Hadoop, enabling multiple layers of the big data architecture to be patched together.

-Teradata’s Unified Data Architecture is a comprehensive big data solution that aims to bring the data needed for analytics across the entire organization into one place, creating a single version of enterprise data: for example, capturing minute-by-minute maintenance data from the field, including potential new sources of big data.

-DigitalEdge by Leidos is a scalable, pre-integrated, flexible, and pluggable data management platform that allows rapid creation and management of near-real-time big data applications. Leidos’s Scale2Insight (S2i) is a solution that supports large, complex data environments with multiple disparate sensors collecting information on different parts of the data ecosystem.

-SYNTASA delivers analytical applications focused on the behaviour of visitors to internal government websites. The analytics, built on an open-source big data platform, determine visitors’ behavioural trends in order to improve the use of information by government analysts.
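Link analysis over a graph of entities is one common way that common-operating-picture tools surface patterns. The following is a generic sketch of the idea, not any vendor’s API. Entities such as vessels and ports are nodes, and an observed association, such as a co-location event, is an undirected edge. A breadth-first search then lists everything within a chosen number of links of a flagged entity, together with its distance. All entity names in the test data are hypothetical.

```python
from collections import deque


def build_graph(edges):
    """Build an undirected adjacency map from (entity, entity) association pairs."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def within_hops(graph, start, max_hops):
    """Breadth-first search: entities reachable from start in at most max_hops links,
    mapped to their link distance (the start entity itself is excluded)."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if dist[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for nbr in graph.get(node, ()):
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    del dist[start]
    return dist
```

Breadth-first order guarantees that each entity is recorded at its shortest link distance, which is usually what an analyst ranking associates by closeness wants.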

Indian Context

In India, the Department of Science and Technology (DST), under the Ministry of Science and Technology and Earth Sciences, has been tasked with developing a Big Data Analytics (BDA) ecosystem. DST has identified important areas for developing this ecosystem. Creating a talent pool of skilled human resources is the first requirement; this will need industry-academia partnerships to groom talent in universities, as well as a strong internal training curriculum to deepen analytical skills. The Big Data Analytics programme has five objectives:

-to promote and foster big data science, technology and applications in the country and to develop core generic technologies, tools and algorithms for wider applications in Government.

-to understand the present status of the industry in terms of market size, different players providing services across sectors, SWOT of industry, policy framework and present skill levels available.

-to carry out a market landscape survey to assess future opportunities and the demand for skill levels over the next ten years.

-to bridge the skill level and policy framework gaps.

-to evolve a strategic road map and micro-level action plan clearly defining the roles of various stakeholders such as government, industry, academia and others, with clear timelines and outcomes for the next ten years.

The National Data Sharing and Accessibility Policy (NDSAP) 2012 of DST is designed to promote data sharing and enable access to government-owned data. Big Data Analytics infrastructure development in India is being steered by the Centre for Development of Advanced Computing (C-DAC) under the Ministry of Electronics and Information Technology (MeitY). C-DAC has already created state-of-the-art hardware systems and networking environments at its various facilities. Its research focus in cloud computing includes the design and development of open-source cloud middleware, virtualization and management tools, and an end-to-end security solution for the cloud. A number of C-DAC applications are being migrated to cloud computing technology. C-DAC regularly conducts training on “Hadoop for Big Data Analytics” and “Analytics using Apache Spark” for various agencies, including Defence.

The Indian Navy has a robust naval network with thousands of computers connected to it. This network provides information availability and processing, communication services, service facilitation platforms, multi-computing platforms, resource and information sharing, data warehousing, and more. However, cyber security and network integrity are crucial to protecting the naval network from data theft, denial of service, malicious viruses and trojans, single points of failure, loss of data and network integrity, and active or passive monitoring.

The Indian Navy’s Naval Unified Domain (NUD), or Enterprise Intranet, is the backbone of the Indian Navy. All internal communications pass through the NUD alone. It offers secure, isolated, fast and reliable connectivity across the Navy. The NUD carries only controlled data (unlike the internet, it carries no unknown data from other applications), which can easily be segregated and analysed. However, personnel working on the NUD often need to transfer data between the internet and the NUD, which may lead to security breaches. Further, physically guarding NUD network lines against man-in-the-middle attacks is a complex task, since naval units are located at geographically dispersed sites. There is a possibility of passive monitoring, active monitoring, certificate replication and similar attacks, which can be carried out with sophisticated software and hardware, for example via a mirror port or a network tap.
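One standard building block against undetected tampering on such links is a message authentication code. The sketch below uses Python’s standard `hmac` and `hashlib` modules to tag a message with HMAC-SHA256 and verify it in constant time. Any modification in transit makes verification fail. This is an illustrative assumption about a possible countermeasure, not a description of how the NUD is protected. It provides integrity only, not confidentiality, and it presumes a shared key has been distributed securely. The message text is hypothetical.

```python
import hashlib
import hmac
import secrets


def sign(key: bytes, message: bytes) -> bytes:
    """Compute an HMAC-SHA256 tag over the message."""
    return hmac.new(key, message, hashlib.sha256).digest()


def verify(key: bytes, message: bytes, tag: bytes) -> bool:
    """Recompute the tag and compare in constant time."""
    return hmac.compare_digest(sign(key, message), tag)


if __name__ == "__main__":
    key = secrets.token_bytes(32)  # shared secret; secure key distribution is out of scope here
    msg = b"UNIT-A position report 0600"
    tag = sign(key, msg)
    print(verify(key, msg, tag))                             # True
    print(verify(key, b"UNIT-A position report 0700", tag))  # False
```

In practice this role is usually filled by an authenticated protocol such as TLS, which combines integrity checks of this kind with encryption and certificate-based key exchange.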

It can be seen that the applicability of big data analytics to the Indian Navy is very much in line with that in the advanced armed forces of the world. The Indian Navy requires big data analytics in intelligence, operations, logistics, mobilization, medical services, human resources, cyber security, and counter-insurgency/counter-terrorism. There is also an associated requirement for a predictive capability that anticipates specific incidents and suggests measures by analysing historical events.
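A very simple form of "anticipating incidents by analysing historical events" treats incident counts per area as Poisson-distributed. The mean count per period then estimates the rate λ, and the chance of at least one incident in the next period is 1 − e^(−λ). The sketch assumes that incidents are independent and occur at a constant rate, so seasonality, for example, would violate the model. The area names in the test are hypothetical, and the counts are illustrative, not real data.

```python
import math


def incident_rate(counts):
    """Maximum-likelihood Poisson rate: the mean number of incidents per period."""
    if not counts:
        raise ValueError("need at least one period of history")
    return sum(counts) / len(counts)


def prob_at_least_one(rate, periods=1):
    """P(N >= 1) over the horizon, assuming N ~ Poisson(rate * periods)."""
    return 1.0 - math.exp(-rate * periods)


def rank_areas(history):
    """Rank areas by estimated probability of at least one incident next period.

    history maps an area name to its list of per-period incident counts.
    """
    scores = {area: prob_at_least_one(incident_rate(c)) for area, c in history.items()}
    return sorted(scores, key=scores.get, reverse=True)
```

A ranking like this is only a starting point for tasking scarce surveillance assets; richer models would add covariates such as season or sea state.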

However, because big data analytics is still nascent, awareness of it is limited to a small number of agencies in the Navy. Its benefits for operational decision making, while safeguarding accuracy and reliability, have not yet been internalized. Even pilot-scale big data projects may not currently exist. At present, decision makers are unclear about the capabilities, costs, benefits and applicability of big data, or the perils, if any, of not adopting it.

Big data holds enormous potential to make the Navy’s operations more efficient across the entire spectrum of its activity. The research and development needed to analyse big data is not restricted to a single discipline and requires an interdisciplinary approach. Challenges at every stage of the analysis include scale, heterogeneity, lack of structure, privacy, error handling, timeliness, provenance, and visualization. Statisticians need to tackle issues of inference, while computer scientists have to deal with algorithms, scalability and near-real-time decision making. The involvement of mathematicians, visualization specialists, social scientists, psychologists, domain experts and, most important of all, the final users, the Navy, is paramount for optimal use of big data analytics. The active participation of national agencies, the private and public sectors, and the armed forces would ensure full exploitation of the potential of big data for the Indian Navy.