Today, technological advancements and digitalization of the business world have taken place at an unanticipated pace. Unfortunately, this technology and digitalization have come with the increased risk of cyber threats. Big data analytics is considered to be the perfect solution to protect organizations and their data from cyber-attacks. Cyber-attack criminals are using sophisticated methods and tools to attack companies. These companies are mostly those that have been in operation for many years. Companies all over the world have applied various strategies to prevent cyber-attacks. However, the main challenges facing this endeavor have been the large volume of data and scalability. As a result, there is a need to come up with more effective strategies to be safe against cyber-attacks. In other words, companies must rethink how they respond to cybersecurity threats. This paper looks at how we can use data analytics to improve cloud security.
Using Data Analytics to Improve Cloud Security
In simple terms, big data may be defined as the large-scale data analysis and management technologies that are beyond the ability of the conventional techniques used to process data. There exists a difference between big data and traditional technologies. This differentiation can be done in three significant ways. First, big data and traditional technologies are different in terms of the amount (or volume) of data. Second, the two are different in terms of the rates of data transmission and generation (velocity). Lastly, big data is different from the other conventional technologies in terms of the types of unstructured and structured data (variety).
Today, people create approximately 2.5 quintillion bytes of data daily. There has been an increase in the rates of data creation, and this has seen the creation of 90% of the data in the past two years alone. As a result of the accelerated production of data and information, there has been a need to create newer technologies to effectively analyze such massive sets of data (Big Data Working Group, 2013).
This big data can be incorporated to change security analytics by offering new opportunities and tools to leverage the large amounts of both unstructured and structured sets of data. In that regard, it is prudent to define the concept of big data analytics. In the simplest terms, it implies the entire process of mining and analyzing big sets of data. Big data analytics may bring forth business and operational knowledge at an unexpected specificity and scale. The urgent need to critically analyze and also leverage the trend data obtained by enterprises is among the major driving forces when analyzing big data. There have been technological developments in the analysis, processing, as well as storage of big sets of data. For example, in recent years, there has been rapidly declining costs of data storage, including CPU power. The cost-effectiveness and flexibility of data centers, as well as cloud computing for elastic storage and computation, have also witnessed immense advancements.
There has also been the introduction and development of modern frameworks, including Hadoop. These new frameworks have enabled the users of data to take full advantage of the shared computing mechanisms storing large amounts of data via parallel and flexible processing. These advancements have resulted in the difference witnessed between big data analytics and traditional analytics.
Today, big data analysis can be used to address the various challenges facing cybersecurity. There has been growth in the complexity of IT networks. As a result, there has also been fast growth in the level of complexity and inventiveness of cybersecurity attacks, as well as threats. For example, between June and November 2016, close to 1 billion malware-related events took place. The estimated total cost of cyber crimes is up to the tune of $1 billion. Also, 99% of all computers in the world are prone to cyber threats and attacks. The image below shows the total costs of cyber-related crimes in selected seven countries:
There have been numerous efforts to combat cybersecurity threats and risks. As the malware threats and attacks continue to rise both in complexity and volume, it has become more challenging for the traditional analytical infrastructure and tools to efficiently keep up. The first challenge to combating cybersecurity threats has to do with data volumes or quantity. For example, on a single day at SophosLabs, about 300,000 new files that are potentially harmful that need to be analyzed are reported. The other challenge revolves around scalability. SQL based infrastructure and tooling do not scale well, and it is also expensive to maintain.
Data analytics is considered to be the perfect path to achieve cybersecurity. A company has to protect itself from all kinds of threats or cyber-attacks. However, a potential attacker only requires a single successful try. With such odds, a company cannot just attempt to prevent cyber-attacks from occurring. It is paramount to detect and respond to threats quickly and effectively. This is known as the PDR paradigm, which refers to the Prevention of, Detection of, and Response to threats. This is where the concept and application of data analytics come in. Organizations and other key analyst companies have recognized that various issues can be easily overcome through the use of data analytics. Analyst companies have continued to write reports and advise their various clients about the effect of big data analytics on cybersecurity in various industries. For example, the CDC states that cloud and big data analytics can keep off the various cyber threats that target health institutions. Companies and businesses are actively involved in, and investing heavily in, combating data breaches. For example, companies are identifying anomalies in device and network behavior, including any abnormalities in contractor and employee behavior. Companies are also assessing network vulnerabilities and risks.
Big data has been significantly changing the general analytics environment. More precisely, data analytics may be carefully leveraged to enhance situational awareness as well as information security. For example, data analytics may be used to analyze log files, financial transactions, and network anomalies to identify any defects or suspicious activities. It is also possible to use data analytics to correlate various data sources into a more coherent view. Big data operationalization has multiple benefits. Operationalization implies that just detecting the potential risks is never enough. The PDR approach translates to preventing, detecting, and responding to threats. However, the real value of big data emanates from driving actions from the business teams. One needs an operationalization capability that can easily sift through the data, identify the right signals, and then initiate the most appropriate move (Datameer, 2018).
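The idea of analyzing log-derived data to identify suspicious activity can be illustrated with a minimal sketch. It assumes that log events have already been reduced to hourly counts of failed logins, and it flags hours whose count deviates strongly from the mean. The function name, the sample counts, and the 2.5 standard deviation threshold are illustrative assumptions, not values taken from the cited sources.

```python
from statistics import mean, pstdev


def flag_anomalies(counts, threshold=2.5):
    """Return the indices of counts whose z-score exceeds the threshold."""
    mu = mean(counts)
    sigma = pstdev(counts)  # population standard deviation
    if sigma == 0:
        return []  # all values identical, so nothing stands out
    return [i for i, c in enumerate(counts) if abs(c - mu) / sigma > threshold]


# Hypothetical hourly counts of failed logins; hour 8 contains a burst.
hourly_failed_logins = [4, 6, 5, 7, 5, 6, 4, 5, 90, 6, 5, 4]

print(flag_anomalies(hourly_failed_logins))  # prints [8]
```

A production system would likely use a more robust baseline, since a single large outlier inflates the mean and the standard deviation it is measured against.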
Big data is significantly improving cybersecurity. Big data and analytics have shown great promise towards the effectiveness of cybersecurity. For example, according to 90 percent of the respondents from MeriTalk’s new US government study, there has been a significant decline in the number and incidence of security breaches. Also, 84 percent of the participants stated that they had applied big data to prevent cyber-attacks successfully. Keeping up with the volume of data has been a vital concern. However, there exist various challenges as the new cybersecurity threats keep popping up daily. Some of these challenges have to do with an overwhelming volume of data, lack of the right systems, as well as stale data by the time it reaches the cybersecurity manager.
If big data is poorly mined for purposes of improving cybersecurity, it can be ineffective for threat analysis. The metadata may be available, but it may prove difficult to obtain maximum benefits from it. As a result, the problem could be identifying the right people who are well-versed in mining data for trends. Cybersecurity requires actionable intelligence as well as the risk management that is more prevalent in big data analysis. As such, it is advisable to have the necessary tools that can effectively analyze large sets of data. However, the secret lies in automating various tasks. This automation will ensure that any required data is readily available and that the required analysis is dispatched to the right individuals early enough. This, in turn, will enable data analysts to conveniently classify the cyber threats and risks without the usual extensive delays that might make the data in question irrelevant to the existing attacks (SentinelOne, 2016).
The business world has witnessed massive digitalization. However, this digitalization has come with the increased risk of cyber-attacks. The good news is that big data analysis may be used to offer the required protection against a wide range of cyber-attacks. Cybercriminals have applied highly complicated attack methods, and malicious insiders have played a growing role in some of the recent security breaches. This is a clear indication that the conventional approaches to ensuring information security are no longer effective and cannot, therefore, keep up. Companies, therefore, must rethink their cybersecurity approaches and concepts. This rethinking is necessary because of the increasingly advanced and persistent attacks. Analytics is considered to be a pivotal element in achieving cyber resilience. At the core of big data analytics is improved detection. Detection is the starting point to effectively deal with cyber threats and attacks (BiSurvey.com, 2020).
Data-driven information security can be traced back to the detection of bank fraud and to anomaly-based intrusion detection systems. Today, the detection of fraud is the most common use of data analytic methods. For many decades, credit card firms have rolled out fraud detection strategies. Unfortunately, the customized mechanisms used to mine big data for purposes of detecting fraud were not economical enough to adapt to other fraud detection applications. Today, off-the-shelf big data techniques and tools are bringing attention to analytics for fraud detection in insurance and healthcare, among other areas.
A few years ago, it was difficult to analyze system events or even logs for forensics. It was also a challenge to detect intrusion. There are various reasons why conventional approaches fail to deliver the necessary tools to fully support large-scale and long-term data analysis. First, storing or retaining huge quantities of data was not feasible in economic terms. Therefore, a lot of event logs, as well as other recorded computer activities, were deleted and lost after a certain fixed duration. Second, performing complex queries or analytics on huge and structured sets of data was highly inefficient. This was mainly because the traditional tools never leveraged big data technologies. Third, the various traditional data analysis tools were not adequate for analyzing and managing unstructured sets of data. These traditional tools relied on rigid, predefined schemas. In contrast, big data tools, such as regular expressions and Pig Latin scripts, can be used to query data in flexible formats. Lastly, big data systems usually incorporate cluster computing infrastructure. As such, the systems remain more available and reliable. The systems also offer a guarantee that all the queries in the specific system have been processed to full completion.
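The point that regular expressions can query data without a rigid schema can be sketched as follows. The example assumes SSH-style log lines; the sample lines, the pattern, and the function name are hypothetical and only illustrate the technique of pulling fields out of free-form text and aggregating them.

```python
import re
from collections import Counter

# Hypothetical, SSH-style log lines (addresses come from documentation ranges).
LOG_LINES = [
    "Jan 12 03:14:07 host1 sshd[201]: Failed password for root from 203.0.113.5 port 4242",
    "Jan 12 03:14:09 host1 sshd[201]: Failed password for admin from 203.0.113.5 port 4243",
    "Jan 12 03:15:00 host2 sshd[305]: Accepted password for alice from 198.51.100.7 port 5151",
    "Jan 12 03:16:30 host1 sshd[201]: Failed password for root from 192.0.2.44 port 6000",
]

# Named groups extract the user and source address from unstructured text.
FAILED_LOGIN = re.compile(
    r"Failed password for (?P<user>\S+) from (?P<ip>\d{1,3}(?:\.\d{1,3}){3})"
)


def failed_logins_by_ip(lines):
    """Count failed login attempts per source IP address."""
    counts = Counter()
    for line in lines:
        match = FAILED_LOGIN.search(line)
        if match:
            counts[match.group("ip")] += 1
    return counts


print(failed_logins_by_ip(LOG_LINES).most_common())
# prints [('203.0.113.5', 2), ('192.0.2.44', 1)]
```

Because the pattern is applied at query time, a new log format only requires a new pattern rather than a change to a stored schema.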
The analysis and storage of large and heterogeneous data sets are happening at an unexpected speed and also scale. This has been made possible by the new big data analysis technologies, for instance, databases that are related to the Hadoop ecosystem. These different technologies will, in turn, transform the security analytics in various ways. For example, there will be a transformation in the collection of data on a large scale from multiple internal company sources, as well as externally, for instance, the vulnerability database. There will be a transformation in the performing of more in-depth analytics on various data. There will be a more consolidated perspective of security-related information. Lastly, it will be possible to achieve real-time analyzing of streaming sets of data. It is, however, crucial to note that big data analytics still need system architects, as well as analysts. This will make it possible to obtain a more profound understanding of their existing system, to effectively configure the tools of data analysis.
There exist various ways in which we can use big data analytics to enhance security. The first use is network security. Today, companies such as Zions Bancorporation are using Hadoop clusters and other business intelligence mechanisms to analyze more data more quickly than is possible with conventional SIEM tools. In the company's experience, the amount of data, as well as the frequency of analysis of various events, is too much for conventional SIEMs to handle alone. For instance, when using the traditional systems, it would take between 20 minutes and one hour to search a month's load of data. However, by using the new Hadoop system to run queries with Hive, similar results can be obtained in approximately a minute. The security information warehouse that drives the implementation has various benefits for users. The users can extract useful and relevant security-related data from diverse sources such as security devices and firewalls. The users can also extract information from business processes, website traffic, and other daily transactions. This incorporation of many disparate data sets and unstructured data into a single analytical framework is among the major promises of big data.
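The promise of bringing disparate data sets into a single analytical framework can be illustrated with a small sketch. It assumes two hypothetical sources, a firewall and a web proxy, that name the client address differently. Each source is normalized into a common record, and the records are then grouped by IP address. All field names and sample events are assumptions made for illustration; this is not how any particular warehouse or Hive deployment is implemented.

```python
from collections import defaultdict

# Hypothetical events from two sources with different field names.
firewall_events = [
    {"src": "10.0.0.5", "action": "deny"},
    {"src": "10.0.0.9", "action": "allow"},
]
proxy_events = [
    {"client_ip": "10.0.0.5", "url": "http://example.test/a"},
    {"client_ip": "10.0.0.7", "url": "http://example.test/b"},
]


def normalize(source_name, events, ip_field):
    """Map source-specific events onto a common record layout."""
    for event in events:
        yield {"source": source_name, "ip": event[ip_field], "raw": event}


def correlate_by_ip(*streams):
    """Build a view of which sources reported each IP address."""
    view = defaultdict(set)
    for stream in streams:
        for record in stream:
            view[record["ip"]].add(record["source"])
    return view


view = correlate_by_ip(
    normalize("firewall", firewall_events, "src"),
    normalize("proxy", proxy_events, "client_ip"),
)

# Addresses that appear in more than one source are candidates for review.
multi_source = sorted(ip for ip, sources in view.items() if len(sources) > 1)
print(multi_source)  # prints ['10.0.0.5']
```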
Big data analytics may also be widely used for enterprise events analytics. Today, an enterprise will routinely collect enormous amounts of security-relevant data, such as people's action events, network events, or software application events, for multiple reasons, such as the need for post hoc forensic analysis and regulatory compliance. Sadly, such a high data volume can potentially overwhelm the enterprise. An enterprise can hardly store the data, let alone use it to do anything useful. For instance, it is estimated that a large business enterprise such as HP can produce about 1 trillion events each day. This translates to approximately 12 million events every second. Those numbers are expected to grow as the enterprise runs more software, deploys more devices, hires more employees, or enables event logging in more data sources.
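The conversion from events per day to events per second can be checked with a short calculation. The daily figure comes from the paragraph above. The 200 bytes per event used for the storage estimate is purely a hypothetical assumption, included only to show how quickly the volume adds up.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds

events_per_day = 1_000_000_000_000  # about 1 trillion events, as stated above
events_per_second = events_per_day / SECONDS_PER_DAY

print(f"{events_per_second:,.0f} events per second")
# prints 11,574,074 events per second, which is roughly 12 million

# Hypothetical assumption: an average event record occupies 200 bytes.
ASSUMED_BYTES_PER_EVENT = 200
terabytes_per_day = events_per_day * ASSUMED_BYTES_PER_EVENT / 1e12

print(f"{terabytes_per_day:,.0f} TB per day")  # prints 200 TB per day
```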
The existing data analytical strategies cannot function effectively at this large scale, and the result will be so many false positives that their overall efficacy will be undermined. This issue will worsen as the enterprise moves to cloud architecture and continues collecting much more data. This will have a negative impact because, as more data is collected, the data will yield less actionable information. Recently, there has been research at HP whose goal is to move towards a situation where more data results in better analytics as well as more actionable information. To achieve this, systems and algorithms have to be designed and implemented to identify actionable security-related information in the vast data sets. As a result, the false-positive rates will be lowered to levels that are manageable. In this situation, collecting more data will translate to more value from such data. However, multiple challenges must be solved before the real capability of big data analytics can be realized. These challenges include privacy, legal, and technical matters regarding scalable data visualization, analysis, storage, transport, and collection. Despite the various drawbacks, the team at HP Labs has managed to address multiple issues in big data analytics for security. Other enterprises can, therefore, borrow from the efforts of HP to use big data analytics for enterprise events analytics. This will, in turn, translate to enhanced security.
Big data analytics can also be used for advanced persistent threat detection. An advanced persistent threat refers to a targeted attack against any physical system or an asset of high value. Compared to the mass spreading risky malware such as trojans, viruses, or worms, the APT cyber-attackers will work in a low and slow mechanism. Low mode maintains a low profile in the network. On the other hand, the slow mode provides for a long execution time. The APT attackers avoid triggering alerts by leveraging stolen user credentials or even zero-day exploits. As a result, this kind of attack can happen over a long period, while the target enterprise is still unaware.
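The low-and-slow pattern can be illustrated with a minimal sketch that contrasts a per-day alert threshold with an aggregate over a longer window. It assumes a series of daily counts of suspicious events for one account. The function names, the thresholds, and the sample data are hypothetical and chosen only to show why retaining and analyzing data over long periods matters.

```python
def short_window_alerts(daily_counts, daily_threshold):
    """Return the days on which the count alone exceeds the daily threshold."""
    return [day for day, count in enumerate(daily_counts) if count > daily_threshold]


def long_window_alert(daily_counts, window_days, window_threshold):
    """Return True if any run of window_days consecutive days exceeds the threshold in total."""
    for start in range(len(daily_counts) - window_days + 1):
        if sum(daily_counts[start:start + window_days]) > window_threshold:
            return True
    return False


# Hypothetical attacker activity: 3 suspicious events per day for 30 days.
daily_suspicious_events = [3] * 30

print(short_window_alerts(daily_suspicious_events, daily_threshold=10))
# prints [] because no single day stands out

print(long_window_alert(daily_suspicious_events, window_days=30, window_threshold=60))
# prints True because the 30-day total of 90 exceeds 60
```

The daily check never fires, while the 30-day aggregate does, which is only possible if the event history is kept long enough to be summed.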
APTs are some of the most severe threats to information security that companies face today. The basic objective of an APT is to steal intellectual property (IP) from a target company. The attackers may also gain access to confidential and sensitive user data, or to strategic business data that may later be used for illegal insider trading, data poisoning, embarrassment, blackmail, financial gain, or disrupting the company's business. APTs are mostly carried out by motivated, well-financed, and highly skilled cyber-attackers who target sensitive data from specific enterprises. Today, APTs are becoming more advanced and sophisticated in both the technologies and methods used. This is particularly true of their ability to use the employees in the target organization to quietly penetrate the existing IT systems by using various social engineering strategies. The users will often be tricked into opening a spear-phishing message that is customized for each target victim, such as a push message, an SMS, or an email. The attackers will then download and install specially designed malware that might contain zero-day exploits.
The effective detection of threats heavily relies on the knowledge and expertise of the human data analysts to build secure, customized signatures and also conduct manual investigations. The process is not scalable, hard to generalize, as well as labor-intensive. Big data analytics is a practical approach to detecting APTs. However, there exists a problem in the form of the massive amounts of data that must be sifted in the search for any anomalies. This data is usually extracted from diverse ever-rising sources of information that must be audited first. This process makes the detection task more difficult. Based on the large volumes of data, the conventional network perimeter defense system can end up being ineffective in the detection of targeted cyber-attacks. This is because such conventional systems are not easily scalable to the enterprise networks that are ever-increasing in size. As a result, there is a need for a new and more effective approach. Most organizations collect their data relating to user hosts’ and users’ activity within an organization’s existing network, as logged by VPN users, intrusion detection systems, domain controllers, web proxies, and firewalls.
Technology has had immense benefits, such as the digitalization of the business world. Despite the various benefits, companies are still facing a significant risk of cyber-attacks. For example, companies have suffered immense losses of data at the hands of cybercriminals. To solve this problem, companies have turned to big data analytics. Today, there has been a growing adoption of mobile and cloud services. At the same time, modern cybercriminals have adopted more sophisticated tools and methods. For many years, companies have relied on traditional tools, but they have proven ineffective. This calls for an urgent rethinking of the concepts that companies have on cybersecurity. Companies and businesses must move past the pure prevention approach and employ the PDR strategy. The PDR paradigm entails preventing, detecting, and responding. By using big data analytics, it will be possible to improve cloud security.
References
Big Data Working Group. (2013). Big data analytics for security intelligence. Cloud Security Alliance, 1-22. Retrieved from https://downloads.cloudsecurityalliance.org/initiatives/bdwg/Big_Data_Analytics_for_Security_Intelligence.pdf
BiSurvey.com (2020). Big Data Security Analytics: A Weapon Against Rising Cyber Security Attacks? Retrieved from https://bi-survey.com/big-data-security-analytics
Datameer (2018). Challenges to Cyber Security and how Big Data Analytics Can Help. Retrieved from https://www.datameer.com/blog/challenges-to-cyber-security-and-how-big-data-analytics-can-help/
SentinelOne (2016). How Big Data is Improving Cyber Security. Retrieved from https://www.csoonline.com/article/3139923/how-big-data-is-improving-cyber-security.html