Malware Analysis Done Right

The reality facing the cybersecurity industry today is that as soon as network defenders develop a new way to spot malware, cyberadversaries find a way to circumvent it. With the number of cyberattackers growing every day, the time between deploying a protection and a bad actor finding a way around it grows ever shorter.

In a previous column, I examined how certain characteristics of common malware analysis environments potentially allow virtual machine-aware threats to spot them and either actively avoid detection or simply remain dormant.

Thankfully, the cybersecurity industry has developed different methods for analyzing malware, each with its own set of strengths and weaknesses. While relying on just one malware analysis method could leave a network exposed, implementing multiple analysis methods in the right order gives security teams a better chance of keeping malware out of the network, including samples that haven’t been previously identified. So, let’s take a moment to review the types of malware analysis available today and see how, when implemented in series, they allow security teams to handle the vast majority of threats automatically, freeing up team resources to actively hunt more advanced threats.

But first, a quick word about threat data: all of these malware analysis techniques rely on a threat intelligence data stream to train their algorithms and models, so their performance improves with access to more high-fidelity data. With a steady input of information on new threat vectors, malware families, adversary playbooks and attack campaigns, security systems and teams can make more informed decisions about defending their networks. Without access to a critical mass of threat intelligence, cybersecurity becomes a game of guesswork and wishful thinking that’s bound to fail at some point, which is not an operating model anyone would be confident in recommending.

Static Analysis

Meant to be the first line of defense in a malware analysis environment, “static analysis” involves breaking down an unknown file into its component parts for examination without detonating the file. Through static analysis, the system can determine whether the file contains markers or patterns that would indicate it is malware (for example, embedded executable scripts or calls to connect to an unknown or suspect server). Static analysis is a fast and accurate way to detect known malware and its variants, which make up the bulk of attacks typically launched against organizations.
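To make the idea concrete, here is a minimal sketch of a static check in Python. It inspects a file's bytes for a few markers without ever running the file. The patterns, the `KNOWN_BAD_SHA256` set and the `static_scan` function are hypothetical and purely illustrative; production systems rely on far richer parsing and curated signatures.

```python
import hashlib
import re

# Hypothetical marker patterns, chosen only for illustration.
SUSPICIOUS_PATTERNS = {
    "embedded_script": re.compile(rb"<script|powershell|wscript\.shell", re.IGNORECASE),
    "url": re.compile(rb"https?://[^\s\"'<>]+"),
}

# Hypothetical hash list; in practice this would come from threat intelligence.
KNOWN_BAD_SHA256 = set()


def static_scan(data: bytes) -> dict:
    """Examine raw bytes for known hashes and suspicious markers without executing them."""
    digest = hashlib.sha256(data).hexdigest()
    markers = sorted(
        name for name, pattern in SUSPICIOUS_PATTERNS.items() if pattern.search(data)
    )
    return {
        "sha256": digest,
        "known_bad": digest in KNOWN_BAD_SHA256,
        "markers": markers,
    }


if __name__ == "__main__":
    sample = b"hello <script>fetch('http://example.test/x')</script>"
    print(static_scan(sample))
```

Because nothing is executed, a check like this is cheap to run on every file, which is why it sits at the front of the analysis chain.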

Machine Learning Analysis

Some analysis systems have taken static analysis to the next level, adding support for machine learning. “Machine learning” may sound like a buzzword, but it involves creating and automating a system to classify malicious behavior into groups (or families). These groups can be used to identify future malicious content without humans needing to build the pattern matches manually. If the similarities between suspicious samples are significant enough, the system can automatically create a malware signature and push it to enforcement points throughout the network. As more malware samples are examined and catalogued, the system’s ability to mitigate attacks on its own grows. In today’s world of commoditized cyberattacks, where even unskilled adversaries can conduct attack campaigns, machine learning-enabled analysis is one of the best methods security teams have to handle the thousands of threat alerts networks receive daily.
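The grouping step can be illustrated with a deliberately simple stand-in for a trained model: a nearest-neighbor classifier over byte n-grams using Jaccard similarity. This is a sketch under stated assumptions, not how any particular product works. The function names, the 4-byte n-gram size and the 0.5 threshold are all hypothetical choices.

```python
def ngrams(data: bytes, n: int = 4) -> set:
    """Return the set of overlapping n-byte substrings in data."""
    return {data[i:i + n] for i in range(len(data) - n + 1)}


def jaccard(a: set, b: set) -> float:
    """Similarity between two feature sets, from 0.0 (disjoint) to 1.0 (identical)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def classify(sample: bytes, families: dict, threshold: float = 0.5):
    """Return (family_name, score) for the most similar family, or (None, score)
    when no known family is similar enough."""
    features = ngrams(sample)
    best_name, best_score = None, 0.0
    for name, members in families.items():
        score = max((jaccard(features, ngrams(m)) for m in members), default=0.0)
        if score > best_score:
            best_name, best_score = name, score
    if best_score >= threshold:
        return best_name, best_score
    return None, best_score


def derive_signature(members: list, n: int = 4) -> set:
    """Toy 'signature': the n-grams shared by every member of a family."""
    feature_sets = [ngrams(m, n) for m in members]
    return set.intersection(*feature_sets) if feature_sets else set()


if __name__ == "__main__":
    families = {
        "famA": [b"AAAABBBBCCCCDDDD", b"AAAABBBBCCCCEEEE"],
        "famB": [b"0123456789abcdef"],
    }
    print(classify(b"AAAABBBBCCCCDDDX", families))
    print(len(derive_signature(families["famA"])))
```

The more labeled samples each family holds, the more likely a new variant is to land close to one of them, which mirrors the point above that the system improves as more malware is catalogued.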

Dynamic Analysis

If a suspect file cannot be handled through static analysis, it needs to be examined in greater detail by detonating it and observing the resulting host and network behavior. This approach, generically referred to as “dynamic analysis,” typically involves forwarding a suspicious sample to a VM-based environment and activating it under highly controlled conditions (aka “sandboxing”) so that its behavior can be observed and intelligence extracted. In cases of advanced VM-aware malware that can spot when it’s being run in a virtual environment, bare metal analysis may be required. Dynamic analysis is particularly good at finding zero-day exploits in malware.

Since static and machine learning analysis both require some degree of prior familiarity with the malware being analyzed, it’s difficult for them to identify truly novel malicious activity. The challenge with dynamic analysis is scalability: doing it right requires massive compute, storage and automation. That said, if both static and machine learning analysis have already occurred, they’ve likely already identified and mitigated the bulk of the malware to be found. Employing dynamic analysis only when needed, as part of a cloud-based automated system, effectively removes the burdens of scale and manual effort.
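The ordering described above can be sketched as a short pipeline in which each stage either returns a verdict or passes the sample along, so the expensive dynamic stage only sees what the cheaper stages could not resolve. The stage functions here are toy placeholders invented for illustration; a real dynamic stage would detonate the sample in a sandbox rather than inspect its bytes.

```python
from typing import Callable, List, Tuple

# Each stage returns "malicious", "benign" or "unknown".
Stage = Tuple[str, Callable[[bytes], str]]


def analyze_in_series(sample: bytes, stages: List[Stage]) -> Tuple[str, str]:
    """Run stages in order and stop at the first one that reaches a verdict."""
    for name, stage in stages:
        verdict = stage(sample)
        if verdict in ("malicious", "benign"):
            return verdict, name
    return "unknown", "unresolved"


# Toy stand-ins for the three analysis methods (hypothetical logic).
def toy_static(sample: bytes) -> str:
    return "malicious" if b"KNOWN_BAD" in sample else "unknown"


def toy_ml(sample: bytes) -> str:
    return "malicious" if sample.count(b"BAD") >= 2 else "unknown"


def toy_dynamic(sample: bytes) -> str:
    # Placeholder for sandbox detonation; always reaches a verdict.
    return "malicious" if b"dropper" in sample else "benign"


STAGES: List[Stage] = [
    ("static", toy_static),
    ("machine_learning", toy_ml),
    ("dynamic", toy_dynamic),
]

if __name__ == "__main__":
    for sample in (b"xx KNOWN_BAD xx", b"BAD and BAD", b"novel dropper", b"hello"):
        print(sample, analyze_in_series(sample, STAGES))
```

Putting the cheapest check first means most samples never reach the sandbox, which is the scaling argument made in the paragraph above.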


Scott Simkin is a Senior Manager in the Cybersecurity group at Palo Alto Networks. He has broad experience across threat research, cloud-based security solutions, and advanced anti-malware products. He is a seasoned speaker on an extensive range of topics, including Advanced Persistent Threats (APTs), presenting at the RSA conference, among others. Prior to joining Palo Alto Networks, Scott spent 5 years at Cisco where he led the creation of the 2013 Annual Security Report amongst other activities in network security and enterprise mobility. Scott is a graduate of the Leavey School of Business at Santa Clara University.
