====================
== Alert Overload ==
====================
Tales from a SOC analyst

Getting Started with Malware Analysis and Reverse Engineering

This page is from internal enterprise documentation created to introduce common malware analysis topics. It is not a comprehensive document, and many sections link to online resources. Tool selection and documentation are largely based on what is present in the enterprise lab the document was created for.

Malware Analysis Workflow


    +-------------------------------------+
    |        Malware Analysis Workflow    |
    +-------------------------------------+
                       |
                       v
         +-----------------------------+
         |     Threat Intelligence     |
         +-----------------------------+
         | - Hash Lookups (VT, HA)     |
         | - IP/Domain Reputation      |
         |   (AbuseIPDB, ThreatFox)    |
         | - Sample Pivoting (Any.run) |
         | - YARA/IoC Extraction       |
         +-----------------------------+
                       |
                       v
         +----------------------------+
         |      Static Analysis       |
         +----------------------------+
         | - File Analysis            | <-- Tools: PEStudio, DIE, HashMyFiles
         | - Code Analysis            |
         | - Disassembly              | <-- Tools: Ghidra, IDA Free
         +----------------------------+
                       |
                       v
         +----------------------------+
         |      Dynamic Analysis      |
         +----------------------------+
         | - Network Monitoring       | <-- Tools: Wireshark, INetSim
         | - Process Monitoring       | <-- Tools: ProcMon, ProcExp
         | - API Call Monitoring      | <-- Tools: API Monitor
         | - Logging                  | <-- Tools: Sysmon, PowerShell logs
         +----------------------------+
                       |
                       v
    +-------------------------------------+
    |  Debugging & Reverse Engineering    |
    +-------------------------------------+
    | - IDA                               | <-- Tools: IDA Pro, Ghidra
    | - x64dbg                            | <-- Tools: x64dbg, Scylla
    +-------------------------------------+

Threat Intelligence

Threat intelligence gathering is an important step in scoping the work that needs to be done during the analysis. Often, a sample has defining traits that can be searched for in public threat intel spaces like VirusTotal, Tria.ge, or any other sandboxing/intel platform. By searching for these traits (hashes, filenames, network indicators, and so on), you can determine whether a sample needs to be reversed, categorized as a variant, or simply documented as a known sample.

Hashing

Hashing is a common way to determine if a sample has been seen in other environments before. Getting a file hash for a sample allows you to search VirusTotal or other public platforms for occurrences of the sample file. However, variants (samples from the same malware family that differ slightly) will not match, because a cryptographic hash is unique to the exact bytes of an individual file.

Get a file hash:

# PowerShell
Get-FileHash -Algorithm SHA256 file.name
# Bash
sha256sum file.name
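
When a lab box has Python but neither of the shells above, the same lookup values can be produced with the standard library. This is a minimal sketch; the function name and the default algorithm list are my own choices, not part of any tool mentioned here.

```python
import hashlib
from pathlib import Path


def file_hashes(path, algorithms=("md5", "sha1", "sha256"), chunk_size=65536):
    """Return {algorithm: hex digest} for the file at `path`."""
    hashers = {name: hashlib.new(name) for name in algorithms}
    with Path(path).open("rb") as handle:
        # Read in chunks so large samples do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            for hasher in hashers.values():
                hasher.update(chunk)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}
```

Calling file_hashes("file.name") returns a dictionary of digests that can be pasted into a hash search on a threat intel platform.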

YARA

YARA rules describe patterns (strings, byte sequences, and conditions over them) that can be run against any file. If the file content satisfies the condition defined in a YARA rule, the rule matches, which can be used to trigger an alert. In many cases, YARA rules exist for known malware. Running a sample against a list of well-defined YARA rules can provide quick insight into what family the malware belongs to.

#Example YARA rule

import "hash"

rule yara_rule
{
   strings:
      $variablename = "string_value"
   condition:
      $variablename or
      hash.sha256(0, filesize) == "sha256_hash"
}
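
To make the condition in the example rule concrete, here is a plain-Python imitation of what it evaluates. This is not YARA and does not use a YARA library; the function name and parameters are hypothetical and exist only to mirror the rule above.

```python
import hashlib


def matches_example_rule(data: bytes, needle: bytes, known_sha256: str) -> bool:
    """Imitate: $variablename or hash.sha256(0, filesize) == "sha256_hash"."""
    string_hit = needle in data  # stands in for $variablename
    # hash.sha256(0, filesize) hashes the whole file and yields lowercase hex.
    hash_hit = hashlib.sha256(data).hexdigest() == known_sha256.lower()
    return string_hit or hash_hit
```

Because the two checks are joined with or, a variant that still contains the string matches even though its hash has changed, which is why string-based conditions tend to outlive hash-based ones.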

Resources

Sandboxing and threat intel platforms:

Manual threat tracking and attack flows

Static Analysis

Static analysis involves examining a file or binary without executing it. This allows analysts to safely inspect files without risk of executing malicious code. Often, static analysis and threat intelligence are enough to determine the impact of a malicious file.

General static analysis steps include:

  • File Analysis
  • Code Analysis
  • Disassembly

File Analysis

File Analysis occurs when the file behavior is unknown. Before attempting to execute a file or sample, static file analysis methods should be employed to determine the behavior of the file. In most cases, this means employing string listing, examining magic numbers/headers, and looking at imports and exports. It is generally a good idea to understand what a file might do before executing it.

Strings

Strings.exe is a program that prints the runs of printable characters (ASCII and Unicode) found in a file’s raw data. There are multiple implementations of strings for various architectures, operating systems, and language preferences. I am partial to the Get-Strings function of Invoke-FileAnalysis.

Consider the following hex dump:

4D 5A 90 00 03 00 00 00 04 00 00 00 FF FF 00 00 B8 00 00 00 00 00 00 
00 40 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
00 00 00 00 00 00 00 00 00 00 00 00 00 00 B8 00 00 00 0E 1F BA 0E 00 
B4 09 CD 21 B8 01 4C CD 21 54 68 69 73 20 70 72 6F 67 72 61 6D 20 63 
61 6E 6E 6F 74 20 62 65 20 72 75 6E 20 69 6E 20 44 4F 53 20 6D 6F 64 
65 2E 0D 0D 0A 24 00 00 00 00 00 00 00 C7 BF 79 DA 83 DE 17 89 83 DE 
17 89 83 DE 17 89 00 C2 19 89 82 DE 17 89 CC FC 1E 89 87 DE 17 89 B5 
F8 1A 89 82 DE 17 89 52 69 63 68 83 DE 17 89 00 00 00 00 00 00 00 00 
50 45 00 00 4C 01 03 00 16 6A 88 53 00 00 00 00 00 00 00 00 E0 00 0F 
01 0B 01 06 00 00 F0 00 00 00 30 00 00 00 00 00 00 E0 16 00 00 00 10 
00 00 00 00 01 00 00 00 40 00

While this may look like a random assortment of data, running it through strings reveals that it is a Windows binary.

This program cannot be run in DOS mode
Rich
PE
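
The idea behind a strings tool is small enough to sketch. The version below is an approximation, not a reimplementation of Strings.exe: the function names and the default minimum length of 4 are assumptions, and it only handles ASCII plus the common "ASCII characters stored as UTF-16LE" case.

```python
import re


def extract_strings(data: bytes, min_length: int = 4):
    """Return runs of printable ASCII at least `min_length` bytes long."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_length)
    return [match.group().decode("ascii") for match in pattern.finditer(data)]


def extract_wide_strings(data: bytes, min_length: int = 4):
    """Return UTF-16LE strings (printable ASCII, each followed by a null byte)."""
    pattern = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % min_length)
    return [match.group().decode("utf-16-le") for match in pattern.finditer(data)]
```

Short fragments fall below the minimum length and are dropped unless min_length is lowered, so the threshold is a trade-off between noise and missed indicators.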

Headers

Headers, or magic numbers, are the first series of bytes in a file. These bytes determine the type of file. Often, file extensions are misleading. A file can be renamed with any extension, regardless of what the data actually represents. By examining magic numbers, you can get a better sense of what type of file a sample is.

For instance, the following header signifies that a sample is a Windows PE file.

PE Binary Magic Number
4D 5A 90 00

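Files beginning with 4D 5A (ASCII "MZ") carry a DOS header. In a PE file, that header points to the PE signature (50 45 00 00) further into the file, and the file should conform to the PE file structure. The 90 00 that follows is typical of compiler output, but it is not part of the magic number itself.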

For a more complete list of magic numbers, see List of File Signatures.
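
As a rough illustration of checking a header programmatically, the sketch below looks for the MZ magic and then follows the e_lfanew field (the 4-byte little-endian value at offset 0x3C of the DOS header) to the PE signature. The function name and return labels are hypothetical.

```python
import struct


def describe_pe_header(data: bytes) -> str:
    """Classify a buffer as PE, bare MZ (DOS) executable, or something else."""
    if data[:2] != b"MZ":  # 4D 5A: DOS header magic
        return "not MZ"
    if len(data) < 0x40:  # too short to hold a full DOS header
        return "MZ only"
    # e_lfanew lives at offset 0x3C and holds the offset of the PE signature.
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    if data[e_lfanew:e_lfanew + 4] == b"PE\x00\x00":
        return "PE"
    return "MZ only"
```

In the hex dump from the Strings section, offset 0x3C holds B8 00 00 00, and the bytes 50 45 00 00 do appear at offset 0xB8, which is the relationship this check relies on.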

Imports/Exports

Binary files use imports to access operating system components. In most cases, these are functions exposed by Windows API libraries (DLLs). For PE files, imports are located in the IAT (Import Address Table; see PE Files for more details on PE structure).

Exports are functions that originate from within the binary, but are exported for use outside of the binary scope. In most situations that do not involve DLL or similar files, there will only be a start export that indicates to the operating system where the start of the code is.

Tools

Code Analysis

In some cases, raw code files like JavaScript, Python, or PowerShell are used to carry out initial access, and the code can be analyzed directly. For this kind of analysis, it is beneficial to be familiar with the common functions and methods employed by malicious software.

Common Functions

Instead of listing all the various types of functions, techniques, and tools for each language, it’s better to take a more hands-on approach. This section lists some resources for exploring these common functions in various languages.

General samples and code:

For PowerShell based malware:

For Rust based malware:

For Python based malware:

For C based malware:

Tools

Visual Studio Code, or any code editor, is great for viewing text-based malware files. For compiled binaries, a decompilation view or raw assembly view can also offer great insight into what the code does. Of course, compiled binaries will not decompile to the code written by the threat actor. Instead, they decompile to a representation of the code, typically using C-like structures (or assembly instructions, if disassembled). Some tools are better than others, and some languages produce more legible results than others. For more information on this, see File Analysis and File Structures.

Dynamic Analysis

In contrast to static analysis, dynamic analysis involves executing the sample while monitoring its behavior on a system. This is typically the ‘fun’ part of analysis.

When executing potentially malicious files, ensure that a snapshot is taken prior to the detonation. This allows for easier system restoration when attempting to roll back actions taken by the malware.

Network Monitoring

Network monitoring is an essential component of dynamic analysis. At its core, network monitoring serves to capture all network packets exchanged between the malware sample, infected host, and threat actor controlled infrastructure. In more in-depth analysis, it is also possible to simulate endpoints and traffic, allowing for the simulation of certain actions. For example, a more complex sample may require certain data to be returned from a remote server in order to execute. Using advanced simulation tools, it is possible to simulate this server and return the data needed for the sample to progress.
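
To show the server simulation idea at its smallest, here is a sketch of a stand-in HTTP endpoint built on the Python standard library. It is not how the dedicated tools below work internally; the handler name, the canned response, and the seen_requests attribute are all hypothetical, and it should only ever listen inside an isolated lab network.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

CANNED_RESPONSE = b"203.0.113.10\n"  # hypothetical data the sample expects back


class FakeC2Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Record what the sample asked for, then return the canned data.
        self.server.seen_requests.append((self.path, self.headers.get("User-Agent")))
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(CANNED_RESPONSE)))
        self.end_headers()
        self.wfile.write(CANNED_RESPONSE)

    def log_message(self, fmt, *args):
        pass  # keep the console quiet; requests are stored on the server object


def start_fake_server(host="127.0.0.1", port=0):
    """Start the server in a background thread and return it (port 0 = any free port)."""
    server = HTTPServer((host, port), FakeC2Handler)
    server.seen_requests = []
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Pairing something like this with DNS redirection lets a sample that refuses to run without a server response continue, while server.seen_requests keeps a record of the paths and User-Agent strings it sent.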

Wireshark

Wireshark is used to capture network packets. For unencrypted packets, or for samples that roll their own encryption but do not use SSL/TLS for communications, Wireshark can also display the raw data in each packet.

For information on utilizing advanced features of Wireshark, please see Chris Greer.

INetSim

INetSim is a network simulation tool that can be used to simulate adversarial infrastructure.

FrausDNS

FrausDNS is a Windows DNS spoofer that can capture DNS requests made by any process. These requests can be isolated and used for analysis purposes.

PolarProxy

PolarProxy is a TLS and SSL inspection tool that decrypts and re-encrypts traffic, saving a copy of the decrypted traffic to a PCAP file. It is very useful for examining the contents of encrypted traffic, such as in Command and Control samples.

Process Monitoring

Process monitoring involves logging process activity like API calls, file and network events, and registry actions. Process monitoring tools like ProcMon run in the background while a sample detonates, capturing all system process data. This data can then be filtered and examined to further understand the impact that a sample has on a system.

Process Monitor

Process Monitor (ProcMon) is a part of the Sysinternals tool suite. I highly recommend checking out all of the tools, but for this document, we will focus solely on ProcMon and Sysmon (see Sysmon for more information).

Process Monitor is a very verbose tool, as it tracks all interactions a process has with the system. To aid in analysis efforts, robust filtering options can be found in the filter panel.

Additionally, right clicking the column headers will allow for column customization. This can help focus the results on meaningful details.
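
The same kind of filtering can be applied offline to a capture saved as CSV. The sketch below assumes the export has "Process Name", "Operation", and "Path" columns; those header names are an assumption about the export layout and should be checked against your own file, and the function name is hypothetical.

```python
import csv
import io


def filter_procmon_rows(csv_text, process_name, operations):
    """Yield rows for one process whose Operation is in `operations`."""
    reader = csv.DictReader(io.StringIO(csv_text))
    wanted = {op.lower() for op in operations}
    for row in reader:
        # Compare case-insensitively; process names vary in case across captures.
        if (row["Process Name"].lower() == process_name.lower()
                and row["Operation"].lower() in wanted):
            yield row
```

Filtering a detonation capture down to one process and a handful of operations (for example CreateFile and RegSetValue) usually shrinks the data to something that can be read line by line.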

Logging

Taking advantage of native logging capabilities can be a powerful way to monitor malware samples post execution. For PowerShell based malware, PowerShell event logs can be configured to track all module execution and events. This can reveal exactly what code is running, when it’s running, and what it is doing. For binary execution, additional logging can be configured with Sysmon to track further, more in-depth, events.

PowerShell Logs

Please read The problem with PowerShell logging bypasses. This article covers some basic logging configurations and logging bypasses.

Sysmon

Sysmon is a powerful logging tool that provides detailed information about processes, network events, and file changes. It enhances the native event logging and provides much needed security depth. Sysmon logs are viewable in the event viewer, and are great for correlating malware activity.

The complete list of Sysmon event IDs can be found at the Sysmon MSDN page.
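
When correlating activity, it helps to turn a pile of event IDs into a quick summary. The sketch below covers only a handful of commonly referenced Sysmon IDs, and the function name is hypothetical; treat the table as a starting point and defer to the official list for anything not shown.

```python
from collections import Counter

# A small subset of Sysmon event IDs; the official list has many more.
SYSMON_EVENT_NAMES = {
    1: "Process creation",
    3: "Network connection",
    7: "Image loaded",
    8: "CreateRemoteThread",
    10: "ProcessAccess",
    11: "FileCreate",
    13: "RegistryEvent (Value Set)",
    22: "DNSEvent (DNS query)",
}


def summarize_event_ids(event_ids):
    """Count events per Sysmon ID and attach a readable name."""
    counts = Counter(event_ids)
    return [(eid, SYSMON_EVENT_NAMES.get(eid, "unlisted here"), n)
            for eid, n in sorted(counts.items())]
```

Feeding it the IDs collected during a detonation gives a one-glance view of whether the sample mostly touched processes, files, the registry, or the network.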

General Event Logs

Other log sources, aside from PowerShell and Sysmon, are also useful for malware analysis. Specifically, the native security, application, system, and application/service specific logs are useful for analyzing the events that occur during malware execution.

Security Event Logs

API Calls

Monitoring API calls directly with tools like API Monitor is one of the best ways to get a deep understanding of what a sample is doing.

The Windows kernel exposes certain APIs for usermode applications to interact with. It is these API calls made from the usermode application that allow binaries to execute operations like creating files, making network connections, and performing memory operations. By monitoring the API calls made by a process or application, it is possible to directly determine what events are occurring, how they are occurring, and what the result of each operation is.

For more information on Windows API and kernel security, go read Windows Security Internals.
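
As a small illustration of reading a call trace, the sketch below checks whether a list of API names contains a well-known remote injection pattern in order (OpenProcess, VirtualAllocEx, WriteProcessMemory, CreateRemoteThread). The trace format is an assumption, simply a list of call names in the order they were observed, and the function name is hypothetical.

```python
INJECTION_SEQUENCE = [
    "OpenProcess",
    "VirtualAllocEx",
    "WriteProcessMemory",
    "CreateRemoteThread",
]


def contains_in_order(trace, sequence):
    """True if every call in `sequence` appears in `trace` in that order.

    Other calls may appear in between; only the relative order matters.
    """
    remaining = iter(trace)
    # `call in remaining` advances the iterator until it finds `call`,
    # so each later lookup resumes after the previous match.
    return all(call in remaining for call in sequence)
```

A match is a lead to investigate, not a verdict, since legitimate software such as debuggers can produce the same sequence.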

API Monitor

API Monitor is a powerful tool for hooking into applications and process threads to monitor API calls.

In API Monitor, you can select any running process to monitor. You can also launch binaries with API Monitor, to start the monitor on process launch. In the API filter pane, you can select the API modules you would like to monitor for. In most cases, you likely want to select all modules.

A monitored process will display all API modules accessed, as well as all threads spawned by the process. You can scroll through the summary pane to see individual calls in a timeline. Double-clicking an entry in the summary will direct you to the documentation for that call (sometimes - it often doesn’t work).

File Structures

Understanding binary file structures is important when attempting to reverse engineer or analyze a sample. This section will cover basic file structures and provide tooling for more detailed analysis.

All examples will be using custom binaries designed to request the user’s IP address from ifconfig.me. These examples should demonstrate finding imports and exports, examining code structures, and decompiling in both assembly view and IDA view. Additionally, resources will be linked where available to provide further context for reverse engineering specific binary types.

After decompiling and statically analyzing a binary, you should have a general understanding of what API calls the binary is making and what functions are executing. At this point, you would typically move on to dynamic analysis using API Monitor, debuggers, and other dynamic analysis processes. In some cases, you may want to dive deeper into reversing the code statically. There are good tools for this, like SourceTrail, which you can learn more about in this talk: RE//verse 2025: Streamlining Firmware Analysis with Inter-Image Call Graphs and Decomp (Robin David). This section is strictly for high-level overviews.

PE Files

Please read https://alertoverload.com/posts/2024/07/pe-files-and-how-to-create-a-powershell-pe-file-parser/.

Nim Binaries

Seriously, watch this talk Reverse-Engineering Nim Malware.

  • I had some good stuff for this, but I need to find it again.

Rust Binaries

Reverse engineering Rust binaries is a more complex process than reversing C or Nim binaries. Due to the nature of Rust’s compiler, the core functions of the binary are more abstracted compared to the straightforward nature of C. Additionally, decompilers have been working with C language binaries for far longer than Rust. This means that decompilers are significantly better at analyzing C binaries than they are at analyzing Rust binaries.

Note: I didn’t remember that User-Agent strings were a thing when I wrote this, so there’s extra parsing going on in this code compared to the other samples. It doesn’t affect the decompilation process though.

Code

use curl::easy::Easy; // Import the easy curl crate

// main entry point
fn main() {

    let url = "ifconfig.me"; // define the URL
    let mut e = Easy::new(); // create an Easy handle
    e.url(url).unwrap(); // set the target URL (the request itself runs in perform())

    let mut data = Vec::new(); // buffer for the raw response bytes

    // Write response to data from slice
    {
        let mut t = e.transfer();
        t.write_function(|new_data| {
            data.extend_from_slice(new_data);
            Ok(new_data.len())
        }).unwrap();
        t.perform().unwrap();
    }

    let response = String::from_utf8(data).expect("Data retrieval from curl failed."); // Convert UTF8 bytes to string

    let split_vec: Vec<&str> = response.split("ip_addr:").collect(); // Split the string

    let public_ip = split_vec[1].split("<br>").next().unwrap(); // Split the split string

    println!("Printing out the public IP of this device: {}",public_ip); // print the IP address

    return
}

This code was compiled with: cargo build --release

Decompilation

Loading this binary in IDA reveals very little at first. The main entry point does not contain a significant amount of information. By following the call instructions, we can follow the chain of functions the program is accessing. This can be a good start for getting a general understanding of what a binary is doing.

We can also look at the imports window. This view shows the functions imported from the WinAPI that are used by the binary. In the case of this binary, several imports stick out.

They are:

  • WS2_32 networking imports
  • kernel32 console write and allocation imports
  • CRYPT32 certificate imports

These imports indicate that this code uses a network connection to do something, potentially printing the results to the console. As we know, the binary is using a network connection to get and display data, so this checks out.
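
This "which DLLs, therefore which capabilities" reasoning can be kept as a small lookup table. The sketch below is illustrative only: the hint text is my own shorthand, the table is far from complete, and the function name is hypothetical.

```python
# Rough capability hints per DLL; shorthand, not an authoritative mapping.
CAPABILITY_HINTS = {
    "ws2_32.dll": "network sockets",
    "wininet.dll": "HTTP/FTP client",
    "crypt32.dll": "certificates and crypto",
    "advapi32.dll": "registry, services, security",
    "kernel32.dll": "files, memory, processes, console",
}


def capability_hints(imported_dlls):
    """Map each imported DLL name to a rough capability hint."""
    hints = {}
    for name in imported_dlls:
        key = name.lower()
        if not key.endswith(".dll"):
            key += ".dll"  # IDA often lists modules without the extension
        hints[key] = CAPABILITY_HINTS.get(key, "no hint recorded")
    return hints
```

For the Rust sample, passing ["WS2_32", "KERNEL32", "CRYPT32"] yields the same network, console, and certificate reading described above.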

Double-clicking an import will take you to the definition in idata. Right clicking the idata entry will allow you to list all cross-references (xrefs) to the import.

Examining xrefs is a great way to quickly identify important functions and segments of code.

Additionally, you can open other subviews, like strings, by clicking the View menu item and navigating to Subviews.

You can often determine what language a binary was written in based on the remnant strings. In this instance, there are numerous mentions of cargo and Rust functions. Double-clicking a string value will take you to the position in the code the string is being read from.
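
The same guess can be scripted over a strings listing. The markers below are heuristics I am assuming are typical (toolchain paths for Rust, build ID and runtime symbols for Go, the CLR loader for .NET); stripped or packed binaries may contain none of them, and the names used here are hypothetical.

```python
# Lowercase substrings that commonly survive in compiled binaries (heuristic).
LANGUAGE_MARKERS = {
    "Rust": ("/rustc/", ".cargo"),
    "Go": ("go build id", "runtime.goexit"),
    ".NET": ("mscoree.dll", "_corexemain"),
}


def guess_languages(strings):
    """Return the languages whose markers appear in any extracted string."""
    lowered = [s.lower() for s in strings]
    return sorted(
        language
        for language, markers in LANGUAGE_MARKERS.items()
        if any(marker in s for s in lowered for marker in markers)
    )
```

An empty result only means none of these markers were seen, so it should not be read as evidence that the binary was written in C.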

Here, we can see the static string we used for the URL.

Listing xrefs for this value reveals the location of the static definition in IDA view.

Similarly, if we follow the xrefs for the WriteConsoleW API call, we can find the function that is printing our message to the screen.

We can also list xrefs while in IDA view. Simply select the function name sub_XXXXXXXX and select xrefs. This will show all addresses where the function is being called.

Pressing Tab in any IDA view pane will open the pseudocode viewer for that function. This decompiles the assembly into a C-style pseudocode view. This can be useful for translating obscure assembly instructions into something more readable. However, IDA isn’t super great at this, and Ghidra is probably a better tool for decompilation like this.

C Binaries

Code


#include <stdio.h>                      // For printf, snprintf
#include <stdlib.h>                     // For general utilities
#include "curl/curl.h"                  // For libcurl functions
#include <string.h>                     // For string functions like strlen

size_t callback(void *buf, size_t size, size_t count, void *data){ // libcurl write callback
    size_t total_size = size * count;                              // Calculate the total number of bytes received
    char ip[256];                                                  // Buffer to hold the formatted output
    (void)data;                                                    // User pointer is unused in this sample
    // buf is not null-terminated, so bound the copy with an explicit length
    snprintf(ip, sizeof(ip), "Printing out the public IP of this device: %.*s", (int)total_size, (char *)buf);
    size_t len = strlen(ip);                                       // Length of the formatted message
    if (len > 0 && ip[len - 1] == '\n') ip[len - 1] = '\0';        // Strip a trailing newline, if one is present
    printf("%s\n", ip);                                            // Print the final message to stdout
    //fwrite(buf, size, count, stdout);                            // (Optional) Directly write raw buffer to stdout
    return total_size;                                             // Return number of bytes handled
}

int main(void)
{
    CURLcode res = CURLE_FAILED_INIT;                              // Result status; stays an error if no handle is created
    curl_global_init(CURL_GLOBAL_DEFAULT);                         // Initialize the libcurl environment
    CURL *curl = curl_easy_init();                                 // Create and initialize a CURL handle
    if(curl) {                                                     // Proceed only if the handle is valid
        curl_easy_setopt(curl, CURLOPT_URL, "ifconfig.me");        // Set the URL to fetch (returns public IP)
        curl_easy_setopt(curl, CURLOPT_USERAGENT, "curl/7.68.0");  // Set a user-agent string to avoid HTML responses
        curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, callback);   // Set the function to handle incoming data
        res = curl_easy_perform(curl);                             // Perform the HTTP GET request
        curl_easy_cleanup(curl);                                   // Clean up and release the CURL handle
    }
    curl_global_cleanup();                                         // Release global libcurl resources
    return res == CURLE_OK ? 0 : 1;                                // Report success or failure to the caller
}

Code was compiled using CodeBlocks and Clang with default compiler settings. libcurl was linked at build time. Comments were added to the code via ChatGPT.

Decompilation

Reversing C binaries is significantly more straightforward than Rust or Nim. C binaries have been standard for decades. This allows decompilers to generate more readable code. Furthermore, due to the nature of C compilation, functions and code are less obfuscated than other, more modern, languages. This ultimately leads to binaries decompiling to relatively readable formats.

Take the function list for the sample binary for example. The function names are more readable and better presented than the Rust binary. Both binaries perform roughly the same actions, but the C binary decompiles to a much more readable list.

Similarly, the C binary IDA view is simpler and easier to understand. We can clearly see the main entry function defines the “ifconfig.me” URL, as well as the curl libraries used in the sample.

There are also fewer imports used in the C code, and the imports are all from libcurl and kernel32.

Furthermore, when decompiling C binaries to pseudocode, there is a much better one-to-one translation. Compare these decompiled functions to the main and callback functions in the sample source code. It’s almost an exact match.

.NET Assemblies

C# binaries use the .NET framework. This framework covers more than C# (PowerShell, for example), but this section will focus on C# binaries.

Code


using System.Net.Http; // HttpClient lives in System.Net.Http

using HttpClient client = new(); // Create a new client

client.DefaultRequestHeaders.Add("User-Agent", "curl/7.68.0"); // Set our user agent to curl so we don't have to deal with the parsing 

var ip = await client.GetStringAsync("https://ifconfig.me"); // Make the request and collect result into ip

Console.WriteLine($"Printing out the public IP of this device: {ip}"); // print

This code was compiled to one file with Visual Studio.

Decompilation

.NET samples are simple to reverse thanks to tools like dnSpy and ILSpy. In this example we will use ILSpy (I like it a little better). However, either tool works great for decompilation.

When adding the binary to ILSpy, the dependencies, config, and source code should be made available. However, depending on compilation methods, this may not always be the case.

ILSpy will also parse the PE structure and have collapsible sections with the PE headers and strings.

Navigating to the program section will reveal the source code of the .NET assembly. This is not exactly a word-for-word decompilation of the original source. Instead, it is a simplified version of the source code recovered during analysis. The code will reveal all functions of the binary.

Every unique component of the .NET binary can be viewed in ILSpy. Furthermore, the decompiled source code can be edited and recompiled. This can be useful for rewriting functions during analysis. Often, there will be obfuscation functions that hide the activity of the code. Using ILSpy, you can rewrite these functions to print the deobfuscated code segments instead of executing them. This can be helpful for determining what the obfuscated functions are.
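
The "print instead of execute" idea is language independent, so here it is as a Python stand-in (in practice the change would be made in C# inside the decompiled project). The XOR routine and both function names are hypothetical; a real sample will use whatever encoding its author chose.

```python
def xor_decode(blob: bytes, key: bytes) -> bytes:
    """Hypothetical stand-in for a sample's string-decoding routine."""
    return bytes(byte ^ key[i % len(key)] for i, byte in enumerate(blob))


def run_stage(blob: bytes, key: bytes, execute=None):
    """Decode a stage; print it instead of running it unless `execute` is given."""
    decoded = xor_decode(blob, key)
    if execute is None:
        # Analyst-patched path: show the decoded content and stop there.
        print(decoded.decode("utf-8", errors="replace"))
        return decoded
    return execute(decoded)  # original behavior, kept here only for contrast
```

The decoding logic is left untouched and only the final step changes, which keeps the patched sample faithful to the original up to the point where it would have acted.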

Compiled Visual Basic

Visual Basic comes in three main flavors: Visual Basic Script (VBS), Visual Basic 6 (VB6), and Visual Basic .NET (VB). These all vary, sometimes significantly, and each requires a different approach. VBS cannot be natively compiled and can easily be reversed using code analysis techniques. VB lives on the .NET framework, and tools like ILSpy will decompile VB binaries like any other .NET binary. VB6 is a little different, and tools like VB Decompiler must be used to decompile samples.

In this section I’ll show examples of the same code in all three languages as well as some analysis pointers.

Code

'Visual Basic Script
Dim request
Set request = CreateObject("MSXML2.XMLHTTP")
request.open "GET", "https://ifconfig.me", False
request.setRequestHeader "User-Agent", "curl/7.68.0"
request.send

Dim fso, stdout
Set fso = CreateObject("Scripting.FileSystemObject")
Set stdout = fso.GetStandardStream(1)
stdout.WriteLine "Printing out the public IP of this device: "+request.responseText

'Visual Basic .NET
Imports System.Net

Module Module1
    Sub Main()
        Try
            Dim url As String = "https://ifconfig.me"
            Dim request As HttpWebRequest = CType(WebRequest.Create(url), HttpWebRequest)
            request.Method = "GET"
            request.UserAgent = "curl/7.68.0"

            Using response As HttpWebResponse = CType(request.GetResponse(), HttpWebResponse)
                Using reader As New IO.StreamReader(response.GetResponseStream())
                    Dim responseText As String = reader.ReadToEnd()
                    Console.WriteLine("Printing out the public IP of this device: " & responseText)
                End Using
            End Using
        Catch ex As Exception
            Console.WriteLine("An error occurred: " & ex.Message)
        End Try
    End Sub
End Module

'Visual Basic 6 - I don't have a license for the IDE and I'm not downloading sketchy software off the internet.
'Just take my word that this is kind of what it should look like. 
'Also, yes, I have seen VB6 samples in the wild. 

Private Sub Command1_Click()
    Dim http As Object
    Set http = CreateObject("MSXML2.XMLHTTP")
    
    On Error GoTo ErrHandler
    
    http.Open "GET", "https://ifconfig.me", False
    http.setRequestHeader "User-Agent", "curl/7.68.0"
    http.Send

    MsgBox "Public IP: " & http.responseText, vbInformation, "IP Info"
    
    Exit Sub

ErrHandler:
    MsgBox "Error: " & Err.Description, vbCritical, "Request Failed"
End Sub

Decompilation

VB .NET binaries can easily be reversed following the .NET Assembly methods.

In the rare case that you come across a VB6 sample, you can use VB Decompiler Lite to decompile the VB6 code to assembly instructions.

Note: I do not possess an enterprise VB6 license, so I’m demonstrating off of an enterprise tool sample I analyzed a while back.

Using VB Decompiler Lite, you can also extract forms built into the application.

Compiled PowerShell

This is a basic writeup on a compiled PowerShell reverse shell. Don’t judge me too much, I was a student when I wrote this.

Link to the PDF - The Iframe was annoying me

Further Research and Learning Materials

At some point, I’d like to come back and fill out more sections dedicated to reverse engineering binaries (both malicious and non-malicious). Until that point, I highly recommend the following resources (mostly malware analysis still, but some are dedicated reverse engineering):