Getting Started with Malware Analysis and Reverse Engineering
Contents
- Malware Analysis Workflow
- Threat Intelligence
- Static Analysis
- Dynamic Analysis
- File Structures
- Further Research and Learning Materials
Malware Analysis Workflow
+-------------------------------------+
| Malware Analysis Workflow |
+-------------------------------------+
|
v
+-----------------------------+
| Threat Intelligence |
+-----------------------------+
| - Hash Lookups (VT, HA) |
| - IP/Domain Reputation |
| (AbuseIPDB, ThreatFox) |
| - Sample Pivoting (Any.run) |
| - YARA/IoC Extraction |
+-----------------------------+
|
v
+----------------------------+
| Static Analysis |
+----------------------------+
| - File Analysis | <-- Tools: PEStudio, DIE, HashMyFiles
| - Code Analysis |
| - Disassembly | <-- Tools: Ghidra, IDA Free
+----------------------------+
|
v
+----------------------------+
| Dynamic Analysis |
+----------------------------+
| - Network Monitoring | <-- Tools: Wireshark, INetSim
| - Process Monitoring | <-- Tools: ProcMon, ProcExp
| - API Call Monitoring | <-- Tools: API Monitor
| - Logging | <-- Tools: Sysmon, PowerShell logs
+----------------------------+
|
v
+-------------------------------------+
| Debugging & Reverse Engineering |
+-------------------------------------+
| - IDA | <-- Tools: IDA Pro, Ghidra
| - x64dbg | <-- Tools: x64dbg, Scylla
+-------------------------------------+
Threat Intelligence
Threat intelligence gathering is an important step in scoping the work that needs to be done during the analysis. Often, a sample has defining traits that can be searched for in public threat intel spaces like VirusTotal, Tria.ge, or any other sandboxing/intel platform. By searching for these traits (hashes, filenames, network indicators, and so on), you can determine whether a sample needs to be reversed, categorized as a variant, or simply documented as a known sample.
Hashing
Hashing is a common way to determine if a sample has been seen in other environments before. Getting a file hash for a sample allows you to search VirusTotal or other public platforms for occurrences of the sample file. However, variants (samples from the same malware family that differ slightly) will not match, because a hash is unique to the exact contents of an individual file.
Get a file hash:
# PowerShell
Get-FileHash -Algorithm SHA256 file.name
# Bash
sha256sum file.name
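When hashing many samples in a script, the same digest can be computed with a few lines of Python. This is a minimal sketch using only the standard library; the function name `sha256_of_file` and the chunk size are illustrative choices, not part of any tool mentioned here.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 65536) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks
    so that large samples do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b""
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The returned hex string is the same value that `Get-FileHash` and `sha256sum` print (PowerShell shows it in uppercase), so it can be pasted directly into a hash lookup. Changing a single byte of the file produces a completely different digest, which is why variants do not match.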
YARA
YARA rules are pattern-matching rules that describe the contents of a file (strings, byte sequences, and conditions on them) and can be run against any file. If the file matches the conditions defined in the rule, the rule triggers an alert, or match. In many cases, YARA rules already exist for known malware. Running a sample against a list of well-defined YARA rules can provide quick insight into what family the malware belongs to.
// Example YARA rule
import "hash"
rule yara_rule
{
    strings:
        $variablename = "string_value"
    condition:
        $variablename or
        hash.sha256(0, filesize) == "sha256_hash"
}
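To make the rule's condition concrete, the sketch below mimics the same logic in plain Python: a match occurs if the string is present anywhere in the data, or if the SHA-256 of the whole file equals a known value. This is not the YARA engine and does not parse rules; `matches_rule` and its parameters are hypothetical names used only for illustration.

```python
import hashlib


def matches_rule(data: bytes, needle: bytes, known_sha256: str) -> bool:
    """Mimic the example rule's condition:
    the string is present OR the full-file hash matches."""
    string_hit = needle in data
    hash_hit = hashlib.sha256(data).hexdigest() == known_sha256.lower()
    return string_hit or hash_hit
```

The string check is what lets a rule catch variants that a hash lookup alone would miss. The hash check only ever matches one exact file.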
Resources
Sandboxing and threat intel platforms:
Manual threat tracking and attack flows
Static Analysis
Static analysis involves examining a file or binary without executing it. This allows analysts to safely inspect files without risk of executing malicious code. Often, static analysis and threat intelligence are enough to determine the impact of a malicious file.
General static analysis steps include:
- File Analysis
- Code Analysis
- Disassembly
File Analysis
File analysis is the first step when a file’s behavior is unknown. Before attempting to execute a sample, use static file analysis methods to estimate what it does. In most cases, this means listing strings, examining magic numbers/headers, and looking at imports and exports. It is generally a good idea to understand what a file might do before executing it.
Strings
Strings.exe is a program that prints the runs of printable ASCII and Unicode (UTF-16) characters present in a file’s raw data. There are multiple implementations of strings for various architectures, operating systems, and language preferences. I am partial to the Get-Strings function of Invoke-FileAnalysis.
Consider the following hex dump:
4D 5A 90 00 03 00 00 00 04 00 00 00 FF FF 00 00 B8 00 00 00 00 00 00
00 40 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 B8 00 00 00 0E 1F BA 0E 00
B4 09 CD 21 B8 01 4C CD 21 54 68 69 73 20 70 72 6F 67 72 61 6D 20 63
61 6E 6E 6F 74 20 62 65 20 72 75 6E 20 69 6E 20 44 4F 53 20 6D 6F 64
65 2E 0D 0D 0A 24 00 00 00 00 00 00 00 C7 BF 79 DA 83 DE 17 89 83 DE
17 89 83 DE 17 89 00 C2 19 89 82 DE 17 89 CC FC 1E 89 87 DE 17 89 B5
F8 1A 89 82 DE 17 89 52 69 63 68 83 DE 17 89 00 00 00 00 00 00 00 00
50 45 00 00 4C 01 03 00 16 6A 88 53 00 00 00 00 00 00 00 00 E0 00 0F
01 0B 01 06 00 00 F0 00 00 00 30 00 00 00 00 00 00 E0 16 00 00 00 10
00 00 00 00 01 00 00 00 40 00
While this may look like a random assortment of data, running it through strings reveals it is a Windows binary.
This program cannot be run in DOS mode
Rich
PE
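The core idea behind a strings tool is small enough to sketch: scan the raw bytes for runs of printable ASCII characters above a minimum length. The version below is a simplified illustration, not a reimplementation of Strings.exe; it ignores UTF-16 strings, and `extract_strings` is a hypothetical name.

```python
import re


def extract_strings(data: bytes, min_length: int = 4) -> list[str]:
    """Return every run of printable ASCII (0x20-0x7E) that is at
    least min_length bytes long, in the order it appears."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_length)
    return [match.group().decode("ascii") for match in pattern.finditer(data)]
```

With the usual minimum length of 4, short markers such as "PE" are filtered out, so lowering `min_length` surfaces them at the cost of more noise. Adjacent printable bytes are also included in a run, which is why real output often shows a stray character (such as a leading "!") attached to the DOS stub message.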
Headers
Headers, or magic numbers, are the first series of bytes in a file. These bytes determine the type of file. Often, file extensions are misleading. A file can be renamed with any extension, regardless of what the data actually represents. By examining magic numbers, you can get a better sense of what type of file a sample is.
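For instance, the following header signifies that a sample is most likely a Windows PE file. The first two bytes (4D 5A, ASCII “MZ”) are the actual signature; the 90 00 that follows is typical of compiler output rather than part of the magic number.
PE Binary Magic Number
4D 5A 90 00
Files beginning with 4D 5A should conform to the PE file structure. Strictly speaking, MZ marks the legacy DOS header, and a PE file also carries a “PE” signature further into the file.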
For a more complete list of magic numbers, see List of File Signatures.
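A magic-number check is easy to script. The sketch below compares the first bytes of a file against a handful of well-known signatures; the table is deliberately tiny and the names `SIGNATURES` and `identify` are illustrative.

```python
# A few well-known signatures; a real table would be far longer.
SIGNATURES = {
    b"MZ": "DOS/PE executable",
    b"\x7fELF": "ELF executable",
    b"%PDF": "PDF document",
    b"PK\x03\x04": "ZIP archive (also used by DOCX, JAR, APK)",
    b"\x89PNG\r\n\x1a\n": "PNG image",
}


def identify(header: bytes) -> str:
    """Return a description for the first matching signature,
    or "unknown" if none of the known prefixes match."""
    for magic, description in SIGNATURES.items():
        if header.startswith(magic):
            return description
    return "unknown"
```

Reading only the first 16 bytes or so of a sample (`open(path, "rb").read(16)`) is enough for this check, and the result does not depend on the file extension at all.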
Imports/Exports
Binary files use imports to access operating system components. In most cases, these are functions exposed by the Windows API libraries. For PE files, imports are located in the IAT (Import Address Table - see PE Files for more details on PE structure).
Exports are functions defined inside the binary and made available for use outside of the binary’s scope. For samples that are not DLLs or similar files, there will usually be only a start export that indicates to the operating system where the code begins.
Tools
- PEStudio
- Invoke-FileAnalysis
- PEiD
- IDA
- GHIDRA
- GDB
- Hopper
- VB Decompiler
- HxD Hex Editor (cool people use vim)
Code Analysis
In some cases, raw code files like JavaScript, Python, or PowerShell are used to carry out initial access, and the code can be analyzed directly. When doing so, it is beneficial to be familiar with the functions and methods commonly employed by malicious software.
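As a first pass over a script sample, it can help to flag the lines that contain calls worth a closer look. The sketch below is a rough triage helper, assuming a small hand-picked set of patterns (dynamic execution, Base64 decoding, remote downloads). The pattern list is my own illustrative assumption, not an authoritative catalog, and a hit is only a pointer for manual review.

```python
import re

# Illustrative patterns only; extend per language and per campaign.
SUSPICIOUS_PATTERNS = {
    "dynamic execution": re.compile(r"\b(?:Invoke-Expression|IEX|eval|exec)\b", re.IGNORECASE),
    "base64 decoding": re.compile(r"FromBase64String|b64decode|atob", re.IGNORECASE),
    "remote download": re.compile(r"DownloadString|DownloadFile|Invoke-WebRequest|urlopen", re.IGNORECASE),
}


def triage_script(source: str) -> dict[str, list[int]]:
    """Map each category to the 1-based line numbers where it appears."""
    hits: dict[str, list[int]] = {}
    for number, line in enumerate(source.splitlines(), start=1):
        for label, pattern in SUSPICIOUS_PATTERNS.items():
            if pattern.search(line):
                hits.setdefault(label, []).append(number)
    return hits
```

Obfuscated samples will often evade simple keyword matching, so an empty result says very little. The value is in quickly locating the interesting lines of a long, noisy script.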
Common Functions
Instead of listing all the various types of functions, techniques, and tools for each language, it’s better to go with a more hands on approach. This section lists some resources for exploring these common functions in various languages.
General samples and code:
- https://github.com/ytisf/theZoo
- https://vx-underground.org/
- https://bazaar.abuse.ch
- https://github.com/chvancooten/maldev-for-dummies
- https://github.com/Cryakl/Ultimate-RAT-Collection
- GitHub topics are also good, like https://github.com/topics/python-malware.
For PowerShell based malware:
For Rust based malware:
- https://github.com/cxiao/rust-malware-gallery
- https://github.com/Whitecat18/Rust-for-Malware-Development
For Python based malware:
For C based malware:
Tools
Visual Studio Code, or any code editor, is great for viewing text-based malware files. For compiled binaries, using a decompilation view or raw assembly view can also offer great insight into what the code does. Of course, compiled binaries will not decompile to the code used by the threat actor. Instead, they will decompile to a representation of the code, typically using C-like structures (if decompiled rather than disassembled). Some tools are better than others, and some languages produce more legible results than others. For more information on this, see File Analysis and File Structures.
Dynamic Analysis
In contrast to static analysis, dynamic analysis involves executing the sample while monitoring its behavior on a system. This is typically the ‘fun’ part of analysis.
When executing potentially malicious files, ensure that a snapshot is taken prior to the detonation. This allows for easier system restoration when attempting to roll back actions taken by the malware.
Network Monitoring
Network monitoring is an essential component of dynamic analysis. At its core, network monitoring serves to capture all network packets exchanged between the malware sample, infected host, and threat actor controlled infrastructure. In more in-depth analysis, it is also possible to simulate endpoints and traffic, allowing for the simulation of certain actions. For example, a more complex sample may require certain data to be returned from a remote server in order to execute. Using advanced simulation tools, it is possible to simulate this server and return the data needed for the sample to progress.
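The idea of simulating a server can be shown with a very small stand-in endpoint. The sketch below uses Python’s built-in `http.server` to answer every GET with a canned body and to record what was requested. It is only an illustration of the concept for an isolated lab network: the response body, the `/checkin` path, and all function names are placeholders.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

CANNED_RESPONSE = b"203.0.113.10\n"  # placeholder body (documentation IP range)


class FakeEndpoint(BaseHTTPRequestHandler):
    def do_GET(self):
        # Record the path and User-Agent so the request can be reviewed later
        self.server.seen_requests.append((self.path, self.headers.get("User-Agent")))
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(CANNED_RESPONSE)))
        self.end_headers()
        self.wfile.write(CANNED_RESPONSE)

    def log_message(self, format, *args):
        pass  # suppress the default per-request logging to stderr


def start_fake_endpoint(host: str = "127.0.0.1", port: int = 0) -> HTTPServer:
    """Start the stand-in server on a background thread (port 0 = any free port)."""
    server = HTTPServer((host, port), FakeEndpoint)
    server.seen_requests = []
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def demo() -> bytes:
    """Fetch one URL from the stand-in server, as a sample would."""
    server = start_fake_endpoint()
    try:
        port = server.server_address[1]
        opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))  # ignore system proxies
        with opener.open(f"http://127.0.0.1:{port}/checkin", timeout=5) as response:
            return response.read()
    finally:
        server.shutdown()
        server.server_close()
```

In practice the sample’s DNS lookups are redirected to the analysis host so that its requests land on a listener like this one. Dedicated tools cover many more protocols and edge cases than this sketch does.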
Wireshark
Wireshark is used to capture network packets. For non-encrypted packets, or for samples that roll their own local encryption but do not use SSL/TLS for communications, Wireshark can also display the raw data in each packet.
For information on utilizing advanced features of Wireshark, please see Chris Greer.
INetSim
INetSim is a network simulation tool that can be used to simulate adversarial infrastructure.
FrausDNS
FrausDNS is a Windows DNS spoofer that can capture DNS requests made by any process. These requests can be isolated and used for analysis purposes.
PolarProxy
PolarProxy is a TLS and SSL inspection tool that decrypts and re-encrypts traffic, saving a copy of the decrypted traffic to a PCAP file. It is very useful for examining the contents of encrypted traffic, such as in Command and Control samples.
Process Monitoring
Process monitoring involves logging process activity like API calls, file and network events, and registry actions. Process monitoring tools like ProcMon run in the background while a sample detonates, capturing all system process data. This data can then be filtered and examined to further understand the impact that a sample has on a system.
Process Monitor
Process Monitor is a part of the Sysinternals tool suite. I highly recommend checking out all of the tools, but for this document, we will focus solely on ProcMon and Sysmon (see Sysmon for more information).
Process Monitor is a very verbose tool, as it tracks all interactions a process has with the system. To aid in analysis efforts, robust filtering options can be found in the filter panel.
Additionally, right clicking the column headers will allow for column customization. This can help focus the results on meaningful details.
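The same kind of filtering can be applied offline to a capture that has been exported to CSV. The sketch below assumes the export contains the default "Process Name" and "Operation" columns; adjust the column names if your export differs. `sample.exe` and the function names are placeholders.

```python
import csv
from typing import Iterable


def load_events(text_stream: Iterable[str]) -> list[dict[str, str]]:
    """Parse a CSV export (an open text file or any iterable of lines)."""
    return list(csv.DictReader(text_stream))


def filter_events(rows, process_name: str, operations: Iterable[str]):
    """Keep rows for one process whose Operation is in the given set."""
    wanted = {operation.lower() for operation in operations}
    return [
        row for row in rows
        if row["Process Name"].lower() == process_name.lower()
        and row["Operation"].lower() in wanted
    ]
```

A typical call would open the export with `encoding="utf-8-sig"` (which tolerates a byte-order mark if one is present), then narrow thousands of rows down to the sample’s file writes and registry changes, for example `filter_events(rows, "sample.exe", ["WriteFile", "RegSetValue"])`.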
Logging
Taking advantage of native logging capabilities can be a powerful way to monitor malware samples post execution. For PowerShell based malware, PowerShell event logs can be configured to track all module execution and events. This can reveal exactly what code is running, when it’s running, and what it is doing. For binary execution, additional logging can be configured with Sysmon to track further, more in-depth, events.
PowerShell Logs
Please read The problem with PowerShell logging bypasses. This article covers some basic logging configurations and logging bypasses.
Sysmon
Sysmon is a powerful logging tool that provides detailed information about processes, network events, and file changes. It enhances the native event logging and provides much needed security depth. Sysmon logs are viewable in the event viewer, and are great for correlating malware activity.
The complete list of Sysmon event IDs can be found at the Sysmon MSDN page.
General Event Logs
Other log sources, aside from PowerShell and Sysmon, are also useful for malware analysis. Specifically, the native security, application, system, and application/service specific logs are useful for analyzing the events that occur during malware execution.
API Calls
Monitoring API calls directly with tools like API Monitor is one of the best ways to get a deep understanding of what a sample is doing.
The Windows kernel exposes certain APIs for usermode applications to interact with. It is these API calls made from the usermode application that allow binaries to execute operations like creating files, making network connections, and performing memory operations. By monitoring the API calls made by a process or application, it is possible to directly determine what events are occurring, how they are occurring, and what the result of each operation is.
For more information on Windows API and kernel security, go read Windows Security Internals.
API Monitor
API Monitor is a powerful tool for hooking into applications and process threads to monitor API calls.
In API Monitor, you can select any running process to monitor. You can also launch binaries with API Monitor, to start the monitor on process launch. In the API filter pane, you can select the API modules you would like to monitor for. In most cases, you likely want to select all modules.
A monitored process will display all API modules accessed, as well as all threads spawned by the process. You can scroll through the summary pane to see individual calls in a timeline. Double-clicking an entry in the summary will direct you to the documentation for that call (sometimes - it often doesn’t work).
File Structures
Understanding binary file structures is important when attempting to reverse engineer or analyze a sample. This section will cover basic file structures and provide tooling for more detailed analysis.
All examples will be using custom binaries designed to request the user’s IP address from ifconfig.me. These examples should demonstrate finding imports and exports, examining code structures, and decompiling in both assembly view and IDA view. Additionally, resources will be linked where available to provide further context for reverse engineering specific binary types.
After decompiling and statically analyzing a binary, you should have a general understanding of what API calls the binary is making and what functions are executing. At this point, you would typically move on to dynamic analysis using API Monitor, debuggers, and other dynamic analysis processes. In some cases, you may want to dive deeper into reversing the code statically. There are good tools for this, like SourceTrail, which you can learn more about in this talk RE//verse 2025: Streamlining Firmware Analysis with Inter-Image Call Graphs and Decomp (Robin David). This section is strictly for high-level overviews.
PE Files
Please read https://alertoverload.com/posts/2024/07/pe-files-and-how-to-create-a-powershell-pe-file-parser/.
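As a quick orientation before reading the linked post, the sketch below walks the first few PE structures by hand: it checks the MZ signature, reads the PE header offset stored at 0x3C, verifies the PE signature, and unpacks the COFF file header that follows. It stops well short of a full parser (no optional header, sections, or import table), and the function name and returned keys are illustrative.

```python
import struct

# A few common Machine values from the COFF file header
MACHINE_TYPES = {0x014C: "x86", 0x8664: "x64", 0xAA64: "ARM64"}


def parse_pe_header(data: bytes) -> dict:
    """Locate the PE header and return a few COFF header fields."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    # e_lfanew: 4-byte little-endian offset of the PE signature
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if len(data) < pe_offset + 24 or data[pe_offset:pe_offset + 4] != b"PE\x00\x00":
        raise ValueError("missing PE signature")
    # COFF header: Machine, NumberOfSections, TimeDateStamp,
    # PointerToSymbolTable, NumberOfSymbols, SizeOfOptionalHeader, Characteristics
    machine, sections, timestamp, _, _, opt_size, characteristics = struct.unpack_from(
        "<HHIIIHH", data, pe_offset + 4
    )
    return {
        "pe_offset": pe_offset,
        "machine": MACHINE_TYPES.get(machine, hex(machine)),
        "sections": sections,
        "timestamp": timestamp,  # seconds since the Unix epoch
        "optional_header_size": opt_size,
        "characteristics": characteristics,
    }
```

Applied to the hex dump in the Strings section, the offset at 0x3C is 0xB8, which is where the bytes 50 45 00 00 appear, followed by 4C 01 (x86) and 03 00 (three sections).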
Nim Binaries
Seriously, watch this talk Reverse-Engineering Nim Malware.
- I had some good stuff for this, but I need to find it again.
Rust Binaries
Reverse engineering Rust binaries is a more complex process than reversing C or Nim binaries. Due to the nature of Rust’s compiler, the core functions of the binary are more abstracted compared to the straightforward nature of C. Additionally, decompilers have been working with C language binaries for far longer than Rust. This means that decompilers are significantly better at analyzing C binaries than they are at analyzing Rust binaries.
Note: I didn’t remember that User-Agent strings were a thing when I wrote this, so there’s extra parsing going on in this code compared to the other samples. It doesn’t affect the decompilation process though.
Code
use curl::easy::Easy; // Import the Easy interface from the curl crate
// main entry point
fn main() {
    let url = "ifconfig.me"; // define the url
    let mut e = Easy::new(); // create an Easy handle
    e.url(url).unwrap(); // set the url on the handle (nothing is sent yet)
    let mut data = Vec::new(); // buffer for the raw response bytes
    // Write response to data from slice; the inner scope ends the borrow of data
    {
        let mut t = e.transfer();
        t.write_function(|new_data| {
            data.extend_from_slice(new_data);
            Ok(new_data.len())
        }).unwrap();
        t.perform().unwrap(); // send the request
    }
    let response = String::from_utf8(data).expect("Data retrieval from curl failed."); // Convert UTF8 bytes to string
    let split_vec: Vec<&str> = response.split("ip_addr:").collect(); // Split the string
    let public_ip = split_vec[1].split("<br>").next().unwrap(); // Split the split string
    println!("Printing out the public IP of this device: {}", public_ip); // print the IP address
}
This code was compiled with: cargo build --release
Decompilation
Loading this binary in IDA reveals very little at first. The main entry point does not contain a significant amount of information. By following the call instructions, we can follow the chain of functions the program is accessing. This can be a good start for getting a general understanding of what a binary is doing.
Additionally, we can also look at the imports window. This view will show the imported functions from the WinAPI that are used by the binary. In the case of this binary, several imports stick out.
They are:
- WS2_32 networking imports
- kernel32 console write and allocation imports
- CRYPT32 certificate imports
These imports indicate that this code uses a network connection to do something, potentially printing the results to the console. As we know, the binary is using a network connection to get and display data, so this checks out.
Double-clicking an import will take you to the definition in idata. Right clicking the idata entry will allow you to list all cross-references (xrefs) to the import.
Examining xrefs is a great way to quickly identify important functions and segments of code.
Additionally, you can open other subviews, like strings, by clicking the View menu item and navigating to Subviews.
You can often determine what language a binary was written in based on the remnant strings. In this instance, there are numerous mentions of cargo and Rust functions. Double-clicking a string value will take you to the position in the code the string is being read from.
Here, we can see the static string we used for the URL.
Listing xrefs for this value reveals the location of the static definition in IDA view.
Similarly, if we follow the xrefs for the WriteConsoleW API call, we can find the function that is printing our message to the screen.
We can also list xrefs while in IDA view. Simply select the function name sub_XXXXXXXX and select xrefs. This will show all addresses where the function is being called.
Pressing Tab in any IDA view pane will open the pseudocode viewer for that function. This decompiles the assembly into a pseudocode C-styled view. This can be useful for translating obscure assembly instructions into something more readable. However, IDA isn’t super great at this, and GHIDRA is probably a better tool for decompilation like this.
C Binaries
Code
#include <stdio.h> // For printf, snprintf
#include <stdlib.h> // For malloc, free, etc.
#include "curl/curl.h" // For libcurl functions
#include <string.h> // For string functions like strlen
size_t callback(void *buf, size_t size, size_t count, void *data){ // libcurl write callback
    (void)data; // Unused user pointer
    size_t total_size = size * count; // Calculate the total number of bytes received
    char ip[256]; // Buffer to hold the formatted output
    // buf is not null-terminated, so bound the read with an explicit length
    snprintf(ip, sizeof(ip), "Printing out the public IP of this device: %.*s", (int)total_size, (char *)buf);
    size_t len = strlen(ip);
    if (len > 0 && ip[len - 1] == '\n') ip[len - 1] = '\0'; // Strip a trailing newline, if present
    printf("%s\n", ip); // Print the final message to stdout
    //fwrite(buf, size, count, stdout); // (Optional) Directly write raw buffer to stdout
    return total_size; // Return number of bytes handled
}
int main(void)
{
    curl_global_init(CURL_GLOBAL_DEFAULT); // Initialize the libcurl environment
    CURL *curl = curl_easy_init(); // Create and initialize a CURL handle
    if(curl) { // Proceed only if the handle is valid
        CURLcode res; // Variable to store CURL result status
        curl_easy_setopt(curl, CURLOPT_URL, "ifconfig.me"); // Set the URL to fetch (returns public IP)
        curl_easy_setopt(curl, CURLOPT_USERAGENT, "curl/7.68.0"); // Set a user-agent string to avoid HTML responses
        curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, callback); // Set the function to handle incoming data
        res = curl_easy_perform(curl); // Perform the HTTP GET request
        if(res != CURLE_OK) fprintf(stderr, "Request failed: %s\n", curl_easy_strerror(res)); // Report transfer errors
        curl_easy_cleanup(curl); // Clean up and release the CURL handle
    }
    curl_global_cleanup(); // Release global libcurl resources
    return 0;
}
Code was compiled using CodeBlocks and Clang with default compiler settings, with libcurl linked at build time. Comments were added to the code via ChatGPT.
Decompilation
Reversing C binaries is significantly more straightforward than Rust or Nim. C binaries have been standard for decades. This allows decompilers to generate more readable code. Furthermore, due to the nature of C compilation, functions and code are less obfuscated than other, more modern, languages. This ultimately leads to binaries decompiling to relatively readable formats.
Take the function list for the sample binary for example. The function names are more readable and better presented than the Rust binary. Both binaries perform roughly the same actions, but the C binary decompiles to a much more readable list.
Similarly, the C binary IDA view is simpler and easier to understand. We can clearly see the main entry function defines the “ifconfig.me” URL, as well as the curl libraries used in the sample.
There are also fewer imports used in the C code, and the imports are all from libcurl and kernel32.
Furthermore, when decompiling C binaries to pseudocode, there is a much better one-to-one translation. Compare these decompiled functions to the main and callback functions in the sample source code. It’s almost an exact match.
.Net Assemblies
C# binaries use the .Net framework. This framework covers more than C#, like PowerShell for example, but this section will focus on C# binaries.
Code
using System.Net.Http.Headers; // Use HTTP
using HttpClient client = new(); // Create a new client
client.DefaultRequestHeaders.Add("User-Agent", "curl/7.68.0"); // Set our user agent to curl so we don't have to deal with the parsing
var ip = await client.GetStringAsync("https://ifconfig.me"); // Make the request and collect result into ip
Console.WriteLine($"Printing out the public IP of this device: {ip}"); // print
This code was compiled to one file with Visual Studio.
Decompilation
.Net framework samples are simple to reverse. This is because of tools like dnSpy and ILSpy. In this example we will use ILSpy (I like it a little better). However, either tool works great for decompilation.
When adding the binary to ILSpy, the dependencies, config, and source code should be made available. However, depending on compilation methods, this may not always be the case.
ILSpy will also parse the PE structure and have collapsible sections with the PE headers and strings.
Navigating to the program section will reveal the source code of the .Net assembly. This is not exactly a word for word source code decompilation. Instead, it is a simplified version of the source code retrieved during analysis. The code will reveal all functions of the binary.
Every unique component of the .Net binary can be viewed in ILSpy. Furthermore, the decompiled source can be exported, edited, and recompiled (dnSpy also supports editing in place). This can be useful for re-writing functions during analysis. Often, there will be obfuscation functions that hide the activity of the code. Using these tools, you can rewrite those functions to print the deobfuscated code segments instead of executing them. This can be helpful for determining what the obfuscated functions are.
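The "print instead of execute" idea is language independent, so here is a minimal Python sketch of it. It assumes a hypothetical sample that hides its second stage as Base64 text XORed with a single-byte key; the scheme, the key, and the function names are invented for illustration and do not come from a real sample.

```python
import base64


def obfuscate(plain: str, key: int) -> str:
    """Helper that builds a demo blob the way the hypothetical sample would."""
    return base64.b64encode(bytes(b ^ key for b in plain.encode("utf-8"))).decode("ascii")


def reveal(blob: str, key: int) -> str:
    """Analyst version of the sample's loader: run the same decoding steps,
    but return the hidden code as text instead of passing it to an
    execution function."""
    raw = base64.b64decode(blob)
    return bytes(b ^ key for b in raw).decode("utf-8")
```

The decoding logic is copied unchanged from the sample; only the final step is swapped, so the output is exactly what would have run. For example, `reveal(obfuscate("Write-Host 'stage two'", 0x5A), 0x5A)` returns the original text.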
Compiled Visual Basic
Visual Basic comes in 3 main flavors: Visual Basic Script (VBS), Visual Basic 6 (VB6), and Visual Basic .Net (VB). These all vary, sometimes significantly, and each requires a different approach. VBS is not natively compiled, and can easily be reversed using code analysis techniques. VB lives on the .Net framework, and tools like ILSpy will decompile it like any other .Net binary. VB6 is a little different, and tools like VB Decompiler must be used to decompile samples.
In this section I’ll show examples of the same code in all three languages as well as some analysis pointers.
Code
'Visual Basic Script
Dim request
Set request = CreateObject("MSXML2.XMLHTTP")
request.open "GET", "https://ifconfig.me", False
request.setRequestHeader "User-Agent", "curl/7.68.0"
request.send
Dim fso, stdout
Set fso = CreateObject("Scripting.FileSystemObject")
Set stdout = fso.GetStandardStream(1)
stdout.WriteLine "Printing out the public IP of this device: "+request.responseText
'Visual Basic .Net
Imports System.Net
Module Module1
    Sub Main()
        Try
            Dim url As String = "https://ifconfig.me"
            Dim request As HttpWebRequest = CType(WebRequest.Create(url), HttpWebRequest)
            request.Method = "GET"
            request.UserAgent = "curl/7.68.0"
            Using response As HttpWebResponse = CType(request.GetResponse(), HttpWebResponse)
                Using reader As New IO.StreamReader(response.GetResponseStream())
                    Dim responseText As String = reader.ReadToEnd()
                    Console.WriteLine("Printing out the public IP of this device: " & responseText)
                End Using
            End Using
        Catch ex As Exception
            Console.WriteLine("An error occurred: " & ex.Message)
        End Try
    End Sub
End Module
'Visual Basic 6 - I don't have a license for the IDE and I'm not downloading sketchy software off the internet.
'Just take my word that this is kind of what it should look like.
'Also, yes, I have seen VB6 samples in the wild.
Private Sub Command1_Click()
    Dim http As Object
    Set http = CreateObject("MSXML2.XMLHTTP")
    On Error GoTo ErrHandler
    http.Open "GET", "https://ifconfig.me", False
    http.setRequestHeader "User-Agent", "curl/7.68.0"
    http.Send
    MsgBox "Public IP: " & http.responseText, vbInformation, "IP Info"
    Exit Sub
ErrHandler:
    MsgBox "Error: " & Err.Description, vbCritical, "Request Failed"
End Sub
Decompilation
VB .Net binaries can easily be reversed following the .Net Assembly methods.
In the rare case that you come across a VB6 sample, you can use VB Decompiler Lite to decompile the VB6 code to assembly instructions.
Note: I do not possess an enterprise VB6 license, so I’m demonstrating off of an enterprise tool sample I analyzed a while back.
Using VB Decompiler Lite, you can also extract forms built into the application.
Compiled PowerShell
This is a basic writeup on a compiled PowerShell reverse shell. Don’t judge me too much, I was a student when I wrote this.
Link to the PDF - The Iframe was annoying me
Further Research and Learning Materials
At some point, I’d like to come back and fill out more sections dedicated to reverse engineering binaries (both malicious and non-malicious). Until that point, I highly recommend the following resources (mostly malware analysis still, but some are dedicated reverse engineering):
- This blog! I have numerous write-ups, both casual and professional, that cover a variety of different malware families.
- https://www.youtube.com/@lauriewired
- An Introduction to Malware Analysis
- Reverse Engineering Malware with Ghidra
- https://www.youtube.com/@REverseConf
- https://www.youtube.com/@reconmtl
- https://www.youtube.com/@PwnFunction
- FLARE On
- Phrack
- tmp.0ut
- squiblydoo.blog
- VX-Underground Papers
- Many, many, other sites and blogs. I’ll try and update with more as I remember them. Maybe even a dedicated page for links at some point…