11-15-2015, 02:38 AM
At first glance, creating a QuickBMS script seems very difficult. However, creating them is not very difficult! Throughout this tutorial, we'll go through a sample file, figure out the format and write a QuickBMS script in order to extract the contents.
Before we start, let's get prepared:
Download a Hex Editor, and get the sample file that we'll be working on in this tutorial. My personal preference is HxD Editor, it's free and very lightweight, though anyone will do
Open up your hex editor and open the kyp file.
So first thing is to figure out the endianness. The endianness is the way bytes are read from the file. Take the value 0x0c80 for example. In Little Endian, this value would be read as 0x800c, so basically the value is read backwards. Big Endian is 0x0c80, so the value stays the same. So in order to figure out the endianness, there are a few ways to figure this out.
So using one the methods above, we can come to the conclusion that this is a Big Endian format. However we need to set this in our script. QuickBMS defaults to Little Endian so we need to set our script to run in Little.
So what we do is at the top of the script is write out the function, "endian big". This sets our script to run in Big Endianness.
Next, we need to read through the header to get the information we need. A header is a group of information on a file. A header can contain (but not definitely) a file size, a file count (if we're working with a archive) and a magic id. Check out the glossary at the end for details.
The first thing we can see is a string of letters writing out "KYP". This is our magic header, so we want QuickBMS to check if our file is really a .kyp archive. So using the command, "idstring "KYP\x00" ", we can check if our file is a KYP file. "But Struggleeeeeeeeee, why is there a \x00 at the end of our idstring? I thought our magic id was KYP!" Well yes, and no. Our magic id is really "KYP00" However, the 00's are read as a null (or empty value) but are part of our magic id. So tl;dr, our next command to write is "idstring "KYP\x00".
So next is a value that if you read above is our file size. We know this because it's equal to the file's file size in hex.
So this is a 4 byte value, or a long. We read this from the file by writing out this command: "get archiveSize long". 'get' is a command/function in QuickBMS that allows us to read from the value using the different types of variables (Check the glossary if you want to know the variable types.) and read it into our script. So 'get' is the command, 'archiveSize' is the variable that'll hold our value (a bucket if you'd in a sense), and 'long' is the variable type we're reading.
Next value is a bit (heheh) harder to figure out what it is. It looks like a long value (0x00000003), though we don't know what this does. We'll return to this later. However we'll put something there so we can advance our position in the file.
After the unknown value, it looks like this is just a bunch of zeros, so we can skip this. It's a 4 byte null value So we write this in our script: "get null long"
Next few values look very odd and confusing, however we'll walk through it together
So it looks like there are 3 values, each 10 bytes long. Gasp! That's the same amount as our unknown value! So now we know what it is. But what does it represent? Let's read on.
So the first four bytes, looks like a pointer. A pointer is a value that "points" to an offset. So our 4 bytes are an pointer to a file in our archive.
Next we go to our next four bytes. However, once again, we don't know what this is. Let's go to our pointer to see if we can figure this out. So our pointer is 0x40, and we navigate to that offset. This is the start of our file, which has the magic id, 'bres", which is the magic id for .BRRES files. So if we look in the header of the file, we can see a value that is the same as the unknown value. So we can assume that it's a file size.
Next, this next value looks like a line of characters. This looks like the name of something. Perhaps, this is a file name. So the header we've been reading, is for a file. And we can go back to our unknown value and replace the unknown value name with fileCount.
So we have the 3 headers for our files. We can write a loop (or a function) to get the offset of the file, the size of it, and the name. For loops makes it very simple for us to get them. For loops work like this:
So what does this all mean? Well let's look at it. First is the i = 0. i is a new variable, set to be 0. fileCount is the value we figured out earlier. code here is exactly what it means: "we put our code there". next i is very important. next i adds one to our value which equals 0.
So what can we do with this loop? Well we can loop through all of these headers, get the offset, size and filename, which is all that's needed to extract the file. So if you read earlier, we know that those file headers work like this:
Our loop will look like this:
Let's explain this. get pointer long is getting four bytes and storing it as the variable 'pointer'. 'Same with length'. getdstring is a function that gets a string from a file, and reads a set amount of characters. Each file name is a set length of 0x8 or 8 bytes. log is a function that takes an offset of a file, an output name and a length of the file, and writes a new file.
Now we bring it all together!
If we run this in QuickBMS, this will extract the files from the archive and write them to a file on the hard drive. That's the end of the tutorial! If you have any questions or need help, please ask! Or if you have concerns please post! Thanks for reading!
Before we start, let's get prepared:
Download a Hex Editor, and get the sample file that we'll be working on in this tutorial. My personal preference is HxD Editor, it's free and very lightweight, though anyone will do
Open up your hex editor and open the kyp file.
So first thing is to figure out the endianness. The endianness is the way bytes are read from the file. Take the value 0x0c80 for example. In Little Endian, this value would be read as 0x800c, so basically the value is read backwards. Big Endian is 0x0c80, so the value stays the same. So in order to figure out the endianness, there are a few ways to figure this out.
- Look at the file and look for some definite values and compare.
This is the method I use in order to figure out the endianness. Say I find a value that looks like a size. I can compare it to the file's actual size and see if it's correct. Or if I find a pointer to an offset, I can compare it to an actual value. If the value is reversed, then it's a Little Endian file. If it's a complete match, it's a Big Endian file.
- Guess if it's Little or Big.
I really don't recommend this, as this can lead to mistakes. I'd recommend just using the first method.
So using one the methods above, we can come to the conclusion that this is a Big Endian format. However we need to set this in our script. QuickBMS defaults to Little Endian so we need to set our script to run in Little.
So what we do is at the top of the script is write out the function, "endian big". This sets our script to run in Big Endianness.
Code:
endian big
The first thing we can see is a string of letters writing out "KYP". This is our magic header, so we want QuickBMS to check if our file is really a .kyp archive. So using the command, "idstring "KYP\x00" ", we can check if our file is a KYP file. "But Struggleeeeeeeeee, why is there a \x00 at the end of our idstring? I thought our magic id was KYP!" Well yes, and no. Our magic id is really "KYP00" However, the 00's are read as a null (or empty value) but are part of our magic id. So tl;dr, our next command to write is "idstring "KYP\x00".
Code:
endian big
idstring "KYP\x00"
So this is a 4 byte value, or a long. We read this from the file by writing out this command: "get archiveSize long". 'get' is a command/function in QuickBMS that allows us to read from the value using the different types of variables (Check the glossary if you want to know the variable types.) and read it into our script. So 'get' is the command, 'archiveSize' is the variable that'll hold our value (a bucket if you'd in a sense), and 'long' is the variable type we're reading.
Code:
endian big
idstring "KYP\x00"
get archiveSize long
Code:
endian big
idstring "KYP\x00"
get archiveSize long
get unknownValue long
Code:
endian big
idstring "KYP\x00"
get archiveSize long
get unknownValue long
get null long
So it looks like there are 3 values, each 10 bytes long. Gasp! That's the same amount as our unknown value! So now we know what it is. But what does it represent? Let's read on.
So the first four bytes, looks like a pointer. A pointer is a value that "points" to an offset. So our 4 bytes are an pointer to a file in our archive.
Next we go to our next four bytes. However, once again, we don't know what this is. Let's go to our pointer to see if we can figure this out. So our pointer is 0x40, and we navigate to that offset. This is the start of our file, which has the magic id, 'bres", which is the magic id for .BRRES files. So if we look in the header of the file, we can see a value that is the same as the unknown value. So we can assume that it's a file size.
Next, this next value looks like a line of characters. This looks like the name of something. Perhaps, this is a file name. So the header we've been reading, is for a file. And we can go back to our unknown value and replace the unknown value name with fileCount.
Code:
endian big
idstring "KYP\x00"
get archiveSize long
get fileCount long
get null long
Code:
for i = 0 < fileCount
code here
next i
So what can we do with this loop? Well we can loop through all of these headers, get the offset, size and filename, which is all that's needed to extract the file. So if you read earlier, we know that those file headers work like this:
Code:
long PointerToFile -> long sizeOfFile -> string fileName
Code:
for i = 0 < fileCount
get pointer long
get length long
getdstring name 0x8
log name pointer length
next i
Now we bring it all together!
Code:
endian big
idstring "KYP\x00"
get archiveSize long
get fileCount long
get null long
for i = 0 < fileCount
get pointer long
get length long
getdstring name 0x8
log name pointer length
next i