11-13-2018, 04:46 AM
Hi! I'm doing RE work on Luigi's Mansion 2: Dark Moon. This post, as well as a little C# program I created, is dedicated to documenting file formats of this game.
1. data and dict files
The data file holds the actual file contents, while the dict file works like an archive header and contains some basic info about the files (offsets, lengths, compression).
The basic structure of dict is this (put it in a spoiler because it's too long and messy):
The data structure is very simple. It just contains all the files "glued" together, with padding so that every file starts on an 8-byte boundary.
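A minimal sketch of that padding rule in Python (not taken from LM2L; the function names are made up for illustration). It assumes that "padding to the closest 8-byte boundary" means rounding each start offset up to the next multiple of 8.

```python
def align_up(offset: int, alignment: int = 8) -> int:
    """Round offset up to the next multiple of alignment (a power of two)."""
    return (offset + alignment - 1) & ~(alignment - 1)


def packed_offsets(lengths):
    """Start offsets of files glued back to back, each aligned to 8 bytes."""
    offsets = []
    cursor = 0
    for length in lengths:
        cursor = align_up(cursor)   # skip padding before the file starts
        offsets.append(cursor)
        cursor += length
    return offsets
```

In practice the start offsets come from the dict file table, so a helper like this is mostly useful for sanity-checking those offsets or for repacking.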
One mystery about dicts is that they seem to have dummy and null entries for many files. Even though we can more or less extract them, we should figure out why they exist in the first place.
2. Files inside data/dict archives
(Disclaimer: the subfiles have no file names whatsoever, so the names used here are the same as in my code)
file000 contains information about the subfiles in file003: offset, length and type. This data is usually preceded by 0x01130002 blocks with a constant length of 0x18; the purpose of these is unknown.
file002 probably contains meta info on the files stored in file003. For now only the purpose of the PÓwé sections is known (I'll be referring to these as Powe, because I don't have a way to input diacritic marks on my keyboard): they contain info about textures (like resolution, format, mip levels).
The structure of Powe is like this:
Here are a few textures I've dumped with my tool.
3. Models
Models are stored in file003. They can be organised into so-called "model groups" (basically models which share the same vertex data, submesh info and other stuff). Each model group contains a vertex data section (the vertices themselves and the vertex indices), submesh vertex start pointers (relative to the start of the vertex data), submesh info (index count/offset, vertex count, data format, etc.) and some other data (bones probably, but I haven't figured those out yet).
About vertex data formats: there are a whole bunch of them... and I really mean A LOT (just look at my code)...
A lot of them are kinda similar. For example, most character models use ShortFloat (IDK what the proper name for this is) for vertex encoding, while levels use float32. Sadly, I can't figure out much more than the vertex coordinates (I can't even find UVs in some cases), so if anyone is willing and able to help, please do.
I was able to write a working exporter to wavefront obj for some of the formats, here are a few models I've ripped:
Screenshot of Luigi model just as an example:
(Wegee looks a bit derpy like this Xd)
In the attached archive you can find this model and E. Gadd's in obj (and Luigi in collada, because wavefront doesn't support multiple textures per material). I didn't bother adding other stuff I've tried to rip, mainly because UVs are all over the place and a lot of stuff doesn't work properly.
[attachment=9739]
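For anyone who wants to try their own export, the Wavefront OBJ side is small. This is only a sketch in Python, not how LM2L does it: `to_obj` is a made-up name, and it assumes the positions are already decoded to float triples, the indices form a plain triangle list, and (if present) there is one UV per vertex.

```python
def to_obj(vertices, indices, uvs=None):
    """Build OBJ text from float triples and a flat triangle index list."""
    lines = [f"v {x:.6f} {y:.6f} {z:.6f}" for x, y, z in vertices]
    if uvs:
        lines += [f"vt {u:.6f} {v:.6f}" for u, v in uvs]
    for i in range(0, len(indices) - 2, 3):
        # OBJ indices are 1-based; the UV index reuses the vertex index
        tri = [n + 1 for n in indices[i:i + 3]]
        if uvs:
            lines.append("f " + " ".join(f"{n}/{n}" for n in tri))
        else:
            lines.append("f " + " ".join(str(n) for n in tri))
    return "\n".join(lines) + "\n"
```

Decoding the game's vertex formats into those float triples is the hard part and is not covered by this sketch.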
4. NLOC
NLOC - Next-level LOCalization(?) files contain text strings. You can extract these with RoadrunnerWMC's Python script (look for it in his GitHub gist).
5. NMLB
NMLB - Binary Luigi's Mansion N-something (reversed because little endian). IDK what it is, I simply came across this signature while browsing some of the (probably cutscene) files in a hex editor.
6. FENL
FENL - IDK what this stands for. Looks like layout data, based on where it's found.
7. Audio
The audio is stored in the wwiseaudio folder. You might think that wwise-unpacker would extract these, but it fails. Despite these files being the standard .bnk and .pck variety, they contain audio compressed with the same compression as BCSTM files (DSP ADPCM in the case of LM2), which is why wwise-unpacker fails.
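To give an idea of what DSP ADPCM decoding involves, here is a sketch of the generic per-frame step in Python. It is the standard DSP ADPCM scheme (8-byte frames, 14 samples each), not something verified against LM2's banks: `decode_dsp_frame` is a made-up name, and the coefficient table is assumed to be already known, since where it sits inside the .bnk/.pck files is not covered here.

```python
def decode_dsp_frame(frame: bytes, coefs, hist1: int = 0, hist2: int = 0):
    """Decode one 8-byte DSP ADPCM frame into 14 PCM16 samples.

    coefs is the 16-entry coefficient table (8 pairs) of the stream.
    Returns (samples, hist1, hist2) so the history carries to the next frame.
    """
    header = frame[0]
    scale = 1 << (header & 0x0F)      # low nibble: shift amount
    index = header >> 4               # high nibble: coefficient pair
    c1, c2 = coefs[index * 2], coefs[index * 2 + 1]
    samples = []
    for byte in frame[1:8]:
        for nibble in (byte >> 4, byte & 0x0F):   # high nibble first
            if nibble >= 8:           # sign-extend the 4-bit value
                nibble -= 16
            value = ((nibble * scale) << 11) + 1024 + c1 * hist1 + c2 * hist2
            value >>= 11
            value = max(-32768, min(32767, value))  # clamp to 16 bits
            samples.append(value)
            hist2, hist1 = hist1, value
    return samples, hist1, hist2
```

A full decoder would loop over all frames of a channel and pass the returned history values into the next call.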
This is a brief writeup on what I've found. If you want to go a bit more in-depth, then go look at the code of LM2L, a tool I made for extracting files from this game. You can find the source code on GitHub. (To anyone wondering, the OGL viewer is probably not gonna be a working thing because I know nothing about OpenGL and how to program it)
P.S. Bonus: in the code.bin there is a string referring to "DS Horror". Seems like they didn't have ideas on how to name the thing or just used this name as a placeholder until they came up with "DualScream"
Code (dict layout):
class dict
{
    const uint magic = 0xA9F32458; //0x0, seems like a file signature (though knowing this game it may be just another constant), always "X$ó©" when viewed as text (may look different depending on your system codepage, since it uses values outside the 7-bit ASCII range)
    const byte unk = 0x4; //0x4
    const byte unk2 = 0x1; //0x5
    byte isCompressed; //0x6, compression flag, 0x0 - uncompressed, 0x1 - ZLib
    const byte unk3 = 0x0; //0x7
    uint fileCount; //0x8, file count (this includes entries which refer to duplicates or empty files; e.g. the second entry is usually a dupe of the first one and refers to the same data location, sometimes with a slightly different length)
    uint largestCompressedFile; //0xC, seems to be the size of the largest file in compressed form, always 0 if no compression is used
    const ushort unk6 = 0x0; //0xE (note: a uint at 0xC spans 0xC-0xF, so this overlaps it; one of these two entries needs rechecking)
    ushort unk7; //0x10, seems to be similar in many files
    //[0x12:0x2B] is const for (probably) all dict files, so I'll just dump the raw bytes here
    //02 02 90 25 BD 78 02 03 04 05 00 00 00 00 66 7C F5 04 01 02 03 04 00 00 05 06
    byte[fileCount] unk8; //0x2C, some sort of table, length in bytes always matches fileCount
    FileTableEntry[fileCount] fileTable; //0x2C+fileCount, contains info about the files in the data, see the class below for more info
    const string data = ".data\0"; //0x2C+fileCount*0x11, in ASCII
    const string debug = ".debug\0"; //0x2C+fileCount*0x11+0x6, in ASCII
    //EOF
}
class FileTableEntry
{
    uint startOffset; //0x0, offset at which the file starts in the data
    uint decompressedLength; //0x4, file length after decompression (present whether the data is compressed or not, so for data without compression this is the file size)
    uint compressedLength; //0x8, file length in compressed form (only set if compression is used, otherwise 0)
    byte[4] unk; //0xC, probably file attributes
}
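The layout above can be read with a few `struct` calls. This is a rough Python sketch rather than the LM2L code (the names `parse_dict`, `extract` and `FileEntry` are made up). It assumes all fields are little endian, that "ZLib" means a regular zlib stream that `zlib.decompress` accepts, and that an entry with a compressed length of 0 is stored uncompressed; none of that is confirmed beyond what the layout says.

```python
import struct
import zlib
from dataclasses import dataclass

DICT_MAGIC = 0xA9F32458
HEADER_SIZE = 0x2C   # fixed part of the header before the unk8 table
ENTRY_SIZE = 0x10    # size of one FileTableEntry


@dataclass
class FileEntry:
    start_offset: int
    decompressed_length: int
    compressed_length: int
    attributes: bytes


def parse_dict(buf: bytes):
    """Return (is_compressed, entries) from the raw bytes of a dict file."""
    magic, = struct.unpack_from("<I", buf, 0x0)
    if magic != DICT_MAGIC:
        raise ValueError(f"unexpected magic 0x{magic:08X}")
    is_compressed = buf[0x6] == 1
    file_count, = struct.unpack_from("<I", buf, 0x8)
    table_start = HEADER_SIZE + file_count  # skip the one-byte-per-file table
    entries = []
    for i in range(file_count):
        start, dec_len, comp_len, attrs = struct.unpack_from(
            "<III4s", buf, table_start + i * ENTRY_SIZE)
        entries.append(FileEntry(start, dec_len, comp_len, attrs))
    return is_compressed, entries


def extract(data: bytes, entry: FileEntry, is_compressed: bool) -> bytes:
    """Slice one file out of the data blob, inflating it if needed."""
    start = entry.start_offset
    if is_compressed and entry.compressed_length:
        return zlib.decompress(data[start:start + entry.compressed_length])
    return data[start:start + entry.decompressed_length]
```

Duplicate and empty entries are returned as they are; deciding what to do with them is left to the caller.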
Code (Powe layout):
const size = 0x38; //Not actually in the data, just a constant
uint magic; //0x0, always PÓwé (unless your Windows codepage is different, then you can expect many other variations of this)
uint id; //0x4, texture id, unique for each texture in most cases (though some identical textures in different archives have the same id)
uint unk2; //0x8, always the same between textures with equal dimensions and texture format
uint idCopy; //0xC, seems to be a copy of id
ushort width; //0x18, texture width
ushort height; //0x1A, texture height
byte mipLevel; //0x1F, unsure which encoding this uses, stored as mipmap_levels*11 for some odd reason
byte texFmt; //0x38, mostly 0xC for ETC1 and 0xD for ETC1A4, these values are the same as in the SDK enums (note: 0x38 lies just past a 0x38-byte section, so either the size or this offset needs rechecking)
//Everything else is mostly 0x0 with some exceptions, but I have no idea what those values do as they seem to be the same between most files
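Reading the known Powe fields could look roughly like this. It is a Python sketch with made-up names (`parse_powe`, `PoweInfo`), not the LM2L code. It assumes little-endian fields and that the magic is the bytes 50 D3 77 E9, which is what "PÓwé" encodes to in Windows-1252. The offsets are taken exactly as listed, including texFmt at 0x38, and the mip byte is returned raw because its encoding is unclear.

```python
import struct
from dataclasses import dataclass

POWE_MAGIC = b"P\xd3w\xe9"   # "PÓwé" in Windows-1252 (assumed code page)
TEX_FORMATS = {0x0C: "ETC1", 0x0D: "ETC1A4"}   # only the two known values


@dataclass
class PoweInfo:
    tex_id: int
    width: int
    height: int
    mip_raw: int       # raw byte at 0x1F, left undecoded
    tex_format: str


def parse_powe(buf: bytes, offset: int = 0) -> PoweInfo:
    """Read the documented Powe fields starting at offset."""
    if buf[offset:offset + 4] != POWE_MAGIC:
        raise ValueError("not a Powe section")
    if len(buf) < offset + 0x39:
        raise ValueError("buffer too short for the texFmt byte at 0x38")
    tex_id, _unk2, _id_copy = struct.unpack_from("<3I", buf, offset + 0x4)
    width, height = struct.unpack_from("<2H", buf, offset + 0x18)
    mip_raw = buf[offset + 0x1F]
    fmt = buf[offset + 0x38]
    return PoweInfo(tex_id, width, height, mip_raw,
                    TEX_FORMATS.get(fmt, f"unknown(0x{fmt:02X})"))
```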
(Logos look off on a white background, so try them on something dark/black)
(IDK where this texture could be used, looks more like an early screenshot to me)
(Now these level renders look early and unused to me, but I might be totally wrong)
(also another thing making me think these are early renders is that the king boo preview like this doesn't exist)
(erm...where would you use this texture in the bunker?)
(Yay E Gadd texture)
(And here goes Ruigi!)
(And now the Boo textures)