Paper Mario TTYD File Research: samp - Phaze - 02-06-2014

So... you might or might not have noticed the other day that I posted in this thread for about 5 or 10 minutes before deleting my post. Basically I felt like I was onto something, but because I didn't really have much of anything yet, it seemed very awkward, and the mounting anxiety compelled me to delete the post until I had collected my thoughts into something with marginally more substance.

I figured I might as well make a thread of my own here in the hope that I might find something and thus illuminate the otherwise seemingly undocumented .samp file, specifically the one in Paper Mario: The Thousand-Year Door. It's also basically to ask for help from people who might be better than me at reading patterns in hex files if/when I get stuck.

My main problem is that I have no experience with reverse-engineering file formats, so this is my first time. As a result, expect me to stumble over myself trying to figure out basic things like hex representations of integers and floats, along with basic header patterns! Who knows, maybe I'll learn something from this. That'll be great if I do! 'o'

So, the basics: as people figured, there are sound effects in the pmario.samp file located in the TTYD ISO's /sound/proj/ folder. I've confirmed this by importing pmario.samp as Raw Data in Audacity. The only thing is that when I've imported the data, there is this awful LOUD, SCREECHY NOISE PERMEATING THE WHOLE FILE SO IT'S NOT USABLE FOR ANYTHING BUT DAMAGING YOUR EARS. On the plus side though, I could clearly hear Bowser/Mario/Peach's voice clips through the noise past the midpoint of the file!

Based on other docs I've read, the format might be a custom-ish 4-bit ADPCM format used for other things on the 'cube, like streamed audio. Makes sense. The format I used to import in Audacity to hear the sound files in the horrible screeching fashion was 8-bit PCM, Big Endian, Mono @ 11025Hz.
VOX ADPCM, Big Endian, Mono @ 22050Hz also produced similar results. Obviously these are unusable for actual game purposes, but they were valuable to me since they told me that the sounds seemingly exist uncompressed/unscrambled in the file, one after another. It gives me hope that while splitting up the sounds and programmatically naming them might be a horrible endeavor, I can at least theoretically find and steal some of the ADPCM decoding code from a project like vgmstream to use in an unpacker/extractor. I'll do this if I'm going nowhere with the thing I'm trying to do.

Note: The hex editor I'm using is HxD, which lets me specify the column width for the hex data and the ASCII representation. I'm using a width of 32 instead of the hex editor standard of 16, which makes things much clearer for me. I explain things in the wall of text below under this assumption.

Anyway... apart from the .db(2) files, there are a bunch of other files all named "pmario" in the same directory, with extensions as follows:

.samp - Our prize right here, 10.7MB and confirmed by myself to contain (PCM-based?) audio samples. According to some posts I've read, in other games it appears to store the instrument samples for sequenced music. Since TTYD uses streamed music though, it just chucks the sound effects of the whole game into it for shits 'n giggles. I haven't confirmed that it's GameCube 4-bit ADPCM format but I'll do it eventually.

.sdir - Sound directory, a headerless file with a repeating 32-byte structure that encodes information about each sample. See post below for my current understanding of the structure.

.slib - No idea, but it's 275KB compared to the interesting pmario.sdir filesize of 87.6KB. Must be important; it contains many chunks of data or files that each start with a 4-byte integer giving the chunk's length, including the size int itself.

.hrf - ???
Contains "HRFi" as the first 4 bytes in the file, has 8 bytes of unknown purpose (they didn't seem to make any sensible numbers as 4-byte ints. Maybe they're several shorts??), a null, then the name of the file "pmario.samp". There's a huge chunk of nulls followed by some other bytes and "pmario_samp-0000000001.669". I doubt this will be of any use.

.etbl - Filename table for sound effects. The last byte before a new name seems to increment up to 0xFF and then roll back to 0x00 for no real reason, without changing anything else in the file. I don't get why. There's also junk in the names up to the 30th character that I explain below. There doesn't seem to be any associated offset or length data in here, which makes me think other data might reference it by specifying a fixed index (or the current index of the pmario.samp sound), multiplying it by 32 (the length of each name record in bytes), then reading 30 bytes to get the name of the sound. To get SE3_AMB_RIVER1 (the 3rd entry, so index 2 counting from 0): 2*32 = 64, or offsets 0x40 up to 0x60 (0x40 + 0x20, the 32-byte record length in hex), which corresponds with "SE3_AMB_RIVER1.IO_JUMP2..LING2..". Again, junk is explained below.

.stbl - Other filename table for sound effects? Not sure why there are two files, might be contextual. On the previously mentioned note, pmario.stbl may come before pmario.etbl, judging by the junk in the pmario.etbl file.

Also, I figured out what the ".IO_JUMP2..LING2.." junk means... it's basically a null terminator followed by an "after image" of the names that came before it*. If a subsequent name doesn't take up the 30 characters allowed (it seems the last 2 are 'reserved' for a purpose I haven't identified yet), the leftover characters of the previous names simply stay behind in the rest of the record. In a way I guess it's like if you copied the line you typed to the next line and used Insert mode to overwrite it partially.
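A minimal Python sketch of the lookup described above, assuming the record layout really is 32 bytes with the name in the first 30. The function names and the placeholder records are made up for illustration, and the dots in the quoted record are assumed to be null bytes; only the SE3_AMB_RIVER1 record comes from the file.

```python
RECORD_SIZE = 32  # bytes per name record (from the post)
NAME_FIELD = 30   # bytes holding the name; the last 2 bytes are not understood yet


def record_offset(index: int) -> int:
    """Byte offset of a record, counting records from 0."""
    return index * RECORD_SIZE


def read_name(table: bytes, index: int) -> str:
    """Return the name stored in record `index`, cut at the first null byte."""
    start = record_offset(index)
    field = table[start:start + NAME_FIELD]
    if index < 0 or len(field) < NAME_FIELD:
        raise IndexError(f"record {index} is outside the table")
    # Anything after the first null is leftover "after image" junk.
    # A full 30-character name has no null, so the whole field is kept.
    return field.split(b"\x00", 1)[0].decode("ascii")


def pad(name: bytes) -> bytes:
    """Helper for the demo only: null-pad a name to one full record."""
    return name.ljust(RECORD_SIZE, b"\x00")


# Synthetic table: two hypothetical placeholder records, then the record
# quoted in the post with its dots written as null bytes (an assumption).
table = (
    pad(b"PLACEHOLDER_0")
    + pad(b"PLACEHOLDER_1")
    + b"SE3_AMB_RIVER1\x00IO_JUMP2\x00\x00LING2\x00\x00"
)

print(hex(record_offset(2)))  # 0x40
print(read_name(table, 2))    # SE3_AMB_RIVER1
```

With the real file you would pass the bytes of pmario.etbl (or pmario.stbl) instead of the synthetic table. Whether other files actually index the table this way is still a guess.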
Either way, it doesn't really seem to matter, because even if the full string with junk is read into a 30-byte array (not including a null terminator, otherwise 31) for the filename, the string will be ended early by the null character before the junk data enters it. If it's the full 30 chars? The array ends naturally. Kinda simple in retrospect and not a major discovery that deserves me being windbaggy about it, but knowing it makes me happier at least.

pmario_sound_bgm_txt.db
pmario_sound_env_txt.db
pmario_sound_env_txt.db2
^ These all contain some sort of configuration data for the sounds to be played in-game, including streamed music. They appear to use the tbl filenames as the identifier. As far as I can tell, the information stored in these files is effectively worthless from our perspective, aside from maybe giving slightly more verbose names for the music tracks. Everyone just renames those to the appropriate in-game name though, so even that information is worthless!

If you wish to examine these files yourself I can make a zip/7z archive of the proj folder and upload it to Mediafire or something. Either way you should be able to get the same files if you have a copy of TTYD (hopefully one you ripped yourself!)

*Example from pmario.etbl, nulls highlighted in red:

Example from pmario.stbl, nulls also highlighted in red but with a 30th char circled in orange. The 31st char seems to increment almost randomly, and this one features the 32nd character incrementing pointlessly:

RE: Paper Mario TTYD File Research: samp - Phaze - 02-07-2014

I found the sample rates! I've updated my OP to reflect this, and now I'm pretty much certain that the 32-byte structures in pmario.sdir encode sample information. I just need to figure out what the other bytes are. The two bytes at the start of this structure are likely an index number, while I'm still trying to make sense of 0x04-0x07 (incrementing integer) along with 0x0C-0x0D.
The latter is all 0x3C00 (15360) though. This number seems familiar, anyone got any ideas? Searches imply it could be volume but I have my doubts.

I get the feeling that the last 4 bytes are some kind of offset, and I'm fairly certain that they point into the huge chunk of random data below that I mentioned previously. I just noticed that the 4 index bytes at 0x1C - 0x1F literally point to the start of the 'random' data past the 0xFFFFFFFF that seems to mark the end of the entry blocks, while the last set of them seems to point near the end of the file. This implies that those structures I mentioned earlier starting with 0x0008 are in fact the beginning of what seem to be repeating 40-byte structures.

RE: Paper Mario TTYD File Research: samp - Phaze - 02-08-2014

Quote: Haven't gotten around to playing the game yet. I want to get the Dolphin developers to see if maybe they can see what's wrong first.

I found this by pure chance in a thread about Dinosaur Planet (N64) and the inclusion of some of its data in a Starfox Adventures demo disc. It doesn't really tell me anything about converting the data to a usable format, but at least it confirms that pmario.proj and pmario.pool aren't really what I'm looking for, so I figure I might as well drop those from the investigation.

Also, current details on the sdir 32-byte block format:

Code:
--------------------------------------------------------

I'm also trying to see if there is any filename index information that I can tie to a name in pmario.stbl or pmario.etbl, but I haven't found anything concrete yet. I might try to make a program that parses the pmario.samp file using the data in pmario.sdir and retrieves each sample (over 1,200(!) in total, though), but I might also see if vgmstream supports converting headerless raw sample data so I can try converting them to WAV, too. I will do some research on that soon.

On a semi-related note, here are all of the filenames in the tbl files without the junk following them.
Not sure why there are ~300 more filenames in the table than there are sample chunks. This is why I want to find some sort of index data if possible, because otherwise I could end up with wrong names.

On the note of pmario.slib though, I noticed that it also seems to be a huge collection of smaller chunks of data (or perhaps files). The first four bytes of the file appear to be an integer giving the length of the chunk/file that follows, including the 4-byte size itself. After each chunk, this 4-bytes-then-data pattern repeats, albeit with a differing size. For example, the first four bytes in the file are 0x00000FC0 (4032 bytes); sure enough, when you highlight the first 4032 bytes including the size field, the selection is 0x0FC0 long (just under 4KB) and ends right where the next chunk starts. I have no idea what the data could be yet, but at least I figured that one out. The file ends with a sequence of 0xFFFFFFFF.

RE: Paper Mario TTYD File Research: samp - TheShyGuy - 02-08-2014

Quote: Note: The Hex Editor I'm using is HxD

I like that hex editor too since it's nice and simple. But I recommend Hex Edit, which allows you to write file templates. Check out the images here as an example: http://brawlimports.proboards.com/thread/12/mdl0-template-works-edit-mode

edit: file template making tutorial http://www.youtube.com/watch?v=snL0_rfBDNo

RE: Paper Mario TTYD File Research: samp - Phaze - 02-08-2014

(02-08-2014, 03:56 PM)TheShyGuy Wrote: Hex Edit

Seems neat but I might hold off on that for now. I wasn't aware it existed though, thanks for the info.
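Coming back to the pmario.slib layout from earlier in this thread, here is a rough Python sketch of how the chunks could be walked. It assumes the size field is a big-endian 32-bit integer that counts itself, and that a size of 0xFFFFFFFF marks the end of the file; the function name and the demo bytes are made up, and none of this has been checked against every chunk in the real file.

```python
import struct

END_MARKER = 0xFFFFFFFF  # assumed terminator, based on how the file ends


def iter_chunks(data: bytes):
    """Yield (offset, chunk) pairs; each chunk includes its 4-byte size field."""
    pos = 0
    while pos + 4 <= len(data):
        # ">I" = big-endian unsigned 32-bit (assumption: GameCube byte order).
        (size,) = struct.unpack_from(">I", data, pos)
        if size == END_MARKER:
            break
        if size < 4 or pos + size > len(data):
            raise ValueError(f"implausible chunk size {size:#x} at offset {pos:#x}")
        yield pos, data[pos:pos + size]
        pos += size


# Synthetic stand-in for the file: a 0xFC0-byte chunk like the first one in
# pmario.slib, a small 8-byte chunk, then the end marker.
demo = (
    struct.pack(">I", 0x0FC0) + bytes(0x0FC0 - 4)
    + struct.pack(">I", 8) + b"ABCD"
    + struct.pack(">I", END_MARKER)
)

for offset, chunk in iter_chunks(demo):
    print(hex(offset), len(chunk))
```

On the real file you would read pmario.slib in binary mode and pass its bytes in place of the demo data. If the walk raises partway through, the "size includes itself" assumption is probably wrong for some chunk type.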