So, getting back to this.
Quote: Thank you so much for helping me ^^ How did you extract the files?
I had no idea how ZDA files work, but the first thing you should do with any unknown file format is to open it in a hex editor. Let's check out one of the files:
Right off the bat, you can make out leveldone.xm, getready.xm and some other strings. Let's try adjusting the width of the window so the data lines up in a more readable way:
Much better. Now let's do some analysis.
• The archive is a .zda file, and its first bytes are 'ZDA'. This could be a file identifier (all the other files begin with 'ZDA' too).
• It appears there are nine filenames listed. The long after the ZDA identifier is 09 00 00 00. Is this a coincidence?
• The long after the 09 00 00 00 and just before the first filename is E0 01 00 00. Address 0x1E0 in the file looks like the first instance of "real" data instead of just filename definitions. Is this a coincidence?
Let's lay the bytes out in a neater way:
That's good, but since we strongly suspect that we should read little-endian longs instead of single bytes (09 00 00 00 being 0x9 and E0 01 00 00 being 0x1E0), let's shuffle the bytes around a bit:
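Reading four-byte groups as little-endian unsigned longs is what Python's struct module does with the "<I" format. A minimal sketch, assuming the header is a 4-byte magic followed by the file count and the data offset; the byte after "ZDA" isn't visible in the post, so the 0x00 below is a placeholder:

```python
import struct

# First 12 bytes of an archive as read in the hex editor. The fourth byte
# (after "ZDA") is a placeholder assumption; only "ZDA" itself is known.
HEADER = bytes.fromhex("5A444100" "09000000" "E0010000")

# "<" = little-endian, "4s" = 4 raw bytes, "I" = unsigned 32-bit long
magic, count, data_offset = struct.unpack_from("<4sII", HEADER, 0)
print(magic[:3], count, hex(data_offset))  # b'ZDA' 9 0x1e0

# Read big-endian instead, the same bytes give a nonsensical offset
print(hex(int.from_bytes(b"\xe0\x01\x00\x00", "big")))  # 0xe0010000
```

The little-endian reading gives 9 and 0x1E0, matching the nine filenames and the start of the "real" data, while the big-endian one points far past the end of any small archive.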
It appears we've got the first three longs figured out. The last thing to figure out is the three unknown values: value1, value2 and value3. Some things to notice:
• value3 is always bigger than the value3 of the previous entry. This suggests it might be a pointer.
• value1 is always bigger than its accompanying value2. This suggests value2 might be the 'compressed file size' and value1 the 'decompressed file size'.
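These observations can be turned into a quick check. A sketch under clearly stated assumptions: if the nine entries are fixed-size and packed between a 12-byte header and 0x1E0, each one is (0x1E0 - 12) / 9 = 52 bytes, which would fit a 40-byte NUL-padded filename followed by value1, value2 and value3 as little-endian longs. The 40-byte name field and the field order are guesses, not confirmed by anything in the post:

```python
import struct
from typing import NamedTuple

# Hypothetical entry layout: 40-byte NUL-padded name + three LE u32s = 52 bytes.
# Both the name width and the field order are assumptions.
ENTRY = struct.Struct("<40sIII")

class Entry(NamedTuple):
    name: str
    value1: int  # suspected: decompressed size
    value2: int  # suspected: compressed size
    value3: int  # suspected: offset of the file's data

def read_entries(data: bytes, count: int, table_start: int = 12) -> list[Entry]:
    entries = []
    for i in range(count):
        raw_name, v1, v2, v3 = ENTRY.unpack_from(data, table_start + i * ENTRY.size)
        # Cut the name at the first NUL; latin-1 decodes any byte without errors
        name = raw_name.split(b"\0", 1)[0].decode("latin-1")
        entries.append(Entry(name, v1, v2, v3))
    return entries

def offsets_increase(entries: list[Entry]) -> bool:
    """Check the observation that value3 grows from one entry to the next."""
    return all(a.value3 < b.value3 for a, b in zip(entries, entries[1:]))
```

If the guessed layout is wrong, the names come out garbled or the values stop looking like sizes and offsets, which is itself a useful signal.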
So let's check 0x1E0 again, because that's where our data starts. To test our hypothesis that value2 is the compressed file size, let's select exactly 0x49A9 bytes from there:
Very interesting. Now let's select exactly 0x46CE bytes right after it, as the second header entry says.
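If value3 is the offset of a file's data and value2 its compressed size, each file should start exactly where the previous one ends, which is what selecting the second block right after the first relies on. A small sketch of that check; the input is just the (value3, value2) pairs read from the directory, and the second entry's value3 isn't shown in the post, so 0x4B89 is only what the hypothesis predicts:

```python
def is_contiguous(layout: list[tuple[int, int]]) -> bool:
    """layout holds (offset, compressed_size) pairs, i.e. (value3, value2),
    in directory order. True if every block ends where the next one begins."""
    return all(offset + size == next_offset
               for (offset, size), (next_offset, _) in zip(layout, layout[1:]))

# The first block starts at 0x1E0 and is 0x49A9 bytes long, so the second
# one should start at:
print(hex(0x1E0 + 0x49A9))  # 0x4b89
```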
I think I get it now! Do you notice anything special?
...
That's right! Every piece of data we've looked at so far begins with $78 (or 'x' in ASCII). That's a telltale sign of zlib-compressed data. I already knew that zlib streams almost always begin with $78 (deflate with the default 32 KB window), but you would've come to the same conclusion had you googled enough.
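The $78 isn't arbitrary: it's the CMF byte of the two-byte zlib header defined in RFC 1950, with compression method 8 (deflate) in the low nibble and a 32 KB window size in the high nibble. A slightly stronger check than eyeballing the first byte, which also applies the header's divisible-by-31 rule, could look like this sketch:

```python
import zlib

def looks_like_zlib(blob: bytes) -> bool:
    """Heuristic check of the 2-byte zlib header (RFC 1950)."""
    if len(blob) < 2:
        return False
    cmf, flg = blob[0], blob[1]
    # Low nibble of CMF is the method (8 = deflate); high nibble is
    # log2(window size) - 8, and 7 (a 32 KB window) gives the familiar 0x78.
    method, cinfo = cmf & 0x0F, cmf >> 4
    # Read as a big-endian 16-bit number, the header must be divisible by 31.
    return method == 8 and cinfo <= 7 and (cmf * 256 + flg) % 31 == 0

print(zlib.compress(b"getready.xm")[:2].hex())  # 789c (default level)
print(looks_like_zlib(zlib.compress(b"hello")))  # True
print(looks_like_zlib(b"BM\x00\x00"))            # False
```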
Anyway, now we know*:
• How many files are there inside a ZDA package
• Their filenames
• Their filesizes
• Their addresses
• Their compression method
*) or at least have a pretty good hunch
So let's get to decompression. I had never actually decompressed zlib data before, so I googled a bit and found out that Python has zlib decompression built in (the zlib module). All in all, writing the decompressor was pretty easy - here's the script I wrote.
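The actual script is only linked, so the following is not it. It's a minimal sketch of an extractor built on the layout guessed during the analysis; the 40-byte name field, the field order and the script name unzda.py are all assumptions:

```python
import struct
import sys
import zlib
from pathlib import Path

# Hypothetical ZDA layout (assumed, not taken from the original script):
#   0x00  4 bytes   magic starting with b"ZDA"
#   0x04  u32 LE    number of files
#   0x08  u32 LE    offset of the first file's data
#   0x0C  entries:  40-byte NUL-padded name, then u32 LE value1 (decompressed
#                   size), value2 (compressed size), value3 (data offset)
HEADER = struct.Struct("<4sII")
ENTRY = struct.Struct("<40sIII")

def extract_zda(data: bytes) -> dict[str, bytes]:
    magic, count, _data_offset = HEADER.unpack_from(data, 0)
    if not magic.startswith(b"ZDA"):
        raise ValueError(f"not a ZDA archive: {magic!r}")
    files = {}
    for i in range(count):
        raw_name, usize, csize, offset = ENTRY.unpack_from(
            data, HEADER.size + i * ENTRY.size)
        name = raw_name.split(b"\0", 1)[0].decode("latin-1")
        payload = zlib.decompress(data[offset:offset + csize])
        # Tests the "value1 = decompressed size" hunch for every file
        if len(payload) != usize:
            raise ValueError(f"{name}: expected {usize} bytes, got {len(payload)}")
        files[name] = payload
    return files

def main(archive: str, out_dir: str) -> None:
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for name, payload in extract_zda(Path(archive).read_bytes()).items():
        # Keep only the base name so a stored path can't escape out_dir
        (out / Path(name).name).write_bytes(payload)
        print(f"{name}: {len(payload)} bytes")

if __name__ == "__main__":
    if len(sys.argv) == 3:
        main(sys.argv[1], sys.argv[2])
    else:
        print("usage: python unzda.py ARCHIVE.zda OUTPUT_DIR")
```

Usage (file names hypothetical): python unzda.py music.zda out. Raising on a size mismatch means a wrong guess about the layout shows up immediately instead of producing silently broken files.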
LINK
After analysing the resulting files, I also concluded that my hunch about the "decompressed size" value was correct.
So, what next? There still appears to be some kind of compression or encryption in the files, as the XM and WAV files don't play. The BMP files aren't valid BMPs either. What's going on...?
tbh no clue, it looks like some kind of pseudo-BMP. Variable bit depths, reverse row order, all kinds of typical BMP stuff.
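For comparison with the pseudo-BMPs, here is a sketch that reads the fields of a standard Windows BMP header (BITMAPFILEHEADER followed by a BITMAPINFOHEADER). Whether the game's files keep any of these fields at the same offsets is unknown; the point is only to have a reference to diff against:

```python
import struct

def bmp_info(data: bytes) -> dict:
    """Read the standard BMP header fields that matter when comparing
    against a pseudo-BMP. Assumes a 40-byte BITMAPINFOHEADER or larger."""
    if data[:2] != b"BM":
        raise ValueError("missing 'BM' signature")
    # BITMAPFILEHEADER (14 bytes): signature, file size, 2 reserved, pixel offset
    _sig, file_size, _r1, _r2, pixel_offset = struct.unpack_from("<2sIHHI", data, 0)
    # Start of BITMAPINFOHEADER: size, width, height, planes, bpp, compression
    hdr_size, width, height, _planes, bpp, compression = struct.unpack_from(
        "<IiiHHI", data, 14)
    return {
        "file_size": file_size,
        "pixel_offset": pixel_offset,
        "header_size": hdr_size,
        "width": width,
        "height": abs(height),
        # Positive height means rows are stored bottom-up (reverse row order)
        "bottom_up": height > 0,
        "bpp": bpp,
        "compression": compression,  # 0 = BI_RGB (uncompressed)
        # Each row is padded to a multiple of 4 bytes
        "row_stride": ((width * bpp + 31) // 32) * 4,
    }
```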
Here are all the decompressed files:
https://www.dropbox.com/sh/o016vwmxl3foy...xqWPa?dl=0
If we've got some kind of BMP in-house expert, maybe they can take a look. I'm out of ideas, sorry.