Well, I had lots of trouble finding the ISO too but eventually I did find it.
The process is somewhat complex and has high prerequisites: decoding this type of problem requires intermediate knowledge of proprietary file formats, data compression and programming, plus finding your way around a hex editor.
But basically what I did was check out the files with a hex editor to get a grasp on what they might be – their headers said TIM2 so I thought they would be TIM2 files. They didn't open in any TIM2 viewer, so I concluded they're compressed.
The compressed file's first 16 bytes were
Code:
01 2C 80 07 54 49 4D 32 04 00 01 00 94 00 04 70
but according to the TIM2 specification, every TIM2's first 16 bytes should be
Code:
54 49 4D 32 04 00 01 00 00 00 00 00 00 00 00 00
So there were apparently some similarities. Ignoring the first four bytes of the compressed file, they were the same, but the compressed file has 94 00 where the real file should have 00 00 00 00 00 00 00 00. This was my first clue.
As I couldn't figure it out, I booted PCSX2 and ran the game. Immediately when the game starts, you see the "Broccoli" image. I generated a Save State and exited the emulator.
Save states are just snapshots of the console's RAM at runtime, and because I could see the graphics in-game, I knew they were decompressed to RAM. Therefore the RAM (or... save state file) contains an uncompressed TIM2 now.
I opened up the PCSX2 save state in a hex editor only to find it starts with the bytes "PK". A file that starts with "PK" is almost always a ZIP file, so I renamed the state to .zip and decompressed it. There were files and folders inside - it appears that PS2 save states are more complex than, say, SNES save states.
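For anyone following along, the "PK" check and the extraction can also be done from a script instead of renaming the file by hand. This is only a minimal sketch using Python's standard zipfile module; the function name `unpack_savestate` and its arguments are made up for illustration, and nothing here is specific to PCSX2.

```python
import zipfile
from pathlib import Path


def unpack_savestate(state_path, out_dir):
    """Extract a save state that is really a ZIP archive; return member names.

    Hypothetical helper: the name and arguments are illustrative only.
    """
    state_path = Path(state_path)
    with open(state_path, "rb") as f:
        magic = f.read(2)
    # Quick look at the first two bytes, then let zipfile confirm it properly.
    if magic != b"PK" or not zipfile.is_zipfile(state_path):
        raise ValueError(f"{state_path} does not look like a ZIP archive")
    with zipfile.ZipFile(state_path) as zf:
        zf.extractall(out_dir)
        return zf.namelist()
```

The file extension does not matter to zipfile, so the rename step becomes optional when doing it this way.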
I didn't know which file I should look into so I opened them all in a hex editor and looked for "TIM2". All returned no results except one, eeMemory.bin. I knew this was my file where the pure TIM2 would reside.
There were many results for TIM2, because the same state holds many other compressed TIM2 files inside it, probably waiting to be decompressed. I searched and searched and amidst all the 94 00-type of TIM2s, I found one 00 00 00 00 00 00 00 00-type of TIM2. This had to be the Broccoli image. I copied the bytes out to a new file and started comparing it to the compressed TIM2.
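The manual search described above could be automated roughly like this. It is a sketch, not the exact procedure used here: it scans a memory dump for every "TIM2" marker and flags the ones whose 16-byte header ends in eight zero bytes, which is the pattern that separated the uncompressed file from the 94 00-type ones. The function name is hypothetical.

```python
TIM2_MAGIC = b"TIM2"


def find_tim2_offsets(data):
    """Yield (offset, looks_uncompressed) for every 'TIM2' marker in data."""
    pos = data.find(TIM2_MAGIC)
    while pos != -1:
        header = data[pos:pos + 16]
        # Uncompressed headers are 16 bytes long and end with eight zero bytes.
        plain = len(header) == 16 and header[8:] == bytes(8)
        yield pos, plain
        pos = data.find(TIM2_MAGIC, pos + 1)
```

Typical use would be something like `[off for off, plain in find_tim2_offsets(open("eeMemory.bin", "rb").read()) if plain]`, which should leave only a handful of candidates to inspect by hand.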
Essentially, at this point I knew what a compressed file looks like and what a decompressed file looks like. All I was left to figure out was "what happens in-between", as in, how does the game transform the compressed 10 KB TIM2 into the decompressed 77 KB TIM2?
This is a typical analysis method I use when I'm in a situation like this: I lay both files' bytes out one above the other, look for similarities, and pad the gaps with spaces. Here's a screenshot of my text editor:
On the top row, you see the compressed bytes. On the bottom row, you see the decompressed bytes. I've indented them so they match as much as possible and started writing my notes below.
The 1-byte commands were immediately obvious. $07 means "read 8 bytes as is" and $04 means "read 5 bytes as is", so it can be safely deduced that one-byte commands N mean "read N+1 bytes and print them to output".
I wrote down things I could deduce of the two-byte commands as much as I could, and started looking at the bytes and what they do. What made me think was the 80 07 command, because it output 2C 01 00, three bytes that were already in the output, eight bytes before.
My hypothesis was that in these two-byte commands the first byte encodes the amount and the second byte encodes the distance minus one (the second byte was 07 and the data was 8 bytes back, so it seemed obvious).
I tested this hypothesis by looking at some other 2-byte commands, too, and they all obeyed this rule. Look at 88 33, then look what's $33(+1) bytes before it. This had to be it.
So I started writing a decompressor in Python and got great output. Some parts were wrong, though, as the hex editor told me when I did a binary diff between my output file and the file I snatched from the save state. It appeared to fail on parts where it needed to retrieve data from very far away.
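A hex editor's binary diff does this job fine, but if you want the same check inside a script, a minimal sketch looks like the following. The helper name `first_mismatch` is made up; it just reports where two byte strings first diverge, which is handy for finding the command the decompressor tripped on.

```python
def first_mismatch(a, b):
    """Return the offset of the first differing byte, or None if identical."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    # All shared bytes match; a length difference still counts as a mismatch.
    if len(a) != len(b):
        return min(len(a), len(b))
    return None
```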
So, my program couldn't "reach" back far enough to copy the bytes it needed. Why? I made some fixes to my program so it would be more verbose and, sure enough, I noticed that on the 2-byte commands it failed on, the second byte was abnormally small, but the last (lowest) bit of the first byte was set. It then dawned on me that the two-byte commands need to be read as 16 bits, with a split of 6/10 (amount/distance) instead of the typical 8/8. Tried that and boom, it works. I got the Broccoli image out and a binary diff between my output and the original file said "these files are identical".
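To make the rules above concrete, here is a minimal sketch of a decompressor built only from what is described in this post. It is not the actual script, and several details are guesses that are marked in the comments: that a set top bit is what distinguishes two-byte commands (80, 88 and 94 all have it, 07 and 04 don't), that the amount is the remaining 5 bits plus 3 (80 07 produced 3 bytes, and 94 00 would have to produce the 8 zero bytes of the header), and that the header to skip is 3 bytes long, since in the sample the fourth byte, 07, already behaves like a literal command.

```python
def decompress(data, header_len=3):
    """Sketch of the scheme described in the post; details marked ASSUMED."""
    out = bytearray()
    pos = header_len  # ASSUMED: bytes before the first command are a header
    while pos < len(data):
        cmd = data[pos]
        pos += 1
        if cmd < 0x80:  # ASSUMED: top bit clear means a one-byte command
            # One-byte command N: copy the next N+1 bytes as they are.
            count = cmd + 1
            out += data[pos:pos + count]
            pos += count
        else:
            # Two-byte command: read 16 bits and split them 6/10.
            word = (cmd << 8) | data[pos]
            pos += 1
            count = ((word >> 10) & 0x1F) + 3  # ASSUMED: flag bit dropped, +3
            distance = (word & 0x3FF) + 1      # stored value is distance - 1
            if distance > len(out):
                raise ValueError(f"back-reference past start near offset {pos}")
            # Byte-by-byte copy so that overlapping runs repeat correctly.
            for _ in range(count):
                out.append(out[-distance])
    return bytes(out)
```

Feeding it the first bytes of the compressed sample, `decompress(bytes.fromhex("012C80 07 54494D3204000100 9400"))`, gives back `54 49 4D 32 04 00 01 00` followed by eight `00` bytes, i.e. the 16-byte header the TIM2 specification calls for. That only shows the sketch is consistent with the examples in this post, not that it handles every file.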
So I started running the script on other files, and got a few others out, but it's still failing on some files. That's where I am now. Gonna try and figure it out later.
EDIT: I see now. The output files aren't "just" TIM2 files, but clusters of many TIM2 files, just glued together. A001 alone contains 11 TIM2s. Some are still fucked up but I'm getting there....
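Splitting such a cluster can be sketched very naively by cutting at every "TIM2" marker. This is an assumption-heavy shortcut rather than a proper parser: pixel data could contain those four bytes by chance, anything before the first marker is dropped, and a robust splitter would rely on the size fields in the TIM2 headers instead. The function name is made up.

```python
def split_tim2_cluster(blob):
    """Naively split a blob of concatenated TIM2 files at each 'TIM2' marker."""
    starts = []
    pos = blob.find(b"TIM2")
    while pos != -1:
        starts.append(pos)
        pos = blob.find(b"TIM2", pos + 4)
    # Each piece runs from its marker to the next marker (or the end of blob).
    ends = starts[1:] + [len(blob)]
    return [blob[s:e] for s, e in zip(starts, ends)]
```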
EDIT EDIT EDIT EDIT: I'm prettttttttty sure I nailed everything down now. As a test, I exhaustively checked a huge 14MB file, A021, and got these 159 images out. Every byte in the compressed file has been read, so I'm certain I'm not missing anything (except maybe CLUTs, so if you see some funny colors, they can probably be fixed super easily but you just need to do it manually...) Check these out:
I will get back to this!