07-21-2017, 09:59 PM
This thread is an attempt to collect information about the new texture format used on the Nintendo Switch, BNTX (Binary Texture). It's different from the formats used on previous Nintendo consoles (like the Wii U and the 3DS, at least afaik).
I started to write a tool to extract textures from the bntx container. Currently, it supports the following formats (list not guaranteed to be up-to-date):
- BC1 (DXT1)
- BC2 (DXT3)
- BC3 (DXT5)
- BC4
- BC5
- RGBA8888
- RGB565
BNTX texture tool: https://github.com/gdkchan/BnTxx
I also made a tool to extract the Switch RomFS. Discussion about said container is out of the scope of this thread, but if anyone is interested, the tool can be found here: https://gist.github.com/gdkchan/635187f5...53493f275f.
Overview of the format:
BNTX is basically a texture container. It can contain multiple textures, has a PATRICIA-trie based dictionary that allows quick access to textures using their names as keys, and also has a relocation table, so the binary can be loaded anywhere in memory and its addresses can easily be converted from relative offsets to absolute pointers.
Sections:
- BNTX Main header, contains pointers to the other sections, and also some lengths.
- _STR String table section. The first name is always an empty string "\0", used by the root node of the tree.
- _DIC Dictionary using the PATRICIA tree, each node has 16 bytes.
- BRTI Texture information. There is one for each texture in the file.
- BRTD Texture data. A BNTX file contains only one of these sections, with all textures inside. Textures are aligned to 0x800-byte blocks, and the 16-byte header comes before the data.
- _RLT Relocation table, it's the last section in the file and contains the addresses of all pointers inside the file.
All the sections that start with _ can be ignored if one just wishes to extract textures, because all the data can be obtained from the other sections too. The BRTD header can also be ignored, because the only information it contains is the length of the data section (which is only useful if you're going to read it into memory and use the buffer directly).
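To get a quick look at where the sections sit in a file, one option is to simply search for the magic strings listed above. The sketch below is only that, a brute-force search: the function name find_sections is made up for illustration, it does not read the pointers in the main header, and it can report false positives if texture data happens to contain the same four bytes.

```python
from typing import Dict, List

# Section magics as listed in this post.
SECTION_MAGICS = (b"BNTX", b"_STR", b"_DIC", b"BRTI", b"BRTD", b"_RLT")


def find_sections(data: bytes) -> Dict[str, List[int]]:
    """Return every offset at which each section magic occurs in data."""
    found: Dict[str, List[int]] = {}
    for magic in SECTION_MAGICS:
        offsets = []
        pos = data.find(magic)
        while pos != -1:
            offsets.append(pos)
            pos = data.find(magic, pos + 1)
        found[magic.decode("ascii")] = offsets
    return found


if __name__ == "__main__":
    # Synthetic buffer, NOT a real BNTX layout: just magics separated by padding.
    sample = b"BNTX" + bytes(12) + b"_STR" + bytes(4) + b"BRTI" + bytes(8) + b"BRTI" + b"BRTD"
    for name, offsets in find_sections(sample).items():
        print(name, [hex(o) for o in offsets])
```

A file with several textures should show one BRTI hit per texture and a single BRTD hit.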
Swizzling:
Switch textures use swizzling. On DXT compressed textures the swizzle is applied to the address of the 4x4 tiles, and on non-compressed textures it is applied to the address of each pixel. Within the tile address, the bits from the X and Y coordinates are distributed using this pattern: yyyy x yy x y. Past a certain point, however, it seems to switch to linear addressing. That point is reached either when the available bits run out, or when the biggest tile size (which is 4/8/16/32x128, see below for details on the width/X pad) is hit. Take this information with a grain of salt, since it's not guaranteed to be (and most likely isn't) accurate.
Anyway, here is a real example (from a 512x512 DXT5 texture) that may help you better understand the address format:
x x x x x y y y y x y y x y 0 0 0 0
Note that the entire address has 18 bits, which is enough to cover the 0x40000 bytes of texture data (DXT5 stores one byte per pixel, so the last address is 512 * 512 - 1 = 0x3ffff). Since we're talking about DXT5 textures here, the lower 4 bits are the address inside the 16-byte tile data block. This part is linear and you don't need to worry about it.
It will keep the pattern for the biggest tile size that can still fit inside the texture (see exception below). That is 4x128 on a 2048x2048 (512x512 tiles) DXT5 texture for example, or 4x16 on a DXT5 texture with something like 30x21 tiles. It decides whether to round up or down based on the wasted height. Below you can find pseudo-code that shows how it's calculated:
Code:
// Note: Perform rounding only if the number is NOT a power of 2 already, otherwise the code below can be ignored entirely.
height_rounded_up = pow2_round_up(height)
height_rounded_down = pow2_round_down(height)
IF height <= height_rounded_down + height_rounded_down / 3 THEN
    height = height_rounded_down
ELSE
    height = height_rounded_up
END IF
For a texture with 30x21 tiles, for example, a 4x16 block is used, while a texture with 30x22 tiles uses 4x32 blocks, and the texture data needs to be padded accordingly.
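The rounding rule above can be written as a small Python sketch. The function names are my own, heights are measured in tiles, and the cap at 128 comes from the "biggest tile size" mentioned earlier; treat it as a restatement of the pseudo-code, not as what the hardware does.

```python
def pow2_round_down(n: int) -> int:
    """Largest power of two that is <= n (n must be >= 1)."""
    return 1 << (n.bit_length() - 1)


def pow2_round_up(n: int) -> int:
    """Smallest power of two that is >= n (n must be >= 1)."""
    return 1 << (n - 1).bit_length()


def block_height(height_in_tiles: int, max_height: int = 128) -> int:
    """Block height picked by the rounding rule, capped at max_height."""
    if height_in_tiles < 1:
        raise ValueError("height must be at least 1")
    if height_in_tiles & (height_in_tiles - 1) == 0:
        # Already a power of two: no rounding needed.
        rounded = height_in_tiles
    else:
        down = pow2_round_down(height_in_tiles)
        up = pow2_round_up(height_in_tiles)
        # Round down while the height is within a third of the lower power of two.
        rounded = down if height_in_tiles <= down + down // 3 else up
    return min(rounded, max_height)


if __name__ == "__main__":
    for h in (21, 22, 512):
        print(h, "->", block_height(h))
```

With the two cases from the text, a height of 21 tiles gives 16 and a height of 22 tiles gives 32.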
It's currently unknown if this only applies to compressed textures or all types, or even if this is accurate to what the hardware does, but this worked for all observed textures so far.
The "real swizzling" only seems to take place starting at bit 4. So, for example, DXT5 has 4 bits of the address for addressing inside the 16-byte tile. On DXT1, on the other hand, each tile only uses 8 bytes, so only the lower 3 bits are used for addressing inside the tile. The extra "0" is filled with x. So, following the above swizzle, we have for DXT1:
... y y y y x y y x y x 0 0 0
And, for rgba8888, each pixel uses 4 bytes, so only the lower 2 bits are used for addressing inside the pixel color. We therefore have:
... y y y y x y y x y x x 0 0
For RGB565/RGB5551 and other 16-bit formats:
... y y y y x y y x y x x x 0
and so on...
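The four layouts above follow one rule: the yyyy x yy x y pattern always starts at bit 4, the bits needed to address inside one element are zero, and whatever is left below bit 4 is filled with x. A short sketch that regenerates the strings (address_bit_layout is a name I made up, and it only accepts the four element sizes discussed here):

```python
# Swizzle pattern quoted in the post, most significant bit first.
SWIZZLE_PATTERN = "yyyyxyyxy"


def address_bit_layout(bytes_per_element: int) -> str:
    """Layout of the low address bits for a tile/pixel of the given size."""
    if bytes_per_element not in (2, 4, 8, 16):
        raise ValueError("only the element sizes covered in the post are supported")
    inner_bits = bytes_per_element.bit_length() - 1  # bits addressing inside one element
    extra_x_bits = 4 - inner_bits                    # filler below bit 4, taken from x
    return " ".join(SWIZZLE_PATTERN + "x" * extra_x_bits + "0" * inner_bits)


if __name__ == "__main__":
    for name, size in (("DXT5", 16), ("DXT1", 8), ("RGBA8888", 4), ("RGB565", 2)):
        print(f"{name:9s} ... {address_bit_layout(size)}")
```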
My current theory is that this was done to make hardware implementation simpler, maybe, since swizzling takes place at the same position, but since this is not my field I could be totally wrong here.
You can find a most likely shite implementation of the above swizzle here:
https://github.com/gdkchan/BnTxx/blob/ma...zleAddr.cs
Note that the upper bits of the address use linear addressing. So you need to calculate that part as x + y * remaining_width, using the x and y bits left over after the swizzled part, and shift the result to place it at the top bits. This is necessary for non-power of 2 textures.
The observed texture data width seems to be padded. On tiled textures, it seems to be padded so that the width is always a multiple of 4 (and 8 for 64-bit formats?). For RGBA8888, it seems to be padded to a multiple of 16, and on RGB565/L8A8 to a multiple of 32.
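Putting the pattern and the linear upper part together for the DXT5 case gives something like the sketch below. This is my own reading of the description, not the code from the tool linked above. It assumes 16-byte tiles, a block 4 tiles wide, a block height between 16 and 128 tiles, and a width in tiles that is already padded to a multiple of 4. The name dxt5_tile_offset and its parameters are made up for illustration.

```python
def dxt5_tile_offset(x: int, y: int, width_in_tiles: int, block_height: int = 128) -> int:
    """Byte offset of the 4x4 tile at tile coordinates (x, y)."""
    if block_height not in (16, 32, 64, 128):
        raise ValueError("sketch only covers block heights 16, 32, 64 and 128")
    y_bits = block_height.bit_length() - 1

    # Swizzled part, least significant bit first: y0 x0 y1 y2 x1 y3 y4 y5 y6
    # (this is "yyyy x yy x y" read from the right; unused upper y bits are dropped).
    swizzled = (
        (y & 1)
        | ((x & 1) << 1)
        | (((y >> 1) & 3) << 2)
        | (((x >> 1) & 1) << 4)
        | (((y & (block_height - 1)) >> 3) << 5)
    )
    swizzled_bits = 2 + y_bits  # two x bits plus all y bits inside the block

    # Linear part: x + y * remaining_width, computed on the leftover bits.
    remaining_width = width_in_tiles // 4
    upper = (x >> 2) + (y >> y_bits) * remaining_width

    # The low 4 bits address the bytes inside the 16-byte tile.
    return ((upper << swizzled_bits) | swizzled) << 4


if __name__ == "__main__":
    # 512x512 DXT5 texture = 128x128 tiles.
    print(hex(dxt5_tile_offset(1, 0, 128)))
    print(hex(dxt5_tile_offset(127, 127, 128)))
```

For the 512x512 example this reproduces the x x x x x y y y y x y y x y 0 0 0 0 layout, and every tile gets a distinct 16-byte slot.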
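The multiples listed above (4, 8, 16, 32) all correspond to 64 bytes per block row once multiplied by the element size (16, 8, 4 and 2 bytes). That is my own inference from these four numbers, not something confirmed anywhere, but under that assumption the padded width can be sketched as follows. The function names are hypothetical, and the width is in tiles for compressed formats and in pixels otherwise.

```python
def block_width(bytes_per_element: int) -> int:
    """Elements per block row, assuming a block row is 64 bytes wide."""
    return 64 // bytes_per_element


def padded_width(width: int, bytes_per_element: int) -> int:
    """Round width up to the next multiple of the block width."""
    multiple = block_width(bytes_per_element)
    return (width + multiple - 1) // multiple * multiple


if __name__ == "__main__":
    print(padded_width(30, 16))   # DXT5, width in tiles
    print(padded_width(100, 4))   # RGBA8888, width in pixels
    print(padded_width(100, 2))   # RGB565, width in pixels
```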
Some textures:
These are some textures extracted from Puyo Puyo Tetris, the game I'm using to do this research, and also one of the few games that interest me on the Switch currently:
Any suggestions for improvement, corrections or new information are welcome.
TODO list:
- Figure out how non-compressed textures are swizzled (they seem to be encoded into 8x8 tile blocks but I'm not sure yet).
- Add support for more formats
- Support cubemap textures