The VG Resource

Full Version: Puggsoy's guide to bytes n stuff
Please read this before reading the lesson!

Hey everyone! So as I explained in this thread, I wanted to make a thread where I teach people stuff that's useful if you want to do things like investigate file formats or hack ROMs. This is that thread!

Before we start I should point out a few things. First off, this is not your typical tutorial thread. Rather than me just explaining some steps and hoping you understand, it'll be more of a two-way learning experience; sort of like an online classroom.
Every week, I'll post a "lesson", where I cover a specific topic. Then, for that week people can discuss the topic, ask questions, and so forth. You can say whether or not you understood my explanation, if I missed something out, or if you need help with a specific point. This will hopefully make it more helpful and fun for both sides!

Secondly, please read the entire lesson! Even if you get tripped up at a certain point in a lesson, not every part relies on understanding the previous parts. Sometimes you might even find that a later part of a lesson helps you understand an earlier point. In the end, if you've read a lesson and there are still parts you're not confident about, then I can try to clarify.

The third point is that I'm not going to try to aim these lessons towards any particular use. While my personal expertise is in file formats, and the content will naturally lean in that direction, it should also be useful for other things like ROM hacking, or just be something cool to know that might come in handy in the future.

Lastly, this thread is for anybody. If you're interested, just join in! I'm going to try to make this followable even if you have zero experience in this sort of thing, so don't be shy! This thread is based around feedback and discussion, so the more people the better!

I think that's all, so without further ado, let's begin our first lesson!



Lesson 1: Bits

Every file on a computer is made out of bytes. Documents, images, programs, all of them are a whole bunch of bytes. And bytes are just numbers, really. But before you can start to understand how these numbers make up files, you need to learn how bytes are made up of bits.

Now a computer is a pretty simple thing. At its core, it can only really understand one of two states: on and off. This state is called a bit. This seems way too little to be able to do anything with, but let's start small. With these states we can already represent two numbers: off is 0, and on is 1. But of course, we want to count higher than 1. How do we do that?

Well, let's just think about how we count. We've got 10 digits; 0 to 9. We can count up to 9, and if we need to go higher what we do is we go back to 0, then add a 1 to the left of it, which results in 10. Since the first place can't go any higher, we reset it and just increase the next one. Then we can count from 10 to 19, and then we do the same; set the first place to 0, and increase the second one, which gives us 20.
This continues going on until we reach 99. The first place can't go any higher, so we reset to 0 and increase the second. The second can't go any higher either, so we reset to 0 and increase the third. So we get 100.

So this is pretty straightforward right? Of course it is, you've counted like this all your life. And actually, computers count like this as well. The main difference is that we have ten digits; that is, we count in base ten. Computers on the other hand, as we've seen, only have two digits; they count in base two, also called binary. But how they count is exactly the same way. I'll show you, let's count in binary.

So you go 0, then 1. Welp, we've run out of digits already! No worries, we can just reset and increase the second place. So we get 10. Wait, what?
Maybe we need some notation to make it clearer. What I'm going to do is when I'm representing a number in binary, I'll precede it with "b", and when I'm representing it in base ten I'll just not precede it with anything. The "b" doesn't actually mean anything, it's just there so you know it's a binary number, not a base ten number.
So let's start again. We go b0, then b1. Running out of digits, we increase the second place and get b10. Now, this has a value of 2! The number after 1 is 2, but in binary the digit 2 doesn't exist, so we have to add an extra place.

Make sense? Let's count a bit higher, so we can see how this progresses:

2 = b10
3 = b11
4 = b100
5 = b101
6 = b110
7 = b111
8 = b1000
9 = b1001
10 = b1010
11 = b1011
etc.

It's just like counting in our regular base ten, but just without the digits 2-9. You just increase at the first place, if you can't go further you reset and increase the second place. If the second place can't go further, you reset and increase the third place, and so on.
If it doesn't quite make sense and it just looks like I'm shifting ones and zeroes around, feel free to say. That's how the thread works!
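
If you'd like to watch the counting happen rather than do it by hand, here's a minimal Python sketch. Python isn't part of the lesson, it's just used here for illustration, and the helper name to_lesson_binary is made up. It prints numbers with the "b" notation used in this thread (Python's own notation is "0b", which is not used here).

```python
def to_lesson_binary(n):
    # format(n, "b") gives the binary digits of n without any prefix;
    # the "b" in front is just this thread's notation
    return "b" + format(n, "b")

# Count from 0 to 11, like the list above
for n in range(12):
    print(n, "=", to_lesson_binary(n))
```

The last few lines it prints should match the list above (for example 10 = b1010 and 11 = b1011).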

Now you can see that each place in binary is a bit. So we can represent numbers with a bunch of bits! Neat! As you may know a byte is made up of 8 bits. Often when talking about binary you talk about it in the context of bytes (as we will in this thread), so you want to show all 8 bits, even if they're not all necessary to represent the number. So you would display the number 0 like this:

b0000 0000

and 5 (b101) like this

b0000 0101

By the way, I'm separating every 4 bits because that makes it slightly easier to read. Most people do this, much like how some people separate large numbers by 3 digits (e.g. 1,234,567). (In case you're curious, 4 bits is called a nibble, but you don't really need to know that yet :P)
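
As a rough illustration of this padding and grouping, here's a small Python sketch. The function name byte_to_bits is hypothetical, and it assumes the value already fits in a byte.

```python
def byte_to_bits(value):
    # "08b" means: binary digits, padded with leading zeroes to 8 places
    bits = format(value, "08b")
    # split into two groups of 4 bits, purely for readability
    return "b" + bits[:4] + " " + bits[4:]

print(byte_to_bits(0))  # b0000 0000
print(byte_to_bits(5))  # b0000 0101
```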

Since a byte has 8 bits, it can hold whatever numbers we can fit into 8 places in binary. These are all numbers from 0 to 255, inclusive.

0 = b0000 0000
255 = b1111 1111

Since we're including 0, this means we have 256 different numbers. If you've done any kind of game ripping or hacking this might be a familiar number, and this is why. You only need a single byte to point to any specific colour in a 256 colour palette, for instance. But anyway, we'll probably talk more about that in a later lesson.
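
If you want to double-check the 0 to 255 range, here's a short Python sketch. It leans on Python's built-in bytes type, which refuses any value that doesn't fit in a single byte; the variable names are just for illustration.

```python
values_per_byte = 2 ** 8          # 8 bits, 2 states each
max_byte = values_per_byte - 1    # counting starts at 0

print(values_per_byte)  # 256
print(max_byte)         # 255

bytes([255])            # fine: 255 fits in one byte
try:
    bytes([256])        # one too many
except ValueError as err:
    print("256 does not fit in a byte:", err)
```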

This next part is kinda mathematical. Nothing too complex I hope, but I can understand if maths isn't your strong suit or it's just been a while. In that case just say what you're struggling with and I can try to help :)
Now the places in base ten go up by powers of 10; 1, 10, 100, 1000, and so on. In fact, there's a certain way you calculate a number written in base ten. For instance look at the number 453. What we do is we give each place a number from right to left, starting at zero:

[Image: 93Xdoif.png]

Now what we do for each digit is we take the base (10), raise it to the power of the place, then multiply that by the digit. Then you add those together and you have your value.

[Image: 7s9BzoF.png]

This is 400 + 50 + 3 (anything to the power of 0 is 1) which, as you know, is 453. This seems super trivial and pointless in base ten, but it works exactly the same way in binary.

Say we have b0110 0001. Since we're always looking at 8 bits there's gonna be 8 places, from 0 to 7. We also tend to call these the "#th" bits, like 0th bit, 1st bit, etc.

[Image: VKeS8pL.png]

Now we go about it the same way. We take the base (2), raise it to the power of the place, then multiply that by the digit, and add them together.

[Image: VBKRSsW.png]

In this case we only have the digits 1 and 0. We can ignore the 0 bits (since those will always give 0), and the 1 bits we can skip the multiplying step (since it's just multiplying by 1) so we just get:

[Image: mhi55kX.png]

And 64 + 32 + 1 = 97. Tada! That's how you read a binary number. Look at the positions of the 1s, work out those powers of 2 (2 to the power of the bit position) and then just add them together.
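
The same steps can be sketched in a few lines of Python. This is only an illustration, and read_binary is a made-up name. It walks the bits from right to left, so the rightmost bit is place 0, and adds up 2 to the power of the place for every 1 bit.

```python
def read_binary(bits):
    # bits is a string such as "0110 0001"; the space is only for readability
    digits = bits.replace(" ", "")
    total = 0
    # reversed() makes us start at the rightmost digit, which is place 0
    for place, digit in enumerate(reversed(digits)):
        if digit == "1":
            total += 2 ** place   # 0 bits contribute nothing, so skip them
    return total

print(read_binary("0110 0001"))  # 97
```

Python can also do this directly with int("01100001", 2), but the loop shows what is actually going on.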

Of course your powers of 2 might not be as intuitive as your powers of 10. After all, the latter is just putting zeroes at the end :P It's not too hard though, you just start at 1 and double until you reach your number. After a while you'll know these off by heart too.

[Image: J8Q2jpE.png]

This might look familiar if you've played 2048, since that increases in powers of 2 ;)
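
The "start at 1 and keep doubling" trick is easy to sketch in Python too (the variable names here are just for illustration). It builds the value of each of the 8 bit places in a byte.

```python
powers = []
value = 1
for place in range(8):   # places 0 to 7
    powers.append(value)
    value *= 2           # doubling gives the next power of 2

print(powers)       # [1, 2, 4, 8, 16, 32, 64, 128]
print(sum(powers))  # 255, i.e. all 8 bits set to 1
```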

So yeah, that's about it for this lesson! You know how to count in binary and how to read a binary number. You don't need to be able to do this in your head by the way, what's important is that you just understand the concepts. While working with bits may not come up very often, it's a really good place to start and gives you the fundamental understanding of how bytes are put together.
In the next lesson we'll zoom out and focus more on bytes themselves.
Awesome basic thread from an awesome non-basic member of the site. Love it puggsoy! Keep it up! :)
Good initiative, and easily accessible.
good job pug, really easy to read and entertaining. Perfect for beginners
Everyone saying how easily they grasp it, and here I come along to say most of it went over my head. :P

I understood a few parts but like... IDK, I can't really pinpoint stuff. Like I get about how you have 1s and 0s and have to shift 'em around... That's about it. I just know I am not grasping it very well overall.

EDIT: Nevermind I gots it.
The shifting is just mimicking the "carry over" algorithm we use when a sum reaches 10 or more, such as:

Code:
012
+19 -> 2+9=11, so 1 stays in the units digit and 1 is carried over to the tens digit
---
031
However, in base 10 we have 10 symbols to use (0 to 9) and in base 2 we have only 2 symbols to use (0 and 1). In other words, every time a sum would reach 2 in binary, you carry over to the next place.
I think a lesson on how a processor works (instructions, registers...) could also help with understanding bits.
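
As a rough sketch of that carry rule in Python (add_binary is a made-up helper, not anything standard), here is binary addition done digit by digit from right to left:

```python
def add_binary(a, b):
    # a and b are strings of binary digits, e.g. "11" and "1"
    width = max(len(a), len(b))
    a, b = a.zfill(width), b.zfill(width)   # pad with leading zeroes
    result = []
    carry = 0
    for da, db in zip(reversed(a), reversed(b)):
        total = int(da) + int(db) + carry
        result.append(str(total % 2))   # the digit that stays in this place
        carry = total // 2              # 1 whenever the sum reached 2 or more
    if carry:
        result.append("1")              # a final carry needs an extra place
    return "".join(reversed(result))

print(add_binary("11", "1"))  # 100 (3 + 1 = 4)
```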
Lesson 2: Bytes and Hexadecimal Numbers

Now that we've covered bits, let's talk about bytes. Chances are you'll be working with bytes more often than bits, so even if you didn't completely understand the previous lesson, you should be just fine. (It definitely helps if you did, though!)

Now as you know, a byte is made up of 8 bits, and the maximum of this is b11111111, which is 255. So each byte can hold a number between 0 and 255, that's 256 different numbers. As I mentioned, this is why 256 is a number you see come up a lot when dealing with things like image palettes.

Not too hard right? And really, there isn't much more to a byte than that! Like I said at the very start, all files in a computer are made up of these, each holding a number between 0 and 255. However, these numbers can be interpreted by programs (word processors, image viewers, etc.) and then presented to us in a way that makes sense.

Let's see how this works with something really basic: text files. Each character in a text file is actually a byte, which a text program (such as Notepad) displays to you as a character. For example, a byte with a value of 97 is displayed as the character a. Other letters are the same, and in a somewhat orderly fashion. 98 is b, 99 is c, and so on. This is called a character encoding, that is, the encoding is a system that the text program uses to know which numbers correspond to which characters.
For instance, if I had a text file containing the text "VG Resource", the file would be made up of these bytes (which I'm just representing as regular numbers):

86 71 32 82 101 115 111 117 114 99 101

The first byte, 86, is V. 71 is G, 32 is the space character, 82 is R, and so on.
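
You can check these numbers yourself with a couple of lines of Python (used here purely for illustration). Encoding the text as ASCII gives one byte per character, and decoding goes the other way.

```python
text = "VG Resource"
data = text.encode("ascii")   # turn the characters into bytes

print(list(data))
# [86, 71, 32, 82, 101, 115, 111, 117, 114, 99, 101]

# Going the other way: bytes 97, 98, 99 are the characters a, b, c
print(bytes([97, 98, 99]).decode("ascii"))  # abc
```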

Some people might expect me to mention ASCII right about now. But actually, the most common standards used are what Windows calls ANSI (which is technically an extended ASCII, typically Windows-1252) and UTF-8, since the original ASCII only uses 7 bits. In any case, all three standards use the same numbers for the basic characters (values 0 to 127), so it doesn't really matter for now. If you don't know what I'm talking about don't worry, just know that there are different types of character encodings (where certain numbers might be different characters) that can be used but they're all pretty similar, and in this case it doesn't matter.
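
Here's a small Python sketch of that point. It assumes "ANSI" means Windows-1252, which Python calls cp1252; that is an assumption for the sake of the example, since "ANSI" can refer to other code pages on differently configured systems. Plain English text comes out as the same bytes in all three encodings, and only characters outside the basic set differ.

```python
text = "VG Resource"
encodings = ["ascii", "cp1252", "utf-8"]
encoded = [text.encode(name) for name in encodings]

# All three give identical bytes for this text
print(encoded[0] == encoded[1] == encoded[2])  # True

# An accented e ("\u00e9") is where they start to disagree
accented = "\u00e9"
print(accented.encode("cp1252"))  # b'\xe9'      (one byte)
print(accented.encode("utf-8"))   # b'\xc3\xa9'  (two bytes)
```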

Anyway, that's one way that you can see how bytes can be interpreted as text. How about an image? Well, suppose we have a file that contains the following bytes:

4 3 0 1 2 3 1 2 3 4 2 3 4 5

Let's also say that there's a list of six colours at the end of the file. Don't worry right now about how these colours are stored, just that we can access and display them. Here's the list:

[Image: zEMbnRz.png]

The file has the following format. The first byte is the width of our image, and the second is the height. In this case these are 4 and 3, which means it's a 4x3 image (very small, but good for explanation purposes).
Every byte after that represents a pixel of a colour from our list of six (starting at zero). This means that any pixel byte with the value of 0 is a magenta pixel, 1 is a cyan pixel, and so on. So let's take our bytes and draw them out as pixels (zoomed in a bit, so you can see them properly):

[Image: 3NyRahd.png]

But wait a second, our image is 4x3, not 12x1! Well, the way we deal with that is simply to start on a new row once we reach the image width:

[Image: ql9oZ7p.png]

If this was a bit hard to follow, remember you can always ask for a more thorough explanation :) In any case, this is just to showcase how an image viewing program would go through the bytes in this type of file to display an image. You don't have to remember this format specifically (it's just something I made up, albeit based on existing formats); you're good as long as you understand the concept of how a bunch of bytes can be interpreted as an image.
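
To make the reading process a bit more concrete, here's a minimal Python sketch of how a program might walk through the made-up format above. It only deals with the width, the height and the pixel bytes; the colour list is left out, since how the colours are stored isn't covered here. The variable names are just for illustration.

```python
data = bytes([4, 3, 0, 1, 2, 3, 1, 2, 3, 4, 2, 3, 4, 5])

width = data[0]    # first byte: image width
height = data[1]   # second byte: image height
pixels = data[2:2 + width * height]   # one byte per pixel after that

# Start a new row every time we've read `width` pixels
rows = []
for y in range(height):
    row = list(pixels[y * width:(y + 1) * width])
    rows.append(row)

for row in rows:
    print(row)
# [0, 1, 2, 3]
# [1, 2, 3, 4]
# [2, 3, 4, 5]
```

Each number printed is an index into the colour list, so a real viewer would look up the colour for each one before drawing it.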

So now you've seen some simple examples of how files are made up of bytes, and how programs go about reading and displaying them as useful information. However, to take a look at the bytes of real files people usually use a hex editor. Bytes in a hex editor are displayed in base 16, also known as hexadecimal (or hex for short). So before we can dive into using one, we need to understand how hex works first.

This is kinda similar to learning binary. The concepts are the same, but instead of having only two digits, we now have sixteen digits. But wait a second. In binary we could just use the first two regular digits, 0 and 1. We don't have any digits past 9 though! Well, to compensate we just use letters. So, the first ten digits are the same as base ten; we just go from 0 to 9. However, what we would write as 10 in base ten, we write as A in hex. 11 will be B, 12 will be C, and so on until we reach 15, which is F. Then, to go to 16, we do what we do when we run out of digits in any base; we set the first place to 0, increase the second place, and so we get 10.
Now before we go any further, I should introduce some notation again. Much like we used "b" to denote a binary number, I'm going to use "0x" to denote a hexadecimal number. The reason I'm using that (rather than something like "h") is just because that's how hexadecimal numbers are denoted in programming languages, as well as other places. Some people do use "$", it doesn't really matter as long as you know what it means. In any case, I'll be using "0x" from here on out.

Now that's established, here's a list of numbers counting up in hex so you can get a bit of a feel for it.

0 = 0x0
1 = 0x1
2 = 0x2
...
8 = 0x8
9 = 0x9
10 = 0xA
11 = 0xB
12 = 0xC
13 = 0xD
14 = 0xE
15 = 0xF
16 = 0x10
17 = 0x11
18 = 0x12

And so on. It works like any other base, it just has 16 digits. Just like base two or base ten, once it runs out it increases the next place and starts over. So when you get to 0x1F you'll increment it and get 0x20, and once you reach 0xFF you'll go to 0x100.
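
If you'd like to see those roll-over points for yourself, here's a short Python sketch (to_hex is a made-up helper name). Python happens to use the same "0x" notation, although its built-in hex() prints lowercase letters, so format() with "X" is used here to get uppercase.

```python
def to_hex(n):
    # format(n, "X") gives uppercase hex digits without a prefix
    return "0x" + format(n, "X")

for n in [9, 10, 15, 16, 31, 32, 255, 256]:
    print(n, "=", to_hex(n))
# 9 = 0x9
# 10 = 0xA
# 15 = 0xF
# 16 = 0x10
# 31 = 0x1F
# 32 = 0x20
# 255 = 0xFF
# 256 = 0x100
```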

Now you might be thinking,

"Hey puggsoy this is cool and all, but why display bytes in hex? Why can't hex editors just use normal numbers?"

Well, the reason is that 0xFF equals 255, that is, the highest number a byte can hold can be represented as the highest number a two-digit hex number can hold. Which means that any byte can be represented by just a two-digit hex number. And this turns out to be pretty convenient.
It also means that you can think of the maximum boundary of a byte being the maximum boundary of two hex digits. In base ten, once you reach 255 you can keep going higher and still use the same number of digits (256, 257, etc). But in hex, if you want to go higher than 0xFF you need to use three digits, and you can kind of say "well if I need more than two, that means I'm going beyond the byte limit". In the long run, it's an easier and tidier way of representing bytes.

By the way, much like how in binary it's common to write leading zeroes even if they're not necessary (so that you show all 8 bits), in hex it's common to write both digits, even if the first one is 0. So when I write 0x0A, it's clearer that I'm talking about a byte with the value 10, rather than 0xA which isn't showing the whole byte. I can also write 0x00 to represent a byte with the value of 0 (also known as a "null byte"). Keep in mind that null bytes are just as important as any other byte, a value of 0 still means something.
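
As a small illustration of the two-digit convention, here's a Python sketch that prints the "VG Resource" bytes from earlier the way they are usually written in hex, two digits per byte. The variable names are only for illustration.

```python
data = "VG Resource".encode("ascii")

# "02X" means: uppercase hex, padded with a leading zero to 2 digits
hex_bytes = [format(b, "02X") for b in data]
print(" ".join(hex_bytes))
# 56 47 20 52 65 73 6F 75 72 63 65

print(format(10, "02X"))  # 0A, the whole byte is shown
print(format(0, "02X"))   # 00, a null byte
```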

Now unlike binary, hex is probably something you'll be seeing a lot of since, as I said, it's likely you'll be working with bytes a lot more than you will with bits. However, you don't have to work out how to read hex and convert it in your head. There are many converters out there, although my personal tool of choice is the programmer mode of the Windows calculator. This not only allows you to switch between decimal (base ten) and hexadecimal, but you can also do calculations if you need to. It also has binary, if you ever need to work with that.
If you're not on a Windows machine (or just don't want to use the calculator), you can probably find something else that does a similar job. Either way, I'd recommend messing around with it a little bit, convert between some numbers and try having it count up in hexadecimal (by repeatedly adding 1). Even though you don't need to know it off by heart, it's good if you get a bit of a feel for the base.
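
One option that works on any system is a Python prompt. This is only a suggestion and not something the lesson depends on; the sketch below converts between the three bases and counts up in hex by repeatedly adding 1.

```python
# Hex or binary text to a regular number: int(text, base)
print(int("FF", 16))        # 255
print(int("01100001", 2))   # 97

# A regular number to hex or binary text
print(format(255, "X"))     # FF
print(format(97, "08b"))    # 01100001

# Counting up in hex by repeatedly adding 1
value = 0x1D
counted = []
for _ in range(5):
    counted.append(format(value, "X"))
    value += 1
print(counted)  # ['1D', '1E', '1F', '20', '21']
```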

Anyway, that's where we're gonna stop today. In the next lesson we'll actually start using a hex editor and looking at some files!