01-15-2017, 08:55 AM
Please read this before reading the lesson!
Hey everyone! So as I explained in this thread, I wanted to make a thread where I teach people stuff that's useful if you want to do things like investigate file formats or hack ROMs. This is that thread!
Before we start I should point out a few things. First off, this is not your typical tutorial thread. Rather than me just explaining some steps and hoping you understand, it'll be more of a two-way learning experience; sort of like an online classroom.
Every week, I'll post a "lesson", where I cover a specific topic. Then, for that week people can discuss the topic, ask questions, and so forth. You can say whether or not you understood my explanation, if I missed something out, or if you need help with a specific point. This will hopefully make it more helpful and fun for both sides!
Secondly, please read the entire lesson! Even if you get tripped up at a certain point, not every part relies on understanding the previous parts. Sometimes you might even find that a later part of a lesson helps you understand an earlier point. In the end, if you've read a lesson and there are still parts you're not confident about, I can try to clarify.
Third, I'm not going to aim these lessons at any particular use. While my personal expertise is in file formats, and the content will naturally lean in that direction, it should also be useful for other things like ROM hacking, or just be something cool to know that might come in handy in the future.
Lastly, this thread is for anybody. If you're interested, just join in! I'm going to try to make this easy to follow even if you have zero experience in this sort of thing, so don't be shy! This thread is built around feedback and discussion, so the more people the better!
I think that's all, so without further ado, let's begin our first lesson!
Lesson 1: Bits
Every file on a computer is made out of bytes. Documents, images, programs, all of them are a whole bunch of bytes. And bytes are just numbers, really. But before you can start to understand how these numbers make up files, you need to learn how bytes are made up of bits.
Now a computer is a pretty simple thing. At its core, it can only really understand one of two states: on and off. A single on/off value like this is called a bit. That seems like far too little to do anything with, but let's start small. With these two states we can already represent two numbers: off is 0, and on is 1. But of course, we want to count higher than 1. How do we do that?
Well, let's just think about how we count. We've got 10 digits; 0 to 9. We can count up to 9, and if we need to go higher what we do is we go back to 0, then add a 1 to the left of it, which results in 10. Since the first place can't go any higher, we reset it and just increase the next one. Then we can count from 10 to 19, and then we do the same; set the first place to 0, and increase the second one, which gives us 20.
This continues until we reach 99. The first place can't go any higher, so we reset it to 0 and increase the second. The second can't go any higher either, so we reset it to 0 and increase the third. So we get 100.
So this is pretty straightforward, right? Of course it is; you've counted like this all your life. And actually, computers count like this as well. The main difference is that we have ten digits; that is, we count in base ten. Computers, on the other hand, as we've seen, only have two digits; they count in base two, also called binary. But they count in exactly the same way. Let me show you by counting in binary.
So you go 0, then 1. Welp, we've run out of digits already! No worries, we can just reset and increase the second place. So we get 10. Wait, what?
Maybe we need some notation to make it clearer. What I'm going to do is when I'm representing a number in binary, I'll precede it with "b", and when I'm representing it in base ten I'll just not precede it with anything. The "b" doesn't actually mean anything, it's just there so you know it's a binary number, not a base ten number.
So let's start again. We go b0, then b1. Running out of digits, we increase the second place and get b10. Now, this has a value of 2! The number after 1 is 2, but in binary the digit 2 doesn't exist, so we have to add an extra place.
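Programming languages have their own marker for the same idea. Python, for example, writes binary literals with a 0b prefix (rather than the plain "b" used in this thread), and its built-ins bin() and int(text, 2) convert between the two notations. A quick check that b10 really is 2:

```python
# Python's binary literal prefix is 0b (this thread just writes "b").
two = 0b10
print(two)            # 2

# bin() turns a number into a binary string, including the 0b prefix.
print(bin(2))         # 0b10

# int(text, 2) reads a string of binary digits back into a number.
print(int("10", 2))   # 2
```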
Make sense? Let's count a bit higher, so we can see how this progresses:
2 = b10
3 = b11
4 = b100
5 = b101
6 = b110
7 = b111
8 = b1000
9 = b1001
10 = b1010
11 = b1011
etc.
It's just like counting in our regular base ten, only without the digits 2-9. You increase the first place; if it can't go any further, you reset it and increase the second place. If the second place can't go further, you reset it and increase the third place, and so on.
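To show that the rule really is the same in both bases, here is a small Python sketch of "bump the first place, and if it's maxed out, reset it and carry to the next place". The function name increment is just made up for illustration, and it only handles bases up to 10 (digits 0-9). The same function counts in base ten or in binary depending on the base you pass in:

```python
def increment(digits: str, base: int) -> str:
    """Add 1 to a number written as a string of digits in the given base.

    Mirrors the counting rule: bump the rightmost place; if it can't go
    any higher, reset it to 0 and bump the next place to the left.
    Only handles bases up to 10 (digit characters 0-9).
    """
    places = [int(d) for d in digits]
    i = len(places) - 1
    while i >= 0:
        if places[i] < base - 1:
            places[i] += 1          # this place still has room: done
            return "".join(str(d) for d in places)
        places[i] = 0               # place is maxed out: reset it...
        i -= 1                      # ...and move on to the next place left
    # Every place was maxed out (like 99 or b111), so add a new place on the left.
    return "1" + "".join(str(d) for d in places)


# Count from 0 to 11 in binary, reproducing the table above.
n = "0"
for value in range(12):
    print(value, "= b" + n)
    n = increment(n, 2)
```

Calling increment("99", 10) gives "100" and increment("111", 2) gives "1000": the same reset-and-carry step, just with a different number of digits available.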
If it doesn't quite make sense and it just looks like I'm shifting ones and zeroes around, feel free to say. That's how the thread works!
Now you can see that each place in binary is a bit. So we can represent numbers with a bunch of bits! Neat! As you may know, a byte is made up of 8 bits. Binary is usually discussed in the context of bytes (as it will be in this thread), so we show all 8 bits, even if they're not all needed to represent the number. So you would display the number 0 like this:
b0000 0000
and 5 (b101) like this
b0000 0101
By the way, I'm separating every 4 bits because that makes it slightly easier to read. Most people do this, much like how large numbers are often grouped in threes (e.g. 1,234,567). (In case you're curious, a group of 4 bits is called a nibble, but you don't really need to know that yet.)
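If you want to produce this padded, grouped layout yourself, here is a minimal Python sketch. The helper name to_byte_string is hypothetical; it relies on Python's format() with the "08b" spec, which writes a number in binary padded with zeros to 8 places, and then splits the result into two groups of 4:

```python
def to_byte_string(value: int) -> str:
    """Show a byte value as all 8 bits, split into two groups of 4."""
    if not 0 <= value <= 255:
        # 8 bits have no room for anything outside 0-255.
        raise ValueError("a byte can only hold 0 to 255")
    bits = format(value, "08b")          # binary, padded with zeros to 8 places
    return "b" + bits[:4] + " " + bits[4:]


print(to_byte_string(0))    # b0000 0000
print(to_byte_string(5))    # b0000 0101
print(to_byte_string(255))  # b1111 1111
```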
Since a byte has 8 bits, it can hold whatever numbers we can fit into 8 places in binary. These are all numbers from 0 to 255, inclusive.
0 = b0000 0000
255 = b1111 1111
Since we're including 0, this means we have 256 different numbers. If you've done any kind of game ripping or hacking, this might be a familiar number, and this is why. For instance, a single byte is all you need to point to any specific colour in a 256-colour palette. But we'll probably talk more about that in a later lesson.
This next part is kinda mathematical. Nothing too complex I hope, but I understand if maths isn't your strong suit or it's just been a while. If so, just say what you're struggling with and I can try to help.
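Why exactly 256? Each bit you add doubles the number of possible patterns (every old pattern with a 0 in front, plus every old pattern with a 1 in front), so n bits give 2 to the power of n values, running from 0 up to one less than that. A small Python sketch that tabulates this and double-checks the 8-bit case:

```python
# Each extra bit doubles the number of patterns, so n bits give 2**n values,
# running from 0 up to 2**n - 1.
for n_bits in range(1, 9):
    count = 2 ** n_bits
    print(f"{n_bits} bit(s): {count} values, 0 to {count - 1}")

# Sanity check for a full byte: every value 0-255 has its own 8-bit pattern.
patterns = {format(v, "08b") for v in range(256)}
print(len(patterns))  # 256
```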
Now the places in base ten go up by powers of 10: 1, 10, 100, 1000, and so on. In fact, there's a systematic way to calculate the value of a number written in base ten. Take the number 453. First we give each place a number, from right to left, starting at zero:

Place: 2 1 0
Digit: 4 5 3
Now, for each digit, we take the base (10), raise it to the power of the place, and multiply that by the digit. Then we add those together to get the value:

4 × 10^2 + 5 × 10^1 + 3 × 10^0
This is 400 + 50 + 3 (anything to the power of 0 is 1) which, as you know, is 453. This seems super trivial and pointless in base ten, but it works exactly the same way in binary.
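Here is that recipe as a short Python sketch. The function name value_of is made up for illustration; it numbers the places from right to left starting at 0, multiplies each digit by the base raised to its place, and adds up the terms, printing each one along the way:

```python
def value_of(digits: str, base: int) -> int:
    """Work out a number's value from its digits, place by place.

    Places are numbered from right to left starting at 0; each digit is
    multiplied by base**place and the results are added up.
    """
    total = 0
    for place, digit in enumerate(reversed(digits)):
        term = int(digit) * base ** place
        print(f"{digit} × {base}^{place} = {term}")
        total += term
    return total


print(value_of("453", 10))  # prints 3, 50 and 400 as terms, then 453
```

Because the base is just a parameter, value_of("01100001", 2) works on binary too and gives 97.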
Say we have b0110 0001. Since we're always looking at 8 bits, there are going to be 8 places, numbered 0 to 7. We also tend to call these the "#th" bits, like the 0th bit, 1st bit, and so on.

Place: 7 6 5 4 3 2 1 0
Bit:   0 1 1 0 0 0 0 1
Now we go about it the same way. We take the base (2), raise it to the power of the place, then multiply that by the digit, and add them together.
In this case we only have the digits 1 and 0. We can ignore the 0 bits (they always give 0), and for the 1 bits we can skip the multiplication (it's just multiplying by 1), so we just get:

2^6 + 2^5 + 2^0 = 64 + 32 + 1

And 64 + 32 + 1 = 97. Tada! That's how you read a binary number: look at the positions of the 1s, work out those powers of 2 (2 to the power of the bit position), and add them together.
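The shortcut of only looking at the 1 bits can be sketched in Python like this. The function name read_binary is hypothetical; it skips any spaces between nibbles, walks the bits from right to left, and adds 2 to the power of the position for every bit that is 1:

```python
def read_binary(bits: str) -> int:
    """Read a binary number by adding 2**position for every 1 bit.

    Spaces (like the one between nibbles in "0110 0001") are ignored.
    """
    bits = bits.replace(" ", "")
    total = 0
    for position, bit in enumerate(reversed(bits)):
        if bit == "1":               # 0 bits contribute nothing, so skip them
            print(f"bit {position} is set: 2^{position} = {2 ** position}")
            total += 2 ** position
    return total


print(read_binary("0110 0001"))  # prints bits 0, 5 and 6, then 97
```

Python's built-in int("01100001", 2) gives the same answer; the sketch just spells out the steps.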
Of course, your powers of 2 might not feel as natural as your powers of 10; after all, the latter is just adding zeroes to the end. It's not too hard though: you just start at 1 and double until you reach your number. After a while you'll know these off by heart too.
This might look familiar if you've played 2048, where the tiles go up in powers of 2.
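The "start at 1 and keep doubling" trick, written out as a tiny Python loop. It stops after 8 steps because those are exactly the values of bits 0 to 7 in a byte:

```python
# Start at 1 and keep doubling: these are 2^0 up to 2^7,
# the values of bits 0 to 7 in a byte.
powers = []
value = 1
for position in range(8):
    powers.append(value)
    print(f"2^{position} = {value}")
    value *= 2

print(sum(powers))  # 255: all eight bits set, i.e. b1111 1111
```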
So yeah, that's about it for this lesson! You know how to count in binary and how to read a binary number. You don't need to be able to do this in your head, by the way; what's important is that you understand the concepts. While working with individual bits may not come up very often, it's a really good place to start and gives you a fundamental understanding of how bytes are put together.
In the next lesson we'll zoom out and focus more on bytes themselves.