Lesson 2: Bytes and Hexadecimal Numbers
Now that we've covered bits, let's talk about bytes. Chances are you'll be working with bytes more often than bits, so even if you didn't completely understand the previous lesson, you should be just fine. (It definitely helps if you did, though!)
Now as you know, a byte is made up of 8 bits, and the maximum of this is b11111111, which is 255. So each byte can hold a number between 0 and 255, that's 256 different numbers. As I mentioned, this is why 256 is a number you see come up a lot when dealing with things like image palettes.
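If you'd like to check those numbers yourself, here's a minimal sketch in Python. Python is just an assumption for the examples here, since the lesson itself doesn't depend on any language, and the variable names are made up for illustration.

```python
# A byte is 8 bits, so the largest value is eight 1 bits in a row.
max_byte = 0b11111111
print(max_byte)   # 255
print(2 ** 8)     # 256 different values, counting 0

# Python's built-in bytes type only accepts values from 0 to 255.
ok = bytes([0, 255])
try:
    bytes([256])
except ValueError as error:
    print("256 does not fit in a byte:", error)
```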
Not too hard, right? And really, there isn't much more to a byte than that! Like I said at the very start, all files in a computer are made up of these, each holding a number between 0 and 255. However, these numbers can be interpreted by programs (word processors, image viewers, etc.) and then presented to us in a way that makes sense.
Let's see how this works with something really basic: text files. Each character in a text file is actually a byte, which a text program (such as Notepad) displays to you as a character. For example, a byte with a value of 97 is displayed as the character a. The other letters follow in order: 98 is b, 99 is c, and so on. This is called a character encoding, that is, a system that the text program uses to know which numbers correspond to which characters.
For instance, if I had a text file containing the text "VG Resource", the file would be made up of these bytes (which I'm just representing as regular numbers):
86 71 32 82 101 115 111 117 114 99 101
The first byte, 86, is V. 71 is G, 32 is the space character, 82 is R, and so on.
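Here's a small Python sketch of the same idea, going from text to byte values and back again. It assumes the ASCII encoding, which is enough for these characters.

```python
text = "VG Resource"

# Encode the text into bytes, then list each byte as a regular number.
data = text.encode("ascii")
values = list(data)
print(values)  # [86, 71, 32, 82, 101, 115, 111, 117, 114, 99, 101]

# Going the other way: turn the numbers back into text.
decoded = bytes(values).decode("ascii")
print(decoded)  # VG Resource
```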
Some people might expect me to mention ASCII right about now. But actually, the most common standards used are ANSI (which is technically an extended ASCII) and UTF-8, since the original ASCII only uses 7 bits. In any case, all three standards use the same numbers for most common characters, so it doesn't really matter for now. If you don't know what I'm talking about, don't worry. Just know that there are different types of character encodings (where certain numbers might be different characters) that can be used, but they're all pretty similar, and in this case it doesn't matter.
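If you're curious, this can be checked with a quick Python sketch. One assumption to flag: "ANSI" is taken here to mean the Windows-1252 code page (`cp1252` in Python), which is what the term usually refers to on Western-language Windows systems. Other systems may use a different code page.

```python
text = "VG Resource"

# For plain English text, all three encodings produce identical bytes.
ascii_bytes = text.encode("ascii")
ansi_bytes = text.encode("cp1252")   # assumption: "ANSI" = Windows-1252
utf8_bytes = text.encode("utf-8")
same = ascii_bytes == ansi_bytes == utf8_bytes
print(same)  # True

# They only start to differ for characters outside the original ASCII set.
e_ansi = list("é".encode("cp1252"))
e_utf8 = list("é".encode("utf-8"))
print(e_ansi)  # [233]
print(e_utf8)  # [195, 169]
```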
Anyway, that's one way that you can see how bytes can be interpreted as text. How about an image? Well, suppose we have a file that contains the following bytes:
4 3 0 1 2 3 1 2 3 4 2 3 4 5
Let's also say that there's a list of six colours at the end of the file. Don't worry right now about how these colours are stored, just that we can access and display them. Here's the list:
The file has the following format. The first byte is the width of our image, and the second is the height. In this case these are 4 and 3, which means it's a 4x3 image (very small, but good for explanation purposes).
Every byte after that represents a pixel of a colour from our list of six (starting at zero). This means that any pixel byte with the value of 0 is a magenta pixel, 1 is a cyan pixel, and so on. So let's take our bytes and draw them out as pixels (zoomed in a bit, so you can see them properly):
But wait a second, our image is 4x3, not 12x1! Well, the way we deal with that is simply to start on a new row once we reach the image width:
If this was a bit hard to follow, remember you can always ask for a more thorough explanation. In any case, this is just to showcase how an image viewing program would go through the bytes in this type of file to display an image. You don't have to remember this format specifically (it's just something I made up, albeit based on existing formats); you're good as long as you understand the concept of how a bunch of bytes can be interpreted as an image.
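To make the steps concrete, here's a rough Python sketch of how a viewer might read this made-up format. It only works with the colour numbers, since the colour list isn't part of the bytes listed above, and all the variable names are just for illustration.

```python
# The example file: width, height, then one byte per pixel.
data = bytes([4, 3, 0, 1, 2, 3, 1, 2, 3, 4, 2, 3, 4, 5])

width = data[0]    # first byte: 4
height = data[1]   # second byte: 3

# Everything after the two header bytes is pixel data.
pixels = data[2:2 + width * height]

# Start a new row each time we've read `width` pixels.
rows = [list(pixels[y * width:(y + 1) * width]) for y in range(height)]

for row in rows:
    print(" ".join(str(colour) for colour in row))
# 0 1 2 3
# 1 2 3 4
# 2 3 4 5
```

Each number printed is an index into the colour list, so a real viewer would look up the colour for each one and draw it instead of printing it.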
So now you've seen some simple examples of how files are made up of bytes, and how programs go about reading and displaying them as useful information. However, to take a look at the bytes of real files people usually use a hex editor. Bytes in a hex editor are displayed in base 16, also known as hexadecimal (or hex for short). So before we can dive into using one, we need to understand how hex works first.
This is kinda similar to learning binary. The concepts are the same, but instead of having only two digits, we now have sixteen. But wait a second. In binary we could just use the first two regular digits, 0 and 1. We don't have any digits past 9 though! Well, to compensate we just use letters. So, the first ten digits are the same as base ten, we just go from 0 to 9. However, what we would write as 10 in base ten, we write as A in hex. 11 will be B, 12 will be C, and so on until we reach 15, which is F. Then, to go to 16, we do what we do when we run out of digits in any base: we set the first place to 0, increase the second place, and so we get 10.
Now before we go any further, I should introduce some notation again. Much like we used "b" to denote a binary number, I'm going to use "0x" to denote a hexadecimal number. The reason I'm using that (rather than something like "h") is just because that's how hexadecimal numbers are denoted in programming languages, as well as other places. Some people do use "$", but it doesn't really matter as long as you know what it means. In any case, I'll be using "0x" from here on out.
Now that's established, here's a list of numbers counting up in hex so you can get a bit of a feel for it.
0 = 0x0
1 = 0x1
2 = 0x2
...
8 = 0x8
9 = 0x9
10 = 0xA
11 = 0xB
12 = 0xC
13 = 0xD
14 = 0xE
15 = 0xF
16 = 0x10
17 = 0x11
18 = 0x12
And so on. It works like any other base, it just has 16 digits. Just like base two or base ten, once it runs out it increases the next place and starts over. So when you get to 0x1F you'll increment it and get 0x20, and once you reach 0xFF you'll go to 0x100.
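If you want to see the rollover points for yourself, here's a small Python sketch. The helper name `to_hex` is made up for this example. It uses uppercase letters to match the notation in this lesson (Python's built-in `hex()` gives lowercase).

```python
def to_hex(n):
    # Format n in base 16 with uppercase letters and a "0x" prefix.
    return f"0x{n:X}"

for n in [9, 10, 15, 16, 31, 32, 255, 256]:
    print(n, "=", to_hex(n))
# 9 = 0x9
# 10 = 0xA
# 15 = 0xF
# 16 = 0x10
# 31 = 0x1F
# 32 = 0x20
# 255 = 0xFF
# 256 = 0x100
```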
Now you might be thinking,
"Hey puggsoy this is cool and all, but why display bytes in hex? Why can't hex editors just use normal numbers?"
Well, the reason is that 0xFF equals 255, that is, the highest number a byte can hold can be represented as the highest number a two-digit hex number can hold. Which means that any byte can be represented by just a two-digit hex number. And this turns out to be pretty convenient.
It also means that you can think of the maximum boundary of a byte being the maximum boundary of two hex digits. In base ten, once you reach 255 you can keep going higher and still use the same number of digits (256, 257, etc). But in hex, if you want to go higher than 0xFF you need to use three digits, and you can kind of say "well if I need more than two, that means I'm going beyond the byte limit". In the long run, it's an easier and tidier way of representing bytes.
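As a quick sanity check, this Python sketch confirms that every possible byte value fits in two hex digits, while 256 needs a third. The helper name `hex_digits` is just for illustration.

```python
def hex_digits(n):
    # The hex digits of n, without any prefix.
    return f"{n:X}"

# The longest hex form among all 256 byte values.
longest = max(len(hex_digits(n)) for n in range(256))
print(longest)  # 2

# Crossing the byte limit adds a digit in hex...
print(hex_digits(255), hex_digits(256))  # FF 100

# ...but not in base ten, where both have three digits.
print(len(str(255)), len(str(256)))  # 3 3
```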
By the way, much like how in binary it's common to write leading zeroes even if they're not necessary (so that you show all 8 bits), in hex it's common to write both digits, even if the first one is 0. So when I write 0x0A, it's clearer that I'm talking about a byte with the value 10, rather than 0xA which isn't showing the whole byte. I can also write 0x00 to represent a byte with the value of 0 (also known as a "null byte"). Keep in mind that null bytes are just as important as any other byte, a value of 0 still means something.
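Here's a minimal Python sketch of that two-digit convention. The helper name `byte_to_hex` is hypothetical. The last part writes out the "VG Resource" bytes from earlier as two-digit hex pairs.

```python
def byte_to_hex(value):
    # Always show two hex digits, padding with a leading zero if needed.
    return f"0x{value:02X}"

print(byte_to_hex(10))   # 0x0A
print(byte_to_hex(0))    # 0x00 (a null byte)
print(byte_to_hex(255))  # 0xFF

# The "VG Resource" text from earlier, one two-digit pair per byte.
data = "VG Resource".encode("ascii")
dump = " ".join(f"{b:02X}" for b in data)
print(dump)  # 56 47 20 52 65 73 6F 75 72 63 65
```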
Now unlike binary, hex is probably something you'll be seeing a lot of since, as I said, it's likely you'll be working with bytes a lot more than you will with bits. However, you don't have to try and go figuring out how to read hex and convert it in your brain. There are many converters out there, although my personal tool of choice is the programmer mode of the Windows calculator. This not only allows you to switch between decimal (base ten) and hexadecimal, but you can also do calculations if you need to. It also has binary, if you ever need to work with that.
If you're not on a Windows machine (or just don't want to use the calculator), you can probably find something else that does a similar job. Either way, I'd recommend messing around with it a little bit, convert between some numbers and try having it count up in hexadecimal (by repeatedly adding 1). Even though you don't need to know it off by heart, it's good if you get a bit of a feel for the base.
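One such alternative, if you happen to have Python installed (an assumption, of course), is to do the conversions in a few lines of code. This is only a sketch of what's possible, not a recommendation over any particular tool.

```python
# Hex or binary text to a regular number.
from_hex = int("FF", 16)
from_prefixed = int("0x1F", 16)   # the "0x" prefix is allowed with base 16
from_binary = int("11111111", 2)
print(from_hex, from_prefixed, from_binary)  # 255 31 255

# A regular number to hex or binary text.
as_hex = f"{200:X}"
as_binary = f"{200:b}"
print(as_hex, as_binary)  # C8 11001000

# Counting up in hex by repeatedly adding 1.
counted = [f"0x{n:X}" for n in range(0x1D, 0x22)]
print(counted)  # ['0x1D', '0x1E', '0x1F', '0x20', '0x21']
```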
Anyway, that's where we're gonna stop today. In the next lesson we'll actually start using a hex editor and looking at some files!