Binary, Bits and Bytes

Jürgen Welz

Demo File

 

Why would anyone want to query on a bit value? The idea is that we can store more information in less space and retrieve more information with less traffic. Not everyone feels this is important because the cost of data storage continues to decline and processors are getting faster. There are, however, circumstances where data transfer is choked by bandwidth limitations, and this is where the most benefit can be reaped. I have also yet to hear anyone using an application complain that it is too fast. The downside is that by compressing data into a single field, the data is obscured, and conventional query methods aren’t as applicable. Furthermore, storing multiple attributes of a record in a single field violates the rule of data normalization that requires data to be atomic.

 

Atomic data is data where only a single attribute of a record is stored in a field. An example of a violation of this rule is the combination of first and last name in a record of attributes of a person. Since it may be desirable to locate, query, or sort records by either first or last name, one may conclude that they are really separate attributes and as separate attributes, each should have its own field. Although it is possible to devise a query in Access that will split the attributes in order to constrain a recordset appropriately, this is generally done with a user-defined function creating a calculated field and support for user-defined functions is not universal across databases. In addition, the necessity of creating a calculated field means that it becomes necessary to perform the calculation on each record in the data being queried and this fact denies the option of placing an index on the field, an option that would have been available had the data been divided between first and last name fields at the time of data entry. In this case, the violation of atomicity makes the data more complex and slows down any analysis. Perhaps the best argument that can be made for this approach is the situation where there are more attributes of a record than fields permitted in a table. By combining fields, it may be possible to keep within Access’ 255 field limit per table rather than creating an additional 1-1 table of additional fields, which is another violation of normalization principles. The loss of ability to index a field on which records are likely to be selected or sorted would mandate the latter form of violation.

 

Programs like Access run on computers working with binary data and addressing. Given that we are educated in a decimal numeric system and a character-based language system, there is considerable translation required from one system to others in order to make computers useable and comprehensible to the average user.

 

The fundamental unit of storage of data is the bit, from binary digit. This unit is a single switch that has one of two possible states. In the computer, the bit represents either a high or a low voltage that can represent either a zero or a one in a binary number system. Think of a light bulb that is on or off and either state can also be abstracted to mean true or false. A bit all by itself doesn’t have a lot to convey, but if you have 3 bits, you can represent a conventional traffic light but you need some more to write a letter. Morse code uses a sequence of two state information, dots and dashes, to represent the alphabet. Our computers use not a sequence but a combination of zeros and ones to represent letters, numerals, capitalization, and punctuation characters. In order to represent all the possible combinations and allow a few extras for controlling printers, it turns out that 8 bits can be combined in 256 unique patterns, each of which can uniquely represent a single character. For this reason, the byte has become the basic unit of data storage in our computers.

 

In binary math terms, a byte is another kind of critter altogether. Binary math has to do with a number system that has two possible values. We assign the values as zero and one and doing so maps well to the high/low voltage states used by our computers. We grew up with a base 10 number system and the binary number system is base 2. With base 10, there are 10 possible values a digit can take, 0 through 9 and base 2 gives us only two values, 0 or 1. Any number can be represented in either number system but the binary representation isn’t always intuitive to people versed, schooled and inculcated in a base 10 system.

 

In base 10, the number four thousand and twenty seven is written 4027. This means 4 thousand plus 0 hundred plus 2 tens plus 7 ones. This can also be written as 4 times the base raised to the power of the position of the fours digit minus one plus 0 times the base raised to the power of the position of the 0 bit minus 1 plus 2 times the power of the twos digit minus one plus 7 times 10 raised to the power of the position of the 7 digit minus 1. Rewritten again, this is:

 

4*10^(4–1) + 0*10^(3-1) + 2*10^(2–1) + 7*10^(1–1) or:

 

4*10^3 + 0*10^2 + 2*10^1 + 7*10^0 or:

 

4*1000 + 0*100 + 2*10 + 7*1 or:

 

4000 + 0 + 20 + 7

 

We don’t need to go through all these machinations to get to the number in base 10 since we understand it intuitively. However, the same mathematical rules do work in any other base number system including base 2. Here’s an example using base 1000. The value 123,456,789,012 can be considered to be 4 ‘digits’ separated by commas. Each possible combination of all 3 numeric characters can be mapped to a unique quantity and when counting, as the quantity reaches the base, the number resets to 000 and increments the next ‘digit.’ Applying the rules of the base 10 example:

 

123*1000^(4-1) + 456*1000^(3-1) + 789*1000^(2-1) + 012 *1000^(1-1) or:

 

123*1000^(3) + 456*1000^(2) + 789*1000^(1) + 012 *1000^(0) or:

 

123*1,000,000,000 + 456*1,000,000 + 789*1,000 + 012 *1 or:

 

123,000,000,000 + 456,000,000 + 789,000 + 012 or:

 

123,456,789,012

 

Now base 1000 is still intuitively a kind of base 10. You have to wrap your mind around the fact that there are bases that don’t use only the characters 0 through 9. For example, an octal number system wraps after 7. Decimal wraps after 9. Hexadecimal wraps after 15 so we need to invent digits to mean 10, 11, 12, 13, 14 and 15 and these are commonly mapped to A, B, C, D, E and F, respectively. Consider seconds, degrees, or seconds, minutes, and hours—all which wrap at 59 (base 60) or dozens (gross) or inches and feet, which wrap at 12. For those of you who have seen movies where prisoners scratch off days with vertical lines up to four and then a line through them at five, the grouping is base 5 but the number system is in reality a base one, where you wrap to the next character upon every increment.

 

That brings us to base 2. 10100101 is a typical byte, a combination of 8 bits that can represent a character but is also a number. Rewritten:

 

1*2^(8–1) + 0*2^(7–1) + 1*2^(6–1) + 0*2^(5–1) + 0*2^(4–1) + 1*2^(3–1) + 0*2^(2–1) + 1*2^(1–1) or:

 

1*2^7 + 0*2^6 + 1*2^5 + 0*2^4 + 0*2^3 + 1*2^2 + 0*2^1 + 1*2^0 or:

 

128 + 0 + 32 + 0 + 0 + 4 + 0 + 1 or:

 

165

 

A couple of things are apparent. To increment a number in any number system, you increase the digit from zero until you get to the highest number in the number system and then you increment the next digit which is what I called wrap earlier. If it’s already the highest number, you move on to the next digit, and the next like an old mechanical odometer in a car. I’ve included some more information as to the byte numeric equivalent of several characters based on the US keyboard (Table 1). You can test them out in the Debug window easily enough.

 

Table 1

 




 

0

00000000

 

64

01000000

@

128

10000000

192

11000000

 

1

00000001

 

65

01000001

A

129

10000001

 

193

11000001

 

2

00000010

 

66

01000010

B

130

10000010

 

194

11000010

 

3

00000011

 

67

01000011

C

131

10000011

ƒ

195

11000011

 

4

00000100

 

68

01000100

D

132

10000100

196

11000100

 

5

00000101

 

69

01000101

E

133

10000101

197

11000101

 

6

00000110

 

70

01000110

F

134

10000110

198

11000110

 

7

00000111

 

71

01000111

G

135

10000111

199

11000111

 

8

00001000

 

72

01001000

H

136

10001000

ˆ

200

11001000

 

9

00001001

 

73

01001001

I

137

10001001

201

11001001

 

10

00001010

 

74

01001010

J

138

10001010

Š

202

11001010

 

11

00001011

 

75

01001011

K

139

10001011

203

11001011

 

12

00001100

 

76

01001100

L

140

10001100

Œ

204

11001100

 

13

00001101

 

77

01001101

M

141

10001101

 

205

11001101

 

14

00001110

 

78

01001110

N

142

10001110

 

206

11001110

 

15

00001111

 

79

01001111

O

143

10001111

 

207

11001111

 

16

00010000

 

80

01010000

P

144

10010000

 

208

11010000

 

17

00010001

 

81

01010001

Q

145

10010001

209

11010001

 

18

00010010

 

82

01010010

R

146

10010010

 

210

11010010

 

19

00010011

 

83

01010011

S

147

10010011

 

211

11010011

 

20

00010100

 

84

01010100

T

148

10010100

 

212

11010100

 

21

00010101

 

85

01010101

U

149

10010101

 

213

11010101

 

22

00010110

 

86

01010110

V

150

10010110

 

214

11010110

 

23

00010111

 

87

01010111

W

151

10010111

 

215

11010111

 

24

00011000

 

88

01011000

X

152

10011000

 

216

11011000

 

25

00011001

 

89

01011001

Y

153

10011001

 

217

11011001

 

26

00011010

 

90

01011010

Z

154

10011010

 

218

11011010

 

27

00011011