Jürgen Welz
Why would anyone want to query on a bit value? The idea is that we can store more information in less space and retrieve more information with less traffic. Not everyone feels this is important because the cost of data storage continues to decline and processors are getting faster. There are, however, circumstances where data transfer is choked by bandwidth limitations, and this is where the most benefit can be reaped. I have also yet to hear anyone using an application complain that it is too fast. The downside is that by compressing data into a single field, the data is obscured, and conventional query methods aren’t as applicable. Furthermore, storing multiple attributes of a record in a single field violates the rule of data normalization that requires data to be atomic.
Atomic data is data where only a single attribute of a record is stored in a field. An example of a violation of this rule is the combination of first and last name in a record of attributes of a person. Since it may be desirable to locate, query, or sort records by either first or last name, one may conclude that they are really separate attributes and as separate attributes, each should have its own field. Although it is possible to devise a query in Access that will split the attributes in order to constrain a recordset appropriately, this is generally done with a user-defined function creating a calculated field and support for user-defined functions is not universal across databases. In addition, the necessity of creating a calculated field means that it becomes necessary to perform the calculation on each record in the data being queried and this fact denies the option of placing an index on the field, an option that would have been available had the data been divided between first and last name fields at the time of data entry. In this case, the violation of atomicity makes the data more complex and slows down any analysis. Perhaps the best argument that can be made for this approach is the situation where there are more attributes of a record than fields permitted in a table. By combining fields, it may be possible to keep within Access’ 255 field limit per table rather than creating an additional 1-1 table of additional fields, which is another violation of normalization principles. The loss of ability to index a field on which records are likely to be selected or sorted would mandate the latter form of violation.
Programs like Access run on computers working with binary data and addressing. Given that we are educated in a decimal numeric system and a character-based language system, there is considerable translation required from one system to others in order to make computers useable and comprehensible to the average user.
The fundamental unit of storage of data is the bit, from binary digit. This unit is a single switch that has one of two possible states. In the computer, the bit represents either a high or a low voltage that can represent either a zero or a one in a binary number system. Think of a light bulb that is on or off and either state can also be abstracted to mean true or false. A bit all by itself doesn’t have a lot to convey, but if you have 3 bits, you can represent a conventional traffic light but you need some more to write a letter. Morse code uses a sequence of two state information, dots and dashes, to represent the alphabet. Our computers use not a sequence but a combination of zeros and ones to represent letters, numerals, capitalization, and punctuation characters. In order to represent all the possible combinations and allow a few extras for controlling printers, it turns out that 8 bits can be combined in 256 unique patterns, each of which can uniquely represent a single character. For this reason, the byte has become the basic unit of data storage in our computers.
In binary math terms, a byte is another kind of critter altogether. Binary math has to do with a number system that has two possible values. We assign the values as zero and one and doing so maps well to the high/low voltage states used by our computers. We grew up with a base 10 number system and the binary number system is base 2. With base 10, there are 10 possible values a digit can take, 0 through 9 and base 2 gives us only two values, 0 or 1. Any number can be represented in either number system but the binary representation isn’t always intuitive to people versed, schooled and inculcated in a base 10 system.
In base 10, the number four thousand and twenty seven is written 4027. This means 4 thousand plus 0 hundred plus 2 tens plus 7 ones. This can also be written as 4 times the base raised to the power of the position of the fours digit minus one plus 0 times the base raised to the power of the position of the 0 bit minus 1 plus 2 times the power of the twos digit minus one plus 7 times 10 raised to the power of the position of the 7 digit minus 1. Rewritten again, this is:
4*10^(4–1) + 0*10^(3-1) + 2*10^(2–1) + 7*10^(1–1) or:
4*10^3 + 0*10^2 + 2*10^1 + 7*10^0 or:
4*1000 + 0*100 + 2*10 + 7*1 or:
4000 + 0 + 20 + 7
We don’t need to go through all these machinations to get to the number in base 10 since we understand it intuitively. However, the same mathematical rules do work in any other base number system including base 2. Here’s an example using base 1000. The value 123,456,789,012 can be considered to be 4 ‘digits’ separated by commas. Each possible combination of all 3 numeric characters can be mapped to a unique quantity and when counting, as the quantity reaches the base, the number resets to 000 and increments the next ‘digit.’ Applying the rules of the base 10 example:
123*1000^(4-1) + 456*1000^(3-1) + 789*1000^(2-1) + 012 *1000^(1-1) or:
123*1000^(3) + 456*1000^(2) + 789*1000^(1) + 012 *1000^(0) or:
123*1,000,000,000 + 456*1,000,000 + 789*1,000 + 012 *1 or:
123,000,000,000 + 456,000,000 + 789,000 + 012 or:
123,456,789,012
Now base 1000 is still intuitively a kind of base 10. You have to wrap your mind around the fact that there are bases that don’t use only the characters 0 through 9. For example, an octal number system wraps after 7. Decimal wraps after 9. Hexadecimal wraps after 15 so we need to invent digits to mean 10, 11, 12, 13, 14 and 15 and these are commonly mapped to A, B, C, D, E and F, respectively. Consider seconds, degrees, or seconds, minutes, and hours—all which wrap at 59 (base 60) or dozens (gross) or inches and feet, which wrap at 12. For those of you who have seen movies where prisoners scratch off days with vertical lines up to four and then a line through them at five, the grouping is base 5 but the number system is in reality a base one, where you wrap to the next character upon every increment.
That brings us to base 2. 10100101 is a typical byte, a combination of 8 bits that can represent a character but is also a number. Rewritten:
1*2^(8–1) + 0*2^(7–1) + 1*2^(6–1) + 0*2^(5–1) + 0*2^(4–1) + 1*2^(3–1) + 0*2^(2–1) + 1*2^(1–1) or:
1*2^7 + 0*2^6 + 1*2^5 + 0*2^4 + 0*2^3 + 1*2^2 + 0*2^1 + 1*2^0 or:
128 + 0 + 32 + 0 + 0 + 4 + 0 + 1 or:
165
A couple of things are apparent. To increment a number in any number system, you increase the digit from zero until you get to the highest number in the number system and then you increment the next digit which is what I called wrap earlier. If it’s already the highest number, you move on to the next digit, and the next like an old mechanical odometer in a car. I’ve included some more information as to the byte numeric equivalent of several characters based on the US keyboard (Table 1). You can test them out in the Debug window easily enough.
Table 1
|
0 |
00000000 |
|
64 |
01000000 |
@ |
128 |
10000000 |
€ |
192 |
11000000 |
|
|
1 |
00000001 |
|
65 |
01000001 |
A |
129 |
10000001 |
|
193 |
11000001 |
|
|
2 |
00000010 |
|
66 |
01000010 |
B |
130 |
10000010 |
|
194 |
11000010 |
|
|
3 |
00000011 |
|
67 |
01000011 |
C |
131 |
10000011 |
ƒ |
195 |
11000011 |
|
|
4 |
00000100 |
|
68 |
01000100 |
D |
132 |
10000100 |
„ |
196 |
11000100 |
|
|
5 |
00000101 |
|
69 |
01000101 |
E |
133 |
10000101 |
… |
197 |
11000101 |
|
|
6 |
00000110 |
|
70 |
01000110 |
F |
134 |
10000110 |
† |
198 |
11000110 |
|
|
7 |
00000111 |
|
71 |
01000111 |
G |
135 |
10000111 |
‡ |
199 |
11000111 |
|
|
8 |
00001000 |
|
72 |
01001000 |
H |
136 |
10001000 |
ˆ |
200 |
11001000 |
|
|
9 |
00001001 |
|
73 |
01001001 |
I |
137 |
10001001 |
‰ |
201 |
11001001 |
|
|
10 |
00001010 |
|
74 |
01001010 |
J |
138 |
10001010 |
Š |
202 |
11001010 |
|
|
11 |
00001011 |
|
75 |
01001011 |
K |
139 |
10001011 |
‹ |
203 |
11001011 |
|
|
12 |
00001100 |
|
76 |
01001100 |
L |
140 |
10001100 |
Œ |
204 |
11001100 |
|
|
13 |
00001101 |
|
77 |
01001101 |
M |
141 |
10001101 |
|
205 |
11001101 |
|
|
14 |
00001110 |
|
78 |
01001110 |
N |
142 |
10001110 |
|
206 |
11001110 |
|
|
15 |
00001111 |
|
79 |
01001111 |
O |
143 |
10001111 |
|
207 |
11001111 |
|
|
16 |
00010000 |
|
80 |
01010000 |
P |
144 |
10010000 |
|
208 |
11010000 |
|
|
17 |
00010001 |
|
81 |
01010001 |
Q |
145 |
10010001 |
‘ |
209 |
11010001 |
|
|
18 |
00010010 |
|
82 |
01010010 |
R |
146 |
10010010 |
|
210 |
11010010 |
|
|
19 |
00010011 |
|
83 |
01010011 |
S |
147 |
10010011 |
|
211 |
11010011 |
|
|
20 |
00010100 |
|
84 |
01010100 |
T |
148 |
10010100 |
|
212 |
11010100 |
|
|
21 |
00010101 |
|
85 |
01010101 |
U |
149 |
10010101 |
|
213 |
11010101 |
|
|
22 |
00010110 |
|
86 |
01010110 |
V |
150 |
10010110 |
|
214 |
11010110 |
|
|
23 |
00010111 |
|
87 |
01010111 |
W |
151 |
10010111 |
|
215 |
11010111 |
|
|
24 |
00011000 |
|
88 |
01011000 |
X |
152 |
10011000 |
|
216 |
11011000 |
|
|
25 |
00011001 |
|
89 |
01011001 |
Y |
153 |
10011001 |
|
217 |
11011001 |
|
|
26 |
00011010 |
|
90 |
01011010 |
Z |
154 |
10011010 |
|
218 |
11011010 |
|
|
27 |
00011011 |