ASCII table

An Overview of the ASCII Table (American Standard Code for Information Interchange)

ASCII (American Standard Code for Information Interchange) is a character encoding standard used for text in computers and other devices. Its character set of 128 symbols is a subset of Unicode. The symbols include uppercase and lowercase letters, numerals, punctuation marks, special characters, and control characters. Each symbol has a decimal value from 0 to 127 and equivalent hexadecimal and octal values.

ASCII is a character encoding standard for electronic communication. ASCII codes represent text in computers, telecommunications equipment, and other devices. Most modern character-encoding schemes are based on ASCII, although they support many additional characters.

The Internet Assigned Numbers Authority (IANA) prefers the name US-ASCII for this character encoding. ASCII is one of the IEEE milestones.

ASCII

ASCII was developed from telegraph code. Its first commercial use was as a seven-bit teleprinter code promoted by Bell data services. Work on the ASCII standard began in May 1961 with the first meeting of the X3.2 subcommittee of the American Standards Association (ASA), now the American National Standards Institute (ANSI). The first edition of the standard was published in 1963, it underwent a major revision in 1967, and it was most recently updated in 1986. Compared to earlier telegraph codes, the proposed Bell code and ASCII were both ordered for more convenient sorting (i.e., alphabetization) of lists, and added features for devices other than teleprinters.

ASCII
Credit: Science Buddies

In 1969, the use of the ASCII format for network interchange was described in RFC 20. In 2015, that document was formally elevated to the status of Internet Standard.

ASCII, originally based on the English alphabet, encodes 128 specified characters into seven-bit integers, as shown in the ASCII chart above. Ninety-five of the encoded characters are printable: the digits 0 to 9, lowercase letters a to z, uppercase letters A to Z, and punctuation symbols. In addition, the original ASCII specification included 33 non-printing control codes that originated with Teletype machines; most of these are now obsolete, although a few, such as the carriage return, line feed, and tab codes, are still widely used.

For example, lowercase i would be represented in the ASCII encoding by binary 1101001 = hexadecimal 69 (i is the ninth letter) = decimal 105.
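As a quick check of the conversion above, the following Python sketch prints one code in each notation. It uses only built-in functions; the variable names are illustrative.

```python
# Show one ASCII code in decimal, hexadecimal, octal, and 7-bit binary.
ch = "i"
code = ord(ch)                    # numeric code of the character

decimal = str(code)
hexadecimal = format(code, "x")
octal = format(code, "o")
binary = format(code, "07b")      # pad to the seven bits ASCII uses

print(ch, decimal, hexadecimal, octal, binary)
```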

History of ASCII

ASCII was developed under the auspices of the American Standards Association (ASA). The ASA later became the United States of America Standards Institute (USASI) and, eventually, the American National Standards Institute (ANSI).

With the other special characters and control codes filled in, ASCII was published as ASA X3.4-1963, leaving 28 code positions without any assigned meaning and one unassigned control code. There was some debate at the time about whether there should be more control characters rather than the lowercase alphabet. The indecision did not last long: in May 1963, the CCITT Working Party on the New Telegraph Alphabet proposed assigning lowercase letters to sticks 6 and 7, and in October, the International Organization for Standardization TC 97 SC 2 voted to incorporate the change into its draft standard.

At its May 1963 meeting, the X3.2.4 task group approved the change to ASCII. Placing the lowercase letters in sticks 6 and 7 made each lowercase letter differ from its uppercase counterpart by a single bit, which simplified case-insensitive character matching and the construction of keyboards and printers.
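The single-bit difference between cases can be sketched in Python as follows. The helper names `to_upper_ascii` and `equal_ignoring_case` are hypothetical, chosen for illustration, and the sketch only handles the 26 ASCII letters.

```python
CASE_BIT = 0x20  # the one bit that differs between 'A' (0x41) and 'a' (0x61)

def to_upper_ascii(ch: str) -> str:
    """Clear the case bit of an ASCII lowercase letter; leave others alone."""
    if "a" <= ch <= "z":
        return chr(ord(ch) & ~CASE_BIT)
    return ch

def equal_ignoring_case(a: str, b: str) -> bool:
    """Compare two ASCII strings character by character, ignoring case."""
    return len(a) == len(b) and all(
        to_upper_ascii(x) == to_upper_ascii(y) for x, y in zip(a, b)
    )

print(hex(ord("a") ^ ord("A")))               # the differing bit
print(equal_ignoring_case("Ascii", "aSCII"))
```

The range check matters: clearing the same bit on a non-letter such as `{` would turn it into a different character.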

Other changes made by the X3 committee included the addition of new characters (the brace and vertical bar characters), the renaming of some control characters (SOM became start of header (SOH)), and the relocation or removal of others (RU was removed). ASCII was subsequently updated as USAS X3.4-1967, then USAS X3.4-1968, ANSI X3.4-1977, and finally ANSI X3.4-1986.

Revisions of the ASCII table standard:

  • ASA X3.4-1963
  • ASA X3.4-1965 (approved, but not published, nevertheless used by IBM 2260 & 2265 Display Stations and IBM 2848 Display Control)
  • USAS X3.4-1967
  • USAS X3.4-1968
  • ANSI X3.4-1977
  • ANSI X3.4-1986
  • ANSI X3.4-1986 (R1992)
  • ANSI X3.4-1986 (R1997)
  • ANSI INCITS 4-1986 (R2002)
  • ANSI INCITS 4-1986 (R2007)
  • (ANSI) INCITS 4-1986[R2012]
  • (ANSI) INCITS 4-1986[R2017]

In the X3.15 standard, the X3 committee also addressed how ASCII should be transmitted (least significant bit first) and how it should be recorded on perforated tape. The committee also proposed a 9-track standard for magnetic tape and attempted to deal with various punched card formats.
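To illustrate what "least significant bit first" means for a single character, here is a minimal Python sketch. The function name `bits_lsb_first` is hypothetical, and the sketch ignores start, stop, and parity bits that a real serial link would add.

```python
def bits_lsb_first(ch: str) -> list[int]:
    """Return the seven data bits of an ASCII character in transmission order."""
    code = ord(ch)
    if code > 0x7F:
        raise ValueError("not a 7-bit ASCII character")
    # Bit 0 (the least significant) goes out first, bit 6 last.
    return [(code >> i) & 1 for i in range(7)]

# 'i' is 1101001 when written most significant bit first.
print(bits_lsb_first("i"))
```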

Bit Width

The X3.2 subcommittee designed ASCII based on earlier teleprinter encoding systems. Like other character encodings, ASCII specifies a correspondence between digital bit patterns and character symbols (i.e., graphemes and control characters). This allows digital devices to communicate with one another and to process, store, and transmit character-oriented information such as written language. Before ASCII was developed, the encodings in use included 26 alphabetic letters, 10 numerical digits, and from 11 to 25 special graphic symbols. To include all of these, ASCII required more than 64 codes.

ITA2 was based on Émile Baudot’s 5-bit telegraph code, which he devised in 1870 and patented in 1874.

The committee debated the possibility of a shift function (like that in ITA2), which would allow more than 64 codes to be represented by a six-bit code. In a shifted code, some character codes determine the choice between options for the character codes that follow. This allows compact encoding but is less reliable for data transmission, since an error in transmitting a shift code typically makes a long part of the transmission unreadable. The standards committee decided against shifting, so ASCII required at least a seven-bit code.

The committee considered an eight-bit code, since eight bits (octets) would allow two four-bit patterns to efficiently encode two digits with binary-coded decimal. However, every data transmission would then have to send eight bits when seven would suffice. To minimize the cost of data transmission, the committee voted to use a seven-bit code. Because perforated tape could record eight bits in one position, this also left room for a parity bit for error checking if desired. Eight-bit machines (with octets as the native data type) that did not use parity checking typically set the eighth bit to 0.
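The use of the eighth bit for error checking can be sketched as follows. This example assumes even parity (the text above does not say which parity convention a given system used), and the function names are hypothetical.

```python
def with_even_parity(code: int) -> int:
    """Place an even-parity bit in bit 7 above a seven-bit ASCII code."""
    if not 0 <= code <= 0x7F:
        raise ValueError("expected a 7-bit code")
    ones = bin(code).count("1")
    parity = ones % 2              # 1 only when the count of ones is odd
    return code | (parity << 7)

def parity_ok(octet: int) -> bool:
    """Check that an octet has an even number of 1 bits (assumed convention)."""
    return bin(octet & 0xFF).count("1") % 2 == 0

sent = with_even_parity(ord("C"))  # 'C' has three 1 bits, so bit 7 is set
corrupted = sent ^ 0b0000100       # flip one bit "in transit"
print(hex(sent), parity_ok(sent), parity_ok(corrupted))
```

A single parity bit detects any odd number of flipped bits but cannot detect an even number of flips or say which bit changed.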

ASCII
Credit: Wikimedia

Internal Organization

The code itself was patterned so that most control codes were grouped together and all graphic codes were grouped together, for ease of identification. The first two “ASCII sticks” (32 positions) were reserved for control characters. The “space” character had to come before the graphic characters to make sorting easier, so it became position 20 (hexadecimal). For the same reason, many special signs commonly used as separators were placed before the digits. The committee decided it was important to support uppercase 64-character alphabets and chose to pattern ASCII so that it could be reduced easily to a usable 64-character set of graphic codes, as was done in the DEC SIXBIT code (1963).
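A "stick" is one 16-position column of the code chart, so a character's stick and its position within the stick are the high and low parts of its code. The Python sketch below makes that concrete; `stick_and_row` is a hypothetical helper name, and treating sticks 2 to 5 as the 64-character graphic subset is an assumption made for illustration.

```python
def stick_and_row(ch: str) -> tuple[int, int]:
    """Split an ASCII code into (stick, position within the stick)."""
    return divmod(ord(ch), 16)

for ch in (" ", "0", "A", "a"):
    stick, row = stick_and_row(ch)
    print(repr(ch), f"stick {stick}, position {row}, hex {ord(ch):02X}")

# Assumed 64-character graphic subset: sticks 2 through 5 (codes 0x20-0x5F).
graphic_subset = [chr(c) for c in range(0x20, 0x60)]
print(len(graphic_subset), "".join(graphic_subset))
```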

As a result, lowercase letters were not interleaved with uppercase letters. To keep options open for lowercase letters and other graphic characters, the special and numeric codes were placed before the letters, and the letter A was placed in position 41 (hexadecimal) to match the draft of the corresponding British standard. The digits 0–9 are prefixed with 011, and the remaining 4 bits correspond to their respective values in binary, which makes conversion to binary-coded decimal straightforward.
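Because the low four bits of each digit character equal the digit's value, conversion to binary-coded decimal is a single mask. A minimal Python sketch, with hypothetical helper names `digit_to_bcd` and `pack_bcd`:

```python
def digit_to_bcd(ch: str) -> int:
    """Return the 4-bit value of an ASCII digit by masking off the 011 prefix."""
    if len(ch) != 1 or not "0" <= ch <= "9":
        raise ValueError("expected a single ASCII digit")
    return ord(ch) & 0x0F

def pack_bcd(two_digits: str) -> int:
    """Pack two ASCII digits into one octet, one digit per 4-bit half."""
    high, low = two_digits
    return (digit_to_bcd(high) << 4) | digit_to_bcd(low)

print(format(ord("5"), "07b"))   # 011 prefix followed by 0101
print(hex(pack_bcd("42")))
```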

Many of the non-alphanumeric characters were positioned to correspond to their shifted position on typewriters; an important subtlety is that these were based on mechanical typewriters rather than electric typewriters. Mechanical typewriters followed the de facto standard set by the Remington No. 2 (1878), the first typewriter with a shift key, on which the shifted values of 23456789- were "#$%_&'(). Early typewriters omitted 0 and 1, using O (capital letter o) and l (lowercase letter L) instead, but the 1! and 0) pairs became standard once 0 and 1 became common. As a result, in ASCII the characters !"#$% were placed in positions 1–5 of the second stick, corresponding to the digits 1–5 in the neighbouring stick. However, the parentheses could not correspond to 9 and 0 because the position corresponding to 0 was taken by the space character.

This was accommodated by removing _ (underscore) from 6 and shifting the remaining characters, which matched the many European typewriters that placed the parentheses with 8 and 9. This discrepancy from typewriters led to bit-paired keyboards, most notably the Teletype Model 33, which used the left-shifted layout corresponding to ASCII rather than that of traditional mechanical typewriters. Electric typewriters, notably the IBM Selectric (1961), used a somewhat different layout that has since become standard on computers, following the IBM PC (1981) and particularly the Model M (1984). Thus, the shift values for symbols on modern keyboards do not correspond as closely to the ASCII table as those on earlier keyboards did.
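The pairing between the digit stick and the punctuation stick can be checked with a short Python sketch. It models the shift key on a bit-paired keyboard as toggling one bit (0x10) of the digit's code; that model is a simplifying assumption for illustration, and `bit_paired_shift` is a hypothetical name.

```python
def bit_paired_shift(digit: str) -> str:
    """Symbol in the same position as `digit`, one stick earlier (assumed model)."""
    if len(digit) != 1 or not "1" <= digit <= "9":
        raise ValueError("expected a single digit from 1 to 9")
    return chr(ord(digit) ^ 0x10)

shifted = "".join(bit_paired_shift(d) for d in "123456789")
print(shifted)

# The position paired with '0' is the space character, as noted above.
print(repr(chr(ord("0") ^ 0x10)))
```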

The /? pair also dates to the No. 2, and the ,< and .> pairs were used on some keyboards (others, including the No. 2, did not shift , (comma) or . (full stop), so that they could be used in uppercase without unshifting). ASCII, on the other hand, split the ;: pair (dating from the No. 2) and rearranged the mathematical symbols (conventions varied, commonly -* and =+) into :*, ;+, and -=.

Some then-common typewriter characters, notably ½ and ¼, were not included, while ^, `, and ~ were included as diacritics for international use, and < and > for mathematical use, together with the simple line characters \ and | (in addition to the common /). The @ symbol was not used in continental Europe, and the committee expected that it would be replaced by an accented À in the French variation, so @ was placed in position 40 (hexadecimal), immediately before the letter A.

The control codes considered essential for data transmission were start of message (SOM), end of address (EOA), end of message (EOM), end of transmission (EOT), “who are you?” (WRU), “are you?” (RU), a reserved device control (DC0), synchronous idle (SYNC), and acknowledge (ACK). These were positioned so that the Hamming distance between their bit patterns was as large as possible.

Character Order

ASCII-code order is often referred to as ASCIIbetical order. Data is sometimes collated in this order rather than in “standard” alphabetical order (collating sequence). The main deviations of ASCII order from alphabetical order are:

All uppercase letters come before lowercase letters; for example, “Z” comes before “a”. Likewise, digits and many punctuation marks come before letters. An intermediate order converts uppercase letters to lowercase before comparing ASCII values.
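The difference between the two orders is easy to see in Python, whose default string comparison uses code point order (which coincides with ASCII order for ASCII text). The word list is made up for illustration.

```python
words = ["banana", "Zebra", "apple", "42", "Cherry"]

# Plain sort: compares character codes, so digits < uppercase < lowercase.
ascii_order = sorted(words)

# Intermediate order: fold to lowercase before comparing.
folded_order = sorted(words, key=str.lower)

print(ascii_order)
print(folded_order)
```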

Control Characters

ASCII reserves the first 32 codes (numbers 0–31 decimal) for control characters: codes originally intended not to represent printable information, but rather to control devices (such as printers) that use ASCII, or to provide meta-information about data streams such as those stored on magnetic tape.

For example, character 10 represents the “line feed” function (which causes a printer to advance its paper), while character 8 represents “backspace”. RFC 2822 refers to control characters that do not include carriage return, line feed, or white space as non-whitespace control characters. Apart from the control characters that prescribe elementary line-oriented formatting, ASCII does not define any mechanism for describing the structure or appearance of text within a document. Other schemes, such as markup languages, address page and document layout and formatting.
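The split between control and printable codes described above can be tabulated in a few lines of Python. The sketch counts code 127 (DEL) as a control code alongside codes 0–31, which is how the total of 33 control codes arises; the variable names are illustrative.

```python
# Codes 0-31 plus 127 (DEL) are control codes; 32-126 are printable.
control_codes = [c for c in range(128) if c < 32 or c == 127]
printable_codes = [c for c in range(128) if 32 <= c <= 126]

print(len(control_codes), len(printable_codes))

# A few control characters that are still in everyday use.
for name, ch in [("backspace", "\b"), ("tab", "\t"),
                 ("line feed", "\n"), ("carriage return", "\r")]:
    print(f"{name}: {ord(ch)}")
```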

The original ASCII standard used only short descriptive phrases for each control character. The ambiguity this caused was sometimes intentional, for example where a character would be used slightly differently on a terminal link than on a data stream, and sometimes accidental, for example where the meaning of “delete” was unclear.

ASCII
Credit: Delightly Linux

The Teletype Model 33 ASR, a printing terminal with an optional paper tape reader/punch, was probably the single most influential device in the interpretation of these characters. Until the 1980s, paper tape was a popular medium for long-term program storage because it was less expensive and, in some respects, less fragile than magnetic tape. As a result, the machine assignments for codes 17, 19, and 127 on the Teletype Model 33 became de facto standards.

The Model 33 was also notable for taking the description of Control-G (code 7, BEL, meaning audibly alert the operator) literally: the unit contained an actual bell that rang when it received a BEL character. Because the keytop for the O key also showed a left-arrow symbol, a noncompliant use of code 15 (Control-O, Shift In), interpreted as “delete the previous character”, was also adopted by many early timesharing systems but was eventually abandoned.

When a Teletype 33 ASR equipped with the automatic paper tape reader received a Control-S (XOFF, an abbreviation for “transmit off”), the tape reader stopped; receiving a Control-Q (XON, “transmit on”) caused the tape reader to resume. Several early computer operating systems adopted this technique as a “handshaking” signal warning a sender to stop transmission because of impending buffer overflow; it persists to this day in many systems as a manual output control technique. On some systems, Control-S retains its meaning, but Control-Q is replaced by a second Control-S to resume output. The 33 ASR could also be configured to use Control-R (DC2) and Control-T (DC4) to start and stop the tape punch; on some units equipped with this function, the corresponding control character lettering on the keycap above the letter was TAPE and TAPE with an overline, respectively.
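The control codes named above follow directly from the letters on the keys: holding Control keeps only the low five bits of the letter's code. A minimal Python sketch, with `control_code` as a hypothetical helper name:

```python
def control_code(letter: str) -> int:
    """Code produced by Control+letter: the low five bits of the letter's code."""
    return ord(letter.upper()) & 0x1F

XON = control_code("Q")    # DC1, "transmit on"
XOFF = control_code("S")   # DC3, "transmit off"

for key in "GHOQRST":
    print(f"Control-{key}: {control_code(key)}")
```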

Delete & Backspace Key

The Teletype could not move its typehead backwards, so it did not have a key on its keyboard to send a BS (backspace). Instead, there was a key marked RUBOUT that sent code 127 (DEL). The purpose of this key was to erase mistakes in a manually typed paper tape: the operator had to push a button on the tape punch to back it up, then type the rubout, which punched all holes and replaced the mistake with a character that was intended to be ignored. Teletypes were commonly used with the less-expensive computers from Digital Equipment Corporation (DEC), so these systems had to use the available key, and therefore the DEL code, to erase the previous character.
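Why code 127 works as a rubout can be shown with a small Python sketch. It models each of the seven data holes as a 1 bit and overpunching as a bitwise OR, since a punch can add holes but never remove them; this model and the name `rub_out` are assumptions made for illustration.

```python
DEL = 0x7F  # all seven data holes punched

def rub_out(punched: int) -> int:
    """Overpunch a tape position with DEL; existing holes stay, the rest are added."""
    return punched | DEL

print(format(DEL, "07b"))

# Whatever was punched before, the result is always DEL.
print(all(rub_out(code) == DEL for code in range(128)))
```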

As a result, DEC video terminals (by default) sent the DEL code for the key marked “Backspace”, while the key marked “Delete” sent an escape sequence, and many other terminals sent BS for the Backspace key. The Unix terminal driver could use only one code to erase the previous character. It could be set to either BS or DEL, but not both, which led to a long period of annoyance in which users had to adjust the setting depending on the terminal they were using. Because it was assumed that no key sent a BS, Control+H was used for other purposes, such as the “help” prefix command in GNU Emacs.

Escape Key

Many other control codes have been given meanings quite different from their original ones. For example, the “escape” character (ESC, code 27) was originally intended to allow other control characters to be sent as literals instead of invoking their meaning. This is the same meaning of “escape” found in URL encodings, C language strings, and other systems where certain characters have a reserved meaning.

Over time, this meaning has been co-opted and eventually changed. In modern use, an ESC sent to the terminal usually indicates the start of a command sequence, typically in the form of an “ANSI escape code” from ECMA-48 (1972) and its successors, beginning with ESC followed by a “[” character. An ESC sent from the terminal is most often used as an out-of-band character to terminate an operation, as in the TECO and vi text editors. In graphical user interface (GUI) and windowing systems, ESC generally causes an application to abort its current operation or to exit (terminate) altogether.
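The shape of such a command sequence can be sketched in Python. The example builds "Select Graphic Rendition" sequences, one common kind of ANSI escape code; `sgr` is a hypothetical helper name, and whether the text actually appears bold depends on the terminal that receives it.

```python
ESC = chr(27)      # the escape character, code 27
CSI = ESC + "["    # ESC followed by "[" introduces the command sequence

def sgr(*params: int) -> str:
    """Build a Select Graphic Rendition sequence: CSI, parameters, final 'm'."""
    return CSI + ";".join(str(p) for p in params) + "m"

# Parameter 1 requests bold; parameter 0 resets attributes.
bold_text = sgr(1) + "bold" + sgr(0)
print(repr(bold_text))   # repr shows the raw characters instead of sending them
```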

Conclusion

ASCII was first used commercially in 1963 as a seven-bit teleprinter code for American Telephone & Telegraph’s TWX (TeletypeWriter eXchange) network. TWX originally used the earlier five-bit ITA2, which was also used by the competing Telex teleprinter system. Bob Bemer introduced features such as the escape sequence, and his English colleague Hugh McGregor Ross helped to popularize this work. Because of his extensive work on ASCII, Bemer has been called the “Father of ASCII”. ASCII was the most common character encoding on the World Wide Web until December 2007, when UTF-8 surpassed it. UTF-8 is backwards compatible with ASCII.
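The backwards compatibility of UTF-8 with ASCII can be verified directly in Python: any text made only of ASCII characters produces identical bytes under both encodings. The sample strings are illustrative.

```python
text = "ASCII table"

ascii_bytes = text.encode("ascii")
utf8_bytes = text.encode("utf-8")

print(ascii_bytes == utf8_bytes)     # same bytes for pure-ASCII text

# All 128 ASCII codes decode to the same characters under either encoding.
all_codes = bytes(range(128))
print(all_codes.decode("ascii") == all_codes.decode("utf-8"))

# A non-ASCII character needs more than one byte in UTF-8,
# and every one of those bytes has its eighth bit set.
print("é".encode("utf-8"))
```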
