2.4b: Character Storage

Exam Board: OCR

Specification: 2020

What is a Character Set?

A character set is a table that matches together a character and a binary value.

 

Each character in a character set has a unique binary number matched with it.


Character sets are necessary as they allow computers to exchange data and humans to input characters.
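As a rough illustration of a character set acting as a two-way lookup between characters and binary values, here is a minimal Python sketch. The function names `to_binary` and `from_binary` are made up for this example, and it assumes every character has a code below 256 so that 8 bits are enough.

```python
def to_binary(text):
    """Return the 8-bit binary code for each character in text.

    Assumes every character has a code below 256 (hypothetical helper).
    """
    return [format(ord(ch), "08b") for ch in text]


def from_binary(codes):
    """Turn a list of binary codes back into the original text."""
    return "".join(chr(int(code, 2)) for code in codes)


codes = to_binary("Cat")
print(codes)               # ['01000011', '01100001', '01110100']
print(from_binary(codes))  # Cat
```

Because both computers use the same table, the binary values sent by one can be turned back into the same characters by the other.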

For example: H = 01001000

Two common character sets are ASCII and Unicode.

ASCII (American Standard Code for Information Interchange) is a common character set that does not take up much memory space.

The number of characters that a character set can represent is limited by the number of bits available. Each ASCII character is stored in 1 byte (8 bits), which gives only 256 possible characters. (Strictly, standard ASCII defines 128 characters using 7 bits; the figure of 256 applies to 8-bit extended ASCII.)

This is enough for the English language, but it cannot represent the characters of many other languages or every punctuation symbol.
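The link between bits and the number of possible characters can be sketched in Python as follows. The helper names `possible_characters` and `fits_in_one_byte` are hypothetical, and the sketch uses Python's built-in character numbers (`ord`) as a stand-in for a character's position in the table.

```python
def possible_characters(bits):
    """Each extra bit doubles the number of values, so n bits give 2^n."""
    return 2 ** bits


def fits_in_one_byte(ch):
    """True if the character's code is small enough for 8 bits."""
    return ord(ch) < possible_characters(8)


print(possible_characters(8))   # 256
print(fits_in_one_byte("H"))    # True
print(fits_in_one_byte("€"))    # False
```

The euro sign has a code far above 255, so an 8-bit character set has no room for it.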

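Unicode is a more widely used character set because, in the 16-bit form covered here, it uses 2 bytes (16 bits) per character, which allows for 65,536 possible characters.

The extra byte allows many different languages to be represented, as well as thousands of symbols. (The full Unicode standard goes further: encodings such as UTF-8 and UTF-16 use up to 4 bytes per character, which is how most emojis are represented.)

However, Unicode requires more memory to store each character than ASCII because it uses an extra byte.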
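To see the memory cost of the extra byte, the sketch below stores the same word twice using Python's built-in encodings. It assumes `"ascii"` as the 1-byte-per-character case and `"utf-16-be"` as a stand-in for 2-byte Unicode; the helper name `size_in_bytes` is made up for this example.

```python
def size_in_bytes(text, encoding):
    """Return how many bytes the text occupies in the given encoding."""
    return len(text.encode(encoding))


word = "Hello"
print(size_in_bytes(word, "ascii"))      # 5  (1 byte per character)
print(size_in_bytes(word, "utf-16-be"))  # 10 (2 bytes per character)
```

The text is identical in both cases, but the 16-bit version needs twice as much memory.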

Character sets are logically ordered.

 

For example, the binary code for A is 01000001, B is 01000010 and C is 01000011 as the code increases by 1 with each character.
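The logical ordering can be checked with a short Python sketch. The helper `code_after` is a hypothetical name, and the sketch assumes the characters are letters whose codes fit in 8 bits.

```python
def code_after(ch, steps=1):
    """Return the character whose code is `steps` higher than ch's."""
    return chr(ord(ch) + steps)


# Print each letter with its 8-bit binary code.
for letter in "ABC":
    print(letter, format(ord(letter), "08b"))

print(code_after("A"))     # B
print(code_after("A", 2))  # C
```

Because the codes go up by 1 each time, knowing the code for one letter is enough to work out the code for any other letter by counting forwards or backwards.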

The file size of a text file is calculated as shown below:

file size (in bits) = bits per character x number of characters

Example:

A small text file uses the ASCII character set (which uses 8 bits per character).

There are 300 characters in the file.

300 x 8 = 2,400 bits

This could be simplified as 300 bytes (2,400 ÷ 8) or 0.3 kilobytes (300 ÷ 1,000).
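The same calculation can be written as a small Python sketch. The function name `text_file_size_bits` is made up for this example, and the conversion assumes 1 kilobyte = 1,000 bytes, as in the worked example.

```python
def text_file_size_bits(bits_per_character, number_of_characters):
    """File size in bits = bits per character x number of characters."""
    return bits_per_character * number_of_characters


bits = text_file_size_bits(8, 300)  # ASCII file with 300 characters
size_bytes = bits / 8               # 8 bits in a byte
size_kb = size_bytes / 1000         # assuming 1,000 bytes in a kilobyte

print(bits)        # 2400
print(size_bytes)  # 300.0
print(size_kb)     # 0.3
```

Changing the first argument to 16 gives the size of the same text stored with 2 bytes per character.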


Questo's Questions

2.4b - Character Storage:

 

1. What is a character set and why are character sets needed? [2]

2. Describe 3 differences between ASCII and Unicode. [6]

3. The binary code for the character P in ASCII is 01010000. State what the binary code for the character S would be. [1]

4a. A text file uses the ASCII character set and contains 400 characters. What would the file size be in kilobytes? [2]

4b. A text file uses the Unicode character set and contains 150 characters. What would the file size be in kilobytes? [2]

File Size of Text Files

8 bits (1 byte), for example 01101010: 256 possible characters

16 bits (2 bytes), for example 1000101101001111: 65,536 possible characters