COA: Computer Organisation & Architecture Floating Point Representation

December 30, 2021

What we will cover

Why do we need floating point representation?

Format of Floating Point Representation.

Limitations of traditional floating point representation.

So, if we have fixed point representation, why do we need floating point representation?

The answer to this question is very simple: we humans are very greedy!

"We want more with the least effort!"

That means we want to store more numbers in less space. The sections below answer the questions this raises.

Why do we need floating point representation?

It provides a larger range of numbers with a limited number of bits.

Comparison with traditional fixed point representation

Let's assume that we have a register of 8 bits. The range of numbers that we can store in two's complement will be -2^7 to 2^7-1, that is, from -128 to 127.

Remember this range. At the end of this topic you will see how IEEE 754 is better at storing numbers.
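To make that range concrete, here is a minimal sketch that computes the two's complement range for any register width. The function name twos_complement_range is only an illustrative choice, not something from a library.

```python
def twos_complement_range(n_bits):
    """Return (smallest, largest) integer that fits in n_bits of two's complement."""
    smallest = -(2 ** (n_bits - 1))
    largest = 2 ** (n_bits - 1) - 1
    return smallest, largest


print(twos_complement_range(8))  # (-128, 127)
```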

Format of Floating Point Representation

There are three terms in the representation, which I will explain one by one. First, get familiar with their names.

Sign Bit (S): It shows whether the value stored is positive or negative.

When S is 0 the number is positive, and when S is 1 the number is negative.

Exponent (E): The exponent is stored in biased form. Biased means it is stored only as an unsigned number.

Mantissa (M): M is the mantissa, also called the fraction. The mantissa is a signed normalised (Implicit / Explicit) fraction number.

Biased E

Assume we have a register in which E is 5 bits wide.

Step 1: Simply find the range

The range of a signed number with 5 bits will be from -2^4 to 2^4-1, that is, -16 to 15.

Step 2: Transform to biased form

So, -16 to 15 can be transformed to

0 to 31 (-16+16 to 15+16, we add 16 to both sides). Hence the bias will be 16 in this case, and the code is also called excess-16 code. By looking at the table you will get more clarity.

Hence, if k bits are used to store E, then
Bias = 2^(k-1)
E = e + bias

where "e" is the original exponent and "E" is the biased exponent. So, remember that formula.
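The bias rule can be sketched in a few lines of code. This assumes the bias is 2^(k-1), which matches the excess-16 example with k = 5. The names bias_for, to_biased and from_biased are hypothetical helpers made up for this illustration.

```python
def bias_for(k):
    """Bias for a k-bit exponent field, assuming bias = 2^(k-1)."""
    return 2 ** (k - 1)


def to_biased(e, k):
    """Convert the original exponent e to the stored (biased) exponent E."""
    bias = bias_for(k)
    if not -bias <= e <= bias - 1:
        raise ValueError("exponent does not fit in the field")
    return e + bias


def from_biased(E, k):
    """Recover the original exponent e from the stored exponent E."""
    return E - bias_for(k)


# A few rows of the excess-16 mapping for a 5-bit exponent field.
for e in (-16, -1, 0, 15):
    print(e, "->", to_biased(e, 5))
```

With k = 5 the smallest exponent -16 is stored as 0 and the largest exponent 15 is stored as 31, so every stored value is unsigned.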

Mantissa (M): using Normalisation

There are two types of normalisation used in traditional floating point representation.

Implicit normalisation
Explicit normalisation

Let's understand these by taking an example. Suppose we have the value (101.11)2

Implicit normalisation

Step 1: In implicit normalisation we have to shift the point so that it sits exactly after the first 1 of the value.

1.0111 * 2^2

Many of you get confused by this shifting. Here is an analogy that should clear it up.

Analogy

Suppose you have the number 197.33 in decimal and you have to shift the point so that it sits after the first digit. You will simply say it is 1.9733 * 10^2, and all of you agree on that. The base (radix) of decimal is 10, so we write a power of 10, while the base (radix) of binary is 2, so we write a power of 2.

So, getting back to the point, we have implicitly normalised the value (101.11)2 to 1.0111 * 2^2

Step 2: The number after the point is your mantissa M

1.0111 * 2^2

M = 0111, and the power of two is your e (original exponent), so e = 2.

E = 2 + bias
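The two steps of implicit normalisation can be sketched as below. This is only an illustration that works on a positive binary string containing at least one 1. The function name normalise_implicit is an assumption made for this example.

```python
def normalise_implicit(binary):
    """Normalise a positive binary string such as '101.11' to 1.M * 2^e.

    Returns (M, e), where M is the bit string after the leading 1.
    """
    int_part, _, frac_part = binary.partition(".")
    digits = int_part + frac_part
    first_one = digits.index("1")        # position of the leading 1
    e = len(int_part) - first_one - 1    # places the point moved to the left
    mantissa = digits[first_one + 1:]    # everything after the leading 1
    return mantissa, e


print(normalise_implicit("101.11"))  # ('0111', 2)
```

A negative e simply means the point had to move to the right, as with '0.0011', which becomes 1.1 * 2^-3.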

Explicit Normalisation

It is very simple, my friend. As you have seen, in implicit normalisation you have to ensure that the point is just after the first 1, while in explicit normalisation you have to ensure that the point is just before the first 1. Let's see how.

We have the same number, remember: (101.11)2

Step 1: Move the point to just before the first 1. Then it will be:

0.10111 * 2^3

Step 2: The number after the point will be our M (mantissa), and the power of two is our e (original exponent).

In our example it will be:

0.10111 * 2^3

Here, M = 10111 and e = 3

So, E = 3 + bias
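Step 1 and Step 2 of explicit normalisation can be sketched in code too, with a quick check that 0.M * 2^e gives back the original value. As before, this assumes a positive binary string with at least one 1, and normalise_explicit is a made-up name for illustration.

```python
def normalise_explicit(binary):
    """Normalise a positive binary string such as '101.11' to 0.M * 2^e.

    Returns (M, e), where M starts with the leading 1.
    """
    int_part, _, frac_part = binary.partition(".")
    digits = int_part + frac_part
    first_one = digits.index("1")      # position of the leading 1
    e = len(int_part) - first_one      # places the point moved to the left
    mantissa = digits[first_one:]      # the leading 1 stays inside M
    return mantissa, e


m, e = normalise_explicit("101.11")
print(m, e)  # 10111 3

# Check: 0.10111 in binary times 2^3 should be 5.75, which is (101.11)2.
value = int(m, 2) / 2 ** len(m) * 2 ** e
print(value)  # 5.75
```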

By default use Explicit Normalisation

In classical floating point representation, use explicit normalisation if the problem does not say which one to use.

Value Formula
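Going by the definitions of S, E and M above, the stored value should work out to (-1)^S * 0.M * 2^(E - bias) with explicit normalisation and (-1)^S * 1.M * 2^(E - bias) with implicit normalisation. Here is a minimal sketch of that, under those assumptions. The function decode and its parameters are hypothetical names chosen for this example.

```python
def decode(sign, E, mantissa, bias, implicit=False):
    """Compute the value of a stored number from its three fields.

    sign     -- 0 or 1
    E        -- biased exponent as an integer
    mantissa -- mantissa bits as a string, e.g. '10111'
    bias     -- bias of the exponent field
    implicit -- True for 1.M (implicit), False for 0.M (explicit)
    """
    e = E - bias                                   # original exponent
    fraction = int(mantissa, 2) / 2 ** len(mantissa) if mantissa else 0.0
    significand = 1 + fraction if implicit else fraction
    return (-1) ** sign * significand * 2 ** e


# (101.11)2 = 5.75, using a bias of 16 as in the 5-bit exponent example.
print(decode(0, 3 + 16, "10111", 16))                 # 5.75
print(decode(0, 2 + 16, "0111", 16, implicit=True))   # 5.75
```

Both normalisations decode to the same value, they only differ in where the leading 1 is kept.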

Question Time

Try to solve the problems on your own first. I will solve some of them for you here and give a TIY (Try It Yourself) question as an assignment.

Q1. Consider a 16-bit register used to store floating point numbers. The mantissa is a normalized signed fraction number. The exponent is represented in excess-32 form. What is the 16-bit value for +(19.25)10 in this register?

Ans.
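One way to work through Q1 is sketched below. It rests on several assumptions that the question does not state outright: excess-32 means a 6-bit exponent (bias = 2^5), which leaves 1 sign bit and 9 mantissa bits; the fields are laid out in the order S, E, M; explicit normalisation is used because none is specified; and extra mantissa bits are truncated. The function encode_explicit is a hypothetical helper for this illustration.

```python
def encode_explicit(value, exp_bits, mant_bits, bias):
    """Encode value as S | E | M using explicit normalisation (0.M * 2^e).

    Assumed layout: 1 sign bit, then exp_bits of biased exponent,
    then mant_bits of mantissa. Extra mantissa bits are truncated.
    """
    sign = 0 if value >= 0 else 1
    magnitude = abs(value)
    e = 0
    # Scale the magnitude into [0.5, 1), i.e. 0.1xxx... in binary.
    while magnitude >= 1:
        magnitude /= 2
        e += 1
    while 0 < magnitude < 0.5:
        magnitude *= 2
        e -= 1
    E = e + bias
    if not 0 <= E < 2 ** exp_bits:
        raise ValueError("exponent does not fit in the exponent field")
    mantissa = int(magnitude * 2 ** mant_bits)  # keep the first mant_bits bits
    return f"{sign}{E:0{exp_bits}b}{mantissa:0{mant_bits}b}"


bits = encode_explicit(19.25, exp_bits=6, mant_bits=9, bias=32)
print(bits)                     # 0100101100110100
print(hex(int(bits, 2)))        # 0x4b34
```

By hand: (19.25)10 = (10011.01)2 = 0.1001101 * 2^5, so S = 0, E = 5 + 32 = 37 = (100101)2 and M = 100110100 after padding to 9 bits.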

Try it yourself Question

Consider a 16-bit register used to store floating point numbers. The mantissa is an implicitly normalized signed fraction number. The exponent is represented in excess-64 form. What is the 16-bit value for -(27.625)10 in this register?

What's Next?

Next - IEEE 754 Floating Point Representation

After Next - Components of Computer

Arin CS