Floating Point Representation
To handle both very large integers and very small fractions, a computer must be
able to represent numbers and operate on them in such a way that the position of
the binary point is variable and is adjusted automatically as computation
proceeds. In this case the binary point is said to float, and the numbers are
called floating-point numbers.
The floating-point representation has three fields: sign, significant digits and
exponent. Let us consider the number 111101.1000110 represented in floating-point
format. To represent this number in floating-point form, the binary point is
first shifted so that it sits immediately to the right of the first bit, and the
number is then multiplied by the scaling factor that restores its original
value. The base of the scaling factor is fixed at 2. The string of significant
digits is commonly known as the mantissa.
The number should be in normalized form, which for this example is
111101.1000110 = 1.111011000110 x 2^5
In the above example, we can say that,
Sign = 0 (the number is positive)
Mantissa = 111011000110 (significant digits after the binary point)
Exponent = 5
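The normalization step above can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the input is a positive binary string, and the helper name normalize_binary is hypothetical, not part of any library.

```python
def normalize_binary(bits):
    """Return (mantissa, exponent) so that bits == 1.mantissa x 2^exponent.

    `bits` is a positive binary string such as "111101.1000110".
    """
    int_part, _, frac_part = bits.partition(".")
    digits = int_part + frac_part
    first_one = digits.index("1")              # position of the leading 1
    exponent = len(int_part) - 1 - first_one   # places the binary point moves
    mantissa = digits[first_one + 1:]          # digits after the leading 1
    return mantissa, exponent


mantissa, exponent = normalize_binary("111101.1000110")
print(f"1.{mantissa} x 2^{exponent}")  # 1.111011000110 x 2^5
```

The exponent is simply the number of positions the binary point has to move to end up just after the leading 1; it is negative for numbers smaller than 1.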
The IEEE standard (IEEE 754) defines two commonly used formats:
1. 32-bit standard (single-precision representation)
2. 64-bit standard (double-precision representation)
Single Precision Representation
Field 1: Sign = 1 bit
Field 2: Exponent = 8 bits
Field 3: Mantissa = 23 bits
Instead of the signed exponent E (the power in the scaling factor), the value
actually stored in the exponent field is
E’ = E + Bias = E + 127
Here the bias is 127, so this is known as excess-127 format.
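As a rough sketch of how the three single-precision fields fit into one 32-bit word, the snippet below assembles a word from a sign, a true exponent E and a mantissa bit string. The function name pack_single and the test value 6.5 are assumptions chosen for illustration; the result is cross-checked with Python's standard struct module.

```python
import struct


def pack_single(sign, exponent, mantissa_bits):
    """Assemble a 32-bit word from sign, true exponent E and mantissa bits."""
    stored_exponent = exponent + 127                  # E' = E + bias
    mantissa = int(mantissa_bits.ljust(23, "0"), 2)   # pad mantissa to 23 bits
    return (sign << 31) | (stored_exponent << 23) | mantissa


# 6.5 = 110.1 in binary = 1.101 x 2^2
word = pack_single(0, 2, "101")
print(format(word, "032b"))
# Interpret the same 32 bits as a single-precision float.
print(struct.unpack(">f", word.to_bytes(4, "big"))[0])  # 6.5
```

The sketch only covers normalized numbers; special cases such as zero, infinities and NaN are not handled.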
Double Precision Representation
Field 1: Sign = 1 bit
Field 2: Exponent = 11 bits
Field 3: Mantissa = 52 bits
Instead of the signed exponent E (the power in the scaling factor), the value
actually stored in the exponent field is
E’ = E + Bias = E + 1023
Here the bias is 1023, so this is known as excess-1023 format.
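The two bias values follow the same pattern: for an exponent field of k bits, the bias equals 2^(k-1) - 1. The text above does not state this formula; it is added here as a known property of these two formats. The helper names below are hypothetical.

```python
def bias(exponent_bits):
    """Bias of an excess-format exponent field with the given width."""
    return (1 << (exponent_bits - 1)) - 1


def stored_exponent(exponent, exponent_bits):
    """E' = E + bias, the value actually written to the exponent field."""
    return exponent + bias(exponent_bits)


for name, width in (("single", 8), ("double", 11)):
    e_stored = stored_exponent(10, width)
    print(name, bias(width), e_stored, format(e_stored, f"0{width}b"))
```

For E = 10 this prints a stored exponent of 137 for single precision and 1033 for double precision.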
Example:
Represent (1259.125)10 in single precision and double precision formats.
Solution:
Step 1: Convert the decimal number into binary
1259 = 100 1110 1011
0.125 = 0.001
Binary number = 10011101011 + 0.001 = 10011101011.001
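The conversion in Step 1 can be reproduced with the usual hand methods: repeated division by 2 for the integer part and repeated multiplication by 2 for the fractional part. The function names and the max_bits cut-off below are assumptions made for this sketch.

```python
def int_to_binary(n):
    """Convert a non-negative integer to binary by repeated division by 2."""
    if n == 0:
        return "0"
    digits = []
    while n > 0:
        digits.append(str(n % 2))   # remainder is the next lowest bit
        n //= 2
    return "".join(reversed(digits))


def frac_to_binary(frac, max_bits=23):
    """Convert a fraction in [0, 1) to binary by repeated multiplication by 2."""
    digits = []
    while frac > 0 and len(digits) < max_bits:
        frac *= 2
        bit = int(frac)             # integer part is the next bit
        digits.append(str(bit))
        frac -= bit
    return "".join(digits)


print(int_to_binary(1259) + "." + frac_to_binary(0.125))  # 10011101011.001
```

Fractions such as 0.1 have no finite binary expansion, which is why the sketch stops after max_bits digits.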
Step 2: Normalize the number
10011101011.001 = 1.0011101011001 x 2^10
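One way to double-check the normalized form is Python's math.frexp, which splits a float into a fraction in [0.5, 1) and a power of two. Rescaling that fraction to the range [1, 2) gives the same form as above. The variable names are illustrative only.

```python
import math

value = 1259.125
m, e = math.frexp(value)            # value == m * 2**e with 0.5 <= m < 1
significand, exponent = m * 2, e - 1

# The binary significand 1.0011101011001 read as a number:
# 14 bits in total, 13 of them after the binary point.
from_binary = int("1" + "0011101011001", 2) / 2**13

print(exponent)                     # 10
print(significand == from_binary)   # True
```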
Step 3: Single precision representation
Here S = 0 (because the given number is positive)
E = 10 (exponent)
M = 0011101011001
Bias for single precision format = 127
E’ = E + 127 = 10 + 127 = (137)10 = (10001001)2
The number in single precision format is given by:
S = 0 | E’ = 10001001 | M = 00111010110010000000000 (M padded with zeros to 23 bits)
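The single-precision result can be cross-checked with Python's standard struct module, which exposes the raw 32 bits of a float. This is a quick verification sketch, not part of the original worked solution.

```python
import struct

# Pack as big-endian single precision, then read the same 4 bytes back
# as an unsigned 32-bit integer.
(bits,) = struct.unpack(">I", struct.pack(">f", 1259.125))
pattern = format(bits, "032b")

sign, exponent_field, mantissa_field = pattern[0], pattern[1:9], pattern[9:]
print(sign, exponent_field, mantissa_field)
# 0 10001001 00111010110010000000000
```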
Step 4: Double precision representation
Here S = 0 (because the given number is positive)
E = 10 (exponent)
M = 0011101011001
Bias for double precision format = 1023
E’ = E + 1023 = 10 + 1023 = (1033)10 = (10000001001)2
The number in double precision format is given by:
S = 0 | E’ = 10000001001 | M = 0011101011001 followed by 39 zeros (52 bits in total)
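The same kind of check works for double precision, using the 64-bit format codes of the struct module. Again this is only a verification sketch added alongside the worked example.

```python
import struct

# Pack as big-endian double precision, then read the same 8 bytes back
# as an unsigned 64-bit integer.
(bits,) = struct.unpack(">Q", struct.pack(">d", 1259.125))
pattern = format(bits, "064b")

sign, exponent_field, mantissa_field = pattern[0], pattern[1:12], pattern[12:]
print(sign, exponent_field)                   # 0 10000001001
print(mantissa_field[:13], len(mantissa_field))  # 0011101011001 52
```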