Wednesday, 17 October 2012

Chapter 2: Floating point

Floating point
1. Representing real numbers in a way that can support a wide range of values.
2. Numbers are written in scientific notation (normalized or not normalized).
3. The notation can be used in binary as well as in decimal.
4. In general, numbers are represented approximately to a fixed number of significant digits and scaled using an exponent.
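
As a small illustration of point 4, the Python sketch below shows that a decimal value such as 0.1 is stored only approximately. It assumes Python's built-in float is an IEEE double, which holds on common platforms; the variable names are chosen here for illustration.

```python
from decimal import Decimal

# 0.1 has no finite binary expansion, so the stored value is only close to 0.1.
# Decimal(float) shows the exact value that was actually stored.
stored = Decimal(0.1)
print(stored)

# The small rounding errors in the operands show up in the result.
total = 0.1 + 0.2
print(total == 0.3)              # exact comparison fails
print(abs(total - 0.3) < 1e-9)   # comparison with a tolerance succeeds
```

This is why floating-point results are usually compared with a tolerance rather than with ==.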


Floating point standard
1. Defined by IEEE Std 754-1985.
   - IEEE 754-1985 was a technical standard for floating-point computation.
2. Developed in response to the divergence of representations.
3. Now almost universally adopted.
4. The standard provides definitions for single precision and double precision representations.

IEEE floating-point format

Field:  Sign    Exponent   Mantissa/Significand
Width:  1 bit   8 bits     23 bits

Single Precision Range:
• 32-bit: 1-bit sign + 8-bit exponent + 23-bit significand
• Range (normalized magnitudes): about 1.2 * 10^-38 < |N| < 3.4 * 10^38
• Precision: ~7 significant (decimal) digits
• Used when exact precision is less important



Field:  Sign    Exponent   Mantissa/Significand
Width:  1 bit   11 bits    52 bits

Double Precision Range:
• 64-bit: 1-bit sign + 11-bit exponent + 52-bit significand
• Range (normalized magnitudes): about 2.2 * 10^-308 < |N| < 1.8 * 10^308
• Precision: ~15 significant (decimal) digits
• Used for scientific computations
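
The difference in precision between the two formats can be seen with a short Python sketch. It assumes Python's float is an IEEE double (true on common platforms) and uses the standard struct module to round a value to single precision; the helper name to_single is made up for this example.

```python
import struct
import sys

def to_single(x):
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack(">f", struct.pack(">f", x))[0]

x = 0.1234567890123
y = to_single(x)

print(x)   # the double keeps all the digits written above
print(y)   # the single agrees with x only in roughly the first 7 digits

# Limits of the double format as reported by Python itself.
print(sys.float_info.max)   # largest finite double
print(sys.float_info.min)   # smallest positive normalized double
print(sys.float_info.dig)   # decimal digits that are always preserved
```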

The sign (s) of a binary floating-point number is represented by a single bit. A 1 bit indicates a negative number, and a 0 bit indicates a positive number.
M is the mantissa (the fraction bits, 000...000 to 111...111) and E is the biased exponent.
Single precision, bias = 127.
Double precision, bias = 1023.
Value = (-1)^s * 1.M * 2^(E - bias)   (for normalized numbers)
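
The value formula can be tried out with a minimal Python sketch for single precision. The function name decode_single is hypothetical, and the sketch handles only normalized numbers (it rejects the all-zeros and all-ones exponent fields, which the formula does not cover).

```python
import struct

BIAS_SINGLE = 127

def decode_single(bits):
    """Apply (-1)^s * 1.M * 2^(E - bias) to a 32-bit pattern (normalized only)."""
    s = bits >> 31                 # 1 sign bit
    E = (bits >> 23) & 0xFF        # 8 exponent bits (biased)
    M = bits & 0x7FFFFF            # 23 fraction bits
    if E == 0 or E == 0xFF:
        raise ValueError("not a normalized number")
    significand = 1 + M / 2**23    # 1.M with the implicit leading 1
    return (-1) ** s * significand * 2.0 ** (E - BIAS_SINGLE)

# Cross-check against the machine's own interpretation of the same bits.
pattern = 0xC0200000
print(decode_single(pattern))
print(struct.unpack(">f", struct.pack(">I", pattern))[0])
```

Both lines should print the same number, since struct reinterprets the bit pattern using the hardware format.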

For example:
Represent -0.4375
0.4375 = 0.0111₂ = 1.11₂ * 2^(-2), so
-0.4375 = (-1)^1 * 1.11₂ * 2^(-2)
S = 1
Fraction = 11000...00₂
Exponent = -2 + bias
Single = -2 + 127
= 125
= 01111101₂
Double = -2 + 1023
= 1021
= 01111111101₂

Single precision:
Sign:     1
Exponent: 01111101
Fraction: 11000000...00 (23 bits)

Double precision:
Sign:     1
Exponent: 01111111101
Fraction: 11000000000...00 (52 bits)
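
The encoding of -0.4375 can be checked with Python's struct module. Because 0.4375 = 1.11₂ * 2^(-2), the exponent field should come out as -2 + 127 = 125 (01111101) in single precision and -2 + 1023 = 1021 (01111111101) in double precision. This is a quick sketch; the variable names are chosen here for illustration.

```python
import struct

# Pack the value, then reread the same bytes as an unsigned integer
# so the raw bits can be printed.
single = format(struct.unpack(">I", struct.pack(">f", -0.4375))[0], "032b")
double = format(struct.unpack(">Q", struct.pack(">d", -0.4375))[0], "064b")

# Split each bit string into sign, exponent, and fraction fields.
print(single[0], single[1:9], single[9:])
print(double[0], double[1:12], double[12:])
```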
