Floating point
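1. A way of representing real numbers that supports a wide range of values.
2. Numbers are written in scientific notation, either normalized (exactly one nonzero digit before the point, e.g. 1.0 x 10^-9) or not normalized (e.g. 0.1 x 10^-8).
3. The notation works in decimal and in binary: ten is 1.0 x 10^1 in decimal and 1.01 x 2^3 in binary.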
4. In general, numbers are represented approximately to a fixed number of significant digits and scaled using an exponent.
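As a rough illustration of normalized binary scientific notation, the Python sketch below uses the standard `math.frexp` function to split a float into a significand and a power of two, then shifts the point so the significand lies between 1 and 2. The helper name `to_binary_scientific` is hypothetical, and the sketch assumes finite input.

```python
import math
from typing import Tuple


def to_binary_scientific(x: float) -> Tuple[float, int]:
    """Return (significand, exponent) such that x == significand * 2**exponent
    and 1 <= |significand| < 2, i.e. normalized binary scientific notation.
    Zero has no normalized form, so it is returned as (0.0, 0).
    Assumes x is finite (not inf or NaN)."""
    if x == 0.0:
        return 0.0, 0
    m, e = math.frexp(x)  # frexp returns 0.5 <= |m| < 1 with x == m * 2**e
    return m * 2, e - 1   # move the point one place right: 1 <= |m| < 2


print(to_binary_scientific(10.0))     # (1.25, 3):   1.01 (base 2) x 2^3
print(to_binary_scientific(-0.4375))  # (-1.75, -2): -1.11 (base 2) x 2^-2
```

The significand 1.25 printed for ten is 1.01 in binary, so the output matches the binary form given in the list.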
Floating point standard
1. Defined by IEEE Std 754-1985.
- IEEE 754-1985 was the technical standard for floating-point computation; it has since been superseded by IEEE 754-2008 and IEEE 754-2019.
2. Developed in response to the divergence of floating-point representations among computer manufacturers.
3. Now almost universally adopted.
4. The standard defines single precision (32-bit) and double precision (64-bit) representations.
IEEE Floating-point format
1 bit | 8 bits   | 23 bits
Sign  | Exponent | Mantissa/Significand
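To see the 1 + 8 + 23 bit layout on real data, the sketch below (Python, standard `struct` module) packs a value as a 32-bit single-precision float and masks out the three fields. The function name `float32_fields` is a hypothetical helper; values too large for single precision make `struct.pack` raise `OverflowError`.

```python
import struct


def float32_fields(x: float):
    """Round x to IEEE 754 single precision and return its
    (sign, exponent, fraction) bit fields as integers."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))  # raw 32-bit pattern
    sign = bits >> 31               # 1 bit
    exponent = (bits >> 23) & 0xFF  # 8 bits, stored with a bias
    fraction = bits & 0x7FFFFF      # 23 bits, leading 1 not stored
    return sign, exponent, fraction


s, e, f = float32_fields(-2.5)  # -2.5 = -1.01 (base 2) x 2^1
print(s, format(e, "08b"), hex(f))  # 1 10000000 0x200000
```

The fraction 0x200000 is the bit pattern 01 followed by 21 zeros, i.e. the ".01" after the implicit leading 1.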
Single Precision Range:
• 32-bit: 1-bit sign + 8-bit exponent + 23-bit significand
• Range: approximately 1.18 * 10^-38 <= |N| <= 3.40 * 10^38 (normalized numbers)
• Precision: ~7 significant (decimal) digits
• Used when high precision is less important
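The range and precision figures follow from the field widths. The Python sketch below derives the normalized limits from the 8-bit exponent and 23-bit fraction, and uses the standard `struct` module to round a double to single precision, showing how few decimal digits survive. `to_float32` is a hypothetical helper name.

```python
import struct


def to_float32(x: float) -> float:
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


# Normalized single-precision limits (8-bit exponent with bias 127, 23-bit fraction)
smallest_normal = 2.0 ** -126                   # exponent field 1, fraction all zeros
largest_finite = (2 - 2.0 ** -23) * 2.0 ** 127  # exponent field 254, fraction all ones
print(smallest_normal)  # roughly 1.18e-38
print(largest_finite)   # roughly 3.40e+38

# Only about 7 significant decimal digits are kept
print(to_float32(0.1))         # 0.10000000149011612
print(to_float32(16777217.0))  # 16777216.0, since 2^24 + 1 needs 25 significant bits
```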
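1 bit | 11 bits  | 52 bits
Sign  | Exponent | Mantissa/Significand
Double Precision Range:
• 64-bit: 1-bit sign + 11-bit exponent + 52-bit significand
• Range: approximately 2.23 * 10^-308 <= |N| <= 1.80 * 10^308 (normalized numbers)
• Precision: ~15 significant (decimal) digits
• Used for scientific computations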
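Python's built-in `float` is an IEEE 754 double on essentially all mainstream platforms, so the standard `sys.float_info` structure can be used to check the double-precision figures. This is a sketch that assumes such a platform; on an unusual non-IEEE platform the assertions would fail.

```python
import sys

info = sys.float_info  # parameters of the platform's C double

print(info.mant_dig)  # 53: 52 stored fraction bits + 1 implicit leading bit
print(info.dig)       # 15: decimal digits preserved through a decimal -> double -> decimal round trip
print(info.min)       # smallest normalized double, 2^-1022 (roughly 2.23e-308)
print(info.max)       # largest finite double (roughly 1.80e+308)

# Cross-check the limits against the 11-bit exponent (bias 1023) and 52-bit fraction
assert info.min == 2.0 ** -1022                    # exponent field 1, fraction all zeros
assert info.max == (2 - 2.0 ** -52) * 2.0 ** 1023  # exponent field 2046, fraction all ones
```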
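The sign (s) of a binary floating-point number is represented by a single bit: a 1 bit indicates a negative number and a 0 bit a positive number.
M is the mantissa (the stored fraction bits, 000...000 to 111...111) and E is the biased exponent (the stored exponent field). For normalized numbers the leading 1 before the binary point is implicit and not stored, which is why the value below uses 1.M.
Single precision, bias = 127.
Double precision, bias = 1023.
Value = (-1)^s x 1.M x 2^(E-bias) (for normalized numbers)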
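The value formula can be turned directly into code. The Python sketch below computes the bias as 2^(k-1) - 1 for a k-bit exponent field, which reproduces 127 and 1023, and evaluates (-1)^s x 1.M x 2^(E-bias) for a 32-bit single-precision pattern, checking the result against the standard `struct` module. `bias` and `decode_single` are hypothetical helper names; zero, subnormals, infinities and NaNs are deliberately not handled.

```python
import struct


def bias(exponent_bits: int) -> int:
    """Bias for a k-bit exponent field: 2^(k-1) - 1."""
    return 2 ** (exponent_bits - 1) - 1


def decode_single(bits: int) -> float:
    """Evaluate (-1)^s x 1.M x 2^(E - bias) for a normalized 32-bit pattern."""
    s = bits >> 31
    E = (bits >> 23) & 0xFF
    M = bits & 0x7FFFFF
    if E == 0 or E == 0xFF:  # reserved for zero/subnormals and inf/NaN
        raise ValueError("not a normalized single-precision number")
    significand = 1 + M / 2 ** 23  # 1.M: implicit leading 1 plus the fraction
    return (-1) ** s * significand * 2.0 ** (E - bias(8))


print(bias(8), bias(11))  # 127 1023

pattern = 0x40490FDB  # single-precision bit pattern closest to pi
print(decode_single(pattern))
print(struct.unpack(">f", pattern.to_bytes(4, "big"))[0])  # same value
```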
For Example:
Represent -0.4375
-0.4375 = -7/16 = -0.0111 (base 2) = (-1)^1 x 1.11 x 2^(-2)
S = 1
Fraction = 1100...00 (base 2)
Exponent = -2 + bias
Single: -2 + 127 = 125 = 01111101 (base 2)
Double: -2 + 1023 = 1021 = 01111111101 (base 2)
Single precision:
1 | 01111101 | 11000000..00 (23-bit fraction)
Double precision:
1 | 01111111101 | 11000000000..00 (52-bit fraction)
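The hand calculation can be automated. The Python sketch below follows the same steps (sign bit, normalize to 1.xxx x 2^e, add the bias, drop the implicit leading 1) and compares its bit patterns with what the standard `struct` module produces. `encode` is a hypothetical helper; it assumes a nonzero, finite value that lies in the normalized range and is exactly representable in the target format, so no rounding is needed.

```python
import struct


def encode(x: float, exponent_bits: int, fraction_bits: int) -> int:
    """Hand-encode x into an IEEE 754 bit pattern, step by step.
    Assumes x is nonzero, finite, normalized and exactly representable."""
    if x == 0 or x != x or abs(x) == float("inf"):
        raise ValueError("zero, NaN and infinity are not handled in this sketch")
    s = 1 if x < 0 else 0          # step 1: sign bit
    m, e = abs(x), 0
    while m >= 2:                  # step 2: normalize to 1.xxx x 2^e
        m /= 2
        e += 1
    while m < 1:
        m *= 2
        e -= 1
    E = e + 2 ** (exponent_bits - 1) - 1   # step 3: biased exponent
    M = int((m - 1) * 2 ** fraction_bits)  # step 4: fraction without the leading 1
    return (s << (exponent_bits + fraction_bits)) | (E << fraction_bits) | M


single = encode(-0.4375, 8, 23)
double = encode(-0.4375, 11, 52)
print(hex(single))  # 0xbee00000
print(hex(double))  # 0xbfdc000000000000

# Cross-check against the platform's own encoding
assert single == struct.unpack(">I", struct.pack(">f", -0.4375))[0]
assert double == struct.unpack(">Q", struct.pack(">d", -0.4375))[0]
```

Reading 0xbee00000 in binary gives 1 | 01111101 | 1100...0, the sign, biased exponent 125 and fraction .11 for -0.4375.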