Tuesday, December 10, 2019

Floating Point Number Representation - Free Samples to Students

Question: Discuss the floating point number representation.

Answer:

Introduction:

The IEEE 754 single-precision format is a floating-point number format that occupies 4 bytes of memory. It is also referred to as binary32, as the representation requires 32 bits. The format of the IEEE 754 32-bit single-precision number is represented below:

Figure: Single precision format (Source: Kumar and Basha, 2016)

The IEEE 754 32-bit single-precision format consists of three components:

Sign bit: 1 bit
Exponent: 8 bits
Significand precision: 24 bits, of which 23 bits are explicitly stored

The sign bit represents the sign of the number: 0 for positive and 1 for negative values. The 8 exponent bits are stored as an unsigned value ranging from 0 to 255 with a bias of 127 (Hou et al., 2017). Normal numbers therefore have true exponents from -126 to +127; the stored value 0 is reserved for zero and subnormal numbers, and the stored value 255 for infinity and NaN. The 23 fraction bits that follow the exponent hold the significand without its leading 1, which is implicit for normal numbers.

An example of the IEEE 754 32-bit single-precision format:

Consider the decimal value 0.25. It can be written as

$(0.25)_{10} = (1.0)_2 \times 2^{-2}$

The exponent is therefore -2, which in biased form is 127 - 2 = 125, or 0111 1101 in binary. The fraction is 0, as the digits to the right of the binary point in 1.0 are all zeros, so the 23 stored fraction bits are 00000000000000000000000. The complete representation of 0.25 in the 32-bit single-precision format is:

0 01111101 00000000000000000000000

IEEE-754 64-bit Double-Precision Floating-Point Numbers

The IEEE 754 double-precision format stores a number in 64 bits (8 bytes). On a machine with 32-bit words it occupies two adjacent storage locations in memory. It is commonly used on PCs because it offers a wider range and higher precision than the single-precision format.
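Returning to the single-precision example, the encoding of 0.25 can be checked mechanically. The following is a minimal Python sketch using only the standard `struct` module; the function names `float32_fields` and `float32_bit_string` are illustrative choices, not taken from any of the cited sources.

```python
import struct


def float32_fields(x):
    """Split x, rounded to IEEE 754 single precision, into its three bit fields."""
    # Pack as a big-endian 32-bit float, then reinterpret the same
    # 4 bytes as an unsigned 32-bit integer to get at the raw bits.
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31                # 1 sign bit
    exponent = (bits >> 23) & 0xFF   # 8 stored (biased) exponent bits
    fraction = bits & 0x7FFFFF       # 23 explicitly stored fraction bits
    return sign, exponent, fraction


def float32_bit_string(x):
    """Format the three fields as 'sign exponent fraction' in binary."""
    sign, exponent, fraction = float32_fields(x)
    return f"{sign:01b} {exponent:08b} {fraction:023b}"


print(float32_fields(0.25))      # (0, 125, 0)
print(float32_bit_string(0.25))  # 0 01111101 00000000000000000000000
```

The stored exponent 125 is the biased form of -2, matching the hand calculation 127 - 2 = 125.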
With only 24 bits of significand, the single-precision format cannot represent every 32-bit integer exactly, so the double-precision format is more commonly used. The IEEE 754 64-bit double-precision format is represented below:

Figure: Double precision format

According to the figure, the format consists of the following three components:

Sign bit: 1 bit
Exponent: 11 bits
Significand precision: 53 bits, of which 52 bits are explicitly stored

Example: The exact value of a normal 64-bit double-precision number is given by

$(-1)^{\text{sign}} \times (1.b_{51}b_{50}\ldots b_{0})_2 \times 2^{e-1023}$

where sign is the sign bit, $b_{51}, \ldots, b_{0}$ are the stored fraction bits, and $e$ is the stored (biased) exponent.

The number 1 can be represented with sign = 0, e = 1023 (01111111111 in binary), and all 52 fraction bits equal to zero, which is 0x3FF0000000000000 in hexadecimal.

The fixed point representation of a number includes three components: the sign bit, the integer field, and the fractional field. In a 32-bit fixed-point format, for example, the sign bit is 1 bit, the integer field is 15 bits, and the fractional field is 16 bits. In the floating point representation, by contrast, the exponent field consists of either 8 or 11 bits, and the remaining bits, apart from the sign bit, hold the fractional part in both formats (Lindstrom, Lloyd and Hittinger, 2018). Moreover, the fixed point representation covers a smaller range of values at a uniform resolution, whereas the floating point representation covers a much wider range of numbers.

References

Fulzele, S., & Ghodke, V. (2015). Novel Technique for Parallel Pipeline Double Precision IEEE-754 Floating Point Adder. International Journal of Engineering and Computer Science, 4(06).

Hou, J., Zhu, Y., Shen, Y., Li, M., Wu, H., & Song, H. (2017, December). Tackling Gaps in Floating-Point Arithmetic: Unum Arithmetic Implementation on FPGA. In High Performance Computing and Communications; IEEE 15th International Conference on Smart City; IEEE 3rd International Conference on Data Science and Systems (HPCC/SmartCity/DSS), 2017 IEEE 19th International Conference on (pp. 615-616). IEEE.

Kumar, B. V. V., & Basha, S. M. (2016). Design and Simulation of Single-Precision Inexact Floating-Point Adder/Subtractor. i-Manager's Journal on Electronics Engineering, 6(4), 7.

Lindstrom, P., Lloyd, S., & Hittinger, J. (2018). Universal Coding of the Reals: Alternatives to IEEE Floating Point.
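As a supplementary note to the double-precision section of the answer, the value formula for a 64-bit number can be evaluated directly from the stored bits. The sketch below is one possible way to do this in Python, assuming normal numbers only (zero, subnormals, infinity and NaN are rejected); the names `float64_fields` and `float64_value` are hypothetical helpers introduced for illustration. Exact rational arithmetic from the standard `fractions` module is used so that no rounding is hidden.

```python
import struct
from fractions import Fraction


def float64_fields(x):
    """Split an IEEE 754 double into (sign, stored exponent, fraction bits)."""
    # Reinterpret the 8 bytes of the double as an unsigned 64-bit integer.
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    sign = bits >> 63                  # 1 sign bit
    exponent = (bits >> 52) & 0x7FF    # 11 stored (biased) exponent bits
    fraction = bits & ((1 << 52) - 1)  # 52 explicitly stored fraction bits
    return sign, exponent, fraction


def float64_value(sign, exponent, fraction):
    """Evaluate (-1)^sign * (1.fraction)_2 * 2^(exponent - 1023) exactly."""
    # Stored exponents 0 and 2047 are reserved, so the formula does not apply.
    if not 0 < exponent < 0x7FF:
        raise ValueError("formula applies to normal numbers only")
    significand = 1 + Fraction(fraction, 1 << 52)  # implicit leading 1
    return (-1) ** sign * significand * Fraction(2) ** (exponent - 1023)


print(float64_fields(1.0))                  # (0, 1023, 0)
print(float64_value(*float64_fields(1.0)))  # 1
```

For the number 1 the stored exponent equals the bias, 1023, and every fraction bit is zero, so the formula reduces to 1.0 multiplied by 2 to the power 0.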
