Converting int to float in C (part 2)

In this post, we’ll program an int-to-float converter that handles the whole range of int values. We’ll build on the converter from the previous post, which didn’t handle rounding effects.

This is also a solution for exercise 2.97 of the Computer Systems: A Programmer’s Perspective book.

The initial converter

We’ll start working with the following implementation of the converter:

/* Convert a 32-bit int to float */
float_bits float_i2f(int i) {
  /* Special case: 0 is not a normalized value */
  if (i==0)
    return 0;
  /* Sign bit */
  unsigned s = i>>31;
  /* Exponent */
  unsigned E = (int) (log(i<0 ? -i : i)/log(2));
  unsigned exp = E + 127;
  /* Significand */
  unsigned M = i>0 ? i : -i;
  unsigned frac = M ^ (1<<E);

  /* Move frac to start at bit position 23 */
  if (E>23)
    /* Too long: truncate to the first 23 bits */
    frac >>= E-23;
  else
    /* Too short: pad to the right with zeros */
    frac <<= 23-E;

  return s<<31 | exp<<23 | frac;
}

We covered the details of this implementation in the previous post.
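As a side note, the converter derives E from floating-point logarithms. The quotient of two rounded logarithms is not guaranteed to land on the right side of an integer on every platform, so here is a minimal integer-only sketch of the same computation. The name floor_log2 is our own and is not part of the converter.

```c
/* Hypothetical helper: position of the highest set bit of m,
   that is, floor(log2(m)). The caller must pass m != 0. */
unsigned floor_log2(unsigned m) {
  unsigned e = 0;
  /* Shift right until only the leading 1 has been consumed */
  while (m >>= 1)
    e++;
  return e;
}
```

Under this sketch, E could be obtained as floor_log2(M) once M holds the absolute value of i.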

First of all, let’s see which cases aren’t correctly handled.

Understanding rounding effects (round-to-even)

Single-precision floating-point has a limited precision of 23 bits for its frac field. Numbers that require more than 23 bits for their fraction part when encoded as floats will have to be rounded.
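To make the layout concrete, the following sketch splits a float into its sign, exp and frac fields. It assumes that float is an IEEE 754 single-precision value. The names float_fields and split_float are ours, introduced only for illustration.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper type: the three fields of a single-precision float */
typedef struct {
  uint32_t sign;  /* 1 bit */
  uint32_t exp;   /* 8 bits, biased by 127 */
  uint32_t frac;  /* 23 bits */
} float_fields;

/* Split a float into its fields; assumes float is IEEE 754 binary32 */
float_fields split_float(float x) {
  uint32_t bits;
  float_fields f;
  /* Copy the bit pattern without violating aliasing rules */
  memcpy(&bits, &x, sizeof bits);
  f.sign = bits >> 31;
  f.exp = (bits >> 23) & 0xFF;
  f.frac = bits & 0x7FFFFF;
  return f;
}
```

For instance, split_float(1.5f) has exp 127 and a frac field whose only set bit is the most significant of its 23 bits.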

For example, let’s take the number 123456789 (in decimal). Converting to binary, we have:

111010110111100110100010101

And expressing it as a binary fraction:

1.11010110111100110100010101 * 2^26

As can be seen, there are 26 significant digits after the binary point. Since the frac field can only accommodate 23 of them, we won’t be able to express this value exactly. It will have to be rounded.

C handles rounding with the round-to-even mode. In this mode, we round to the closest possible value. For 123456789 in binary, we round off the last 3 digits: 101. Since the highest-order bit of 101 is 1 (and the remaining bits are not all zero), the value is closer to the next representable number, so we add one to the least significant digit of the rounded value:

1.1101011011110011010001(1) * 2^26

See that the new binary fraction is only 23 digits long after the binary point. The last digit was enclosed in parentheses for clarity. If we had merely truncated to the 23rd digit after the binary point, it would have been 0. Because of rounding, we added 1 to it.

This is how C would handle most rounding cases. There’s only one special case.

When the value is equally distant from two possible rounded values, C will choose the value that ends with an even least significant digit. For example, if we were rounding 2.5 (in decimal) to a whole number, we can see that it’s equally distant from both 2 and 3. In round-to-even mode, 2 is chosen because it’s even.

With binary numbers, we have this case whenever the rounded-off digits are in the form 10...0 (a one followed by zeros). That’s the case for 123456788:

1.11010110111100110100010[100] * 2^26

The digits in square brackets (100) are the ones that will be rounded off to fit 23 digits after the binary point. See that it’s a 1 followed by zeros. In this case, the value is equally distant from 1.11010110111100110100010 * 2^26 and from 1.11010110111100110100011 * 2^26, the two possible rounding options. C will choose the first option (the one that ends with 0), because it’s even.

Finally, the rounded binary fraction will be:

1.1101011011110011010001(0) * 2^26

Just like before, the rounded digit was enclosed in parentheses for clarity.
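Both examples can be checked against the compiler’s own cast. The sketch below assumes that float is IEEE 754 single precision and that the default rounding mode is in effect. The helper name cast_round_trip is ours.

```c
/* Hypothetical helper: converts i to float and back to a 64-bit integer,
   so the rounded value can be compared exactly.
   Assumes IEEE 754 single precision and the default rounding mode. */
long long cast_round_trip(int i) {
  return (long long) (float) i;
}
```

Under those assumptions, cast_round_trip(123456789) gives 123456792 (rounded up to the closest value) and cast_round_trip(123456788) gives 123456784 (a tie, resolved towards the even frac field).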

Now, we’ll see how to implement these changes in the converter.

Converter: rounding to the closest value

First of all, we’ll make the converter round to the closest value. As we saw in the previous section, we just need to check the 24th digit of the binary fraction. If it’s 0, we leave the rounded digit as it is. If the 24th binary digit of the fraction is 1, we add 1 to the rounded digit.

All of the changes will take place only when the frac field is more than 23 digits long. We’ll make the changes in the corresponding section of the code.

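if (E>23) {
  /* Frac field too long */
  /* Value of the first discarded bit (the 24th fraction digit) */
  unsigned round = (frac >> (E - 24)) & 1;
  /* Truncate to the first 23 bits */
  frac >>= E-23;
  /* Round up when the first discarded bit is 1 */
  frac += round;
}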

The first code line after if gets the 24th digit of the frac field. It will be either 0 or 1. After shifting the frac field, we increment it by the value of round.

Testing

We’ll test this new implementation in the range of values 100000000 to 100000020. These values require more than 23 digits in the frac field, so rounding will be required.

int main() {
  int i;
  float converted;
  float_bits f;

  for (i=1e8; i<1e8+20; i++) {
    f = float_i2f(i);
    /* Reinterpret the bit pattern as a float (requires <string.h>) */
    memcpy(&converted, &f, sizeof converted);
    if (converted != (float) i)
      printf("Casting not equal for value: %d\nConverted value is %.0f\nCasted value is %.0f\n", i, converted, (float) i);
  }
}

After compiling and executing, we get:

$ gcc i2f_roundv1.c -lm -o main
$ ./main
Casting not equal for value: 100000004
Converted value is 100000008
Casted value is 100000000

Here we can see that the converter adequately handles most rounding cases. The only value it didn’t handle correctly was 100000004. Its binary representation is:

1.01111101011110000100000[100] * 2^26

The digits that need to be rounded off are shown in square brackets. We can see this is a case where the two rounding options are equally distant: 100000004 is equally distant from 100000000 and from 100000008.
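To spot such ties programmatically, we can compare the discarded bits with the halfway pattern 10...0. The following is only a sketch: the helper name compare_to_half is ours, and shift stands for the number of bits that get dropped (E-23 in the converter).

```c
/* Hypothetical helper: compares the discarded low `shift` bits of frac
   with the halfway pattern 10...0.
   Returns -1 (below half), 0 (exactly half, a tie) or 1 (above half).
   Assumes 1 <= shift <= 31. */
int compare_to_half(unsigned frac, unsigned shift) {
  unsigned discarded = frac & ((1u << shift) - 1);
  unsigned half = 1u << (shift - 1);
  if (discarded < half)
    return -1;
  return discarded > half;
}
```

For 100000004 we have E = 26, so shift is 3 and the discarded bits are 100: the function returns 0, which confirms the tie.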

To handle these cases adequately, we’ll need to add a round-to-even feature to our converter.

Converter: round-to-even

We have to implement a special case when the rounded-off digits are in the form: 10...0 (a one followed by zeros). In that case, if the last frac digit is 0, we leave it unchanged. If the last frac digit is 1, we round the least significant digit by adding 1 to it.

We can implement those changes as follows:

/* Round to closest */
unsigned round = (frac >> (E - 24)) & 1;
/* Round-to-even modification */
unsigned last_frac_bit = (frac >> (E - 23)) & 1;
/* Keep only the E-23 low bits that will be discarded */
unsigned rounded_off_digits = frac << (32 - E + 23) >> (32 - E + 23);
unsigned one_followed_by_zeros = 1u << (E - 24);
if (rounded_off_digits == one_followed_by_zeros)
  round = last_frac_bit;

frac >>= E-23;
frac += round;

rounded_off_digits will get the bits of the frac field after the 23rd digit. We do so by setting all of the bits that come before to 0 with shift operations. We compare them with a value of the form 10...0 (one_followed_by_zeros). If they are equal, we change round to the value of the last bit (23rd) of the frac field.

The net effect of the code above is to modify the value of round to 0 when the rounded-off digits are in the form 10...0 and the last frac bit is also 0. This will round frac to an even value (least significant digit 0) when we’re equally distant from two rounding options.
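The same rule can also be written with a mask instead of a pair of shifts. This is an alternative sketch, not the code used in the converter. The helper name shift_round_even is ours, and shift plays the role of E-23.

```c
/* Hypothetical helper: drops the low `shift` bits of frac and rounds the
   result to nearest, ties to even. Assumes 1 <= shift <= 31.
   The result may need one more bit than frac >> shift when it rounds up. */
unsigned shift_round_even(unsigned frac, unsigned shift) {
  unsigned discarded = frac & ((1u << shift) - 1);
  unsigned half = 1u << (shift - 1);
  unsigned kept = frac >> shift;
  /* Round up above the halfway point, or on a tie when kept is odd */
  if (discarded > half || (discarded == half && (kept & 1)))
    kept++;
  return kept;
}
```

For 100000004, frac is 32891140 and shift is 3: the discarded bits are exactly 100 and the kept value 4111392 is even, so it is left unchanged.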

Testing

This time we’ll test our converter in the range of values 100 million to 1 billion.

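int main() {
  int i;
  float converted;
  float_bits f;

  for (i=1e8; i<1e9; i++) {
    f = float_i2f(i);
    /* Reinterpret the bit pattern as a float (requires <string.h>) */
    memcpy(&converted, &f, sizeof converted);
    if (converted != (float) i)
      printf("Casting not equal for value: %d\nConverted value is %.0f\nCasted value is %.0f\n", i, converted, (float) i);
  }
}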

After compiling and executing we get some error messages:

Casting not equal for value: 134217724
Converted value is 67108864
Casted value is 134217728
...

Our converter does not handle the value 134217724 (and many others) correctly. Checking its binary representation we get:

1.11111111111111111111111[100] * 2^26

The digits that are rounded off are shown in square brackets. We see that the frac field that corresponds to this value is all ones. We’ll need to handle this case differently.

Special case: frac field is all ones

When the frac field is all ones and it needs to be rounded up, adding one to it will make it overflow. Let’s see an example:

1.11111111111111111111111 + 0.00000000000000000000001 = 10.00000000000000000000000

Here we have a number with a binary fraction consisting of all ones. If we add one to its least significant digit, the fraction part of it will change to all zeros. Moreover, the addition will overflow beyond the binary point.

In terms of the exp field, we’ll have the following:

10.00000000000000000000000 * 2^26 = 1.00000000000000000000000 * 2^27

We can see that the exponent is incremented by one. Since float numbers are encoded in the form 1.XXX*2^y, when the frac field overflows, the exp field will be increased by one.

These changes can be readily applied to our converter:

/* Round to closest */
unsigned round = (frac >> (E - 24)) & 1;
/* Round-to-even modification */
unsigned last_frac_bit = (frac >> (E - 23)) & 1;
unsigned rounded_off_digits = frac << (32 - E + 23) >> (32 - E + 23);
unsigned one_followed_by_zeros = 1u << (E - 24);
if (rounded_off_digits == one_followed_by_zeros)
  round = last_frac_bit;

frac >>= E-23;

/* Special case: frac = 11...1 */
unsigned all_ones = 0x7FFFFF;
if (frac == all_ones && round) {
  /* Rounding up carries out of frac: clear it and bump the exponent */
  frac = 0;
  exp++;
  round = 0;
}

frac += round;

We included a conditional statement that makes the necessary changes when the frac field is all ones and needs to be rounded. In that case, we set frac to 0 and instead increment the exp field by one.
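As an aside, the carry can also be left to ordinary addition: if the fields are combined with + instead of |, a rounded frac of 2^23 spills into the exp field on its own. This is an alternative sketch, not what our converter does, and the name pack_with_carry is hypothetical.

```c
/* Hypothetical variant: combine the fields with + instead of |.
   frac_rounded may be as large as 0x800000 (2^23) after rounding up;
   the addition then carries into the exponent field. */
unsigned pack_with_carry(unsigned s, unsigned exp, unsigned frac_rounded) {
  return (s << 31) | ((exp << 23) + frac_rounded);
}
```

With exp = 153 (that is, E = 26) and frac_rounded = 0x800000, the result has an exp field of 154 and a frac field of 0, which is the encoding of 2^27.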

Testing

With these final changes, we’re ready to test the converter over the whole range of int values, using INT_MIN and INT_MAX from the limits.h header:

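int main() {
  long long n;
  float converted;
  float_bits f;

  /* A long long counter lets the loop include INT_MAX without overflowing */
  for (n=INT_MIN; n<=INT_MAX; n++) {
    int i = (int) n;
    f = float_i2f(i);
    /* Reinterpret the bit pattern as a float (requires <string.h>) */
    memcpy(&converted, &f, sizeof converted);
    if (converted != (float) i)
      printf("Casting not equal for value: %d\nConverted value is %.0f\nCasted value is %.0f\n", i, converted, (float) i);
  }
}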

After compilation and execution, we get a single error message:

$ ./main
Casting not equal for value: -2147483648
Converted value is -2
Casted value is -2147483648

This final error takes place because we cannot represent -INT_MIN using int (INT_MAX = |INT_MIN| - 1), so negating INT_MIN overflows. We’ll handle this case separately.

Special case: INT_MIN

To handle this case, we’ll just set the E and M values for this single case:

/* Exponent */
unsigned E;
if (i != INT_MIN)
  E = (int) (log(i<0 ? -i : i)/log(2));
else
  /* INT_MIN = -1 * 2^31 */
  E = 31;
unsigned exp = E + 127;

/* Significand */
unsigned M;
if (i != INT_MIN)
  M = i>0 ? i : -i;
else
  /* INT_MIN = -1 * 2^31; 1u avoids shifting into the sign bit of int */
  M = 1u<<31;
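Another way to avoid the overflow, shown here only as a sketch, is to negate in unsigned arithmetic, where wrap-around is well defined. The helper name magnitude is ours and does not appear in the converter.

```c
/* Hypothetical helper: |i| as an unsigned value, valid for every int.
   Converting to unsigned first makes the negation wrap modulo 2^N
   instead of overflowing int. */
unsigned magnitude(int i) {
  return i < 0 ? 0u - (unsigned) i : (unsigned) i;
}
```

With a 32-bit int, magnitude(INT_MIN) is 2147483648, so M could be computed without a separate branch for INT_MIN.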

With these new changes, we’ll now present the whole program and test it for the whole range of int values.

Complete converter and final testing

The code for the converter, including all of the modifications above, together with a test over the whole range of int values, is presented below.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <math.h>
#include <limits.h>

#define BIAS 127
#define K 23

typedef unsigned float_bits;

/* Convert a 32-bit int to float */
float_bits float_i2f(int i) {
  /* Special case: 0 is not a normalized value */
  if (i==0)
    return 0;
  /* Sign bit */
  unsigned s = i>>31;
  /* Exponent */
  unsigned E;
  if (i != INT_MIN)
    E = (int) (log(i<0 ? -i : i)/log(2));
  else
    /* INT_MIN = -1.0 * 2^31 */
    E = 31;
  unsigned exp = E + BIAS;
  /* Significand */
  unsigned M;
  if (i != INT_MIN)
    M = i>0 ? i : -i;
  else
    /* INT_MIN = -1.0 * 2^31 */
    M = 1u<<31;
  /* Drop the leading 1; 1u keeps the shift defined when E is 31 */
  unsigned frac = M ^ (1u<<E);

  /* Displace frac to start at bit position K=23 */
  if (E>K) {
    /* Frac field too long */
    /* Round to closest */
    unsigned round = (frac >> (E - K - 1)) & 1;
    /* Round-to-even modification */
    unsigned last_frac_bit = (frac >> (E - K)) & 1;
    unsigned rounded_off_digits = frac << (32 - E + K) >> (32 - E + K);
    unsigned one_followed_by_zeros = 1u << (E - K - 1);
    if (rounded_off_digits == one_followed_by_zeros)
      round = last_frac_bit;

    /* Truncate to the first 23 digits */
    frac >>= E-K;

    /* Special case: frac = 11..1 */
    unsigned all_ones = 0x7FFFFF;
    if (frac == all_ones && round) {
      frac = 0;
      exp++;
      round = 0;
    }
    /* Round when necessary */
    frac += round;
  } else
    /* Too short: pad to the right with zeros */
    frac <<= K-E;

  return s<<31 | exp<<K | frac;
}

int main() {
  long long n;
  float converted;
  float_bits f;

  /* A long long counter lets the loop include INT_MAX without overflowing */
  for (n=INT_MIN; n<=INT_MAX; n++) {
    int i = (int) n;
    f = float_i2f(i);
    /* Reinterpret the bit pattern as a float */
    memcpy(&converted, &f, sizeof converted);
    if (converted != (float) i)
      printf("Casting not equal for value: %d\nConverted value is %.0f\nCasted value is %.0f\n", i, converted, (float) i);
  }
}

We have replaced all instances of the value 23 with K to denote that the frac field is K=23 bits long.

When we compile and run it, we get no output. This means that the converter handles the whole range of int values just like C’s cast to float.
