In this post, we’ll program an `int`

to `float`

converter that will handle the whole range of `int`

values . We’ll base our work on the results of the previous converter which didn’t handle rounding effects.

This is also a solution for exercise 2.97 of the Computer Systems: A Programmerâ€™s Perspective book.

## The initial converter

We’ll start working with the following implementation of the converter:

/* Convert 32-bits int to float */ float_bits float_i2f(int i) { /* Special case : 0 is not a normalized value */ if (i==0) return 0; /* sign bit */ unsigned s = i>>31; /* Exponent */ unsigned E = (int) (log(i<0 ? -i : i)/log(2)); unsigned exp = E + 127; /* Magnificand*/ unsigned M= i>0 ? i : -i; unsigned frac = M ^ (1<<E); /* Move frac to start at bit postion 23 */ if (E>23) /* Too long: Truncate to first 23 bits */ frac >>= E-23; else /* Too short: Pad to the right with zeros */ frac <<= 23-E; return s<<31 | exp<<23 | frac; }

The details of which we covered in this post.

First of all, let’s see which cases aren’t correctly handled.

## Understanding rounding effects (round-to-even)

Single-precision floating-point has a limited precision of 23 bits for its `frac`

field. Numbers that require more than 23 bits for their fraction part when encoded as floats will have to be rounded.

For example, let’s take the number `123456789`

(in decimals). Converting to binary, we’ll have:

And expressing it as a binary fraction:

As it can be seen, there are 26 significant digits after the binary point. Since the `frac`

field can only accommodate for 23 of them, we won’t be able to exactly express this value. It will have to be rounded.

C handles rounding with the *round-to-even* mode. In this mode, we round to the closest possible value. See that for `123456789`

in binary, we’ll round-off the last 3 digits: `101`

. Since the highest order bit of `101`

is `1`

, we’ll add one to the least significant digit of the rounded value:

See that the new binary fraction is only 23 digits long after the binary point. The last digit was enclosed in parenthesis for clarity. If we had merely truncated to the 23rd digit after the binary point, it would have been `0`

. Because of rounding, we added `1`

to it.

This is how C would handle most rounding cases. There’s only one special case.

When the value is equally distant to two possible rounded values, C will choose the value that ends with an even least significant digit. For example, if we were rounding `2.5`

(in decimals) to a whole number, we can see that it’s equally distant from both `2`

and `3`

. In *round-to-even* mode, `2`

is chosen because it’s even.

With binary numbers, we have this case whenever the rounded-off digits are in the form `10...0`

(a one followed by zeros). That’s the case for `123456788`

:

We have underlined the digits that will be rounded-off (`100`

) to accommodate to 23 digits after the binary point. See that it’s a `1`

followed by zeros. In this case, the value is equally distant from `1.11010110111100110100010 * 2^26`

and from `1.1101011011110011010001 * 2^26`

, the two possible rounding options. C will choose the first option (the one that ends with `0`

), because it’s even.

Finally, the rounded binary fraction will be:

Just like before, the rounded digit was enclosed in parentheses for clarity.

Now, we’ll see how to implement these changes in the converter.

## Converter: rounding to the closest value

First of all, we’ll make the converter round to the closest value. As it was seen in the previous section, we just need to check the 24th digit of the binary fraction. If it’s `0`

, we leave the rounded digit as it is. If the 24th binary digit of the fraction is `1`

, we add `1`

to the rounded digit.

All of the changes will take place only when the `frac`

field is more than 23 digits long. We’ll accommodate the changes in the correspoding section of the code.

if (E>23) { /* Frac field too long */ unsigned round = frac >> (E - 24) & 1; frac >>= E-23; frac += round; }

The first code line after `if`

gets the 24th digit of the `frac`

field. It will be either `0`

or `1`

. After shifting the `frac`

field, we increment it by the value of `round`

.

### Testing

We’ll test this new implementation in the range of values `100000000`

to `100000020`

. These values require more than 23 digits in the `frac`

field, so rounding will be required.

int main() { int i; float *fp; float_bits f; for (i=1e8; i<1e8+20; i++) { f = float_i2f(i); fp = &f; if (*fp != (float) i) printf("Casting not equal for value: %d\nConverted value is %.0f\nCasted value is %.0f\n", i, *fp, (float) i); } }

After compiling and executing, we get:

$ gcc i2f_roundv1.c -lm -o main $ ./main Casting not equal for value: 100000004 Converted value is 100000008 Casted value is 100000000

Here we can see that the converter is adequately handling most rounding cases. The only value it didn’t handle correctly was `100000004`

. Its binary representation is:

The digits that need to be rounded-off have been underlined. We can see this is a case where the two rounding options are equally distant. `100000004`

is equally distant from `100000000`

and from `100000008`

.

To adequately handle this cases, we’ll need to add a *round-to-even* feature to our converter.

## Converter: round-to-even

We have to implement a special case when the rounded-off digits are in the form: `10...0`

(a one followed by zeros). In that case, if the last `frac`

digit is `0`

, we leave it unchanged. If the last `frac`

digit is `1`

, we round the least significant digit by adding `1`

to it.

We can implement those changes as follows:

/* Round to closest */ unsigned round = frac >> (E - 24) & 1; /* Rouund-to-even modification */ unsigned last_frac_bit = frac >> (E - 23) & 1; unsigned rounded_off_digits = frac << 32 - E + 23 >> 32 - E + 23; unsigned one_followed_by_zeros = 1 << E - 24; if (rounded_off_digits == one_followed_by_zeros) round = last_frac_bit; frac >>= E-23; frac += round;

`rounded_off_digits`

will get the bits of the `frac`

field after the 23rd digit. We do so by setting all of the bits that come before to `0`

with shift operations. We compare them with a value of the form `10...0`

(`one_followed_by_zeros`

). If they are equal, we change `round`

to the value of the last bit (23rd) of the `frac`

field.

The net effect of the code above is to modify the value of `round`

to `0`

when the rounded-off digits are in the form `10...0`

and the last `frac`

bit is also `0`

. This will round `frac`

to an even value (least significant digit `0`

) when we’re equally distant from two rounding options.

### Testing

This time we’ll test our converter in the range of values 100 million to 1 billion.

int main() { int i; float *fp; float_bits f; for (i=1e8; i<1e9; i++) { f = float_i2f(i); fp = &f; if (*fp != (float) i) printf("Casting not equal for value: %d\nConverted value is %.0f\nCasted value is %.0f\n", i, *fp, (float) i); } }

After compiling and executing we get some error messages:

Casting not equal for value: 134217724 Converted value is 67108864 Casted value is 134217728 ...

Our converter does not handle the value `134217724`

(and many others) correctly. Checking its binary representation we get:

The digits that are rounded-off are underlined. We see that the `frac`

field that corresponds to this value is all ones. We’ll need to handle this case differently.

## Special case: `frac`

field is all ones

When the `frac`

field is all ones and it needs to be rounded, adding one to it will make it overflow. Let’s see an example:

Here we have a number with a binary fraction consisting of all ones. If we add one to its least significant digit, the fraction part of it will change to all zeros. Moreover, the addition will overflow beyond the binary point.

In terms of the `exp`

field, we’ll have the following:

We can see that the exponent is incremented by one. Since float numbers are encoded in the form `1.XXX*2^y`

, when the `frac`

field overflows, the `exp`

field will be increased by one.

These changes can be readily applied into our converter:

/* Round to closest */ unsigned round = frac >> (E - 24) & 1; /* Round-to-even modification */ unsigned last_frac_bit = frac >> (E - 23) & 1; unsigned rounded_off_digits = frac << 2 + 30 - E + 23 >> 2 + 30 - E + 23; unsigned one_followed_by_zeros = 1 << E - 24; if (rounded_off_digits == one_followed_by_zeros) round = last_frac_bit; frac >>= E-23; /* Special case: frac = 11...1 */ unsigned all_ones = -1 & 0x7FFFFF; if (frac == all_ones && round) { frac = 0; exp++; round = 0; } frac += round;

We included a conditional statement that makes the necessary changes when the `frac`

field is all ones and needs to be rounded. In that case, we set `frac`

to 0 and instead increment the `exp`

field by one.

### Testing

With this final changes, we’re ready to test the converter in the whole range of `int`

values. Importing `INT_MIN`

and `INT_MAX`

from the `limits.h`

library:

int main() { int i; float *fp; float_bits f; for (i=INT_MIN; i<INT_MAX; i++) { f = float_i2f(i); fp = &f; if (*fp != (float) i) printf("Casting not equal for value: %d\nConverted value is %.0f\nCasted value is %.0f\n", i, *fp, (float) i); } }

After compilation and execution, we get a single error message:

$ ./main Casting not equal for value: -2147483648 Converted value is -2 Casted value is -2147483648

This final error takes place because we cannot represent `-INT_MIN`

using `int`

(`INT_MAX = abs(INT_MIN) - 1`

). We’ll handle this case differently.

## Special case: INT_MIN

To handle this case, we’ll just set the `E`

and `M`

values for this single case:

/* Exponent */ unsigned E; if (i != INT_MIN) E = (int) (log(i<0 ? -i : i)/log(2)); else /* INT_MIN = -1 * 2^31 */ E = 31; unsigned exp = E + 127; /* Magnificand*/ unsigned M; if (i != INT_MIN) M = i>0 ? i : -i; else /* INT_MIN = -1 * 2^31 */ M = 1<<31;

With this new changes, we’ll now present the whole program and test it for the whole range of `int`

values.

## Complete converter and final testing

The code for the converter, including all of the modification above, and testing for the whole range of `int`

values is presented below.

#include <stdio.h> #include <stdlib.h> #include <math.h> #include <limits.h> #define BIAS 127 #define K 23 typedef unsigned float_bits; /* Convert 32-bits int to float */ float_bits float_i2f(int i) { /* Special case : 0 is not a normalized value */ if (i==0) return 0; /* sign bit */ unsigned s = i>>31; /* Exponent */ unsigned E; if (i != INT_MIN) E = (int) (log(i<0 ? -i : i)/log(2)); else /* INT_MIN = -1.0 * 2^31 */ E = 31; unsigned exp = E + BIAS; /* Magnificand*/ unsigned M; if (i != INT_MIN) M = i>0 ? i : -i; else /* INT_MIN = -1.0 x 2^31*/ M = 1<<31; unsigned frac = M ^ (1<<E); /* Displace frac to start at bit position K=23 */ if (E>K) { /* Frac field too long */ /* Round to closest */ unsigned round = frac >> (E - K - 1) & 1; /* Round to even modification */ unsigned last_frac_bit = frac >> (E - K) & 1; unsigned rounded_off_digits = frac << 32 - E + K >> 32 - E + K; unsigned one_followed_by_zeros = 1 << E - K - 1; if (rounded_off_digits == one_followed_by_zeros) round = last_frac_bit; /* Truncate to first 23th digits */ frac >>= E-K; /* Special case: frac = 11..1 */ unsigned all_ones = -1 & 0x7FFFFF; if (frac == all_ones && round) { frac = 0; exp++; round = 0; } /* Round when necessary */ frac += round; } else /* Too short: Pad to the right with zeros */ frac <<= K-E; return s<<31 | exp<<K | frac; } int main() { int i; float *fp; float_bits f; for (i=INT_MIN; i<INT_MAX; i++) { f = float_i2f(i); fp = &f; if (*fp != (float) i) printf("Casting not equal for value: %d\nConverted value is %.0f\nCasted value is %.0f\n", i, *fp, (float) i); } }

We have changed all instances of value `23`

with `K`

to denote that the `frac`

field is K=23 bits long.

When we compile it and run it, we get no output message. This means that the presented converter correctly handles the whole range of `int`

values just like C’s castin to `float`

.