This post is about a popular but niche technique I can never find a succinct reference for. I didn’t invent it, I just need a page I can link when giving optimization advice.
Integer
Snippets

All functions are vectorizable.
/// Convert an integer in range [0; 2^23) to a float exactly.
///
/// Produces an incorrect result for integers outside the range.
fn u23_to_f32(x: u32) -> f32 {
    let magic = ((1u32 << 23) as f32).to_bits();
    f32::from_bits(x ^ magic) - f32::from_bits(magic)
}
/// Convert an integer in range [0; 2^52) to a double exactly.
///
/// Produces an incorrect result for integers outside the range.
fn u52_to_f64(x: u64) -> f64 {
    let magic = ((1u64 << 52) as f64).to_bits();
    f64::from_bits(x ^ magic) - f64::from_bits(magic)
}
/// Convert a float in range [-0.25; 2^23] to the nearest integer, rounding ties to even.
///
/// Produces an incorrect result for floats outside the range or `NaN`s. Rounds just like
/// `x.round_ties_even()`.
fn f32_to_u23_rounding(x: f32) -> u32 {
    let magic = (1u32 << 23) as f32;
    (x + magic).to_bits() ^ magic.to_bits()
}
/// Convert a double in range [-0.25; 2^52] to the nearest integer, rounding ties to even.
///
/// Produces an incorrect result for doubles outside the range or `NaN`s. Rounds just like
/// `x.round_ties_even()`.
fn f64_to_u52_rounding(x: f64) -> u64 {
    let magic = (1u64 << 52) as f64;
    (x + magic).to_bits() ^ magic.to_bits()
}
/// Convert a double in range [-0.25; 2^32 - 0.5) to the nearest integer, rounding ties to even.
///
/// Produces an incorrect result for doubles outside the range or `NaN`s. Rounds just like
/// `x.round_ties_even()`.
fn f64_to_u32_rounding(x: f64) -> u32 {
    let magic = (1u64 << 52) as f64;
    (x + magic).to_bits() as u32
}
No alternatives for flooring are explicitly provided; if you have access to AVX-512, changing the rounding mode of the addition in the last two methods to flooring (round toward negative infinity) should work.
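Without AVX-512, one possible fallback is to round to nearest first and then subtract one whenever the rounded value overshot the input. This is a swapped-in correction step, not something from the snippets above; the name `f32_to_u23_flooring` is hypothetical, and the sketch assumes the input lies in [0; 2^23).

```rust
/// Floor a float in range [0; 2^23) to an integer.
///
/// Hypothetical helper: rounds to nearest with the magic-number trick,
/// then corrects by one if rounding went up. Branch-free.
fn f32_to_u23_flooring(x: f32) -> u32 {
    let magic = (1u32 << 23) as f32;
    // The addition rounds x to the nearest integer, ties to even.
    let sum = x + magic;
    let rounded = sum.to_bits() ^ magic.to_bits();
    // `sum - magic` is the rounded value as a float; this subtraction is exact.
    let overshoot = (sum - magic > x) as u32;
    rounded - overshoot
}
```

The comparison produces 0 or 1, so no branch is involved and the function should stay amenable to vectorization.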
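How this works

(1u32 << 23) as f32 is an IEEE-754 number with the (unbiased) exponent set to 23 and all mantissa bits zero, i.e. the value 2^23. At this exponent, adjacent floats are exactly 1 apart, so the 23 mantissa bits hold an integer offset from 2^23.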
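To make the layout of the magic constant concrete, here is a small sketch that splits an `f32` into its fields. The helper name `decompose_f32` is made up for illustration, and it only handles normal (finite, non-zero, non-subnormal) values.

```rust
/// Split an `f32` into (sign bit, unbiased exponent, mantissa field).
/// Hypothetical helper; only meaningful for normal values.
fn decompose_f32(x: f32) -> (u32, i32, u32) {
    let bits = x.to_bits();
    let sign = bits >> 31;
    // The exponent field is 8 bits wide and stored with a bias of 127.
    let exponent = ((bits >> 23) & 0xFF) as i32 - 127;
    // The mantissa field is the low 23 bits (the leading 1 is implicit).
    let mantissa = bits & 0x007F_FFFF;
    (sign, exponent, mantissa)
}
```

For the magic, `decompose_f32((1u32 << 23) as f32)` gives `(0, 23, 0)`, and for `8388613.0` (that is, 2^23 + 5) it gives `(0, 23, 5)`.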
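In u23_to_f32, XORing x into the bit pattern of the magic writes x into the all-zero mantissa, producing the float 2^23 + x. Subtracting the magic then leaves exactly x.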
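The two steps of u23_to_f32 can be traced by returning the intermediate float as well. This is only an illustrative variant; `u23_to_f32_traced` is a hypothetical name.

```rust
/// Same computation as `u23_to_f32`, but also returns the intermediate float.
/// Hypothetical helper for illustration; valid for x in [0; 2^23).
fn u23_to_f32_traced(x: u32) -> (f32, f32) {
    let magic = ((1u32 << 23) as f32).to_bits(); // 0x4B00_0000
    // The mantissa of the magic is all zeros, so the XOR just writes x into it.
    let biased = f32::from_bits(x ^ magic); // equals 2^23 + x
    // Both operands are integers below 2^24, so the subtraction is exact.
    let result = biased - f32::from_bits(magic);
    (biased, result)
}
```

For example, `u23_to_f32_traced(5)` returns `(8388613.0, 5.0)`.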
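In f32_to_u23_rounding, adding the magic produces a float in [2^23; 2^24], where adjacent floats are 1 apart, so the addition itself rounds x to the nearest integer (ties to even) and stores it in the mantissa. XORing with the bit pattern of the magic then cancels the exponent bits, leaving the integer. When the sum reaches 2^24, its exponent field differs from the magic's only in the lowest bit, so the XOR leaves bit 23 set, which is exactly 2^23.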
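The rounding done by the addition in f32_to_u23_rounding can be observed by exposing the intermediate sum. The variant below is a sketch with a hypothetical name, `f32_to_u23_rounding_traced`.

```rust
/// Same computation as `f32_to_u23_rounding`, but also returns the sum.
/// Hypothetical helper for illustration; valid for x in [-0.25; 2^23].
fn f32_to_u23_rounding_traced(x: f32) -> (f32, u32) {
    let magic = (1u32 << 23) as f32;
    // The hardware rounds this sum to a representable float; for results
    // in [2^23; 2^24) that means rounding to an integer, ties to even.
    let sum = x + magic;
    // XOR cancels the exponent bits shared with the magic.
    (sum, sum.to_bits() ^ magic.to_bits())
}
```

Ties go to the even neighbor: 2.5 yields `(8388610.0, 2)` while 3.5 yields `(8388612.0, 4)`.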
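The situation for doubles and (1u64 << 52) as f64 is analogous, with 52 mantissa bits in place of 23.

f64_to_u32_rounding is equivalent to f64_to_u52_rounding(x) as u32; it's mentioned explicitly because the bottom 32 bits of the magic's bit pattern are zero, so the XOR can be dropped.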
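The equivalence can be checked directly. The sketch below repeats the two double-precision functions so it is self-contained; `magic_low_bits` is a hypothetical helper added only to show the low half of the constant.

```rust
fn f64_to_u52_rounding(x: f64) -> u64 {
    let magic = (1u64 << 52) as f64;
    (x + magic).to_bits() ^ magic.to_bits()
}

fn f64_to_u32_rounding(x: f64) -> u32 {
    let magic = (1u64 << 52) as f64;
    // No XOR: truncating to 32 bits discards the exponent bits anyway.
    (x + magic).to_bits() as u32
}

/// Low 32 bits of the magic's bit pattern (0x4330_0000_0000_0000).
/// Hypothetical helper for illustration.
fn magic_low_bits() -> u32 {
    ((1u64 << 52) as f64).to_bits() as u32
}
```

Since `magic_low_bits()` is 0, XORing with the magic cannot change the bottom 32 bits, so both functions agree on every input in [-0.25; 2^32 - 0.5).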
In cases where two different constants would make more intuitive sense,
By changing the exponent of the magic, you can also divide or multiply the float by a power of two at no additional cost; this is occasionally useful.
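As a sketch of the power-of-two scaling, the two functions below use a magic with a different exponent. Both names (`u23_to_unit_f32`, `f32_to_fixed_8`) and the chosen exponents are illustrative assumptions, not part of the original snippets.

```rust
/// Map an integer in range [0; 2^23) to x / 2^23, a float in [0; 1).
/// Hypothetical helper: the magic is 1.0 (exponent 0) instead of 2^23.
fn u23_to_unit_f32(x: u32) -> f32 {
    let magic = 1.0f32.to_bits();
    // Writing x into the mantissa of 1.0 gives 1 + x / 2^23.
    f32::from_bits(x ^ magic) - 1.0
}

/// Convert a float in range [0; 2^15) to round(x * 256), ties to even.
/// Hypothetical helper: the magic is 2^15, where adjacent floats are
/// 2^-8 apart, so the mantissa counts in units of 1/256.
fn f32_to_fixed_8(x: f32) -> u32 {
    let magic = (1u32 << 15) as f32;
    (x + magic).to_bits() ^ magic.to_bits()
}
```

For instance, `u23_to_unit_f32(1 << 22)` is `0.5` and `f32_to_fixed_8(1.5)` is `384`, with the scaling folded into the same add or subtract that performs the conversion.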