Sometimes modern technology is scary.
Feb. 12th, 2008 06:19 pm![[personal profile]](https://www.dreamwidth.org/img/silk/identity/user.png)
I was writing a subroutine to run on one of the coprocessors on the Cell processor, and it occurred to me that I could make things run a bit faster in a particular case by assuming that the floating point product of 0.0 and any bit pattern would always be 0.0 (at least for single-precision numbers).
Now, this isn't usually the case -- there are bit patterns that represent "not a valid number", and on a computer that does math according to the IEEE standard for floating-point arithmetic, the result of multiplying "not a valid number" and 0.0 is supposed to be "not a valid number". However, the single-precision floating-point arithmetic on the Cell's coprocessors are not designed to comply with all the niceties of the IEEE standard for this; instead, like the Cray processors of old, they're designed to go as fast as possible. So it was a reasonable conjecture that they would, in fact, simply return 0.0 for everything.
The question, then, was how to quickly get an answer that I'd trust. Google was one option, but trusting random stuff on the internet is unwise, especially when the risk is introducing a subtle and hard-to-track-down bug.
The naive answer, of course, is to test every possible input. This is the sort of thing that every computer-science freshman knows is absurd and impossible -- the number of possible inputs grows exponentially, and you can get numbers like "every possible position of every electron in the universe since the big bang" without hardly trying.
Even in this exceedingly simple case, there are 232 possible 32-bit patterns that could happen. That's a bit over four billion of them. Four billion grains of sand will overfill a 55-gallon drum. This is not really even a comprehensible number.
...
Except, wait. This is a processor with a clock speed of 3.2 billion cycles per second, and it takes probably a few dozen cycles to test each number. That's ... entirely plausible.
So I tried it.
It turned out to take it about two minutes. They are, indeed, all zero.
Now, this isn't usually the case -- there are bit patterns that represent "not a valid number", and on a computer that does math according to the IEEE standard for floating-point arithmetic, the result of multiplying "not a valid number" and 0.0 is supposed to be "not a valid number". However, the single-precision floating-point arithmetic on the Cell's coprocessors are not designed to comply with all the niceties of the IEEE standard for this; instead, like the Cray processors of old, they're designed to go as fast as possible. So it was a reasonable conjecture that they would, in fact, simply return 0.0 for everything.
The question, then, was how to quickly get an answer that I'd trust. Google was one option, but trusting random stuff on the internet is unwise, especially when the risk is introducing a subtle and hard-to-track-down bug.
The naive answer, of course, is to test every possible input. This is the sort of thing that every computer-science freshman knows is absurd and impossible -- the number of possible inputs grows exponentially, and you can get numbers like "every possible position of every electron in the universe since the big bang" without hardly trying.
Even in this exceedingly simple case, there are 232 possible 32-bit patterns that could happen. That's a bit over four billion of them. Four billion grains of sand will overfill a 55-gallon drum. This is not really even a comprehensible number.
...
Except, wait. This is a processor with a clock speed of 3.2 billion cycles per second, and it takes probably a few dozen cycles to test each number. That's ... entirely plausible.
So I tried it.
It turned out to take it about two minutes. They are, indeed, all zero.
no subject
Date: 2008-02-13 03:53 am (UTC)no subject
Date: 2008-02-13 06:32 am (UTC)no subject
Date: 2008-02-13 06:54 am (UTC)no subject
Date: 2008-02-13 09:40 am (UTC)\me to my computer yesterday: "You have enough power to run an entire space mission, so will you *please* open that window in a reasonable time?"
no subject
Date: 2008-02-13 01:04 pm (UTC)MAO
Inconceivable!
Date: 2008-02-13 03:55 pm (UTC)It's smaller than the number of people on the earth and similar to the number of bytes on a DVD. Did you store all the results on your hard drive? ;-)
I had a similar epiphany long ago when I realized I could dramatically speed up repeated FFT calculations by storing all the required trig values in a lookup table.
Two minutes suggests somewhere around 100 cycles per test, so there must have been a lot of overhead in there.
Re: Inconceivable!
Date: 2008-02-14 03:46 am (UTC)Re: Inconceivable!
Date: 2008-02-14 04:30 am (UTC)Also, there was some overhead from the fact that my pseudocode was roughly this:
So, it was doing double the work.
From the quick glance I had at the assembly code, I think it was using branches for the tests, too, even though I'd written them as ternary expressions (which the Cell has a "select" instruction for), and those tend to produce significant stalls.
Actual timing results using "time": 33 seconds for -O2 (which turns out not to optimize away the multiply after all), and 4 minutes 58 seconds for -O0 -- I guess I underestimated a bit. Given how things have worked with the real code I've been working on, I expect I could probably get that down to about 5 seconds with some judicious loop unrolling and vectorization -- but not in 5 minutes of programming time!
Re: Inconceivable!
Date: 2008-02-14 05:32 am (UTC)A half-hour later, I have a program that will check them in 4-element vectors in an average of 5.4 cycles per vector. The overall program run took 1.728 seconds.
Re: Inconceivable!
Date: 2008-02-14 07:13 pm (UTC)I gave up counting cycles and hand optimizing assembly code back when 12 MHz was fast. Now I spend a lot of time saying "First make it work, then make it fast..."
no subject
Date: 2008-02-13 05:33 pm (UTC)Re: Inconceivable!
Date: 2008-02-13 05:46 pm (UTC)I'm going to have to store this little incident away in my own memory as well... it's good to remember that there's a lot of stuff you won't do repeatedly, or as a matter of course, but might be worth the computing power to do it once.
Re: Inconceivable!
Date: 2008-02-14 04:33 am (UTC)Which is not a good answer to this particular question, but still entirely feasible if I needed to do it for something similar.
no subject
Date: 2008-02-14 03:47 am (UTC)