[personal profile] brooksmoses
I was writing a subroutine to run on one of the coprocessors on the Cell processor, and it occurred to me that I could make things run a bit faster in a particular case by assuming that the floating point product of 0.0 and any bit pattern would always be 0.0 (at least for single-precision numbers).

Now, this isn't usually the case -- there are bit patterns that represent "not a valid number", and on a computer that does math according to the IEEE standard for floating-point arithmetic, the result of multiplying "not a valid number" and 0.0 is supposed to be "not a valid number". However, the single-precision floating-point arithmetic on the Cell's coprocessors is not designed to comply with all the niceties of the IEEE standard; instead, like the Cray processors of old, it's designed to go as fast as possible. So it was a reasonable conjecture that they would, in fact, simply return 0.0 for everything.

The question, then, was how to quickly get an answer that I'd trust. Google was one option, but trusting random stuff on the internet is unwise, especially when the risk is introducing a subtle and hard-to-track-down bug.

The naive answer, of course, is to test every possible input. This is the sort of thing that every computer-science freshman knows is absurd and impossible -- the number of possible inputs grows exponentially, and you can get numbers like "every possible position of every electron in the universe since the big bang" without hardly trying.

Even in this exceedingly simple case, there are 2^32 possible 32-bit patterns that could happen. That's a bit over four billion of them. Four billion grains of sand will overfill a 55-gallon drum. This is not really even a comprehensible number.

...

Except, wait. This is a processor with a clock speed of 3.2 billion cycles per second, and it takes probably a few dozen cycles to test each number. That's ... entirely plausible.

So I tried it.

It turned out to take about two minutes. They are, indeed, all zero.
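
The test program itself isn't shown here. A minimal sketch of what such an exhaustive sweep might look like in portable C follows; the names float_from_bits and count_nonzero_products are hypothetical, and this is ordinary host code rather than anything specific to the Cell's coprocessors. It treats a product as "zero" if it compares equal to 0.0f, so negative zero counts as zero.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: reinterpret a 32-bit pattern as a float.
   memcpy avoids the aliasing problems of a pointer cast. */
static float float_from_bits(uint32_t bits) {
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Hypothetical helper: count the bit patterns in [first, last] (inclusive)
   whose product with 0.0f does not compare equal to zero. */
static uint64_t count_nonzero_products(uint32_t first, uint32_t last) {
    /* volatile keeps the compiler from folding the multiply away */
    volatile float zero = 0.0f;
    uint64_t failures = 0;
    uint32_t bits = first;
    for (;;) {
        float product = float_from_bits(bits) * zero;
        if (!(product == 0.0f)) {   /* also catches NaN, which equals nothing */
            failures++;
        }
        if (bits == last) {         /* test before incrementing so that   */
            break;                  /* last == UINT32_MAX cannot wrap     */
        }
        bits++;
    }
    return failures;
}
```

Calling count_nonzero_products(0u, UINT32_MAX) from main covers every pattern. On an IEEE-compliant processor the count should not come out as zero, since the infinities and the not-a-number patterns (everything with an all-ones exponent field) multiply with 0.0 to give not-a-number; a count of zero is what the result above corresponds to.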

Date: 2008-02-13 03:53 am (UTC)
From: [identity profile] falsedrow.livejournal.com
Hooray for Gigahertz!

Date: 2008-02-13 06:32 am (UTC)
From: [identity profile] echristo.livejournal.com
haha. Most excellent.

Date: 2008-02-13 06:54 am (UTC)
From: [identity profile] ejalbert.livejournal.com
Wow, that's frightening.

Date: 2008-02-13 09:40 am (UTC)
From: [identity profile] green-knight.livejournal.com
Wow. The problem is when your first computer had an 8 MHz processor, you sometimes forget how powerful these things can be.

\me to my computer yesterday: "You have enough power to run an entire space mission, so will you *please* open that window in a reasonable time?"

Date: 2008-02-13 01:04 pm (UTC)
From: [identity profile] oldsma.livejournal.com
I think my Blackberry has more computing power than the roomful of computers they mention in Apollo 13. Sometimes I wonder if me getting to appointments on time is more important than getting men to the moon and back, but then I come to my senses and resume texting about what I'm going to have for lunch.

MAO

Inconceivable!

Date: 2008-02-13 03:55 pm (UTC)
From: [identity profile] dragon3.livejournal.com
Even in this exceedingly simple case, there are 2^32 possible 32-bit patterns that could happen. That's a bit over four billion of them. Four billion grains of sand will overfill a 55-gallon drum. This is not really even a comprehensible number.


It's smaller than the number of people on the earth and similar to the number of bytes on a DVD. Did you store all the results on your hard drive? ;-)

I had a similar epiphany long ago when I realized I could dramatically speed up repeated FFT calculations by storing all the required trig values in a lookup table.

Two minutes suggests somewhere around 100 cycles per test, so there must have been a lot of overhead in there.

Re: Inconceivable!

Date: 2008-02-14 03:46 am (UTC)
From: [identity profile] flippac.livejournal.com
It may well just have stalled on every test, so not necessarily overhead.

Re: Inconceivable!

Date: 2008-02-14 07:13 pm (UTC)
From: [identity profile] dragon3.livejournal.com
Is that after factoring out the overhead of the clock calls? I think you may have just redefined the word "geek" ;-)

I gave up counting cycles and hand optimizing assembly code back when 12 MHz was fast. Now I spend a lot of time saying "First make it work, then make it fast..."

Date: 2008-02-13 05:33 pm (UTC)
From: [identity profile] cjsmith.livejournal.com
Wow. That is way cool. And now you have some test code to keep lying around in case you ever need to port to a new coprocessor. :)

Re: Inconceivable!

Date: 2008-02-13 05:46 pm (UTC)
From: [identity profile] johnpalmer.livejournal.com
You know, it's interesting... Google is facing something of the same problem. They have this incredibly huge network of incredible computing power, and they keep trying to get people who work for them to think *big*.

I'm going to have to store this little incident away in my own memory as well... it's good to remember that there's a lot of stuff you won't do repeatedly, or as a matter of course, but might be worth the computing power to do it once.

Date: 2008-02-14 03:47 am (UTC)
From: [identity profile] flippac.livejournal.com
My comp currently has so much RAM the OS doesn't have address space for swap. This amuses me in a similar manner.