Virtual Polywell

dch24 · Post by **dch24** » Wed Jan 30, 2008 8:32 pm

drmike wrote:Confusion is the first step in learning, so this is good!

Your optimism is infectious.

I surely am confused.

drmike wrote:I'll do the md5sum this evening - I bet they match.

OK, thanks!

MSimon · Post by **MSimon** » Wed Jan 30, 2008 10:22 pm

The reason it is so hard for most people to learn new stuff is that they hate to be confused. It is rather embarrassing. Some one asks you a question and your answer is not even wrong. Or it is 180 out.

The delicious part comes after much suffering and it all starts to fit.

To make all this work a touch of masochism doesn't hurt. Maybe like a couple of barrels full.

drmike · Post by **drmike** » Wed Jan 30, 2008 10:52 pm

MSimon wrote:The reason it is so hard for most people to learn new stuff is that they hate to be confused. It is rather embarrassing. Some one asks you a question and your answer is not even wrong. Or it is 180 out.

The delicious part comes after much suffering and it all starts to fit.

To make all this work a touch of masochism doesn't hurt. Maybe like a couple of barrels full.

I prefer beer and scotch to masochism

Then the embarrassment isn't noticeable!

drmike · Post by **drmike** » Thu Jan 31, 2008 1:15 am

dch24 wrote:drmike, I think I understand now what my error means, "gsl: qag.c:261: ERROR: could not integrate function". I don't think it's a bug in your code. The error basically means the integration didn't converge -- it diverged.

So I need to be sure the input data is OK. I'm doing MAXSTEPS=100, and I want to verify that my potential.dat is correct. Can you check that the output below is what you get, too?
Code: Select all
localhost $ md5sum potential.dat 
e234099bd4142c01d4da056389781573  potential.dat
localhost $ ls -l potential.dat 
-rw-r--r-- 1 dch24 dch24 1414808 Jan 29 22:38 potential.dat

OK, they don't match???

Code: Select all

drmike@truth:/sata0/fusion> md5sum potential.dat
5fbf5e66b0a21fbdbd573f118e94384b  potential.dat

Now I'm _really_ confused.

Hmmm... which version of gsl are you using? It's probably more up to date than mine. I did a

Code: Select all

 cat /usr/local/include/gsl/gsl_version.h

and got

Code: Select all

#define GSL_VERSION "1.4"

A diff on our code base would be good, I'd love to have an excuse to go buy a new computer

I'll send you an e-mail and we can do that offline.

drmike · Post by **drmike** » Thu Jan 31, 2008 4:00 am

Well, I'm learning something.

I changed the order of parameters in the integral so that terms with only radial or one particular angle would only be included in the correct portion. Now I get divergence too - so it fails to integrate. But inside I get terms like this

Code: Select all

err = 0.000000
ring = 806686282371712224890919312937202483459490826810337205558242705408.000000

So the integration gives no error, but the value is essentially infinite. Somewhere that must divide out to zero later, or by combining terms I was getting zero first rather than infinity first.

In any event, you guys are getting the right answer - the integral as specified does not converge. Now I have to figure out why!!

drmike · Post by **drmike** » Thu Jan 31, 2008 5:07 am

OK, I found a pretty major, obviously stupid, bug. The new version of code is posted, and now I don't get zeros or nan's or inf's. I'll let it run over night and see what the full output is.

The mods should speed up calculations on better machines since I reduce the number of multiplies and adds in the inner loops. I think the interpolation block needs a lot of work, and finite element type analysis might be more appropriate. We'll see as we play.

The bug was a "1" where I should have had an "i". DOH!!

New code posted here.

Thank you for all the help - everyone's comments turned into very useful clues that helped me track things down.

dch24 · Post by **dch24** » Thu Jan 31, 2008 6:08 am

We've PMed each other our email addresses. We'll figure it out.

First, I want to change to gsl-1.4 and see if that gets me the md5sum. I'm using gsl-1.9, and since this is floating-point intensive stuff, I'm sure a few lines of code difference would flip a bit somewhere and we'd get different results.

I'm just downloading the plain vanilla code from the eskimo site.

dch24 · Post by **dch24** » Thu Jan 31, 2008 6:33 am

MSimon wrote:The reason it is so hard for most people to learn new stuff is that they hate to be confused. It is rather embarrassing. Some one asks you a question and your answer is not even wrong. Or it is 180 out.

The delicious part comes after much suffering and it all starts to fit.

To make all this work a touch of masochism doesn't hurt. Maybe like a couple of barrels full.

Well, I haven't finished my first keg of it, yet.

I ran potential.c, freshly downloaded from the eskimo site, but I modified MAXSTEPS to 100. (The one on eskimo has 400.)

I checked and double-checked. I'm using gsl-1.4. I'm on x86_64, so I'm going to try it on my macbook (OS X 10.4, so Core 2 Duo but plain ol' Intel x86). But ... this is the result I get:

Code: Select all

localhost $ ls -l potential.dat 
-rw-r--r-- 1 dch24 dch24 1414808 Jan 30 23:21 potential.dat
localhost $ md5sum potential.dat 
e234099bd4142c01d4da056389781573  potential.dat

The astute reader will notice that I'm still getting the same result. MAXSTEPS=400 isn't done yet, and I'm pulling out my macbook now...

drmike wrote:I have to admit this is fun though!

Oh yeah. I'm laughing out loud and my buddies are looking at me like I'm crazy.

dch24 · Post by **dch24** » Thu Jan 31, 2008 7:20 am

Well, nothing I do seems to change the outcome. Here is how I compile potential.c:

Code: Select all

gcc -o potential -g potential.c -I/usr/src/patches/polywell/gsl-1.4inst/include -L/usr/src/patches/polywell/gsl-1.4inst/lib -lm -lgsl

For MAXSTEPS=100:

Code: Select all

e234099bd4142c01d4da056389781573  potential.dat

For MAXSTEPS=400:

Code: Select all

0206f2f489a5133a711b11db81c1b2d3  potential.dat

(Same md5sums as before on x86_64, even though I'm using gsl-1.4 now.)

On i386-darwin-gcc (MacOS 10.4) I get different results. (Strange!) First, gsl-1.4 does not support i386-darwin-gcc. It only supports ppc-darwin-gcc. So I am forced to use gsl-1.9. But that seems OK, since gsl-1.4 and gsl-1.9 on linux produce the exact same results.

These are the md5sum results. (File sizes are all the same for both systems.)

Code: Select all

localhost:~/Desktop/polywell dch24$ md5sum potential*.dat
dea85412261a327fe514ef0a7894498d  potential100.dat
4161ffaeb8d2f98c0dc2626d91cdb85c  potential400.dat

It may be that MacOS handles floating point differently, based on the fact that gsl-1.9 actively selects between i386-linux-gcc and i386-darwin-gcc. But currently I'm going on the opinion the selection is necessary to find the right IEEE 754 headers. The floating-point standard is standard, I hope.

I need to track down exactly what the difference is between 32-bit and 64-bit code. I think the first thing is to generate 32-bit code on Linux (gcc -m32) and see what comes out.

dch24 · Post by **dch24** » Thu Jan 31, 2008 8:23 am

Code: Select all

localhost $ gcc -o potential -m32 -g -I/usr/src/patches/polywell/gsl-1.9inst/include potential.c  -L/usr/src/patches/polywell/gsl-1.9inst/lib -lm -lgsl
localhost $ ./potential
...
(add suffixes to potential.dat files)
...
localhost $ md5sum potential*.dat
e234099bd4142c01d4da056389781573  potential100.dat
fc4c7fe045161a6937e753c540eb236e  potential100_m32.dat
0206f2f489a5133a711b11db81c1b2d3  potential400.dat
a91c0e27acc73ed4b7caf097b0b5abba  potential400_m32.dat

So I think bit-exact output is not going to be possible, at least for now. I see that i386-darwin-gcc produces different results than i386-linux-gcc, also different from x86_64-linux-gcc. And drmike, I'm guessing you're using a ppc processor, because you also get a different result.

But the bug fix you posted today works. electron_fluid.c is now running on my machine. I'll think up another way of validating the output so I can start playing with parallel processing again.

drmike · Post by **drmike** » Thu Jan 31, 2008 1:26 pm

Nice work dch24! I bet a single bit difference is all it takes, and with floating point that is very easy at the least significant bits between different hardware. I've got an AMD64 not a PPC, but the floating point units are probably totally different. When it rounds probably matters.

After 8 hours I'm only to step 46 out of 100. I don't think it will finish before I get back to it at the end of the day, but I'll let it run just to see.

Hopefully the result won't be all zeros this time....

MSimon · Post by **MSimon** » Thu Jan 31, 2008 2:36 pm

drmike wrote:Nice work dch24! I bet a single bit difference is all it takes, and with floating point that is very easy at the least significant bits between different hardware. I've got an AMD64 not a PPC, but the floating point units are probably totally different. When it rounds probably matters.

After 8 hours I'm only to step 46 out of 100. I don't think it will finish before I get back to it at the end of the day, but I'll let it run just to see.

Hopefully the result won't be all zeros this time....
:)

Uh, wouldn't it be wise to put in some kind of break point so you could check while the calculation is in progress?

drmike · Post by **drmike** » Thu Jan 31, 2008 3:37 pm

Yes, that would be smart. I kind of did that by setting the counter to 10 instead of 100 and checked that it worked. So the long run should be giving me something useful too. I'm kind of an optimist - fix bugs after they show up. It's ok for software - hardware I do differently!

But it has been mentioned to me off line that the interpolation routine is the major dog. So I started thinking about why I need it. The answer is I don't need it if I approach the problem differently. So I'm going to attack in a different direction.

If I get zeros after 10 steps I'll know I've got to fix something!

And hopefully 10 steps will only take a few milliseconds instead of a few minutes. That will be even better.

scareduck · Post by **scareduck** » Thu Jan 31, 2008 4:14 pm

dch24 wrote:It may be that MacOS handles floating point differently, based on the fact that gsl-1.9 actively selects between i386-linux-gcc and i386-darwin-gcc. But currently I'm going on the opinion the selection is necessary to find the right IEEE 754 headers. The floating-point standard is standard, I hope.

I need to track down exactly what the difference is between 32-bit and 64-bit code. I think the first thing is to generate 32-bit code on Linux (gcc -m32) and see what comes out.

If you read the current wars fueled by the proposed IEEE-754r spec, one of them is all about how underflows and certain exceptions are trapped. My suspicion is that the standard is being followed, but one of the nuances may be that underflow handling has different defaults between the two compilers.

scareduck · Post by **scareduck** » Thu Jan 31, 2008 8:10 pm

drmike, I was wondering if you had ever looked into OOPic Pro:

http://www.txcorp.com/products/OOPIC_Pro/

Looks like it was developed for exactly the sort of simulation you're trying to do.

OOPIC Pro simulates physical systems, including:

* Plasmas
* Beams of charged particles with selfconsistent and externally generated electric and magnetic fields
* Low-to-moderate density neutral gasses
* Wide variety of boundary conditions

OOPIC Pro has electrostatic and electromagnetic field solvers for 2D geometries in both x-y (slab) and r-z (cylindrical) coordinates, and includes Monte Carlo collision and ionization models.

Of course, their prices are a bit steep at $995 per seat for research. Industrial user licenses cost about twice that at $1995 per seat.