low-complexity time algorithm for n-body simulation

Discuss how polywell fusion works; share theoretical questions and answers.

Moderators: tonybarry, MSimon

happyjack27
Posts: 1439
Joined: Wed Jul 14, 2010 5:27 pm

low-complexity time algorithm for n-body simulation

Post by happyjack27 »

for instance, for simulating polywells.

O(N*log(resolution)) time complexity, by yours truly. I believe that beats the world's best.

NOTE: this is the most up-to-date revision posted. last revised 2013.07.05. also available at http://kbtrader.info/nbody.txt

* fixed a minor bug
* added expand and contract (for sparse quad-tree/oct-tree)
* added readWithLocalSubtract for concurrent "all particles minus one particle" calculations (see NOTE in code comments)

Code: Select all


//use unit square
import java.util.*;
import java.util.concurrent.*;

/*
WHAT THIS DOES:
this calculates the total force at a point x,y, where the force is a sum of forces from N points.
where the forces follow the inverse-square law.  that is, the force is proportional to 1/(distance squared).
examples are gravity and the electrostatic (Coulomb) force.


HOW TO USE:
*ADDING: to add a point of force, e.g. a charge carrier or a mass, at coordinates x,y, call add(x,y,vals),
where x,y is the coordinate, and vals is an array of force singularities at that point (e.g. electric charge, mass, etc.)

*SUBTRACTING: to subtract a point of force, call subtract(x,y,vals)

*MOVING/CHANGING: to move a point of force or change its force, first subtract the old values from the old spot, then add the new values to the new spot.

*CALCULATING: to find the total force on a point x,y, call readMaxContrib(x,y).

*NOTE: if you are calculating the force on a force point that you have already added to the system,
* you want to calculate it _without_ that point. to do that, call instead readWithLocalSubtract(x,y,px,py,sub),
* where px and py are the coords of the point you want to subtract, and sub is that point's vals (its array of force singularities).
* otherwise the calculated force will be singular at that point. 
* 
variations:
*1) fixed depth
**2) fixed depth w/threshold
***3) fixed depth w/threshold & bins
****4) dynamic depth w/threshold & bins (instead of binning, add or remove depth - bins are there so can be done)
*****5) dynamic compute space partitioning for non-uniform memory architecture; aka distributed n-body
*****   to be designed
*****   each bottom of a bin extension (thus holding x individual particles) is given a computation unit, with links to the up levels.  make sure all levels are done before iterating.


for 4):
*make add and subtract also recurse to the replane
*make read also recurse to the replane
*make adjust x and y scale and translation for recursion.


for replaning, first add a distance scaling and shifting function (scale, xshift,yshift).

shown in the code below are the first four variations:

1) use_bins = false, use read() function
2) use_bins = false, use readMaxContrib() function 
3) use_bins = true , use readMaxContrib() function 
4) all of the above, plus use_replane = true, fill in contract/expand thresholds

*/

public class Plane {
	
	PlaneConfig planeConfig = null;
	class PlaneConfig {
		//params for variation 4.
		ConcurrentLinkedQueue<Plane> planeReuseQueue = new ConcurrentLinkedQueue<Plane>();
		boolean use_replane = false;
		int contract_items_per_plane = 4; //OR
		double contract_value_at_plane = 0.5;
		int expand_items_per_plane = 8; //AND
		double expand_value_at_plane = 1;

		int num_values;
		int cascade_depth;
		boolean use_bins = false;
		double max_contrib =  0.1;
		boolean use_min_contrib_frac = false;
		double min_contrib_frac = 0.0001;
	}
	
	double[][][][] vals = null;
	Vector<double[]>[][] bins = null; //need bin members x and y coordinates!
	Vector<double[]>[][] bin_coords = null; //need bin members x and y coordinates!
	int item_count = 0;
	
	Plane[][] re_plane = null; //need bin members x and y coordinates!

	//int max_items_per_plane
	
	double x_up_shift = 0;
	double y_up_shift = 0;
	double x_absolute_shift = 0;
	double y_absolute_shift = 0;
	double xy_down_scale = 1;
	double xy_up_scale = 1;
	double xy_absolute_scale = 1;
	
	double translate_x_down(double x) { return (x - x_up_shift)*xy_down_scale; }
	double translate_y_down(double y) { return (y - y_up_shift)*xy_down_scale; }
	double translate_x_up(double x) { return x*xy_up_scale + x_up_shift; }
	double translate_y_up(double y) { return y*xy_up_scale + y_up_shift; }
	double translate_x_absolute(double x) { return x*xy_absolute_scale + x_absolute_shift; }
	double translate_y_absolute(double y) { return y*xy_absolute_scale + y_absolute_shift; }

	//note: replaning is relative to total value, not contribution.
	//highest variation, each compute unit maintains its own map, depth is done relative to the compute unit.
	//Queue<Plane> recycle_planes = new Queue<Plane>();

	//cascade depth  = log(resolution)
	public void init(PlaneConfig planeConfig) {
		this.planeConfig = planeConfig;


		vals = new double[planeConfig.cascade_depth][][][];
		int size = 1;
	
		for( int i = 0; i < planeConfig.cascade_depth; i++) {
			vals[i] = new double[size][size][planeConfig.num_values];
			size *= 2;
		}
		if( planeConfig.use_bins) {
			size /= 2;
			bins = new Vector[size][size];
			bin_coords = new Vector[size][size];
			re_plane = new Plane[size][size]; //must be allocated here: add/subtract/read assume it exists whenever use_bins is true
			for( int j = 0; j < size; j++) {
				for( int k = 0; k < size; k++) {
					bins[j][k] = new Vector<double[]>();
					bin_coords[j][k] = new Vector<double[]>();
				}				
			}
		}
		
	}
	
	public void expand(int xi, int yi) {
		if( re_plane[yi][xi] != null) {
			System.err.println("error!  plane already expanded!");
			return;
		}

		Plane p = planeConfig.planeReuseQueue.poll();
		if( p == null) {
			p = new Plane();
			p.init(planeConfig);
		} else {
			p.reset();
		}
		re_plane[yi][xi] = p;
		
		double scale =  1.0/(double)(0x01 << (planeConfig.cascade_depth-1));
		p.xy_up_scale = scale;
		p.xy_down_scale = 1.0/scale;
		p.xy_absolute_scale = p.xy_up_scale*this.xy_absolute_scale;
		p.x_absolute_shift = this.x_absolute_shift + p.xy_absolute_scale * (double)xi;
		p.y_absolute_shift = this.y_absolute_shift + p.xy_absolute_scale * (double)yi;

		p.x_up_shift = scale * (double)xi;
		p.y_up_shift = scale * (double)yi;
		
		Vector<double[]> bin = bins[yi][xi];
		Vector<double[]> bin_coord = bin_coords[yi][xi];
		for( int i = 0; i < bin.size(); i++) {
			double[] xx = bin_coord.get(i);
			double[] vv = bin.get(i);
			p.add(xx[0], xx[1], vv);
		}
		
	}
	
	public void contract(int ix, int iy) {

		Plane p = re_plane[iy][ix];
		re_plane[iy][ix] = null;
		planeConfig.planeReuseQueue.offer(p);
	}
	
	public void reset() {
		int size = 1;
		for( int i = 0; i < planeConfig.cascade_depth; i++) {
			for( int j = 0; j < size; j++) {
				for( int k = 0; k < size; k++) {
					for( int l = 0; l < planeConfig.num_values; l++) {
						vals[i][j][k][l] = 0;
					}	
				}				
			}
			size *= 2;
		}
		if( planeConfig.use_bins) {
			size /= 2;
			for( int j = 0; j < size; j++) {
				for( int k = 0; k < size; k++) {
					bins[j][k] = new Vector<double[]>();
					bin_coords[j][k] = new Vector<double[]>();
					re_plane[j][k] = null;
				}				
			}
		}
	}
	
	public void subtract(double x, double y, double[] _vals) {
		x = translate_x_down(x);
		y = translate_y_down(y);
		
		addScaled(x,y,_vals,-1);
		if( planeConfig.use_bins) {
			item_count--;
			int depth = planeConfig.cascade_depth-1;
			double scale =  (double)(0x01 << (depth));
			int xi0 = (int)(x*scale);
			int yi0 = (int)(y*scale);
			Vector<double[]> bin = bins[yi0][xi0];
			Vector<double[]> bin_coord = bin_coords[yi0][xi0];
			int ndx = bin.indexOf(_vals);
			bin.remove(ndx);
			bin_coord.remove(ndx);
			
			Plane p = re_plane[yi0][xi0];
			if( p != null) {
				p.subtract(x,y,_vals);
				
				boolean ok = true;
				if( item_count > planeConfig.contract_items_per_plane) {
					for( int i = 0; i < planeConfig.num_values; i++) {
						if ( Math.abs(vals[depth][yi0][xi0][i]) > planeConfig.contract_value_at_plane) {
							ok = false;
							break;
						}
					}
				}
				if( ok)
					contract(xi0,yi0);
			}
		}
	}
	public void add(double x, double y, double[] _vals) {
		x = translate_x_down(x);
		y = translate_y_down(y);
		
		double xi = x;
		double yi = y;

		for( int i = 0; i < planeConfig.cascade_depth; i++) {
			int xi0 = (int)Math.floor(xi);
			int yi0 = (int)Math.floor(yi);
			for( int j = 0; j < planeConfig.num_values; j++)
				vals[i][yi0][xi0][j] += _vals[j];
			xi *= 2;
			yi *= 2;
		}
		if( planeConfig.use_bins) {
			item_count++;
			int depth = planeConfig.cascade_depth-1;
			double scale =  (double)(0x01 << (depth));
			int xi0 = (int)(x*scale);
			int yi0 = (int)(y*scale);
			Vector<double[]> bin = bins[yi0][xi0];
			bin.add(_vals);
			Vector<double[]> bin_coord = bin_coords[yi0][xi0];
			bin_coord.add(new double[]{x,y});
			
			Plane p = re_plane[yi0][xi0];
			if( p != null)
				p.add(x,y,_vals);
			
			if( p == null) {
				boolean ok = false;
				if( item_count > planeConfig.expand_items_per_plane) {
					for( int i = 0; i < planeConfig.num_values; i++) {
						if ( Math.abs(vals[depth][yi0][xi0][i]) > planeConfig.expand_value_at_plane) {
							ok = true;
							break;
						}
					}
				}
				if( ok)
					expand(xi0,yi0);
			}
		}
	}
	
	public void addScaled(double x, double y, double[] _vals, double scale) {
		double xi = x;
		double yi = y;
		if( scale != -1 && planeConfig.use_bins) {
			System.err.println("don't use addScaled if you are using bins!  instead subtract then add!");
			return;
		}

		for( int i = 0; i < planeConfig.cascade_depth; i++) {
			int xi0 = (int)Math.floor(xi);
			int yi0 = (int)Math.floor(yi);
			for( int j = 0; j < planeConfig.num_values; j++)
				vals[i][yi0][xi0][j] += _vals[j]*scale;
			xi *= 2;
			yi *= 2;
		}
	}
	public double[] read(double x, double y) {
		double[] ret = new double[planeConfig.num_values];
		read(x,y,0,0,0,ret);
		return ret;
	}
	

	//enclosed volume must be less than squared distance at closest approach.
	//otherwise, recurse in a level.
	private void read(double x, double y, int depth, int xi, int yi, double[] _vals) {
		double d = getSquaredDistance(x,y,depth,xi,yi);
		if( d > getEnclosedVolume(depth) || depth == planeConfig.cascade_depth-1) {
			for( int i = 0; i < planeConfig.num_values; i++) {
				_vals[i] += vals[depth][yi][xi][i] / d;
			}
		} else {
			xi *= 2;
			yi *= 2;
			depth++;
			read(x,y,depth,xi+0,yi+0,_vals);
			read(x,y,depth,xi+1,yi+0,_vals);
			read(x,y,depth,xi+0,yi+1,_vals);
			read(x,y,depth,xi+1,yi+1,_vals);
		}
	}
	
	
	public double[] readMaxContrib(double x, double y) {
		double[] ret = new double[planeConfig.num_values];
		readMaxContrib(x,y,0,0,0,ret);
		return ret;
	}
	
	public double[] readWithLocalSubtract(double x, double y, double px, double py, double[] sub) {
		double[] ret = new double[planeConfig.num_values];
		readWithLocalSubtract(x,y,0,0,0,ret,px,py,sub);
		return ret;
	}

	private void readWithLocalSubtract(double x, double y, int depth, int xi, int yi, double[] _vals, double px, double py, double[] sub) {
		double d = getSquaredDistance(x,y,depth,xi,yi);
		double max = planeConfig.max_contrib * d;
		double min_frac = planeConfig.min_contrib_frac * d;
		boolean containsSubtractPoint = containsAbsolutePoint( depth,  yi,  xi,  px,  py);
		
		//if it doesn't contain the subtract point, then we don't need to check that anymore going forward.
		if( !containsSubtractPoint) {
			readMaxContrib(x,y,depth,xi,yi,_vals);
			return;
		}
		
		boolean ok = true;
		//if using bins, then at max depth, add individual particles (becomes n^2 at max  depth)
		if( planeConfig.use_bins && depth == planeConfig.cascade_depth-1) {
			if( re_plane[yi][xi] != null) {
				re_plane[yi][xi].readWithLocalSubtract(x,y,0,0,0,_vals,px,py,sub);
				return;
			}
			Vector<double[]> bin = bins[yi][xi];
			Vector<double[]> bin_coord = bin_coords[yi][xi];
			for( int i = 0; i < bin.size(); i++) {
				double[] xx = bin_coord.get(i);
				//skip if is the subtract point
				if( xx[0] == px && xx[1] == py) {
					continue;
				}
				double[] vv = bin.get(i);
				double xd = xx[0]-x;
				double yd = xx[1]-y;

				double rsd = 1.0/(xd*xd+yd*yd);
				for( int j = 0; j < _vals.length; j++) {
					_vals[j] += vv[j]*rsd;
				}
			}
			return;
		}
		if( depth != planeConfig.cascade_depth-1) {
			for( int i = 0; i < planeConfig.num_values; i++) {
				if( planeConfig.use_min_contrib_frac) {
					if ( Math.abs((vals[depth][yi][xi][i]-sub[i])) < min_frac*_vals[i]) {
						continue;
					}
				}
				if ( Math.abs((vals[depth][yi][xi][i]-sub[i])) > max) {
					ok = false;
					break;
				}
			}
		}
		if( ok) {
			for( int i = 0; i < planeConfig.num_values; i++) {
				_vals[i] += (vals[depth][yi][xi][i]-sub[i]) / d;
			}
			return;
		}

		xi *= 2;
		yi *= 2;
		depth++;
		readWithLocalSubtract(x,y,depth,xi+0,yi+0,_vals,px,py,sub);
		readWithLocalSubtract(x,y,depth,xi+1,yi+0,_vals,px,py,sub);
		readWithLocalSubtract(x,y,depth,xi+0,yi+1,_vals,px,py,sub);
		readWithLocalSubtract(x,y,depth,xi+1,yi+1,_vals,px,py,sub);
	}
	
	public boolean containsAbsolutePoint(int depth, int yi, int xi, double px, double py) {
		double mult = 1.0/(double)(0x01 << (depth));
		double x0 = mult*(double)xi;
		double y0 = mult*(double)yi;
		double x1 = x0+mult;
		double y1 = y0+mult;
		x0 = translate_x_absolute(x0);
		y0 = translate_y_absolute(y0);
		x1 = translate_x_absolute(x1);
		y1 = translate_y_absolute(y1);
		if( px < x0 || px > x1 || py < y0 || py > y1)
			return false;
		return true;
	}


	//enclosed volume must be less than squared distance at closest approach.
	//otherwise, recurse in a level.
	//make read with local subtract function
	private void readMaxContrib(double x, double y, int depth, int xi, int yi, double[] _vals) {
		double d = getSquaredDistance(x,y,depth,xi,yi);
		double max = planeConfig.max_contrib * d;
		double min_frac = planeConfig.min_contrib_frac * d;
		
		boolean ok = true;
		//if using bins, then at max depth, add individual particles (becomes n^2 at max  depth)
		if( planeConfig.use_bins && depth == planeConfig.cascade_depth-1) {
			if( re_plane[yi][xi] != null) {
				re_plane[yi][xi].readMaxContrib(x,y,0,0,0,_vals);
				return;
			}
			Vector<double[]> bin = bins[yi][xi];
			Vector<double[]> bin_coord = bin_coords[yi][xi];
			for( int i = 0; i < bin.size(); i++) {
				double[] xx = bin_coord.get(i);
				double[] vv = bin.get(i);
				double xd = xx[0]-x;
				double yd = xx[1]-y;
				double rsd = 1.0/(xd*xd+yd*yd);
				for( int j = 0; j < _vals.length; j++) {
					_vals[j] += vv[j]*rsd;
				}
			}
			return;
		}
		if( depth != planeConfig.cascade_depth-1) {
			for( int i = 0; i < planeConfig.num_values; i++) {
				if( planeConfig.use_min_contrib_frac) {
					if ( Math.abs(vals[depth][yi][xi][i]) < min_frac*_vals[i]) {
						continue;
					}
				}
				if ( Math.abs(vals[depth][yi][xi][i]) > max) {
					ok = false;
					break;
				}
			}
		}
		if( ok) {
			for( int i = 0; i < planeConfig.num_values; i++) {
				_vals[i] += vals[depth][yi][xi][i] / d;
			}
			return;
		}

		xi *= 2;
		yi *= 2;
		depth++;
		readMaxContrib(x,y,depth,xi+0,yi+0,_vals);
		readMaxContrib(x,y,depth,xi+1,yi+0,_vals);
		readMaxContrib(x,y,depth,xi+0,yi+1,_vals);
		readMaxContrib(x,y,depth,xi+1,yi+1,_vals);
	}
	
	private double getSquaredDistance(double x, double y, int depth, int xi, int yi) {
		double mult = 1.0/(double)(0x01 << (depth));
		double x0 = mult*(double)xi + mult/2;
		double y0 = mult*(double)yi + mult/2;
		
		double xd = x - translate_x_absolute(x0);
		double yd = y - translate_y_absolute(y0);

		return xd*xd + yd*yd;
	}
	
	private double getEnclosedVolume(int depth) {
		return xy_absolute_scale*xy_absolute_scale/(double)(0x01 << (depth*2));
	}
}
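
For illustration, here's a rough usage sketch of the class above (assuming num_values = 1, use_bins = true, and a cascade_depth of 8; the pos/charge arrays are just made-up example data, not part of the class):

public class PlaneDemo {
	public static void main(String[] args) {
		Plane plane = new Plane();
		Plane.PlaneConfig cfg = plane.new PlaneConfig();
		cfg.num_values = 1;     // e.g. just electric charge
		cfg.cascade_depth = 8;  // finest grid is 2^7 x 2^7 = 128x128 cells
		cfg.use_bins = true;
		plane.init(cfg);

		// two made-up particles on the unit square
		double[][] pos = { {0.25, 0.25}, {0.75, 0.50} };
		double[][] charge = { {1.0}, {-1.0} };
		for (int i = 0; i < pos.length; i++)
			plane.add(pos[i][0], pos[i][1], charge[i]);

		// force values at particle 0's own location, excluding particle 0 itself
		double[] f = plane.readWithLocalSubtract(pos[0][0], pos[0][1],
				pos[0][0], pos[0][1], charge[0]);
		System.out.println("force values at particle 0: " + f[0]);

		// moving particle 1: subtract the old values at the old spot,
		// then add the same vals array at the new spot
		plane.subtract(pos[1][0], pos[1][1], charge[1]);
		pos[1][0] = 0.60; pos[1][1] = 0.40;
		plane.add(pos[1][0], pos[1][1], charge[1]);
	}
}

note that subtract() finds the particle by the identity of its vals array, so pass the same array object you passed to add().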
Last edited by happyjack27 on Fri Jul 05, 2013 4:13 pm, edited 2 times in total.

happyjack27
Posts: 1439
Joined: Wed Jul 14, 2010 5:27 pm

Re: low-complexity time algorithm for n-body simulation

Post by happyjack27 »

implemented variation #4.

now all that's left is #5: adding code for memory space, transfer, and execution control for non-uniform-memory-access platforms. (e.g. supercomputers)

Code: Select all


//use unit square
import java.util.*;
import java.util.concurrent.*;

/*
WHAT THIS DOES:
this calculates the total force at a point x,y, where the force is a sum of forces from N points.
where the forces follow the inverse-square law.  that is, the force is proportional to 1/(distance squared).
examples are gravity and the electrostatic (Coulomb) force.


HOW TO USE:
*ADDING: to add a point of force, e.g. a charge carrier or a mass, at coordinates x,y, call add(x,y,vals),
where x,y is the coordinate, and vals is an array of force singularities at that point (e.g. electric charge, mass, etc.)

*SUBTRACTING: to subtract a point of force, call subtract(x,y,vals)

*MOVING/CHANGING: to move a point of force or change its force, first subtract the old values from the old spot, then add the new values to the new spot.

*CALCULATING: to find the total force on a point x,y, call readMaxContrib(x,y,err), where err is a quantity proportional to the error tolerance.

*NOTE: if you are calculating the force on a force point that you have already added to the system, you have to first subtract it from the system, otherwise the calculated force will be singular at that point. (remember to add it back when you're done)

variations:
*1) fixed depth
**2) fixed depth w/threshold
***3) fixed depth w/threshold & bins
****4) dynamic depth w/threshold & bins (instead of binning, add or remove depth - bins are there so can be done)
*****5) dynamic compute space partitioning for non-uniform memory architecture; aka distributed n-body
*****   to be designed
*****   each bottom of a bin extension (thus holding x individual particles) is given a computation unit, with links to the up levels.  make sure all levels are done before iterating.


for 4):
*make add and subtract also recurse to the replane
*make read also recurse to the replane
*make adjust x and y scale and translation for recursion.


for replaning, first add a distance scaling and shifting function (scale, xshift,yshift).

shown in the code below are the first four variations:

1) use_bins = false, use read() function
2) use_bins = false, use readMaxContrib() function 
3) use_bins = true , use readMaxContrib() function 
4) all of the above, plus use_replane = true, fill in contract/expand thresholds

*/

public class Plane {
	
	PlaneConfig planeConfig = null;
	class PlaneConfig {
		//params for variation 4.
		ConcurrentLinkedQueue<Plane> planeReuseQueue = new ConcurrentLinkedQueue<Plane>();
		boolean use_replane = false;
		int contract_items_per_plane = 4; //OR
		double contract_value_at_plane = 0.5;
		int expand_items_per_plane = 8; //AND
		double expand_value_at_plane = 1;

		int num_values;
		int cascade_depth;
		boolean use_bins = false;
		boolean use_min_contrib_frac = false;
		double min_contrib_frac = 0.0001;
	}
	
	double[][][][] vals = null;
	Vector<double[]>[][] bins = null; //need bin members x and y coordinates!
	Vector<double[]>[][] bin_coords = null; //need bin members x and y coordinates!
	int item_count = 0;
	
	Plane[][] re_plane = null; //need bin members x and y coordinates!

	//int max_items_per_plane
	
	double x_up_shift = 0;
	double y_up_shift = 0;
	double x_absolute_shift = 0;
	double y_absolute_shift = 0;
	double xy_down_scale = 1;
	double xy_up_scale = 1;
	double xy_absolute_scale = 1;
	
	double translate_x_down(double x) { return (x - x_up_shift)*xy_down_scale; }
	double translate_y_down(double y) { return (y - y_up_shift)*xy_down_scale; }
	double translate_x_up(double x) { return x*xy_up_scale + x_up_shift; }
	double translate_y_up(double y) { return y*xy_up_scale + y_up_shift; }
	double translate_x_absolute(double x) { return x*xy_absolute_scale + x_absolute_shift; }
	double translate_y_absolute(double y) { return y*xy_absolute_scale + y_absolute_shift; }

	//note: replaning is relative to total value, not contribution.
	//highest variation, each compute unit maintains its own map, depth is done relative to the compute unit.
	//Queue<Plane> recycle_planes = new Queue<Plane>();

	//cascade depth  = log(resolution)
	public void init(PlaneConfig planeConfig) {
		this.planeConfig = planeConfig;


		vals = new double[planeConfig.cascade_depth][][][];
		int size = 1;
	
		for( int i = 0; i < planeConfig.cascade_depth; i++) {
			vals[i] = new double[size][size][planeConfig.num_values];
			size *= 2;
		}
		if( planeConfig.use_bins) {
			size /= 2;
			bins = new Vector[size][size];
			bin_coords = new Vector[size][size];
			re_plane = new Plane[size][size]; //must be allocated here: add/subtract/read assume it exists whenever use_bins is true
			for( int j = 0; j < size; j++) {
				for( int k = 0; k < size; k++) {
					bins[j][k] = new Vector<double[]>();
					bin_coords[j][k] = new Vector<double[]>();
				}				
			}
		}
		
	}
	
	public void expand(int xi, int yi) {
		if( re_plane[yi][xi] != null) {
			System.err.println("error!  plane already expanded!");
			return;
		}

		Plane p = planeConfig.planeReuseQueue.poll();
		if( p == null) {
			p = new Plane();
			p.init(planeConfig);
		} else {
			p.reset();
		}
		re_plane[yi][xi] = p;
		
		double scale =  1.0/(double)(0x01 << (planeConfig.cascade_depth-1));
		p.xy_up_scale = scale;
		p.xy_down_scale = 1.0/scale;
		p.xy_absolute_scale = p.xy_up_scale*this.xy_absolute_scale;
		p.x_absolute_shift = this.x_absolute_shift + p.xy_absolute_scale * (double)xi;
		p.y_absolute_shift = this.y_absolute_shift + p.xy_absolute_scale * (double)yi;

		p.x_up_shift = scale * (double)xi;
		p.y_up_shift = scale * (double)yi;
		
		Vector<double[]> bin = bins[yi][xi];
		Vector<double[]> bin_coord = bin_coords[yi][xi];
		for( int i = 0; i < bin.size(); i++) {
			double[] xx = bin_coord.get(i);
			double[] vv = bin.get(i);
			p.add(xx[0], xx[1], vv);
		}
		
	}
	
	public void contract(int ix, int iy) {

		Plane p = re_plane[iy][ix];
		re_plane[iy][ix] = null;
		planeConfig.planeReuseQueue.offer(p);
	}
	
	public void reset() {
		int size = 1;
		for( int i = 0; i < planeConfig.cascade_depth; i++) {
			for( int j = 0; j < size; j++) {
				for( int k = 0; k < size; k++) {
					for( int l = 0; l < planeConfig.num_values; l++) {
						vals[i][j][k][l] = 0;
					}	
				}				
			}
			size *= 2;
		}
		if( planeConfig.use_bins) {
			size /= 2;
			for( int j = 0; j < size; j++) {
				for( int k = 0; k < size; k++) {
					bins[j][k] = new Vector<double[]>();
					bin_coords[j][k] = new Vector<double[]>();
					re_plane[j][k] = null;
				}				
			}
		}
	}
	
	public void subtract(double x, double y, double[] _vals) {
		x = translate_x_down(x);
		y = translate_y_down(y);
		
		addScaled(x,y,_vals,-1);
		if( planeConfig.use_bins) {
			item_count--;
			int depth = planeConfig.cascade_depth-1;
			double scale =  (double)(0x01 << (depth));
			int xi0 = (int)(x*scale);
			int yi0 = (int)(y*scale);
			Vector<double[]> bin = bins[yi0][xi0];
			Vector<double[]> bin_coord = bin_coords[yi0][xi0];
			int ndx = bin.indexOf(_vals);
			bin.remove(ndx);
			bin_coord.remove(ndx);
			
			Plane p = re_plane[yi0][xi0];
			if( p != null) {
				p.subtract(x,y,_vals);
				
				boolean ok = true;
				if( item_count > planeConfig.contract_items_per_plane) {
					for( int i = 0; i < planeConfig.num_values; i++) {
						if ( Math.abs(vals[depth][yi0][xi0][i]) > planeConfig.contract_value_at_plane) {
							ok = false;
							break;
						}
					}
				}
				if( ok)
					contract(xi0,yi0);
			}
		}
	}
	public void add(double x, double y, double[] _vals) {
		x = translate_x_down(x);
		y = translate_y_down(y);
		
		double xi = x;
		double yi = y;

		for( int i = 0; i < planeConfig.cascade_depth; i++) {
			int xi0 = (int)Math.floor(xi);
			int yi0 = (int)Math.floor(yi);
			for( int j = 0; j < planeConfig.num_values; j++)
				vals[i][yi0][xi0][j] += _vals[j];
			xi *= 2;
			yi *= 2;
		}
		if( planeConfig.use_bins) {
			item_count++;
			int depth = planeConfig.cascade_depth-1;
			double scale =  (double)(0x01 << (depth));
			int xi0 = (int)(x*scale);
			int yi0 = (int)(y*scale);
			Vector<double[]> bin = bins[yi0][xi0];
			bin.add(_vals);
			Vector<double[]> bin_coord = bin_coords[yi0][xi0];
			bin_coord.add(new double[]{x,y});
			
			Plane p = re_plane[yi0][xi0];
			if( p != null)
				p.add(x,y,_vals);
			
			if( p == null) {
				boolean ok = false;
				if( item_count > planeConfig.expand_items_per_plane) {
					for( int i = 0; i < planeConfig.num_values; i++) {
						if ( Math.abs(vals[depth][yi0][xi0][i]) > planeConfig.expand_value_at_plane) {
							ok = true;
							break;
						}
					}
				}
				if( ok)
					expand(xi0,yi0);
			}
		}
	}
	
	public void addScaled(double x, double y, double[] _vals, double scale) {
		double xi = x;
		double yi = y;
		if( scale != -1 && planeConfig.use_bins) {
			System.err.println("don't use addScaled if you are using bins!  instead subtract then add!");
			return;
		}

		for( int i = 0; i < planeConfig.cascade_depth; i++) {
			int xi0 = (int)Math.floor(xi);
			int yi0 = (int)Math.floor(yi);
			for( int j = 0; j < planeConfig.num_values; j++)
				vals[i][yi0][xi0][j] += _vals[j]*scale;
			xi *= 2;
			yi *= 2;
		}
	}
	public double[] read(double x, double y) {
		double[] ret = new double[planeConfig.num_values];
		read(x,y,0,0,0,ret);
		return ret;
	}
	

	//enclosed volume must be less than squared distance at closest approach.
	//otherwise, recurse in a level.
	private void read(double x, double y, int depth, int xi, int yi, double[] _vals) {
		double d = getSquaredDistance(x,y,depth,xi,yi);
		if( d > getEnclosedVolume(depth) || depth == planeConfig.cascade_depth-1) {
			for( int i = 0; i < planeConfig.num_values; i++) {
				_vals[i] += vals[depth][yi][xi][i] / d;
			}
		} else {
			xi *= 2;
			yi *= 2;
			depth++;
			read(x,y,depth,xi+0,yi+0,_vals);
			read(x,y,depth,xi+1,yi+0,_vals);
			read(x,y,depth,xi+0,yi+1,_vals);
			read(x,y,depth,xi+1,yi+1,_vals);
		}
	}
	
	
	public double[] readMaxContrib(double x, double y, double max_contrib) {
		double[] ret = new double[planeConfig.num_values];
		readMaxContrib(x,y,0,0,0,ret,max_contrib);
		return ret;
	}
	

	//enclosed volume must be less than squared distance at closest approach.
	//otherwise, recurse in a level.
	private void readMaxContrib(double x, double y, int depth, int xi, int yi, double[] _vals, double max_contrib) {
		double d = getSquaredDistance(x,y,depth,xi,yi);
		double max = max_contrib * d;
		double min_frac = planeConfig.min_contrib_frac * d;
		
		boolean ok = true;
		//if using bins, then at max depth, add individual particles (becomes n^2 at max  depth)
		if( planeConfig.use_bins && depth == planeConfig.cascade_depth-1) {
			if( re_plane[yi][xi] != null) {
				//recurse into the sub-plane, accumulating directly into _vals
				//(the public overload allocates and returns its own array, which would be discarded here)
				re_plane[yi][xi].readMaxContrib(x,y,0,0,0,_vals,max_contrib);
				return;
			}
			Vector<double[]> bin = bins[yi][xi];
			Vector<double[]> bin_coord = bin_coords[yi][xi];
			for( int i = 0; i < bin.size(); i++) {
				double[] xx = bin_coord.get(i);
				double[] vv = bin.get(i);
				double xd = xx[0]-x;
				double yd = xx[1]-y;
				double rsd = 1.0/(xd*xd+yd*yd);
				for( int j = 0; j < _vals.length; j++) {
					_vals[j] += vv[j]*rsd;
				}
			}
			return;
		}
		if( depth != planeConfig.cascade_depth-1) {
			for( int i = 0; i < planeConfig.num_values; i++) {
				if( planeConfig.use_min_contrib_frac) {
					if ( Math.abs(vals[depth][yi][xi][i]) < min_frac*_vals[i]) {
						continue;
					}
				}
				if ( Math.abs(vals[depth][yi][xi][i]) > max) {
					ok = false;
					break;
				}
			}
		}
		if( ok) {
			for( int i = 0; i < planeConfig.num_values; i++) {
				_vals[i] += vals[depth][yi][xi][i] / d;
			}
			return;
		}

		xi *= 2;
		yi *= 2;
		depth++;
		readMaxContrib(x,y,depth,xi+0,yi+0,_vals, max_contrib);
		readMaxContrib(x,y,depth,xi+1,yi+0,_vals, max_contrib);
		readMaxContrib(x,y,depth,xi+0,yi+1,_vals, max_contrib);
		readMaxContrib(x,y,depth,xi+1,yi+1,_vals, max_contrib);
	}
	
	private double getSquaredDistance(double x, double y, int depth, int xi, int yi) {
		double mult = 1.0/(double)(0x01 << (depth));
		double x0 = mult*(double)xi + mult/2;
		double y0 = mult*(double)yi + mult/2;
		
		double xd = x - translate_x_absolute(x0);
		double yd = y - translate_y_absolute(y0);

		return xd*xd + yd*yd;
	}
	
	private double getEnclosedVolume(int depth) {
		return xy_absolute_scale*xy_absolute_scale/(double)(0x01 << (depth*2));
	}
}
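
For illustration, here's a rough usage sketch of this err-parameter version (again assuming num_values = 1, use_bins = true, and a modest cascade_depth; the coordinates and the q arrays are just made-up example data, not part of the class):

public class PlaneDemo2 {
	public static void main(String[] args) {
		Plane plane = new Plane();
		Plane.PlaneConfig cfg = plane.new PlaneConfig();
		cfg.num_values = 1;     // e.g. just electric charge
		cfg.cascade_depth = 8;  // finest grid is 128x128 cells
		cfg.use_bins = true;
		plane.init(cfg);

		double[] q1 = {1.0};
		double[] q2 = {-1.0};
		plane.add(0.3, 0.7, q1);  // two made-up particles on the unit square
		plane.add(0.6, 0.2, q2);

		// total force values at an arbitrary field point, with error parameter 0.1
		double[] f = plane.readMaxContrib(0.5, 0.5, 0.1);

		// force on particle 1 itself: subtract it, read, then add it back
		plane.subtract(0.3, 0.7, q1);
		double[] fOn1 = plane.readMaxContrib(0.3, 0.7, 0.1);
		plane.add(0.3, 0.7, q1);

		System.out.println(f[0] + " " + fOn1[0]);
	}
}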

Last edited by happyjack27 on Sun Jun 30, 2013 8:28 pm, edited 2 times in total.

D Tibbets
Posts: 2775
Joined: Thu Jun 26, 2008 6:52 am

Re: low-complexity time algorithm for n-body simulation

Post by D Tibbets »

I was going to ask for specifics for programming language / computer requirements, but your second post narrows the field.

Dan Tibbets
To error is human... and I'm very human.

happyjack27
Posts: 1439
Joined: Wed Jul 14, 2010 5:27 pm

Re: low-complexity time algorithm for n-body simulation

Post by happyjack27 »

D Tibbets wrote:I was going to ask for specifics for programming language / computer requirements, but your second post narrows the field.

Dan Tibbets
I wrote it in Java. It can be converted to any object-oriented language (e.g. C++) pretty trivially (or to any language at all with a little more effort).

The algorithm is designed to scale well onto a cluster or, more generally, a distributed-computing environment, but those parts are yet to be written. By "scale well" I mean it's designed so that its bandwidth needs grow very slowly with the number of particles, so even when N is large enough that the problem becomes bandwidth-limited, it still scales very well. (However, what you see there doesn't include code for that; it doesn't include code for taking maximum advantage of the network, memory, and compute-node topology of a large compute cluster. That remains to be written. That's variation 5 -- and, come to think of it, variation 6, which would transmit updates between nodes based on deltas rather than on a fixed schedule. It's designed with that in mind, though.)

But yeah, you should be able to plug that class into any Java program and use it "out-of-the-box", as it were, without modification. And a good programmer should be able to convert it into whatever other environment you might want to use it in.

kcdodd
Posts: 722
Joined: Tue Jun 03, 2008 3:36 am
Location: Austin, TX

Re: low-complexity time algorithm for n-body simulation

Post by kcdodd »

Could you give a brief outline, in plain words, of what this code does?
Carter

happyjack27
Posts: 1439
Joined: Wed Jul 14, 2010 5:27 pm

Re: low-complexity time algorithm for n-body simulation

Post by happyjack27 »

WHAT THIS DOES:
this calculates the total force at a point x,y, where the force is a sum of forces from N points.
where the forces follow the inverse-square law. that is, the force is proportional to 1/(distance squared).
examples are gravity and the electrostatic (Coulomb) force.


HOW TO USE:
*ADDING: to add a point of force, e.g. a charge carrier or a mass, at coordinates x,y, call add(x,y,vals),
where x,y is the coordinate, and vals is an array of force singularities at that point (e.g. electric charge, mass, etc.)

*SUBTRACTING: to subtract a point of force, call subtract(x,y,vals)

*MOVING/CHANGING: to move a point of force or change its force, first subtract the old values from the old spot, then add the new values to the new spot.

*CALCULATING: to find the total force on a point x,y, call readMaxContrib(x,y,err), where err is a quantity proportional to the error tolerance.

*NOTE: if you are calculating the force on a force point that you have already added to the system, you have to first subtract it from the system, otherwise the calculated force will be singular at that point. (remember to add it back when you're done)

HOW IT WORKS:
*the algorithm takes advantage of the fact that the contribution from a force carrier drops as 1/(distance squared), and thus the resolution needed as you go further away from the measurement point drops at the same rate. For instance, a 2x2 block at 2 meters away affects the result of the calculation to about the same level of precision as a 1x1 block at 1 meter away. roughly speaking, the further away you look, the blurrier the number can be, because the error contribution drops with the distance squared.

*reasoning: using the Gibbs inequality https://en.wikipedia.org/wiki/Gibbs%27_inequality (or Fano's inequality?! https://en.wikipedia.org/wiki/Fano%27s_inequality -- in any case, you're minimizing the KL-divergence https://en.wikipedia.org/wiki/Relative_entropy , and thus the wasted bits and/or compute cycles) one can reason that to maximize overall precision per unit compute time, the compute time spent on a contribution should be proportional to the number of bits of precision by which it affects the result. for adding together a sum of forces this means simply: when the expected logarithm of the value of the numbers you add together is roughly proportional to the time it takes to compute them, your algorithm is maximally efficient for the given level of precision. okay, I know that wasn't English. sorry, it's hard to translate that.

*the algorithm stores a series of grid-based approximations of the positions of force carriers at different resolutions, each at twice the resolution of the previous, e.g. a 1x1 grid, a 2x2 grid, a 4x4 grid, an 8x8 grid, etc., up to your desired level of resolution. when you add or remove a particle, it updates each grid accordingly, then stores the exact particle position in the final grid.

*then to get the sum of forces at a point, it sweeps through the grid from upper left to lower right, starting at the coarsest-level grid first. if the volume enclosed by that grid box times the total sum of forces from force carriers in that grid divided by the squared distance from the point you are summing the forces at exceeds a certain threshold (related to the desired level of accuracy), then it goes up to the next higher resolution grid (only for the current grid box!), and repeats.

**for variation 3, if it reaches the highest resolution grid, it does the standard O(n^2) n-body algorithm (sums all the particles exactly), or for variation 4 and above, if the compute cost of that is too high, it creates another whole set of grids of different resolutions for that particular grid point. (thus recursing the entire algorithm again.) (these are the expand and contract functions -- they add or remove a level of recursion, respectively)

ultimately this translates to an average compute time proportional to the number of particles times the logarithm of the desired degree of accuracy.

perhaps more importantly, the algorithm automatically (and dynamically!) partitions the problem space into spaces of high computational dependence. these spaces can then be distributed to different compute nodes, with a minimal amount of communication between the compute nodes, because each compute node only needs a "blurry" approximation of its neighboring compute nodes' results, and an even "blurrier" approximation of nodes "further" away.
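
here's a quick toy calculation of that "same blur at twice the distance" point (not part of the Plane class, just an illustration of the scaling; the 10%-or-so numbers come from the 1:10 cell-to-distance ratio chosen here):

public class BlurDemo {
	// worst-case relative error from lumping a cell of side cellSide,
	// centered at the given distance, into a single point at its center
	static double worstCaseRelativeError(double cellSide, double distance) {
		double near = 1.0 / Math.pow(distance - cellSide / 2, 2); // charge at the near edge of the cell
		double center = 1.0 / (distance * distance);              // same charge placed at the cell center
		return (near - center) / center;
	}
	public static void main(String[] args) {
		System.out.println(worstCaseRelativeError(1.0, 10.0)); // ~0.108
		System.out.println(worstCaseRelativeError(2.0, 20.0)); // ~0.108 -- twice the cell, twice the distance, same blur
	}
}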

happyjack27
Posts: 1439
Joined: Wed Jul 14, 2010 5:27 pm

Re: low-complexity time algorithm for n-body simulation

Post by happyjack27 »

I thought up a better explanation for the reasoning: the maximum decrease in error per unit computation is achieved when the compute unit is used on the variable whose final error contribution is decreased more by one additional compute unit than any other variable that compute unit could be used on.

This sounds like a convoluted tautology, and it is. But you'll notice that if you keep on repeating this choice - using each next compute unit on the variable where it makes the biggest impact - you're always pushing things toward the point where they all have the same impact. Thus, compute units are used most efficiently when the choice of which variable to use one less compute unit on affects the final accuracy the least.

One of the main design goals of the algorithm is to always use the next available compute unit in this way. And that's the inspiration for using a resolution scaled to the distance.
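
A toy sketch of that greedy choice, with made-up numbers, just to show that always spending the next compute unit where it helps most drives everything toward equal impact:

import java.util.Arrays;

public class GreedyPrecision {
	public static void main(String[] args) {
		// made-up per-term errors; assume one more compute unit spent on a term halves its error
		double[] err = {8.0, 4.0, 1.0};
		int budget = 6; // total compute units to hand out
		for (int u = 0; u < budget; u++) {
			int best = 0; // the term whose error an extra unit reduces the most
			for (int i = 1; i < err.length; i++)
				if (err[i] > err[best]) best = i;
			err[best] /= 2;
		}
		System.out.println(Arrays.toString(err)); // ends up roughly equalized: [0.5, 1.0, 1.0]
	}
}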

kcdodd
Posts: 722
Joined: Tue Jun 03, 2008 3:36 am
Location: Austin, TX

Re: low-complexity time algorithm for n-body simulation

Post by kcdodd »

So, on short length scales it's n-body, and on large scale lengths it's grid based?
Carter

happyjack27
Posts: 1439
Joined: Wed Jul 14, 2010 5:27 pm

Re: low-complexity time algorithm for n-body simulation

Post by happyjack27 »

Pretty much. The "grid-based" part is really multi-resolution-grid-array-based: a series of grids, each at twice the resolution of the previous. The resolution is chosen on a box-by-box basis to be the one just under a given error-contribution threshold - on average that turns out to be a resolution inversely proportional to the distance.

The "short length scale" isn't really a length threshold - it's an error contribution threshold combined with a particle count threshold. Basically, if its cheaper per unit accuracy to just do a brute-force n-body summation, then it switches to that.

So, close, but without those subtle differences it would be a much slower algorithm.

happyjack27
Posts: 1439
Joined: Wed Jul 14, 2010 5:27 pm

Re: low-complexity time algorithm for n-body simulation

Post by happyjack27 »

The fact that you have to subtract the point you're reading the forces on poses a potential issue for concurrency, but I've just solved this - you can keep the point as a local variable and just subtract it locally at every recursion step, resulting in a worst-case added cost of O(log(resolution)), which is dominated by the overall time complexity, so it's effectively free.

kcdodd
Posts: 722
Joined: Tue Jun 03, 2008 3:36 am
Location: Austin, TX

Re: low-complexity time algorithm for n-body simulation

Post by kcdodd »

Does the "cost" take into account the issue that as your field resolution gets finer, the time-step needs to get smaller too? Having smoother fields lets you step longer. I think that's why most people don't worry about doing n-body even on short length scales. Collisions are slow enough at these densities and temperatures that you can approximate that without doing n-body collisions, if you need collisions at all. It might even overdo the effect of collisions if you do n-body at short lengths because that isn't really physical. The computer doesn't have the 10^12 particles to get it right. When people do two-body collisions they don't do a single coulomb collision. The other particle is for momentum and energy balance only. The scattering angles and probability of collision are computed based on the entire approximated distribution.

I'm not trying to bash your algorithm. The multi-resolution grid might give a pretty good increase to speed over a full grid. I'm just not convinced doing n-body stuff is a good thing.
Carter

happyjack27
Posts: 1439
Joined: Wed Jul 14, 2010 5:27 pm

Re: low-complexity time algorithm for n-body simulation

Post by happyjack27 »

It's a general purpose algorithm. The goal is to allow scaling to extremely large particle counts that would otherwise be practically impossible. Other applications would be simulating climate change or molecular biology on very large scales. There will always be problems in which other simplifying assumptions can be made which produce a more efficient solution than even the best possible n-body. That doesn't concern me; that's simply a mathematical reality. The focus here is just on those problems where n-body, for whatever reason, was chosen as the perspective from which to look at the problem. (And perhaps by having a different computational time complexity, this algorithm broadens the scope where that perspective would be practical.)

As regards collisions, one can certainly add that math in at the short length scales when it switches to classic n-body, and one can even program in a length threshold for that if it needs to switch at a certain scale (e.g. the Planck length or the Bohr radius). On the other hand it may be more practical to use a more gestalt statistical method, depending on the application, and this algorithm doesn't eliminate that possibility.

As regards time resolution, the implementation shown simply maximizes the "frame" rate for a given accuracy. If you want to take into account relative velocities, and thus expected time to collision, the part of the algorithm that "zooms in", so to speak (in the readMaxContrib function), can be altered accordingly. For instance, the velocity vector can be stored in the vals array, and the max error contribution adjusted proportionally to the relative velocity.

Point being, roughly speaking, the algorithm is pretty flexible. But where the algorithm is not well suited to the problem, by all means, use a different one.

happyjack27
Posts: 1439
Joined: Wed Jul 14, 2010 5:27 pm

Re: low-complexity time algorithm for n-body simulation

Post by happyjack27 »

The step time does not need to get smaller at finer resolutions. The step time is pre-given; it's part of the calculation. The concern is the "frame rate", which is how long it takes to compute a "step". The goal of the algorithm is to minimize the time it takes to compute a "step" as the number of particles grows to extremely large numbers. If relative particle velocity affects the accuracy of the simulation (such as for magnetic fields), both the global time step and the local spatial resolution can be adjusted on-the-fly to compensate. (But the global time step is for the whole system, so it can only be based on prior data, e.g. data from the previous iteration.)

happyjack27
Posts: 1439
Joined: Wed Jul 14, 2010 5:27 pm

Re: low-complexity time algorithm for n-body simulation

Post by happyjack27 »

To clarify how the algorithm is to be used: if there are x particles, there can be x processors that all, concurrently, call the readMaxContrib function for the coordinates of a particle. Each particle's position and velocity is updated in parallel.
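
Roughly, a per-step driver could look like this (a sketch only; it assumes the first posted version of the class, with readWithLocalSubtract, and px/py/q are just hypothetical names for the particle coordinates and their vals arrays):

import java.util.*;
import java.util.concurrent.*;

// sketch of one simulation step: every particle's force is read concurrently,
// and nothing is moved until all the reads are done.
class StepDriver {
	// q[i] must be the same array object that was passed to plane.add for particle i,
	// so it can later be passed to subtract when the particle moves.
	static double[][] readAllForces(Plane plane, double[] px, double[] py, double[][] q)
			throws InterruptedException, ExecutionException {
		int n = px.length;
		double[][] forces = new double[n][];
		ExecutorService pool = Executors.newFixedThreadPool(
				Runtime.getRuntime().availableProcessors());
		List<Future<?>> futures = new ArrayList<>();
		for (int i = 0; i < n; i++) {
			final int k = i;
			futures.add(pool.submit(() -> {
				// "all particles minus this one", without mutating the shared plane
				forces[k] = plane.readWithLocalSubtract(px[k], py[k], px[k], py[k], q[k]);
			}));
		}
		for (Future<?> f : futures) f.get(); // wait for every read before moving anything
		pool.shutdown();
		return forces;
	}
}

after this returns, each particle can be integrated and re-inserted (subtract at the old spot, add at the new spot) for the next step.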

happyjack27
Posts: 1439
Joined: Wed Jul 14, 2010 5:27 pm

Re: low-complexity time algorithm for n-body simulation

Post by happyjack27 »

You pretty much have to do the whole thing synchronously from the n-body perspective, because to know the forces on particle x at time t, you need to know the positions and velocities of _ALL_ other particles _AT TIME T_. So it really just simplifies everything if you do everything at time T, i.e. synchronously. Granted, a position between two times can be interpolated in O(1) time per particle-particle interaction. Problem is, you don't know what resolution you need a priori, so you don't know how far apart the times can be. Also, the interpolation is per particle pair, so you're already at O(N x N). So unless you can solve these two problems, the decision to do it synchronously is forced by time-complexity considerations.
