SPARTA (23 Dec 2017)
KOKKOS mode is enabled (../kokkos.cpp:39)
  using 0 GPU(s) per MPI task
  using 1 thread(s) per MPI task
package kokkos
package kokkos reduction parallel/reduce comm classic
# advect particles via VSS collisional flow on a uniform grid
# particles reflect off global box boundaries

variable            x index 10
variable            y index 10
variable            z index 10

variable            lx equal $x*1.0e-5
variable            lx equal 128*1.0e-5
variable            ly equal $y*1.0e-5
variable            ly equal 160*1.0e-5
variable            lz equal $z*1.0e-5
variable            lz equal 160*1.0e-5

variable            n equal 10*$x*$y*$z
variable            n equal 10*128*$y*$z
variable            n equal 10*128*160*$z
variable            n equal 10*128*160*160

seed	    	    12345
dimension   	    3
global              gridcut 1.0e-5

boundary	    rr rr rr

create_box  	    0 ${lx} 0 ${ly} 0 ${lz}
create_box  	    0 0.00128 0 ${ly} 0 ${lz}
create_box  	    0 0.00128 0 0.0016 0 ${lz}
create_box  	    0 0.00128 0 0.0016 0 0.0016
Created orthogonal box = (0 0 0) to (0.00128 0.0016 0.0016)
create_grid 	    $x $y $z
create_grid 	    128 $y $z
create_grid 	    128 160 $z
create_grid 	    128 160 160
WARNING: Could not acquire nearby ghost cells b/c grid partition is not clumped (../grid.cpp:381)
Created 3276800 child grid cells
  parent cells = 1
  CPU time = 0.048594 secs
  create/ghost percent = 42.8407 57.1593

balance_grid        rcb part
Balance grid migrated 3264000 cells
  CPU time = 1.60116 secs
  reassign/sort/migrate/ghost percent = 46.8283 0.0758217 44.6394 8.45655

species		    ar.species Ar
mixture		    air Ar vstream 0.0 0.0 0.0 temp 273.15

global              nrho 7.07043E22
global              fnum 7.07043E6

collide		    vss air ar.vss

create_particles    air n $n
create_particles    air n 32768000
Created 32768000 particles
  CPU time = 0.163804 secs

stats		    100
compute             temp temp
stats_style	    step cpu np nattempt ncoll c_temp

# first equilibrate with large timestep to unsort particles
# then benchmark with normal timestep

timestep 	    7.00E-8
run                 30
Memory usage per proc in Mbytes:
  particles (ave,min,max) = 7.44173 7.44173 7.44173
  grid      (ave,min,max) = 2.44772 1.51388 2.45138
  surf      (ave,min,max) = 0 0 0
  total     (ave,min,max) = 9.88944 8.9556 9.8931
Step CPU Np Natt Ncoll c_temp 
       0            0 32768000        0        0    273.14528 
      30     6.643692 32768000 32872648 23139323    273.14528 
Loop time of 6.64488 on 512 procs for 30 steps with 32768000 particles

Particle moves    = 983040000 (983M)
Cells touched     = 4896676100 (4.9B)
Particle comms    = 177106458 (177M)
Boundary collides = 26591680 (26.6M)
Boundary exits    = 0 (0K)
SurfColl checks   = 0 (0K)
SurfColl occurs   = 0 (0K)
Surf reactions    = 0 (0K)
Collide attempts  = 951866750 (952M)
Collide occurs    = 691080291 (691M)
Reactions         = 0 (0K)
Particles stuck   = 0

Particle-moves/CPUsec/proc: 288944
Particle-moves/step: 3.2768e+07
Cell-touches/particle/step: 4.98116
Particle comm iterations/step: 3.26667
Particle fraction communicated: 0.180162
Particle fraction colliding with boundary: 0.0270505
Particle fraction exiting boundary: 0
Surface-checks/particle/step: 0
Surface-collisions/particle/step: 0
Surf-reactions/particle/step: 0
Collision-attempts/particle/step: 0.968289
Collisions/particle/step: 0.703003
Reactions/particle/step: 0

Move  time (%) = 3.8841 (58.4525)
Coll  time (%) = 1.83541 (27.6214)
Sort  time (%) = 0.295502 (4.44707)
Comm  time (%) = 0.623035 (9.37617)
Modfy time (%) = 0 (0)
Outpt time (%) = 0.00632421 (0.0951742)
Other time (%) = 0.000507945 (0.00764416)

Particles: 64000 ave 64815 max 63041 min
Histogram: 1 0 20 62 109 144 113 47 14 2
Cells:      6400 ave 6400 max 6400 min
Histogram: 512 0 0 0 0 0 0 0 0 0
GhostCell: 4140.97 ave 5120 max 1538 min
Histogram: 2 4 12 36 40 80 62 66 60 150
EmptyCell: 1691.73 ave 2808 max 441 min
Histogram: 12 45 23 97 55 76 92 32 60 20
timestep 	    7.00E-9
run 		    $t
run 		    100
Memory usage per proc in Mbytes:
  particles (ave,min,max) = 7.44173 7.44173 7.44173
  grid      (ave,min,max) = 2.44772 1.51388 2.45138
  surf      (ave,min,max) = 0 0 0
  total     (ave,min,max) = 9.88944 8.9556 9.8931
Step CPU Np Natt Ncoll c_temp 
      30            0 32768000 32872648 23139323    273.14528 
     100    4.2688608 32768000  3145985  2314365    273.14528 
     130    6.0606549 32768000  3185855  2313871    273.14528 
Loop time of 6.06192 on 512 procs for 100 steps with 32768000 particles

Particle moves    = 3276800000 (3.28B)
Cells touched     = 4577336989 (4.58B)
Particle comms    = 61669849 (61.7M)
Boundary collides = 8867225 (8.87M)
Boundary exits    = 0 (0K)
SurfColl checks   = 0 (0K)
SurfColl occurs   = 0 (0K)
Surf reactions    = 0 (0K)
Collide attempts  = 307003658 (307M)
Collide occurs    = 229954508 (230M)
Reactions         = 0 (0K)
Particles stuck   = 0

Particle-moves/CPUsec/proc: 1.05577e+06
Particle-moves/step: 3.2768e+07
Cell-touches/particle/step: 1.39689
Particle comm iterations/step: 1
Particle fraction communicated: 0.0188201
Particle fraction colliding with boundary: 0.00270606
Particle fraction exiting boundary: 0
Surface-checks/particle/step: 0
Surface-collisions/particle/step: 0
Surf-reactions/particle/step: 0
Collision-attempts/particle/step: 0.0936901
Collisions/particle/step: 0.0701765
Reactions/particle/step: 0

Move  time (%) = 3.83027 (63.1858)
Coll  time (%) = 0.835348 (13.7803)
Sort  time (%) = 0.96935 (15.9908)
Comm  time (%) = 0.412678 (6.8077)
Modfy time (%) = 0 (0)
Outpt time (%) = 0.0131919 (0.217619)
Other time (%) = 0.00107963 (0.01781)

Particles: 64000 ave 64683 max 63313 min
Histogram: 7 17 37 84 113 103 77 52 15 7
Cells:      6400 ave 6400 max 6400 min
Histogram: 512 0 0 0 0 0 0 0 0 0
GhostCell: 4140.97 ave 5120 max 1538 min
Histogram: 2 4 12 36 40 80 62 66 60 150
EmptyCell: 1691.73 ave 2808 max 441 min
Histogram: 12 45 23 97 55 76 92 32 60 20
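
Note: the derived per-step statistics in the run summaries above appear to follow directly from the raw counters (particle moves, loop time, processor count). The sketch below recomputes a few of them from the values printed in this log as a sanity check. The function names are hypothetical helpers written for this note, not part of SPARTA, and the formulas are inferred from the printed numbers rather than taken from the SPARTA source.

```python
# Minimal sketch: recompute derived statistics from the raw counters in the log.
# Helper names are hypothetical; formulas are inferred from the log output.

def moves_per_cpusec_per_proc(particle_moves, loop_time, nprocs):
    """Particle moves per CPU second per processor."""
    return particle_moves / loop_time / nprocs

def per_particle_per_step(count, particle_moves):
    """Normalize a raw counter by the total number of particle moves
    (particles * steps), giving a per-particle, per-step value."""
    return count / particle_moves

# Values copied from the second (benchmark) run: 100 steps on 512 procs.
moves = 3276800000
loop_time = 6.06192
nprocs = 512

rate = moves_per_cpusec_per_proc(moves, loop_time, nprocs)
frac_comm = per_particle_per_step(61669849, moves)    # Particle comms
coll_rate = per_particle_per_step(229954508, moves)   # Collide occurs

print(f"Particle-moves/CPUsec/proc: {rate:.6g}")
print(f"Particle fraction communicated: {frac_comm:.6g}")
print(f"Collisions/particle/step: {coll_rate:.6g}")
```

The same helpers applied to the first (equilibration) run, with 983040000 moves in 6.64488 secs, should reproduce its reported rate as well.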

