SPARTA (23 Dec 2017)
KOKKOS mode is enabled (../kokkos.cpp:39)
  using 0 GPU(s) per MPI task
  using 2 thread(s) per MPI task
package kokkos
package kokkos reduction parallel/reduce comm classic
# advect particles via VSS collisional flow on a uniform grid
# particles reflect off global box boundaries

variable            x index 10
variable            y index 10
variable            z index 10

variable            lx equal $x*1.0e-5
variable            lx equal 128*1.0e-5
variable            ly equal $y*1.0e-5
variable            ly equal 160*1.0e-5
variable            lz equal $z*1.0e-5
variable            lz equal 160*1.0e-5

variable            n equal 10*$x*$y*$z
variable            n equal 10*128*$y*$z
variable            n equal 10*128*160*$z
variable            n equal 10*128*160*160

seed	    	    12345
dimension   	    3
global              gridcut 1.0e-5

boundary	    rr rr rr

create_box  	    0 ${lx} 0 ${ly} 0 ${lz}
create_box  	    0 0.00128 0 ${ly} 0 ${lz}
create_box  	    0 0.00128 0 0.0016 0 ${lz}
create_box  	    0 0.00128 0 0.0016 0 0.0016
Created orthogonal box = (0 0 0) to (0.00128 0.0016 0.0016)
create_grid 	    $x $y $z
create_grid 	    128 $y $z
create_grid 	    128 160 $z
create_grid 	    128 160 160
WARNING: Could not acquire nearby ghost cells b/c grid partition is not clumped (../grid.cpp:381)
Created 3276800 child grid cells
  parent cells = 1
  CPU time = 0.0319569 secs
  create/ghost percent = 64.6806 35.3194

balance_grid        rcb part
Balance grid migrated 3174400 cells
  CPU time = 0.29986 secs
  reassign/sort/migrate/ghost percent = 28.4433 1.22994 34.8112 35.5156

species		    ar.species Ar
mixture		    air Ar vstream 0.0 0.0 0.0 temp 273.15

global              nrho 7.07043E22
global              fnum 7.07043E6

collide		    vss air ar.vss

create_particles    air n $n
create_particles    air n 32768000
Created 32768000 particles
  CPU time = 0.365453 secs

stats		    100
compute             temp temp
stats_style	    step cpu np nattempt ncoll c_temp

# first equilibrate with large timestep to unsort particles
# then benchmark with normal timestep

timestep 	    7.00E-8
run                 30
Memory usage per proc in Mbytes:
  particles (ave,min,max) = 144.727 144.727 144.727
  grid      (ave,min,max) = 20.8733 20.6389 21.5764
  surf      (ave,min,max) = 0 0 0
  total     (ave,min,max) = 165.6 165.366 166.303
Step CPU Np Natt Ncoll c_temp 
       0            0 32768000        0        0    273.19912 
      30    19.057788 32768000 32879631 23143567    273.19912 
Loop time of 19.0578 on 32 procs for 30 steps with 32768000 particles

Particle moves    = 983040000 (983M)
Cells touched     = 4888460555 (4.89B)
Particle comms    = 58415298 (58.4M)
Boundary collides = 26600395 (26.6M)
Boundary exits    = 0 (0K)
SurfColl checks   = 0 (0K)
SurfColl occurs   = 0 (0K)
Surf reactions    = 0 (0K)
Collide attempts  = 951857391 (952M)
Collide occurs    = 691076486 (691M)
Reactions         = 0 (0K)
Particles stuck   = 0

Particle-moves/CPUsec/proc: 1.61194e+06
Particle-moves/step: 3.2768e+07
Cell-touches/particle/step: 4.9728
Particle comm iterations/step: 3
Particle fraction communicated: 0.0594231
Particle fraction colliding with boundary: 0.0270593
Particle fraction exiting boundary: 0
Surface-checks/particle/step: 0
Surface-collisions/particle/step: 0
Surf-reactions/particle/step: 0
Collision-attempts/particle/step: 0.968279
Collisions/particle/step: 0.702999
Reactions/particle/step: 0

Move  time (%) = 11.3186 (59.3909)
Coll  time (%) = 4.52611 (23.7494)
Sort  time (%) = 2.17403 (11.4075)
Comm  time (%) = 0.961559 (5.04548)
Modfy time (%) = 0 (0)
Outpt time (%) = 0.0774286 (0.406283)
Other time (%) = 8.20458e-05 (0.00043051)

Particles: 1.024e+06 ave 1.02573e+06 max 1.022e+06 min
Histogram: 2 1 4 2 3 5 8 3 2 2
Cells:      102400 ave 102400 max 102400 min
Histogram: 32 0 0 0 0 0 0 0 0 0
GhostCell: 19634 ave 25376 max 14024 min
Histogram: 8 0 0 0 16 0 0 0 0 8
EmptyCell: 6004.94 ave 7591 max 1936 min
Histogram: 2 0 0 0 8 4 0 0 0 18
timestep 	    7.00E-9
run 		    $t
run 		    100
Memory usage per proc in Mbytes:
  particles (ave,min,max) = 144.727 144.727 144.727
  grid      (ave,min,max) = 20.8733 20.6389 21.5764
  surf      (ave,min,max) = 0 0 0
  total     (ave,min,max) = 165.6 165.366 166.303
Step CPU Np Natt Ncoll c_temp 
      30            0 32768000 32879631 23143567    273.19912 
     100    17.107723 32768000  3142795  2313377    273.19912 
     130    24.368059 32768000  3187894  2316098    273.19912 
Loop time of 24.3681 on 32 procs for 100 steps with 32768000 particles

Particle moves    = 3276800000 (3.28B)
Cells touched     = 4577480596 (4.58B)
Particle comms    = 19743178 (19.7M)
Boundary collides = 8869462 (8.87M)
Boundary exits    = 0 (0K)
SurfColl checks   = 0 (0K)
SurfColl occurs   = 0 (0K)
Surf reactions    = 0 (0K)
Collide attempts  = 307016095 (307M)
Collide occurs    = 229963969 (230M)
Reactions         = 0 (0K)
Particles stuck   = 0

Particle-moves/CPUsec/proc: 4.20222e+06
Particle-moves/step: 3.2768e+07
Cell-touches/particle/step: 1.39694
Particle comm iterations/step: 1
Particle fraction communicated: 0.00602514
Particle fraction colliding with boundary: 0.00270674
Particle fraction exiting boundary: 0
Surface-checks/particle/step: 0
Surface-collisions/particle/step: 0
Surf-reactions/particle/step: 0
Collision-attempts/particle/step: 0.0936939
Collisions/particle/step: 0.0701794
Reactions/particle/step: 0

Move  time (%) = 14.3052 (58.7047)
Coll  time (%) = 2.67234 (10.9665)
Sort  time (%) = 7.04442 (28.9084)
Comm  time (%) = 0.248135 (1.01828)
Modfy time (%) = 0 (0)
Outpt time (%) = 0.0976015 (0.40053)
Other time (%) = 0.000378869 (0.00155478)

Particles: 1.024e+06 ave 1.02608e+06 max 1.02149e+06 min
Histogram: 2 3 1 2 6 4 3 6 3 2
Cells:      102400 ave 102400 max 102400 min
Histogram: 32 0 0 0 0 0 0 0 0 0
GhostCell: 19634 ave 25376 max 14024 min
Histogram: 8 0 0 0 16 0 0 0 0 8
EmptyCell: 6004.94 ave 7591 max 1936 min
Histogram: 2 0 0 0 8 4 0 0 0 18

