Single node performance, Sphere benchmark, Haswell
Performance in millions of particle-timesteps / second

Nparticles   CPU (mpi,hyper)   Kokkos/OMP (mpi,hyper,thread)   Kokkos/serial (mpi,hyper)
32000        254.4 (32,1)      180.3 (32,2,2)                  221.9 (32,1)
64000        367.5 (32,1)      301.8 (32,2,2)                  333.1 (32,1)
128000       514.1 (32,1)      426.5 (32,2,2)                  462.9 (32,1)
256000       636.8 (32,1)      536.9 (32,2,2)                  563.1 (32,1)
512000       674.6 (32,1)      523.5 (32,2,2)                  546.5 (32,1)
1024000      318.6 (32,1)      288.6 (32,2,2)                  277.8 (64,2)
2048000      279.4 (64,2)      265.6 (32,2,2)                  275.6 (64,2)
4096000      252.8 (64,2)      263.7 (32,2,2)                  263.9 (64,2)
8192000      224 (64,2)        240.9 (32,2,2)                  229.7 (64,2)
16384000     176.5 (64,2)      189.9 (16,2,4)                  186.6 (64,2)
32768000     153.1 (64,2)      183.1 (16,2,4)                  173.1 (64,2)
65536000     127.1 (64,2)      140.4 (32,2,2)                  140.8 (64,2)
131072000    112.4 (64,2)      113.7 (16,2,4)                  122.7 (64,2)
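
One way to read the table is to divide each Kokkos result by the CPU result for the same problem size. The following is a minimal Python sketch of that arithmetic, using three rows copied from the table above; the names ROWS and relative_to_cpu are illustrative and not part of the benchmark.

```python
# Rows copied from the table above:
# (Nparticles, CPU, Kokkos/OMP, Kokkos/serial),
# each rate in millions of particle-timesteps per second.
ROWS = [
    (32000, 254.4, 180.3, 221.9),
    (4096000, 252.8, 263.7, 263.9),
    (131072000, 112.4, 113.7, 122.7),
]


def relative_to_cpu(row):
    """Return (Kokkos/OMP rate / CPU rate, Kokkos/serial rate / CPU rate)."""
    _, cpu, omp, serial = row
    return omp / cpu, serial / cpu


for row in ROWS:
    omp_ratio, serial_ratio = relative_to_cpu(row)
    print(f"{row[0]:>10d}  OMP/CPU = {omp_ratio:.2f}  serial/CPU = {serial_ratio:.2f}")
```

A ratio below 1 means the CPU build was faster for that size; a ratio above 1 means the Kokkos build was faster.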

Run commands and logfile links for column CPU

32000 srun -n 32 -C haswell --ntasks-per-node 32 --cpu_bind=rank -c 2 ./spa_mutrino_cpu -v x 16 -v y 10 -v z 20 -v t 100 -in in.sphere.steps -log log.sparta.date=23Dec17.model=sphere.machine=mutrino.pkg=cpu.kind=node.size=32K.node=1.mpi=32.hyper=1
64000 srun -n 32 -C haswell --ntasks-per-node 32 --cpu_bind=rank -c 2 ./spa_mutrino_cpu -v x 16 -v y 20 -v z 20 -v t 100 -in in.sphere.steps -log log.sparta.date=23Dec17.model=sphere.machine=mutrino.pkg=cpu.kind=node.size=64K.node=1.mpi=32.hyper=1
128000 srun -n 32 -C haswell --ntasks-per-node 32 --cpu_bind=rank -c 2 ./spa_mutrino_cpu -v x 32 -v y 20 -v z 20 -v t 100 -in in.sphere.steps -log log.sparta.date=23Dec17.model=sphere.machine=mutrino.pkg=cpu.kind=node.size=128K.node=1.mpi=32.hyper=1
256000 srun -n 32 -C haswell --ntasks-per-node 32 --cpu_bind=rank -c 2 ./spa_mutrino_cpu -v x 32 -v y 20 -v z 40 -v t 100 -in in.sphere.steps -log log.sparta.date=23Dec17.model=sphere.machine=mutrino.pkg=cpu.kind=node.size=256K.node=1.mpi=32.hyper=1
512000 srun -n 32 -C haswell --ntasks-per-node 32 --cpu_bind=rank -c 2 ./spa_mutrino_cpu -v x 32 -v y 40 -v z 40 -v t 100 -in in.sphere.steps -log log.sparta.date=23Dec17.model=sphere.machine=mutrino.pkg=cpu.kind=node.size=512K.node=1.mpi=32.hyper=1
1024000 srun -n 32 -C haswell --ntasks-per-node 32 --cpu_bind=rank -c 2 ./spa_mutrino_cpu -v x 64 -v y 40 -v z 40 -v t 100 -in in.sphere.steps -log log.sparta.date=23Dec17.model=sphere.machine=mutrino.pkg=cpu.kind=node.size=1M.node=1.mpi=32.hyper=1
2048000 srun -n 64 -C haswell --ntasks-per-node 64 --cpu_bind=rank -c 1 ./spa_mutrino_cpu -v x 64 -v y 40 -v z 80 -v t 100 -in in.sphere.steps -log log.sparta.date=23Dec17.model=sphere.machine=mutrino.pkg=cpu.kind=node.size=2M.node=1.mpi=64.hyper=2
4096000 srun -n 64 -C haswell --ntasks-per-node 64 --cpu_bind=rank -c 1 ./spa_mutrino_cpu -v x 64 -v y 80 -v z 80 -v t 100 -in in.sphere.steps -log log.sparta.date=23Dec17.model=sphere.machine=mutrino.pkg=cpu.kind=node.size=4M.node=1.mpi=64.hyper=2
8192000 srun -n 64 -C haswell --ntasks-per-node 64 --cpu_bind=rank -c 1 ./spa_mutrino_cpu -v x 128 -v y 80 -v z 80 -v t 100 -in in.sphere.steps -log log.sparta.date=23Dec17.model=sphere.machine=mutrino.pkg=cpu.kind=node.size=8M.node=1.mpi=64.hyper=2
16384000 srun -n 64 -C haswell --ntasks-per-node 64 --cpu_bind=rank -c 1 ./spa_mutrino_cpu -v x 128 -v y 80 -v z 160 -v t 100 -in in.sphere.steps -log log.sparta.date=23Dec17.model=sphere.machine=mutrino.pkg=cpu.kind=node.size=16M.node=1.mpi=64.hyper=2
32768000 srun -n 64 -C haswell --ntasks-per-node 64 --cpu_bind=rank -c 1 ./spa_mutrino_cpu -v x 128 -v y 160 -v z 160 -v t 100 -in in.sphere.steps -log log.sparta.date=23Dec17.model=sphere.machine=mutrino.pkg=cpu.kind=node.size=32M.node=1.mpi=64.hyper=2
65536000 srun -n 64 -C haswell --ntasks-per-node 64 --cpu_bind=rank -c 1 ./spa_mutrino_cpu -v x 256 -v y 160 -v z 160 -v t 100 -in in.sphere.steps -log log.sparta.date=23Dec17.model=sphere.machine=mutrino.pkg=cpu.kind=node.size=64M.node=1.mpi=64.hyper=2
131072000 srun -n 64 -C haswell --ntasks-per-node 64 --cpu_bind=rank -c 1 ./spa_mutrino_cpu -v x 256 -v y 160 -v z 320 -v t 100 -in in.sphere.steps -log log.sparta.date=23Dec17.model=sphere.machine=mutrino.pkg=cpu.kind=node.size=128M.node=1.mpi=64.hyper=2
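
In every command above, the product of the -v x, -v y and -v z values times 10 equals the particle count in the first column. The sketch below checks that relationship in Python. The factor of 10 particles per grid cell is inferred from the listed values, not stated on this page, and the function name is hypothetical.

```python
# Assumption: 10 particles per grid cell, inferred from the run commands
# (for example 16 * 10 * 20 cells for the 32000-particle run).
PARTICLES_PER_CELL = 10


def particles_from_grid(x, y, z, per_cell=PARTICLES_PER_CELL):
    """Return the particle count implied by an x * y * z grid."""
    return x * y * z * per_cell


# Grid sizes copied from the smallest and largest runs listed above.
print(particles_from_grid(16, 10, 20))
print(particles_from_grid(256, 160, 320))
```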

Run commands and logfile links for column Kokkos/OMP

32000 setenv OMP_NUM_THREADS 2; srun -n 32 -C haswell --ntasks-per-node 32 --cpu_bind=cores -c 2 ./spa_mutrino_kokkos_omp -sf kk -k on t 2 -pk kokkos reduction parallel/reduce comm classic -v x 16 -v y 10 -v z 20 -v t 100 -in in.sphere.steps -log log.sparta.date=23Dec17.model=sphere.machine=mutrino.pkg=kokkos_omp.kind=node.size=32K.node=1.mpi=32.thread=2.hyper=2
64000 setenv OMP_NUM_THREADS 2; srun -n 32 -C haswell --ntasks-per-node 32 --cpu_bind=cores -c 2 ./spa_mutrino_kokkos_omp -sf kk -k on t 2 -pk kokkos reduction parallel/reduce comm classic -v x 16 -v y 20 -v z 20 -v t 100 -in in.sphere.steps -log log.sparta.date=23Dec17.model=sphere.machine=mutrino.pkg=kokkos_omp.kind=node.size=64K.node=1.mpi=32.thread=2.hyper=2
128000 setenv OMP_NUM_THREADS 2; srun -n 32 -C haswell --ntasks-per-node 32 --cpu_bind=cores -c 2 ./spa_mutrino_kokkos_omp -sf kk -k on t 2 -pk kokkos reduction parallel/reduce comm classic -v x 32 -v y 20 -v z 20 -v t 100 -in in.sphere.steps -log log.sparta.date=23Dec17.model=sphere.machine=mutrino.pkg=kokkos_omp.kind=node.size=128K.node=1.mpi=32.thread=2.hyper=2
256000 setenv OMP_NUM_THREADS 2; srun -n 32 -C haswell --ntasks-per-node 32 --cpu_bind=cores -c 2 ./spa_mutrino_kokkos_omp -sf kk -k on t 2 -pk kokkos reduction parallel/reduce comm classic -v x 32 -v y 20 -v z 40 -v t 100 -in in.sphere.steps -log log.sparta.date=23Dec17.model=sphere.machine=mutrino.pkg=kokkos_omp.kind=node.size=256K.node=1.mpi=32.thread=2.hyper=2
512000 setenv OMP_NUM_THREADS 2; srun -n 32 -C haswell --ntasks-per-node 32 --cpu_bind=cores -c 2 ./spa_mutrino_kokkos_omp -sf kk -k on t 2 -pk kokkos reduction parallel/reduce comm classic -v x 32 -v y 40 -v z 40 -v t 100 -in in.sphere.steps -log log.sparta.date=23Dec17.model=sphere.machine=mutrino.pkg=kokkos_omp.kind=node.size=512K.node=1.mpi=32.thread=2.hyper=2
1024000 setenv OMP_NUM_THREADS 2; srun -n 32 -C haswell --ntasks-per-node 32 --cpu_bind=cores -c 2 ./spa_mutrino_kokkos_omp -sf kk -k on t 2 -pk kokkos reduction parallel/reduce comm classic -v x 64 -v y 40 -v z 40 -v t 100 -in in.sphere.steps -log log.sparta.date=23Dec17.model=sphere.machine=mutrino.pkg=kokkos_omp.kind=node.size=1M.node=1.mpi=32.thread=2.hyper=2
2048000 setenv OMP_NUM_THREADS 2; srun -n 32 -C haswell --ntasks-per-node 32 --cpu_bind=cores -c 2 ./spa_mutrino_kokkos_omp -sf kk -k on t 2 -pk kokkos reduction parallel/reduce comm classic -v x 64 -v y 40 -v z 80 -v t 100 -in in.sphere.steps -log log.sparta.date=23Dec17.model=sphere.machine=mutrino.pkg=kokkos_omp.kind=node.size=2M.node=1.mpi=32.thread=2.hyper=2
4096000 setenv OMP_NUM_THREADS 2; srun -n 32 -C haswell --ntasks-per-node 32 --cpu_bind=cores -c 2 ./spa_mutrino_kokkos_omp -sf kk -k on t 2 -pk kokkos reduction parallel/reduce comm classic -v x 64 -v y 80 -v z 80 -v t 100 -in in.sphere.steps -log log.sparta.date=23Dec17.model=sphere.machine=mutrino.pkg=kokkos_omp.kind=node.size=4M.node=1.mpi=32.thread=2.hyper=2
8192000 setenv OMP_NUM_THREADS 2; srun -n 32 -C haswell --ntasks-per-node 32 --cpu_bind=cores -c 2 ./spa_mutrino_kokkos_omp -sf kk -k on t 2 -pk kokkos reduction parallel/reduce comm classic -v x 128 -v y 80 -v z 80 -v t 100 -in in.sphere.steps -log log.sparta.date=23Dec17.model=sphere.machine=mutrino.pkg=kokkos_omp.kind=node.size=8M.node=1.mpi=32.thread=2.hyper=2
16384000 setenv OMP_NUM_THREADS 4; srun -n 16 -C haswell --ntasks-per-node 16 --cpu_bind=cores -c 4 ./spa_mutrino_kokkos_omp -sf kk -k on t 4 -pk kokkos reduction parallel/reduce comm classic -v x 128 -v y 80 -v z 160 -v t 100 -in in.sphere.steps -log log.sparta.date=23Dec17.model=sphere.machine=mutrino.pkg=kokkos_omp.kind=node.size=16M.node=1.mpi=16.thread=4.hyper=2
32768000 setenv OMP_NUM_THREADS 4; srun -n 16 -C haswell --ntasks-per-node 16 --cpu_bind=cores -c 4 ./spa_mutrino_kokkos_omp -sf kk -k on t 4 -pk kokkos reduction parallel/reduce comm classic -v x 128 -v y 160 -v z 160 -v t 100 -in in.sphere.steps -log log.sparta.date=23Dec17.model=sphere.machine=mutrino.pkg=kokkos_omp.kind=node.size=32M.node=1.mpi=16.thread=4.hyper=2
65536000 setenv OMP_NUM_THREADS 2; srun -n 32 -C haswell --ntasks-per-node 32 --cpu_bind=cores -c 2 ./spa_mutrino_kokkos_omp -sf kk -k on t 2 -pk kokkos reduction parallel/reduce comm classic -v x 256 -v y 160 -v z 160 -v t 100 -in in.sphere.steps -log log.sparta.date=23Dec17.model=sphere.machine=mutrino.pkg=kokkos_omp.kind=node.size=64M.node=1.mpi=32.thread=2.hyper=2
131072000 setenv OMP_NUM_THREADS 4; srun -n 16 -C haswell --ntasks-per-node 16 --cpu_bind=cores -c 4 ./spa_mutrino_kokkos_omp -sf kk -k on t 4 -pk kokkos reduction parallel/reduce comm classic -v x 256 -v y 160 -v z 320 -v t 100 -in in.sphere.steps -log log.sparta.date=23Dec17.model=sphere.machine=mutrino.pkg=kokkos_omp.kind=node.size=128M.node=1.mpi=16.thread=4.hyper=2
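
The srun options in these commands follow a regular pattern: the -c value equals 64 divided by the number of MPI tasks (32 tasks with -c 2, 16 tasks with -c 4, and 64 tasks with -c 1 in the other columns). The Python sketch below rebuilds the srun portion of a command from that pattern. The figure of 64 logical CPUs per node is an assumption inferred from the runs on this page, and srun_prefix is a hypothetical helper.

```python
# Assumption: 64 logical CPUs per node (32 cores x 2 hyperthreads),
# inferred from the -n and -c values used in the listed commands.
LOGICAL_CPUS_PER_NODE = 64


def srun_prefix(mpi_tasks, cpu_bind):
    """Build the srun portion of a single-node run command."""
    cpus_per_task = LOGICAL_CPUS_PER_NODE // mpi_tasks
    return (
        f"srun -n {mpi_tasks} -C haswell --ntasks-per-node {mpi_tasks} "
        f"--cpu_bind={cpu_bind} -c {cpus_per_task}"
    )


print(srun_prefix(32, "cores"))
print(srun_prefix(16, "cores"))
```

For the Kokkos/OMP runs the same -c value is also used for OMP_NUM_THREADS and for the t argument of -k on.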

Run commands and logfile links for column Kokkos/serial

32000 srun -n 32 -C haswell --ntasks-per-node 32 --cpu_bind=rank -c 2 ./spa_mutrino_kokkos_serial -sf kk -k on -pk kokkos reduction parallel/reduce comm classic -v x 16 -v y 10 -v z 20 -v t 100 -in in.sphere.steps -log log.sparta.date=23Dec17.model=sphere.machine=mutrino.pkg=kokkos_serial.kind=node.size=32K.node=1.mpi=32.hyper=1
64000 srun -n 32 -C haswell --ntasks-per-node 32 --cpu_bind=rank -c 2 ./spa_mutrino_kokkos_serial -sf kk -k on -pk kokkos reduction parallel/reduce comm classic -v x 16 -v y 20 -v z 20 -v t 100 -in in.sphere.steps -log log.sparta.date=23Dec17.model=sphere.machine=mutrino.pkg=kokkos_serial.kind=node.size=64K.node=1.mpi=32.hyper=1
128000 srun -n 32 -C haswell --ntasks-per-node 32 --cpu_bind=rank -c 2 ./spa_mutrino_kokkos_serial -sf kk -k on -pk kokkos reduction parallel/reduce comm classic -v x 32 -v y 20 -v z 20 -v t 100 -in in.sphere.steps -log log.sparta.date=23Dec17.model=sphere.machine=mutrino.pkg=kokkos_serial.kind=node.size=128K.node=1.mpi=32.hyper=1
256000 srun -n 32 -C haswell --ntasks-per-node 32 --cpu_bind=rank -c 2 ./spa_mutrino_kokkos_serial -sf kk -k on -pk kokkos reduction parallel/reduce comm classic -v x 32 -v y 20 -v z 40 -v t 100 -in in.sphere.steps -log log.sparta.date=23Dec17.model=sphere.machine=mutrino.pkg=kokkos_serial.kind=node.size=256K.node=1.mpi=32.hyper=1
512000 srun -n 32 -C haswell --ntasks-per-node 32 --cpu_bind=rank -c 2 ./spa_mutrino_kokkos_serial -sf kk -k on -pk kokkos reduction parallel/reduce comm classic -v x 32 -v y 40 -v z 40 -v t 100 -in in.sphere.steps -log log.sparta.date=23Dec17.model=sphere.machine=mutrino.pkg=kokkos_serial.kind=node.size=512K.node=1.mpi=32.hyper=1
1024000 srun -n 64 -C haswell --ntasks-per-node 64 --cpu_bind=rank -c 1 ./spa_mutrino_kokkos_serial -sf kk -k on -pk kokkos reduction parallel/reduce comm classic -v x 64 -v y 40 -v z 40 -v t 100 -in in.sphere.steps -log log.sparta.date=23Dec17.model=sphere.machine=mutrino.pkg=kokkos_serial.kind=node.size=1M.node=1.mpi=64.hyper=2
2048000 srun -n 64 -C haswell --ntasks-per-node 64 --cpu_bind=rank -c 1 ./spa_mutrino_kokkos_serial -sf kk -k on -pk kokkos reduction parallel/reduce comm classic -v x 64 -v y 40 -v z 80 -v t 100 -in in.sphere.steps -log log.sparta.date=23Dec17.model=sphere.machine=mutrino.pkg=kokkos_serial.kind=node.size=2M.node=1.mpi=64.hyper=2
4096000 srun -n 64 -C haswell --ntasks-per-node 64 --cpu_bind=rank -c 1 ./spa_mutrino_kokkos_serial -sf kk -k on -pk kokkos reduction parallel/reduce comm classic -v x 64 -v y 80 -v z 80 -v t 100 -in in.sphere.steps -log log.sparta.date=23Dec17.model=sphere.machine=mutrino.pkg=kokkos_serial.kind=node.size=4M.node=1.mpi=64.hyper=2
8192000 srun -n 64 -C haswell --ntasks-per-node 64 --cpu_bind=rank -c 1 ./spa_mutrino_kokkos_serial -sf kk -k on -pk kokkos reduction parallel/reduce comm classic -v x 128 -v y 80 -v z 80 -v t 100 -in in.sphere.steps -log log.sparta.date=23Dec17.model=sphere.machine=mutrino.pkg=kokkos_serial.kind=node.size=8M.node=1.mpi=64.hyper=2
16384000 srun -n 64 -C haswell --ntasks-per-node 64 --cpu_bind=rank -c 1 ./spa_mutrino_kokkos_serial -sf kk -k on -pk kokkos reduction parallel/reduce comm classic -v x 128 -v y 80 -v z 160 -v t 100 -in in.sphere.steps -log log.sparta.date=23Dec17.model=sphere.machine=mutrino.pkg=kokkos_serial.kind=node.size=16M.node=1.mpi=64.hyper=2
32768000 srun -n 64 -C haswell --ntasks-per-node 64 --cpu_bind=rank -c 1 ./spa_mutrino_kokkos_serial -sf kk -k on -pk kokkos reduction parallel/reduce comm classic -v x 128 -v y 160 -v z 160 -v t 100 -in in.sphere.steps -log log.sparta.date=23Dec17.model=sphere.machine=mutrino.pkg=kokkos_serial.kind=node.size=32M.node=1.mpi=64.hyper=2
65536000 srun -n 64 -C haswell --ntasks-per-node 64 --cpu_bind=rank -c 1 ./spa_mutrino_kokkos_serial -sf kk -k on -pk kokkos reduction parallel/reduce comm classic -v x 256 -v y 160 -v z 160 -v t 100 -in in.sphere.steps -log log.sparta.date=23Dec17.model=sphere.machine=mutrino.pkg=kokkos_serial.kind=node.size=64M.node=1.mpi=64.hyper=2
131072000 srun -n 64 -C haswell --ntasks-per-node 64 --cpu_bind=rank -c 1 ./spa_mutrino_kokkos_serial -sf kk -k on -pk kokkos reduction parallel/reduce comm classic -v x 256 -v y 160 -v z 320 -v t 100 -in in.sphere.steps -log log.sparta.date=23Dec17.model=sphere.machine=mutrino.pkg=kokkos_serial.kind=node.size=128M.node=1.mpi=64.hyper=2
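
The log file names passed to -log encode the run settings as key=value fields separated by dots. A minimal Python sketch for recovering those fields is shown below; parse_log_name is a hypothetical helper, and it assumes that no value contains a dot, which holds for every name listed on this page.

```python
def parse_log_name(name):
    """Split a log file name of the form log.sparta.key=value.key=value...

    Returns a dict mapping each key to its value as a string.
    """
    prefix = "log.sparta."
    if not name.startswith(prefix):
        raise ValueError(f"unexpected log file name: {name}")
    fields = {}
    for token in name[len(prefix):].split("."):
        key, _, value = token.partition("=")
        fields[key] = value
    return fields


# File name copied from the last Kokkos/serial command above.
example = (
    "log.sparta.date=23Dec17.model=sphere.machine=mutrino."
    "pkg=kokkos_serial.kind=node.size=128M.node=1.mpi=64.hyper=2"
)
info = parse_log_name(example)
print(info["pkg"], info["size"], info["mpi"], info["hyper"])
```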