Strong scaling performance, Free benchmark, SandyBridge, 8M particles
Performance in millions of particle-timesteps / second / node

Nodes CPU (mpi) Kokkos/OMP (mpi,thread) Kokkos/serial (mpi)
1 151.2 (16) 132.5 (8,2) 132.9 (16)
2 187.6 (16) 107.5 (8,2) 174.2 (16)
4 227.7 (16) 191.8 (8,2) 212.8 (16)
8 252.6 (16) 198.5 (8,2) 231.1 (16)
16 251.7 (16) 196 (8,2) 266.4 (16)
32 272.1 (16) 252.5 (8,2) 232.5 (16)
64 257.7 (16) 165.2 (8,2) 262.9 (16)

Run commands and logfile links for column CPU

1 mpirun -n 16 -N 16 --bind-to core spa_chama_cpu -v x 128 -v y 80 -v z 80 -v t 100 -in in.free.steps -log log.sparta.date=23Dec17.model=free.machine=chama.pkg=cpu.kind=strong.size=8M.node=1.mpi=16
2 mpirun -n 32 -N 16 --bind-to core spa_chama_cpu -v x 128 -v y 80 -v z 80 -v t 100 -in in.free.steps -log log.sparta.date=23Dec17.model=free.machine=chama.pkg=cpu.kind=strong.size=8M.node=2.mpi=16
4 mpirun -n 64 -N 16 --bind-to core spa_chama_cpu -v x 128 -v y 80 -v z 80 -v t 100 -in in.free.steps -log log.sparta.date=23Dec17.model=free.machine=chama.pkg=cpu.kind=strong.size=8M.node=4.mpi=16
8 mpirun -n 128 -N 16 --bind-to core spa_chama_cpu -v x 128 -v y 80 -v z 80 -v t 100 -in in.free.steps -log log.sparta.date=23Dec17.model=free.machine=chama.pkg=cpu.kind=strong.size=8M.node=8.mpi=16
16 mpirun -n 256 -N 16 --bind-to core spa_chama_cpu -v x 128 -v y 80 -v z 80 -v t 100 -in in.free.steps -log log.sparta.date=23Dec17.model=free.machine=chama.pkg=cpu.kind=strong.size=8M.node=16.mpi=16
32 mpirun -n 512 -N 16 --bind-to core spa_chama_cpu -v x 128 -v y 80 -v z 80 -v t 100 -in in.free.steps -log log.sparta.date=23Dec17.model=free.machine=chama.pkg=cpu.kind=strong.size=8M.node=32.mpi=16
64 mpirun -n 1024 -N 16 --bind-to core spa_chama_cpu -v x 128 -v y 80 -v z 80 -v t 100 -in in.free.steps -log log.sparta.date=23Dec17.model=free.machine=chama.pkg=cpu.kind=strong.size=8M.node=64.mpi=16

Run commands and logfile links for column Kokkos/OMP

1 mpirun -n 8 -N 8 --bind-to socket spa_chama_kokkos_omp -sf kk -k on t 2 -pk kokkos reduction parallel/reduce comm classic -v x 128 -v y 80 -v z 80 -v t 100 -in in.free.steps -log log.sparta.date=23Dec17.model=free.machine=chama.pkg=kokkos_omp.kind=strong.size=8M.node=1.mpi=8.thread=2
2 mpirun -n 16 -N 8 --bind-to socket spa_chama_kokkos_omp -sf kk -k on t 2 -pk kokkos reduction parallel/reduce comm classic -v x 128 -v y 80 -v z 80 -v t 100 -in in.free.steps -log log.sparta.date=23Dec17.model=free.machine=chama.pkg=kokkos_omp.kind=strong.size=8M.node=2.mpi=8.thread=2
4 mpirun -n 32 -N 8 --bind-to socket spa_chama_kokkos_omp -sf kk -k on t 2 -pk kokkos reduction parallel/reduce comm classic -v x 128 -v y 80 -v z 80 -v t 100 -in in.free.steps -log log.sparta.date=23Dec17.model=free.machine=chama.pkg=kokkos_omp.kind=strong.size=8M.node=4.mpi=8.thread=2
8 mpirun -n 64 -N 8 --bind-to socket spa_chama_kokkos_omp -sf kk -k on t 2 -pk kokkos reduction parallel/reduce comm classic -v x 128 -v y 80 -v z 80 -v t 100 -in in.free.steps -log log.sparta.date=23Dec17.model=free.machine=chama.pkg=kokkos_omp.kind=strong.size=8M.node=8.mpi=8.thread=2
16 mpirun -n 128 -N 8 --bind-to socket spa_chama_kokkos_omp -sf kk -k on t 2 -pk kokkos reduction parallel/reduce comm classic -v x 128 -v y 80 -v z 80 -v t 100 -in in.free.steps -log log.sparta.date=23Dec17.model=free.machine=chama.pkg=kokkos_omp.kind=strong.size=8M.node=16.mpi=8.thread=2
32 mpirun -n 256 -N 8 --bind-to socket spa_chama_kokkos_omp -sf kk -k on t 2 -pk kokkos reduction parallel/reduce comm classic -v x 128 -v y 80 -v z 80 -v t 100 -in in.free.steps -log log.sparta.date=23Dec17.model=free.machine=chama.pkg=kokkos_omp.kind=strong.size=8M.node=32.mpi=8.thread=2
64 mpirun -n 512 -N 8 --bind-to socket spa_chama_kokkos_omp -sf kk -k on t 2 -pk kokkos reduction parallel/reduce comm classic -v x 128 -v y 80 -v z 80 -v t 100 -in in.free.steps -log log.sparta.date=23Dec17.model=free.machine=chama.pkg=kokkos_omp.kind=strong.size=8M.node=64.mpi=8.thread=2

Run commands and logfile links for column Kokkos/serial

1 mpirun -n 16 -N 16 --bind-to core spa_chama_kokkos_serial -sf kk -k on -pk kokkos reduction parallel/reduce comm classic -v x 128 -v y 80 -v z 80 -v t 100 -in in.free.steps -log log.sparta.date=23Dec17.model=free.machine=chama.pkg=kokkos_serial.kind=strong.size=8M.node=1.mpi=16
2 mpirun -n 32 -N 16 --bind-to core spa_chama_kokkos_serial -sf kk -k on -pk kokkos reduction parallel/reduce comm classic -v x 128 -v y 80 -v z 80 -v t 100 -in in.free.steps -log log.sparta.date=23Dec17.model=free.machine=chama.pkg=kokkos_serial.kind=strong.size=8M.node=2.mpi=16
4 mpirun -n 64 -N 16 --bind-to core spa_chama_kokkos_serial -sf kk -k on -pk kokkos reduction parallel/reduce comm classic -v x 128 -v y 80 -v z 80 -v t 100 -in in.free.steps -log log.sparta.date=23Dec17.model=free.machine=chama.pkg=kokkos_serial.kind=strong.size=8M.node=4.mpi=16
8 mpirun -n 128 -N 16 --bind-to core spa_chama_kokkos_serial -sf kk -k on -pk kokkos reduction parallel/reduce comm classic -v x 128 -v y 80 -v z 80 -v t 100 -in in.free.steps -log log.sparta.date=23Dec17.model=free.machine=chama.pkg=kokkos_serial.kind=strong.size=8M.node=8.mpi=16
16 mpirun -n 256 -N 16 --bind-to core spa_chama_kokkos_serial -sf kk -k on -pk kokkos reduction parallel/reduce comm classic -v x 128 -v y 80 -v z 80 -v t 100 -in in.free.steps -log log.sparta.date=23Dec17.model=free.machine=chama.pkg=kokkos_serial.kind=strong.size=8M.node=16.mpi=16
32 mpirun -n 512 -N 16 --bind-to core spa_chama_kokkos_serial -sf kk -k on -pk kokkos reduction parallel/reduce comm classic -v x 128 -v y 80 -v z 80 -v t 100 -in in.free.steps -log log.sparta.date=23Dec17.model=free.machine=chama.pkg=kokkos_serial.kind=strong.size=8M.node=32.mpi=16
64 mpirun -n 1024 -N 16 --bind-to core spa_chama_kokkos_serial -sf kk -k on -pk kokkos reduction parallel/reduce comm classic -v x 128 -v y 80 -v z 80 -v t 100 -in in.free.steps -log log.sparta.date=23Dec17.model=free.machine=chama.pkg=kokkos_serial.kind=strong.size=8M.node=64.mpi=16