Strong scaling performance, Sphere benchmark, SandyBridge, 8M particles
Performance in millions of particle-timesteps / second / node

Nodes   CPU (mpi)     Kokkos/OMP (mpi,thread)   Kokkos/serial (mpi)
1       114.6 (16)    89.67 (8,2)               106.9 (16)
2       132.8 (16)    78.14 (8,2)               127.7 (16)
4       133.1 (16)    113.3 (8,2)               136.1 (16)
8       148 (16)      100.9 (8,2)               133.3 (16)
16      142.1 (16)    109.3 (8,2)               128.4 (16)
32      142.3 (16)    78.73 (8,2)               133.8 (16)
64      118.8 (16)    78.24 (8,2)               110 (16)
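
Because the table reports a per-node rate, two derived quantities can help when reading it: the aggregate rate (per-node rate times node count) and the strong-scaling efficiency relative to the 1-node run. The following is a minimal sketch of that arithmetic, not part of the benchmark itself. The helper names `aggregate_rate` and `efficiency` are hypothetical; the numbers are copied from the table above.

```python
# Per-node rates copied from the table above
# (millions of particle-timesteps / second / node).
NODES = [1, 2, 4, 8, 16, 32, 64]
RATES = {
    "CPU": [114.6, 132.8, 133.1, 148.0, 142.1, 142.3, 118.8],
    "Kokkos/OMP": [89.67, 78.14, 113.3, 100.9, 109.3, 78.73, 78.24],
    "Kokkos/serial": [106.9, 127.7, 136.1, 133.3, 128.4, 133.8, 110.0],
}


def aggregate_rate(per_node_rate, nodes):
    """Total rate across all nodes (hypothetical helper)."""
    return per_node_rate * nodes


def efficiency(rates):
    """Per-node rate at each node count divided by the 1-node rate.

    For a strong-scaling run reported per node, a value of 1.0 means
    the per-node rate matches the 1-node run (hypothetical helper).
    """
    base = rates[0]
    return [r / base for r in rates]


if __name__ == "__main__":
    for column, rates in RATES.items():
        for n, r, e in zip(NODES, rates, efficiency(rates)):
            total = aggregate_rate(r, n)
            print(f"{column:14s} nodes={n:3d} total={total:9.1f} efficiency={e:5.2f}")
```

An efficiency above 1.0 means the per-node rate at that node count is higher than on a single node; below 1.0 means it is lower.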

Run commands and logfile links for column CPU

1 mpirun -n 16 -N 16 --bind-to core spa_chama_cpu -v x 128 -v y 80 -v z 80 -v t 100 -in in.sphere.steps -log log.sparta.date=23Dec17.model=sphere.machine=chama.pkg=cpu.kind=strong.size=8M.node=1.mpi=16
2 mpirun -n 32 -N 16 --bind-to core spa_chama_cpu -v x 128 -v y 80 -v z 80 -v t 100 -in in.sphere.steps -log log.sparta.date=23Dec17.model=sphere.machine=chama.pkg=cpu.kind=strong.size=8M.node=2.mpi=16
4 mpirun -n 64 -N 16 --bind-to core spa_chama_cpu -v x 128 -v y 80 -v z 80 -v t 100 -in in.sphere.steps -log log.sparta.date=23Dec17.model=sphere.machine=chama.pkg=cpu.kind=strong.size=8M.node=4.mpi=16
8 mpirun -n 128 -N 16 --bind-to core spa_chama_cpu -v x 128 -v y 80 -v z 80 -v t 100 -in in.sphere.steps -log log.sparta.date=23Dec17.model=sphere.machine=chama.pkg=cpu.kind=strong.size=8M.node=8.mpi=16
16 mpirun -n 256 -N 16 --bind-to core spa_chama_cpu -v x 128 -v y 80 -v z 80 -v t 100 -in in.sphere.steps -log log.sparta.date=23Dec17.model=sphere.machine=chama.pkg=cpu.kind=strong.size=8M.node=16.mpi=16
32 mpirun -n 512 -N 16 --bind-to core spa_chama_cpu -v x 128 -v y 80 -v z 80 -v t 100 -in in.sphere.steps -log log.sparta.date=23Dec17.model=sphere.machine=chama.pkg=cpu.kind=strong.size=8M.node=32.mpi=16
64 mpirun -n 1024 -N 16 --bind-to core spa_chama_cpu -v x 128 -v y 80 -v z 80 -v t 100 -in in.sphere.steps -log log.sparta.date=23Dec17.model=sphere.machine=chama.pkg=cpu.kind=strong.size=8M.node=64.mpi=16

Run commands and logfile links for column Kokkos/OMP

1 mpirun -n 8 -N 8 --bind-to socket spa_chama_kokkos_omp -sf kk -k on t 2 -pk kokkos reduction parallel/reduce comm classic -v x 128 -v y 80 -v z 80 -v t 100 -in in.sphere.steps -log log.sparta.date=23Dec17.model=sphere.machine=chama.pkg=kokkos_omp.kind=strong.size=8M.node=1.mpi=8.thread=2
2 mpirun -n 16 -N 8 --bind-to socket spa_chama_kokkos_omp -sf kk -k on t 2 -pk kokkos reduction parallel/reduce comm classic -v x 128 -v y 80 -v z 80 -v t 100 -in in.sphere.steps -log log.sparta.date=23Dec17.model=sphere.machine=chama.pkg=kokkos_omp.kind=strong.size=8M.node=2.mpi=8.thread=2
4 mpirun -n 32 -N 8 --bind-to socket spa_chama_kokkos_omp -sf kk -k on t 2 -pk kokkos reduction parallel/reduce comm classic -v x 128 -v y 80 -v z 80 -v t 100 -in in.sphere.steps -log log.sparta.date=23Dec17.model=sphere.machine=chama.pkg=kokkos_omp.kind=strong.size=8M.node=4.mpi=8.thread=2
8 mpirun -n 64 -N 8 --bind-to socket spa_chama_kokkos_omp -sf kk -k on t 2 -pk kokkos reduction parallel/reduce comm classic -v x 128 -v y 80 -v z 80 -v t 100 -in in.sphere.steps -log log.sparta.date=23Dec17.model=sphere.machine=chama.pkg=kokkos_omp.kind=strong.size=8M.node=8.mpi=8.thread=2
16 mpirun -n 128 -N 8 --bind-to socket spa_chama_kokkos_omp -sf kk -k on t 2 -pk kokkos reduction parallel/reduce comm classic -v x 128 -v y 80 -v z 80 -v t 100 -in in.sphere.steps -log log.sparta.date=23Dec17.model=sphere.machine=chama.pkg=kokkos_omp.kind=strong.size=8M.node=16.mpi=8.thread=2
32 mpirun -n 256 -N 8 --bind-to socket spa_chama_kokkos_omp -sf kk -k on t 2 -pk kokkos reduction parallel/reduce comm classic -v x 128 -v y 80 -v z 80 -v t 100 -in in.sphere.steps -log log.sparta.date=23Dec17.model=sphere.machine=chama.pkg=kokkos_omp.kind=strong.size=8M.node=32.mpi=8.thread=2
64 mpirun -n 512 -N 8 --bind-to socket spa_chama_kokkos_omp -sf kk -k on t 2 -pk kokkos reduction parallel/reduce comm classic -v x 128 -v y 80 -v z 80 -v t 100 -in in.sphere.steps -log log.sparta.date=23Dec17.model=sphere.machine=chama.pkg=kokkos_omp.kind=strong.size=8M.node=64.mpi=8.thread=2

Run commands and logfile links for column Kokkos/serial

1 mpirun -n 16 -N 16 --bind-to core spa_chama_kokkos_serial -sf kk -k on -pk kokkos reduction parallel/reduce comm classic -v x 128 -v y 80 -v z 80 -v t 100 -in in.sphere.steps -log log.sparta.date=23Dec17.model=sphere.machine=chama.pkg=kokkos_serial.kind=strong.size=8M.node=1.mpi=16
2 mpirun -n 32 -N 16 --bind-to core spa_chama_kokkos_serial -sf kk -k on -pk kokkos reduction parallel/reduce comm classic -v x 128 -v y 80 -v z 80 -v t 100 -in in.sphere.steps -log log.sparta.date=23Dec17.model=sphere.machine=chama.pkg=kokkos_serial.kind=strong.size=8M.node=2.mpi=16
4 mpirun -n 64 -N 16 --bind-to core spa_chama_kokkos_serial -sf kk -k on -pk kokkos reduction parallel/reduce comm classic -v x 128 -v y 80 -v z 80 -v t 100 -in in.sphere.steps -log log.sparta.date=23Dec17.model=sphere.machine=chama.pkg=kokkos_serial.kind=strong.size=8M.node=4.mpi=16
8 mpirun -n 128 -N 16 --bind-to core spa_chama_kokkos_serial -sf kk -k on -pk kokkos reduction parallel/reduce comm classic -v x 128 -v y 80 -v z 80 -v t 100 -in in.sphere.steps -log log.sparta.date=23Dec17.model=sphere.machine=chama.pkg=kokkos_serial.kind=strong.size=8M.node=8.mpi=16
16 mpirun -n 256 -N 16 --bind-to core spa_chama_kokkos_serial -sf kk -k on -pk kokkos reduction parallel/reduce comm classic -v x 128 -v y 80 -v z 80 -v t 100 -in in.sphere.steps -log log.sparta.date=23Dec17.model=sphere.machine=chama.pkg=kokkos_serial.kind=strong.size=8M.node=16.mpi=16
32 mpirun -n 512 -N 16 --bind-to core spa_chama_kokkos_serial -sf kk -k on -pk kokkos reduction parallel/reduce comm classic -v x 128 -v y 80 -v z 80 -v t 100 -in in.sphere.steps -log log.sparta.date=23Dec17.model=sphere.machine=chama.pkg=kokkos_serial.kind=strong.size=8M.node=32.mpi=16
64 mpirun -n 1024 -N 16 --bind-to core spa_chama_kokkos_serial -sf kk -k on -pk kokkos reduction parallel/reduce comm classic -v x 128 -v y 80 -v z 80 -v t 100 -in in.sphere.steps -log log.sparta.date=23Dec17.model=sphere.machine=chama.pkg=kokkos_serial.kind=strong.size=8M.node=64.mpi=16
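
The run commands listed above follow a regular pattern: the total MPI task count passed to `-n` is the node count times the tasks per node passed to `-N`, and the log file name encodes the package, node count, tasks per node and (for Kokkos/OMP) threads per task. The sketch below rebuilds the command strings from that pattern. It is only an illustration: the `CONFIGS` table and the `run_command` helper are hypothetical names, and every flag is copied from the listings rather than taken from any other documentation.

```python
# Settings read off the three command listings above.
# "kokkos" marks the builds that take the -sf kk / -k on / -pk kokkos switches.
CONFIGS = {
    "cpu": {"ppn": 16, "threads": None, "bind": "core", "kokkos": False},
    "kokkos_omp": {"ppn": 8, "threads": 2, "bind": "socket", "kokkos": True},
    "kokkos_serial": {"ppn": 16, "threads": None, "bind": "core", "kokkos": True},
}


def run_command(nodes, pkg):
    """Rebuild one listed mpirun command (hypothetical helper)."""
    cfg = CONFIGS[pkg]
    ranks = nodes * cfg["ppn"]  # total MPI tasks = nodes x tasks per node

    # Log file name, with the same key=value fields used in the listings.
    log = (
        "log.sparta.date=23Dec17.model=sphere.machine=chama"
        f".pkg={pkg}.kind=strong.size=8M.node={nodes}.mpi={cfg['ppn']}"
    )
    if cfg["threads"] is not None:
        log += f".thread={cfg['threads']}"

    parts = [
        "mpirun", "-n", str(ranks), "-N", str(cfg["ppn"]),
        "--bind-to", cfg["bind"], f"spa_chama_{pkg}",
    ]
    if cfg["kokkos"]:
        k_on = "-k on"
        if cfg["threads"] is not None:
            k_on += f" t {cfg['threads']}"
        parts += ["-sf kk", k_on,
                  "-pk kokkos reduction parallel/reduce comm classic"]
    parts += ["-v x 128 -v y 80 -v z 80 -v t 100",
              "-in in.sphere.steps", "-log", log]
    return " ".join(parts)


if __name__ == "__main__":
    for pkg in CONFIGS:
        for nodes in (1, 2, 4, 8, 16, 32, 64):
            print(nodes, run_command(nodes, pkg))
```

Printing the commands this way makes it easy to check that each column keeps 16 cores busy per node: 16 tasks for CPU and Kokkos/serial, and 8 tasks times 2 threads for Kokkos/OMP.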