Strong scaling performance, Sphere benchmark, 64M particles
Performance in millions of particle-timesteps / second / node

Nodes SandyBridge Haswell Broadwell KNL K80-1 P100-1
1 61.36 (CPU,mpi=16) 140.3 (Kokkos/OMP,mpi=32,hyper=2,thread=2) 148.9 (Kokkos/serial,mpi=72,hyper=2) 211.6 (Kokkos/serial/KNL,mpi=256,hyper=4) 159.4 (Kokkos/Cuda,mpi=2) 266 (Kokkos/Cuda,mpi=1)
2 67.24 (CPU,mpi=16) 147.2 (Kokkos/serial,mpi=64,hyper=2) 158.4 (Kokkos/serial,mpi=72,hyper=2) 217 (Kokkos/KNL,mpi=128,thread=2,hyper=4) 161 (Kokkos/Cuda,mpi=2) 265 (Kokkos/Cuda,mpi=1)
4 73.75 (CPU,mpi=16) 165.6 (Kokkos/serial,mpi=64,hyper=2) 182 (Kokkos/serial,mpi=72,hyper=2) 224.9 (Kokkos/KNL,mpi=64,thread=4,hyper=4) 128.2 (Kokkos/Cuda,mpi=2) None
8 86.57 (CPU,mpi=16) 182.1 (Kokkos/OMP,mpi=32,hyper=2,thread=2) 201.4 (Kokkos/serial,mpi=72,hyper=2) 225.2 (Kokkos/KNL,mpi=32,thread=8,hyper=4) None None
16 100.2 (CPU,mpi=16) 195.9 (CPU,mpi=64,hyper=2) 217.9 (CPU,mpi=72,hyper=2) 217.6 (Kokkos/KNL,mpi=32,thread=8,hyper=4) None None
32 101 (Kokkos/serial,mpi=16) 202.8 (CPU,mpi=64,hyper=2) 205.3 (CPU,mpi=72,hyper=2) 193.7 (Kokkos/KNL,mpi=64,thread=4,hyper=4) None None
64 98.39 (CPU,mpi=16) 199.7 (CPU,mpi=64,hyper=2) 222.6 (CPU,mpi=72,hyper=2) 143.6 (Kokkos/KNL,mpi=64,thread=4,hyper=4) None None

Run commands and logfile links for column SandyBridge

1 mpirun -n 16 -N 16 --bind-to core spa_chama_cpu -v x 256 -v y 160 -v z 160 -v t 100 -in in.sphere.steps -log log.sparta.date=23Dec17.model=sphere.machine=chama.pkg=cpu.kind=strong.size=64M.node=1.mpi=16
2 mpirun -n 32 -N 16 --bind-to core spa_chama_cpu -v x 256 -v y 160 -v z 160 -v t 100 -in in.sphere.steps -log log.sparta.date=23Dec17.model=sphere.machine=chama.pkg=cpu.kind=strong.size=64M.node=2.mpi=16
4 mpirun -n 64 -N 16 --bind-to core spa_chama_cpu -v x 256 -v y 160 -v z 160 -v t 100 -in in.sphere.steps -log log.sparta.date=23Dec17.model=sphere.machine=chama.pkg=cpu.kind=strong.size=64M.node=4.mpi=16
8 mpirun -n 128 -N 16 --bind-to core spa_chama_cpu -v x 256 -v y 160 -v z 160 -v t 100 -in in.sphere.steps -log log.sparta.date=23Dec17.model=sphere.machine=chama.pkg=cpu.kind=strong.size=64M.node=8.mpi=16
16 mpirun -n 256 -N 16 --bind-to core spa_chama_cpu -v x 256 -v y 160 -v z 160 -v t 100 -in in.sphere.steps -log log.sparta.date=23Dec17.model=sphere.machine=chama.pkg=cpu.kind=strong.size=64M.node=16.mpi=16
32 mpirun -n 512 -N 16 --bind-to core spa_chama_kokkos_serial -sf kk -k on -pk kokkos reduction parallel/reduce comm classic -v x 256 -v y 160 -v z 160 -v t 100 -in in.sphere.steps -log log.sparta.date=23Dec17.model=sphere.machine=chama.pkg=kokkos_serial.kind=strong.size=64M.node=32.mpi=16
64 mpirun -n 1024 -N 16 --bind-to core spa_chama_cpu -v x 256 -v y 160 -v z 160 -v t 100 -in in.sphere.steps -log log.sparta.date=23Dec17.model=sphere.machine=chama.pkg=cpu.kind=strong.size=64M.node=64.mpi=16

Run commands and logfile links for column Haswell

1 setenv OMP_NUM_THREADS 2; srun -n 32 -C haswell --ntasks-per-node 32 --cpu_bind=cores -c 2 ./spa_mutrino_kokkos_omp -sf kk -k on t 2 -pk kokkos reduction parallel/reduce comm classic -v x 256 -v y 160 -v z 160 -v t 100 -in in.sphere.steps -log log.sparta.date=23Dec17.model=sphere.machine=mutrino.pkg=kokkos_omp.kind=strong.size=64M.node=1.mpi=32.thread=2.hyper=2
2 srun -n 128 -C haswell --ntasks-per-node 64 --cpu_bind=rank -c 1 ./spa_mutrino_kokkos_serial -sf kk -k on -pk kokkos reduction parallel/reduce comm classic -v x 256 -v y 160 -v z 160 -v t 100 -in in.sphere.steps -log log.sparta.date=23Dec17.model=sphere.machine=mutrino.pkg=kokkos_serial.kind=strong.size=64M.node=2.mpi=64.hyper=2
4 srun -n 256 -C haswell --ntasks-per-node 64 --cpu_bind=rank -c 1 ./spa_mutrino_kokkos_serial -sf kk -k on -pk kokkos reduction parallel/reduce comm classic -v x 256 -v y 160 -v z 160 -v t 100 -in in.sphere.steps -log log.sparta.date=23Dec17.model=sphere.machine=mutrino.pkg=kokkos_serial.kind=strong.size=64M.node=4.mpi=64.hyper=2
8 setenv OMP_NUM_THREADS 2; srun -n 256 -C haswell --ntasks-per-node 32 --cpu_bind=cores -c 2 ./spa_mutrino_kokkos_omp -sf kk -k on t 2 -pk kokkos reduction parallel/reduce comm classic -v x 256 -v y 160 -v z 160 -v t 100 -in in.sphere.steps -log log.sparta.date=23Dec17.model=sphere.machine=mutrino.pkg=kokkos_omp.kind=strong.size=64M.node=8.mpi=32.thread=2.hyper=2
16 srun -n 1024 -C haswell --ntasks-per-node 64 --cpu_bind=rank -c 1 ./spa_mutrino_cpu -v x 256 -v y 160 -v z 160 -v t 100 -in in.sphere.steps -log log.sparta.date=23Dec17.model=sphere.machine=mutrino.pkg=cpu.kind=strong.size=64M.node=16.mpi=64.hyper=2
32 srun -n 2048 -C haswell --ntasks-per-node 64 --cpu_bind=rank -c 1 ./spa_mutrino_cpu -v x 256 -v y 160 -v z 160 -v t 100 -in in.sphere.steps -log log.sparta.date=23Dec17.model=sphere.machine=mutrino.pkg=cpu.kind=strong.size=64M.node=32.mpi=64.hyper=2
64 srun -n 4096 -C haswell --ntasks-per-node 64 --cpu_bind=rank -c 1 ./spa_mutrino_cpu -v x 256 -v y 160 -v z 160 -v t 100 -in in.sphere.steps -log log.sparta.date=23Dec17.model=sphere.machine=mutrino.pkg=cpu.kind=strong.size=64M.node=64.mpi=64.hyper=2

Run commands and logfile links for column Broadwell

1 mpiexec -np 72 -npernode 72 --oversubscribe --bind-to core ./spa_serrano_kokkos_serial -sf kk -k on -pk kokkos reduction parallel/reduce comm classic -v x 256 -v y 160 -v z 160 -v t 100 -in in.sphere.steps -log log.sparta.date=23Dec17.model=sphere.machine=serrano.pkg=kokkos_serial.kind=strong.size=64M.node=1.mpi=72.hyper=2
2 mpiexec -np 144 -npernode 72 --oversubscribe --bind-to core ./spa_serrano_kokkos_serial -sf kk -k on -pk kokkos reduction parallel/reduce comm classic -v x 256 -v y 160 -v z 160 -v t 100 -in in.sphere.steps -log log.sparta.date=23Dec17.model=sphere.machine=serrano.pkg=kokkos_serial.kind=strong.size=64M.node=2.mpi=72.hyper=2
4 mpiexec -np 288 -npernode 72 --oversubscribe --bind-to core ./spa_serrano_kokkos_serial -sf kk -k on -pk kokkos reduction parallel/reduce comm classic -v x 256 -v y 160 -v z 160 -v t 100 -in in.sphere.steps -log log.sparta.date=23Dec17.model=sphere.machine=serrano.pkg=kokkos_serial.kind=strong.size=64M.node=4.mpi=72.hyper=2
8 mpiexec -np 576 -npernode 72 --oversubscribe --bind-to core ./spa_serrano_kokkos_serial -sf kk -k on -pk kokkos reduction parallel/reduce comm classic -v x 256 -v y 160 -v z 160 -v t 100 -in in.sphere.steps -log log.sparta.date=23Dec17.model=sphere.machine=serrano.pkg=kokkos_serial.kind=strong.size=64M.node=8.mpi=72.hyper=2
16 mpiexec -np 1152 -npernode 72 --oversubscribe --bind-to core ./spa_serrano_cpu -v x 256 -v y 160 -v z 160 -v t 100 -in in.sphere.steps -log log.sparta.date=23Dec17.model=sphere.machine=serrano.pkg=cpu.kind=strong.size=64M.node=16.mpi=72.hyper=2
32 mpiexec -np 2304 -npernode 72 --oversubscribe --bind-to core ./spa_serrano_cpu -v x 256 -v y 160 -v z 160 -v t 100 -in in.sphere.steps -log log.sparta.date=23Dec17.model=sphere.machine=serrano.pkg=cpu.kind=strong.size=64M.node=32.mpi=72.hyper=2
64 mpiexec -np 4608 -npernode 72 --oversubscribe --bind-to core ./spa_serrano_cpu -v x 256 -v y 160 -v z 160 -v t 100 -in in.sphere.steps -log log.sparta.date=23Dec17.model=sphere.machine=serrano.pkg=cpu.kind=strong.size=64M.node=64.mpi=72.hyper=2

Run commands and logfile links for column KNL

1 srun -n 256 -C knl --ntasks-per-node 256 --cpu_bind=threads -c 1 ./spa_mutrino_kokkos_serial_knl -sf kk -k on -pk kokkos reduction parallel/reduce comm classic -v x 256 -v y 160 -v z 160 -v t 100 -in in.sphere.steps -log log.sparta.date=23Dec17.model=sphere.machine=mutrino.pkg=kokkos_serial_knl.kind=strong.size=64M.node=1.mpi=256.hyper=4
2 setenv OMP_NUM_THREADS 2; srun -n 256 -C knl --ntasks-per-node 128 --cpu_bind=threads -c 2 ./spa_mutrino_kokkos_knl -sf kk -k on t 2 -pk kokkos reduction parallel/reduce comm classic -v x 256 -v y 160 -v z 160 -v t 100 -in in.sphere.steps -log log.sparta.date=23Dec17.model=sphere.machine=mutrino.pkg=kokkos_knl.kind=strong.size=64M.node=2.mpi=128.thread=2.hyper=4
4 setenv OMP_NUM_THREADS 4; srun -n 256 -C knl --ntasks-per-node 64 --cpu_bind=cores -c 4 ./spa_mutrino_kokkos_knl -sf kk -k on t 4 -pk kokkos reduction parallel/reduce comm classic -v x 256 -v y 160 -v z 160 -v t 100 -in in.sphere.steps -log log.sparta.date=23Dec17.model=sphere.machine=mutrino.pkg=kokkos_knl.kind=strong.size=64M.node=4.mpi=64.thread=4.hyper=4
8 setenv OMP_NUM_THREADS 8; srun -n 256 -C knl --ntasks-per-node 32 --cpu_bind=cores -c 8 ./spa_mutrino_kokkos_knl -sf kk -k on t 8 -pk kokkos reduction parallel/reduce comm classic -v x 256 -v y 160 -v z 160 -v t 100 -in in.sphere.steps -log log.sparta.date=23Dec17.model=sphere.machine=mutrino.pkg=kokkos_knl.kind=strong.size=64M.node=8.mpi=32.thread=8.hyper=4
16 setenv OMP_NUM_THREADS 8; srun -n 512 -C knl --ntasks-per-node 32 --cpu_bind=cores -c 8 ./spa_mutrino_kokkos_knl -sf kk -k on t 8 -pk kokkos reduction parallel/reduce comm classic -v x 256 -v y 160 -v z 160 -v t 100 -in in.sphere.steps -log log.sparta.date=23Dec17.model=sphere.machine=mutrino.pkg=kokkos_knl.kind=strong.size=64M.node=16.mpi=32.thread=8.hyper=4
32 setenv OMP_NUM_THREADS 4; srun -n 2048 -C knl --ntasks-per-node 64 --cpu_bind=cores -c 4 ./spa_mutrino_kokkos_knl -sf kk -k on t 4 -pk kokkos reduction parallel/reduce comm classic -v x 256 -v y 160 -v z 160 -v t 100 -in in.sphere.steps -log log.sparta.date=23Dec17.model=sphere.machine=mutrino.pkg=kokkos_knl.kind=strong.size=64M.node=32.mpi=64.thread=4.hyper=4
64 setenv OMP_NUM_THREADS 4; srun -n 4096 -C knl --ntasks-per-node 64 --cpu_bind=cores -c 4 ./spa_mutrino_kokkos_knl -sf kk -k on t 4 -pk kokkos reduction parallel/reduce comm classic -v x 256 -v y 160 -v z 160 -v t 100 -in in.sphere.steps -log log.sparta.date=23Dec17.model=sphere.machine=mutrino.pkg=kokkos_knl.kind=strong.size=64M.node=64.mpi=64.thread=4.hyper=4

Run commands and logfile links for column K80-1

1 mpirun -np 4 --npersocket 1 --bind-to core spa_ride80_kokkos_cuda -sf kk -k on g 2 -pk kokkos reduction atomic comm threaded -v x 256 -v y 160 -v z 160 -v t 100 -in in.sphere.gpu.steps -log log.sparta.date=23Dec17.model=sphere.machine=ride80.pkg=kokkos_cuda.kind=strong.size=64M.node=2.mpi=2.gpu=2
2 mpirun -np 8 --npersocket 1 --bind-to core spa_ride80_kokkos_cuda -sf kk -k on g 2 -pk kokkos reduction atomic comm threaded -v x 256 -v y 160 -v z 160 -v t 100 -in in.sphere.gpu.steps -log log.sparta.date=23Dec17.model=sphere.machine=ride80.pkg=kokkos_cuda.kind=strong.size=64M.node=4.mpi=2.gpu=2
4 mpirun -np 16 --npersocket 1 --bind-to core spa_ride80_kokkos_cuda -sf kk -k on g 2 -pk kokkos reduction atomic comm threaded -v x 256 -v y 160 -v z 160 -v t 100 -in in.sphere.gpu.steps -log log.sparta.date=23Dec17.model=sphere.machine=ride80.pkg=kokkos_cuda.kind=strong.size=64M.node=8.mpi=2.gpu=2
8 None
16 None
32 None
64 None

Run commands and logfile links for column P100-1

1 mpirun -np 4 --npernode 1 --bind-to core spa_ride100_kokkos_cuda -sf kk -k on g 1 -pk kokkos reduction atomic comm threaded -v x 256 -v y 160 -v z 160 -v t 100 -in in.sphere.gpu.steps -log log.sparta.date=23Dec17.model=sphere.machine=ride100.pkg=kokkos_cuda.kind=strong.size=64M.node=4.mpi=1.gpu=1
2 mpirun -np 8 --npernode 1 --bind-to core spa_ride100_kokkos_cuda -sf kk -k on g 1 -pk kokkos reduction atomic comm threaded -v x 256 -v y 160 -v z 160 -v t 100 -in in.sphere.gpu.steps -log log.sparta.date=23Dec17.model=sphere.machine=ride100.pkg=kokkos_cuda.kind=strong.size=64M.node=8.mpi=1.gpu=1
4 None
8 None
16 None
32 None
64 None