```shell
console:/ # dd if=/dev/zero of=/memtest/testfile bs=1M count=500
500+0 records in
500+0 records out
524288000 bytes (500 M) copied, 0.782312 s, 639 M/s
console:/ # dd if=/dev/zero of=/memtest/testfile bs=500M count=1
1+0 records in
1+0 records out
524288000 bytes (500 M) copied, 1.272919 s, 393 M/s
console:/ # dd if=/dev/zero of=/memtest/testfile bs=512K count=1000
1000+0 records in
1000+0 records out
524288000 bytes (500 M) copied, 0.794319 s, 629 M/s
```
```shell
console:/ # dd if=/memtest/testfile of=/dev/null bs=1M count=500
500+0 records in
500+0 records out
524288000 bytes (500 M) copied, 0.340277 s, 1.4 G/s
console:/ # dd if=/memtest/testfile of=/dev/null bs=500M count=1
1+0 records in
1+0 records out
524288000 bytes (500 M) copied, 0.682501 s, 733 M/s
console:/ # dd if=/memtest/testfile of=/dev/null bs=512K count=1000
1000+0 records in
1000+0 records out
524288000 bytes (500 M) copied, 0.226277 s, 2.1 G/s
```
```shell
console:/data/local/tmp # ./stream_benchmark
-------------------------------------------------------------
STREAM version $Revision: 5.10 $
-------------------------------------------------------------
This system uses 8 bytes per array element.
-------------------------------------------------------------
Array size = 10000000 (elements), Offset = 0 (elements)
Memory per array = 76.3 MiB (= 0.1 GiB).
Total memory required = 228.9 MiB (= 0.2 GiB).
Each kernel will be executed 10 times.
 The *best* time for each kernel (excluding the first iteration)
 will be used to compute the reported bandwidth.
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 21376 microseconds.
   (= 21376 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:           10748.4     0.015937     0.014886     0.017156
Scale:           8149.9     0.020030     0.019632     0.020463
Add:             9070.6     0.027085     0.026459     0.028024
Triad:           8255.9     0.029891     0.029070     0.030490
-------------------------------------------------------------
Solution Validates: avg error less than 1.000000e-13 on all three arrays
-------------------------------------------------------------
```
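Copy is the simplest operation: it reads the value stored at one memory location and writes it to another memory location.
Scale reads a value from memory, multiplies it by a constant factor, and writes the result to another memory location.
Add reads two values from memory, adds them, and writes the result to another memory location.
The name Triad means "a group of three"; in this test it combines the Copy, Scale, and Add operations into a single kernel. It reads two values a and b from memory, performs a combined multiply-add (a + factor × b), and writes the result to another memory location.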