Benchmarks
This document presents benchmarks conducted to evaluate various operations on a specific system configuration. The system details are as follows:
BenchmarkDotNet v0.13.12, Windows 11 (10.0.22631.3296/23H2/2023Update/SunValley3)
AMD Ryzen 9 7940HS w/ Radeon 780M Graphics, 1 CPU, 16 logical and 8 physical cores
.NET SDK 8.0.101
[Host] : .NET 8.0.1 (8.0.123.58001), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
Scalar : .NET 8.0.1 (8.0.123.58001), X64 RyuJIT
Vector128 : .NET 8.0.1 (8.0.123.58001), X64 RyuJIT AVX
Vector256 : .NET 8.0.1 (8.0.123.58001), X64 RyuJIT AVX2
Vector512 : .NET 8.0.1 (8.0.123.58001), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
The system supports vectorization up to 512 bits. Benchmarks are conducted both with and without vectorization for each supported vectorization size.
Benchmark Jobs
Four distinct jobs are included in the benchmarks:
- Scalar: No SIMD support
- Vector128: using 128-bit SIMD support
- Vector256: using 256-bit SIMD support
- Vector512: using 512-bit SIMD support
Benchmark Scenarios
The benchmarks encompass the following scenarios:
- Baseline_*: Simple iteration without explicit optimizations
- LINQ_*: using LINQ (when available)
- System_*: using
System.Numerics.Tensors
- NetFabric_*: using
NetFabric.Numerics.Tensors
The source code for the benchmarks can be accessed here.
Test Cases and Considerations
Benchmarks were conducted on small source spans (5 items) as well as larger ones (100 items). Various scenarios of operators and their applications were covered. The baseline for each scenario involved equivalent operations using a for
loop, with no optimizations applied other than those automatically added by the JIT compiler.
It's important to note that while these benchmarks provide insights into the performance characteristics of the library, they are not exhaustive and are not intended to serve as a comprehensive performance analysis. They aim to offer a general understanding of the library's performance under different scenarios.
Results
Negate
Applying a vectorizable unary operator on a span.
Method | Job | Categories | Count | Mean | StdDev | Ratio |
---|---|---|---|---|---|---|
Baseline_Double | Scalar | Double | 100 | 45.573 ns | 0.2966 ns | baseline |
System_Double | Scalar | Double | 100 | 29.756 ns | 0.1381 ns | 1.53x faster |
NetFabric_Double | Scalar | Double | 100 | 26.658 ns | 0.1550 ns | 1.71x faster |
Baseline_Double | Vector128 | Double | 100 | 47.152 ns | 2.1215 ns | 1.06x slower |
System_Double | Vector128 | Double | 100 | 12.626 ns | 0.1176 ns | 3.61x faster |
NetFabric_Double | Vector128 | Double | 100 | 14.744 ns | 0.2733 ns | 3.09x faster |
Baseline_Double | Vector256 | Double | 100 | 32.666 ns | 0.3011 ns | 1.40x faster |
System_Double | Vector256 | Double | 100 | 10.927 ns | 0.0521 ns | 4.17x faster |
NetFabric_Double | Vector256 | Double | 100 | 9.304 ns | 0.0298 ns | 4.90x faster |
Baseline_Double | Vector512 | Double | 100 | 47.302 ns | 1.9834 ns | 1.05x slower |
System_Double | Vector512 | Double | 100 | 11.154 ns | 0.0398 ns | 4.09x faster |
NetFabric_Double | Vector512 | Double | 100 | 9.813 ns | 0.0464 ns | 4.64x faster |
Baseline_Float | Scalar | Float | 100 | 45.061 ns | 0.2763 ns | baseline |
System_Float | Scalar | Float | 100 | 29.691 ns | 0.6771 ns | 1.52x faster |
NetFabric_Float | Scalar | Float | 100 | 27.657 ns | 0.4055 ns | 1.63x faster |
Baseline_Float | Vector128 | Float | 100 | 47.998 ns | 2.1608 ns | 1.05x slower |
System_Float | Vector128 | Float | 100 | 9.232 ns | 0.0872 ns | 4.88x faster |
NetFabric_Float | Vector128 | Float | 100 | 9.161 ns | 0.0973 ns | 4.92x faster |
Baseline_Float | Vector256 | Float | 100 | 32.591 ns | 0.2715 ns | 1.38x faster |
System_Float | Vector256 | Float | 100 | 8.053 ns | 0.0349 ns | 5.59x faster |
NetFabric_Float | Vector256 | Float | 100 | 7.088 ns | 0.0503 ns | 6.35x faster |
Baseline_Float | Vector512 | Float | 100 | 47.134 ns | 2.0154 ns | 1.05x slower |
System_Float | Vector512 | Float | 100 | 8.638 ns | 0.0700 ns | 5.22x faster |
NetFabric_Float | Vector512 | Float | 100 | 8.214 ns | 0.0763 ns | 5.48x faster |
Baseline_Half | Scalar | Half | 100 | 767.767 ns | 2.5521 ns | baseline |
System_Half | Scalar | Half | 100 | 760.971 ns | 1.9508 ns | 1.01x faster |
NetFabric_Half | Scalar | Half | 100 | 758.768 ns | 2.0240 ns | 1.01x faster |
Baseline_Half | Vector128 | Half | 100 | 642.664 ns | 0.7075 ns | 1.19x faster |
System_Half | Vector128 | Half | 100 | 639.995 ns | 2.0392 ns | 1.20x faster |
NetFabric_Half | Vector128 | Half | 100 | 644.216 ns | 2.7896 ns | 1.19x faster |
Baseline_Half | Vector256 | Half | 100 | 647.412 ns | 4.6062 ns | 1.19x faster |
System_Half | Vector256 | Half | 100 | 639.948 ns | 4.1512 ns | 1.20x faster |
NetFabric_Half | Vector256 | Half | 100 | 642.387 ns | 3.0127 ns | 1.20x faster |
Baseline_Half | Vector512 | Half | 100 | 644.374 ns | 2.6637 ns | 1.19x faster |
System_Half | Vector512 | Half | 100 | 644.819 ns | 6.5954 ns | 1.19x faster |
NetFabric_Half | Vector512 | Half | 100 | 649.468 ns | 3.9853 ns | 1.18x faster |
Baseline_Int | Scalar | Int | 100 | 35.906 ns | 0.4704 ns | baseline |
System_Int | Scalar | Int | 100 | 29.608 ns | 0.7775 ns | 1.21x faster |
NetFabric_Int | Scalar | Int | 100 | 30.740 ns | 0.2080 ns | 1.17x faster |
Baseline_Int | Vector128 | Int | 100 | 36.527 ns | 0.3254 ns | 1.02x slower |
System_Int | Vector128 | Int | 100 | 9.831 ns | 0.0608 ns | 3.65x faster |
NetFabric_Int | Vector128 | Int | 100 | 13.548 ns | 0.0619 ns | 2.65x faster |
Baseline_Int | Vector256 | Int | 100 | 36.767 ns | 0.4214 ns | 1.03x slower |
System_Int | Vector256 | Int | 100 | 7.482 ns | 0.0695 ns | 4.80x faster |
NetFabric_Int | Vector256 | Int | 100 | 12.252 ns | 0.0617 ns | 2.93x faster |
Baseline_Int | Vector512 | Int | 100 | 36.467 ns | 0.1628 ns | 1.02x slower |
System_Int | Vector512 | Int | 100 | 12.493 ns | 0.0274 ns | 2.87x faster |
NetFabric_Int | Vector512 | Int | 100 | 11.466 ns | 0.0437 ns | 3.13x faster |
Baseline_Long | Scalar | Long | 100 | 36.301 ns | 0.2350 ns | baseline |
System_Long | Scalar | Long | 100 | 29.256 ns | 0.1246 ns | 1.24x faster |
NetFabric_Long | Scalar | Long | 100 | 31.201 ns | 0.1614 ns | 1.16x faster |
Baseline_Long | Vector128 | Long | 100 | 36.455 ns | 0.2352 ns | 1.00x slower |
System_Long | Vector128 | Long | 100 | 14.563 ns | 0.1236 ns | 2.49x faster |
NetFabric_Long | Vector128 | Long | 100 | 18.382 ns | 0.1982 ns | 1.98x faster |
Baseline_Long | Vector256 | Long | 100 | 36.484 ns | 0.3588 ns | 1.00x slower |
System_Long | Vector256 | Long | 100 | 11.685 ns | 0.0431 ns | 3.11x faster |
NetFabric_Long | Vector256 | Long | 100 | 10.807 ns | 0.0716 ns | 3.36x faster |
Baseline_Long | Vector512 | Long | 100 | 36.663 ns | 0.2058 ns | 1.01x slower |
System_Long | Vector512 | Long | 100 | 10.474 ns | 0.0600 ns | 3.47x faster |
NetFabric_Long | Vector512 | Long | 100 | 10.635 ns | 0.1332 ns | 3.41x faster |
Baseline_Short | Scalar | Short | 100 | 40.951 ns | 0.3544 ns | baseline |
System_Short | Scalar | Short | 100 | 40.886 ns | 0.4036 ns | 1.00x faster |
NetFabric_Short | Scalar | Short | 100 | 36.349 ns | 0.2416 ns | 1.13x faster |
Baseline_Short | Vector128 | Short | 100 | 41.558 ns | 0.2594 ns | 1.01x slower |
System_Short | Vector128 | Short | 100 | 7.253 ns | 0.0736 ns | 5.65x faster |
NetFabric_Short | Vector128 | Short | 100 | 8.133 ns | 0.0460 ns | 5.04x faster |
Baseline_Short | Vector256 | Short | 100 | 41.490 ns | 0.3011 ns | 1.01x slower |
System_Short | Vector256 | Short | 100 | 5.168 ns | 0.0518 ns | 7.93x faster |
NetFabric_Short | Vector256 | Short | 100 | 6.172 ns | 0.0951 ns | 6.64x faster |
Baseline_Short | Vector512 | Short | 100 | 41.288 ns | 0.1954 ns | 1.01x slower |
System_Short | Vector512 | Short | 100 | 5.517 ns | 0.0320 ns | 7.43x faster |
NetFabric_Short | Vector512 | Short | 100 | 6.571 ns | 0.0211 ns | 6.24x faster |
Add
Applying a vectorizable binary operator on two spans.
Method | Job | Categories | Count | Mean | StdDev | Ratio |
---|---|---|---|---|---|---|
Baseline_Double | Scalar | Double | 100 | 42.838 ns | 0.5474 ns | baseline |
System_Double | Scalar | Double | 100 | 30.240 ns | 0.1343 ns | 1.42x faster |
NetFabric_Double | Scalar | Double | 100 | 27.300 ns | 0.1176 ns | 1.57x faster |
Baseline_Double | Vector128 | Double | 100 | 47.762 ns | 0.6866 ns | 1.12x slower |
System_Double | Vector128 | Double | 100 | 20.282 ns | 0.0841 ns | 2.11x faster |
NetFabric_Double | Vector128 | Double | 100 | 20.364 ns | 0.1156 ns | 2.10x faster |
Baseline_Double | Vector256 | Double | 100 | 47.177 ns | 0.3005 ns | 1.10x slower |
System_Double | Vector256 | Double | 100 | 16.653 ns | 0.0518 ns | 2.57x faster |
NetFabric_Double | Vector256 | Double | 100 | 13.296 ns | 0.0712 ns | 3.22x faster |
Baseline_Double | Vector512 | Double | 100 | 47.769 ns | 0.4299 ns | 1.12x slower |
System_Double | Vector512 | Double | 100 | 17.823 ns | 0.0964 ns | 2.40x faster |
NetFabric_Double | Vector512 | Double | 100 | 13.988 ns | 0.0918 ns | 3.06x faster |
Baseline_Float | Scalar | Float | 100 | 46.910 ns | 0.2536 ns | baseline |
System_Float | Scalar | Float | 100 | 30.420 ns | 0.1175 ns | 1.54x faster |
NetFabric_Float | Scalar | Float | 100 | 27.097 ns | 0.1320 ns | 1.73x faster |
Baseline_Float | Vector128 | Float | 100 | 40.948 ns | 0.6031 ns | 1.14x faster |
System_Float | Vector128 | Float | 100 | 14.220 ns | 0.0738 ns | 3.30x faster |
NetFabric_Float | Vector128 | Float | 100 | 12.781 ns | 0.0379 ns | 3.67x faster |
Baseline_Float | Vector256 | Float | 100 | 40.751 ns | 0.3792 ns | 1.15x faster |
System_Float | Vector256 | Float | 100 | 10.541 ns | 0.0632 ns | 4.45x faster |
NetFabric_Float | Vector256 | Float | 100 | 10.306 ns | 0.0569 ns | 4.55x faster |
Baseline_Float | Vector512 | Float | 100 | 40.859 ns | 0.5954 ns | 1.15x faster |
System_Float | Vector512 | Float | 100 | 11.316 ns | 0.0517 ns | 4.15x faster |
NetFabric_Float | Vector512 | Float | 100 | 10.274 ns | 0.0545 ns | 4.57x faster |
Baseline_Half | Scalar | Half | 100 | 999.391 ns | 0.9589 ns | baseline |
System_Half | Scalar | Half | 100 | 997.099 ns | 3.8572 ns | 1.00x faster |
NetFabric_Half | Scalar | Half | 100 | 995.374 ns | 3.8658 ns | 1.00x faster |
Baseline_Half | Vector128 | Half | 100 | 911.213 ns | 3.2180 ns | 1.10x faster |
System_Half | Vector128 | Half | 100 | 885.748 ns | 2.5029 ns | 1.13x faster |
NetFabric_Half | Vector128 | Half | 100 | 887.482 ns | 3.0970 ns | 1.13x faster |
Baseline_Half | Vector256 | Half | 100 | 915.089 ns | 6.6291 ns | 1.09x faster |
System_Half | Vector256 | Half | 100 | 887.685 ns | 3.2944 ns | 1.13x faster |
NetFabric_Half | Vector256 | Half | 100 | 887.629 ns | 3.2086 ns | 1.13x faster |
Baseline_Half | Vector512 | Half | 100 | 912.210 ns | 2.0226 ns | 1.10x faster |
System_Half | Vector512 | Half | 100 | 886.560 ns | 2.3231 ns | 1.13x faster |
NetFabric_Half | Vector512 | Half | 100 | 885.757 ns | 1.8336 ns | 1.13x faster |
Baseline_Int | Scalar | Int | 100 | 47.679 ns | 0.3331 ns | baseline |
System_Int | Scalar | Int | 100 | 37.232 ns | 0.1545 ns | 1.28x faster |
NetFabric_Int | Scalar | Int | 100 | 33.680 ns | 0.1329 ns | 1.42x faster |
Baseline_Int | Vector128 | Int | 100 | 50.704 ns | 0.1286 ns | 1.06x slower |
System_Int | Vector128 | Int | 100 | 15.799 ns | 0.0674 ns | 3.02x faster |
NetFabric_Int | Vector128 | Int | 100 | 13.488 ns | 0.1433 ns | 3.54x faster |
Baseline_Int | Vector256 | Int | 100 | 48.321 ns | 0.3998 ns | 1.01x slower |
System_Int | Vector256 | Int | 100 | 12.430 ns | 0.0301 ns | 3.84x faster |
NetFabric_Int | Vector256 | Int | 100 | 10.519 ns | 0.1974 ns | 4.53x faster |
Baseline_Int | Vector512 | Int | 100 | 48.420 ns | 0.3729 ns | 1.02x slower |
System_Int | Vector512 | Int | 100 | 13.326 ns | 0.0623 ns | 3.58x faster |
NetFabric_Int | Vector512 | Int | 100 | 10.261 ns | 0.0552 ns | 4.65x faster |
Baseline_Long | Scalar | Long | 100 | 47.667 ns | 0.3977 ns | baseline |
System_Long | Scalar | Long | 100 | 36.912 ns | 0.1346 ns | 1.29x faster |
NetFabric_Long | Scalar | Long | 100 | 35.274 ns | 0.1377 ns | 1.35x faster |
Baseline_Long | Vector128 | Long | 100 | 47.490 ns | 0.6803 ns | 1.00x faster |
System_Long | Vector128 | Long | 100 | 20.943 ns | 0.0900 ns | 2.28x faster |
NetFabric_Long | Vector128 | Long | 100 | 25.661 ns | 0.1937 ns | 1.86x faster |
Baseline_Long | Vector256 | Long | 100 | 48.330 ns | 0.6507 ns | 1.01x slower |
System_Long | Vector256 | Long | 100 | 17.558 ns | 0.0552 ns | 2.71x faster |
NetFabric_Long | Vector256 | Long | 100 | 14.723 ns | 0.1270 ns | 3.24x faster |
Baseline_Long | Vector512 | Long | 100 | 47.401 ns | 0.5282 ns | 1.01x faster |
System_Long | Vector512 | Long | 100 | 17.561 ns | 0.0920 ns | 2.71x faster |
NetFabric_Long | Vector512 | Long | 100 | 13.943 ns | 0.0869 ns | 3.42x faster |
Baseline_Short | Scalar | Short | 100 | 52.775 ns | 0.2422 ns | baseline |
System_Short | Scalar | Short | 100 | 45.858 ns | 0.1945 ns | 1.15x faster |
NetFabric_Short | Scalar | Short | 100 | 43.392 ns | 0.1510 ns | 1.22x faster |
Baseline_Short | Vector128 | Short | 100 | 54.653 ns | 1.5272 ns | 1.04x slower |
System_Short | Vector128 | Short | 100 | 9.671 ns | 0.0582 ns | 5.45x faster |
NetFabric_Short | Vector128 | Short | 100 | 11.470 ns | 0.1263 ns | 4.60x faster |
Baseline_Short | Vector256 | Short | 100 | 52.536 ns | 0.3891 ns | 1.00x faster |
System_Short | Vector256 | Short | 100 | 6.239 ns | 0.0472 ns | 8.46x faster |
NetFabric_Short | Vector256 | Short | 100 | 9.148 ns | 0.0540 ns | 5.77x faster |
Baseline_Short | Vector512 | Short | 100 | 52.830 ns | 0.3333 ns | 1.00x slower |
System_Short | Vector512 | Short | 100 | 7.545 ns | 0.0464 ns | 7.00x faster |
NetFabric_Short | Vector512 | Short | 100 | 9.329 ns | 0.0400 ns | 5.66x faster |
Min
Applying a vectorizable binary operator with propagation of NaN
.
Method | Job | Categories | Count | Mean | StdDev | Ratio |
---|---|---|---|---|---|---|
Baseline_Double | Scalar | Double | 100 | 77.822 ns | 0.3262 ns | baseline |
System_Double | Scalar | Double | 100 | 68.871 ns | 0.4840 ns | 1.13x faster |
NetFabric_Double | Scalar | Double | 100 | 86.981 ns | 0.6853 ns | 1.12x slower |
Baseline_Double | Vector128 | Double | 100 | 96.596 ns | 1.7143 ns | 1.24x slower |
System_Double | Vector128 | Double | 100 | 34.331 ns | 0.2659 ns | 2.27x faster |
NetFabric_Double | Vector128 | Double | 100 | 39.739 ns | 0.2777 ns | 1.96x faster |
Baseline_Double | Vector256 | Double | 100 | 82.876 ns | 1.2040 ns | 1.06x slower |
System_Double | Vector256 | Double | 100 | 20.034 ns | 0.1042 ns | 3.89x faster |
NetFabric_Double | Vector256 | Double | 100 | 21.825 ns | 0.1232 ns | 3.57x faster |
Baseline_Double | Vector512 | Double | 100 | 51.904 ns | 0.2927 ns | 1.50x faster |
System_Double | Vector512 | Double | 100 | 27.181 ns | 0.1602 ns | 2.86x faster |
NetFabric_Double | Vector512 | Double | 100 | 21.818 ns | 0.0917 ns | 3.57x faster |
Baseline_Float | Scalar | Float | 100 | 78.342 ns | 0.8300 ns | baseline |
System_Float | Scalar | Float | 100 | 68.697 ns | 0.4501 ns | 1.14x faster |
NetFabric_Float | Scalar | Float | 100 | 85.124 ns | 0.6446 ns | 1.09x slower |
Baseline_Float | Vector128 | Float | 100 | 101.167 ns | 0.5612 ns | 1.29x slower |
System_Float | Vector128 | Float | 100 | 19.894 ns | 0.1286 ns | 3.94x faster |
NetFabric_Float | Vector128 | Float | 100 | 21.724 ns | 0.1460 ns | 3.61x faster |
Baseline_Float | Vector256 | Float | 100 | 102.121 ns | 1.1186 ns | 1.30x slower |
System_Float | Vector256 | Float | 100 | 13.085 ns | 0.1873 ns | 5.99x faster |
NetFabric_Float | Vector256 | Float | 100 | 14.734 ns | 0.0741 ns | 5.32x faster |
Baseline_Float | Vector512 | Float | 100 | 51.630 ns | 0.2217 ns | 1.52x faster |
System_Float | Vector512 | Float | 100 | 13.855 ns | 0.0906 ns | 5.65x faster |
NetFabric_Float | Vector512 | Float | 100 | 13.948 ns | 0.0720 ns | 5.61x faster |
Baseline_Half | Scalar | Half | 100 | 883.888 ns | 2.8508 ns | baseline |
System_Half | Scalar | Half | 100 | 880.562 ns | 1.4729 ns | 1.00x faster |
NetFabric_Half | Scalar | Half | 100 | 893.825 ns | 2.5456 ns | 1.01x slower |
Baseline_Half | Vector128 | Half | 100 | 757.449 ns | 4.0445 ns | 1.17x faster |
System_Half | Vector128 | Half | 100 | 763.809 ns | 2.6164 ns | 1.16x faster |
NetFabric_Half | Vector128 | Half | 100 | 760.726 ns | 4.4007 ns | 1.16x faster |
Baseline_Half | Vector256 | Half | 100 | 756.132 ns | 3.4296 ns | 1.17x faster |
System_Half | Vector256 | Half | 100 | 764.713 ns | 4.1691 ns | 1.16x faster |
NetFabric_Half | Vector256 | Half | 100 | 759.633 ns | 2.6422 ns | 1.16x faster |
Baseline_Half | Vector512 | Half | 100 | 865.753 ns | 2.2094 ns | 1.02x faster |
System_Half | Vector512 | Half | 100 | 863.612 ns | 2.5026 ns | 1.02x faster |
NetFabric_Half | Vector512 | Half | 100 | 867.104 ns | 1.8447 ns | 1.02x faster |
Baseline_Int | Scalar | Int | 100 | 51.976 ns | 0.3824 ns | baseline |
System_Int | Scalar | Int | 100 | 35.131 ns | 0.2025 ns | 1.48x faster |
NetFabric_Int | Scalar | Int | 100 | 33.484 ns | 0.1669 ns | 1.55x faster |
Baseline_Int | Vector128 | Int | 100 | 51.267 ns | 0.2203 ns | 1.01x faster |
System_Int | Vector128 | Int | 100 | 14.950 ns | 0.1110 ns | 3.48x faster |
NetFabric_Int | Vector128 | Int | 100 | 12.695 ns | 0.0619 ns | 4.09x faster |
Baseline_Int | Vector256 | Int | 100 | 52.505 ns | 0.4324 ns | 1.01x slower |
System_Int | Vector256 | Int | 100 | 12.779 ns | 0.0542 ns | 4.07x faster |
NetFabric_Int | Vector256 | Int | 100 | 10.157 ns | 0.0634 ns | 5.12x faster |
Baseline_Int | Vector512 | Int | 100 | 52.323 ns | 0.1651 ns | 1.01x slower |
System_Int | Vector512 | Int | 100 | 13.103 ns | 0.0907 ns | 3.97x faster |
NetFabric_Int | Vector512 | Int | 100 | 10.275 ns | 0.0453 ns | 5.06x faster |
Baseline_Long | Scalar | Long | 100 | 52.713 ns | 0.4538 ns | baseline |
System_Long | Scalar | Long | 100 | 34.924 ns | 0.3090 ns | 1.51x faster |
NetFabric_Long | Scalar | Long | 100 | 33.423 ns | 0.0927 ns | 1.58x faster |
Baseline_Long | Vector128 | Long | 100 | 75.910 ns | 2.1183 ns | 1.45x slower |
System_Long | Vector128 | Long | 100 | 21.505 ns | 0.0783 ns | 2.45x faster |
NetFabric_Long | Vector128 | Long | 100 | 25.719 ns | 0.2887 ns | 2.05x faster |
Baseline_Long | Vector256 | Long | 100 | 75.608 ns | 1.5240 ns | 1.43x slower |
System_Long | Vector256 | Long | 100 | 17.885 ns | 0.0486 ns | 2.95x faster |
NetFabric_Long | Vector256 | Long | 100 | 15.290 ns | 0.0587 ns | 3.45x faster |
Baseline_Long | Vector512 | Long | 100 | 74.174 ns | 0.3423 ns | 1.41x slower |
System_Long | Vector512 | Long | 100 | 18.898 ns | 0.1029 ns | 2.79x faster |
NetFabric_Long | Vector512 | Long | 100 | 14.279 ns | 0.1114 ns | 3.69x faster |
Baseline_Short | Scalar | Short | 100 | 53.586 ns | 0.3659 ns | baseline |
System_Short | Scalar | Short | 100 | 46.125 ns | 0.3830 ns | 1.16x faster |
NetFabric_Short | Scalar | Short | 100 | 41.945 ns | 0.5680 ns | 1.28x faster |
Baseline_Short | Vector128 | Short | 100 | 59.540 ns | 0.5334 ns | 1.11x slower |
System_Short | Vector128 | Short | 100 | 10.624 ns | 0.0495 ns | 5.04x faster |
NetFabric_Short | Vector128 | Short | 100 | 12.153 ns | 0.0755 ns | 4.41x faster |
Baseline_Short | Vector256 | Short | 100 | 59.430 ns | 0.4117 ns | 1.11x slower |
System_Short | Vector256 | Short | 100 | 6.224 ns | 0.0330 ns | 8.62x faster |
NetFabric_Short | Vector256 | Short | 100 | 9.478 ns | 0.0505 ns | 5.65x faster |
Baseline_Short | Vector512 | Short | 100 | 59.545 ns | 0.4133 ns | 1.11x slower |
System_Short | Vector512 | Short | 100 | 7.552 ns | 0.0554 ns | 7.10x faster |
NetFabric_Short | Vector512 | Short | 100 | 8.903 ns | 0.0664 ns | 6.02x faster |
Add Value
Applying a vectorizable binary operator on a span and a fixed scalar value.
Method | Job | Categories | Count | Mean | StdDev | Ratio |
---|---|---|---|---|---|---|
Baseline_Double | Scalar | Double | 100 | 30.491 ns | 0.2496 ns | baseline |
System_Double | Scalar | Double | 100 | 30.656 ns | 0.2597 ns | 1.01x slower |
NetFabric_Double | Scalar | Double | 100 | 23.163 ns | 0.2204 ns | 1.32x faster |
Baseline_Double | Vector128 | Double | 100 | 28.773 ns | 0.2517 ns | 1.06x faster |
System_Double | Vector128 | Double | 100 | 14.443 ns | 0.0950 ns | 2.11x faster |
NetFabric_Double | Vector128 | Double | 100 | 15.381 ns | 0.1444 ns | 1.99x faster |
Baseline_Double | Vector256 | Double | 100 | 28.832 ns | 0.2141 ns | 1.06x faster |
System_Double | Vector256 | Double | 100 | 10.916 ns | 0.0433 ns | 2.79x faster |
NetFabric_Double | Vector256 | Double | 100 | 9.413 ns | 0.1037 ns | 3.24x faster |
Baseline_Double | Vector512 | Double | 100 | 28.806 ns | 0.3041 ns | 1.06x faster |
System_Double | Vector512 | Double | 100 | 11.155 ns | 0.0419 ns | 2.73x faster |
NetFabric_Double | Vector512 | Double | 100 | 10.368 ns | 0.0690 ns | 2.94x faster |
Baseline_Float | Scalar | Float | 100 | 31.094 ns | 0.3901 ns | baseline |
System_Float | Scalar | Float | 100 | 29.603 ns | 0.3048 ns | 1.05x faster |
NetFabric_Float | Scalar | Float | 100 | 23.029 ns | 0.1304 ns | 1.35x faster |
Baseline_Float | Vector128 | Float | 100 | 48.725 ns | 1.8406 ns | 1.56x slower |
System_Float | Vector128 | Float | 100 | 9.178 ns | 0.0628 ns | 3.39x faster |
NetFabric_Float | Vector128 | Float | 100 | 9.732 ns | 0.1437 ns | 3.20x faster |
Baseline_Float | Vector256 | Float | 100 | 28.264 ns | 0.2363 ns | 1.10x faster |
System_Float | Vector256 | Float | 100 | 8.336 ns | 0.0831 ns | 3.73x faster |
NetFabric_Float | Vector256 | Float | 100 | 6.873 ns | 0.0598 ns | 4.52x faster |
Baseline_Float | Vector512 | Float | 100 | 48.878 ns | 1.4262 ns | 1.57x slower |
System_Float | Vector512 | Float | 100 | 8.364 ns | 0.0216 ns | 3.71x faster |
NetFabric_Float | Vector512 | Float | 100 | 8.289 ns | 0.0368 ns | 3.75x faster |
Baseline_Half | Scalar | Half | 100 | 992.662 ns | 4.3637 ns | baseline |
System_Half | Scalar | Half | 100 | 983.645 ns | 3.8075 ns | 1.01x faster |
NetFabric_Half | Scalar | Half | 100 | 1,000.728 ns | 3.2709 ns | 1.01x slower |
Baseline_Half | Vector128 | Half | 100 | 894.462 ns | 2.2043 ns | 1.11x faster |
System_Half | Vector128 | Half | 100 | 875.850 ns | 2.5469 ns | 1.13x faster |
NetFabric_Half | Vector128 | Half | 100 | 877.771 ns | 2.2985 ns | 1.13x faster |
Baseline_Half | Vector256 | Half | 100 | 895.050 ns | 1.3509 ns | 1.11x faster |
System_Half | Vector256 | Half | 100 | 875.080 ns | 2.0255 ns | 1.13x faster |
NetFabric_Half | Vector256 | Half | 100 | 879.065 ns | 1.8564 ns | 1.13x faster |
Baseline_Half | Vector512 | Half | 100 | 895.426 ns | 1.7073 ns | 1.11x faster |
System_Half | Vector512 | Half | 100 | 877.551 ns | 3.3208 ns | 1.13x faster |
NetFabric_Half | Vector512 | Half | 100 | 876.934 ns | 2.6832 ns | 1.13x faster |
Baseline_Int | Scalar | Int | 100 | 44.760 ns | 0.6570 ns | baseline |
System_Int | Scalar | Int | 100 | 36.937 ns | 0.1208 ns | 1.21x faster |
NetFabric_Int | Scalar | Int | 100 | 33.508 ns | 0.1485 ns | 1.34x faster |
Baseline_Int | Vector128 | Int | 100 | 35.064 ns | 0.3169 ns | 1.28x faster |
System_Int | Vector128 | Int | 100 | 11.550 ns | 0.0417 ns | 3.88x faster |
NetFabric_Int | Vector128 | Int | 100 | 10.248 ns | 0.0604 ns | 4.37x faster |
Baseline_Int | Vector256 | Int | 100 | 34.246 ns | 0.1734 ns | 1.31x faster |
System_Int | Vector256 | Int | 100 | 8.310 ns | 0.0370 ns | 5.39x faster |
NetFabric_Int | Vector256 | Int | 100 | 7.636 ns | 0.0398 ns | 5.86x faster |
Baseline_Int | Vector512 | Int | 100 | 34.726 ns | 0.1851 ns | 1.29x faster |
System_Int | Vector512 | Int | 100 | 12.269 ns | 0.0535 ns | 3.65x faster |
NetFabric_Int | Vector512 | Int | 100 | 10.999 ns | 0.0325 ns | 4.07x faster |
Baseline_Long | Scalar | Long | 100 | 45.567 ns | 1.5187 ns | baseline |
System_Long | Scalar | Long | 100 | 36.564 ns | 0.2023 ns | 1.24x faster |
NetFabric_Long | Scalar | Long | 100 | 33.437 ns | 0.1774 ns | 1.36x faster |
Baseline_Long | Vector128 | Long | 100 | 34.554 ns | 0.3545 ns | 1.31x faster |
System_Long | Vector128 | Long | 100 | 13.107 ns | 0.0771 ns | 3.46x faster |
NetFabric_Long | Vector128 | Long | 100 | 24.331 ns | 0.2522 ns | 1.86x faster |
Baseline_Long | Vector256 | Long | 100 | 34.498 ns | 0.4009 ns | 1.32x faster |
System_Long | Vector256 | Long | 100 | 11.127 ns | 0.0865 ns | 4.07x faster |
NetFabric_Long | Vector256 | Long | 100 | 9.689 ns | 0.0644 ns | 4.67x faster |
Baseline_Long | Vector512 | Long | 100 | 34.803 ns | 0.1882 ns | 1.30x faster |
System_Long | Vector512 | Long | 100 | 10.955 ns | 0.0720 ns | 4.14x faster |
NetFabric_Long | Vector512 | Long | 100 | 10.202 ns | 0.0799 ns | 4.44x faster |
Baseline_Short | Scalar | Short | 100 | 39.386 ns | 0.4163 ns | baseline |
System_Short | Scalar | Short | 100 | 40.853 ns | 0.1484 ns | 1.04x slower |
NetFabric_Short | Scalar | Short | 100 | 36.261 ns | 0.1594 ns | 1.09x faster |
Baseline_Short | Vector128 | Short | 100 | 39.172 ns | 0.2904 ns | 1.01x faster |
System_Short | Vector128 | Short | 100 | 6.859 ns | 0.0512 ns | 5.74x faster |
NetFabric_Short | Vector128 | Short | 100 | 7.834 ns | 0.1299 ns | 5.03x faster |
Baseline_Short | Vector256 | Short | 100 | 39.163 ns | 0.5241 ns | 1.01x faster |
System_Short | Vector256 | Short | 100 | 5.024 ns | 0.0321 ns | 7.84x faster |
NetFabric_Short | Vector256 | Short | 100 | 6.008 ns | 0.0401 ns | 6.55x faster |
Baseline_Short | Vector512 | Short | 100 | 38.894 ns | 0.5157 ns | 1.01x faster |
System_Short | Vector512 | Short | 100 | 5.725 ns | 0.0396 ns | 6.88x faster |
NetFabric_Short | Vector512 | Short | 100 | 5.886 ns | 0.0306 ns | 6.69x faster |
AddMultiply
Applying a vectorizable ternary operator on three spans.
Method | Job | Categories | Count | Mean | StdDev | Ratio |
---|---|---|---|---|---|---|
Baseline_Double | Scalar | Double | 100 | 53.194 ns | 0.5459 ns | baseline |
System_Double | Scalar | Double | 100 | 40.778 ns | 0.1742 ns | 1.30x faster |
NetFabric_Double | Scalar | Double | 100 | 38.343 ns | 0.1443 ns | 1.39x faster |
Baseline_Double | Vector128 | Double | 100 | 53.476 ns | 0.4960 ns | 1.01x slower |
System_Double | Vector128 | Double | 100 | 28.606 ns | 0.1557 ns | 1.86x faster |
NetFabric_Double | Vector128 | Double | 100 | 28.967 ns | 0.1387 ns | 1.84x faster |
Baseline_Double | Vector256 | Double | 100 | 53.079 ns | 0.2441 ns | 1.00x faster |
System_Double | Vector256 | Double | 100 | 19.916 ns | 0.0737 ns | 2.67x faster |
NetFabric_Double | Vector256 | Double | 100 | 18.981 ns | 0.0716 ns | 2.80x faster |
Baseline_Double | Vector512 | Double | 100 | 52.691 ns | 0.4285 ns | 1.01x faster |
System_Double | Vector512 | Double | 100 | 22.735 ns | 0.1027 ns | 2.34x faster |
NetFabric_Double | Vector512 | Double | 100 | 18.626 ns | 0.0902 ns | 2.86x faster |
Baseline_Float | Scalar | Float | 100 | 51.615 ns | 0.2995 ns | baseline |
System_Float | Scalar | Float | 100 | 40.463 ns | 0.1519 ns | 1.28x faster |
NetFabric_Float | Scalar | Float | 100 | 38.171 ns | 0.0796 ns | 1.35x faster |
Baseline_Float | Vector128 | Float | 100 | 52.654 ns | 0.1693 ns | 1.02x slower |
System_Float | Vector128 | Float | 100 | 19.467 ns | 0.0949 ns | 2.65x faster |
NetFabric_Float | Vector128 | Float | 100 | 18.488 ns | 0.0830 ns | 2.79x faster |
Baseline_Float | Vector256 | Float | 100 | 52.849 ns | 0.3430 ns | 1.02x slower |
System_Float | Vector256 | Float | 100 | 16.036 ns | 0.0519 ns | 3.22x faster |
NetFabric_Float | Vector256 | Float | 100 | 14.800 ns | 0.0487 ns | 3.49x faster |
Baseline_Float | Vector512 | Float | 100 | 53.026 ns | 0.2116 ns | 1.03x slower |
System_Float | Vector512 | Float | 100 | 16.049 ns | 0.1000 ns | 3.22x faster |
NetFabric_Float | Vector512 | Float | 100 | 15.217 ns | 0.0473 ns | 3.39x faster |
Baseline_Half | Scalar | Half | 100 | 1,618.204 ns | 3.5398 ns | baseline |
System_Half | Scalar | Half | 100 | 2,381.759 ns | 2.3965 ns | 1.47x slower |
NetFabric_Half | Scalar | Half | 100 | 2,375.177 ns | 2.8007 ns | 1.47x slower |
Baseline_Half | Vector128 | Half | 100 | 1,455.613 ns | 6.4194 ns | 1.11x faster |
System_Half | Vector128 | Half | 100 | 2,278.200 ns | 14.9670 ns | 1.41x slower |
NetFabric_Half | Vector128 | Half | 100 | 2,272.810 ns | 9.7295 ns | 1.40x slower |
Baseline_Half | Vector256 | Half | 100 | 1,459.872 ns | 9.3435 ns | 1.11x faster |
System_Half | Vector256 | Half | 100 | 2,272.492 ns | 14.8960 ns | 1.40x slower |
NetFabric_Half | Vector256 | Half | 100 | 2,270.518 ns | 5.6785 ns | 1.40x slower |
Baseline_Half | Vector512 | Half | 100 | 1,454.237 ns | 6.5861 ns | 1.11x faster |
System_Half | Vector512 | Half | 100 | 2,275.960 ns | 12.6515 ns | 1.41x slower |
NetFabric_Half | Vector512 | Half | 100 | 2,277.877 ns | 13.1558 ns | 1.41x slower |
Baseline_Int | Scalar | Int | 100 | 59.095 ns | 0.4008 ns | baseline |
System_Int | Scalar | Int | 100 | 52.993 ns | 0.2280 ns | 1.12x faster |
NetFabric_Int | Scalar | Int | 100 | 45.341 ns | 0.1997 ns | 1.30x faster |
Baseline_Int | Vector128 | Int | 100 | 59.250 ns | 0.5185 ns | 1.00x slower |
System_Int | Vector128 | Int | 100 | 19.522 ns | 0.2302 ns | 3.03x faster |
NetFabric_Int | Vector128 | Int | 100 | 18.133 ns | 0.0982 ns | 3.26x faster |
Baseline_Int | Vector256 | Int | 100 | 59.131 ns | 0.4722 ns | 1.00x slower |
System_Int | Vector256 | Int | 100 | 15.924 ns | 0.0793 ns | 3.71x faster |
NetFabric_Int | Vector256 | Int | 100 | 13.997 ns | 0.0616 ns | 4.22x faster |
Baseline_Int | Vector512 | Int | 100 | 58.863 ns | 0.3824 ns | 1.00x faster |
System_Int | Vector512 | Int | 100 | 14.588 ns | 0.1058 ns | 4.05x faster |
NetFabric_Int | Vector512 | Int | 100 | 14.360 ns | 0.1070 ns | 4.11x faster |
Baseline_Long | Scalar | Long | 100 | 59.422 ns | 0.7016 ns | baseline |
System_Long | Scalar | Long | 100 | 52.869 ns | 0.4472 ns | 1.12x faster |
NetFabric_Long | Scalar | Long | 100 | 45.050 ns | 0.1657 ns | 1.32x faster |
Baseline_Long | Vector128 | Long | 100 | 59.398 ns | 0.5420 ns | 1.00x faster |
System_Long | Vector128 | Long | 100 | 353.783 ns | 1.3794 ns | 5.95x slower |
NetFabric_Long | Vector128 | Long | 100 | 258.917 ns | 1.1508 ns | 4.36x slower |
Baseline_Long | Vector256 | Long | 100 | 59.045 ns | 0.2597 ns | 1.01x faster |
System_Long | Vector256 | Long | 100 | 363.682 ns | 1.2358 ns | 6.12x slower |
NetFabric_Long | Vector256 | Long | 100 | 147.048 ns | 0.6580 ns | 2.48x slower |
Baseline_Long | Vector512 | Long | 100 | 59.265 ns | 0.5245 ns | 1.00x faster |
System_Long | Vector512 | Long | 100 | 22.449 ns | 0.1203 ns | 2.65x faster |
NetFabric_Long | Vector512 | Long | 100 | 144.334 ns | 1.3654 ns | 2.43x slower |
Baseline_Short | Scalar | Short | 100 | 85.215 ns | 0.5693 ns | baseline |
System_Short | Scalar | Short | 100 | 79.092 ns | 0.4696 ns | 1.08x faster |
NetFabric_Short | Scalar | Short | 100 | 80.144 ns | 0.5933 ns | 1.06x faster |
Baseline_Short | Vector128 | Short | 100 | 83.962 ns | 0.3168 ns | 1.02x faster |
System_Short | Vector128 | Short | 100 | 12.624 ns | 0.0586 ns | 6.75x faster |
NetFabric_Short | Vector128 | Short | 100 | 13.848 ns | 0.1233 ns | 6.15x faster |
Baseline_Short | Vector256 | Short | 100 | 84.149 ns | 0.3873 ns | 1.01x faster |
System_Short | Vector256 | Short | 100 | 9.702 ns | 0.0640 ns | 8.78x faster |
NetFabric_Short | Vector256 | Short | 100 | 11.736 ns | 0.0789 ns | 7.26x faster |
Baseline_Short | Vector512 | Short | 100 | 84.477 ns | 0.7474 ns | 1.01x faster |
System_Short | Vector512 | Short | 100 | 19.894 ns | 0.1089 ns | 4.28x faster |
NetFabric_Short | Vector512 | Short | 100 | 12.510 ns | 0.0460 ns | 6.81x faster |
DegreesToRadians
Applying a vectorizable ternary operator on a span and two fixed scalar values.
Method | Job | Categories | Count | Mean | StdDev | Ratio |
---|---|---|---|---|---|---|
Baseline_Double | Scalar | Double | 100 | 105.363 ns | 0.6721 ns | baseline |
System_Double | Scalar | Double | 100 | 104.955 ns | 0.9296 ns | 1.00x faster |
NetFabric_Double | Scalar | Double | 100 | 108.631 ns | 0.4020 ns | 1.03x slower |
Baseline_Double | Vector128 | Double | 100 | 104.213 ns | 0.4783 ns | 1.01x faster |
System_Double | Vector128 | Double | 100 | 70.537 ns | 0.5717 ns | 1.49x faster |
NetFabric_Double | Vector128 | Double | 100 | 50.372 ns | 0.3891 ns | 2.09x faster |
Baseline_Double | Vector256 | Double | 100 | 104.494 ns | 0.6641 ns | 1.01x faster |
System_Double | Vector256 | Double | 100 | 50.886 ns | 0.3590 ns | 2.07x faster |
NetFabric_Double | Vector256 | Double | 100 | 23.781 ns | 0.1212 ns | 4.43x faster |
Baseline_Double | Vector512 | Double | 100 | 105.602 ns | 0.7539 ns | 1.00x slower |
System_Double | Vector512 | Double | 100 | 183.516 ns | 0.8624 ns | 1.74x slower |
NetFabric_Double | Vector512 | Double | 100 | 23.738 ns | 0.0542 ns | 4.44x faster |
Baseline_Float | Scalar | Float | 100 | 64.749 ns | 0.2191 ns | baseline |
System_Float | Scalar | Float | 100 | 65.010 ns | 0.2194 ns | 1.00x slower |
NetFabric_Float | Scalar | Float | 100 | 65.193 ns | 0.3212 ns | 1.01x slower |
Baseline_Float | Vector128 | Float | 100 | 64.669 ns | 0.2393 ns | 1.00x faster |
System_Float | Vector128 | Float | 100 | 39.316 ns | 0.1564 ns | 1.65x faster |
NetFabric_Float | Vector128 | Float | 100 | 15.905 ns | 0.0988 ns | 4.07x faster |
Baseline_Float | Vector256 | Float | 100 | 64.664 ns | 0.3021 ns | 1.00x faster |
System_Float | Vector256 | Float | 100 | 29.404 ns | 0.3665 ns | 2.20x faster |
NetFabric_Float | Vector256 | Float | 100 | 9.523 ns | 0.0493 ns | 6.80x faster |
Baseline_Float | Vector512 | Float | 100 | 64.599 ns | 0.2632 ns | 1.00x faster |
System_Float | Vector512 | Float | 100 | 77.022 ns | 0.7295 ns | 1.19x slower |
NetFabric_Float | Vector512 | Float | 100 | 12.352 ns | 0.0653 ns | 5.24x faster |
Baseline_Half | Scalar | Half | 100 | 999.253 ns | 6.6407 ns | baseline |
System_Half | Scalar | Half | 100 | 1,009.173 ns | 4.2703 ns | 1.01x slower |
NetFabric_Half | Scalar | Half | 100 | 2,561.898 ns | 7.8650 ns | 2.56x slower |
Baseline_Half | Vector128 | Half | 100 | 905.334 ns | 3.8491 ns | 1.10x faster |
System_Half | Vector128 | Half | 100 | 892.921 ns | 1.9791 ns | 1.12x faster |
NetFabric_Half | Vector128 | Half | 100 | 2,420.872 ns | 9.8031 ns | 2.42x slower |
Baseline_Half | Vector256 | Half | 100 | 903.875 ns | 2.1924 ns | 1.11x faster |
System_Half | Vector256 | Half | 100 | 892.802 ns | 1.8886 ns | 1.12x faster |
NetFabric_Half | Vector256 | Half | 100 | 2,418.932 ns | 8.0544 ns | 2.42x slower |
Baseline_Half | Vector512 | Half | 100 | 906.720 ns | 4.1213 ns | 1.10x faster |
System_Half | Vector512 | Half | 100 | 894.048 ns | 3.8254 ns | 1.12x faster |
NetFabric_Half | Vector512 | Half | 100 | 2,424.812 ns | 13.1376 ns | 2.43x slower |
Sum
Applying a vectorizable aggregation operator on a span.
It additionally compares with the performance of LINQ's Sum()
. However, it's worth noting that this method lacks support for the types short
and Half
. In such instances, LINQ's Aggregate()
is employed instead.
Method | Job | Categories | Count | Mean | StdDev | Ratio |
---|---|---|---|---|---|---|
Baseline_Double | Scalar | Double | 100 | 47.326 ns | 0.5851 ns | baseline |
System_Double | Scalar | Double | 100 | 39.147 ns | 0.7875 ns | 1.21x faster |
NetFabric_Double | Scalar | Double | 100 | 61.619 ns | 0.2195 ns | 1.30x slower |
Baseline_Double | Vector128 | Double | 100 | 42.618 ns | 0.3416 ns | 1.11x faster |
System_Double | Vector128 | Double | 100 | 11.973 ns | 0.1163 ns | 3.95x faster |
NetFabric_Double | Vector128 | Double | 100 | 29.314 ns | 0.5716 ns | 1.62x faster |
Baseline_Double | Vector256 | Double | 100 | 42.449 ns | 0.2493 ns | 1.11x faster |
System_Double | Vector256 | Double | 100 | 6.136 ns | 0.0649 ns | 7.71x faster |
NetFabric_Double | Vector256 | Double | 100 | 11.261 ns | 0.1015 ns | 4.20x faster |
Baseline_Double | Vector512 | Double | 100 | 42.334 ns | 0.3927 ns | 1.12x faster |
System_Double | Vector512 | Double | 100 | 5.625 ns | 0.0293 ns | 8.42x faster |
NetFabric_Double | Vector512 | Double | 100 | 13.771 ns | 0.1643 ns | 3.44x faster |
Baseline_Float | Scalar | Float | 100 | 42.699 ns | 0.2814 ns | baseline |
System_Float | Scalar | Float | 100 | 38.877 ns | 0.1420 ns | 1.10x faster |
NetFabric_Float | Scalar | Float | 100 | 61.195 ns | 1.1528 ns | 1.43x slower |
Baseline_Float | Vector128 | Float | 100 | 42.135 ns | 0.1684 ns | 1.01x faster |
System_Float | Vector128 | Float | 100 | 6.284 ns | 0.0545 ns | 6.79x faster |
NetFabric_Float | Vector128 | Float | 100 | 10.830 ns | 0.0863 ns | 3.94x faster |
Baseline_Float | Vector256 | Float | 100 | 42.783 ns | 0.2612 ns | 1.00x slower |
System_Float | Vector256 | Float | 100 | 4.567 ns | 0.0313 ns | 9.35x faster |
NetFabric_Float | Vector256 | Float | 100 | 7.253 ns | 0.0512 ns | 5.88x faster |
Baseline_Float | Vector512 | Float | 100 | 42.215 ns | 0.2125 ns | 1.01x faster |
System_Float | Vector512 | Float | 100 | 4.829 ns | 0.0459 ns | 8.84x faster |
NetFabric_Float | Vector512 | Float | 100 | 8.068 ns | 0.0579 ns | 5.29x faster |
Baseline_Half | Scalar | Half | 100 | 1,248.404 ns | 3.8695 ns | baseline |
System_Half | Scalar | Half | 100 | 1,261.046 ns | 3.3137 ns | 1.01x slower |
NetFabric_Half | Scalar | Half | 100 | 1,246.705 ns | 5.5448 ns | 1.00x faster |
Baseline_Half | Vector128 | Half | 100 | 1,209.489 ns | 3.9070 ns | 1.03x faster |
System_Half | Vector128 | Half | 100 | 1,226.410 ns | 1.3758 ns | 1.02x faster |
NetFabric_Half | Vector128 | Half | 100 | 1,213.822 ns | 8.9948 ns | 1.03x faster |
Baseline_Half | Vector256 | Half | 100 | 1,208.729 ns | 2.2773 ns | 1.03x faster |
System_Half | Vector256 | Half | 100 | 1,228.855 ns | 3.7491 ns | 1.02x faster |
NetFabric_Half | Vector256 | Half | 100 | 1,208.195 ns | 1.4436 ns | 1.03x faster |
Baseline_Half | Vector512 | Half | 100 | 1,236.813 ns | 12.7384 ns | 1.01x faster |
System_Half | Vector512 | Half | 100 | 1,249.052 ns | 11.0453 ns | 1.00x slower |
NetFabric_Half | Vector512 | Half | 100 | 1,251.404 ns | 26.3882 ns | 1.01x slower |
Baseline_Int | Scalar | Int | 100 | 27.755 ns | 0.1998 ns | baseline |
LINQ_Int | Scalar | Int | 100 | 26.892 ns | 0.1653 ns | 1.03x faster |
System_Int | Scalar | Int | 100 | 29.587 ns | 0.1430 ns | 1.07x slower |
NetFabric_Int | Scalar | Int | 100 | 26.959 ns | 0.1393 ns | 1.03x faster |
Baseline_Int | Vector128 | Int | 100 | 27.303 ns | 0.2011 ns | 1.02x faster |
LINQ_Int | Vector128 | Int | 100 | 8.549 ns | 0.0409 ns | 3.25x faster |
System_Int | Vector128 | Int | 100 | 5.547 ns | 0.0283 ns | 5.00x faster |
NetFabric_Int | Vector128 | Int | 100 | 11.656 ns | 0.0453 ns | 2.38x faster |
Baseline_Int | Vector256 | Int | 100 | 27.162 ns | 0.1590 ns | 1.02x faster |
LINQ_Int | Vector256 | Int | 100 | 6.058 ns | 0.0549 ns | 4.58x faster |
System_Int | Vector256 | Int | 100 | 4.551 ns | 0.0410 ns | 6.10x faster |
NetFabric_Int | Vector256 | Int | 100 | 6.314 ns | 0.0908 ns | 4.40x faster |
Baseline_Int | Vector512 | Int | 100 | 45.141 ns | 0.3433 ns | 1.63x slower |
LINQ_Int | Vector512 | Int | 100 | 5.990 ns | 0.0493 ns | 4.64x faster |
System_Int | Vector512 | Int | 100 | 4.382 ns | 0.0712 ns | 6.34x faster |
NetFabric_Int | Vector512 | Int | 100 | 6.208 ns | 0.0640 ns | 4.47x faster |
Baseline_Long | Scalar | Long | 100 | 45.088 ns | 0.3162 ns | baseline |
LINQ_Long | Scalar | Long | 100 | 26.957 ns | 0.1937 ns | 1.67x faster |
System_Long | Scalar | Long | 100 | 30.365 ns | 0.2084 ns | 1.48x faster |
NetFabric_Long | Scalar | Long | 100 | 27.270 ns | 0.2998 ns | 1.65x faster |
Baseline_Long | Vector128 | Long | 100 | 45.053 ns | 0.1710 ns | 1.00x faster |
LINQ_Long | Vector128 | Long | 100 | 28.189 ns | 0.6928 ns | 1.62x faster |
System_Long | Vector128 | Long | 100 | 8.667 ns | 0.0958 ns | 5.20x faster |
NetFabric_Long | Vector128 | Long | 100 | 23.970 ns | 0.2298 ns | 1.88x faster |
Baseline_Long | Vector256 | Long | 100 | 44.878 ns | 0.2091 ns | 1.00x faster |
LINQ_Long | Vector256 | Long | 100 | 8.948 ns | 0.0359 ns | 5.04x faster |
System_Long | Vector256 | Long | 100 | 6.441 ns | 0.0544 ns | 7.00x faster |
NetFabric_Long | Vector256 | Long | 100 | 11.967 ns | 0.0714 ns | 3.77x faster |
Baseline_Long | Vector512 | Long | 100 | 27.126 ns | 0.1424 ns | 1.66x faster |
LINQ_Long | Vector512 | Long | 100 | 9.445 ns | 0.0874 ns | 4.77x faster |
System_Long | Vector512 | Long | 100 | 9.783 ns | 0.2136 ns | 4.62x faster |
NetFabric_Long | Vector512 | Long | 100 | 11.440 ns | 0.1473 ns | 3.94x faster |
Baseline_Short | Scalar | Short | 100 | 39.626 ns | 0.1431 ns | baseline |
System_Short | Scalar | Short | 100 | 41.646 ns | 0.1554 ns | 1.05x slower |
NetFabric_Short | Scalar | Short | 100 | 45.522 ns | 0.4657 ns | 1.15x slower |
Baseline_Short | Vector128 | Short | 100 | 45.966 ns | 0.2654 ns | 1.16x slower |
System_Short | Vector128 | Short | 100 | 4.729 ns | 0.0337 ns | 8.38x faster |
NetFabric_Short | Vector128 | Short | 100 | 7.206 ns | 0.0413 ns | 5.50x faster |
Baseline_Short | Vector256 | Short | 100 | 45.998 ns | 0.2975 ns | 1.16x slower |
System_Short | Vector256 | Short | 100 | 3.947 ns | 0.0294 ns | 10.04x faster |
NetFabric_Short | Vector256 | Short | 100 | 4.827 ns | 0.0566 ns | 8.21x faster |
Baseline_Short | Vector512 | Short | 100 | 38.637 ns | 0.2469 ns | 1.03x faster |
System_Short | Vector512 | Short | 100 | 4.740 ns | 0.0400 ns | 8.36x faster |
NetFabric_Short | Vector512 | Short | 100 | 4.960 ns | 0.0394 ns | 7.99x faster |
Sum2D
Applying a vectorizable aggregation operator on a span with two contiguos values for each element.
It also compares to the performance of LINQ's Aggregate()
, as LINQ's Sum()
does not support non-native numeric types.
Method | Job | Categories | Count | Mean | StdDev | Median | Ratio |
---|---|---|---|---|---|---|---|
Baseline_Double | Scalar | Double | 100 | 52.96 ns | 0.183 ns | 52.99 ns | baseline |
NetFabric_Double | Scalar | Double | 100 | 35.17 ns | 0.153 ns | 35.18 ns | 1.51x faster |
Baseline_Double | Vector128 | Double | 100 | 53.88 ns | 0.522 ns | 53.94 ns | 1.02x slower |
NetFabric_Double | Vector128 | Double | 100 | 48.78 ns | 0.257 ns | 48.70 ns | 1.09x faster |
Baseline_Double | Vector256 | Double | 100 | 54.02 ns | 0.499 ns | 53.96 ns | 1.02x slower |
NetFabric_Double | Vector256 | Double | 100 | 24.94 ns | 0.093 ns | 24.97 ns | 2.12x faster |
Baseline_Double | Vector512 | Double | 100 | 53.78 ns | 0.319 ns | 53.67 ns | 1.01x slower |
NetFabric_Double | Vector512 | Double | 100 | 25.16 ns | 0.064 ns | 25.19 ns | 2.10x faster |
Baseline_Float | Scalar | Float | 100 | 222.75 ns | 0.541 ns | 222.69 ns | baseline |
NetFabric_Float | Scalar | Float | 100 | 36.00 ns | 0.063 ns | 36.00 ns | 6.18x faster |
Baseline_Float | Vector128 | Float | 100 | 222.88 ns | 0.807 ns | 222.87 ns | 1.00x slower |
NetFabric_Float | Vector128 | Float | 100 | 28.59 ns | 0.117 ns | 28.60 ns | 7.79x faster |
Baseline_Float | Vector256 | Float | 100 | 222.69 ns | 0.407 ns | 222.74 ns | 1.00x slower |
NetFabric_Float | Vector256 | Float | 100 | 16.06 ns | 0.072 ns | 16.05 ns | 13.87x faster |
Baseline_Float | Vector512 | Float | 100 | 222.47 ns | 0.646 ns | 222.54 ns | 1.00x faster |
NetFabric_Float | Vector512 | Float | 100 | 16.09 ns | 0.047 ns | 16.08 ns | 13.84x faster |
Baseline_Half | Scalar | Half | 100 | 2,009.62 ns | 1.739 ns | 2,009.29 ns | baseline |
NetFabric_Half | Scalar | Half | 100 | 2,031.01 ns | 2.899 ns | 2,029.84 ns | 1.01x slower |
Baseline_Half | Vector128 | Half | 100 | 1,791.11 ns | 3.574 ns | 1,791.90 ns | 1.12x faster |
NetFabric_Half | Vector128 | Half | 100 | 1,789.25 ns | 6.415 ns | 1,789.96 ns | 1.12x faster |
Baseline_Half | Vector256 | Half | 100 | 1,790.95 ns | 2.931 ns | 1,791.40 ns | 1.12x faster |
NetFabric_Half | Vector256 | Half | 100 | 1,786.29 ns | 8.328 ns | 1,785.79 ns | 1.13x faster |
Baseline_Half | Vector512 | Half | 100 | 1,792.03 ns | 5.916 ns | 1,792.21 ns | 1.12x faster |
NetFabric_Half | Vector512 | Half | 100 | 1,784.97 ns | 2.908 ns | 1,785.13 ns | 1.13x faster |
Baseline_Int | Scalar | Int | 100 | 155.52 ns | 1.035 ns | 155.73 ns | baseline |
NetFabric_Int | Scalar | Int | 100 | 46.21 ns | 0.312 ns | 46.16 ns | 3.37x faster |
Baseline_Int | Vector128 | Int | 100 | 155.49 ns | 0.525 ns | 155.47 ns | 1.00x faster |
NetFabric_Int | Vector128 | Int | 100 | 18.09 ns | 0.191 ns | 18.02 ns | 8.60x faster |
Baseline_Int | Vector256 | Int | 100 | 155.41 ns | 0.830 ns | 155.68 ns | 1.00x faster |
NetFabric_Int | Vector256 | Int | 100 | 16.31 ns | 0.083 ns | 16.30 ns | 9.54x faster |
Baseline_Int | Vector512 | Int | 100 | 155.47 ns | 0.662 ns | 155.39 ns | 1.00x faster |
NetFabric_Int | Vector512 | Int | 100 | 15.23 ns | 0.163 ns | 15.22 ns | 10.21x faster |
Baseline_Long | Scalar | Long | 100 | 46.74 ns | 0.214 ns | 46.76 ns | baseline |
NetFabric_Long | Scalar | Long | 100 | 44.23 ns | 0.180 ns | 44.27 ns | 1.06x faster |
Baseline_Long | Vector128 | Long | 100 | 47.42 ns | 0.634 ns | 47.11 ns | 1.01x slower |
NetFabric_Long | Vector128 | Long | 100 | 33.77 ns | 0.260 ns | 33.78 ns | 1.38x faster |
Baseline_Long | Vector256 | Long | 100 | 46.83 ns | 0.193 ns | 46.80 ns | 1.00x slower |
NetFabric_Long | Vector256 | Long | 100 | 24.60 ns | 0.166 ns | 24.55 ns | 1.90x faster |
Baseline_Long | Vector512 | Long | 100 | 46.75 ns | 0.240 ns | 46.73 ns | 1.00x slower |
NetFabric_Long | Vector512 | Long | 100 | 16.93 ns | 0.201 ns | 16.87 ns | 2.76x faster |
Baseline_Short | Scalar | Short | 100 | 206.23 ns | 0.565 ns | 206.34 ns | baseline |
NetFabric_Short | Scalar | Short | 100 | 70.70 ns | 3.636 ns | 69.14 ns | 2.81x faster |
Baseline_Short | Vector128 | Short | 100 | 206.65 ns | 0.640 ns | 206.62 ns | 1.00x slower |
NetFabric_Short | Vector128 | Short | 100 | 15.54 ns | 0.122 ns | 15.52 ns | 13.27x faster |
Baseline_Short | Vector256 | Short | 100 | 206.70 ns | 0.756 ns | 206.53 ns | 1.00x slower |
NetFabric_Short | Vector256 | Short | 100 | 19.11 ns | 0.077 ns | 19.12 ns | 10.79x faster |
Baseline_Short | Vector512 | Short | 100 | 206.66 ns | 1.016 ns | 206.27 ns | 1.00x slower |
NetFabric_Short | Vector512 | Short | 100 | 17.90 ns | 0.213 ns | 17.78 ns | 11.52x faster |
Sum3D
Applying a vectorizable aggregation operator on a span with three contiguos values for each element.
It also compares to the performance of LINQ's Aggregate()
, as LINQ's Sum()
does not support non-native numeric types.
Method | Job | Categories | Count | Mean | StdDev | Median | Ratio |
---|---|---|---|---|---|---|---|
Baseline_Double | Scalar | Double | 100 | 63.13 ns | 1.191 ns | 62.88 ns | baseline |
NetFabric_Double | Scalar | Double | 100 | 64.18 ns | 0.660 ns | 64.02 ns | 1.02x slower |
Baseline_Double | Vector128 | Double | 100 | 64.47 ns | 0.678 ns | 64.20 ns | 1.02x slower |
NetFabric_Double | Vector128 | Double | 100 | 136.92 ns | 2.452 ns | 137.40 ns | 2.17x slower |
Baseline_Double | Vector256 | Double | 100 | 65.29 ns | 1.422 ns | 65.45 ns | 1.04x slower |
NetFabric_Double | Vector256 | Double | 100 | 77.85 ns | 3.706 ns | 77.32 ns | 1.20x slower |
Baseline_Double | Vector512 | Double | 100 | 65.87 ns | 1.049 ns | 66.15 ns | 1.04x slower |
NetFabric_Double | Vector512 | Double | 100 | 78.31 ns | 1.251 ns | 78.17 ns | 1.24x slower |
Baseline_Float | Scalar | Float | 100 | 64.87 ns | 1.669 ns | 65.03 ns | baseline |
NetFabric_Float | Scalar | Float | 100 | 66.72 ns | 0.964 ns | 67.09 ns | 1.02x slower |
Baseline_Float | Vector128 | Float | 100 | 66.07 ns | 1.175 ns | 66.50 ns | 1.01x slower |
NetFabric_Float | Vector128 | Float | 100 | 70.49 ns | 1.354 ns | 70.22 ns | 1.08x slower |
Baseline_Float | Vector256 | Float | 100 | 64.62 ns | 0.888 ns | 64.97 ns | 1.01x faster |
NetFabric_Float | Vector256 | Float | 100 | 47.88 ns | 0.768 ns | 47.76 ns | 1.37x faster |
Baseline_Float | Vector512 | Float | 100 | 65.29 ns | 1.584 ns | 65.19 ns | 1.01x slower |
NetFabric_Float | Vector512 | Float | 100 | 47.94 ns | 0.577 ns | 48.03 ns | 1.36x faster |
Baseline_Half | Scalar | Half | 100 | 3,027.82 ns | 11.188 ns | 3,028.59 ns | baseline |
NetFabric_Half | Scalar | Half | 100 | 3,011.36 ns | 27.446 ns | 3,002.45 ns | 1.01x faster |
Baseline_Half | Vector128 | Half | 100 | 2,697.99 ns | 23.967 ns | 2,692.15 ns | 1.12x faster |
NetFabric_Half | Vector128 | Half | 100 | 2,648.99 ns | 16.402 ns | 2,648.01 ns | 1.14x faster |
Baseline_Half | Vector256 | Half | 100 | 2,674.79 ns | 14.089 ns | 2,675.71 ns | 1.13x faster |
NetFabric_Half | Vector256 | Half | 100 | 2,657.25 ns | 22.551 ns | 2,652.22 ns | 1.14x faster |
Baseline_Half | Vector512 | Half | 100 | 2,678.09 ns | 13.898 ns | 2,678.40 ns | 1.13x faster |
NetFabric_Half | Vector512 | Half | 100 | 2,652.20 ns | 17.853 ns | 2,650.41 ns | 1.14x faster |
Baseline_Int | Scalar | Int | 100 | 55.13 ns | 0.827 ns | 54.86 ns | baseline |
NetFabric_Int | Scalar | Int | 100 | 77.42 ns | 2.504 ns | 77.02 ns | 1.41x slower |
Baseline_Int | Vector128 | Int | 100 | 56.04 ns | 0.775 ns | 55.79 ns | 1.02x slower |
NetFabric_Int | Vector128 | Int | 100 | 57.54 ns | 0.452 ns | 57.52 ns | 1.04x slower |
Baseline_Int | Vector256 | Int | 100 | 56.18 ns | 0.528 ns | 56.14 ns | 1.02x slower |
NetFabric_Int | Vector256 | Int | 100 | 46.41 ns | 1.023 ns | 46.17 ns | 1.18x faster |
Baseline_Int | Vector512 | Int | 100 | 56.17 ns | 0.538 ns | 55.81 ns | 1.02x slower |
NetFabric_Int | Vector512 | Int | 100 | 47.44 ns | 1.304 ns | 47.39 ns | 1.17x faster |
Baseline_Long | Scalar | Long | 100 | 55.58 ns | 0.647 ns | 55.60 ns | baseline |
NetFabric_Long | Scalar | Long | 100 | 79.05 ns | 3.349 ns | 78.05 ns | 1.45x slower |
Baseline_Long | Vector128 | Long | 100 | 53.84 ns | 0.598 ns | 53.97 ns | 1.03x faster |
NetFabric_Long | Vector128 | Long | 100 | 115.99 ns | 4.999 ns | 114.15 ns | 2.20x slower |
Baseline_Long | Vector256 | Long | 100 | 54.18 ns | 1.042 ns | 53.96 ns | 1.03x faster |
NetFabric_Long | Vector256 | Long | 100 | 65.29 ns | 0.788 ns | 64.94 ns | 1.17x slower |
Baseline_Long | Vector512 | Long | 100 | 53.62 ns | 0.473 ns | 53.47 ns | 1.04x faster |
NetFabric_Long | Vector512 | Long | 100 | 64.74 ns | 0.961 ns | 64.99 ns | 1.17x slower |
Baseline_Short | Scalar | Short | 100 | 104.98 ns | 2.734 ns | 103.79 ns | baseline |
NetFabric_Short | Scalar | Short | 100 | 116.80 ns | 0.871 ns | 116.76 ns | 1.11x slower |
Baseline_Short | Vector128 | Short | 100 | 105.19 ns | 1.461 ns | 104.58 ns | 1.00x slower |
NetFabric_Short | Vector128 | Short | 100 | 54.18 ns | 2.162 ns | 53.45 ns | 1.93x faster |
Baseline_Short | Vector256 | Short | 100 | 104.76 ns | 0.832 ns | 104.62 ns | 1.00x faster |
NetFabric_Short | Vector256 | Short | 100 | 51.95 ns | 0.549 ns | 51.96 ns | 2.02x faster |
Baseline_Short | Vector512 | Short | 100 | 104.62 ns | 0.647 ns | 104.67 ns | 1.00x faster |
NetFabric_Short | Vector512 | Short | 100 | 52.01 ns | 0.386 ns | 51.97 ns | 2.02x faster |
Sum4D
Applying a vectorizable aggregation operator on a span with four contiguos values for each element.
It also compares to the performance of LINQ's Aggregate()
, as LINQ's Sum()
does not support non-native numeric types.
Method | Job | Categories | Count | Mean | StdDev | Median | Ratio |
---|---|---|---|---|---|---|---|
Baseline_Double | Scalar | Double | 100 | 70.27 ns | 1.159 ns | 70.54 ns | baseline |
NetFabric_Double | Scalar | Double | 100 | 74.65 ns | 1.107 ns | 74.23 ns | 1.06x slower |
Baseline_Double | Vector128 | Double | 100 | 66.23 ns | 0.673 ns | 66.16 ns | 1.06x faster |
NetFabric_Double | Vector128 | Double | 100 | 266.95 ns | 3.067 ns | 267.10 ns | 3.80x slower |
Baseline_Double | Vector256 | Double | 100 | 70.21 ns | 2.883 ns | 70.53 ns | 1.05x faster |
NetFabric_Double | Vector256 | Double | 100 | 51.89 ns | 0.490 ns | 51.98 ns | 1.35x faster |
Baseline_Double | Vector512 | Double | 100 | 71.16 ns | 1.484 ns | 71.40 ns | 1.01x slower |
NetFabric_Double | Vector512 | Double | 100 | 53.24 ns | 1.668 ns | 52.35 ns | 1.32x faster |
Baseline_Float | Scalar | Float | 100 | 70.08 ns | 1.180 ns | 70.32 ns | baseline |
NetFabric_Float | Scalar | Float | 100 | 78.04 ns | 0.886 ns | 78.15 ns | 1.11x slower |
Baseline_Float | Vector128 | Float | 100 | 70.11 ns | 1.266 ns | 70.35 ns | 1.00x slower |
NetFabric_Float | Vector128 | Float | 100 | 51.27 ns | 0.503 ns | 51.20 ns | 1.37x faster |
Baseline_Float | Vector256 | Float | 100 | 70.51 ns | 0.976 ns | 70.50 ns | 1.01x slower |
NetFabric_Float | Vector256 | Float | 100 | 28.12 ns | 0.812 ns | 27.82 ns | 2.50x faster |
Baseline_Float | Vector512 | Float | 100 | 71.39 ns | 1.198 ns | 71.67 ns | 1.02x slower |
NetFabric_Float | Vector512 | Float | 100 | 28.20 ns | 0.806 ns | 27.82 ns | 2.47x faster |
Baseline_Half | Scalar | Half | 100 | 3,619.66 ns | 24.005 ns | 3,625.74 ns | baseline |
NetFabric_Half | Scalar | Half | 100 | 4,107.22 ns | 39.515 ns | 4,099.78 ns | 1.14x slower |
Baseline_Half | Vector128 | Half | 100 | 2,901.28 ns | 36.231 ns | 2,907.40 ns | 1.25x faster |
NetFabric_Half | Vector128 | Half | 100 | 3,621.06 ns | 50.761 ns | 3,612.47 ns | 1.00x slower |
Baseline_Half | Vector256 | Half | 100 | 2,926.03 ns | 25.950 ns | 2,928.69 ns | 1.24x faster |
NetFabric_Half | Vector256 | Half | 100 | 3,641.74 ns | 34.245 ns | 3,637.19 ns | 1.01x slower |
Baseline_Half | Vector512 | Half | 100 | 2,904.28 ns | 25.974 ns | 2,903.98 ns | 1.25x faster |
NetFabric_Half | Vector512 | Half | 100 | 3,620.77 ns | 30.743 ns | 3,612.92 ns | 1.00x slower |
Baseline_Int | Scalar | Int | 100 | 64.53 ns | 1.218 ns | 64.64 ns | baseline |
NetFabric_Int | Scalar | Int | 100 | 91.91 ns | 0.867 ns | 92.21 ns | 1.42x slower |
Baseline_Int | Vector128 | Int | 100 | 66.54 ns | 1.323 ns | 66.69 ns | 1.03x slower |
NetFabric_Int | Vector128 | Int | 100 | 35.50 ns | 0.474 ns | 35.67 ns | 1.82x faster |
Baseline_Int | Vector256 | Int | 100 | 66.49 ns | 0.943 ns | 66.77 ns | 1.03x slower |
NetFabric_Int | Vector256 | Int | 100 | 19.73 ns | 0.320 ns | 19.69 ns | 3.27x faster |
Baseline_Int | Vector512 | Int | 100 | 67.16 ns | 2.221 ns | 66.95 ns | 1.04x slower |
NetFabric_Int | Vector512 | Int | 100 | 19.66 ns | 0.254 ns | 19.65 ns | 3.29x faster |
Baseline_Long | Scalar | Long | 100 | 68.33 ns | 1.170 ns | 68.47 ns | baseline |
NetFabric_Long | Scalar | Long | 100 | 92.38 ns | 3.018 ns | 91.48 ns | 1.36x slower |
Baseline_Long | Vector128 | Long | 100 | 65.33 ns | 0.962 ns | 65.22 ns | 1.05x faster |
NetFabric_Long | Vector128 | Long | 100 | 224.12 ns | 2.034 ns | 223.69 ns | 3.28x slower |
Baseline_Long | Vector256 | Long | 100 | 66.62 ns | 1.365 ns | 66.62 ns | 1.03x faster |
NetFabric_Long | Vector256 | Long | 100 | 36.37 ns | 0.685 ns | 36.52 ns | 1.88x faster |
Baseline_Long | Vector512 | Long | 100 | 66.71 ns | 1.552 ns | 66.69 ns | 1.02x faster |
NetFabric_Long | Vector512 | Long | 100 | 36.98 ns | 0.629 ns | 37.20 ns | 1.85x faster |
Baseline_Short | Scalar | Short | 100 | 96.96 ns | 1.010 ns | 97.01 ns | baseline |
NetFabric_Short | Scalar | Short | 100 | 137.71 ns | 2.069 ns | 138.09 ns | 1.42x slower |
Baseline_Short | Vector128 | Short | 100 | 96.97 ns | 0.634 ns | 96.82 ns | 1.00x slower |
NetFabric_Short | Vector128 | Short | 100 | 22.32 ns | 0.225 ns | 22.30 ns | 4.35x faster |
Baseline_Short | Vector256 | Short | 100 | 96.67 ns | 2.173 ns | 96.08 ns | 1.00x slower |
NetFabric_Short | Vector256 | Short | 100 | 18.00 ns | 0.273 ns | 17.98 ns | 5.38x faster |
Baseline_Short | Vector512 | Short | 100 | 95.26 ns | 1.179 ns | 94.80 ns | 1.02x faster |
NetFabric_Short | Vector512 | Short | 100 | 21.14 ns | 0.326 ns | 21.10 ns | 4.58x faster |
Min aggregation
Applying a vectorizable aggregation operator with propagation os NaN
.
Method | Job | Categories | Count | Mean | StdDev | Ratio |
---|---|---|---|---|---|---|
Baseline_Double | Scalar | Double | 100 | 88.695 ns | 0.4477 ns | baseline |
System_Double | Scalar | Double | 100 | 77.290 ns | 0.5142 ns | 1.15x faster |
NetFabric_Double | Scalar | Double | 100 | 101.372 ns | 0.8989 ns | 1.14x slower |
Baseline_Double | Vector128 | Double | 100 | 99.235 ns | 0.8462 ns | 1.12x slower |
System_Double | Vector128 | Double | 100 | 37.595 ns | 0.3627 ns | 2.36x faster |
NetFabric_Double | Vector128 | Double | 100 | 74.457 ns | 0.2241 ns | 1.19x faster |
Baseline_Double | Vector256 | Double | 100 | 99.448 ns | 1.1002 ns | 1.12x slower |
System_Double | Vector256 | Double | 100 | 19.433 ns | 0.1401 ns | 4.57x faster |
NetFabric_Double | Vector256 | Double | 100 | 37.875 ns | 0.2884 ns | 2.34x faster |
Baseline_Double | Vector512 | Double | 100 | 72.462 ns | 0.2434 ns | 1.22x faster |
System_Double | Vector512 | Double | 100 | 28.316 ns | 0.1643 ns | 3.13x faster |
NetFabric_Double | Vector512 | Double | 100 | 38.506 ns | 0.2427 ns | 2.30x faster |
Baseline_Float | Scalar | Float | 100 | 100.707 ns | 0.3968 ns | baseline |
System_Float | Scalar | Float | 100 | 74.501 ns | 0.4437 ns | 1.35x faster |
NetFabric_Float | Scalar | Float | 100 | 88.893 ns | 0.3883 ns | 1.13x faster |
Baseline_Float | Vector128 | Float | 100 | 93.727 ns | 0.3726 ns | 1.07x faster |
System_Float | Vector128 | Float | 100 | 17.961 ns | 0.1359 ns | 5.61x faster |
NetFabric_Float | Vector128 | Float | 100 | 36.605 ns | 0.1388 ns | 2.75x faster |
Baseline_Float | Vector256 | Float | 100 | 96.138 ns | 0.7183 ns | 1.05x faster |
System_Float | Vector256 | Float | 100 | 10.558 ns | 0.0520 ns | 9.54x faster |
NetFabric_Float | Vector256 | Float | 100 | 30.799 ns | 0.1294 ns | 3.27x faster |
Baseline_Float | Vector512 | Float | 100 | 72.845 ns | 0.2197 ns | 1.38x faster |
System_Float | Vector512 | Float | 100 | 14.931 ns | 0.2321 ns | 6.75x faster |
NetFabric_Float | Vector512 | Float | 100 | 23.989 ns | 0.0737 ns | 4.20x faster |
Baseline_Half | Scalar | Half | 100 | 1,152.279 ns | 5.4053 ns | baseline |
System_Half | Scalar | Half | 100 | 180.660 ns | 0.6765 ns | 6.38x faster |
NetFabric_Half | Scalar | Half | 100 | 1,128.877 ns | 4.8094 ns | 1.02x faster |
Baseline_Half | Vector128 | Half | 100 | 1,096.543 ns | 5.6732 ns | 1.05x faster |
System_Half | Vector128 | Half | 100 | 180.565 ns | 1.6048 ns | 6.38x faster |
NetFabric_Half | Vector128 | Half | 100 | 1,089.019 ns | 5.2743 ns | 1.06x faster |
Baseline_Half | Vector256 | Half | 100 | 1,096.248 ns | 6.2655 ns | 1.05x faster |
System_Half | Vector256 | Half | 100 | 179.587 ns | 0.4680 ns | 6.42x faster |
NetFabric_Half | Vector256 | Half | 100 | 1,083.846 ns | 5.0029 ns | 1.06x faster |
Baseline_Half | Vector512 | Half | 100 | 1,264.763 ns | 3.0543 ns | 1.10x slower |
System_Half | Vector512 | Half | 100 | 179.811 ns | 0.7011 ns | 6.41x faster |
NetFabric_Half | Vector512 | Half | 100 | 1,263.840 ns | 3.9612 ns | 1.10x slower |
Baseline_Int | Scalar | Int | 100 | 34.827 ns | 0.1450 ns | baseline |
System_Int | Scalar | Int | 100 | 34.970 ns | 0.4225 ns | 1.00x slower |
NetFabric_Int | Scalar | Int | 100 | 35.142 ns | 0.2962 ns | 1.01x slower |
Baseline_Int | Vector128 | Int | 100 | 35.000 ns | 0.2081 ns | 1.00x slower |
System_Int | Vector128 | Int | 100 | 6.652 ns | 0.0848 ns | 5.24x faster |
NetFabric_Int | Vector128 | Int | 100 | 14.140 ns | 0.0707 ns | 2.46x faster |
Baseline_Int | Vector256 | Int | 100 | 34.743 ns | 0.2514 ns | 1.00x faster |
System_Int | Vector256 | Int | 100 | 3.935 ns | 0.0226 ns | 8.85x faster |
NetFabric_Int | Vector256 | Int | 100 | 9.600 ns | 0.0822 ns | 3.63x faster |
Baseline_Int | Vector512 | Int | 100 | 35.223 ns | 0.3104 ns | 1.01x slower |
System_Int | Vector512 | Int | 100 | 2.844 ns | 0.0248 ns | 12.25x faster |
NetFabric_Int | Vector512 | Int | 100 | 9.713 ns | 0.0722 ns | 3.59x faster |
Baseline_Long | Scalar | Long | 100 | 34.862 ns | 0.2250 ns | baseline |
System_Long | Scalar | Long | 100 | 35.469 ns | 0.5156 ns | 1.02x slower |
NetFabric_Long | Scalar | Long | 100 | 35.329 ns | 0.2556 ns | 1.01x slower |
Baseline_Long | Vector128 | Long | 100 | 71.426 ns | 0.3344 ns | 2.05x slower |
System_Long | Vector128 | Long | 100 | 20.049 ns | 0.1214 ns | 1.74x faster |
NetFabric_Long | Vector128 | Long | 100 | 27.073 ns | 0.1602 ns | 1.29x faster |
Baseline_Long | Vector256 | Long | 100 | 71.732 ns | 0.5353 ns | 2.06x slower |
System_Long | Vector256 | Long | 100 | 8.965 ns | 0.0369 ns | 3.89x faster |
NetFabric_Long | Vector256 | Long | 100 | 14.426 ns | 0.1072 ns | 2.42x faster |
Baseline_Long | Vector512 | Long | 100 | 71.470 ns | 0.3791 ns | 2.05x slower |
System_Long | Vector512 | Long | 100 | 6.234 ns | 0.0301 ns | 5.59x faster |
NetFabric_Long | Vector512 | Long | 100 | 13.922 ns | 0.0954 ns | 2.50x faster |
Baseline_Short | Scalar | Short | 100 | 39.547 ns | 0.1179 ns | baseline |
System_Short | Scalar | Short | 100 | 39.819 ns | 0.2629 ns | 1.01x slower |
NetFabric_Short | Scalar | Short | 100 | 39.546 ns | 0.3363 ns | 1.00x slower |
Baseline_Short | Vector128 | Short | 100 | 40.087 ns | 0.1960 ns | 1.01x slower |
System_Short | Vector128 | Short | 100 | 3.567 ns | 0.0207 ns | 11.09x faster |
NetFabric_Short | Vector128 | Short | 100 | 10.459 ns | 0.0921 ns | 3.78x faster |
Baseline_Short | Vector256 | Short | 100 | 40.558 ns | 0.4019 ns | 1.02x slower |
System_Short | Vector256 | Short | 100 | 2.834 ns | 0.0335 ns | 13.94x faster |
NetFabric_Short | Vector256 | Short | 100 | 12.507 ns | 0.1092 ns | 3.16x faster |
Baseline_Short | Vector512 | Short | 100 | 40.248 ns | 0.3349 ns | 1.02x slower |
System_Short | Vector512 | Short | 100 | 2.710 ns | 0.0450 ns | 14.63x faster |
NetFabric_Short | Vector512 | Short | 100 | 13.168 ns | 0.1512 ns | 3.00x faster |
MinMax
Applying two vectorizable aggregation operators on a single iteration of a span.
Method | Job | Categories | Count | Mean | StdDev | Median | Ratio |
---|---|---|---|---|---|---|---|
Baseline_Double | Scalar | Double | 100 | 179.35 ns | 0.721 ns | 179.23 ns | baseline |
NetFabric_Double | Scalar | Double | 100 | 280.46 ns | 1.142 ns | 280.48 ns | 1.56x slower |
Baseline_Double | Vector128 | Double | 100 | 162.75 ns | 0.844 ns | 162.67 ns | 1.10x faster |
NetFabric_Double | Vector128 | Double | 100 | 29.13 ns | 0.211 ns | 29.10 ns | 6.16x faster |
Baseline_Double | Vector256 | Double | 100 | 163.13 ns | 1.237 ns | 162.72 ns | 1.10x faster |
NetFabric_Double | Vector256 | Double | 100 | 23.43 ns | 0.106 ns | 23.45 ns | 7.65x faster |
Baseline_Double | Vector512 | Double | 100 | 85.79 ns | 0.456 ns | 85.89 ns | 2.09x faster |
NetFabric_Double | Vector512 | Double | 100 | 21.53 ns | 0.147 ns | 21.57 ns | 8.33x faster |
Baseline_Float | Scalar | Float | 100 | 170.32 ns | 4.771 ns | 168.32 ns | baseline |
NetFabric_Float | Scalar | Float | 100 | 278.95 ns | 0.886 ns | 278.88 ns | 1.64x slower |
Baseline_Float | Vector128 | Float | 100 | 159.66 ns | 0.784 ns | 159.44 ns | 1.06x faster |
NetFabric_Float | Vector128 | Float | 100 | 23.01 ns | 0.243 ns | 23.11 ns | 7.38x faster |
Baseline_Float | Vector256 | Float | 100 | 160.10 ns | 1.203 ns | 160.16 ns | 1.07x faster |
NetFabric_Float | Vector256 | Float | 100 | 40.18 ns | 0.178 ns | 40.20 ns | 4.25x faster |
Baseline_Float | Vector512 | Float | 100 | 85.27 ns | 0.305 ns | 85.35 ns | 2.00x faster |
NetFabric_Float | Vector512 | Float | 100 | 17.24 ns | 0.068 ns | 17.24 ns | 9.91x faster |
Baseline_Half | Scalar | Half | 100 | 1,556.85 ns | 8.238 ns | 1,555.36 ns | baseline |
NetFabric_Half | Scalar | Half | 100 | 424.22 ns | 2.930 ns | 424.24 ns | 3.67x faster |
Baseline_Half | Vector128 | Half | 100 | 1,326.75 ns | 6.993 ns | 1,326.94 ns | 1.17x faster |
NetFabric_Half | Vector128 | Half | 100 | 443.36 ns | 3.124 ns | 442.03 ns | 3.51x faster |
Baseline_Half | Vector256 | Half | 100 | 1,329.31 ns | 8.359 ns | 1,328.91 ns | 1.17x faster |
NetFabric_Half | Vector256 | Half | 100 | 445.48 ns | 2.852 ns | 445.42 ns | 3.49x faster |
Baseline_Half | Vector512 | Half | 100 | 1,683.51 ns | 5.413 ns | 1,684.22 ns | 1.08x slower |
NetFabric_Half | Vector512 | Half | 100 | 443.25 ns | 3.760 ns | 443.83 ns | 3.52x faster |
Baseline_Int | Scalar | Int | 100 | 56.99 ns | 0.471 ns | 56.94 ns | baseline |
NetFabric_Int | Scalar | Int | 100 | 56.65 ns | 0.396 ns | 56.69 ns | 1.01x faster |
Baseline_Int | Vector128 | Int | 100 | 74.98 ns | 2.101 ns | 74.10 ns | 1.32x slower |
NetFabric_Int | Vector128 | Int | 100 | 15.97 ns | 0.068 ns | 15.97 ns | 3.57x faster |
Baseline_Int | Vector256 | Int | 100 | 58.38 ns | 0.607 ns | 58.35 ns | 1.02x slower |
NetFabric_Int | Vector256 | Int | 100 | 13.69 ns | 0.158 ns | 13.66 ns | 4.16x faster |
Baseline_Int | Vector512 | Int | 100 | 58.12 ns | 0.684 ns | 57.98 ns | 1.02x slower |
NetFabric_Int | Vector512 | Int | 100 | 13.79 ns | 0.087 ns | 13.78 ns | 4.14x faster |
Baseline_Long | Scalar | Long | 100 | 65.65 ns | 6.839 ns | 62.71 ns | baseline |
NetFabric_Long | Scalar | Long | 100 | 59.87 ns | 0.458 ns | 59.96 ns | 1.11x faster |
Baseline_Long | Vector128 | Long | 100 | 116.27 ns | 0.414 ns | 116.27 ns | 1.77x slower |
NetFabric_Long | Vector128 | Long | 100 | 33.53 ns | 0.188 ns | 33.51 ns | 1.97x faster |
Baseline_Long | Vector256 | Long | 100 | 133.37 ns | 0.618 ns | 133.28 ns | 2.02x slower |
NetFabric_Long | Vector256 | Long | 100 | 19.29 ns | 0.081 ns | 19.32 ns | 3.43x faster |
Baseline_Long | Vector512 | Long | 100 | 133.90 ns | 0.909 ns | 133.86 ns | 2.03x slower |
NetFabric_Long | Vector512 | Long | 100 | 18.21 ns | 0.248 ns | 18.15 ns | 3.63x faster |
Baseline_Short | Scalar | Short | 100 | 57.21 ns | 0.209 ns | 57.17 ns | baseline |
NetFabric_Short | Scalar | Short | 100 | 77.70 ns | 1.397 ns | 77.86 ns | 1.36x slower |
Baseline_Short | Vector128 | Short | 100 | 99.30 ns | 0.467 ns | 99.44 ns | 1.74x slower |
NetFabric_Short | Vector128 | Short | 100 | 15.76 ns | 0.133 ns | 15.73 ns | 3.63x faster |
Baseline_Short | Vector256 | Short | 100 | 73.83 ns | 0.381 ns | 73.89 ns | 1.29x slower |
NetFabric_Short | Vector256 | Short | 100 | 23.80 ns | 0.355 ns | 23.95 ns | 2.40x faster |
Baseline_Short | Vector512 | Short | 100 | 74.44 ns | 0.724 ns | 74.45 ns | 1.30x slower |
NetFabric_Short | Vector512 | Short | 100 | 23.79 ns | 0.333 ns | 23.66 ns | 2.41x faster |