LSP Plugins: version 1.1.4

SadKo · 29 Sep 2018

Starting with release 1.1.4, I have released the entire LSP Plugins suite under the GNU LGPL v3 license!

Release 1.1.4 comes with a lot of changes!
First of all, LSP Plugins are now completely open source and licensed under the terms of the GNU LGPL v3!
Additionally, experimental support for the ARMv7-A architecture has been added, primarily for Raspberry Pi 3B/3B+ devices.

The full list of changes is below:

Changed licensing to GNU Lesser General Public License version 3 (GNU LGPL v3).

Moved code repository to GitHub while keeping release history.

Implemented linear impulse response profiler plugin.

Added basic Raspberry Pi 3B/3B+ (ARMv7A) support (experimental).

Implemented unit testing subsystem.

Implemented performance testing subsystem.

Implemented manual testing subsystem.

Fixed and optimized convolution algorithm for convolver module that produced invalid output.

Added LSPC file format implementation.

Added LSPC file format support to convolver plugins.

Huge refactoring: DSP code moved from core to separate subtree.

Partially implemented NEON SIMD instruction support for some DSP assembly functions for ARMv7A architecture.

Fixed bugs in some DSP oversampling routines.

Optimized complex multiplication functions.

Implemented additional complex number routines.

Implemented additional functions to DSP core.

Fixed compilation warnings and errors emitted by the GCC 8 compiler.

Updated development documentation.

Demo for a new plugin:

Rel · 30 Sep 2018

Is there a Linux VST build? My favorite DAW, Cockos Reaper, currently supports only the LVST format on Linux... Of course, I could rig something up through JACK, but it would be easier to use the plugins directly in the DAW...

SadKo · 30 Sep 2018

Rel said:

Is there a Linux VST build? My favorite DAW, Cockos Reaper, currently supports only the LVST format on Linux... Of course, I could rig something up through JACK, but it would be easier to use the plugins directly in the DAW...

Yes, of course.

SadKo · 21 Dec 2018

Meanwhile, after three months of development, I have released a new version of the LSP Plugins collection: 1.1.5.
Release download: https://github.com/sadko4u/lsp-plugins/releases/tag/lsp-plugins-1.1.5

Changes in release 1.1.5:

Implemented stereo version of Profiler plugin.

Added 'Spectralizer' and 'Mastering' modes to the Spectrum Analyzer plugin series.

All SIMD-optimized DSP code now ported to ARMv7A architecture and optimized using ARM NEON instruction set.

Added Frame Buffer primitive support by plugins and widgets.

Implemented RGBA and HSLA color manipulation routines for point array rendering optimizations.

Extended unit and performance test coverage.

Enabled RELRO and PIE options for binaries, simplified the build system.

Implemented optimized DSP functions for minimum and maximum search.

Implemented optimized DSP functions for static biquad processing, dynamic biquad processing, dynamic bilinear transformation.

Extended DSP code with a set of software rendering functions that enhance visual effects.

Added support for the FreeBSD operating system (plugins are available for building in FreeBSD ports).

Improved build process: the PREFIX variable can now be specified to install into a custom path instead of /usr/local.

Fixed building issues under Ubuntu Linux related to compiler and linker flags reordering.

Fixed system character set detection on certain systems that caused text labels to disappear in the UI.

Fixed window decorating issue under the i3 window manager.

Fixed biquad filter processing routines that could cause memory corruption and/or invalid behaviour in certain circumstances.

Fixed serious memory corruption in SSE implementation of fast convolution routines that could cause spontaneous crashes of convolvers.

Fixed buffer underflow in Convolver module that could cause memory corruption and spontaneous crashes of host.

UbIvItS · 21 Dec 2018

SadKo, have you compared the performance with comparable alternatives?

SadKo · 22 Dec 2018

UbIvItS said:

SadKo, have you compared the performance with comparable alternatives?

With alternatives, no; with the native implementation, yes. There is a performance test for every function that implements a given algorithm. The gain varies from 50% to 900%, depending on the function.

UbIvItS · 22 Dec 2018

SadKo said:

The gain varies from 50% to 900%, depending on the function.

Interesting spread. How do you explain why the native code is so slow?

SadKo · 22 Dec 2018

UbIvItS said:

Interesting spread. How do you explain why the native code is so slow?

The first reason is that we work with large batches of data: we process whole arrays rather than individual values.
It also depends on how fast the SIMD instructions themselves are. Some are more efficient, some less. In some places, computing under a condition adds significant instruction overhead.
In some places the functions are slightly simplified compared with the standard library implementation (special values such as +inf, -inf, nan and so on are not handled), so validating the data is delegated to the calling code.
On recent Intel and AMD models, by the way, AVX instructions are more efficient than SSE instructions.
On AMD Bulldozer/Piledriver it was the other way around: only a few cases ran faster with AVX, and the rest were slower than the SSE3 implementation (which can use registers xmm8-xmm15, available in 64-bit mode, so the boost came from executing instructions in parallel on 8-element vectors instead of 4-element ones).
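To make the first and third points above concrete, here is a minimal scalar sketch of a batch logarithm that skips special-value handling. It is not the LSP Plugins code: the names `fast_log2` and `logb2_sketch` and the chosen series are assumptions made for illustration only. The idea is to split the float into exponent and mantissa and approximate the logarithm of the mantissa with a short series, which maps naturally onto SIMD lanes.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical simplified binary logarithm for one value.
// Assumes x is a positive, finite, normalized float. Zero, negative values,
// denormals, +inf, -inf and nan are NOT handled: the caller must filter them.
static inline float fast_log2(float x)
{
    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof(bits));       // reinterpret the float as an integer

    // Unbiased exponent: x = m * 2^e with m in [1, 2)
    const int e = static_cast<int>((bits >> 23) & 0xff) - 127;

    // Force the exponent field to 127 (2^0) to obtain the mantissa m
    bits = (bits & 0x007fffffu) | 0x3f800000u;
    float m;
    std::memcpy(&m, &bits, sizeof(m));

    // ln(m) = 2 * (y + y^3/3 + y^5/5 + ...), where y = (m - 1) / (m + 1)
    const float y    = (m - 1.0f) / (m + 1.0f);
    const float y2   = y * y;
    const float ln_m = 2.0f * y *
        (1.0f + y2 * (1.0f / 3 + y2 * (1.0f / 5 + y2 * (1.0f / 7 + y2 * (1.0f / 9)))));

    return static_cast<float>(e) + ln_m * 1.44269504f;  // multiply by 1 / ln(2)
}

// Batch version: a single call processes the whole buffer
void logb2_sketch(float *dst, const float *src, std::size_t count)
{
    for (std::size_t i = 0; i < count; ++i)
        dst[i] = fast_log2(src[i]);
}
```

Because the loop body has no branches and no special cases, a SIMD version can apply the same steps to 4 or 8 values per instruction. The price is that feeding it zero, a negative number or nan gives a meaningless result.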
Test methodology: for each test, memory-aligned buffers are prepared for the input and output data. The input data is filled with random values within the valid range.
Then a test is run for each function, following this algorithm:
- it runs an outer loop that collects statistics and limits the execution time of the inner loop to N seconds;
- it runs an inner loop that executes M times per outer-loop iteration and, on each iteration, calls the same function, passing pointers to the input and output buffers.
After the test finishes, the statistics are gathered and converted to a human-readable form.
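The two-loop scheme described above can be sketched roughly as follows. This is not the actual LSP Plugins test framework: `bench_stats`, `run_benchmark` and their parameters are hypothetical names, with `limit_sec` playing the role of N and `batch` the role of M.

```cpp
#include <chrono>
#include <cstddef>

// Hypothetical result record (names are illustrative only)
struct bench_stats
{
    std::size_t iterations;  // total number of calls made
    double      seconds;     // total measured time
    double      perf;        // iterations per second
    double      cost_us;     // microseconds per iteration
};

template <class Func>
bench_stats run_benchmark(Func &&func, double limit_sec, std::size_t batch)
{
    using clock = std::chrono::steady_clock;

    std::size_t iterations = 0;
    double elapsed = 0.0;
    const clock::time_point start = clock::now();

    // Outer loop: collects statistics and enforces the time limit (N seconds)
    do
    {
        // Inner loop: calls the same function M times without reading the clock
        for (std::size_t i = 0; i < batch; ++i)
            func();

        iterations += batch;
        elapsed = std::chrono::duration<double>(clock::now() - start).count();
    } while (elapsed < limit_sec);

    bench_stats st;
    st.iterations = iterations;
    st.seconds    = elapsed;
    st.perf       = (elapsed > 0.0) ? iterations / elapsed : 0.0;
    st.cost_us    = (iterations > 0) ? 1e6 * elapsed / iterations : 0.0;
    return st;
}
```

The callable would typically capture pointers to buffers declared with something like `alignas(64)` and filled with random data beforehand. Reading the clock only once per batch keeps the timing overhead small compared with the function under test, which is why the measured time slightly overshoots the limit.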

An example without conditional-computation overhead: computing logarithms.
Native implementation: https://github.com/sadko4u/lsp-plug...419a4754/include/dsp/arch/native/pmath.h#L249
SSE2 implementation: https://github.com/sadko4u/lsp-plugins/blob/master/include/dsp/arch/x86/sse2/pmath/log.h
AVX2 and AVX2+FMA3 implementation: https://github.com/sadko4u/lsp-plugins/blob/master/include/dsp/arch/x86/avx2/pmath/log.h
ARM NEON implementation: https://github.com/sadko4u/lsp-plugins/blob/master/include/dsp/arch/arm/neon-d32/pmath/log.h
Performance test: https://github.com/sadko4u/lsp-plugins/blob/master/src/test/ptest/dsp/pmath/log.cpp

And the performance test results (excerpts):

Code (Text):

--------------------------------------------------------------------------------

LSP version: 1.1.5

--------------------------------------------------------------------------------

CPU information:

Architecture: x86_64

CPU string: AMD Ryzen 7 2700 Eight-Core Processor

CPU model: vendor=AMD, family=0x17, model=0x8

Features: FPU CMOV MMX FXSAVE SSE SSE2 SSE3 SSSE3 SSE4.1 SSE4.2 SSE4A XSAVE FMA3 AVX AVX2

--------------------------------------------------------------------------------

┌Case────────────────────────┬Time[s]┬────Iter┬Samp[s]┬─────Est┬─Perf[i/s]┬Cost[us/i]┬─Rel[%]┐

│native::logb1 x 4096 │ 5.02│ 112000│ 5.00│ 111649│ 22329.91│ 44.7830│ 100.00│

│sse2::logb1 x 4096 │ 5.00│ 747000│ 5.00│ 746700│ 149340.14│ 6.6961│ 668.79│

│avx2::x64_logb1 x 4096 │ 5.00│ 1097000│ 5.00│ 1096636│ 219327.27│ 4.5594│ 982.21│

│avx2::x64_logb1_fma3 x 4096 │ 5.00│ 1143000│ 5.00│ 1142456│ 228491.28│ 4.3765│1023.25│

├────────────────────────────┼───────┼────────┼───────┼────────┼──────────┼──────────┼───────┤

│native::logb2 x 4096 │ 5.01│ 113000│ 5.00│ 112842│ 22568.51│ 44.3095│ 100.00│

│sse2::logb2 x 4096 │ 5.00│ 790000│ 5.00│ 789686│ 157937.36│ 6.3316│ 699.81│

│avx2::x64_logb2 x 4096 │ 5.00│ 1182000│ 5.00│ 1181961│ 236392.25│ 4.2303│1047.44│

│avx2::x64_logb2_fma3 x 4096 │ 5.00│ 1248000│ 5.00│ 1247031│ 249406.31│ 4.0095│1105.11│

├────────────────────────────┼───────┼────────┼───────┼────────┼──────────┼──────────┼───────┤

│native::loge1 x 4096 │ 5.03│ 124000│ 5.00│ 123184│ 24636.98│ 40.5894│ 100.00│

│sse2::loge1 x 4096 │ 5.01│ 749000│ 5.00│ 748214│ 149642.82│ 6.6826│ 607.39│

│avx2::x64_loge1 x 4096 │ 5.00│ 1099000│ 5.00│ 1098541│ 219708.29│ 4.5515│ 891.78│

│avx2::x64_loge1_fma3 x 4096 │ 5.00│ 1097000│ 5.00│ 1096590│ 219318.19│ 4.5596│ 890.20│

├────────────────────────────┼───────┼────────┼───────┼────────┼──────────┼──────────┼───────┤

│native::loge2 x 4096 │ 5.03│ 123000│ 5.00│ 122298│ 24459.68│ 40.8836│ 100.00│

│sse2::loge2 x 4096 │ 5.01│ 792000│ 5.00│ 791205│ 158241.13│ 6.3195│ 646.95│

│avx2::x64_loge2 x 4096 │ 5.00│ 1190000│ 5.00│ 1189911│ 237982.34│ 4.2020│ 972.96│

│avx2::x64_loge2_fma3 x 4096 │ 5.00│ 1194000│ 5.00│ 1193091│ 238618.32│ 4.1908│ 975.56│

├────────────────────────────┼───────┼────────┼───────┼────────┼──────────┼──────────┼───────┤

│native::logd1 x 4096 │ 5.06│ 82000│ 5.00│ 81093│ 16218.80│ 61.6569│ 100.00│

│sse2::logd1 x 4096 │ 5.00│ 740000│ 5.00│ 739481│ 147896.32│ 6.7615│ 911.88│

│avx2::x64_logd1 x 4096 │ 5.00│ 1090000│ 5.00│ 1089058│ 217811.77│ 4.5911│1342.96│

│avx2::x64_logd1_fma3 x 4096 │ 5.00│ 1133000│ 5.00│ 1132365│ 226473.08│ 4.4155│1396.36│

├────────────────────────────┼───────┼────────┼───────┼────────┼──────────┼──────────┼───────┤

│native::logd2 x 4096 │ 5.01│ 82000│ 5.00│ 81812│ 16362.48│ 61.1154│ 100.00│

│sse2::logd2 x 4096 │ 5.00│ 784000│ 5.00│ 783723│ 156744.73│ 6.3798│ 957.95│

│avx2::x64_logd2 x 4096 │ 5.00│ 1185000│ 5.00│ 1184242│ 236848.42│ 4.2221│1447.51│

│avx2::x64_logd2_fma3 x 4096 │ 5.00│ 1239000│ 5.00│ 1238033│ 247606.72│ 4.0387│1513.26│

└────────────────────────────┴───────┴────────┴───────┴────────┴──────────┴──────────┴───────┘

--------------------------------------------------------------------------------

CPU information:

Architecture: x86_64

CPU string: AMD Athlon(tm) 64 X2 Dual Core Processor 4400+

CPU model: vendor=AMD, family=0xf, model=0x6b

Features: FPU CMOV MMX SSE SSE2 SSE3

--------------------------------------------------------------------------------

┌Case─────────────────┬Time[s]┬───Iter┬Samp[s]┬────Est┬Perf[i/s]┬Cost[us/i]┬Rel[%]┐

│native::logb1 x 4096 │ 5.04│ 27000│ 5.00│ 26795│ 5359.17│ 186.5961│100.00│

│sse2::logb1 x 4096 │ 5.01│ 142000│ 5.00│ 141836│ 28367.28│ 35.2519│529.32│

├─────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤

│native::logb2 x 4096 │ 5.15│ 28000│ 5.00│ 27181│ 5436.35│ 183.9470│100.00│

│sse2::logb2 x 4096 │ 5.02│ 152000│ 5.00│ 151520│ 30304.04│ 32.9989│557.43│

├─────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤

│native::loge1 x 4096 │ 5.12│ 30000│ 5.00│ 29303│ 5860.62│ 170.6303│100.00│

│sse2::loge1 x 4096 │ 5.03│ 140000│ 5.00│ 139087│ 27817.59│ 35.9485│474.65│

├─────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤

│native::loge2 x 4096 │ 5.01│ 29000│ 5.00│ 28954│ 5790.93│ 172.6840│100.00│

│sse2::loge2 x 4096 │ 5.02│ 147000│ 5.00│ 146375│ 29275.03│ 34.1588│505.53│

├─────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤

│native::logd1 x 4096 │ 5.09│ 20000│ 5.00│ 19628│ 3925.75│ 254.7286│100.00│

│sse2::logd1 x 4096 │ 5.01│ 142000│ 5.00│ 141792│ 28358.54│ 35.2627│722.37│

├─────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤

│native::logd2 x 4096 │ 5.14│ 19000│ 5.00│ 18473│ 3694.76│ 270.6535│100.00│

│sse2::logd2 x 4096 │ 5.03│ 152000│ 5.00│ 151114│ 30222.94│ 33.0875│817.99│

└─────────────────────┴───────┴───────┴───────┴───────┴─────────┴──────────┴──────┘

--------------------------------------------------------------------------------

LSP version: 1.1.5

--------------------------------------------------------------------------------

CPU information:

Architecture: armv7a

CPU string: Cortex-A53

CPU model: vendor=0x41, architecture=7, variant=0, part=0xd03, revision=4

Features: HALF THUMB FAST_MULT VFP EDSP NEON VFPv3 TLS VFPv4 IDIVA IDIVT VFPD32 LPAE

--------------------------------------------------------------------------------

Statistics of performance test 'dsp.pmath.log':

┌Case───────────────────┬Time[s]┬───Iter┬Samp[s]┬────Est┬Perf[i/s]┬Cost[us/i]┬Rel[%]┐

│native::logb1 x 4096 │ 5.50│ 11000│ 5.00│ 10003│ 2000.71│ 499.8217│100.00│

│neon_d32::logb1 x 4096 │ 5.02│ 76000│ 5.00│ 75738│ 15147.62│ 66.0170│757.11│

├───────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤

│native::logb2 x 4096 │ 5.46│ 11000│ 5.00│ 10072│ 2014.59│ 496.3800│100.00│

│neon_d32::logb2 x 4096 │ 5.05│ 81000│ 5.00│ 80183│ 16036.67│ 62.3571│796.03│

├───────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤

│native::loge1 x 4096 │ 5.10│ 11000│ 5.00│ 10776│ 2155.29│ 463.9757│100.00│

│neon_d32::loge1 x 4096 │ 5.02│ 75000│ 5.00│ 74747│ 14949.54│ 66.8917│693.62│

├───────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤

│native::loge2 x 4096 │ 5.03│ 11000│ 5.00│ 10941│ 2188.35│ 456.9663│100.00│

│neon_d32::loge2 x 4096 │ 5.06│ 80000│ 5.00│ 79074│ 15814.85│ 63.2317│722.69│

├───────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤

│native::logd1 x 4096 │ 5.09│ 9000│ 5.00│ 8839│ 1767.99│ 565.6131│100.00│

│neon_d32::logd1 x 4096 │ 5.02│ 75000│ 5.00│ 74751│ 14950.31│ 66.8883│845.61│

├───────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤

│native::logd2 x 4096 │ 5.03│ 9000│ 5.00│ 8951│ 1790.22│ 558.5911│100.00│

│neon_d32::logd2 x 4096 │ 5.00│ 79000│ 5.00│ 78970│ 15794.01│ 63.3152│882.24│

└───────────────────────┴───────┴───────┴───────┴───────┴─────────┴──────────┴──────┘

An example with conditional-computation overhead: setting the hue (Hue) and transparency (Alpha) of a color depending on the value of an input parameter.
Native implementation: https://github.com/sadko4u/lsp-plug...nclude/dsp/arch/native/graphics/effects.h#L17
SSE2 implementation: https://github.com/sadko4u/lsp-plug...lude/dsp/arch/x86/sse2/graphics/effects.h#L81
AVX2 and AVX2+FMA3 implementation: https://github.com/sadko4u/lsp-plug...ude/dsp/arch/x86/avx2/graphics/effects.h#L121
ARM NEON implementation: https://github.com/sadko4u/lsp-plug.../dsp/arch/arm/neon-d32/graphics/effects.h#L99
Performance test: https://github.com/sadko4u/lsp-plugins/blob/master/src/test/ptest/dsp/graphics/effects.cpp
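Why conditions cost extra in SIMD code: a vector register cannot branch per element, so the usual approach is to compute the result of both branches for every element and then pick one with a bit mask. The sketch below shows this in scalar C++. The per-element rule is hypothetical and is NOT the real eff_hsla_hue formula; `effect_branchy`, `effect_blend` and `select_bits` are illustrative names.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical rule, for illustration only:
//   v <  thresh : hue = base + v,      alpha = 0
//   v >= thresh : hue = base + thresh, alpha = v - thresh

// Scalar version: executes only the branch that is taken
void effect_branchy(float *hue, float *alpha, const float *v, std::size_t n,
                    float base, float thresh)
{
    for (std::size_t i = 0; i < n; ++i)
    {
        if (v[i] < thresh) { hue[i] = base + v[i];   alpha[i] = 0.0f; }
        else               { hue[i] = base + thresh; alpha[i] = v[i] - thresh; }
    }
}

// Bitwise select: returns a where the mask bits are set, b elsewhere
static inline float select_bits(std::uint32_t mask, float a, float b)
{
    std::uint32_t ua, ub;
    std::memcpy(&ua, &a, sizeof(ua));
    std::memcpy(&ub, &b, sizeof(ub));
    const std::uint32_t r = (ua & mask) | (ub & ~mask);
    float f;
    std::memcpy(&f, &r, sizeof(f));
    return f;
}

// SIMD-style version: both branches are always computed, then blended
void effect_blend(float *hue, float *alpha, const float *v, std::size_t n,
                  float base, float thresh)
{
    for (std::size_t i = 0; i < n; ++i)
    {
        // A SIMD compare yields all-ones or all-zeros per lane
        const std::uint32_t mask = (v[i] < thresh) ? 0xffffffffu : 0u;

        hue[i]   = select_bits(mask, base + v[i], base + thresh);
        alpha[i] = select_bits(mask, 0.0f,        v[i] - thresh);
    }
}
```

Both functions produce identical output, but the blended form does the work of both branches plus the masking for every element. That extra work eats into the gain from processing several values per instruction, which is consistent with the smaller speedups in the results for this test compared with the logarithm test.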

And the performance test results (excerpts):

Code (Text):

--------------------------------------------------------------------------------

CPU information:

Architecture: x86_64

CPU string: AMD Ryzen 7 2700 Eight-Core Processor

CPU model: vendor=AMD, family=0x17, model=0x8

Features: FPU CMOV MMX FXSAVE SSE SSE2 SSE3 SSSE3 SSE4.1 SSE4.2 SSE4A XSAVE FMA3 AVX AVX2

--------------------------------------------------------------------------------

┌Case────────────────────────────┬Time[s]┬─────Iter┬Samp[s]┬──────Est┬──Perf[i/s]┬Cost[us/i]┬Rel[%]┐

│static::eff_hsla_hue x 4096 │ 5.04│ 645000│ 5.00│ 640159│ 128031.82│ 7.8106│115.41│

│native::eff_hsla_hue x 4096 │ 5.00│ 555000│ 5.00│ 554670│ 110934.19│ 9.0144│100.00│

│sse2::eff_hsla_hue x 4096 │ 5.02│ 1145000│ 5.00│ 1140338│ 228067.70│ 4.3847│205.59│

│avx2::x64_eff_hsla_hue x 4096 │ 5.00│ 1410000│ 5.00│ 1409742│ 281948.46│ 3.5467│254.16│

└────────────────────────────────┴───────┴─────────┴───────┴─────────┴───────────┴──────────┴──────┘

--------------------------------------------------------------------------------

CPU information:

Architecture: x86_64

CPU string: AMD Athlon(tm) 64 X2 Dual Core Processor 4400+

CPU model: vendor=AMD, family=0xf, model=0x6b

Features: FPU CMOV MMX SSE SSE2 SSE3

--------------------------------------------------------------------------------

┌Case──────────────────────────┬Time[s]┬────Iter┬Samp[s]┬─────Est┬─Perf[i/s]┬Cost[us/i]┬Rel[%]┐

│static::eff_hsla_hue x 4096 │ 5.15│ 155000│ 5.00│ 150626│ 30125.22│ 33.1948│108.21│

│native::eff_hsla_hue x 4096 │ 5.10│ 160000│ 5.00│ 156923│ 31384.70│ 31.8627│112.73│

│sse2::eff_hsla_hue x 4096 │ 5.03│ 140000│ 5.00│ 139199│ 27839.84│ 35.9198│100.00│

└──────────────────────────────┴───────┴────────┴───────┴────────┴──────────┴──────────┴──────┘

--------------------------------------------------------------------------------

CPU information:

Architecture: armv7a

CPU string: Cortex-A53

CPU model: vendor=0x41, architecture=7, variant=0, part=0xd03, revision=4

Features: HALF THUMB FAST_MULT VFP EDSP NEON VFPv3 TLS VFPv4 IDIVA IDIVT VFPD32 LPAE

--------------------------------------------------------------------------------

┌Case────────────────────────────┬Time[s]┬────Iter┬Samp[s]┬─────Est┬─Perf[i/s]┬Cost[us/i]┬Rel[%]┐

│static::eff_hsla_hue x 4096 │ 5.35│ 55000│ 5.00│ 51445│ 10289.11│ 97.1902│111.10│

│native::eff_hsla_hue x 4096 │ 5.40│ 50000│ 5.00│ 46304│ 9260.87│ 107.9812│100.00│

│neon_d32::eff_hsla_hue x 4096 │ 5.09│ 185000│ 5.00│ 181792│ 36358.60│ 27.5038│392.60│

└────────────────────────────────┴───────┴────────┴───────┴────────┴──────────┴──────────┴──────┘

Full test logs are attached.

It is especially interesting to watch the Athlon 4400+ and the Raspberry Pi 3B+ go head to head in SIMD performance.

UbIvItS · 22 Dec 2018

SadKo, so in short, the whole secret is in using assembly correctly on vector registers? And did you optimize for Intel?

SadKo · 22 Dec 2018

UbIvItS said:

SadKo, so in short, the whole secret is in using assembly correctly on vector registers? And did you optimize for Intel?

On Intel the numbers are about the same, a bit better in some places and a bit worse in others. But its floating-point unit delivers better per-core performance at lower clock speeds, that's a fact.
On Monday I can run the same tests on a Core i5 and post the results here.

SadKo · 24 Dec 2018

Core i5 test results, as promised.

Code (Text):

--------------------------------------------------------------------------------

LSP version: 1.1.5

--------------------------------------------------------------------------------

CPU information:

Architecture: x86_64

CPU string: Intel(R) Core(TM) i5-6400 CPU @ 2.70GHz

CPU model: vendor=Intel, family=0x6, model=0x5e

Features: FPU CMOV MMX FXSAVE SSE SSE2 SSE3 SSSE3 SSE4.1 SSE4.2 XSAVE FMA3 AVX AVX2

--------------------------------------------------------------------------------

┌Case────────────────────────┬Time[s]┬────Iter┬Samp[s]┬─────Est┬─Perf[i/s]┬Cost[us/i]┬─Rel[%]┐

│native::logb1 x 4096 │ 5.00│ 225000│ 5.00│ 224958│ 44991.70│ 22.2263│ 100.00│

│sse2::logb1 x 4096 │ 5.01│ 736000│ 5.00│ 735021│ 147004.25│ 6.8025│ 326.74│

│avx2::x64_logb1 x 4096 │ 5.00│ 1485000│ 5.00│ 1484246│ 296849.26│ 3.3687│ 659.79│

│avx2::x64_logb1_fma3 x 4096 │ 5.00│ 1545000│ 5.00│ 1544477│ 308895.41│ 3.2373│ 686.56│

├────────────────────────────┼───────┼────────┼───────┼────────┼──────────┼──────────┼───────┤

│native::logb2 x 4096 │ 5.02│ 214000│ 5.00│ 213108│ 42621.74│ 23.4622│ 100.00│

│sse2::logb2 x 4096 │ 5.01│ 766000│ 5.00│ 765153│ 153030.78│ 6.5346│ 359.04│

│avx2::x64_logb2 x 4096 │ 5.00│ 1574000│ 5.00│ 1573301│ 314660.23│ 3.1780│ 738.26│

│avx2::x64_logb2_fma3 x 4096 │ 5.00│ 1622000│ 5.00│ 1621427│ 324285.59│ 3.0837│ 760.85│

├────────────────────────────┼───────┼────────┼───────┼────────┼──────────┼──────────┼───────┤

│native::loge1 x 4096 │ 5.01│ 295000│ 5.00│ 294492│ 58898.48│ 16.9784│ 100.00│

│sse2::loge1 x 4096 │ 5.00│ 709000│ 5.00│ 708318│ 141663.78│ 7.0590│ 240.52│

│avx2::x64_loge1 x 4096 │ 5.00│ 1397000│ 5.00│ 1396231│ 279246.36│ 3.5811│ 474.11│

│avx2::x64_loge1_fma3 x 4096 │ 5.00│ 1480000│ 5.00│ 1479668│ 295933.77│ 3.3791│ 502.45│

├────────────────────────────┼───────┼────────┼───────┼────────┼──────────┼──────────┼───────┤

│native::loge2 x 4096 │ 5.00│ 304000│ 5.00│ 303898│ 60779.66│ 16.4529│ 100.00│

│sse2::loge2 x 4096 │ 5.01│ 753000│ 5.00│ 752085│ 150417.06│ 6.6482│ 247.48│

│avx2::x64_loge2 x 4096 │ 5.00│ 1507000│ 5.00│ 1506335│ 301267.02│ 3.3193│ 495.67│

│avx2::x64_loge2_fma3 x 4096 │ 5.00│ 1557000│ 5.00│ 1556885│ 311377.14│ 3.2115│ 512.30│

├────────────────────────────┼───────┼────────┼───────┼────────┼──────────┼──────────┼───────┤

│native::logd1 x 4096 │ 5.00│ 143000│ 5.00│ 142940│ 28588.10│ 34.9796│ 100.00│

│sse2::logd1 x 4096 │ 5.00│ 727000│ 5.00│ 726274│ 145254.95│ 6.8844│ 508.10│

│avx2::x64_logd1 x 4096 │ 5.00│ 1453000│ 5.00│ 1452972│ 290594.48│ 3.4412│1016.49│

│avx2::x64_logd1_fma3 x 4096 │ 5.00│ 1487000│ 5.00│ 1486092│ 297218.58│ 3.3645│1039.66│

├────────────────────────────┼───────┼────────┼───────┼────────┼──────────┼──────────┼───────┤

│native::logd2 x 4096 │ 5.03│ 141000│ 5.00│ 140285│ 28057.05│ 35.6417│ 100.00│

│sse2::logd2 x 4096 │ 5.00│ 729000│ 5.00│ 728900│ 145780.03│ 6.8597│ 519.58│

│avx2::x64_logd2 x 4096 │ 5.00│ 1504000│ 5.00│ 1503280│ 300656.11│ 3.3261│1071.59│

│avx2::x64_logd2_fma3 x 4096 │ 5.00│ 1549000│ 5.00│ 1548845│ 309769.09│ 3.2282│1104.07│

└────────────────────────────┴───────┴────────┴───────┴────────┴──────────┴──────────┴───────┘

┌Case────────────────────────────┬Time[s]┬─────Iter┬Samp[s]┬──────Est┬──Perf[i/s]┬Cost[us/i]┬Rel[%]┐

│static::eff_hsla_hue x 4096 │ 5.02│ 590000│ 5.00│ 587100│ 117420.16│ 8.5164│100.00│

│native::eff_hsla_hue x 4096 │ 5.02│ 615000│ 5.00│ 612775│ 122555.03│ 8.1596│104.37│

│sse2::eff_hsla_hue x 4096 │ 5.01│ 905000│ 5.00│ 902356│ 180471.33│ 5.5410│153.70│

│avx2::x64_eff_hsla_hue x 4096 │ 5.01│ 1890000│ 5.00│ 1886800│ 377360.07│ 2.6500│321.38│

└────────────────────────────────┴───────┴─────────┴───────┴─────────┴───────────┴──────────┴──────┘

Full logs are attached.


Thread participants: SadKo (Владимир Садовников), Rel, UbIvItS

Attachments: test-results.zip, core-i5-6400.log.zip
