I have been working on a RAM-only, optimized k32 plotter that is lock-free and scales linearly with the number of threads. It is extremely cache-friendly, and as you can see from the htop output, threads spend barely any time in kernel mode. There is still a bit of room for improvement. I am moving on to Phase 3 now, with an optimization plan already in hand.
The run shown in the video is on a 64-thread ARM CPU.
The Fx calculation does not use any SIMD. The NEON versions of blake3 gave no significant gain, and sometimes a regression, for the way blake3 is used in the plotting mechanism.