V3 Audio offloading with RDMA Principles - Aimed at extending offloading capacities to SIMD, Coral EdgeTPU, NPU & GPU 2025 (c)RS
Part of the JIT Compiler Dongle works. Latency is an issue when using the GPU, CPU & dongles such as the Coral.AI EdgeTPU & Intel Movidius for mice, cameras & displays over HDMI & VESA DP.
The future of device offloading is arranged here:
The primary goal is to enable low-latency hardware offloading for software stacks such as Bluetooth, codecs, device drivers & firmware, Dolby Audio, DTS, Realtek, Creative Logic & so on.
The point of GPU / processor offloading for audio, video & 3D content, in applications such as Chrome & in the OS, is that latency should be low.
Primary compatibility with OS Stack { DirectX, Khronos, OpenCL, Direct Compute & Storage API }
P2P must not consume more than 5% of the total resource usage of the package.
IO & DMA must be fast.
The package should load, unload & process fast!
The core Module Kernel shall therefore be fast & small.
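The 5% rule above can be checked mechanically. This is a minimal sketch, assuming resource usage is reported as plain numbers in one common unit; the names P2P_BUDGET, p2p_share & p2p_within_budget are hypothetical & not part of any existing API:

```python
# Hypothetical budget check: P2P signalling may use at most 5% of the
# package's total resource usage (the 5% figure comes from the text above).
P2P_BUDGET = 0.05


def p2p_share(p2p_usage: float, total_usage: float) -> float:
    """Return the fraction of the package's total usage spent on P2P."""
    if total_usage <= 0:
        raise ValueError("total_usage must be positive")
    return p2p_usage / total_usage


def p2p_within_budget(p2p_usage: float, total_usage: float) -> bool:
    """True when P2P stays at or below the 5% budget."""
    return p2p_share(p2p_usage, total_usage) <= P2P_BUDGET
```

A package could run such a check when it loads & refuse a P2P configuration that would exceed the budget.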
Package core groups
Offloaded Processing 'KC'
Processing with endline automated returns from DirectCompute/OpenCL Kernel Package
The package looks as follows
Diagram 'KC'
(Kernel Module) Static Code Object Device work package = KSCOD = 'KC'
<<< Data flow >>>
Device + RAM : DMA : 'KC' CPU / Motherboard-IP Core 'KC' : DMA : Device + RAM
'KC' = (Kernel) Static Code Object Device work package
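The data flow above can be modelled in a few lines. This is only a sketch under stated assumptions: plain Python arrays stand in for device RAM, a bulk memoryview copy stands in for a DMA transfer, & the names dma_copy, run_kc & halve_gain are hypothetical:

```python
from array import array
from typing import Callable

# Hypothetical model of: Device + RAM : DMA : 'KC' : DMA : Device + RAM.
# Arrays stand in for device RAM; slice assignment through a memoryview
# stands in for a DMA transfer (one bulk copy, no per-sample loop).


def dma_copy(src: memoryview, dst: memoryview) -> None:
    """Model a DMA transfer: one bulk copy between equally sized buffers."""
    if src.nbytes != dst.nbytes:
        raise ValueError("DMA source and destination sizes differ")
    dst[:] = src


def run_kc(kernel: Callable[[array], None], source: array, sink: array) -> None:
    """Move a frame in, run the static kernel 'KC' in place, move it out."""
    # Zero-filled work buffer with the same sample type and length as source.
    work = array(source.typecode, bytes(source.itemsize * len(source)))
    dma_copy(memoryview(source), memoryview(work))   # Device RAM -> 'KC'
    kernel(work)                                     # offloaded processing
    dma_copy(memoryview(work), memoryview(sink))     # 'KC' -> Device RAM


def halve_gain(frame: array) -> None:
    """Example kernel: attenuate signed 16-bit samples by half, in place."""
    for i, sample in enumerate(frame):
        frame[i] = sample // 2
```

Calling run_kc(halve_gain, source, sink) with two array('h') buffers of equal length leaves the source untouched & writes the processed frame into the sink.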
If you are familiar with RDMA the package is:
Automated DMA to DMA with direct read & write from RAM arrays.
Signalling is IO & DMA in cache pointers, with P2P pointer tables & Ethernet addressing, & direct unloading & loading on ETA (estimated time of arrival).
Internal device ALU / DMA cache frame & RAM usage / storage buffer frame cache.
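The pointer table & ETA signalling can be sketched as a small queue. This is an illustration only, assuming a bytearray as the RAM array & microsecond timestamps; the class names Transfer & PointerTable & their methods are hypothetical:

```python
import heapq
from dataclasses import dataclass, field
from typing import List


@dataclass(order=True)
class Transfer:
    """One pointer-table entry: where a frame lives and when it is due."""
    eta_us: int                          # estimated time of arrival, microseconds
    offset: int = field(compare=False)   # start of the frame in the RAM array
    length: int = field(compare=False)   # frame size in bytes


class PointerTable:
    """Hypothetical P2P pointer table that unloads frames in ETA order."""

    def __init__(self, ram: bytearray) -> None:
        self.ram = memoryview(ram)
        self._queue: List[Transfer] = []

    def post(self, eta_us: int, offset: int, length: int) -> None:
        """Register a frame by pointer (offset + length), not by copying it."""
        if offset < 0 or length < 0 or offset + length > len(self.ram):
            raise ValueError("transfer falls outside the RAM array")
        heapq.heappush(self._queue, Transfer(eta_us, offset, length))

    def due(self, now_us: int) -> List[memoryview]:
        """Return zero-copy views of every frame whose ETA has passed."""
        frames = []
        while self._queue and self._queue[0].eta_us <= now_us:
            t = heapq.heappop(self._queue)
            frames.append(self.ram[t.offset:t.offset + t.length])
        return frames
```

Only offsets, lengths & arrival times travel through the table; the frames themselves stay in the RAM array until they are read directly.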
Rupert S
*
Offloading, How you run it:
Auto device configuration should be easy!
Automate the common BitDepth & Buffer Length for quality &/or for the number of Hardware Accelerators.
Auto should pick quality first: 16Bit is not everybody's cup of tea, even though Dolby Atmos & DTS default to 16Bit!
If everyone did use 16Bit, device selection would be easy!
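A quality-first Auto can be sketched as picking the highest BitDepth that every chosen device supports. This assumes each device simply reports a set of supported depths; the names BIT_DEPTHS & auto_bit_depth & the device names in the usage note are hypothetical:

```python
from typing import Dict, Optional, Set

# Bit depths named in this document, highest quality first.
BIT_DEPTHS = (64, 32, 24, 16)


def auto_bit_depth(supported: Dict[str, Set[int]]) -> Optional[int]:
    """Pick the highest bit depth that every selected device supports."""
    if not supported:
        return None
    sets = [set(depths) for depths in supported.values()]
    common = set.intersection(*sets)
    for depth in BIT_DEPTHS:            # quality first: try 64Bit before 16Bit
        if depth in common:
            return depth
    return None                         # no depth is shared by all devices
```

For example, a GPU offering {32, 24, 16} together with an NPU offering {24, 16} gives 24Bit, rather than falling back to 16Bit.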
The scenario is that CPU/NPU/GPU/USB are not the only options for offloading.
Offloading latency & error statistics have to be cached, so that they are available when the option is auto:
Automatic Device Selection & Multiplication
Selection of the BitDepth {64Bit, 32Bit, 24Bit, 16Bit} leads to a device flow path, with advice in Control Panels.
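Cached statistics & BitDepth-driven selection can be sketched together. This is one possible arrangement, not a defined interface: the names DeviceStats & auto_select are hypothetical, & ranking by error rate first & mean latency second is an assumption made for this example:

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class DeviceStats:
    """Cached running statistics for one offload target."""
    bit_depths: Set[int]
    runs: int = 0
    errors: int = 0
    total_latency_us: float = 0.0

    def record(self, latency_us: float, ok: bool = True) -> None:
        """Add one offload run to the cache."""
        self.runs += 1
        self.total_latency_us += latency_us
        if not ok:
            self.errors += 1

    @property
    def mean_latency_us(self) -> float:
        # Unmeasured devices rank last until they have at least one run.
        return self.total_latency_us / self.runs if self.runs else float("inf")

    @property
    def error_rate(self) -> float:
        return self.errors / self.runs if self.runs else 1.0


def auto_select(cache: Dict[str, DeviceStats], bit_depth: int,
                count: int = 1) -> List[str]:
    """Return up to `count` devices for `bit_depth`, best cached stats first."""
    usable = [name for name, s in cache.items() if bit_depth in s.bit_depths]
    # Assumed ranking: fewest errors first, then lowest mean latency.
    usable.sort(key=lambda n: (cache[n].error_rate, cache[n].mean_latency_us))
    return usable[:count]
```

Passing count greater than 1 covers multiplication: the same work can be spread across the best few devices that support the chosen BitDepth.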
Example Offload options (there are more)
https://www.w3.org/TR/webcodecs/#definitions
https://www.w3.org/TR/webgpu/#intro
https://www.w3.org/TR/webnn/#intro
https://www.phoronix.com/news/Device-Memory-TCP-TX-Linux-6.16
https://www.phoronix.com/news/IO_uring-ZCRX-DMA-BUF
Rupert S
