
fix(quantize): Overhaul INT8 static quantization to use QDQ format, unlocking TensorRT/NPU hardware acceleration#307

Open
jay7-tech wants to merge 2 commits into opencv:main from jay7-tech:feature/gsoc-qdq-hardware-accel

Conversation

@jay7-tech

Problem

When generating INT8 ONNX models with the quantize-ort.py script, edge hardware accelerators (TensorRT, TIM-VX, CUDA NPUs) silently fall back to scalar CPU execution.

The script explicitly forces the legacy QuantFormat.QOperator layout, which blocks operator fusion on modern execution providers.

Solution

QDQ (QuantizeLinear/DequantizeLinear) is the industry-standard layout these accelerators require to compile fused INT8 engines. I refactored the pipeline to emit QDQ tensor graphs (quant_format=QuantFormat.QDQ), which unblocks native hardware acceleration on edge endpoints.

