fix(quantize): Overhaul INT8 static quantization to use QDQ format, unlocking TensorRT/NPU hardware acceleration by jay7-tech · Pull Request #307 · opencv/opencv_zoo

jay7-tech · 2026-03-04T15:51:12Z

Problem

When INT8 ONNX models are generated with the quantize-ort.py script, edge hardware accelerators (TensorRT, TIM-VX, CUDA NPUs) silently fall back to scalar CPU execution.

The script explicitly forces the legacy QuantFormat.QOperator format, which blocks operator fusion on modern execution providers.

Solution

QDQ (QuantizeLinear/DequantizeLinear) is currently the industry-standard layout that these accelerators require in order to compile fused INT8 engines. I refactored the pipeline to emit QDQ graphs (quant_format=QuantFormat.QDQ), which unblocks native hardware acceleration on edge targets.
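For reviewers, here is a minimal sketch of what the format switch looks like against ONNX Runtime's `quantize_static` API. It is not the code in quantize-ort.py: the helper names (`feeds_from_samples`, `quantize_to_qdq`), the file paths, and the input name are hypothetical, and every option other than `quant_format` is left at the library default.

```python
from typing import Dict, Iterable, Iterator, Optional


def feeds_from_samples(samples: Iterable, input_name: str) -> Iterator[Dict[str, object]]:
    """Wrap each calibration sample in the {input_name: array} dict ORT expects."""
    for sample in samples:
        yield {input_name: sample}


def quantize_to_qdq(fp32_path: str, int8_path: str,
                    samples: Iterable, input_name: str) -> None:
    """Statically quantize an FP32 ONNX model and write it in QDQ format.

    `samples` is assumed to be an iterable of float32 NumPy arrays that
    already match the model's input shape (hypothetical calibration data).
    """
    # Imported lazily so the helper above can be used without onnxruntime.
    from onnxruntime.quantization import (
        CalibrationDataReader,
        QuantFormat,
        quantize_static,
    )

    class SampleReader(CalibrationDataReader):
        def __init__(self) -> None:
            self._feeds = feeds_from_samples(samples, input_name)

        def get_next(self) -> Optional[dict]:
            # Returning None tells the calibrator the data is exhausted.
            return next(self._feeds, None)

    quantize_static(
        fp32_path,
        int8_path,
        SampleReader(),
        # The change this PR describes: QDQ instead of QuantFormat.QOperator.
        quant_format=QuantFormat.QDQ,
    )
```

A call would look like `quantize_to_qdq("model.onnx", "model_int8.onnx", samples, "input")`, where both file names and the input name are placeholders. In the resulting graph the quantization parameters appear as explicit QuantizeLinear/DequantizeLinear node pairs rather than as fused QLinear* operators.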

jay7-tech added 2 commits March 4, 2026 20:07

d69cf6d fix(benchmark): Auto-route INT8 precision to NPU/TIM-VX backend on aarch64 hardware to resolve quantization paradox

0b543a5 fix(quantize): Overhaul INT8 static quantization to use QDQ format, unlocking TensorRT/NPU hardware acceleration

jay7-tech wants to merge 2 commits into opencv:main from jay7-tech:feature/gsoc-qdq-hardware-accel
