# pccx vision-v001 — CNN inference track on KV260
The vision-v001 line is a parallel product line on the same KV260 substrate, scoped to vision workloads (CNN-class classification, object detection). It shares the W4A8 NPU substrate with the LLM line but covers a distinct workload family. Active code is not yet present in this docs site — this page is a placeholder while the upstream RTL repository stabilises.
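For readers unfamiliar with the W4A8 shorthand, the sketch below shows one way a symmetric per-tensor INT4-weight / INT8-activation quantisation can look. The scale choice, function names, and test values are illustrative assumptions, not taken from the pccx RTL or toolchain.

```python
import numpy as np

# Illustrative symmetric per-tensor quantisation for a W4A8 regime:
# INT4 weights (range [-8, 7]) and INT8 activations (range [-128, 127]).
# Scale selection here is a simple max-abs heuristic, an assumption for
# this sketch only.

def quantize(x, bits):
    qmax = 2 ** (bits - 1) - 1                     # 7 for INT4, 127 for INT8
    amax = np.abs(x).max()
    scale = amax / qmax if amax > 0 else 1.0
    q = np.clip(np.round(x / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

w_fp = np.random.default_rng(1).normal(size=(16,)).astype(np.float32)
w_q, w_s = quantize(w_fp, bits=4)                  # weights  -> INT4 range
a_q, a_s = quantize(w_fp * 3, bits=8)              # activations -> INT8 range
print(int(w_q.min()), int(w_q.max()))              # stays within [-8, 7]
```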
## Working state

- working repository name: pccx-vision-v001, public URL TBD
- working staging repository: pccx-vision-v001-staging (private)
- shared substrate with v002 — same KV260 board, same W4A8 weight × activation ratio, same L2 URAM organisation
- distinct dataflow — dense-conv tile reuse instead of token-by-token KV streaming; the GEMM systolic + GEMV hybrid is reused for conv
- first model candidates — ResNet18, YOLOv8n, MobileNetV3 (smallest-footprint variants first; the choice is driven by the L2 URAM and DSP budget on KV260)
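The conv-reuse point above can be made concrete: lowering a convolution to a matrix multiply (im2col) lets a GEMM engine serve conv layers. A minimal sketch under W4A8-style integer types follows; the shapes, function names, and stride-1 / no-padding choice are illustrative assumptions, not the vision-v001 dataflow.

```python
import numpy as np

# Hypothetical sketch of conv-as-GEMM lowering under a W4A8 regime:
# INT4 weights (clamped to [-8, 7]) x INT8 activations, INT32 accumulate.
# Names and shapes are illustrative, not taken from the vision-v001 RTL.

def im2col(x, kh, kw):
    # x: (C, H, W) int8 input; stride 1, no padding
    c, h, w = x.shape
    oh, ow = h - kh + 1, w - kw + 1
    cols = np.empty((c * kh * kw, oh * ow), dtype=np.int8)
    idx = 0
    for i in range(oh):
        for j in range(ow):
            cols[:, idx] = x[:, i:i + kh, j:j + kw].ravel()
            idx += 1
    return cols

def conv_w4a8(x, w):
    # w: (K, C, kh, kw) INT4 weights stored in int8 containers
    k, c, kh, kw = w.shape
    assert w.min() >= -8 and w.max() <= 7, "weights must fit INT4"
    cols = im2col(x, kh, kw)                   # (C*kh*kw, OH*OW), int8
    wmat = w.reshape(k, -1).astype(np.int32)   # (K, C*kh*kw)
    return wmat @ cols.astype(np.int32)        # INT32 accumulate

rng = np.random.default_rng(0)
x = rng.integers(-128, 128, size=(3, 8, 8), dtype=np.int8)
w = rng.integers(-8, 8, size=(4, 3, 3, 3), dtype=np.int8)
y = conv_w4a8(x, w)
print(y.shape)  # (4, 36): 4 output channels, 6x6 spatial positions
```

The same GEMM path can fall back to GEMV when the output tile degenerates to a single column, which is the structural reuse the bullet above refers to.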
## Position relative to existing FPGA vision IPs
The KV260 vision landscape today is occupied by two large families of IP: vendor-supplied DPU IP (INT8-only) and arbitrary-bit streaming-dataflow accelerators driven by quantisation-aware training (QAT). The vision-v001 track positions itself differently from both, on three axes.
| Axis | Commodity DPU IP (INT8-only) | Streaming-dataflow QAT (FINN-class) | vision-v001 (this track) |
|---|---|---|---|
| Quantisation | INT8 weights / INT8 activations | arbitrary 2–8 bit, QAT | W4A8 — INT4 weight × INT8 act |
| Dataflow | heterogeneous micro-coded PE mix | per-layer streaming dataflow | unified GEMM systolic + GEMV |
| Reuse with LLM line | none | none | same RTL substrate as v002 |
| Per-layer efficiency | reported 5–9× variance across stages | high but model-specific build | designed for tile-reuse uniformity |
| Source | vendor reference | open-source (e.g. FINN), QAT toolchain | open-source, KV260-native |
This differentiation is a statement of positioning, not a benchmark claim: no FPS, latency, power, or accuracy figure is claimed on this page until the release evidence checklist gates it in. References to existing vendor and QAT IPs are landscape context only.
## Reference points from public benchmarks
The figures below are publicly reported benchmarks of the two families above. They are not vision-v001 results; they exist solely to anchor the differentiation matrix in numbers external readers can verify.
- Vendor DPU IP on KV260 (Xilinx DPUCZDX8G B4096) — MobileNet V1 ≈ 187 FPS / 5.3 ms latency; ResNet-50 ≈ 62 FPS / 16.1 ms latency. Per-layer efficiency varies between input layers (≈ 54–259 GOP/s) and mid-stage 3 × 3 convolutions (≈ 470–500 GOP/s) — a 5–9 × spread on the same model.
- Streaming-dataflow QAT (FINN) — ResNet-50 ≈ 2 000 FPS at ≈ 70 W on Alveo U250 (~400 K LUTs); the same approach scaled down to KV260 (~256 K LUTs) typically lands 2–4 × lower.
- W4A8 vision regime — INT8 dominates the KV260 model zoo; sub-4-bit (binary / ternary / 2-bit) work appears under the streaming-dataflow line. The W4A8 weight × activation regime that pccx v002 uses for LLMs has not yet been published as a KV260 vision benchmark.
These reference points anchor the differentiation matrix; they do not constitute a benchmark of vision-v001 itself, and no vision-v001 figure is claimed on this docs site until the release evidence checklist gates it in.
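As a plausibility check on the vendor figures quoted above, single-stream latency should be roughly the inverse of frame rate. The numbers below are the publicly reported benchmarks repeated from this page, not vision-v001 measurements.

```python
# Plausibility check: for a single-stream pipeline, latency (ms) ~ 1000 / FPS.
# (FPS, reported ms) pairs are the public DPU figures quoted on this page.
reported = {
    "MobileNet V1 (DPU B4096)": (187, 5.3),
    "ResNet-50 (DPU B4096)": (62, 16.1),
}
for name, (fps, ms) in reported.items():
    implied_ms = 1000.0 / fps
    print(f"{name}: reported {ms} ms, implied {implied_ms:.2f} ms")
```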
## Status snapshot

| Layer | State |
|---|---|
| RTL | not yet committed in the vision-v001 repository |
| Driver / runtime | scope TBD; conv-specific path likely diverges from v002 |
| Verification | inherits the v002 evidence checklist; per-model golden vectors TBD |
| FPS claim | none — TBD |
| mAP / Top-1 claim | none — TBD |
| Bitstream | none — TBD |
This page will be expanded as the upstream RTL repository lands and the first model is profiled.
## See also

- Roadmap — release direction across the pccx ecosystem
- pccx v002 Architecture — shared NPU substrate (active LLM line)
- pccx v003 — LLM line continued, separate RTL repository