CUDA Userspace: OpenCV, OpenGL, EGL, TensorRT, VPI
Stock JetPack ships these libraries without their CUDA and NVIDIA backends enabled. This document covers the gap, the fix, and the verification for the Jetson Orin NX 16GB on L4T R36.4.3 / JetPack 6.2; it is aimed at whoever maintains the image’s GPU userspace. The stack is verified on the reference device (14/14 verify_opengl_cuda.sh checks).
Related: AV_STACK.md (Phase 5 contents), SAMPLES.md (run commands), ZEDX_METIS_CPP.md (C++ samples that exercise this stack live: CUDA preprocessing plus NVENC recording via --record), BENCHMARKS.md (measured rates), TROUBLESHOOTING.md P-2 through P-6.
The problem
| Library | Stock state | Symptom |
|---|---|---|
| python3-opencv (apt) | No CUDA, no cuDNN, no GStreamer | cv2.cuda.getCudaEnabledDeviceCount() == 0; DNN runs on CPU |
| libGL.so / libEGL.so | Mesa software path | OpenGL renderer is llvmpipe, no GPU offload |
| TensorRT | Installed but unused | Models compile but never get exercised |
| VPI 3.x | Installed but unused | PVA and ISP backends never selected |
| cuDNN | Installed but version not pinned | OpenCV linked against the wrong cuDNN soname, import fails |
Without the fix, vision code that should run on the Ampere GPU falls back to the A78AE CPU at 10 to 30 times the latency, which is unacceptable for real-time perception.
What this build provides
| Detail | Value |
|---|---|
| OpenCV version | 4.10.0 (pinned via OPENCV_VERSION env) |
| CUDA version | 12.6 (L4T R36.4.3 system CUDA; do not pip-install a different version) |
| cuDNN | libcudnn9, cuDNN 9.3 (L4T R36.4.3 / JetPack 6.2; verify: dpkg -l \| grep libcudnn9-cuda-12) |
| CUDA arch | CUDA_ARCH_BIN=8.7 (sm_87, Orin NX / Nano Ampere GPU) |
| PTX | none (CUDA_ARCH_PTX=""): faster startup, target-locked binary |
| Build location | on-device; build_opencv_cuda.sh runs on the Jetson |
| Build time | ~45 to 60 min on Orin NX (on-device, JOBS=$(nproc)) |
| Cache | .deb written to /opt/opencv-cache/; units 2 to N install in seconds |
The CUDA arch flag 8.7 is verified for Orin NX / Nano (GA10B, Ampere): see VERIFICATION_REPORT.md section 1.6. AGX Orin uses the same GA10B GPU and is also 8.7; 8.6 applies to discrete GA10x Ampere GPUs, not to Jetson modules. Use 7.2 for Xavier NX (Volta).
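To make the arch settings concrete, the sketch below shows how a CUDA_ARCH_BIN / CUDA_ARCH_PTX pair corresponds to nvcc -gencode flags. The helper name gencode_flags is hypothetical, and OpenCV's CMake scripts generate their flags with their own logic; this only illustrates why an empty PTX list yields a target-locked binary.

```python
def gencode_flags(arch_bin: str, arch_ptx: str = "") -> list:
    """Translate arch lists such as "8.7" or "7.2;8.7" into nvcc -gencode flags."""

    def split(value: str) -> list:
        # Accept space-, comma-, or semicolon-separated lists.
        return value.replace(";", " ").replace(",", " ").split()

    flags = []
    for arch in split(arch_bin):
        num = arch.replace(".", "")  # "8.7" -> "87"
        # code=sm_XX embeds native machine code for exactly that GPU generation.
        flags.append(f"-gencode=arch=compute_{num},code=sm_{num}")
    for arch in split(arch_ptx):
        num = arch.replace(".", "")
        # code=compute_XX embeds PTX, which the driver JIT-compiles at load time.
        flags.append(f"-gencode=arch=compute_{num},code=compute_{num}")
    return flags


# The settings used by this build: native sm_87 code only, no PTX fallback.
print(gencode_flags("8.7", ""))
```

With CUDA_ARCH_PTX empty there is no PTX for the driver to JIT-compile, so the resulting libraries only run on sm_87 devices.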
The fix: three scripts
build_opencv_cuda.sh
Builds OpenCV from source with the correct CMake flags and installs to /usr/local. The result is cached as a .deb at /opt/opencv-cache/, so re-flashing N units does not rebuild OpenCV N times: units 2 to N install from the cache.
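The cache-or-build decision can be sketched as follows. This is an illustration, not the script's actual logic: the function name find_cached_deb and the assumption that the cached package's filename contains the OpenCV version string are both hypothetical.

```python
from pathlib import Path
from typing import Optional


def find_cached_deb(cache_dir: str, version: str) -> Optional[Path]:
    """Return the newest cached .deb whose filename contains the version, or None."""
    root = Path(cache_dir)
    if not root.is_dir():
        return None
    # Assumption: the package filename embeds the OpenCV version string.
    candidates = [p for p in root.glob("*.deb") if version in p.name]
    # If several packages match, prefer the most recently written one.
    return max(candidates, key=lambda p: p.stat().st_mtime, default=None)


deb = find_cached_deb("/opt/opencv-cache", "4.10.0")
if deb is not None:
    print("cache hit, install with dpkg:", deb)
else:
    print("cache miss, build from source")
```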
CMake flags that matter:
-D WITH_CUDA=ON
-D WITH_CUDNN=ON
-D OPENCV_DNN_CUDA=ON
-D ENABLE_FAST_MATH=ON
-D CUDA_FAST_MATH=ON
-D WITH_CUBLAS=ON
-D CUDA_ARCH_BIN=8.7 # Orin = sm_87
-D CUDA_ARCH_PTX="" # no PTX (faster startup, target-locked binary)
-D WITH_GSTREAMER=ON
-D WITH_NVCUVID=ON # GPU video decode
-D WITH_FFMPEG=ON
-D WITH_OPENGL=ON
-D BUILD_opencv_python3=ON
-D PYTHON3_EXECUTABLE=/opt/av-env/bin/python # installs into the AV venv
-D OPENCV_GENERATE_PKGCONFIG=ON
-D OPENCV_ENABLE_NONFREE=ON
Build time is ~45 to 60 min on Orin. The first flash takes the hit; subsequent flashes pull the .deb and install in seconds.
Override knobs (env):
OPENCV_VERSION=4.10.0 # default
CUDA_ARCH_BIN=8.7 # 8.7 for Orin (NX, Nano, AGX); 7.2 for Xavier NX
JOBS=6 # default $(nproc)
WORK_DIR=/var/tmp/opencv-build
CACHE_DIR=/opt/opencv-cache
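As a rough model of how these knobs resolve, the sketch below applies the documented defaults when a variable is unset or empty. The function name resolve_build_config is hypothetical, and os.cpu_count() only approximates $(nproc), which also honors CPU affinity.

```python
import os

# Defaults as documented above.
DEFAULTS = {
    "OPENCV_VERSION": "4.10.0",
    "CUDA_ARCH_BIN": "8.7",
    "WORK_DIR": "/var/tmp/opencv-build",
    "CACHE_DIR": "/opt/opencv-cache",
}


def resolve_build_config(env=None) -> dict:
    """Merge environment overrides with the documented defaults."""
    env = os.environ if env is None else env
    # An unset or empty variable falls back to its default.
    config = {key: env.get(key) or default for key, default in DEFAULTS.items()}
    # JOBS defaults to the CPU count, approximating $(nproc).
    config["JOBS"] = int(env.get("JOBS") or os.cpu_count() or 1)
    return config


# Example: override only the parallelism, keep everything else at its default.
print(resolve_build_config({"JOBS": "2"}))
```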
verify_opengl_cuda.sh
A read-only verifier that confirms the CUDA, OpenGL, GLES, TensorRT, VPI, cuDNN, and OpenCV stack is wired up correctly. It runs at first boot and as part of make verify.
Status note (updated 2026-06-10): the OpenCV and Python checks depend on /opt/av-env, which jetson-first-boot provisions only when the device has internet. The reference device has since completed online provisioning, with two gotchas worth knowing: (1) JetPack userspace (CUDA/cuDNN/TensorRT/VPI) is not part of the flashed image and must be installed on-device with sudo apt install nvidia-jetpack before anything CUDA works (TROUBLESHOOTING P-3); (2) the Voyager pip step used to clobber the cu126 torch wheel with a CPU-only PyPI build (TROUBLESHOOTING P-2), fixed in jetson_first_boot.sh.
Checks (all pre/post-gated by the verify framework):
| Check | What it confirms |
|---|---|
| libEGL_nvidia.so.0 and related libs | NVIDIA EGL/GLES libraries installed |
| eglinfo | EGL display reports NVIDIA, not Mesa |
| glxinfo renderer | OpenGL renderer is NVIDIA / Tegra / Ampere |
| nvcc --version | CUDA toolkit installed |
| nvidia-smi or Tegra devfreq | GPU detectable |
| CUDA probe binary | nvcc compiles it, it runs, and it reports sm_87 |
| trtexec | TensorRT runtime present |
| pkg-config vpi | VPI installed |
| libcudnn9 | cuDNN present |
| cv2.cuda.getCudaEnabledDeviceCount() > 0 | OpenCV-CUDA active |
| cv2.getBuildInformation() contains CUDA | Build is genuine, not a stock package |
install_av_phase5.sh
The Phase 5 orchestrator, installed and verified live on the reference device (2026-06-11; see AV_STACK.md). It runs each step with a pre-check and post-check:
1. build_opencv_cuda.sh (or pulls from cache)
2. verify_opengl_cuda.sh
3. install_av_stack.sh (ROS 2 Humble, Isaac ROS, cuVSLAM, nvblox, Nav2)
4. Installs jetson-av-mission.service
See AV_STACK.md for the contents of step 3.
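The pre-check / post-check pattern can be sketched as below. This is one plausible reading, not the orchestrator's actual code: it assumes the pre-check answers "is this step already satisfied?" and the post-check confirms the step took effect. The name run_gated is hypothetical.

```python
from typing import Callable


def run_gated(name: str,
              pre: Callable[[], bool],
              action: Callable[[], None],
              post: Callable[[], bool]) -> str:
    """Run one step: skip it if pre() passes, fail loudly if post() does not."""
    if pre():
        return "skipped"  # already satisfied, nothing to do
    action()
    if not post():
        raise RuntimeError(f"{name}: post-check failed")
    return "done"


# Toy example: the "installed" flag stands in for real system state.
state = {"installed": False}

def installed() -> bool:
    return state["installed"]

def install() -> None:
    state["installed"] = True

print(run_gated("demo-step", installed, install, installed))  # first run does the work
print(run_gated("demo-step", installed, install, installed))  # second run is a no-op
```

Gating each step this way makes the orchestrator safe to re-run after a partial failure, since completed steps are skipped rather than repeated.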
Verify on a flashed device
Steps 3 and 4 require /opt/av-env, which the first-boot service provisions once the device has internet. Provisioning is complete on the reference device; on a fresh unit whose first boot ran offline, run steps 3 and 4 after provisioning completes.
# 1. Headers found?
ls /usr/local/include/opencv4/opencv2/core.hpp
# 2. Library installed and linked?
ldconfig -p | grep libopencv_core
pkg-config --modversion opencv4
# 3. Python import + CUDA?
/opt/av-env/bin/python -c "
import cv2
print('OpenCV version:', cv2.__version__)
print('CUDA devices :', cv2.cuda.getCudaEnabledDeviceCount())
build = cv2.getBuildInformation()
for line in build.split('\n'):
    if 'CUDA' in line or 'cuDNN' in line or 'GStreamer' in line:
        print(line)
"
# 4. End-to-end DNN on GPU
/opt/av-env/bin/python <<'EOF'
import cv2, numpy as np
net = cv2.dnn.readNet("/path/to/yolo.onnx")
net.setPreferableBackend(cv2.dnn.DNN_BACKEND_CUDA)
net.setPreferableTarget(cv2.dnn.DNN_TARGET_CUDA_FP16)
img = np.zeros((640, 640, 3), dtype=np.uint8)
blob = cv2.dnn.blobFromImage(img, 1/255.0, (640, 640), swapRB=True, crop=False)
net.setInput(blob)
_ = net.forward()
print("OK: DNN_CUDA forward succeeded")
EOF
# 5. OpenGL / EGL
glxinfo | grep -E 'OpenGL renderer|OpenGL version'
eglinfo | grep -E 'EGL vendor|EGL version'
Troubleshooting
cv2.cuda.getCudaEnabledDeviceCount() == 0 after import cv2
Either OpenCV was rebuilt without -D WITH_CUDA=ON or the device’s CUDA runtime is incompatible. Confirm the build configuration:
/opt/av-env/bin/python -c "import cv2; print(cv2.getBuildInformation())" \
| grep -A2 'CUDA'
If it reports CUDA: NO, re-run build_opencv_cuda.sh.
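For scripted checks, the relevant lines of the build report can be reduced to booleans. The sketch below is a minimal parser under the assumption that each feature appears on a "Name: YES/NO" line; the SAMPLE text is illustrative, not output captured from the reference device, and summarize_build_info is a hypothetical helper name.

```python
import re


def summarize_build_info(build_info: str) -> dict:
    """Map selected feature names to True/False from 'Name: YES/NO' lines."""
    wanted = ("NVIDIA CUDA", "cuDNN", "GStreamer")
    summary = {name: False for name in wanted}
    for line in build_info.splitlines():
        match = re.match(r"\s*([^:]+):\s+(YES|NO)\b", line)
        if match and match.group(1).strip() in wanted:
            summary[match.group(1).strip()] = match.group(2) == "YES"
    return summary


# Illustrative excerpt in the style of cv2.getBuildInformation().
SAMPLE = """
  NVIDIA CUDA:                   YES (ver 12.6, CUFFT CUBLAS FAST_MATH)
    NVIDIA GPU arch:             87
  cuDNN:                         YES (ver 9.3.0)
    GStreamer:                   NO
"""
print(summarize_build_info(SAMPLE))
```

On a device, pass cv2.getBuildInformation() instead of SAMPLE; any feature that comes back False points at a missing CMake flag or dependency.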
ImportError: libcudnn.so.9: cannot open shared object file
cuDNN is missing. Install the L4T-shipped version that matches CUDA 12.6:
sudo apt install libcudnn9-cuda-12
glxinfo reports llvmpipe (Mesa) instead of NVIDIA
nvidia-l4t-3d-core is missing. Reinstall it:
sudo apt install --reinstall nvidia-l4t-3d-core
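The renderer check can be automated with a small parser over glxinfo output. This sketch assumes the usual "OpenGL renderer string:" line; the function names and the sample renderer strings are illustrative, not taken from the verifier script.

```python
def renderer_from_glxinfo(output: str):
    """Extract the renderer name from glxinfo output, or None if absent."""
    for line in output.splitlines():
        if line.strip().startswith("OpenGL renderer string:"):
            return line.split(":", 1)[1].strip()
    return None


def gpu_renderer_ok(renderer) -> bool:
    """False for a missing renderer or for Mesa's CPU rasterizers."""
    if renderer is None:
        return False
    # llvmpipe and softpipe are Mesa software renderers.
    return not any(name in renderer.lower() for name in ("llvmpipe", "softpipe"))


# Illustrative inputs: a software fallback and an NVIDIA renderer line.
bad = "OpenGL renderer string: llvmpipe (LLVM 15.0.7, 256 bits)"
good = "OpenGL renderer string: NVIDIA Tegra Orin (nvgpu)/integrated"
print(gpu_renderer_ok(renderer_from_glxinfo(bad)))
print(gpu_renderer_ok(renderer_from_glxinfo(good)))
```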
nvcc not found
The CUDA toolkit is not on PATH. JetPack 6.2 installs it under /usr/local/cuda-12.6/. Ensure /etc/profile.d/cuda.sh adds it to PATH, or set it manually:
export PATH=/usr/local/cuda-12.6/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-12.6/lib64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}
Build runs out of disk
The default WORK_DIR=/var/tmp/opencv-build consumes ~3 GB. Check df -h /var before invoking, or point WORK_DIR at a volume with ~3 GB free:
WORK_DIR=/path/elsewhere bash scripts/build_opencv_cuda.sh
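A pre-flight free-space check might look like the sketch below. The helper names are hypothetical; the 3 GB threshold is the figure quoted above, and because WORK_DIR may not exist yet, the check walks up to the nearest existing parent directory.

```python
import shutil
from pathlib import Path


def nearest_existing(path: str) -> Path:
    """Return the path itself, or its closest ancestor that exists."""
    p = Path(path).resolve()
    while not p.exists():
        p = p.parent
    return p


def has_free_space(path: str, required_gb: float) -> bool:
    """True if the filesystem holding `path` has at least required_gb free."""
    free_bytes = shutil.disk_usage(nearest_existing(path)).free
    return free_bytes >= required_gb * 1024**3


work_dir = "/var/tmp/opencv-build"
if has_free_space(work_dir, 3):
    print("enough space for the build in", work_dir)
else:
    print("not enough space; set WORK_DIR to another volume")
```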
Build runs out of RAM
OpenCV with the CUDA contrib modules needs ~6 GB at peak. Lower parallelism:
JOBS=2 bash scripts/build_opencv_cuda.sh
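One way to choose JOBS is to derive it from available memory. The sketch below reads the MemAvailable field in the /proc/meminfo format; the 1.5 GB-per-job budget is an illustrative assumption, not a measurement from this build, and both function names are hypothetical.

```python
import os


def mem_available_gb(meminfo_text: str) -> float:
    """Parse MemAvailable (reported in kB) from /proc/meminfo-style text."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            return int(line.split()[1]) / 1024**2  # kB -> GB
    raise ValueError("MemAvailable not found")


def pick_jobs(available_gb: float, gb_per_job: float = 1.5, cpus=None) -> int:
    """Cap parallelism by both CPU count and an assumed per-job memory budget."""
    cpus = cpus or os.cpu_count() or 1
    return max(1, min(cpus, int(available_gb // gb_per_job)))


# Example with a synthetic meminfo snippet (6 GB available, 8 cores).
sample = "MemTotal:       16000000 kB\nMemAvailable:    6291456 kB\n"
print(pick_jobs(mem_available_gb(sample), cpus=8))
```

On a device, read the real value with open("/proc/meminfo").read() and pass the result as JOBS to build_opencv_cuda.sh.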
GStreamer pipelines using nvarguscamerasrc fail to negotiate
ZED X uses nvarguscamerasrc, which produces NVMM buffers and needs the camera overlay loaded. Verify:
v4l2-ctl --list-devices # ZED X must appear
gst-launch-1.0 nvarguscamerasrc num-buffers=1 ! fakesink
If the device list is empty, the DTBO did not apply. See DRIVERS.md section 1.3. The ZED X modules are built in-tree, the overlay is staged by the repo, and end-to-end capture is verified live on the reference device (29.5 FPS stereo + CUDA depth; DRIVERS.md section 1.5), so an empty device list points at the overlay or daemon setup on your unit, not the driver stack.
Future work and known gaps
- OpenCL on Tegra: not yet enabled. Some applications prefer OpenCL over CUDA for portability. Add -D WITH_OPENCL=ON together with an OpenCL implementation that targets the Orin GPU if you need it; the GPU is NVIDIA Ampere, not Arm Mali, so Mali OpenCL libraries do not apply.
- VPI sample suite: VPI ships with samples that serve as smoke tests but require a manual run today. Add them to verify_opengl_cuda.sh to automate.
- DeepStream 7.x: not installed by Phase 5. For multi-stream pipeline composition, install it manually: sudo apt install deepstream-7.0.
- TensorRT model compilation: models live in /opt/jetson-av/models/. Pre-compile them to .engine files at bake time for the fastest cold start. This is currently a manual step.