Linux 36.3 + JetPack v6.0@jetson-inference: Object Detection

  • 25-02-20 11:20
blog.csdn.net

  • 1. Background
  • 2. detectnet
    • 2.1 Command-line options
    • 2.2 Downloading models
    • 2.3 Usage examples
      • 2.3.1 Single image
      • 2.3.2 Multiple images
      • 2.3.3 Video
  • 3. Code
    • 3.1 Python
    • 3.2 C++
  • 4. References

1. Background

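From an application standpoint, object detection is the second major stage in computer vision. The earlier recognition examples output class probabilities for the input image as a whole. This article focuses on object detection: finding where the various objects are in a frame by extracting their bounding boxes. Unlike image classification, an object detection network can detect multiple distinct objects in each frame.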

2. detectnet

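The detectNet object takes an image as input and outputs a list of the detected bounding box coordinates together with their classes and confidence values. detectNet can be used from both Python and C++. See below for the pre-trained detection models available for download. The default model is a 91-class SSD-Mobilenet-v2 trained on the MS COCO dataset, which achieves real-time inference performance on Jetson with TensorRT.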
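To make the shape of that output concrete, here is a minimal sketch of a detection record and a confidence filter. The `Box` class and `keep_confident` function are hypothetical stand-ins written for illustration; they are not the library's own types, and the sample coordinates are made up.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Box:
    """Hypothetical stand-in for one detection result (not the library's type)."""
    class_id: int
    confidence: float
    left: float
    top: float
    right: float
    bottom: float

    @property
    def width(self) -> float:
        return self.right - self.left

    @property
    def height(self) -> float:
        return self.bottom - self.top


def keep_confident(boxes: List[Box], threshold: float = 0.5) -> List[Box]:
    """Return only the boxes whose confidence is at least `threshold`."""
    return [b for b in boxes if b.confidence >= threshold]


# Made-up sample data: one confident detection and one weak one.
boxes = [
    Box(class_id=1, confidence=0.91, left=10, top=20, right=110, bottom=220),
    Box(class_id=1, confidence=0.32, left=300, top=40, right=340, bottom=120),
]
kept = keep_confident(boxes)
print(len(kept), kept[0].width, kept[0].height)
```

The 0.5 default mirrors the default detection threshold listed in the help text; lowering it keeps weaker detections at the cost of more false positives.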

2.1 Command-line options

$ detectnet --help
usage: detectnet [--help] [--network=NETWORK] [--threshold=THRESHOLD] ...
                 input [output]

Locate objects in a video/image stream using an object detection DNN.
See below for additional arguments that may not be shown above.

positional arguments:
    input           resource URI of input stream  (see videoSource below)
    output          resource URI of output stream (see videoOutput below)

detectNet arguments:
  --network=NETWORK     pre-trained model to load, one of the following:
                            * ssd-mobilenet-v1
                            * ssd-mobilenet-v2 (default)
                            * ssd-inception-v2
                            * peoplenet
                            * peoplenet-pruned
                            * dashcamnet
                            * trafficcamnet
                            * facedetect
  --model=MODEL         path to custom model to load (caffemodel, uff, or onnx)
  --prototxt=PROTOTXT   path to custom prototxt to load (for .caffemodel only)
  --labels=LABELS       path to text file containing the labels for each class
  --input-blob=INPUT    name of the input layer (default is 'data')
  --output-cvg=COVERAGE name of the coverage/confidence output layer (default is 'coverage')
  --output-bbox=BOXES   name of the bounding output layer (default is 'bboxes')
  --mean-pixel=PIXEL    mean pixel value to subtract from input (default is 0.0)
  --confidence=CONF     minimum confidence threshold for detection (default is 0.5)
  --clustering=CLUSTER  minimum overlapping area threshold for clustering (default is 0.75)
  --alpha=ALPHA         overlay alpha blending value, range 0-255 (default: 120)
  --overlay=OVERLAY     detection overlay flags (e.g. --overlay=box,labels,conf)
                        valid combinations are:  'box', 'lines', 'labels', 'conf', 'none'
  --profile             enable layer profiling in TensorRT

objectTracker arguments:
  --tracking               flag to enable default tracker (IOU)
  --tracker=TRACKER        enable tracking with 'IOU' or 'KLT'
  --tracker-min-frames=N   the number of re-identified frames for a track to be considered valid (default: 3)
  --tracker-drop-frames=N  number of consecutive lost frames before a track is dropped (default: 15)
  --tracker-overlap=N      how much IOU overlap is required for a bounding box to be matched (default: 0.5)

videoSource arguments:
    input                resource URI of the input stream, for example:
                             * /dev/video0               (V4L2 camera #0)
                             * csi://0                   (MIPI CSI camera #0)
                             * rtp://@:1234              (RTP stream)
                             * rtsp://user:pass@ip:1234  (RTSP stream)
                             * webrtc://@:1234/my_stream (WebRTC stream)
                             * file://my_image.jpg       (image file)
                             * file://my_video.mp4       (video file)
                             * file://my_directory/      (directory of images)
  --input-width=WIDTH    explicitly request a width of the stream (optional)
  --input-height=HEIGHT  explicitly request a height of the stream (optional)
  --input-rate=RATE      explicitly request a framerate of the stream (optional)
  --input-save=FILE      path to video file for saving the input stream to disk
  --input-codec=CODEC    RTP requires the codec to be set, one of these:
                             * h264, h265
                             * vp8, vp9
                             * mpeg2, mpeg4
                             * mjpeg
  --input-decoder=TYPE   the decoder engine to use, one of these:
                             * cpu
                             * omx  (aarch64/JetPack4 only)
                             * v4l2 (aarch64/JetPack5 only)
  --input-flip=FLIP      flip method to apply to input:
                             * none (default)
                             * counterclockwise
                             * rotate-180
                             * clockwise
                             * horizontal
                             * vertical
                             * upper-right-diagonal
                             * upper-left-diagonal
  --input-loop=LOOP      for file-based inputs, the number of loops to run:
                             * -1 = loop forever
                             *  0 = don't loop (default)
                             * >0 = set number of loops

videoOutput arguments:
    output               resource URI of the output stream, for example:
                             * file://my_image.jpg       (image file)
                             * file://my_video.mp4       (video file)
                             * file://my_directory/      (directory of images)
                             * rtp://:1234    (RTP stream)
                             * rtsp://@:8554/my_stream   (RTSP stream)
                             * webrtc://@:1234/my_stream (WebRTC stream)
                             * display://0               (OpenGL window)
  --output-codec=CODEC   desired codec for compressed output streams:
                            * h264 (default), h265
                            * vp8, vp9
                            * mpeg2, mpeg4
                            * mjpeg
  --output-encoder=TYPE  the encoder engine to use, one of these:
                            * cpu
                            * omx  (aarch64/JetPack4 only)
                            * v4l2 (aarch64/JetPack5 only)
  --output-save=FILE     path to a video file for saving the compressed stream
                         to disk, in addition to the primary output above
  --bitrate=BITRATE      desired target VBR bitrate for compressed streams,
                         in bits per second. The default is 4000000 (4 Mbps)
  --headless             don't create a default OpenGL GUI window

logging arguments:
  --log-file=FILE        output destination file (default is stdout)
  --log-level=LEVEL      message output threshold, one of the following:
                             * silent
                             * error
                             * warning
                             * success
                             * info
                             * verbose (default)
                             * debug
  --verbose              enable verbose logging (same as --log-level=verbose)
  --debug                enable debug logging   (same as --log-level=debug)

Note: for basic image and video operations, see "Linux 36.3 + JetPack v6.0@jetson-inference: Video Operations".
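The `--tracker-overlap` option in the help text is expressed as an IOU (intersection over union) value. The sketch below shows the standard IOU calculation for two axis-aligned boxes. It is an illustration of the metric only, not the tracker's actual code; the `(left, top, right, bottom)` tuple layout and the `matches` helper are assumptions made for this example.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (left, top, right, bottom)."""
    # Width and height of the overlapping region (zero if the boxes are disjoint).
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def matches(a, b, min_overlap=0.5):
    """Hypothetical helper: treat two boxes as the same object if IOU >= min_overlap."""
    return iou(a, b) >= min_overlap


previous = (0, 0, 10, 10)
current = (5, 0, 15, 10)   # same size, shifted right by half its width
print(iou(previous, current), matches(previous, current))
```

A box shifted by half its width overlaps its previous position with an IOU of only one third, so with a 0.5 requirement it would not be matched; this is why fast-moving objects can be harder to associate from frame to frame.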

2.2 Downloading models

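There are two ways:

  1. When the object is created, initialization downloads the model automatically
  2. Manually place the model files in the data/networks/ directory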

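In mainland China, the "firewall" is an obstacle for beginners like us who are just getting off the ground. Readers who are able to can configure their network by following "Updating the System with apt-get Through a Proxy".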

That said, NVIDIA has kindly provided a workaround: all of the models are also hosted at a location that is reachable from mainland China: GitHub - model-mirror-190618

  --network=NETWORK     pre-trained model to load, one of the following:
                            * ssd-mobilenet-v1
                            * ssd-mobilenet-v2 (default)
                            * ssd-inception-v2
                            * peoplenet
                            * peoplenet-pruned
                            * dashcamnet
                            * trafficcamnet
                            * facedetect
  --model=MODEL         path to custom model to load (caffemodel, uff, or onnx)

Based on the model information above, the command supports:

  • ssd-mobilenet-v1
  • ssd-mobilenet-v2 (default)
  • ssd-inception-v2
  • peoplenet
  • peoplenet-pruned
  • dashcamnet
  • trafficcamnet
  • facedetect
  • custom models (these require a standard model file: caffemodel, uff, or onnx)

As an example, download the SSD-Mobilenet-v2 (default) model:

$ mkdir model-mirror-190618
$ cd model-mirror-190618
$ wget https://github.com/dusty-nv/jetson-inference/releases/download/model-mirror-190618/SSD-Mobilenet-v2.tar.gz
$ tar -zxvf SSD-Mobilenet-v2.tar.gz -C ../data/networks
$ cd ..

Note: when downloading this model, make sure the extracted files end up in the SSD-Mobilenet-v2 directory under data/networks.
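The manual extraction step can also be scripted. The following is a minimal sketch using only the Python standard library; `extract_model` is a hypothetical helper written for this article, and the default `data/networks` path is taken from the commands above. It returns the top-level names in the archive so you can check that the expected model directory was created.

```python
import tarfile
from pathlib import Path


def extract_model(archive: str, networks_dir: str = "data/networks") -> list:
    """Extract a .tar.gz model archive into networks_dir.

    Returns the sorted top-level entry names found in the archive.
    """
    dest = Path(networks_dir)
    dest.mkdir(parents=True, exist_ok=True)

    with tarfile.open(archive, "r:gz") as tar:
        names = tar.getnames()
        tar.extractall(dest)

    # Collect the first path component of every entry, skipping a bare ".".
    top_level = set()
    for name in names:
        parts = Path(name).parts
        if parts:
            top_level.add(parts[0])
    return sorted(top_level)
```

For the archive downloaded above, calling `extract_model("model-mirror-190618/SSD-Mobilenet-v2.tar.gz")` from the repository root would be the equivalent of the `tar` command; only extract archives from a source you trust.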

2.3 Usage examples

$ cd build/aarch64/bin/

2.3.1 Single image

# C++
$ ./detectnet --network=ssd-mobilenet-v2 images/peds_0.jpg images/test/output_detectnet_cpp.jpg
# Python
$ ./detectnet.py --network=ssd-mobilenet-v2 images/peds_0.jpg images/test/output_detectnet_python.jpg

This time the C++ and Python runs produced identical confidence results, unlike imagenet, where the two differed.


2.3.2 Multiple images

# C++
$ ./detectnet "images/peds_*.jpg" images/test/peds_output_detectnet_cpp_%i.jpg
# Python
$ ./detectnet.py "images/peds_*.jpg" images/test/peds_output_detectnet_python_%i.jpg

Note: the output images from the multi-image run are not shown here; interested readers can download the code and run it locally.
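The commands above pair a wildcard input with an output name containing `%i`. As a rough illustration of how such a pairing works, the sketch below expands the wildcard and fills in the index. `plan_outputs` is a hypothetical helper, and both the sorted ordering and the starting index of 0 are assumptions for this example rather than confirmed behavior of the tool.

```python
import glob


def plan_outputs(input_pattern: str, output_pattern: str, start: int = 0):
    """Pair each file matching input_pattern with a numbered output path.

    output_pattern must contain a single %i placeholder for the sequence index.
    """
    inputs = sorted(glob.glob(input_pattern))
    return [(src, output_pattern % (start + i)) for i, src in enumerate(inputs)]


# Prints nothing if no files match in the current directory.
for src, dst in plan_outputs("images/peds_*.jpg",
                             "images/test/peds_output_detectnet_python_%i.jpg"):
    print(src, "->", dst)
```

Quoting the wildcard on the command line, as the article does, keeps the shell from expanding it so the program receives the pattern itself.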

2.3.3 Video

# Download test video
$ wget https://nvidia.box.com/shared/static/veuuimq6pwvd62p9fresqhrrmfqz0e2f.mp4 -O pedestrians.mp4
# C++
$ ./detectnet ../../../pedestrians.mp4 images/test/pedestrians_ssd_detectnet_cpp.mp4
# Python
$ ./detectnet.py ../../../pedestrians.mp4 images/test/pedestrians_ssd_detectnet_python.mp4

pedestrians

3. Code

3.1 Python

Import statements
├── import sys
├── import argparse
├── from jetson_inference import detectNet
└── from jetson_utils import videoSource, videoOutput, Log

Command-line argument parsing
├── Create ArgumentParser
│   ├── description: "Locate objects in a live camera stream using an object detection DNN."
│   ├── formatter_class: argparse.RawTextHelpFormatter
│   └── epilog: detectNet.Usage() + videoSource.Usage() + videoOutput.Usage() + Log.Usage()
├── Add arguments
│   ├── input: "URI of the input stream"
│   ├── output: "URI of the output stream"
│   ├── --network: "pre-trained model to load (default: 'ssd-mobilenet-v2')"
│   ├── --overlay: "detection overlay flags (default: 'box,labels,conf')"
│   └── --threshold: "minimum detection threshold to use (default: 0.5)"
└── Parse arguments
    ├── args = parser.parse_known_args()[0]
    └── Exception handling
        ├── print("")
        ├── parser.print_help()
        └── sys.exit(0)

Create video sources and outputs
├── input = videoSource(args.input, argv=sys.argv)
└── output = videoOutput(args.output, argv=sys.argv)

Load object detection network
└── net = detectNet(args.network, sys.argv, args.threshold)

# Note: Hard-code paths to load a model (commented out)
   ├── net = detectNet(model="model/ssd-mobilenet.onnx", labels="model/labels.txt", 
   ├──                 input_blob="input_0", output_cvg="scores", output_bbox="boxes", 
   └──                 threshold=args.threshold)

Process frames until EOS or user exits
└── while True:
    ├── Capture next image
    │   └── img = input.Capture()
    │       └── if img is None: # timeout
    │           └── continue
    ├── Detect objects in the image
    │   └── detections = net.Detect(img, overlay=args.overlay)
    ├── Print the detections
    │   ├── print("detected {:d} objects in image".format(len(detections)))
    │   └── for detection in detections:
    │       └── print(detection)
    ├── Render the image
    │   └── output.Render(img)
    ├── Update the title bar
    │   └── output.SetStatus("{:s} | Network {:.0f} FPS".format(args.network, net.GetNetworkFPS()))
    ├── Print performance info
    │   └── net.PrintProfilerTimes()
    └── Exit on input/output EOS
        ├── if not input.IsStreaming() or not output.IsStreaming():
        └── break
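The argument-parsing part of the outline above can be tried without a Jetson. The sketch below rebuilds only that part with the standard `argparse` module; the `epilog` built from the library's usage strings is left out so the example has no dependency on jetson_inference, and the `nargs="?"` and empty-string defaults on the positional arguments are assumptions made for this sketch. It shows why `parse_known_args()` is used: options meant for the video source or output are passed through instead of causing an error.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with the same arguments listed in the outline above."""
    parser = argparse.ArgumentParser(
        description="Locate objects in a live camera stream using an object detection DNN.",
        formatter_class=argparse.RawTextHelpFormatter,
    )
    # nargs="?" and the empty defaults are assumptions for this sketch.
    parser.add_argument("input", type=str, default="", nargs="?",
                        help="URI of the input stream")
    parser.add_argument("output", type=str, default="", nargs="?",
                        help="URI of the output stream")
    parser.add_argument("--network", type=str, default="ssd-mobilenet-v2",
                        help="pre-trained model to load (default: 'ssd-mobilenet-v2')")
    parser.add_argument("--overlay", type=str, default="box,labels,conf",
                        help="detection overlay flags (default: 'box,labels,conf')")
    parser.add_argument("--threshold", type=float, default=0.5,
                        help="minimum detection threshold to use (default: 0.5)")
    return parser


# parse_known_args() returns (namespace, leftovers) instead of failing on
# options this parser does not define, such as --input-codec.
args, unknown = build_parser().parse_known_args(
    ["images/peds_0.jpg", "out.jpg", "--input-codec=h264"]
)
print(args.input, args.output, args.network, args.threshold, unknown)
```

In the real script the full `sys.argv` is also handed to the video source, video output, and network constructors, which is how those leftover options still take effect.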

3.2 C++

#include statements
├── "videoSource.h"
├── "videoOutput.h"
├── "detectNet.h"
├── "objectTracker.h"
└── <signal.h>

Global variables
└── bool signal_recieved = false;

Function definitions
├── void sig_handler(int signo)
│   └── if (signo == SIGINT)
│       ├── LogVerbose("received SIGINT\n");
│       └── signal_recieved = true;
└── int usage()
    ├── printf("usage: detectnet [--help] [--network=NETWORK] [--threshold=THRESHOLD] ...\n");
    ├── printf("                 input [output]\n\n");
    ├── printf("Locate objects in a video/image stream using an object detection DNN.\n");
    ├── printf("See below for additional arguments that may not be shown above.\n\n");
    ├── printf("positional arguments:\n");
    ├── printf("    input           resource URI of input stream  (see videoSource below)\n");
    ├── printf("    output          resource URI of output stream (see videoOutput below)\n\n");
    ├── printf("%s", detectNet::Usage());
    ├── printf("%s", objectTracker::Usage());
    ├── printf("%s", videoSource::Usage());
    ├── printf("%s", videoOutput::Usage());
    └── printf("%s", Log::Usage());

main function
├── Parse command line
│   ├── commandLine cmdLine(argc, argv);
│   └── if (cmdLine.GetFlag("help"))
│       └── return usage();
├── Attach signal handler
│   └── if (signal(SIGINT, sig_handler) == SIG_ERR)
│       └── LogError("can't catch SIGINT\n");
├── Create input stream
│   ├── videoSource* input = videoSource::Create(cmdLine, ARG_POSITION(0));
│   └── if (!input)
│       ├── LogError("detectnet:  failed to create input stream\n");
│       └── return 1;
├── Create output stream
│   ├── videoOutput* output = videoOutput::Create(cmdLine, ARG_POSITION(1));
│   └── if (!output)
│       ├── LogError("detectnet:  failed to create output stream\n");
│       └── return 1;
├── Create detection network
│   ├── detectNet* net = detectNet::Create(cmdLine);
│   ├── if (!net)
│   │   ├── LogError("detectnet:  failed to load detectNet model\n");
│   │   └── return 1;
│   └── const uint32_t overlayFlags = detectNet::OverlayFlagsFromStr(cmdLine.GetString("overlay", "box,labels,conf"));
├── Processing loop
│   └── while (!signal_recieved)
│       ├── Capture next image
│       │   ├── uchar3* image = NULL;
│       │   ├── int status = 0;
│       │   └── if (!input->Capture(&image, &status))
│       │       ├── if (status == videoSource::TIMEOUT)
│       │       │   └── continue;
│       │       └── break; // EOS
│       ├── Detect objects in the frame
│       │   ├── detectNet::Detection* detections = NULL;
│       │   ├── const int numDetections = net->Detect(image, input->GetWidth(), input->GetHeight(), &detections, overlayFlags);
│       │   └── if (numDetections > 0)
│       │       ├── LogVerbose("%i objects detected\n", numDetections);
│       │       └── for (int n=0; n < numDetections; n++)
│       │           ├── LogVerbose("\ndetected obj %i  class #%u (%s)  confidence=%f\n", n, detections[n].ClassID, net->GetClassDesc(detections[n].ClassID), detections[n].Confidence);
│       │           ├── LogVerbose("bounding box %i  (%.2f, %.2f)  (%.2f, %.2f)  w=%.2f  h=%.2f\n", n, detections[n].Left, detections[n].Top, detections[n].Right, detections[n].Bottom, detections[n].Width(), detections[n].Height());
│       │           └── if (detections[n].TrackID >= 0)
│       │               └── LogVerbose("tracking  ID %i  status=%i  frames=%i  lost=%i\n", detections[n].TrackID, detections[n].TrackStatus, detections[n].TrackFrames, detections[n].TrackLost);
│       ├── Render outputs
│       │   ├── if (output != NULL)
│       │   │   ├── output->Render(image, input->GetWidth(), input->GetHeight());
│       │   │   ├── char str[256];
│       │   │   ├── sprintf(str, "TensorRT %i.%i.%i | %s | Network %.0f FPS", NV_TENSORRT_MAJOR, NV_TENSORRT_MINOR, NV_TENSORRT_PATCH, precisionTypeToStr(net->GetPrecision()), net->GetNetworkFPS());
│       │   │   ├── output->SetStatus(str);
│       │   │   └── if (!output->IsStreaming())
│       │   │       └── break;
│       └── Print out timing info
│           └── net->PrintProfilerTimes();
└── Destroy resources
    ├── LogVerbose("detectnet:  shutting down...\n");
    ├── SAFE_DELETE(input);
    ├── SAFE_DELETE(output);
    ├── SAFE_DELETE(net);
    ├── LogVerbose("detectnet:  shutdown complete.\n");
    └── return 0;
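The capture step in the processing loop above distinguishes two kinds of failure: a timeout, which is retried, and anything else, which is treated as end of stream. The sketch below replays that control flow in Python (used here for consistency with section 3.1) against a scripted fake source, so it runs without a camera. `Status`, `FakeSource`, and `run_loop` are hypothetical names invented for this illustration and are not part of the library.

```python
from enum import Enum


class Status(Enum):
    OK = 0
    TIMEOUT = 1
    EOS = 2


class FakeSource:
    """Hypothetical stand-in for a video source; replays a scripted list of statuses."""

    def __init__(self, script):
        self._script = iter(script)

    def capture(self):
        # Once the script is exhausted, keep reporting end of stream.
        status = next(self._script, Status.EOS)
        frame = "frame" if status is Status.OK else None
        return frame, status


def run_loop(source, max_iterations=1000):
    """Count processed frames, retrying on timeout and stopping at end of stream."""
    processed = 0
    for _ in range(max_iterations):
        frame, status = source.capture()
        if frame is None:
            if status is Status.TIMEOUT:
                continue   # no frame arrived in time, try again
            break          # end of stream
        processed += 1     # detection and rendering would happen here
    return processed


script = [Status.OK, Status.TIMEOUT, Status.OK, Status.EOS, Status.OK]
print(run_loop(FakeSource(script)))
```

The frame scripted after the EOS entry is never processed, which matches the `break; // EOS` branch in the outline; the `max_iterations` cap exists only to keep this sketch from spinning forever on a source that always times out.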

4. References

[1] jetson-inference - Locating Objects with DetectNet

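Note: this article is reposted from lida2003's article on blog.csdn.net, "https://blog.csdn.net/lida2003/article/details/139377486". Copyright belongs to the original author; this blog does not own the copyright and assumes no related legal liability. If there is any infringement, please contact us to have it removed.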