YOLOv7 detect.py Explained


春暖花开* · Published 2023-10-05 18:51:07

def detect(save_img=False):
    source, weights, view_img, save_txt, imgsz, trace = opt.source, opt.weights, opt.view_img, opt.save_txt, opt.img_size, not opt.no_trace
    save_img = not opt.nosave and not source.endswith('.txt')  # save inference images
    webcam = source.isnumeric() or source.endswith('.txt') or source.lower().startswith(
        ('rtsp://', 'rtmp://', 'http://', 'https://'))

    # Directories
    save_dir = Path(increment_path(Path(opt.project) / opt.name, exist_ok=opt.exist_ok))  # increment run
    (save_dir / 'labels' if save_txt else save_dir).mkdir(parents=True, exist_ok=True)  # make dir

    # Initialize
    set_logging()
    device = select_device(opt.device)
    half = device.type != 'cpu'  # half precision only supported on CUDA

    # Load model
    model = attempt_load(weights, map_location=device)  # load FP32 model
    stride = int(model.stride.max())  # model stride
    imgsz = check_img_size(imgsz, s=stride)  # check img_size

    if trace:
        model = TracedModel(model, device, opt.img_size)

    if half:
        model.half()  # to FP16

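The function itself takes a single parameter, save_img (whether to save inference images), and that value is immediately overwritten from the options. The remaining settings are read from the global opt object: source (the input source), weights (path to the model weights), view_img (whether to display the results), save_txt (whether to write results to text files), imgsz (the inference image size), and trace (whether to trace the model, that is, not opt.no_trace).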

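From these settings the function derives a few more variables. save_img is True unless opt.nosave is set or the source is a .txt file. webcam is True when the source is a numeric camera index, a .txt file, or an rtsp/rtmp/http/https URL. The function then creates the run directory (with a labels subdirectory when save_txt is set) and sets up logging.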
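The source-handling rules can be tried out in isolation. The following is a minimal sketch, not code from the repository: `classify_source` is a hypothetical helper name, its `nosave` argument stands in for `opt.nosave`, and the example sources are made up.

```python
STREAM_PREFIXES = ('rtsp://', 'rtmp://', 'http://', 'https://')


def classify_source(source: str, nosave: bool = False):
    """Return (webcam, save_img) the way detect() derives them from opt.source."""
    # A numeric string is a camera index, a .txt file is treated as a list of
    # streams, and the URL prefixes mark network streams.
    webcam = (source.isnumeric()
              or source.endswith('.txt')
              or source.lower().startswith(STREAM_PREFIXES))
    # Images are saved unless opt.nosave is set or the source is a .txt file.
    save_img = not nosave and not source.endswith('.txt')
    return webcam, save_img


print(classify_source('0'))                 # camera index
print(classify_source('inference/images'))  # folder of images
print(classify_source('streams.txt'))       # text file listing streams
```

Note that the URL check lowercases the source first, so a prefix such as `RTSP://` is also recognized.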

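Next, the function selects the device, enables half precision (FP16) whenever the device is not the CPU, loads the model weights as FP32, reads the model's maximum stride, and checks the image size against that stride.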
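The image-size check can be pictured as rounding the requested size up to a multiple of the stride. The sketch below assumes that behaviour; `round_up_to_stride` is a hypothetical name used for illustration and is not the repository's `check_img_size`.

```python
import math


def round_up_to_stride(img_size: int, stride: int = 32) -> int:
    """Smallest multiple of `stride` that is >= img_size (assumed behaviour)."""
    return math.ceil(img_size / stride) * stride


for size in (640, 650, 100):
    print(size, '->', round_up_to_stride(size))
```

A size that is already a multiple of the stride, such as 640 with stride 32, passes through unchanged.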

If tracing is enabled, the model is wrapped in TracedModel. Finally, if half precision is enabled, the model is converted to FP16 with model.half().

    # Second-stage classifier
    classify = False
    if classify:
        modelc = load_classifier(name='resnet101', n=2)  # initialize
        modelc.load_state_dict(torch.load('weights/resnet101.pt', map_location=device)['model']).to(device).eval()

    # Set Dataloader
    vid_path, vid_writer = None, None
    if webcam:
        view_img = check_imshow()
        cudnn.benchmark = True  # set True to speed up constant image size inference
        dataset = LoadStreams(source, img_size=imgsz, stride=stride)
    else:
        dataset = LoadImages(source, img_size=imgsz, stride=stride)

    # Get names and colors
    names = model.module.names if hasattr(model, 'module') else model.names
    colors = [[random.randint(0, 255) for _ in range(3)] for _ in names]

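The classify flag is hard-coded to False, so the second-stage classifier is disabled by default. If it were set to True, a resnet101 classifier with two output classes would be created, its weights loaded from weights/resnet101.pt, and the model moved to the device and put into evaluation mode.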

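Next, the data loader is chosen according to webcam. For cameras and streams, the code checks whether images can be displayed (check_imshow), turns on cudnn.benchmark to speed up inference at a constant image size, and reads frames with LoadStreams. Otherwise it reads image or video files with LoadImages.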

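Then the class names and their drawing colors are collected. If the model has a module attribute (as a wrapped model does, for example one wrapped in DataParallel), the names come from model.module.names, otherwise from model.names. colors is a two-dimensional list holding one randomly generated three-value color per class. In short, this part sets up the optional second-stage classifier, the data loader, and the class names and colors.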
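The name lookup and color generation can be sketched without loading a real model. `Detector` and `Wrapper` below are hypothetical stand-ins for a bare model and a wrapped model, and the seed argument is an addition for reproducibility that the original code does not have.

```python
import random


class Detector:                      # stand-in for a bare model
    names = ['person', 'car']


class Wrapper:                       # stand-in for a DataParallel-style wrapper
    def __init__(self, module):
        self.module = module


def get_names(model):
    """Read class names from the wrapped module if there is one."""
    return model.module.names if hasattr(model, 'module') else model.names


def make_colors(names, seed=None):
    """One random three-value color (each value 0-255) per class name."""
    rng = random.Random(seed)
    return [[rng.randint(0, 255) for _ in range(3)] for _ in names]


names = get_names(Wrapper(Detector()))
colors = make_colors(names, seed=0)
print(names, colors)
```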

    # Run inference
    if device.type != 'cpu':
        model(torch.zeros(1, 3, imgsz, imgsz).to(device).type_as(next(model.parameters())))  # run once
    old_img_w = old_img_h = imgsz
    old_img_b = 1

    t0 = time.time()
    for path, img, im0s, vid_cap in dataset:
        img = torch.from_numpy(img).to(device)
        img = img.half() if half else img.float()  # uint8 to fp16/32
        img /= 255.0  # 0 - 255 to 0.0 - 1.0
        if img.ndimension() == 3:
            img = img.unsqueeze(0)

        # Warmup
        if device.type != 'cpu' and (old_img_b != img.shape[0] or old_img_h != img.shape[2] or old_img_w != img.shape[3]):
            old_img_b = img.shape[0]
            old_img_h = img.shape[2]
            old_img_w = img.shape[3]
            for i in range(3):
                model(img, augment=opt.augment)[0]

        # Inference
        t1 = time_synchronized()
        with torch.no_grad():   # Calculating gradients would cause a GPU memory leak
            pred = model(img, augment=opt.augment)[0]
        t2 = time_synchronized()

        # Apply NMS
        pred = non_max_suppression(pred, opt.conf_thres, opt.iou_thres, classes=opt.classes, agnostic=opt.agnostic_nms)
        t3 = time_synchronized()

        # Apply Classifier
        if classify:
            pred = apply_classifier(pred, modelc, img, im0s)

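First, when running on a GPU, the model is run once on an all-zero tensor of shape (1, 3, imgsz, imgsz), so that GPU initialization and memory allocation happen before the first real frame.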

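The code then iterates over the dataset and runs inference on each item. Each image is converted from a NumPy array to a PyTorch tensor and moved to the device. It is cast to FP16 if half precision is enabled and to FP32 otherwise, and the pixel values are scaled from 0-255 to 0.0-1.0. If the tensor has only three dimensions, a batch dimension is added in front.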

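The next part is a warm-up step that runs only on a GPU. Whenever the batch size, height, or width of the input differs from the previous one, the new shape is recorded and the model is run three extra times on the current input before the timed inference. This handles inputs whose size changes between iterations.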
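The shape-tracking logic behind the warm-up can be sketched with a stand-in model. `WarmupTracker` is a hypothetical class written for this illustration; the real code keeps the last shape in the variables old_img_b, old_img_h, and old_img_w, and also checks that the device is not the CPU.

```python
class WarmupTracker:
    """Re-run a model a few times whenever the input shape changes."""

    def __init__(self, model, init_shape, runs=3):
        self.model = model
        self.shape = init_shape      # (batch, height, width) last warmed up
        self.runs = runs

    def maybe_warmup(self, batch, shape):
        """Warm up only if `shape` differs from the last one seen."""
        if shape == self.shape:
            return False
        self.shape = shape
        for _ in range(self.runs):
            self.model(batch)        # results are discarded
        return True


calls = []                           # the "model" just records its inputs
tracker = WarmupTracker(model=calls.append, init_shape=(1, 640, 640))
print(tracker.maybe_warmup('frame-a', (1, 640, 640)))  # same shape as before
print(tracker.maybe_warmup('frame-b', (1, 480, 640)))  # height changed
print(len(calls))                                      # number of warm-up passes
```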

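Then comes the actual inference. The start time t1 is recorded with time_synchronized(), and the forward pass runs inside the torch.no_grad() context manager, which disables gradient tracking; as the code comment notes, calculating gradients would cause a GPU memory leak. The first element of the model's output is taken as the prediction, and t2 is recorded afterwards.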

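Next, non-maximum suppression (NMS) is applied to the predictions to remove redundant bounding boxes, using the confidence threshold opt.conf_thres, the IoU threshold opt.iou_thres, an optional class filter opt.classes, and the class-agnostic switch opt.agnostic_nms. The time t3 is recorded once NMS finishes.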
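To make the idea concrete, here is a simplified greedy NMS in plain Python. It is a sketch of the general technique, not the repository's `non_max_suppression`, which works on batched tensors and supports class filtering and class-agnostic mode. The function names and the threshold values are illustrative assumptions.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(dets, conf_thres=0.25, iou_thres=0.45):
    """dets: list of (x1, y1, x2, y2, conf, cls). Returns the kept detections."""
    # Drop low-confidence boxes, then visit the rest from most to least confident.
    candidates = sorted((d for d in dets if d[4] >= conf_thres),
                        key=lambda d: d[4], reverse=True)
    kept = []
    for d in candidates:
        # Keep d unless it overlaps an already kept box of the same class too much.
        if all(k[5] != d[5] or iou(k[:4], d[:4]) <= iou_thres for k in kept):
            kept.append(d)
    return kept


dets = [
    (10, 10, 110, 110, 0.90, 0),    # strong box
    (12, 12, 112, 112, 0.80, 0),    # near-duplicate of the first
    (200, 200, 260, 260, 0.70, 1),  # separate object of another class
    (10, 10, 110, 110, 0.10, 0),    # below the confidence threshold
]
for d in greedy_nms(dets):
    print(d)
```

In this example the near-duplicate and the low-confidence box are dropped, leaving the strong box and the separate object.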

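Finally, if classify is True, the second-stage classifier refines the predictions through apply_classifier. Overall, this part runs inference on each input and applies NMS and, optionally, the classifier as post-processing.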

        # Process detections
        for i, det in enumerate(pred):  # detections per image
            if webcam:  # batch_size >= 1
                p, s, im0, frame = path[i], '%g: ' % i, im0s[i].copy(), dataset.count
            else:
                p, s, im0, frame = path, '', im0s, getattr(dataset, 'frame', 0)

            p = Path(p)  # to Path
            save_path = str(save_dir / p.name)  # img.jpg
            txt_path = str(save_dir / 'labels' / p.stem) + ('' if dataset.mode == 'image' else f'_{frame}')  # img.txt
            gn = torch.tensor(im0.shape)[[1, 0, 1, 0]]  # normalization gain whwh
            if len(det):
                # Rescale boxes from img_size to im0 size
                det[:, :4] = scale_coords(img.shape[2:], det[:, :4], im0.shape).round()

                # Print results
                for c in det[:, -1].unique():
                    n = (det[:, -1] == c).sum()  # detections per class
                    s += f"{n} {names[int(c)]}{'s' * (n > 1)}, "  # add to string

                # Write results
                for *xyxy, conf, cls in reversed(det):
                    if save_txt:  # Write to file
                        xywh = (xyxy2xywh(torch.tensor(xyxy).view(1, 4)) / gn).view(-1).tolist()  # normalized xywh
                        line = (cls, *xywh, conf) if opt.save_conf else (cls, *xywh)  # label format
                        with open(txt_path + '.txt', 'a') as f:
                            f.write(('%g ' * len(line)).rstrip() % line + '\n')

                    if save_img or view_img:  # Add bbox to image
                        label = f'{names[int(cls)]} {conf:.2f}'
                        plot_one_box(xyxy, im0, label=label, color=colors[int(cls)], line_thickness=1)

            # Print time (inference + NMS)
            print(f'{s}Done. ({(1E3 * (t2 - t1)):.1f}ms) Inference, ({(1E3 * (t3 - t2)):.1f}ms) NMS')

            # Stream results
            if view_img:
                cv2.imshow(str(p), im0)
                cv2.waitKey(1)  # 1 millisecond

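First, enumerate(pred) iterates over the detections for each image in the batch. In webcam mode the batch can hold several streams, so the index i selects the matching path (path[i]) and a copy of the matching original image (im0s[i]), the log string starts with the stream index, and the frame number is taken from dataset.count. Otherwise path and im0s are used directly, and the frame number comes from dataset.frame, defaulting to 0.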

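The path is then converted to a Path object, and the output paths are built: save_path for the annotated image or video, and txt_path for the label file, which gets a _frame suffix when the input is not a still image.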
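The path construction is easy to check on its own. This sketch uses `PurePosixPath` so the output is the same on every platform; `output_paths` is a hypothetical helper, and the directory and file names are made-up examples.

```python
from pathlib import PurePosixPath


def output_paths(save_dir, source_path, mode, frame=0):
    """Build the image/video path and the label-file stem for one input."""
    save_dir, p = PurePosixPath(save_dir), PurePosixPath(source_path)
    save_path = str(save_dir / p.name)
    # Still images get one label file; other modes get a per-frame suffix.
    suffix = '' if mode == 'image' else f'_{frame}'
    txt_path = str(save_dir / 'labels' / p.stem) + suffix
    return save_path, txt_path


print(output_paths('runs/detect/exp', 'inference/images/bus.jpg', 'image'))
print(output_paths('runs/detect/exp', 'clips/street.mp4', 'video', frame=7))
```

The '.txt' extension is not part of txt_path; the original code appends it only when the file is opened for writing.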

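Next, torch.tensor builds the normalization gain gn, which holds the original image's width, height, width, and height. It is used later to convert box coordinates to values between 0 and 1 when label files are written.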

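If det (the detections for this image) is not empty, the results are processed. The boxes are first rescaled with scale_coords from the network input size (img.shape[2:]) to the original image size (im0.shape) and rounded to integers.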
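The rescaling step maps a box from the resized, padded network input back to the original image. The sketch below works on a single box in plain Python and assumes the input was produced by a centered letterbox resize; `rescale_box` is a hypothetical name, not the repository's `scale_coords`.

```python
def rescale_box(box, net_shape, orig_shape):
    """Map (x1, y1, x2, y2) from the letterboxed input to the original image.

    Shapes are (height, width). Assumes a centered letterbox resize.
    """
    net_h, net_w = net_shape
    orig_h, orig_w = orig_shape
    gain = min(net_h / orig_h, net_w / orig_w)  # resize factor that was applied
    pad_x = (net_w - orig_w * gain) / 2         # padding on the left and right
    pad_y = (net_h - orig_h * gain) / 2         # padding on the top and bottom
    x1, y1, x2, y2 = box
    scaled = [(x1 - pad_x) / gain, (y1 - pad_y) / gain,
              (x2 - pad_x) / gain, (y2 - pad_y) / gain]
    # Clip to the image bounds and round to whole pixels.
    limits = [orig_w, orig_h, orig_w, orig_h]
    return [round(min(max(v, 0), lim)) for v, lim in zip(scaled, limits)]


# A 480x640 image padded to 640x640: 80 rows of padding above and below.
print(rescale_box((100, 180, 200, 280), net_shape=(640, 640), orig_shape=(480, 640)))
# A 640x640 image shrunk to 320x320: coordinates double.
print(rescale_box((50, 50, 150, 150), net_shape=(320, 320), orig_shape=(640, 640)))
```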

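The code then loops over the distinct classes in det, counts the detections of each class, and appends the count and class name to the log string s, adding a plural "s" when the count is greater than one.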
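The summary string can be reproduced with plain lists. In this sketch `summarize` is a hypothetical helper, and class ids are ordinary integers rather than a tensor column.

```python
from collections import Counter


def summarize(class_ids, names):
    """Build a string such as '2 persons, 1 car, ' from detected class ids."""
    counts = Counter(class_ids)
    s = ''
    for c in sorted(counts):                      # one entry per distinct class
        n = counts[c]
        s += f"{n} {names[c]}{'s' * (n > 1)}, "   # plural 's' only when n > 1
    return s


print(summarize([0, 2, 0], names=['person', 'bicycle', 'car']))
```

The expression `'s' * (n > 1)` relies on a boolean acting as 0 or 1 when multiplying a string.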

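Next, the code walks through the boxes in det in reverse order and handles them one by one. If save_txt is set, the box is converted to normalized xywh and appended to the label file as class, x, y, w, h, followed by the confidence when opt.save_conf is set. If save_img or view_img is set, the box and its label (class name and confidence) are drawn on the image.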
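The label format can be illustrated for a single box. This is a plain-Python sketch that assumes xywh means the box center followed by width and height; `label_line` is a hypothetical helper, and the image size in the example is made up.

```python
def label_line(cls, xyxy, img_w, img_h, conf=None):
    """Format one detection as 'cls cx cy w h [conf]' with values in 0-1."""
    x1, y1, x2, y2 = xyxy
    xywh = ((x1 + x2) / 2 / img_w,    # box center x
            (y1 + y2) / 2 / img_h,    # box center y
            (x2 - x1) / img_w,        # box width
            (y2 - y1) / img_h)        # box height
    line = (cls, *xywh) if conf is None else (cls, *xywh, conf)
    # Same formatting idiom as the original: one '%g' per field.
    return ('%g ' * len(line)).rstrip() % line


print(label_line(0, (100, 100, 200, 200), img_w=640, img_h=480))
print(label_line(0, (100, 100, 200, 200), img_w=640, img_h=480, conf=0.9))
```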

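The code then prints the log string together with the inference time (t2 - t1) and the NMS time (t3 - t2), both in milliseconds.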

Finally, if view_img is set, the annotated image is shown in a window with cv2.imshow, followed by a 1 ms cv2.waitKey.

Overall, this part processes the detections, writes them to label files, draws the boxes, and displays the resulting image.

            # Save results (image with detections)
            if save_img:
                if dataset.mode == 'image':
                    cv2.imwrite(save_path, im0)
                    print(f" The image with the result is saved in: {save_path}")
                else:  # 'video' or 'stream'
                    if vid_path != save_path:  # new video
                        vid_path = save_path
                        if isinstance(vid_writer, cv2.VideoWriter):
                            vid_writer.release()  # release previous video writer
                        if vid_cap:  # video
                            fps = vid_cap.get(cv2.CAP_PROP_FPS)
                            w = int(vid_cap.get(cv2.CAP_PROP_FRAME_WIDTH))
                            h = int(vid_cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
                        else:  # stream
                            fps, w, h = 30, im0.shape[1], im0.shape[0]
                            save_path += '.mp4'
                        vid_writer = cv2.VideoWriter(save_path, cv2.VideoWriter_fourcc(*'mp4v'), fps, (w, h))
                    vid_writer.write(im0)

    if save_txt or save_img:
        s = f"\n{len(list(save_dir.glob('labels/*.txt')))} labels saved to {save_dir / 'labels'}" if save_txt else ''
        #print(f"Results saved to {save_dir}{s}")

    print(f'Done. ({time.time() - t0:.3f}s)')

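If save_img is set, the annotated result is saved. In image mode the image is written directly with cv2.imwrite and its path is printed. In video or stream mode, a new cv2.VideoWriter is created only when the save path changes, after the previous writer has been released. For a video file, the frame rate and frame size are read from vid_cap; for a stream, the code falls back to 30 fps and the size of the current frame and appends '.mp4' to the save path. Every frame is then written to the current writer.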
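The writer-switching logic can be sketched without OpenCV. `RecordingWriter` is a hypothetical stand-in for cv2.VideoWriter that only records what it receives, and `write_frames` is an illustrative helper; the file names are made up.

```python
class RecordingWriter:
    """Stand-in for cv2.VideoWriter that only remembers what it was given."""

    def __init__(self, path):
        self.path, self.frames, self.released = path, [], False

    def write(self, frame):
        self.frames.append(frame)

    def release(self):
        self.released = True


def write_frames(items):
    """items: iterable of (save_path, frame). Returns every writer opened."""
    writers, current_path, writer = [], None, None
    for save_path, frame in items:
        if save_path != current_path:        # first frame of a new video
            current_path = save_path
            if writer is not None:
                writer.release()             # close the previous output file
            writer = RecordingWriter(save_path)
            writers.append(writer)
        writer.write(frame)                  # every frame goes to the open writer
    return writers


writers = write_frames([('a.mp4', 'a0'), ('a.mp4', 'a1'), ('b.mp4', 'b0')])
for w in writers:
    print(w.path, w.frames, w.released)
```

As in the original loop, a writer is released only when the next video starts, so the last writer in this sketch is still open when the loop ends.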

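Next, if save_txt or save_img is set, the code builds a summary string s. When save_txt is set, it reports how many label files were saved and where; otherwise it is empty. The print statement that would output this string together with save_dir is commented out, so nothing is printed at this point.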

Finally, the code prints the total elapsed time since t0, which marks the end of the detection run.
