Tutorial: Implementing a YOLO v3 Object Detector from Scratch in PyTorch (Part 4)

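In this tutorial series we are learning, step by step, how to implement the YOLO v3 object detector from scratch in PyTorch. This is Part 4. In the previous parts we built a model that outputs multiple detections for an input image. Specifically, the output is a tensor of shape B x 10647 x 85, where B is the number of images in the batch, 10647 is the number of bounding boxes predicted per image, and 85 is the number of attributes per bounding box.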

2501_90323865 · Published 2025-06-04 01:55:49

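However, as described in Part 1, we must apply objectness confidence thresholding and non-maximum suppression (NMS) to this output to obtain the true detections. To do that, we will create a function named write_results in util.py.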

The `write_results` function

def write_results(prediction, confidence, num_classes, nms_conf = 0.4):

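The function takes as input the predictions (prediction), the confidence threshold (confidence, the objectness score threshold), the number of classes (num_classes, 80 in our case), and nms_conf, the IoU threshold for non-maximum suppression.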

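Objectness confidence thresholding: the prediction tensor holds information about B x 10647 bounding boxes. For every bounding box whose objectness score is below the threshold, we set all of its attributes (the entire row representing that box) to zero:

    conf_mask = (prediction[:,:,4] > confidence).float().unsqueeze(2)
    prediction = prediction*conf_mask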
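To see what the mask does, here is a minimal sketch on a toy tensor. The values, the single-image batch, and the use of only 2 class scores per box are assumptions made for brevity; the real tensor has 10647 boxes and 85 attributes.

```python
import torch

# Toy prediction tensor: 1 image, 3 boxes, 5 box attributes + 2 class scores (assumed).
prediction = torch.tensor([[
    [10.0, 10.0, 4.0, 4.0, 0.9, 0.8, 0.2],   # objectness 0.9, kept
    [20.0, 20.0, 6.0, 2.0, 0.3, 0.1, 0.9],   # objectness 0.3, zeroed
    [30.0, 30.0, 2.0, 8.0, 0.6, 0.5, 0.5],   # objectness 0.6, kept
]])
confidence = 0.5

# (1, 3) comparison result -> float -> (1, 3, 1), so it broadcasts over the attribute axis.
conf_mask = (prediction[:, :, 4] > confidence).float().unsqueeze(2)
masked = prediction * conf_mask

print(conf_mask.shape)   # torch.Size([1, 3, 1])
print(masked[0, 1])      # every attribute of the low-score box is now 0
```

The unsqueeze(2) is what lets a per-box mask multiply a per-attribute tensor: without the trailing axis the shapes (1, 3) and (1, 3, 7) would not broadcast.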
Non-maximum suppression: before running NMS, we convert each box's attributes from (center x, center y, width, height) to (top-left x, top-left y, bottom-right x, bottom-right y), which makes it easier to compute the intersection over union (IoU) of two boxes:

    box_corner = prediction.new(prediction.shape)
    box_corner[:,:,0] = (prediction[:,:,0] - prediction[:,:,2]/2)
    box_corner[:,:,1] = (prediction[:,:,1] - prediction[:,:,3]/2)
    box_corner[:,:,2] = (prediction[:,:,0] + prediction[:,:,2]/2)
    box_corner[:,:,3] = (prediction[:,:,1] + prediction[:,:,3]/2)
    prediction[:,:,:4] = box_corner[:,:,:4]
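The conversion can be checked by hand on a single box. The sketch below uses a hypothetical helper name, center_to_corner, and works on a 2-D tensor of boxes rather than the batched tensor used in the tutorial.

```python
import torch

def center_to_corner(boxes):
    """Convert (cx, cy, w, h) rows to (x1, y1, x2, y2) without modifying the input.

    Hypothetical helper for illustration; not part of the tutorial's util.py.
    """
    corners = boxes.clone()
    corners[:, 0] = boxes[:, 0] - boxes[:, 2] / 2   # left   = cx - w/2
    corners[:, 1] = boxes[:, 1] - boxes[:, 3] / 2   # top    = cy - h/2
    corners[:, 2] = boxes[:, 0] + boxes[:, 2] / 2   # right  = cx + w/2
    corners[:, 3] = boxes[:, 1] + boxes[:, 3] / 2   # bottom = cy + h/2
    return corners

# A box centered at (10, 20) that is 4 wide and 6 tall.
boxes = torch.tensor([[10.0, 20.0, 4.0, 6.0]])
corners = center_to_corner(boxes)
print(corners.tolist())   # [[8.0, 17.0, 12.0, 23.0]]
```

Reading from a separate copy matters: overwriting column 0 in place and then using it to compute column 2 would give the wrong right edge.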

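Because the number of true detections can differ from image to image, confidence thresholding and NMS must be done one image at a time. This means the operations cannot be vectorized across the batch, and we must loop over the first dimension of prediction (which indexes the images in the batch).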

    batch_size = prediction.size(0)

    write = False

    for ind in range(batch_size):
        image_pred = prediction[ind]          #image Tensor
        #confidence thresholding
        #NMS

The write flag indicates that we have not yet initialized output, the tensor that will collect the true detections across the whole batch.

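Inside the loop, we first clean things up a little. Each bounding box row has 85 attributes, 80 of which are class scores. At this point we only care about the class score with the maximum value, so we remove the 80 class scores from each row and add instead the index of the class with the maximum value and the score of that class. Note that, despite the names, max_conf holds the score and max_conf_score holds the class index:

    max_conf, max_conf_score = torch.max(image_pred[:,5:5+ num_classes], 1)
    max_conf = max_conf.float().unsqueeze(1)
    max_conf_score = max_conf_score.float().unsqueeze(1)
    seq = (image_pred[:,:5], max_conf, max_conf_score)
    image_pred = torch.cat(seq, 1)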
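As a small illustration of this reduction, the sketch below uses 3 classes instead of 80 (an assumption for readability) and the hypothetical names best_score and best_class for the two values returned by torch.max.

```python
import torch

num_classes = 3  # assumed toy value; the tutorial uses 80
image_pred = torch.tensor([
    # x1,  y1,   x2,   y2,  obj, class scores...
    [0.0, 0.0, 10.0, 10.0, 0.9, 0.1, 0.7, 0.2],
    [5.0, 5.0, 15.0, 15.0, 0.8, 0.6, 0.3, 0.1],
])

# Along dim 1, torch.max returns a (values, indices) pair.
best_score, best_class = torch.max(image_pred[:, 5:5 + num_classes], 1)

# Replace the per-class scores with two columns: best score, best class index.
compact = torch.cat(
    (image_pred[:, :5], best_score.unsqueeze(1), best_class.float().unsqueeze(1)), 1
)
print(compact.shape)        # torch.Size([2, 7])
print(best_class.tolist())  # [1, 0]
```

Each row shrinks from 5 + num_classes columns to 7, which is why later snippets reshape with view(-1, 7).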

Next, we get rid of the bounding box rows whose objectness confidence was below the threshold (these are the rows we zeroed out earlier):

    non_zero_ind = (torch.nonzero(image_pred[:,4]))
    try:
        image_pred_ = image_pred[non_zero_ind.squeeze(),:].view(-1,7)
    except:
        continue

    #For PyTorch 0.4 compatibility
    #Since the above code will not raise an exception for no detection
    #as scalars are supported in PyTorch 0.4
    if image_pred_.shape[0] == 0:
        continue

The try-except block above handles the case where there are no detections; we then use continue to skip the rest of the loop body for this image.
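An alternative way to drop the zeroed rows is boolean-mask indexing, which keeps the result 2-D even when no row survives. This is a sketch of a different technique, not the tutorial's code, and the toy values are made up.

```python
import torch

image_pred = torch.tensor([
    [1.0, 1.0, 5.0, 5.0, 0.9, 0.7, 2.0],
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],   # zeroed by thresholding
    [2.0, 2.0, 6.0, 6.0, 0.6, 0.8, 0.0],
])

keep = image_pred[:, 4] != 0       # boolean mask, one entry per row
image_pred_ = image_pred[keep]     # rows with non-zero objectness

# With a mask that selects nothing, the result is an empty (0, 7) tensor.
nothing = image_pred[image_pred[:, 4] > 1.0]

print(image_pred_.shape)   # torch.Size([2, 7])
print(nothing.shape)       # torch.Size([0, 7])
```

With this form, the "no detections" case reduces to a single check on shape[0], with no exception handling needed.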

Then we get the classes detected in the image:

    #Get the various classes detected in the image
    img_classes = unique(image_pred_[:,-1]) # -1 index holds the class index

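Since there can be multiple true detections of the same class, we use a function called unique to get the classes present in a given image:

    import numpy as np   #needed by unique; place at the top of util.py

    def unique(tensor):
        tensor_np = tensor.cpu().numpy()
        unique_np = np.unique(tensor_np)
        unique_tensor = torch.from_numpy(unique_np)

        tensor_res = tensor.new(unique_tensor.shape)
        tensor_res.copy_(unique_tensor)
        return tensor_res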
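Recent PyTorch releases also provide torch.unique, which returns the distinct values without the round trip through NumPy. The sketch below assumes your installed version has it; the class ids are made up.

```python
import torch

# Class-index column as it would appear in image_pred_[:, -1] (toy values).
class_ids = torch.tensor([2.0, 0.0, 2.0, 5.0, 0.0])

# torch.unique returns the distinct values, sorted by default.
img_classes = torch.unique(class_ids)
print(img_classes.tolist())   # [0.0, 2.0, 5.0]
```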

After that, we perform NMS class by class:

    for cls in img_classes:
        #perform NMS

Inside this loop, the first thing we do is extract the detections of one particular class (denoted by the variable cls):

    #get the detections with one particular class
    cls_mask = image_pred_*(image_pred_[:,-1] == cls).float().unsqueeze(1)
    class_mask_ind = torch.nonzero(cls_mask[:,-2]).squeeze()
    image_pred_class = image_pred_[class_mask_ind].view(-1,7)

    #sort the detections such that the entry with the maximum objectness
    #confidence is at the top
    conf_sort_index = torch.sort(image_pred_class[:,4], descending = True )[1]
    image_pred_class = image_pred_class[conf_sort_index]
    idx = image_pred_class.size(0)   #Number of detections
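The select-then-sort step can be tried on a few made-up rows. This sketch selects the class with a boolean mask rather than the multiply-and-nonzero approach used in the tutorial; the outcome is the same set of rows.

```python
import torch

# Rows: x1, y1, x2, y2, objectness, class score, class index (toy values).
image_pred_ = torch.tensor([
    [0.0, 0.0, 4.0, 4.0, 0.6, 0.9, 1.0],
    [1.0, 1.0, 5.0, 5.0, 0.9, 0.8, 1.0],
    [8.0, 8.0, 9.0, 9.0, 0.7, 0.5, 3.0],
])
cls = 1.0

# Keep only the rows whose class index equals cls.
image_pred_class = image_pred_[image_pred_[:, -1] == cls]

# torch.sort returns (sorted values, indices); we need the indices to reorder whole rows.
order = torch.sort(image_pred_class[:, 4], descending=True)[1]
image_pred_class = image_pred_class[order]

print(image_pred_class[:, 4])   # objectness scores, highest first
```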

Now we perform NMS:

    for i in range(idx):
        #Get the IOUs of all boxes that come after the one we are looking at
        #in the loop
        try:
            ious = bbox_iou(image_pred_class[i].unsqueeze(0), image_pred_class[i+1:])
        except ValueError:
            break

        except IndexError:
            break

        #Zero out all the detections that have IoU > threshold
        iou_mask = (ious < nms_conf).float().unsqueeze(1)
        image_pred_class[i+1:] *= iou_mask

        #Keep only the non-zero entries
        non_zero_ind = torch.nonzero(image_pred_class[:,4]).squeeze()
        image_pred_class = image_pred_class[non_zero_ind].view(-1,7)

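This uses the bbox_iou function. Its first input is the bounding box row indexed by the loop variable i; its second input is a tensor of multiple bounding box rows. Its output is a tensor containing the IoU of the first box with each box in the second input. If two boxes of the same class have an IoU above the threshold, the one with the lower objectness confidence is eliminated. We have already sorted the boxes so that those with higher confidence come first.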

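In the loop body, the line ious = bbox_iou(image_pred_class[i].unsqueeze(0), image_pred_class[i+1:]) gives the IoU of the box at index i with every box at an index greater than i. In each iteration, any box at an index greater than i whose IoU with box i exceeds the threshold nms_conf is eliminated.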

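The IoU computation is wrapped in a try-except block because the loop is designed to run idx iterations (the number of rows in image_pred_class). As the loop proceeds, however, boxes may be removed from image_pred_class. Once even one row has been removed, fewer than idx rows remain, so a later iteration may try to index a row that no longer exists (raising IndexError), or the slice image_pred_class[i+1:] may return an empty tensor, which can trigger a ValueError. At that point we can be sure NMS cannot remove any more boxes, so we break out of the loop.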
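The same greedy idea can be written so that the loop ends when no boxes remain, which avoids relying on exceptions. The following is a sketch under stated assumptions, not the tutorial's implementation: greedy_nms and iou_one_to_many are hypothetical names, rows carry only 5 columns (corners plus score), and areas are computed without the +1 pixel convention used by bbox_iou.

```python
import torch

def iou_one_to_many(box, boxes):
    """IoU of one (x1, y1, x2, y2) box against N boxes, using continuous areas (no +1)."""
    x1 = torch.max(box[0], boxes[:, 0])
    y1 = torch.max(box[1], boxes[:, 1])
    x2 = torch.min(box[2], boxes[:, 2])
    y2 = torch.min(box[3], boxes[:, 3])
    inter = torch.clamp(x2 - x1, min=0) * torch.clamp(y2 - y1, min=0)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)

def greedy_nms(dets, nms_conf=0.4):
    """dets: rows sorted by descending score; columns 0..3 are corner coordinates."""
    kept = []
    while dets.size(0) > 0:
        kept.append(dets[0])                 # the best remaining box always survives
        if dets.size(0) == 1:
            break
        ious = iou_one_to_many(dets[0], dets[1:])
        dets = dets[1:][ious < nms_conf]     # drop boxes that overlap it too much
    return torch.stack(kept) if kept else dets

dets = torch.tensor([
    [0.0, 0.0, 10.0, 10.0, 0.9],
    [1.0, 1.0, 11.0, 11.0, 0.8],     # IoU with the first box is 81/119, about 0.68
    [20.0, 20.0, 30.0, 30.0, 0.7],   # disjoint from both
])
result = greedy_nms(dets)
print(result[:, 4])   # scores of the surviving boxes: the first and the third
```

Because the tensor shrinks on every pass and the loop condition checks its size, there is no index that can run past the end.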

Computing the IoU

    def bbox_iou(box1, box2):
        """
        Returns the IoU of two bounding boxes
        """
        #Get the coordinates of bounding boxes
        b1_x1, b1_y1, b1_x2, b1_y2 = box1[:,0], box1[:,1], box1[:,2], box1[:,3]
        b2_x1, b2_y1, b2_x2, b2_y2 = box2[:,0], box2[:,1], box2[:,2], box2[:,3]

        #get the coordinates of the intersection rectangle
        inter_rect_x1 = torch.max(b1_x1, b2_x1)
        inter_rect_y1 = torch.max(b1_y1, b2_y1)
        inter_rect_x2 = torch.min(b1_x2, b2_x2)
        inter_rect_y2 = torch.min(b1_y2, b2_y2)

        #Intersection area
        inter_area = torch.clamp(inter_rect_x2 - inter_rect_x1 + 1, min=0) * torch.clamp(inter_rect_y2 - inter_rect_y1 + 1, min=0)

        #Union Area
        b1_area = (b1_x2 - b1_x1 + 1)*(b1_y2 - b1_y1 + 1)
        b2_area = (b2_x2 - b2_x1 + 1)*(b2_y2 - b2_y1 + 1)

        iou = inter_area / (b1_area + b2_area - inter_area)

        return iou
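The arithmetic in bbox_iou can be followed by hand for one pair of boxes. The sketch below repeats the same formula, including the +1 inclusive-pixel convention, in plain Python; the helper name pixel_area and the two boxes are made up for illustration.

```python
# Two 10x10-pixel boxes in (x1, y1, x2, y2) form, with inclusive pixel coordinates.
box1 = (0, 0, 9, 9)
box2 = (5, 5, 14, 14)

def pixel_area(x1, y1, x2, y2):
    # The +1 counts both end pixels; a negative extent (no overlap) clamps to zero.
    return max(x2 - x1 + 1, 0) * max(y2 - y1 + 1, 0)

# The intersection rectangle uses the larger of the top-left corners
# and the smaller of the bottom-right corners.
inter = pixel_area(max(box1[0], box2[0]), max(box1[1], box2[1]),
                   min(box1[2], box2[2]), min(box1[3], box2[3]))
union = pixel_area(*box1) + pixel_area(*box2) - inter
iou = inter / union

print(inter, union, round(iou, 4))   # 25 175 0.1429
```

With nms_conf = 0.4 these two boxes would both survive, since their IoU of 1/7 is well below the threshold.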

Writing the predictions

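The write_results function outputs a tensor of shape D x 8. Here D is the number of true detections across all images, each represented by one row. Each detection has 8 attributes: the index of the image in the batch that the detection belongs to, the 4 corner coordinates, the objectness score, the score of the class with the maximum confidence, and the index of that class.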
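One way to keep the eight columns straight when reading an output row is to give them names. The Detection tuple and its field names below are hypothetical, and the row values are made up.

```python
from collections import namedtuple

# Field order follows the column order described above.
Detection = namedtuple(
    "Detection",
    "image_index x1 y1 x2 y2 objectness class_score class_index",
)

row = [0.0, 12.5, 30.0, 200.0, 180.0, 0.98, 0.91, 16.0]   # made-up output row
det = Detection(*row)

# The indices are stored as floats in the tensor, so cast them before use.
print(int(det.image_index), int(det.class_index))   # 0 16
print(det.x2 - det.x1, det.y2 - det.y1)             # 187.5 150.0
```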

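As before, we do not initialize the output tensor until we have a detection to assign to it. Once it has been initialized, we concatenate subsequent detections onto it. The write flag indicates whether the tensor has been initialized. At the end of each iteration of the loop over classes, we add the resulting detections to the output tensor:

    batch_ind = image_pred_class.new(image_pred_class.size(0), 1).fill_(ind)
    #Repeat the batch_id for as many detections of the class cls in the image
    seq = batch_ind, image_pred_class

    if not write:
        output = torch.cat(seq,1)
        write = True
    else:
        out = torch.cat(seq,1)
        output = torch.cat((output,out))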
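A common alternative to the write flag is to collect the per-class chunks in a list and concatenate once at the end. This is a sketch of that pattern with made-up data; per_class_results and chunks are hypothetical names, and None stands in for "no detections".

```python
import torch

# (image index, surviving 7-column detections for one class) pairs, toy values.
per_class_results = [
    (0, torch.tensor([[1.0, 1.0, 5.0, 5.0, 0.9, 0.8, 2.0]])),
    (1, torch.tensor([[0.0, 0.0, 4.0, 4.0, 0.8, 0.9, 2.0],
                      [3.0, 3.0, 9.0, 9.0, 0.7, 0.6, 2.0]])),
]

chunks = []
for ind, image_pred_class in per_class_results:
    # One column holding the image index, repeated for every detection in this chunk.
    batch_ind = image_pred_class.new_full((image_pred_class.size(0), 1), ind)
    chunks.append(torch.cat((batch_ind, image_pred_class), 1))

# A single concatenation along dim 0; None signals that nothing was detected.
output = torch.cat(chunks) if chunks else None
print(output.shape)   # torch.Size([3, 8])
```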

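At the end of the function, we check whether output has been initialized at all. If it has not, there was not a single detection in any image of the batch, and in that case we return 0:

    try:
        return output
    except:
        return 0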
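Because the function returns either a tensor or the integer 0, callers need to check which one they received before indexing. A minimal sketch of such a check, using a hypothetical summarize helper and a made-up output tensor:

```python
import torch

def summarize(output):
    """Count detections per image; `output` is either a D x 8 tensor or the int 0."""
    if isinstance(output, int):
        return {}                      # no detections in the whole batch
    counts = {}
    for image_index in output[:, 0].long().tolist():
        counts[image_index] = counts.get(image_index, 0) + 1
    return counts

fake_output = torch.tensor([
    [0.0, 1.0, 1.0, 5.0, 5.0, 0.9, 0.8, 2.0],
    [0.0, 2.0, 2.0, 6.0, 6.0, 0.7, 0.6, 3.0],
    [1.0, 0.0, 0.0, 4.0, 4.0, 0.8, 0.9, 2.0],
])
print(summarize(fake_output))   # {0: 2, 1: 1}
print(summarize(0))             # {}
```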

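With that, we finally have predictions in the form of a tensor in which each row lists one prediction. The only thing left is to build an input pipeline that reads images from disk, computes the predictions, draws the bounding boxes on the images, and then displays the images or writes them to disk. That is what we will do in the next part.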
