![[Paper Review & Implementation] You Only Look Once: Unified, Real-Time Object Detection (YOLOv1, 2016)](/assets/img/papers/objd/yolo_v1.png)
[Paper Review & Implementation] You Only Look Once: Unified, Real-Time Object Detection (YOLOv1, 2016)
- Reference
- Implementation with PyTorch
- YOLO : Single-Stage Object Detection
- YOLOv1 Backbone Architecture : Feature Extractor and Region Proposals
- Customize PASCAL VOC Dataset
- Process the Predicted Bounding Boxes from YOLO Model and Obtain Final Target Boxes
- Calculate Mean Average Precision (mAP) between Predicted Boxes and Ground Truths
- Computing Loss and Optimizing the Model
- Training Result & Display Predicted Anchors from Trained Model onto Image
Reference
- You Only Look Once: Unified, Real-Time Object Detection, Joseph Redmon et al., 2016
- Machine-Learning-Collection/ML/Pytorch/object_detection/YOLO/
Implementation with PyTorch
I took the code from the Machine-Learning-Collection repository referenced above and made a few modifications to improve performance, such as replacing explicit for loops with vectorized tensor operations.
YOLO : Single-Stage Object Detection
(a) : Two-stage object detection models such as Faster R-CNN consist of two main components : a region proposal network (RPN) and a classifier.
- The RPN generates region proposals (candidate bounding box locations), and the classifier then filters and refines these proposals into the final target RoIs.
(b) : In contrast, YOLO is a single-stage object detection model in which the region proposal stage is folded into the feature extractor.
It divides the input image into a grid of cells and predicts bounding boxes and class probabilities directly from each cell instead of explicitly generating region proposals.
It uses a single CNN to extract features from the entire image at once and predict the detections.
YOLOv1 Backbone Architecture : Feature Extractor and Region Proposals
Figure 3. YOLOv1 Architecture (Darknet framework) : 24 conv layers followed by 2 fc layers
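import torch
import torch.nn as nn

# Each tuple is (kernel_size, out_channels, stride, padding).
# "M" is a 2 x 2 max-pool with stride 2 (handled by the str branch below).
# A list is [conv1, conv2, num_repeats] for a repeated pair of conv layers.
architecture_configs = [
    (7, 64, 2, 3),
    "M",
    (3, 192, 1, 1),
    "M",
    (1, 128, 1, 0),
    (3, 256, 1, 1),
    (1, 256, 1, 0),
    (3, 512, 1, 1),
    "M",
    [(1, 256, 1, 0), (3, 512, 1, 1), 4],
    (1, 512, 1, 0),
    (3, 1024, 1, 1),
    "M",
    [(1, 512, 1, 0), (3, 1024, 1, 1), 2],
    (3, 1024, 1, 1),
    (3, 1024, 2, 1),
    (3, 1024, 1, 1),
    (3, 1024, 1, 1),
]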
class CNNBlock(nn.Module):
    def __init__(self, in_channels, out_channels, **kwargs):
        super(CNNBlock, self).__init__()
        self.conv = nn.Conv2d(in_channels, out_channels, bias=False, **kwargs)
        self.batchnorm = nn.BatchNorm2d(out_channels)
        self.leakyrelu = nn.LeakyReLU(0.1)

    def forward(self, x):
        return self.leakyrelu(self.batchnorm(self.conv(x)))
class YOLOv1(nn.Module):
    def __init__(self, in_channels=3, **kwargs):
        super(YOLOv1, self).__init__()
        self.architecture = architecture_configs
        self.in_channels = in_channels
        self.darknet = self._create_conv_layers(self.architecture)
        self.fcs = self._create_fcs(**kwargs)

    def forward(self, x):
        x = self.darknet(x)
        return self.fcs(torch.flatten(x, start_dim=1))

    def _create_conv_layers(self, architecture):
        layers = []
        in_channels = self.in_channels
        for x in architecture:
            if isinstance(x, tuple):
                # single conv layer: (kernel_size, out_channels, stride, padding)
                layers += [
                    CNNBlock(in_channels, x[1], kernel_size=x[0], stride=x[2], padding=x[3])
                ]
                in_channels = x[1]
            elif isinstance(x, str):
                layers += [nn.MaxPool2d(kernel_size=(2, 2), stride=(2, 2))]
            elif isinstance(x, list):
                # repeated pair of conv layers: [conv1, conv2, num_repeats]
                conv1 = x[0]
                conv2 = x[1]
                num_repeats = x[2]
                for _ in range(num_repeats):
                    layers += [
                        CNNBlock(in_channels, conv1[1], kernel_size=conv1[0], stride=conv1[2], padding=conv1[3])
                    ]
                    layers += [
                        CNNBlock(conv1[1], conv2[1], kernel_size=conv2[0], stride=conv2[2], padding=conv2[3])
                    ]
                    in_channels = conv2[1]
        return nn.Sequential(*layers)

    def _create_fcs(self, split_size, num_boxes, num_classes):
        S, B, C = split_size, num_boxes, num_classes
        # Note: the paper places a leaky ReLU and dropout (rate 0.5) between these two layers.
        return nn.Sequential(
            nn.Linear(1024*S*S, 4096),
            nn.Linear(4096, S*S*(B*5+C)),
        )
Input : images with shape (3, 448, 448)
- The network is pretrained on ImageNet at 224 x 224, and the input resolution is doubled to 448 x 448 for detection.
Inspired by GoogLeNet (Inception), but it simply uses a 1 x 1 conv layer followed by a 3 x 3 conv layer (basically just a bottleneck block) instead of concatenating parallel branches.
Output : (n_batch, S*S*(B*5+C)) = (n_batch, 1470)
S : grid size (here, 49 cells in total from a 7 x 7 grid)
B : the number of bounding boxes predicted by each cell (2 in the paper)
C : the number of classes (20)
5 predictions per box : x, y, w, h, and confidence score (objectness)
- x, y are offsets inside the cell and fall between 0 and 1. The paper normalizes w, h by the image size, while this implementation stores them in cell units (multiplied by S), so they can exceed 1.
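As a quick sanity check on the shapes above, the sketch below recomputes the output length S*S*(B*5+C) and traces how a 448-pixel side shrinks to the 7 x 7 grid. It is plain Python with no PyTorch dependency. The helper name `conv_out` is illustrative, and the positions of the four 2 x 2 max-pools are assumed to follow the architecture figure in the paper.

```python
def conv_out(size, kernel, stride, padding):
    """Output side length of a conv or pooling layer (no dilation)."""
    return (size + 2 * padding - kernel) // stride + 1


S, B, C = 7, 2, 20
output_len = S * S * (B * 5 + C)

# Only the layers that change the spatial size matter here, listed as
# (kernel, stride, padding). The pooling positions are an assumption
# taken from the paper's figure.
downsampling_layers = [
    (7, 2, 3),  # first 7 x 7 conv with stride 2
    (2, 2, 0),  # max-pool
    (2, 2, 0),  # max-pool
    (2, 2, 0),  # max-pool
    (2, 2, 0),  # max-pool
    (3, 2, 1),  # 3 x 3 conv with stride 2 near the end
]

side = 448
for kernel, stride, padding in downsampling_layers:
    side = conv_out(side, kernel, stride, padding)

print(output_len)  # 1470
print(side)        # 7
```

The final side length of 7 is what makes the first fully connected layer take 1024*S*S inputs.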
Customize PASCAL VOC Dataset
import os

import pandas as pd
import torch
from PIL import Image


class VOCDataset(torch.utils.data.Dataset):
    """
    - csv_file : contains the annotations pairing each img file with its label file.
    - img_dir : directory that contains all img files (3 x 224 x 224)
    - label_dir : directory that contains all label files
      - each line holds [class_label, x, y, width, height]
    """
    def __init__(self, csv_file, img_dir, label_dir, S=7, B=2, C=20, transform=None):
        self.annotations = pd.read_csv(csv_file)
        self.img_dir = img_dir
        self.label_dir = label_dir
        self.transform = transform
        self.S = S
        self.B = B
        self.C = C

    def __len__(self):
        return len(self.annotations)

    def __getitem__(self, index):
        label_path = os.path.join(self.label_dir, self.annotations.iloc[index, 1])
        boxes = []
        with open(label_path) as f:
            for label in f.readlines():
                # keep whole numbers (the class label) as int, everything else as float
                class_label, x, y, width, height = [
                    float(x) if float(x) != int(float(x)) else int(float(x))
                    for x in label.replace("\n", "").split()
                ]
                boxes.append([class_label, x, y, width, height])
        img_path = os.path.join(self.img_dir, self.annotations.iloc[index, 0])
        image = Image.open(img_path)
        boxes = torch.tensor(boxes)
        if self.transform:
            image, boxes = self.transform(image, boxes)
        # Convert To Cells
        label_matrix = torch.zeros((self.S, self.S, self.C + 5 * self.B))  # label matrix (S, S, C+B*5)
        for box in boxes:
            class_label, x, y, width, height = box.tolist()
            class_label = int(class_label)
            # i, j represent the cell row and cell column
            i, j = int(self.S * y), int(self.S * x)
            x_cell, y_cell = self.S * x - j, self.S * y - i  # offsets relative to the cell
            width_cell, height_cell = width * self.S, height * self.S
            # restrict to one object per cell
            if label_matrix[i, j, self.C] == 0:
                label_matrix[i, j, self.C] = 1  # marks that an object is already assigned to cell (i, j)
                # Box coordinates
                box_coordinates = torch.tensor([x_cell, y_cell, width_cell, height_cell])
                label_matrix[i, j, self.C + 1:self.C + 5] = box_coordinates
                label_matrix[i, j, class_label] = 1  # one-hot encoding for class_label
        return image, label_matrix
Read the img and label files in ‘img_dir’ and ‘label_dir’, respectively.
The csv file at ‘csv_file’ pairs each img file with its label file.
Read every line of the label file in a for loop, where each line holds the class_label, x, y, width, and height of one bounding box.
Transform the image and boxes with the given transform.
Resize the 224 x 224 images to 448 x 448 resolution and convert them to tensors.
transform = Compose([transforms.Resize((448, 448)), transforms.ToTensor()])
Convert the coordinates of each bounding box from image-relative to cell-relative values (the image is divided into a 7 x 7 grid of cells).
Assign each object (bounding box) to the grid cell that contains its center (at most one object per cell).
Generate a 7 x 7 x 30 label matrix for the given image, where each cell (i, j) holds the labels (class, x, y, width, height) of the assigned bounding box.
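To make the cell conversion concrete, here is a minimal sketch of the same arithmetic for a single box. It uses plain Python instead of tensors, and the helper name `encode_box` is hypothetical (it is not part of the dataset class).

```python
def encode_box(x, y, w, h, S=7):
    """Map an image-relative box (values in [0, 1]) to its grid cell.

    Returns (row, col, x_cell, y_cell, w_cell, h_cell), where x_cell and
    y_cell are offsets inside the cell and w_cell, h_cell are in cell units.
    """
    row, col = int(S * y), int(S * x)
    x_cell, y_cell = S * x - col, S * y - row
    return row, col, x_cell, y_cell, w * S, h * S


# A box centred at (0.5, 0.25) that covers 20% of the width and 40% of the height.
row, col, x_cell, y_cell, w_cell, h_cell = encode_box(0.5, 0.25, 0.2, 0.4)
print(row, col)        # 1 3
print(x_cell, y_cell)  # 0.5 0.75
```

Here the width and height come out as roughly 1.4 and 2.8 cells, which shows that the cell-relative size can exceed 1 when a box spans several cells.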
Process the Predicted Bounding Boxes from YOLO Model and Obtain Final Target Boxes
def get_bboxes(n_batch, loader, model, iou_threshold, threshold, device, box_format='center'):
    pred_boxes, gt_boxes = [], []
    S = 7
    model = model.to(device)
    model.eval()  # make sure to turn off the train mode before getting final bboxes.
    for batch_idx, (img, labels) in enumerate(loader):
        img = img.to(device)
        with torch.no_grad():
            preds = model(img)  # (n, S*S*(C+B*5))
        n = len(preds)  # the last batch may be smaller than n_batch
        # add a global image index to every box (later used to calculate mAP)
        train_idxs = (torch.arange(n) + batch_idx * n_batch).reshape(n, 1, 1).repeat(1, S * S, 1)
        pred_bboxes = convert_cellboxes(preds).reshape(n, S * S, -1)
        gt_bboxes = convert_cellboxes(labels).reshape(n, S * S, -1)
        pred_bboxes = torch.cat([train_idxs, pred_bboxes], dim=-1)
        gt_bboxes = torch.cat([train_idxs, gt_bboxes], dim=-1)
        pred_boxes += non_max_suppression(pred_bboxes, iou_threshold, threshold, box_format)
        # keep only the ground truth cells that actually contain an object
        for boxes in gt_bboxes:
            gt_boxes += [b for b in boxes[boxes[:, 2] > threshold]]
    return pred_boxes, gt_boxes
Step 1 : Convert bounding boxes relative to cell ratio into entire image ratio
def convert_cellboxes(predictions, S=7):
    """
    Convert output of yolo v1
    (n_batch, 7*7*(5*B+C)) -> (n_batch, 7, 7, [pred_clss, best_confidence_score, center coordinates])
    - Convert bounding boxes with grid split size S relative to cell ratio into entire image ratio.
    """
    predictions = predictions.to("cpu")
    batch_size = predictions.shape[0]
    predictions = predictions.reshape(batch_size, S, S, 30)
    bboxes1 = predictions[..., 21:25]  # 1st bbox
    bboxes2 = predictions[..., 26:30]  # 2nd bbox
    # select the bounding box with the highest confidence score among the 2 candidates
    scores = torch.cat((predictions[..., 20].unsqueeze(0), predictions[..., 25].unsqueeze(0)), dim=0)  # 2 x (n_batch x S x S)
    best_box = scores.argmax(0).unsqueeze(-1)  # n_batch x S x S x 1
    best_boxes = bboxes1 * (1 - best_box) + best_box * bboxes2
    # column index of every cell, shape (n_batch, S, S, 1)
    cell_indices = torch.arange(S).repeat(batch_size, S, 1).unsqueeze(-1)
    x = (1/S) * (best_boxes[..., :1] + cell_indices)  # (0 + x0, 1 + x1, 2 + x2, ... 6 + x6)*(1/S)
    y = (1/S) * (best_boxes[..., 1:2] + cell_indices.permute(0, 2, 1, 3))  # permute gives the row index
    w_h = (1/S) * best_boxes[..., 2:4]
    converted_bboxes = torch.cat((x, y, w_h), dim=-1)
    pred_class = predictions[..., :20].argmax(-1).unsqueeze(-1)  # class with the best predicted score
    best_conf = torch.max(predictions[..., 20], predictions[..., 25]).unsqueeze(-1)  # best confidence score
    converted_preds = torch.cat((pred_class, best_conf, converted_bboxes), dim=-1)
    return converted_preds
Select only the box with the higher confidence score out of the two boxes predicted per cell and discard the other.
Rescale the center coordinates, width, and height of the bounding boxes so that they are expressed relative to the entire image, between 0 and 1.
Set the label of each bounding box to the class with the highest predicted score.
Output : (n_batch, 7, 7, [best_pred_clss, best_confidence_score, (x, y, w, h)])
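The conversion back to image-relative coordinates can be sketched for a single cell as follows. This is plain Python rather than the vectorized tensor code, and the helper names `best_of_two` and `decode_box` are hypothetical.

```python
def best_of_two(conf1, box1, conf2, box2):
    """Keep the candidate with the higher confidence (ties go to the first)."""
    return (conf2, box2) if conf2 > conf1 else (conf1, box1)


def decode_box(row, col, x_cell, y_cell, w_cell, h_cell, S=7):
    """Invert the cell encoding: return (x, y, w, h) relative to the image."""
    x = (col + x_cell) / S
    y = (row + y_cell) / S
    return x, y, w_cell / S, h_cell / S


# Two candidate boxes predicted by cell (row=1, col=3).
conf, box = best_of_two(0.3, (0.1, 0.1, 1.0, 1.0), 0.8, (0.5, 0.75, 1.4, 2.8))
x, y, w, h = decode_box(1, 3, *box)
print(conf)  # 0.8
print(x, y)  # 0.5 0.25
```

The width and height are simply divided by S, so a box that is 1.4 cells wide covers about 20% of the image width.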
Step 2 : Non-Maximum Suppression on the Predicted Bounding Boxes
- Find multiple detections of the same object and suppress all except the one with the highest confidence score.
def non_max_suppression(bboxes, iou_threshold, threshold, box_format='center'):
    """
    bboxes : (n_batch, 7*7, [train_idx, pred_clss, best_confidence_score, coordinates (x, y, w, h)])
    iou_threshold != threshold
    threshold here refers to the minimum confidence score to be selected as true bbox.
    """
    post_nms_bboxes = []
    for j_bboxes in bboxes:
        # drop low-confidence boxes first
        j_bboxes = j_bboxes[j_bboxes[:, 2] > threshold]
        if not j_bboxes.size(0):
            continue
        confidence_scores = j_bboxes[:, 2]
        order_idxs = confidence_scores.argsort(descending=True)  # sort bboxes by confidence scores in descending order
        keep_bboxes = []
        while order_idxs.size(0) > 0:
            target_idx = order_idxs[0]
            keep_bboxes.append(target_idx)  # the most confident remaining box is always kept
            # pass only the (x, y, w, h) columns, as in the other compute_ious calls
            ious = compute_ious(box_format, j_bboxes[order_idxs[1:], 3:], j_bboxes[target_idx, 3:])
            keep_idxs = torch.where(ious <= iou_threshold)[0]
            order_idxs = order_idxs[keep_idxs + 1]
        post_nms_bboxes += [box for box in j_bboxes[torch.stack(keep_bboxes)]]
    return post_nms_bboxes
First, keep only the bounding boxes whose confidence score is higher than a specified threshold (here, 0.35).
Sort the bounding boxes in descending order of objectness confidence score.
Suppress the bounding boxes that capture the same object as the selected box.
- Compute the IoU with the selected box and remove the boxes whose IoU is higher than the IoU threshold.
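The compute_ious helper used throughout is not shown in this post, so the sketch below is only an assumption about what it computes for the 'center' box format, together with a tiny greedy NMS over one image. It is plain Python, class-agnostic like the function above, and the names `center_to_corners`, `iou`, and `nms` are illustrative.

```python
def center_to_corners(box):
    """(x, y, w, h) with a centre point -> (x1, y1, x2, y2) corners."""
    x, y, w, h = box
    return x - w / 2, y - h / 2, x + w / 2, y + h / 2


def iou(box_a, box_b):
    """Intersection over union of two centre-format boxes."""
    ax1, ay1, ax2, ay2 = center_to_corners(box_a)
    bx1, by1, bx2, by2 = center_to_corners(box_b)
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def nms(detections, iou_threshold, conf_threshold):
    """Greedy NMS over (confidence, (x, y, w, h)) pairs from one image."""
    candidates = sorted(
        (d for d in detections if d[0] > conf_threshold),
        key=lambda d: d[0],
        reverse=True,
    )
    kept = []
    while candidates:
        best = candidates.pop(0)
        kept.append(best)
        # Drop every remaining box that overlaps the kept one too much.
        candidates = [d for d in candidates if iou(best[1], d[1]) <= iou_threshold]
    return kept


detections = [
    (0.9, (0.50, 0.50, 0.40, 0.40)),
    (0.8, (0.52, 0.50, 0.40, 0.40)),  # near-duplicate of the first box
    (0.7, (0.20, 0.20, 0.10, 0.10)),  # separate object
    (0.2, (0.80, 0.80, 0.10, 0.10)),  # below the confidence threshold
]
kept = nms(detections, iou_threshold=0.5, conf_threshold=0.35)
print([conf for conf, _ in kept])  # [0.9, 0.7]
```

The second box overlaps the first with an IoU of about 0.9, so it is suppressed, while the third box does not overlap at all and survives.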
Calculate Mean Average Precision (mAP) between Predicted Boxes and Ground Truths
We now have the filtered predicted bounding boxes, which will be used to calculate the mAP score against the ground truth boxes.
Before doing that, let’s figure out what the mAP score is.
Mean Average Precision (mAP)
- mAP is the mean of the Average Precision (AP) over all classes, where the AP of a class is obtained by plotting its precision-recall curve and calculating the area under this curve (AUC).
$\large \text{mAP} = \frac{1}{n} \sum_{i=1}^{n} \text{AP}_{i}$
AUC : Area under the Precision-Recall Curve
- Precision-Recall Curve : Shows the trade-off between precision and recall at different classification thresholds (the threshold used to label a given sample as Positive or Negative for a specific class).
Precision : TP / (TP + FP)
The ratio of true positives out of all positive predictions.
Recall : TP / (TP + FN)
The ratio of true positives out of all positive labels (ground truth).
AUC : Integral of the precision with respect to the recall as the classification threshold varies from 0 to 1.
from collections import Counter


def mean_average_precision(pred_boxes, gt_boxes, iou_threshold, num_classes, box_format="center"):
    """
    pred_boxes : list of tensors [train_idx, pred_class, best_confidence_score, x, y, w, h]
    gt_boxes : same layout as pred_boxes
    """
    average_precisions = []
    for c in range(num_classes):
        c_pred_boxes = [box.unsqueeze(0) for box in pred_boxes if box[1] == c]  # check if class matched, shape : (num_detections, 7)
        c_gt_boxes = [box.unsqueeze(0) for box in gt_boxes if box[1] == c]
        num_T, num_P = len(c_gt_boxes), len(c_pred_boxes)
        if not num_T:
            continue
        if not num_P:
            # ground truths exist but nothing was detected for this class
            average_precisions.append(0.0)
            continue
        c_pred_boxes = torch.cat(c_pred_boxes, dim=0)
        # rank detections by confidence so the cumulative sums trace the precision-recall curve
        c_pred_boxes = c_pred_boxes[c_pred_boxes[:, 2].argsort(descending=True)]
        FP = torch.zeros(num_P)
        TP = torch.zeros(num_P)
        gt_cnt = Counter([int(gt[:, 0].item()) for gt in c_gt_boxes])  # counting gt boxes by train_idx (per image) within a class, gt_cnt = {0:3, 1:5}
        # set to 1 once detected -> only one predictor is responsible for each object (gt box)
        gt_cnt_graph = {idx: torch.zeros(cnt) for idx, cnt in gt_cnt.items()}
        c_gt_boxes = torch.cat(c_gt_boxes, dim=0)
        for pred_idx, box in enumerate(c_pred_boxes):
            train_idx = box[0].item()  # check if train_idx matched
            target_gt_idxs = torch.where(c_gt_boxes[:, 0] == train_idx)[0]
            if not len(target_gt_idxs):
                FP[pred_idx] = 1
                continue
            ious = compute_ious(box_format, c_gt_boxes[target_gt_idxs, 3:], box[3:])
            best_iou_idx = ious.argmax(dim=0)
            best_iou = ious[best_iou_idx]
            # check 1. whether the gt box is already detected by another pred box and 2. whether the detection overlaps the ground truth enough.
            if gt_cnt_graph[int(train_idx)][best_iou_idx] == 0 and best_iou >= iou_threshold:
                TP[pred_idx] = 1
                gt_cnt_graph[int(train_idx)][best_iou_idx] = 1
            else:
                FP[pred_idx] = 1
        # argument order must match the signature below: (TP, FP, num_T)
        average_precisions.append(get_average_precision(TP, FP, num_T))
    return sum(average_precisions)/len(average_precisions)


def get_average_precision(TP, FP, num_T):
    eps = 1e-8
    TP_cumsum = torch.cumsum(TP, dim=0)
    FP_cumsum = torch.cumsum(FP, dim=0)
    recalls = TP_cumsum / (num_T + eps)
    precisions = torch.divide(TP_cumsum, (TP_cumsum + FP_cumsum + eps))
    # start the curve at (recall=0, precision=1)
    precisions = torch.cat((torch.tensor([1.0]), precisions))
    recalls = torch.cat((torch.tensor([0.0]), recalls))
    # torch.trapz for numerical integration
    ap = torch.trapz(precisions, recalls)
    return ap.item()
Compute mAP between the predicted and ground truth boxes with matching class and training index.
Each ground truth box is claimed by at most one predictor (predicted box).
- gt_cnt_graph records which ground truth boxes have already been matched.
TP vs FP
TP : a box whose best IoU with the gt boxes of the same image and class is at least the IoU threshold (0.5), where that gt box has not already been claimed by an earlier prediction.
FP : any predicted box that is not a TP.
Average Precision (AP)
- Numerical integration of the precision with respect to the recall
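As a worked example of the integration step, the sketch below computes AP for four ranked detections of one class with the trapezoidal rule. It is plain Python rather than torch.trapz, and the helper name `average_precision` is illustrative.

```python
def average_precision(tp_flags, num_gt):
    """Area under the precision-recall curve via the trapezoidal rule.

    tp_flags: 1 for a true positive, 0 for a false positive, with the
    detections already sorted by descending confidence.
    num_gt: number of ground truth boxes of this class.
    """
    precisions, recalls = [1.0], [0.0]  # start the curve at (recall=0, precision=1)
    tp = fp = 0
    for flag in tp_flags:
        tp += flag
        fp += 1 - flag
        precisions.append(tp / (tp + fp))
        recalls.append(tp / num_gt)
    # Trapezoidal integration of precision with respect to recall.
    return sum(
        (recalls[i] - recalls[i - 1]) * (precisions[i] + precisions[i - 1]) / 2
        for i in range(1, len(recalls))
    )


# Four detections ranked by confidence, three ground truth boxes.
ap = average_precision([1, 0, 1, 1], num_gt=3)
print(round(ap, 4))  # 0.7639
```

The single false positive at rank two lowers the precision for every later detection, which is why the area falls below 1 even though all three ground truths are eventually found.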
Computing Loss and Optimizing the Model
prev_loss = 9999
early_stopping = 0
for e in range(epochs):
    pred_boxes, gt_boxes = get_bboxes(n_batch, train_loader, model, iou_threshold, threshold, device)
    mAP = mean_average_precision(pred_boxes, gt_boxes, iou_threshold, num_classes)
    print(f"Train mAP: {mAP}")
    mean_loss = compute_loss(train_loader, model, optimizer, yolo_loss)
    if mean_loss <= prev_loss:
        checkpoint = {
            "state_dict": model.state_dict(),
            "optimizer": optimizer.state_dict(),
        }
        save_checkpoint(checkpoint, filename=chkpt_dir)
    else:
        # the loss went up compared to the previous epoch
        early_stopping += 1
        if early_stopping == 2:
            print("----- Early Stopping -----")
            break
    prev_loss = mean_loss


def compute_loss(loader, model, optimizer, yolo_loss):
    model = model.to(device)
    model.train()  # get_bboxes leaves the model in eval mode
    loop = tqdm(loader, leave=True)
    loss_history = []
    for batch_idx, (img, labels) in enumerate(loop):
        img, labels = img.to(device), labels.to(device)
        preds = model(img)
        loss = yolo_loss(preds, labels)
        loss_history.append(loss.item())
        # backpropagate and update the weights
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        # update progress bar
        loop.set_postfix(loss=loss.item())
    mean_loss = sum(loss_history)/len(loss_history)
    print(f" Mean loss : {mean_loss}")
    return mean_loss
- In every epoch, compute the loss between the predicted boxes and the ground truth.
- The multi-task loss consists of a coordinate regression loss (x, y, w, h), an object loss, a no-object loss, and a classification loss (not softmax, just MSE).
class YoloLoss(nn.Module):
    """
    Calculate the loss for yolo (v1) model
    """
    def __init__(self, S=7, B=2, C=20):
        super(YoloLoss, self).__init__()
        self.mse = nn.MSELoss(reduction="sum")
        self.S = S
        self.B = B
        self.C = C
        self.lambda_noobj = 0.5
        self.lambda_coord = 5

    def forward(self, predictions, target):
        predictions = predictions.reshape(-1, self.S, self.S, self.C + self.B * 5)

        ## coordinate regression loss
        # calculate IoU for the two predicted bounding boxes with target bbox
        iou_b1 = compute_ious('center', predictions[..., 21:25], target[..., 21:25])
        iou_b2 = compute_ious('center', predictions[..., 26:30], target[..., 21:25])
        ious = torch.cat([iou_b1.unsqueeze(0), iou_b2.unsqueeze(0)], dim=0)
        # Take the box with highest IoU out of the two predictions -> one bounding box prediction per object
        iou_maxes, bestbox = torch.max(ious, dim=0)  # val, idx (argmax) = 0 (1st bbox) or 1 (2nd bbox)
        obj_mask = target[..., 20:21]  # object / no object (whether ground truth box holds object or not)
        # Set boxes with no object in them to 0. Only select the box with max IoU with ground truth
        box_predictions = obj_mask*(bestbox * predictions[..., 26:30] + (1 - bestbox) * predictions[..., 21:25])
        box_targets = obj_mask*target[..., 21:25]
        # Take sqrt of width, height of boxes
        box_predictions[..., 2:4] = torch.sign(box_predictions[..., 2:4]) * torch.sqrt(torch.abs(box_predictions[..., 2:4] + 1e-6))
        box_targets[..., 2:4] = torch.sqrt(box_targets[..., 2:4])
        box_loss = self.mse(torch.flatten(box_predictions, end_dim=-2),
                            torch.flatten(box_targets, end_dim=-2))

        ## object loss
        pred_box = bestbox * predictions[..., 25:26] + (1 - bestbox) * predictions[..., 20:21]
        object_loss = self.mse(
            torch.flatten(obj_mask * pred_box),
            torch.flatten(obj_mask * target[..., 20:21]),
        )

        ## no object loss
        no_object_loss = self.mse(torch.flatten((1 - obj_mask) * predictions[..., 20:21], start_dim=1),
                                  torch.flatten((1 - obj_mask) * target[..., 20:21], start_dim=1))
        no_object_loss += self.mse(torch.flatten((1 - obj_mask) * predictions[..., 25:26], start_dim=1),
                                   torch.flatten((1 - obj_mask) * target[..., 20:21], start_dim=1))

        ## classification loss
        class_loss = self.mse(torch.flatten(obj_mask * predictions[..., :20], end_dim=-2),
                              torch.flatten(obj_mask * target[..., :20], end_dim=-2))

        loss = (self.lambda_coord * box_loss
                + object_loss
                + self.lambda_noobj * no_object_loss
                + class_loss)
        return loss
Out of the two predicted bounding boxes, only the one with the highest IoU with the corresponding ground truth is selected to compute the box coordinate and object confidence losses.
Except for the no-object error (fourth line of the loss formula in the paper), all losses are computed only if an object is present in the target grid cell.
- $\large \mathbb{1}^{obj}_{ij}$ : an indicator function that returns 1 only if the jth box predictor in cell i is responsible for the object in that cell, otherwise 0.
A weighting factor is multiplied to scale the effect of each type of error differently.
$\large \lambda_{coord}$ : set to 5, increasing the loss from bounding box coordinate predictions.
$\large \lambda_{noobj}$ : set to 0.5, decreasing the loss from the confidence scores of boxes with no object.
All losses are simply computed using MSE (here, the summed squared error).
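For reference, the multi-part loss from the YOLOv1 paper, which the notes above describe, can be written as below. The no-object term is its fourth line. Each squared difference compares a predicted value with its target, $C_i$ is the confidence score, $p_i(c)$ is the class probability for cell $i$, and $\mathbb{1}^{obj}_{i}$ is the per-cell indicator that an object appears in cell $i$.

$$
\begin{aligned}
\mathcal{L} ={} & \lambda_{coord} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}^{obj}_{ij} \left[ (x_i - \hat{x}_i)^2 + (y_i - \hat{y}_i)^2 \right] \\
& + \lambda_{coord} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}^{obj}_{ij} \left[ \left(\sqrt{w_i} - \sqrt{\hat{w}_i}\right)^2 + \left(\sqrt{h_i} - \sqrt{\hat{h}_i}\right)^2 \right] \\
& + \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}^{obj}_{ij} \left( C_i - \hat{C}_i \right)^2 \\
& + \lambda_{noobj} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}^{noobj}_{ij} \left( C_i - \hat{C}_i \right)^2 \\
& + \sum_{i=0}^{S^2} \mathbb{1}^{obj}_{i} \sum_{c \in \text{classes}} \left( p_i(c) - \hat{p}_i(c) \right)^2
\end{aligned}
$$

In the YoloLoss code, these five lines correspond to box_loss (first two lines, with the square roots applied to width and height), object_loss, no_object_loss, and class_loss.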
Training Result & Display Predicted Anchors from Trained Model onto Image
from train import main
Train mAP: 0.0
100%|██████████| 12/12 [02:40<00:00, 13.35s/it, loss=87.3]
Mean loss : 348.82359250386554
=> Saving checkpoint
Train mAP: 0.0
100%|██████████| 12/12 [02:40<00:00, 13.38s/it, loss=122]
Mean loss : 158.5302308400472
=> Saving checkpoint
Train mAP: 0.0
100%|██████████| 12/12 [02:40<00:00, 13.38s/it, loss=104]
Mean loss : 122.02847099304199
=> Saving checkpoint
Train mAP: 0.8965692117810249
100%|██████████| 12/12 [02:38<00:00, 13.20s/it, loss=68.3]
Mean loss : 91.6347204844157
=> Saving checkpoint
Train mAP: 0.9108292050659657
100%|██████████| 12/12 [02:38<00:00, 13.20s/it, loss=56.3]
Mean loss : 81.93021202087402
=> Saving checkpoint
Train mAP: 0.8115305352956057
100%|██████████| 12/12 [02:38<00:00, 13.22s/it, loss=45.2]
Mean loss : 71.95312404632568
=> Saving checkpoint
Train mAP: 0.8349191606044769
100%|██████████| 12/12 [02:36<00:00, 13.06s/it, loss=86.1]
Mean loss : 66.83295917510986
=> Saving checkpoint
Train mAP: 1.2271075576543808
100%|██████████| 12/12 [02:38<00:00, 13.20s/it, loss=80.2]
Mean loss : 59.71648979187012
=> Saving checkpoint
Train mAP: 0.752258376032114
100%|██████████| 12/12 [02:38<00:00, 13.20s/it, loss=115]
Mean loss : 51.03803793589274
=> Saving checkpoint
Train mAP: 1.038465577363968
100%|██████████| 12/12 [02:38<00:00, 13.18s/it, loss=41.7]
Mean loss : 49.17437616984049
=> Saving checkpoint
Train mAP: 1.2061315879225731
100%|██████████| 12/12 [02:38<00:00, 13.18s/it, loss=32.8]
Mean loss : 44.50646352767944
=> Saving checkpoint
Train mAP: 0.9527697516115088
100%|██████████| 12/12 [02:36<00:00, 13.06s/it, loss=30.3]
Mean loss : 41.781076431274414
=> Saving checkpoint
Train mAP: 1.3203784927725792
100%|██████████| 12/12 [02:37<00:00, 13.10s/it, loss=35]
Mean loss : 40.97672208150228
=> Saving checkpoint
Train mAP: 1.0776823602224652
100%|██████████| 12/12 [02:39<00:00, 13.26s/it, loss=30.2]
Mean loss : 39.26619656880697
=> Saving checkpoint
Train mAP: 1.008251892030239
100%|██████████| 12/12 [02:38<00:00, 13.18s/it, loss=73.9]
Mean loss : 37.97222876548767
=> Saving checkpoint
Train mAP: 0.9584985218942166
100%|██████████| 12/12 [02:38<00:00, 13.22s/it, loss=33.9]
Mean loss : 32.922889391581215
=> Saving checkpoint
Train mAP: 0.8698069430887699
100%|██████████| 12/12 [02:39<00:00, 13.29s/it, loss=27.5]
Mean loss : 34.81727981567383
Train mAP: 0.9338142840485824
100%|██████████| 12/12 [02:37<00:00, 13.11s/it, loss=34.8]
Mean loss : 34.44208208719889
=> Saving checkpoint
Train mAP: 0.927067095511838
100%|██████████| 12/12 [02:38<00:00, 13.17s/it, loss=35]
Mean loss : 34.73020267486572
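- Somehow, some of the computed mAP scores are bigger than 1, which should not happen for a valid AP. (I didn’t figure out why this time, but I’ll look into it in future implementations of different models. One likely culprit is the argument order when calling get_average_precision, which is defined as (TP, FP, num_T): if FP is passed first, the recall axis counts false positives and is no longer bounded by 1.)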
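A correctly computed AP cannot exceed 1, because precision is at most 1 and recall runs from 0 to 1. One way to end up above 1 is to swap the TP and FP arrays when calling a routine with the signature (TP, FP, num_T). The toy sketch below, in plain Python with a hypothetical helper `ap_from_flags`, shows that effect. It illustrates one possible failure mode and is not a confirmed diagnosis of the run above.

```python
def ap_from_flags(tp, fp, num_gt):
    """Trapezoidal AP from per-detection TP/FP indicator lists."""
    precisions, recalls = [1.0], [0.0]
    tp_sum = fp_sum = 0
    for t, f in zip(tp, fp):
        tp_sum += t
        fp_sum += f
        precisions.append(tp_sum / (tp_sum + fp_sum))
        recalls.append(tp_sum / num_gt)
    return sum(
        (recalls[i] - recalls[i - 1]) * (precisions[i] + precisions[i - 1]) / 2
        for i in range(1, len(recalls))
    )


# One ground truth box: one correct detection followed by three false positives.
tp = [1, 0, 0, 0]
fp = [0, 1, 1, 1]

correct = ap_from_flags(tp, fp, num_gt=1)
swapped = ap_from_flags(fp, tp, num_gt=1)  # arguments in the wrong order

print(correct)      # 1.0
print(swapped > 1)  # True
```

With the arguments swapped, the "recall" axis grows to 3 because it is counting the three false positives, so the integrated area is no longer bounded by 1.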
from train import main, plot_image
model, optimizer = main(load_model=True)
plot_image(idx=2, model=model)
Note that there are two plot_image functions, one in train.py and one in utils.py.
Coordinates of Predicted Bounding Boxes
[tensor([0.4892, 0.5136, 0.7015, 0.6358]), tensor([0.5608, 0.6186, 0.6383, 0.6278]), tensor([0.8117, 0.7309, 0.3865, 0.5740])]
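To draw such boxes on the 448 x 448 input, the normalized values have to be turned into pixel corners. The sketch below assumes the four values are (x, y, w, h) in center format relative to the whole image, which matches the output of convert_cellboxes. The helper name `to_pixel_corners` is hypothetical and is not the plot_image function from train.py or utils.py.

```python
def to_pixel_corners(box, img_w=448, img_h=448):
    """Centre-format (x, y, w, h) in [0, 1] -> (x1, y1, x2, y2) in pixels."""
    x, y, w, h = box
    x1 = (x - w / 2) * img_w
    y1 = (y - h / 2) * img_h
    x2 = (x + w / 2) * img_w
    y2 = (y + h / 2) * img_h
    return x1, y1, x2, y2


# First box printed above, assumed to be (x, y, w, h) relative to the image.
corners = to_pixel_corners((0.4892, 0.5136, 0.7015, 0.6358))
print([round(v, 1) for v in corners])  # [62.0, 87.7, 376.3, 372.5]
```

The resulting corners can be passed to any rectangle-drawing routine that expects a top-left and bottom-right point.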