Home SORT
Post
Cancel

SORT

Simple Online and Realtime Tracking(SORT)

SORT is a novel multi-object tracking algorithm that separates the detector and the assigner, unlike other traditional tracking algorithms. Additionally, unlike other traditional tracking algorithms, SORT uses simple but efficient methods such as the Kalman filter and the Hungarian method to handle the motion prediction and data association components of the tracking problem. Additionally, as the name suggests (Online Tracking), SORT performs detection response association to form a tracking trajectory using only object detection information from past and current frames, without information about future frames.



Methodologies

The proposed method is described by the following key components:

  • Detection

  • Propagating object states into future frames(Estimation Model)

  • Associating current detections with existing objects(Data Association)

  • Managing the lifespan of tracked objects



Detection

As mentioned in the first paragraph, they used detection frameworks. It might seem strange since it was made a long time ago, but at that time, using VGG16 was considered the best according to their experience.
We can see the comparison of tracking performance by switching the detectors:




Estimation Model

To achieve online tracking, they designed a representation and motion model used to propagate a target’s identity into the next frame. They calculated the inter-frame displacements of each object with a linear constant velocity model that is independent of other objects and camera motion. The state of each target is:

$x = [u, v, s, r, \dot{u}, \dot{v}, \dot{s}]^T$

where

  • $u$ : Horizontal pixel location of the center of the target.

  • $v$ : Vertical pixel location of the center of the target.

  • $s$ : Scale of the target’s bounding box.

  • $r$ : Aspect ratio of the target’s bounding box.

  • $\dot{u}$ : Future horizontal pixel location of the center of the target.

  • $\dot{v}$ : Future vertical pixel location of the center of the target.

  • $\dot{s}$ : Future scale of the target’s bounding box.

    ※Note that $r$ is considered to be constant.



If a detection is associated with a target, the detected bounding box is used to update the target state ($u, v, s, r$), where the velocity components are solved optimally via the Kalman filter.
If a detection is not associated with the target, its state is simply predicted without correction using the linear velocity model.


Data Association

This step involves assigning detections to existing targets. Each target’s bounding box geometry is estimated by predicting its new location in the current frame.
Like other assigners, SORT also needs an assignment cost matrix, which is computed as the IOU distance between each detection and all predicted bounding boxes from the existing targets. Here, the Hungarian algorithm is used to solve the assignment optimally.
Additionally, $IOU_{min}$ is used to reject assignments where the detection-to-target overlap is $\leq IOU_{min}$.

However, there is a weakness. When the target is occluded by another object, only the occluder will be detected because the IOU distance appropriately favors detections with a similar scale.

Managing the lifespan of tracked objects

To distinguish objects, it is necessary to create or destroy unique identities.
In this paper, when trackers are created, any detection with an overlap less than $IOU_{min}$ is considered to signify the existence of an untracked object, as mentioned in the Data Association section.
Then, let’s take a look at what happens when creating and deleting trackers.

Creating Trackers

0 The trackers are initialized using the geometry of the bounding box, with the velocity set to zero. Because the velocity is unobserved at this point, the covariance of the velocity component is initialized with large values to reflect this uncertainty.
Then, the new trackers undergo a probationary period during which the targets need to be associated with detections to accumulate enough evidence to prevent the tracking of false positives.

Deleting Trackers

Trackers are deleted if they are not detected for $T_{Lost}$ frames. This method prevents an uncontrolled growth in the number of trackers and localization errors caused by predictions over long durations without corrections from the detector.

Performance

They evaluated the performance of SORT using FRCNN (VGG16) and the MOT benchmark database.
They obtained the following results:

where:

  • MOTA(↑) : Multi-object tracking accuracy.
  • MOTP(↑) : Multi-object tracking precision.
  • FAF(↓) : Number of false alarms per frame.
  • MT(↑) : Number of mostly tracked trajectories. For example, the target has the same label for at least 80% of its lifespan.
  • ML(↓) : Number of mostly lost trajectories. For example, the target is not tracked for at least 20% of its lifespan.
  • FP(↓) : Number of false detections.
  • FN(↓) : Number of missed detections.
  • ID sw(↓) : Number of times an ID switches to a different previously tracked object.
  • Frag(↓) : Number of fragmentations where a track is interrupted by miss detection.

    Note that ↑ means higher scores denote better performance and ↓ means lower scores denote better performance.

As you can see in the image above, the performance of SORT is not the best because there is a trade-off between being online and performance. SORT is designed to be used in real-time applications such as robotics and autonomous vehicles. Then, let’s see the difference between batch type and online type.



Batch Tracking Type

The batch tracking method performs repeated detection-response association for all object information detected in the entire sequence to form the complete tracking trajectory. Because it forms a tracking trajectory using object detection information for the entire sequence, it shows better tracking performance than online tracking methods.
However, it is difficult to apply to real-time applications because object detection information for the entire sequence must be obtained in advance.

Online Tracking Type

The online tracking method uses only object detection information from past and current frames, without information about future frames, to perform detection-response association and form a tracking trajectory.Therefore, the online tracking method is more suitable for real-time applications compared to the batch tracking method.
Due to the absence of future frame information, online tracking methods are vulnerable to long-term occlusion or changes in object appearance, making it difficult to associate detection responses and form tracking trajectories. This causes track fragmentation and identity switches, which reduce tracking performance.





Conclusion

SORT demonstrates that existing tracking algorithms perform well when combined with a SOTA detector.
However, because it is an online tracking method, it has weaknesses, such as being vulnerable to occlusions.

This post is licensed under CC BY 4.0 by the author.