Video Annotator - Tracking objects through video frames

This tutorial illustrates how to use Ipyannotator to annotate video data.

The task of identifying objects and following them across video frames is called object tracking.

Ipyannotator allows users to explore an entire set of video frames and their labels, manually create datasets by drawing bounding boxes and associating labels across the frames, and improve existing annotations.

This tutorial is divided into the following steps:

  • Select dataset

  • Setup annotator

  • Explore

  • Create

  • Improve

Select dataset

This tutorial uses a minimal artificial video dataset generated by Ipyannotator. The dataset follows the MOT (Multiple Object Tracking) data format. It contains 20 images with 2 classes (rectangle and circle) and doesn't need to be downloaded.
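
For orientation, a single annotation in the MOT format describes one bounding box on one frame. The comma-separated layout below is one common MOTChallenge convention; it is shown only as background, since the artificial dataset in this tutorial is stored as JSON (see the Create section):

# <frame>, <id>, <bb_left>, <bb_top>, <bb_width>, <bb_height>, <conf>, <x>, <y>, <z>
1, 1, 30, 30, 40, 40, 1, -1, -1, -1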

dataset = DS.ARTIFICIAL_VIDEO

Setup annotator

This section sets up the paths and the input/output pair needed to annotate the video frames.

The following cell retrieves the project file and the directory where the images were generated. For this tutorial we simplify the process by using the get_settings function instead of hardcoding the paths.

settings_ = get_settings(dataset)
settings_.project_file, settings_.image_dir
(Path('data/artificial/annotations.json'), 'images')
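
For reference, the hardcoded alternative would look roughly like the sketch below. The two values mirror the printed output above; treat this as a hypothetical equivalent of get_settings rather than Ipyannotator's API:

from pathlib import Path

# Hypothetical hardcoded setup, equivalent to get_settings(dataset);
# the values are taken from the printed output above
project_file = Path('data/artificial/annotations.json')  # existing annotations
image_dir = 'images'  # directory containing the frames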

Ipyannotator uses pairs of input/output data to set up the annotation.

The video annotator uses InputImage and OutputVideoBbox as the pair to set up the annotator.

InputImage describes the directory that contains the images to be annotated and the size at which they are displayed. OutputVideoBbox defines the classes that can be associated with the bounding boxes drawn on the frames.

input_ = InputImage(image_dir=settings_.image_dir,
                    image_width=settings_.im_width,
                    image_height=settings_.im_height)

output_ = OutputVideoBbox(classes=['Circle', 'Rectangle'])

input_.dir
'images'

The final step in setting up Ipyannotator is configuring the Annotator factory with the input/output pair.

The factory provides three types of annotation tools: explore, create, and improve. The next sections will guide you through each of them.

# Start with NoOutput: the explore step only displays the data;
# the real output type is attached later, before the create step
anni = Annotator(input_, NoOutput(), settings_)

Explore

The explore option allows users to navigate across the images in the dataset using next/previous buttons. This function is used for data visualization only; improvement and additional labeling are done in the next steps.

When exploring the artificial dataset used in this tutorial, you will see a red circle and a gray rectangle as the objects to be tracked. The black square represents an occlusion of the objects and is used later to illustrate how the improve step works.

explorer = anni.explore()
explorer

Create

The create option allows users to manually create their annotated datasets. Please be aware that:

Warning

The video annotator's create option is a beta version.

Currently, video annotation allows users to draw multiple bounding boxes on every frame and associate a label with every annotated bounding box. Ipyannotator identifies each annotated object with an indexed id that starts from 0.

The next cell removes any previously created annotation files so that a new dataset can be created from scratch.

import os
import shutil

# Remove results from previous runs so the create step starts clean
dirpath = 'data/artificial/create_results'
if os.path.isdir(dirpath):
    shutil.rmtree(dirpath)

The next cell initializes the create option.

For this tutorial, a helper function is used that imitates human work by annotating the images automatically (see below).

# Attach the bounding-box output type before switching to the create tool
anni.output_item = output_
creator = anni.create()
creator

The next cell imitates human work by annotating all images automatically.

# Tutorial is a documentation helper that simulates a human annotator
HELPER = Tutorial(dataset, settings_.project_path)
annotations = HELPER.annotate_video_bboxes(creator)

All data is stored in a JSON file with the following format:

# Show the first entry of the annotations dict
data_format = {k: v for i, (k, v) in enumerate(annotations.items()) if i == 0}
print(json.dumps(data_format, indent=2))
{
  "data/artificial/images/0000.jpg": {
    "bbox": [
      {
        "x": 10,
        "y": 150,
        "width": 30,
        "height": 10,
        "id": "0"
      },
      {
        "x": 30,
        "y": 30,
        "width": 40,
        "height": 40,
        "id": "1"
      }
    ],
    "labels": [
      [
        "Rectangle"
      ],
      [
        "Circle"
      ]
    ]
  }
}

Note that in the JSON file above the annotations of each frame are keyed by the image path. Every bounding box drawn in the annotator has the properties x, y, width, height, and id as part of the bbox field. The annotation labels are stored in the labels field: each index of the labels array corresponds to the object at the same index in the bbox array.
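
As a minimal sketch of how this structure can be consumed, the snippet below pairs each bounding box with its label by index. It operates on the sample entry printed above, not on Ipyannotator's internals:

# Pair each bounding box with its label via the shared index
sample = {
    "data/artificial/images/0000.jpg": {
        "bbox": [
            {"x": 10, "y": 150, "width": 30, "height": 10, "id": "0"},
            {"x": 30, "y": 30, "width": 40, "height": 40, "id": "1"},
        ],
        "labels": [["Rectangle"], ["Circle"]],
    }
}

for image_path, annotation in sample.items():
    for bbox, labels in zip(annotation["bbox"], annotation["labels"]):
        print(f"{image_path}: object {bbox['id']} is a {labels[0]} at "
              f"({bbox['x']}, {bbox['y']}), size {bbox['width']}x{bbox['height']}")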

Improve

The improve feature of Ipyannotator's video annotator allows users to refine the annotated dataset. This includes:

  • Selecting objects across the frames and joining their drawn trajectories.

  • Updating labels across the entire annotation.

In the example below we have an occlusion, illustrated by a black square. The rectangle disappears behind the occluding object and reappears with a new object id. The video annotator allows users to join the trajectories of these different ids into a single object.

Joining a trajectory (a data-level sketch of the result follows the list):

  • Navigate across the annotator

  • Note that the gray rectangle disappears

  • Note that the gray rectangle reappears but with a new id

  • Select the rectangle with a new id (marking the checkbox)

  • Navigate back until you see the old gray rectangle id

  • Select the rectangle with the old id (marking the checkbox)

  • Click on the join button
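
Conceptually, joining relabels the later object with the earlier id so that both segments form one trajectory. The sketch below illustrates this at the data level; it is an assumption about the effect of the join, not Ipyannotator's actual implementation:

# Hypothetical illustration: merge the post-occlusion id into the
# pre-occlusion id across all frames, using the JSON structure shown
# in the Create section
def join_trajectories(annotations, old_id, new_id):
    """Relabel every bbox carrying new_id so it continues old_id's track."""
    for annotation in annotations.values():
        for bbox in annotation["bbox"]:
            if bbox["id"] == new_id:
                bbox["id"] = old_id
    return annotations

# e.g. join_trajectories(annotations, old_id="0", new_id="2")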

improver = anni.improve()
improver