Tutorials

Data Management Tutorial

Tutorials for data management

Connect Cloud Storage

Set up an integration with GCS/S3/Azure

Connect Cloud Storage

If you already have your data managed and organized on a cloud storage service, such as GCS/S3/Azure, you may want to use it with Dataloop rather than uploading the binaries and creating duplicates.

Cloud Storage Integration

Access & Permissions - Creating an integration with a GCS/S3/Azure cloud account requires adding a key/secret with the following permissions:

  • List (Mandatory) - allows Dataloop to list all of the items in the storage.

  • Get (Mandatory) - gets the items and performs pre-processing functionalities like thumbnails, item info, etc.

  • Put / Write (Mandatory) - lets you upload your items directly to the external storage from the Dataloop platform.

  • Delete - lets you delete your items directly from the external storage using the Dataloop platform.

Create Integration With GCS
Creating an integration with GCS requires a JSON file with the GCS configuration.
import dtlpy as dl
import json
if dl.token_expired():
    dl.login()
organization = dl.organizations.get(organization_name='my-org')
with open(r"C:\gcsfile.json", 'r') as f:
    gcs_json = json.load(f)
gcs_to_string = json.dumps(gcs_json)
organization.integrations.create(name='gcsintegration',
                                 integrations_type=dl.ExternalStorage.GCS,
                                 options={'key': '',
                                          'secret': '',
                                          'content': gcs_to_string})
Create Integration With S3
import dtlpy as dl
if dl.token_expired():
    dl.login()
organization = dl.organizations.get(organization_name='my-org')
organization.integrations.create(name='S3integration', integrations_type=dl.ExternalStorage.S3,
                                 options={'key': "my_key", 'secret': "my_secret"})
Create Integration With Azure
import dtlpy as dl
if dl.token_expired():
    dl.login()
organization = dl.organizations.get(organization_name='my-org')
organization.integrations.create(name='azureintegration',
                                 integrations_type=dl.ExternalStorage.AZUREBLOB,
                                 options={'key': 'my_key',
                                          'secret': 'my_secret',
                                          'clientId': 'my_clientId',
                                          'tenantId': 'my_tenantId'})
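
To verify an integration was created, you can list the organization’s integrations and print them (a minimal sketch, assuming the organization object from above):

# Print all integrations in the organization
for integration in organization.integrations.list():
    print(integration)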
Storage Driver

Once you have an integration, you can set up a driver, which adds a specific bucket (and optionally with a specific path/folder) as a storage resource.

Create Drivers in the Platform (browser)
# param name: the driver name
# param driver_type: ExternalStorage.S3, ExternalStorage.GCS , ExternalStorage.AZUREBLOB
# param integration_id: the integration id
# param bucket_name: the external bucket name
# param project_id: the id of the project to connect the driver to
# param allow_external_delete: true to allow deleting files from the external storage when they are deleted from the Dataloop platform
# param region: relevant only for S3 - the bucket region
# param storage_class: relevant only for S3
# param path: Optional. By default, path is the root folder. Path is case sensitive.
# return: driver object
import dtlpy as dl
driver = dl.drivers.create(name='driver_name', driver_type=dl.ExternalStorage.S3, integration_id='integration_id',
                           bucket_name='bucket_name', project_id='project_id',
                           allow_external_delete=True,
                           region='eu-west-1', storage_class="", path="")

Manage Datasets

Create and manage Datasets and connect them with your cloud storage

Manage Datasets

Datasets are buckets in the Dataloop system that hold a collection of data items of any type, regardless of their storage location (on Dataloop storage or external cloud storage).

Create Dataset

You can create datasets within a project. There is no limit to the number of datasets a project can have, which correlates with data versioning, where datasets can be cloned and merged.

dataset = project.datasets.create(dataset_name='my-dataset-name')
Create Dataset With Cloud Storage Driver

If you’ve created an integration and driver to your cloud storage, you can create a dataset connected to that driver. A single integration (for example: S3) can have multiple drivers (per bucket or even per folder), so you need to specify that.

project = dl.projects.get(project_name='my-project-name')
# Get your drivers list
project.drivers.list().print()
# Create a dataset from a driver name. You can also create by the driver ID.
dataset = project.datasets.create(driver='my_driver_name', dataset_name="my_dataset_name")
Retrieve Datasets

You can read all datasets that exist in a project, and then access the datasets by their ID (or name).

datasets = project.datasets.list()
dataset = project.datasets.get(dataset_id='my-dataset-id')
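
To quickly inspect the project’s datasets, you can iterate over the list and print each one (a small sketch using the objects from above):

# Print the name and ID of every dataset in the project
for ds in project.datasets.list():
    print(ds.name, ds.id)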
Create Directory

A dataset can have multiple directories, allowing you to manage files by context, such as upload time, working batch, source, etc.

dataset.items.make_dir(directory="/directory/name")
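
To move an existing item into that directory, here is a minimal sketch (assuming an existing item ID; exact move semantics may vary between SDK versions):

item = dataset.items.get(item_id='my-item-id')
# Move the item into the directory created above
item.move(new_path='/directory/name')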
Hard-copy a Folder to Another Dataset

You can clone a folder into a new dataset, but if you want to actually move a folder with files that are stored in the Dataloop system to another dataset, you’ll need to download the files and upload them again to the destination dataset.

copy_annotations = True
flat_copy = False  # if true, it copies all dir files and sub dir files to the destination folder without sub directories
source_folder = '/source_folder'
destination_folder = '/destination_folder'
source_project_name = 'source_project_name'
source_dataset_name = 'source_dataset_name'
destination_project_name = 'destination_project_name'
destination_dataset_name = 'destination_dataset_name'
# Get source project dataset
project = dl.projects.get(project_name=source_project_name)
dataset_from = project.datasets.get(dataset_name=source_dataset_name)
source_folder = source_folder.rstrip('/')
# Filter to get all files of a specific folder
filters = dl.Filters()
filters.add(field='filename', values=source_folder + '/**')  # Get all items in folder (recursive)
pages = dataset_from.items.list(filters=filters)
# Get destination project and dataset
project = dl.projects.get(project_name=destination_project_name)
dataset_to = project.datasets.get(dataset_name=destination_dataset_name)
# Go over all pages and copy each file from source to destination
for page in pages:
    for item in page:
        # Download item (without save to disk)
        buffer = item.download(save_locally=False)
        # Give the item's name to the buffer
        if flat_copy:
            buffer.name = item.name
        else:
            buffer.name = item.filename[len(source_folder) + 1:]
        # Upload item
        print("Going to add {} to {} dir".format(buffer.name, destination_folder))
        new_item = dataset_to.items.upload(local_path=buffer, remote_path=destination_folder)
        if not isinstance(new_item, dl.Item):
            print('The file {} could not be uploaded to {}'.format(buffer.name, destination_folder))
            continue
        print("{} has been uploaded".format(new_item.filename))
        if copy_annotations:
            new_item.annotations.upload(item.annotations.list())

Data Versioning

How to manage versions

Data Versioning

Dataloop’s powerful data versioning provides you with unique tools for data management - clone, merge, slice & dice your files, to create multiple versions for various applications. Sample use cases include:

  • Golden training sets management

  • Reproducibility (dataset training snapshot)

  • Experimentation (creating subsets from different kinds)

  • Task/Assignment management

  • Data Version “Snapshot” - Use our versioning feature as a way to save data (items, annotations, metadata) before any major process. For example, a snapshot can serve as a roll-back mechanism to original datasets in case of any error without losing the data.

Clone Datasets

Cloning a dataset creates a new dataset with the same files as the original. Files are actually a reference to the original binary and not a new copy of the original, so your cloud data remains safe and protected. When cloning a dataset, you can add a destination dataset, remote file path, and more…

dataset = project.datasets.get(dataset_id='my-dataset-id')
dataset.clone(clone_name='clone-name',
              filters=None,
              with_items_annotations=True,
              with_metadata=True,
              with_task_annotations_status=True)
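
A clone can also capture just a subset of a dataset by passing a filter. For example, a sketch that clones only the items under a hypothetical '/train' folder:

filters = dl.Filters()
filters.add(field='dir', values='/train')
# Clone only the filtered items into a new dataset
train_snapshot = dataset.clone(clone_name='train-snapshot',
                               filters=filters,
                               with_items_annotations=True)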
Merge Datasets

Dataset merging outcome depends on how similar or different the datasets are.

  • Cloned Datasets - items, annotations, and metadata will be merged. This means that you will see annotations from different datasets on the same item.

  • Different datasets (not clones) with similar recipes - items will be summed up, which will cause duplication of similar items.

  • Datasets with different recipes - Datasets with different default recipes cannot be merged. Use the ‘Switch recipe’ option on dataset level (3-dots action button) to match recipes between datasets and be able to merge them.

dataset_ids = ["dataset-1-id", "dataset-2-id"]
project_ids = ["dataset-1-project-id", "dataset-2-project-id"]
dataset_merge = dl.datasets.merge(merge_name="my_dataset-merge",
                                  project_ids=project_ids,
                                  dataset_ids=dataset_ids,
                                  with_items_annotations=True,
                                  with_metadata=False,
                                  with_task_annotations_status=False)

Upload and Manage Data and Metadata

Upload data items and metadata

Upload & Manage Data & Metadata

Upload specific files

When you have specific files you want to upload, you can upload them all into a dataset using this script:

import dtlpy as dl
if dl.token_expired():
    dl.login()
project = dl.projects.get(project_name='project_name')
dataset = project.datasets.get(dataset_name='dataset_name')
dataset.items.upload(local_path=[r'C:/home/project/images/John Morris.jpg',
                                 r'C:/home/project/images/John Benton.jpg',
                                 r'C:/home/project/images/Liu Jinli.jpg'],
                     remote_path='/folder_name')  # Remote path is optional, images will go to the main directory by default
Upload all files in a folder

If you want to upload all files from a folder, you can do that by just specifying the folder name:

import dtlpy as dl
if dl.token_expired():
    dl.login()
project = dl.projects.get(project_name='project_name')
dataset = project.datasets.get(dataset_name='dataset_name')
dataset.items.upload(local_path=r'C:/home/project/images',
                     remote_path='/folder_name')  # Remote path is optional, images will go to the main directory by default
Upload Items and Annotations Metadata

You can upload items as a table using a pandas DataFrame, which lets you upload items with info (annotations, metadata such as confidence, filename, etc.) attached to them.

import pandas
import dtlpy as dl
dataset = dl.datasets.get(dataset_id='id')  # Get dataset
to_upload = list()
# First item and info attached:
to_upload.append({'local_path': r"E:\TypesExamples\000000000064.jpg",  # Item file path
                  'local_annotations_path': r"E:\TypesExamples\000000000776.json",  # Annotations file path
                  'remote_path': "/first",  # Dataset folder to upload the item to
                  'remote_name': 'f.jpg',  # Item remote name in the dataset
                  'item_metadata': {'user': {'dummy': 'fir'}}})  # Added user metadata
# Second item and info attached:
to_upload.append({'local_path': r"E:\TypesExamples\000000000776.jpg",  # Item file path
                  'local_annotations_path': r"E:\TypesExamples\000000000776.json",  # Annotations file path
                  'remote_path': "/second",  # Dataset folder to upload the item to
                  'remote_name': 's.jpg',  # Item remote name in the dataset
                  'item_metadata': {'user': {'dummy': 'sec'}}})  # Added user metadata
df = pandas.DataFrame(to_upload)  # Make data into table
items = dataset.items.upload(local_path=df,
                             overwrite=True)  # Upload table to platform
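
The upload call returns the uploaded items; a quick sketch to verify what landed where (assuming the 'items' result from above):

# Print each uploaded item's path and its user metadata
for item in items:
    print(item.filename, item.metadata.get('user'))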

Upload and Manage Annotations

Upload annotations into data items

Upload & Manage Annotations

The snippet below fetches an annotation and updates a field in its user metadata:

import dtlpy as dl
item = dl.items.get(item_id="")
annotation = item.annotations.get(annotation_id="")
annotation.metadata["user"] = True
annotation.update()
Upload User Metadata

To upload annotations from JSON and include the user metadata, add the parameter local_annotations_path to the dataset.items.upload function, like so:

project = dl.projects.get(project_name='project_name')
dataset = project.datasets.get(dataset_name='dataset_name')
dataset.items.upload(local_path=r'<items path>',
                     local_annotations_path=r'<annotation json file path>',
                     item_metadata=dl.ExportMetadata.FROM_JSON,
                     overwrite=True)
Convert Annotations To COCO Format
converter = dl.Converter()
converter.upload_local_dataset(
    from_format=dl.AnnotationFormat.COCO,
    dataset=dataset,
    local_items_path=r'C:/path/to/items',
    # Please make sure the names of the items are the same as written in the COCO JSON file
    local_annotations_path=r'C:/path/to/annotations/file/coco.json'
)
Upload an Entire Directory and its Corresponding Dataloop JSON Annotations
# Local path to the items folder
# If you wish to upload items with your directory tree use : r'C:/home/project/images_folder'
local_items_path = r'C:/home/project/images_folder/*'
# Local path to the corresponding annotations - make sure the file names fit
local_annotations_path = r'C:/home/project/annotations_folder'
dataset.items.upload(local_path=local_items_path,
                     local_annotations_path=local_annotations_path)
Upload Annotations To Video Item

Uploading annotations to video items requires handling annotations that span multiple frames and toggling visibility (occlusion). In this example, we will use the following CSV file, which contains a single ‘person’ box annotation that begins on frame 20, disappears on frame 41, reappears on frame 51, and ends on frame 90.

Video_annotations_example.CSV

import pandas as pd
# Read CSV file
df = pd.read_csv(r'C:/file.csv')
# Get item
item = dataset.items.get(item_id='my_item_id')
builder = item.annotations.builder()
# Read line by line from the csv file
for i_row, row in df.iterrows():
    # Create box annotation from csv rows and add it to a builder
    builder.add(annotation_definition=dl.Box(top=row['top'],
                                             left=row['left'],
                                             bottom=row['bottom'],
                                             right=row['right'],
                                             label=row['label']),
                object_visible=row['visible'],  # Toggle annotation visibility per frame, based on the 'visible' column
                object_id=row['annotation id'],  # Numbering system that separates different annotations
                frame_num=row['frame'])
# Upload all created annotations
item.annotations.upload(annotations=builder)

Show Annotations Over Image

After uploading items and annotations with their metadata, you might want to see some of them and perform visual validation.

To see only the annotations, use the annotation type show option.

# Use the show function for all annotation types
box = dl.Box()
# Must provide all inputs
box.show(image='',
         thickness='',
         with_text='',
         height='',
         width='',
         annotation_format='',
         color='')

To see the item itself with all annotations, use the Annotations option.

# Must input an image or height and width
annotation.show(image='',
                height='', width='',
                annotation_format='dl.ViewAnnotationOptions.*',
                thickness='',
                with_text='')
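
For instance, here is a minimal sketch that renders an annotation as a mask and displays it with matplotlib (assumes an existing item ID; matplotlib is not part of the SDK):

import matplotlib.pyplot as plt
item = dataset.items.get(item_id='my-item-id')
# Take the first annotation on the item
annotation = item.annotations.list()[0]
# Render the annotation over the item's dimensions as a mask
mask = annotation.show(height=item.height,
                       width=item.width,
                       thickness=3,
                       with_text=False,
                       annotation_format=dl.ViewAnnotationOptions.MASK)
plt.imshow(mask)
plt.show()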

Download Data, Annotations & Metadata

The item ID for a specific file can be found in the platform UI - Click BROWSE for a dataset, click on the selected file, and the file information will be displayed in the right-side panel. The item ID is detailed, and can be copied in a single click.

Download Items and Annotations

Download dataset items and annotations to a folder on your computer, in two separate subfolders. See all annotation options here.

dataset.download(local_path=r'C:/home/project/images',  # The default value is ".dataloop" folder
                 annotation_options=dl.VIEW_ANNOTATION_OPTIONS_JSON)
Multiple Annotation Options

See all annotation options here.

dataset.download(local_path=r'C:/home/project/images',  # The default value is ".dataloop" folder
                 annotation_options=[dl.VIEW_ANNOTATION_OPTIONS_MASK,
                                     dl.VIEW_ANNOTATION_OPTIONS_JSON,
                                     dl.ViewAnnotationOptions.INSTANCE])
Filter by Item and/or Annotation
  • Items filter - download filtered items based on multiple parameters, like their directory. You can also download items based on different filters. Learn all about item filters here.

  • Annotation filter - download filtered annotations based on multiple parameters like their label. You can also download items annotations based on different filters, learn all about annotation filters here. This example will download items and JSONs from the ‘dog’ folder that have the label ‘dog’.

# Filter items from the "dog" directory
item_filters = dl.Filters(resource='items', field='dir', values='/dog')
# Filter items with dog annotations
annotation_filters = dl.Filters(resource=dl.FiltersResource.ANNOTATION, field='label', values='dog')
dataset.download(local_path=r'C:/home/project/images',  # The default value is ".dataloop" folder
                 filters=item_filters,
                 annotation_filters=annotation_filters,
                 annotation_options=dl.VIEW_ANNOTATION_OPTIONS_JSON)
Filter by Annotations
  • Annotation filter - download filtered annotations based on multiple parameters like their label. You can also download items annotations based on different filters, learn all about annotation filters here.

item = dataset.items.get(item_id="item_id")  # Get item from dataset to be able to view the dataset colors on Mask
# Filter items with dog annotations
annotation_filters = dl.Filters(resource='annotations', field='label', values='dog')
item.download(local_path=r'C:/home/project/images',  # the default value is ".dataloop" folder
              annotation_filters=annotation_filters,
              annotation_options=dl.VIEW_ANNOTATION_OPTIONS_JSON)
Download Annotations in COCO Format
  • Items filter - download filtered items based on multiple parameters like their directory. You can also download items based on different filters, learn all about item filters here.

  • Annotation filter - download filtered annotations based on multiple parameters like their label. You can also download items annotations based on different filters, learn all about annotation filters here.

This example will download COCO annotations for items in the ‘dog’ folder that have the label ‘dog’.

# Filter items from the "dog" directory
item_filters = dl.Filters(resource='items', field='dir', values='/dog')
# Filter items with dog annotations
annotation_filters = dl.Filters(resource='annotations', field='label', values='dog')
converter = dl.Converter()
converter.convert_dataset(dataset=dataset,
                          to_format='coco',
                          local_path=r'C:/home/coco_annotations',
                          filters=item_filters,
                          annotation_filters=annotation_filters)

Sort and Filters

DQL Filters and Pagination

Advanced SDK Filters

More complex filters on items and annotations

To access the filters entity click here.

Filter Operators

To understand more about filter operators please click here.

When adding a filter, several operators are available for use:

Equal

eq -> equal (or dl.FiltersOperations.EQUAL)

For example, filter items from a specific folder directory.

import dtlpy as dl
# Get project and dataset
project = dl.projects.get(project_name='project_name')
dataset = project.datasets.get(dataset_name='dataset_name')
# Create filters instance
filters = dl.Filters()
# Filter only items from a specific folder directory
filters.add(field='dir', values='/DatasetFolderName', operator=dl.FILTERS_OPERATIONS_EQUAL)
# optional - return results sorted by ascending file name
filters.sort_by(field='filename')
# Get filtered items list in a page object
pages = dataset.items.list(filters=filters)
# Count the items
print('Number of items in dataset: {}'.format(pages.items_count))
Not Equal

ne -> not equal (or dl.FiltersOperations.NOT_EQUAL)

In this example, you will get all items that do not have ONLY a ‘cat’ label.

Note
This Operator is a better fit for filters of a single value because, for example, this filter will return items that have both 'cat' and 'dog' labels. View an example of a solution for the issue in the full example section at the bottom of the page.
filters = dl.Filters()
# Filter items by their annotations - label not equal to 'cat'
filters.add_join(field='label', values='cat', operator=dl.FILTERS_OPERATIONS_NOT_EQUAL)
# optional - return results sorted by ascending file name
filters.sort_by(field='filename')
# Get filtered items list in a page object
pages = dataset.items.list(filters=filters)
# Count the items
print('Number of items in the dataset: {}'.format(pages.items_count))
Greater Than

gt -> greater than (or dl.FiltersOperations.GREATER_THAN)

In this example, you will get items whose height (in pixels) is greater than the given value.

filters = dl.Filters()
# Filter images with a height greater than the given value
filters.add(field='metadata.system.height', values=height_number_in_pixels,
            operator=dl.FILTERS_OPERATIONS_GREATER_THAN)
# optional - return results sorted by ascending file name
filters.sort_by(field='filename')
# Get filtered items list in a page object
pages = dataset.items.list(filters=filters)
# Count the items
print('Number of items in dataset: {}'.format(pages.items_count))
Less Than

lt -> less than (or dl.FiltersOperations.LESS_THAN)

In this example, you will get items whose width (in pixels) is less than the given value.

filters = dl.Filters()
# Filter images with a width less than the given value
filters.add(field='metadata.system.width', values=width_number_in_pixels, operator=dl.FILTERS_OPERATIONS_LESS_THAN)
# optional - return results sorted by ascending file name
filters.sort_by(field='filename')
# Get filtered items list in a page object
pages = dataset.items.list(filters=filters)
# Count the items
print('Number of items in dataset: {}'.format(pages.items_count))
In a List

in -> is in a list (when using this operator, values should be a list) (or dl.FiltersOperations.IN). In this example, you will get items with dog OR cat labels.

filters = dl.Filters()
# Filter items with dog OR cat labels
filters.add_join(field='label', values=['dog', 'cat'], operator=dl.FILTERS_OPERATIONS_IN)
# optional - return results sorted by ascending file name
filters.sort_by(field='filename')
# Get filtered items list in a page object
pages = dataset.items.list(filters=filters)
# Count the items
print('Number of items in dataset: {}'.format(pages.items_count))
Exist

The filter parameter FILTERS_OPERATIONS_EXISTS checks if an attribute exists. The following example checks whether there is an item with user metadata:

filters = dl.Filters()
filters.add(field='metadata.user', values=True, operator=dl.FILTERS_OPERATIONS_EXISTS)
dataset.items.list(filters=filters)
SDK defaults

By default, filters ignore SDK defaults such as hidden items and directories, and ‘note’ annotations (which are used as issues). If you wish to change this behavior, you may do the following:

filters = dl.Filters(use_defaults=False)
Hidden Items and Directories

If you wish to show only hidden items & directories in your filters, use this code:

filters = dl.Filters()
filters.add(field='type', values='dir')
# or remove the default 'type' filter altogether
filters.pop(field='type')
Delete a Filter
filters = dl.Filters()
# For example, if you added the following filter:
filters.add(field='to-delete-field', values='value')
# Use this command to delete the filter
filters.pop(field='to-delete-field')
# or for items by their annotations
filters.pop_join(field='to-delete-annotation-field')
Full Examples
How to filter items that were created between specific dates?

In this example, you will get all of the items that were created in 2018.

import datetime, time
filters = dl.Filters()
# -- time filters -- values must be in ISO format and in UTC (offset from local time); convert using the datetime package as follows:
earlier_timestamp = datetime.datetime(year=2018, month=1, day=1, hour=0, minute=0, second=0,
                                      tzinfo=datetime.timezone(
                                          datetime.timedelta(seconds=-time.timezone))).isoformat()
later_timestamp = datetime.datetime(year=2019, month=1, day=1, hour=0, minute=0, second=0,
                                    tzinfo=datetime.timezone(
                                        datetime.timedelta(seconds=-time.timezone))).isoformat()
filters.add(field='createdAt', values=earlier_timestamp, operator=dl.FiltersOperations.GREATER_THAN)
filters.add(field='createdAt', values=later_timestamp, operator=dl.FiltersOperations.LESS_THAN)
# the default method is AND, so both date conditions must hold (created after the earlier timestamp AND before the later one)
# Get filtered items list in a page object
pages = dataset.items.list(filters=filters)
# Count the items
print('Number of items in dataset: {}'.format(pages.items_count))
How to filter items that don’t have a specific label?

In this example, you will get all items that do not have a ‘cat’ label AT ALL.

Note
This filter will NOT return items that have both 'cat' and 'dog' labels.
# Get all items
all_items = set([item.id for item in dataset.items.list().all()])
# Get all items WITH the label cat
filters = dl.Filters()
filters.add_join(field='label', values='cat')
cat_items = set([item.id for item in dataset.items.list(filters=filters).all()])
# Get the difference between the sets. This will give you a list of the items with no cat
no_cat_items = all_items.difference(cat_items)
print('Number of filtered items in dataset: {}'.format(len(no_cat_items)))
# Iterate through the IDs - go over all IDs and print the matching items
for item_id in no_cat_items:
    print(dataset.items.get(item_id=item_id))

Annotation Level Filters

Create filters on annotations and use DQL on annotation-level attributes

To access the filters entity click here.

The Dataloop Query Language - DQL

Using The Dataloop Query Language, you may navigate through massive amounts of data.

You can filter, sort, and update your metadata with it.

Filters

Using filters, you can filter items and get a generator of the filtered items. The filters entity is used to build such filters.

Filters - Field & Value

Filter your items or annotations using the parameters in the JSON code that represents their data within our system. Access your item/annotation JSON using to_json().
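
For example, a quick way to inspect the fields available for filtering on an annotation (a sketch, assuming an existing item with annotations):

import json
# Print an annotation's JSON to see its filterable fields
annotation = item.annotations.list()[0]
print(json.dumps(annotation.to_json(), indent=2))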

Field

Field refers to the attributes you filter by.

For example, “dir” would be used if you wish to filter items by their folder/directory.

Value

Value refers to the input by which you want to filter. For example, “/new_folder” can be the directory/folder name where the items you wish to filter are located.

Sort - Field & Value
Field

Field refers to the field you sort your items/annotations list by. For example, if you sort by filename, you will get the item list sorted in alphabetical order by filename. See the full list of the available fields here.

Value

Value refers to the list order direction. Either ascending or descending.

Filter Annotations

Filter annotations by the annotations’ JSON fields. In this example, you will get all of the note annotations in the dataset sorted by the label.

Note

See all of the items iterator options on the Iterator of Items page.

import dtlpy as dl
# Get project and dataset
project = dl.projects.get(project_name='project_name')
dataset = project.datasets.get(dataset_name='dataset_name')
# Create filters instance with annotation resource
filters = dl.Filters(resource=dl.FiltersResource.ANNOTATION)
# Filter example - only note annotations
filters.add(field='type', values='note')
# optional - return results sorted by descending label
filters.sort_by(field='label', value=dl.FiltersOrderByDirection.DESCENDING)
pages = dataset.annotations.list(filters=filters)
# Count the annotations
print('Number of filtered annotations in dataset: {}'.format(pages.items_count))
# Iterate through the annotations - Go over all annotations and print the properties
for page in pages:
    for annotation in page:
        annotation.print()
Filter Annotations by the Annotations’ Item

add_join - filter Annotations by the annotations’ items’ JSON fields. For example, filter only box annotations from image items.

Note
See all of the items iterator options on the Iterator of Items page.
# Create filters instance
filters = dl.Filters(resource=dl.FiltersResource.ANNOTATION)
# Filter all box annotations
filters.add(field='type', values='box')
# AND filter annotations by their items - only items that are of mimetype image
# Meaning you will get 'box' annotations of all image items
filters.add_join(field='metadata.system.mimetype', values="image*")
# optional - return results sorted by descending creation date
filters.sort_by(field='createdAt', value=dl.FILTERS_ORDERBY_DIRECTION_DESCENDING)
# Get filtered annotations list in a page object
pages = dataset.annotations.list(filters=filters)
# Count the annotations
print('Number of filtered annotations in dataset: {}'.format(pages.items_count))
Filters Method - “Or” and “And”
Filters Operators
For more advanced filters operators visit the Advanced SDK Filters page.
And

If you wish to filter annotations with the “and” logical operator, you can do so by specifying which filters will be checked with “and”.

AND is the default value and can be used without specifying the method.
In this example, you will get a list of annotations in the dataset of the type box and label car.
filters = dl.Filters(resource=dl.FiltersResource.ANNOTATION)
# set annotation resource
filters.add(field='type', values='box', method=dl.FiltersMethod.AND)
filters.add(field='label', values='car',
            method=dl.FiltersMethod.AND)  # optional - return results sorted by ascending creation date
filters.sort_by(field='createdAt')
# Get filtered annotations list
pages = dataset.annotations.list(filters=filters)
# Count the annotations
print('Number of filtered annotations in dataset: {}'.format(pages.items_count))
Or

If you wish to filter annotations with the “or” logical operator, you can do so by specifying which filters will be checked with “or”. In this example, you will get a list of the dataset’s annotations that are either a ‘box’ or a ‘point’ type.

filters = dl.Filters(resource=dl.FiltersResource.ANNOTATION)
# filters with or
filters.add(field='type', values='box', method=dl.FiltersMethod.OR)
filters.add(field='type', values='point',
            method=dl.FiltersMethod.OR)  # optional - return results sorted by descending creation date
filters.sort_by(field='createdAt', value=dl.FILTERS_ORDERBY_DIRECTION_DESCENDING)
# Get filtered annotations list
pages = dataset.annotations.list(filters=filters)
# Count the annotations
print('Number of filtered annotations in dataset: {}'.format(pages.items_count))
Delete Filtered Items

In this example, you will delete annotations that were created on 30/8/2020 at 8:17 AM.

filters = dl.Filters()
# set annotation resource
filters.resource = dl.FiltersResource.ANNOTATION
# Example - created on 30/8/2020 at 8:17 AM
filters.add(field='createdAt', values="2020-08-30T08:17:08.000Z")
dataset.annotations.delete(filters=filters)
Annotation Filtering Fields
More Filter Options
Use a dot to access parameters within curly brackets. For example use field='metadata.system.status' to filter by the annotation's status.
{
    "id": "5f576f660bb2fb455d79ffdf",
    "datasetId": "5e368bee106a76a61cf05282",
    "type": "segment",
    "label": "Planet",
    "attributes": [],
    "coordinates": [
        [
            {
                "x": 856.25,
                "y": 1031.2499999999995
            },
            {
                "x": 1081.25,
                "y": 1631.2499999999995
            },
            {
                "x": 485.41666666666663,
                "y": 1735.4166666666665
            },
            {
                "x": 497.91666666666663,
                "y": 1172.9166666666665
            }
        ]
    ],
    "metadata": {
        "system": {
            "status": null,
            "startTime": 0,
            "endTime": 1,
            "frame": 0,
            "endFrame": 1,
            "snapshots_": [
                {
                    "fixed": true,
                    "type": "transition",
                    "frame": 0,
                    "objectVisible": true,
                    "data": [
                        [
                            {
                                "x": 856.25,
                                "y": 1031.2499999999995
                            },
                            {
                                "x": 1081.25,
                                "y": 1631.2499999999995
                            },
                            {
                                "x": 485.41666666666663,
                                "y": 1735.4166666666665
                            },
                            {
                                "x": 497.91666666666663,
                                "y": 1172.9166666666665
                            }
                        ]
                    ],
                    "label": "Planet",
                    "attributes": []
                }
            ],
            "automated": false,
            "isOpen": false,
            "system": false
        },
        "user": {}
    },
    "creator": "user@dataloop.ai",
    "createdAt": "2020-09-08T11:47:50.576Z",
    "updatedBy": "user@dataloop.ai",
    "updatedAt": "2020-09-08T11:47:50.576Z",
    "itemId": "5f572f4423a69b8c83408f12",
    "url": "https://gate.dataloop.ai/api/v1/annotations/5f576f660bb2fb455d79ffdf",
    "item": "https://gate.dataloop.ai/api/v1/items/5f572f4423a69b8c83408f12",
    "dataset": "https://gate.dataloop.ai/api/v1/datasets/5e368bee106a76a61cf05282",
    "hash": "11fdc816804faf0f7266b40d1cb67aff38e5c10d"
}
Full Examples
How to filter annotations by their label?
filters = dl.Filters()
# set resource
filters.resource = dl.FiltersResource.ANNOTATION
filters.add(field='label', values='your_label_value')
pages = dataset.annotations.list(filters=filters)
# Count the annotations
print('Number of filtered annotations in dataset: {}'.format(pages.items_count))
Advanced Filtering Operators

Explore advanced filtering options on this page.

Item Level

Create filters on items and use DQL on item-level attributes

To access the filters entity click here.

The Dataloop Query Language - DQL

Using The Dataloop Query Language, you may navigate through massive amounts of data.

You can filter, sort, and update your metadata with it.

Filters

Using filters, you can filter items and get a generator of the filtered items. The filters entity is used to build such filters.

Filters - Field & Value

Filter your items or annotations using the parameters in the JSON code that represents their data within our system. Access your item/annotation JSON using to_json().
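
For example, a quick way to inspect the fields available for filtering on an item (a sketch, assuming an existing item ID):

import json
# Print an item's JSON to see its filterable fields
item = dataset.items.get(item_id='my-item-id')
print(json.dumps(item.to_json(), indent=2))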

Field

Field refers to the attributes you filter by.

For example, “dir” would be used if you wish to filter items by their folder/directory.

Value

Value refers to the input by which you want to filter. For example, “/new_folder” can be the directory/folder name where the items you wish to filter are located.

Sort - Field & Value
Field

Field refers to the field you sort your items/annotations list by. For example, if you sort by filename, you will get the item list sorted in alphabetical order by filename. See the full list of the available fields here.

Value

Value refers to the list order direction. Either ascending or descending.

Filter Items

Filter items by the item’s JSON fields. In this example, you will get all annotated items in a dataset sorted by the filename.

Note
See all of the items iterator options on the Iterator of Items page.
import dtlpy as dl
# Get project and dataset
project = dl.projects.get(project_name='project_name')
dataset = project.datasets.get(dataset_name='dataset_name')
# Create filters instance
filters = dl.Filters()
# Filter only annotated items
filters.add(field='annotated', values=True)
# optional - return results sorted by ascending file name
filters.sort_by(field="filename")
# Get filtered items list in a page object
pages = dataset.items.list(filters=filters)
# Count the items
print('Number of items in dataset: {}'.format(pages.items_count))
Filter Items by the Items’ Annotations

add_join - filter items by the items’ annotations JSON fields. For example, filter only items with ‘box’ annotations.

Note
See all of the items iterator options on the Iterator of Items page.
filters = dl.Filters()
# Filter all approved items
filters.add(field='metadata.system.annotationStatus', values="approved")
# AND filter items by their annotation - only items with 'box' annotations
# Meaning you will get approved items with 'box' annotations
filters.add_join(field='type', values='box')
# optional - return results sorted by descending creation date
filters.sort_by(field='createdAt', value=dl.FILTERS_ORDERBY_DIRECTION_DESCENDING)
# Get filtered items list in a page object
pages = dataset.items.list(filters=filters)
# Count the items
print('Number of items in dataset: {}'.format(pages.items_count))
Filters Method - “Or” and “And”
Filters Operators
For more advanced filters operators visit the Advanced SDK Filters page.
And

If you wish to filter items with the “and” logical operator, you can do so by specifying which filters will be checked with “and”.

AND is the default value and can be used without specifying the method.
In this example, you will get a list of annotated items with user metadata of the field "is_automated" and value True.
filters = dl.Filters()  # filters with and
filters.add(field='annotated', values=True, method=dl.FiltersMethod.AND)
filters.add(field='metadata.user.is_automated', values=True,
            method=dl.FiltersMethod.AND)  # optional - return results sorted by ascending file name
filters.sort_by(field='name')
# Get filtered items list in a page object
pages = dataset.items.list(filters=filters)
# Count the items
print('Number of items in dataset: {}'.format(pages.items_count))
Or

If you wish to filter items with the “or” logical operator, you can do so by specifying which filters will be checked with “or”. In this example, you will get a list of items that are in either the “folderName1” or “folderName2” directory.

filters = dl.Filters()
# filters with or
filters.add(field='dir', values='/folderName1', method=dl.FiltersMethod.OR)
filters.add(field='dir', values='/folderName2',
            method=dl.FiltersMethod.OR)  # optional - return results sorted by descending directory name
filters.sort_by(field='dir', value=dl.FILTERS_ORDERBY_DIRECTION_DESCENDING)
# Get filtered items list in a page object
pages = dataset.items.list(filters=filters)
# Count the items
print('Number of items in dataset: {}'.format(pages.items_count))
Update User Metadata of Filtered Items

Update Filtered Items - the ‘update_values’ parameter must be a dictionary. The dictionary will only update user metadata. Understand more about user metadata here. In this example, you will update/add user metadata (with the field “BlackDogs” and value True) to items in a specific folder ‘dogs’ that have an attribute ‘black’.

filters = dl.Filters()
# For example -  filter only items in a specific folder - like 'dogs'
filters.add(field='dir', values='/dogs')
# For example - filter items by their annotation - only items with 'black' attribute
filters.add_join(field='attributes', values='black')
# To add the field 'BlackDogs' with the value True to all filtered items
# This field will be added to the user metadata
# Create the update order
update_values = {'BlackDogs': True}
# update
pages = dataset.items.update(filters=filters, update_values=update_values)
Delete Filtered Items

In this example, you will delete items that were created on 30/8/2020 at 8:17 AM.

filters = dl.Filters()
# For example - filter items by their creation date
filters.add(field='createdAt', values="2020-08-30T08:17:08.000Z")
dataset.items.delete(filters=filters)
Item Filtering Fields
More Filter Options
Use a dot to access parameters within curly brackets. For example use field='metadata.system.originalname' to filter by the item's original name.
{
    "id": "5f4b60848ced1d50c3df114a",
    "datasetId": "5f4b603d9825b9f191bbd3b3",
    "createdAt": "2020-08-30T08:17:08.000Z",
    "dir": "/new_folder",
    "filename": "/new_folder/optional.jpg",
    "type": "file",
    "hidden": false,
    "metadata": {
        "system": {
            "originalname": "file",
            "size": 3290035,
            "encoding": "7bit",
            "mimetype": "image/jpeg",
            "annotationStatus": [
                "completed"
            ],
            "refs": [
                {
                    "type": "task",
                    "id": "5f4b61f8f81ab6238c331bd2"
                },
                {
                    "type": "assignment",
                    "id": "5f4b61f8f81ab60508331bd3"
                }
            ],
            "executionLogs": {
                "image-metadata-extractor": {
                    "default_module": {
                        "run": {
                            "5f4b60841b892d82eaa2d95b": {
                                "progress": 100,
                                "status": "success"
                            }
                        }
                    }
                }
            },
            "exif": {},
            "height": 2734,
            "width": 4096,
            "statusLog": [
                {
                    "status": "completed",
                    "timestamp": "2020-08-30T14:54:17.014Z",
                    "creator": "user@dataloop.ai",
                    "action": "created"
                }
            ],
            "isBinary": true
        }
    },
    "name": "optional.jpg",
    "url": "https://gate.dataloop.ai/api/v1/items/5f4b60848ced1d50c3df114a",
    "dataset": "https://gate.dataloop.ai/api/v1/datasets/5f4b603d9825b9f191bbd3b3",
    "annotationsCount": 18,
    "annotated": "discarded",
    "stream": "https://gate.dataloop.ai/api/v1/items/5f4b60848ced1d50c3df114a/stream",
    "thumbnail": "https://gate.dataloop.ai/api/v1/items/5f4b60848ced1d50c3df114a/thumbnail",
    "annotations": "https://gate.dataloop.ai/api/v1/items/5f4b60848ced1d50c3df114a/annotations"
}
Full Examples
How to filter items by their annotations label?
filters = dl.Filters()
filters.add_join(field='label', values='your_label_value')
pages = dataset.items.list(filters=filters)
# Count the items
print('Number of filtered items in dataset: {}'.format(pages.items_count))
How to filter items by completed and approved status?
filters = dl.Filters()
filters.add(field='metadata.system.annotationStatus', values=["completed", "approved"])
pages = dataset.items.list(filters=filters)
# Count the items
print('Number of items in dataset: {}'.format(pages.items_count))
How to filter items by completed status (with items who are approved as well)?
filters = dl.Filters()
# set resource
filters.add(field='metadata.system.annotationStatus', values="completed")
pages = dataset.items.list(filters=filters)
# Count the items
print('Number of items in dataset: {}'.format(pages.items_count))
How to filter items by only completed status?
filters = dl.Filters()
filters.add(field='metadata.system.annotationStatus', values=["completed"])
pages = dataset.items.list(filters=filters)
# Count the items
print('Number of items in dataset: {}'.format(pages.items_count))
How to filter unassigned items?
filters = dl.Filters()
filters.add(field='metadata.system.refs', values=[])
pages = dataset.items.list(filters=filters)
# Count the items
print('Number of items in dataset: {}'.format(pages.items_count))
How to filter items by a specific folder?
filters = dl.Filters()
filters.add(field='dir', values="/folderName")
pages = dataset.items.list(filters=filters)
# Count the items
print('Number of items in dataset: {}'.format(pages.items_count))
Get all items named foo.bar
filters = dl.Filters()
filters.add(field='name', values='foo.bar.*')
# Get filtered item list in a page object
pages = dataset.items.list(filters=filters)
# Count the items
print('Number of filtered items in dataset: {}'.format(pages.items_count))
Sort files of size 0-5 MB by name, in ascending order
filters = dl.Filters()
filters.add(field='metadata.system.size', values='0', operator='gt')
filters.add(field='metadata.system.size', values='5242880', operator='lt')
filters.sort_by(field='filename', value=dl.FILTERS_ORDERBY_DIRECTION_ASCENDING)
# Get filtered item list in a page object
pages = dataset.items.list(filters=filters)
# Count the items
print('Number of filtered items in dataset: {}'.format(pages.items_count))
Sort with multiple fields: Sort annotations by label ascending and createdAt descending
filters = dl.Filters()
# set annotation resource
filters.resource = dl.FiltersResource.ANNOTATION
# return results sorted by ascending label and descending creation date
filters.sort_by(field='label', value=dl.FILTERS_ORDERBY_DIRECTION_ASCENDING)
filters.sort_by(field='createdAt', value=dl.FILTERS_ORDERBY_DIRECTION_DESCENDING)
# Get filtered item list in a page object
pages = dataset.items.list(filters=filters)
# Count the items
print('Number of filtered items in dataset: {}'.format(pages.items_count))
Advanced Filtering Operators

Explore advanced filtering options on this page.

Response to DQL Query

A typical response to a DQL query will look like the following:

{
    "totalItemsCount": number,
    "items": Array,
    "totalPagesCount": number,
    "hasNextPage": boolean,
}
# A possible result:
{
    "totalItemsCount": 2,
    "totalPagesCount": 1,
    "hasNextPage": false,
    "items": [
        {
            "id": "5d0783852dbc15306a59ef6c",
            "createdAt": "2019-06-18T23:29:15.775Z",
            "filename": "/5546670769_8df950c6b6.jpg",
            "type": "file"
                    // ...
        },
        {
            "id": "5d0783852dbc15306a59ef6d",
            "createdAt": "2019-06-19T23:29:15.775Z",
            "filename": "/5551018983_3ce908ac98.jpg",
            "type": "file"
                    // ...
        }
    ]
}

Pagination

How to use pages and iteration over items

Pagination
Pages

We use pages instead of a list when we have an object that contains a lot of information.

The page object divides a large list into pages (with a default of 1000 items) in order to save time when going over the items.

It is the same as we display it in the annotation platform, see example here.

You can redefine the number of items on a page with the page_size attribute. When we go over the items we use nested loops to first go to the pages and then go over the items for each page.

Iterator of Items

You can create a generator of items with different filters.

import dtlpy as dl
# Get the project
project = dl.projects.get(project_name='project_name')
# Get the dataset
dataset = project.datasets.get(dataset_name='dataset_name')
# Get items in pages (1000 items per page)
filters = dl.Filters()
filters.add(field='filename', values='/your/file/path.mimetype')
pages = dataset.items.list(filters=filters)
# Count the items
print('Number of items in dataset: {}'.format(pages.items_count))
# Go over all items and print the properties
for i_page, page in enumerate(pages):
    print('{} items in page {}'.format(len(page), i_page))
    for item in page:
        item.print()

A Page entity iterator also allows reverse iteration for cases in which you want to change items during the iteration:

# Go over all items and print the properties
for i_page, page in enumerate(reversed(pages)):
    print('{} items in page {}'.format(len(page), i_page))

If you want to iterate through all items within your filter, you can also do so without going through them page by page:

for item in pages.all():
    print(item.name)

If you are planning to run some process on each item, it’s faster to use multiple threads (or processes) for parallel computation. The following uses a ThreadPoolExecutor with 32 workers to process the items in parallel:

from concurrent.futures import ThreadPoolExecutor
def single_item(item):
    # do some work on item
    print(item.filename)
    return True
with ThreadPoolExecutor(max_workers=32) as executor:
    executor.map(single_item, pages.all())

Let’s compare the runtimes to see that the threaded version is faster:

from concurrent.futures import ThreadPoolExecutor
import time
tic = time.time()
for item in pages.all():
    # do stuff on item
    time.sleep(1)
print('One by one took {:.2f}[s]'.format(time.time() - tic))
def single_item(item):
    # do stuff on item
    time.sleep(1)
    return True
tic = time.time()
with ThreadPoolExecutor(max_workers=32) as executor:
    executor.map(single_item, pages.all())
print('Using threads took {:.2f}[s]'.format(time.time() - tic))

Visualize the progress with a tqdm progress bar:

import time
import tqdm
from concurrent.futures import ThreadPoolExecutor
pbar = tqdm.tqdm(total=pages.items_count)
def single_item(item):
    # do stuff on item
    time.sleep(1)
    pbar.update()
    return True
with ThreadPoolExecutor(max_workers=32) as executor:
    executor.map(single_item, pages.all())
Set page_size

The following example sets the page_size to 50:

# Create filters instance
filters = dl.Filters()
# Get filtered item list in a page object, where the starting page is 1
pages = dataset.items.list(filters=filters, page_offset=1, page_size=50)
# Count the items
print('Number of filtered items in dataset: {}'.format(pages.items_count))
# Print items from page 1
print('Length of first page: {}'.format(len(pages.items)))

Working with Metadata

Working with Item’s metadata

Working with Metadata

import dtlpy as dl
# Get project and dataset
project = dl.projects.get(project_name='project_name')
dataset = project.datasets.get(dataset_name='dataset_name')
User Metadata

As a powerful tool to manage data based on your categories and information, you can add any keys and values to both the item’s and annotations’ user-metadata sections using the Dataloop SDK. Then, you can use your user-metadata for data filtering, sorting, etc.

Note
When adding metadata to the same item, the new metadata overwrites existing metadata. To avoid overwriting existing metadata, use the "list" data type and append the new metadata to the list.
Metadata Data Types

Metadata is a dictionary attribute used with items, annotations, and other entities of the Dataloop system (task, recipe, and more). As such, it can be used with string, number, boolean, list or null types.

String
item.metadata['user']['MyKey'] = 'MyValue'
annotation.metadata['user']['MyKey'] = 'MyValue'
Number
item.metadata['user']['MyKey'] = 3
annotation.metadata['user']['MyKey'] = 3
Boolean
item.metadata['user']['MyKey'] = True
annotation.metadata['user']['MyKey'] = True
Null – add metadata with no information
item.metadata['user']['MyKey'] = None
annotation.metadata['user']['MyKey'] = None
List
# add metadata of a list (can contain elements of different types).
item.metadata['user']['MyKey'] = ["A", 2, False]
annotation.metadata['user']['MyKey'] = ["A", 2, False]
Add new metadata to a list without losing existing data
item.metadata['user']['MyKey'].append(3)
item = item.update()
annotation.metadata['user']['MyKey'].append(3)
annotation = annotation.update()
Add metadata to an item’s user metadata
# upload and claim item
item = dataset.items.upload(local_path=r'C:/home/project/images/item.mimetype')
# or get item
item = dataset.items.get(item_id='write-your-id-number')
# modify metadata
item.metadata['user'] = dict()
item.metadata['user']['MyKey'] = 'MyValue'
# update and reclaim item
item = item.update()
Modify an existing user metadata field
# upload and claim item
item = dataset.items.upload(local_path=r'C:/home/project/images/item.mimetype')
# or get item
item = dataset.items.get(item_id='write-your-id-number')
# modify metadata
if 'user' not in item.metadata:
    item.metadata['user'] = dict()
item.metadata['user']['MyKey'] = 'MyValue'
# update and reclaim item
item = item.update()
Add metadata to annotations’ user metadata
# Get annotation
annotation = dl.annotations.get(annotation_id='my-annotation-id')
# modify metadata
annotation.metadata['user'] = dict()
annotation.metadata['user']['red'] = True
# update and reclaim annotation
annotation = annotation.update()
Filter items by user metadata
1. Get your dataset
project = dl.projects.get(project_name='project_name')
dataset = project.datasets.get(dataset_name='dataset_name')
2. Add metadata to an item

You can also add metadata to filtered items

# upload and claim item
item = dataset.items.upload(local_path=r'C:/home/project/images/item.mimetype')
# or get item
item = dataset.items.get(item_id='write-your-id-number')
# modify metadata
item.metadata['user'] = dict()
item.metadata['user']['MyKey'] = 'MyValue'
# update and reclaim item
item = item.update()
3. Create a filter
filters = dl.Filters()
# set resource - optional - default is item
filters.resource = dl.FiltersResource.ITEM
4. Filter by your written key
filters.add(field='metadata.user.MyKey', values='MyValue')
5. Get filtered items
pages = dataset.items.list(filters=filters)
# Go over all items and print the properties
for page in pages:
    for item in page:
        item.print()

FaaS Tutorial

Tutorials for FaaS

FaaS Interactive Tutorial – Using Python & Dataloop SDK

FaaS Interactive Tutorial

FaaS Interactive Tutorial – Using Python & Dataloop SDK

Concept

Dataloop Function-as-a-Service (FaaS) is a compute service that automatically runs your code based on time patterns or in response to trigger events.

You can use Dataloop FaaS to extend other Dataloop services with custom logic. Altogether, FaaS serves as a super flexible unit that provides you with increased capabilities in the Dataloop platform and allows achieving any need while automating processes.

With Dataloop FaaS, you simply upload your code and create your functions. Following that, you can define a time interval or specify a resource event for triggering the function. When a trigger event occurs, the FaaS platform launches and manages the compute resources, and executes the function.

You can configure the compute settings according to your preferences (machine types, concurrency, timeout, etc.) or use the default settings.
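
As a hedged illustration, deploying a service with explicit compute settings might look like the following sketch (the function, pod type, concurrency, and autoscaler values here are assumptions for the example):

import dtlpy as dl
# assumes an existing project object, e.g.:
project = dl.projects.get(project_name='my-project-name')
service = project.services.deploy(func=my_function,  # a hypothetical function defined elsewhere
                                  service_name='my-configured-service',
                                  runtime=dl.KubernetesRuntime(
                                      pod_type=dl.InstanceCatalog.REGULAR_S,
                                      concurrency=10,
                                      autoscaler=dl.KubernetesRabbitmqAutoscaler(
                                          min_replicas=0,
                                          max_replicas=2)))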

Use Cases

Pre-annotation processing: Resize, video assembler, video disassembler

Post-annotation processing: Augmentation, crop box-annotations, auto-parenting

ML models: Auto-detection

QA models: Auto QA, consensus model, majority vote model

Introduction

Getting started with FaaS.

Introduction

This tutorial will help you get started with FaaS.

  1. Prerequisites

  2. Basic use case: Single function

  • Deploy a function as a service

  • Execute the service manually and view the output

  3. Advanced use case: Multiple functions

  • Deploy several functions as a package

  • Deploy a service of the package

  • Set trigger events to the functions

  • Execute the functions and view the output and logs

First, log in to the platform by running the following Python code in the terminal or your IDE:

import dtlpy as dl
if dl.token_expired():
    dl.login()

Your browser will open a login screen, allowing you to enter your credentials or log in with Google. Once the “Login Successful” tab appears, you can close it.

This tutorial requires a project. You can create a new project, or alternatively use an existing one:

# Create a new project
project = dl.projects.create(project_name='project-sdk-tutorial')
# Use an existing project
project = dl.projects.get(project_name='project-sdk-tutorial')

Let’s create a dataset to work with and upload a sample item to it:

dataset = project.datasets.create(dataset_name='dataset-sdk-tutorial')
item = dataset.items.upload(
    local_path=['https://raw.githubusercontent.com/dataloop-ai/tiny_coco/master/images/train2017/000000184321.jpg'],
    remote_path='/folder_name')

Run Your First Function

Create and run your first FaaS in the Dataloop platform

Basic Use Case: Single Function

Create and Deploy a Sample Function

Below is an image-manipulation function in Python to use for converting an RGB image to a grayscale image. The function receives a single item, which later can be used as a trigger to invoke the function:

def rgb2gray(item: dl.Item):
    """
    Function to convert RGB image to GRAY
    Will also add a modality to the original item
    :param item: dl.Item to convert
    :return: None
    """
    import numpy as np
    import cv2
    buffer = item.download(save_locally=False)
    bgr = cv2.imdecode(np.frombuffer(buffer.read(), np.uint8), -1)
    gray = cv2.cvtColor(bgr, cv2.COLOR_BGR2GRAY)
    bgr_equalized_item = item.dataset.items.upload(local_path=gray,
                                                   remote_path='/gray' + item.dir,
                                                   remote_name=item.filename)
    # add modality
    item.modalities.create(name='gray',
                           ref=bgr_equalized_item.id)
    item.update(system_metadata=True)

You can now deploy the function as a service using Dataloop SDK. Once the service is ready, you may execute the available function on any input:

project = dl.projects.get(project_name='project-sdk-tutorial')
service = project.services.deploy(func=rgb2gray,
                                  service_name='grayscale-item-service')
Execute the function

An execution means running the function on a service with specific inputs (arguments). The execution input will be provided to the function that the execution runs.

Now that the service is up, it can be executed manually (on-demand) or automatically, based on a set trigger (time/event). As part of this tutorial, we will demonstrate how to manually run the “RGB to Gray” function.
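A minimal sketch of such a manual execution, assuming the service and item from the previous steps (rgb2gray is the function we deployed):

execution = service.execute(function_name='rgb2gray',
                            item_id=item.id,
                            project_id=project.id)
# wait for the execution to finish and print its status
execution = execution.wait()
print(execution.latest_status)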

To see the item we uploaded, run the following code:

item.open_in_web()

Multiple Functions

Create a Package with multiple functions and modules

Advanced Use Case: Multiple Functions

Create and Deploy a Package of Several Functions

First, login to the Dataloop platform:

import dtlpy as dl
if dl.token_expired():
    dl.login()

Let’s define the project and dataset you will work with in this tutorial. To create a new project and dataset:

project = dl.projects.create(project_name='project-sdk-tutorial')
project.datasets.create(dataset_name='dataset-sdk-tutorial')

To use an existing project and dataset:

project = dl.projects.get(project_name='project-sdk-tutorial')
dataset = project.datasets.get(dataset_name='dataset-sdk-tutorial')
Write your code

The following code consists of two image-manipulation methods:

  • RGB to grayscale over an image

  • CLAHE Histogram Equalization over an image - Contrast Limited Adaptive Histogram Equalization (CLAHE) to equalize images

To proceed with this tutorial, copy the following code and save it as a main.py file.

import dtlpy as dl
import cv2
import numpy as np
class ImageProcess(dl.BaseServiceRunner):
    @staticmethod
    def rgb2gray(item: dl.Item):
        """
        Function to convert RGB image to GRAY
        Will also add a modality to the original item
        :param item: dl.Item to convert
        :return: None
        """
        buffer = item.download(save_locally=False)
        bgr = cv2.imdecode(np.frombuffer(buffer.read(), np.uint8), -1)
        gray = cv2.cvtColor(bgr, cv2.COLOR_BGR2GRAY)
        gray_item = item.dataset.items.upload(local_path=gray,
                                              remote_path='/gray' + item.dir,
                                              remote_name=item.filename)
        # add modality
        item.modalities.create(name='gray',
                               ref=gray_item.id)
        item.update(system_metadata=True)
    @staticmethod
    def clahe_equalization(item: dl.Item):
        """
        Function to perform histogram equalization (CLAHE)
        Will add a modality to the original item
        Based on opencv https://docs.opencv.org/4.x/d5/daf/tutorial_py_histogram_equalization.html
        :param item: dl.Item to convert
        :return: None
        """
        buffer = item.download(save_locally=False)
        bgr = cv2.imdecode(np.frombuffer(buffer.read(), np.uint8), -1)
        # create a CLAHE object (Arguments are optional).
        lab = cv2.cvtColor(bgr, cv2.COLOR_BGR2LAB)
        lab_planes = cv2.split(lab)
        clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
        lab_planes[0] = clahe.apply(lab_planes[0])
        lab = cv2.merge(lab_planes)
        bgr_equalized = cv2.cvtColor(lab, cv2.COLOR_LAB2BGR)
        bgr_equalized_item = item.dataset.items.upload(local_path=bgr_equalized,
                                                       remote_path='/equ' + item.dir,
                                                       remote_name=item.filename)
        # add modality
        item.modalities.create(name='equ',
                               ref=bgr_equalized_item.id)
        item.update(system_metadata=True)
Define the module

Multiple functions may be defined in a single package under a “module” entity. This way you will be able to use a single codebase for various services.

Here, we will create a module containing the two functions we discussed. The “main.py” file you saved is defined as the module entry point. Later, you will specify its directory file path.

modules = [dl.PackageModule(name='image-processing-module',
                            entry_point='main.py',
                            class_name='ImageProcess',
                            functions=[dl.PackageFunction(name='rgb2gray',
                                                          description='Converting RGB to gray',
                                                          inputs=[dl.FunctionIO(type=dl.PackageInputType.ITEM,
                                                                                name='item')]),
                                       dl.PackageFunction(name='clahe_equalization',
                                                          description='CLAHE histogram equalization',
                                                          inputs=[dl.FunctionIO(type=dl.PackageInputType.ITEM,
                                                                                name='item')])
                                       ])]
Push the package

When you deployed the service in the previous tutorial (“Single Function”), a module and a package were automatically generated.

Now we will explicitly create and push the module as a package in the Dataloop FaaS library (application hub). For that, please specify the source path (src_path) of the “main.py” file you saved, and then run the following code:

src_path = 'functions/opencv_functions'
project = dl.projects.get(project_name='project-sdk-tutorial')
package = project.packages.push(package_name='image-processing',
                                modules=modules,
                                src_path=src_path)
Deploy a service

Now that the package is ready, it can be deployed to the Dataloop platform as a service. To create a service from a package, you need to define which module the service will serve. Notice that a service can only contain a single module. All the module functions will be automatically added to the service.

Multiple services can be deployed from a single package. Each service can get its own configuration: a different module and settings (computing resources, triggers, UI slots, etc.).

In our example, there is only one module in the package. Let’s deploy the service:

service = package.services.deploy(service_name='image-processing',
                                  runtime=dl.KubernetesRuntime(concurrency=32),
                                  module_name='image-processing-module')
Trigger the service

Once the service is up, we can configure a trigger to automatically run the service functions. When you bind a trigger to a function, that function will execute when the trigger fires. The trigger is defined by a given time pattern or by an event in the Dataloop system.

An event-based trigger is related to a combination of resource and action. A resource can be any entity in our system (item, dataset, annotation, etc.), and the associated action defines the change in the resource that will fire the trigger (update, create, delete). You can only have one resource per trigger.

The resource object that triggered the function will be passed as the function’s parameter (input).

Let’s set a trigger in the event a new item is created:

filters = dl.Filters()
filters.add(field='datasetId', values=dataset.id)
trigger = service.triggers.create(name='image-processing2',
                                  function_name='clahe_equalization',
                                  execution_mode=dl.TriggerExecutionMode.ONCE,
                                  resource=dl.TriggerResource.ITEM,
                                  actions=dl.TriggerAction.CREATED,
                                  filters=filters)

In the defined filters we specified a dataset. Once a new item is uploaded (created) in this dataset, the CLAHE function will be executed for this item. You can also add filters to specify the item type (image, video, JSON, directory, etc.) or a certain format (jpeg, jpg, WebM, etc.).
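For example, a minimal sketch of narrowing the trigger filter to image items only (the mimetype value is illustrative):

image_filters = dl.Filters()
image_filters.add(field='datasetId', values=dataset.id)
# match only image items (jpeg, png, etc.)
image_filters.add(field='metadata.system.mimetype', values='image*')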

A separate trigger must be set for each function in your service. Now we will define a trigger for the second function in the module, rgb2gray. Each time an item is updated, the rgb2gray function will be invoked:

trigger = service.triggers.create(name='image-processing-rgb',
                                  function_name='rgb2gray',
                                  execution_mode=dl.TriggerExecutionMode.ALWAYS,
                                  resource=dl.TriggerResource.ITEM,
                                  actions=dl.TriggerAction.UPDATED,
                                  filters=filters)

To trigger the function only once (only on the first item update), set TriggerExecutionMode.ONCE instead of TriggerExecutionMode.ALWAYS.
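For example, a minimal variant of the trigger above (with a hypothetical name) that runs rgb2gray only on the first update of each matching item:

trigger = service.triggers.create(name='image-processing-rgb-once',
                                  function_name='rgb2gray',
                                  execution_mode=dl.TriggerExecutionMode.ONCE,
                                  resource=dl.TriggerResource.ITEM,
                                  actions=dl.TriggerAction.UPDATED,
                                  filters=filters)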

Execute the function

Now we can upload (“create”) an image to our dataset to trigger the service. The function clahe_equalization will be invoked:

item = dataset.items.upload(
    local_path=['https://raw.githubusercontent.com/dataloop-ai/tiny_coco/master/images/train2017/000000463730.jpg'])

To see the original item, open the source URL from the snippet above in your browser.

Review the function’s logs

You can review the execution log history to check that your execution succeeded:

service.log()

The transformed image will be saved in your dataset. Once you see in the log that the execution succeeded, you may open the item to see its transformation:

item.open_in_web()
Pause the service

We recommend pausing the service you created for this tutorial so it will not be triggered:

service.pause()

Congratulations! You have successfully created, deployed, and tested Dataloop functions!

Task Workflows

Tutorials for workforce management

Tasks and Assignment

Getting started with Task and Assignments.

Create Annotation Task

Getting started with Annotation Tasks.

Create a Task

Creating a Task with Assignments

There are several ways to create a task with assignments.

1. By Folder Directory

This example will create a task for the items in a specific folder. The items will be divided equally between the annotators’ assignments:

import dtlpy as dl
import datetime
if dl.token_expired():
    dl.login()
project = dl.projects.get(project_name='<project_name>')
dataset = project.datasets.get(dataset_name='<dataset_name>')
filters = dl.Filters(field='dir', values='/my/folder/directory')  # filter by directory
task = dataset.tasks.create(
    task_name='<task_name>',
    due_date=datetime.datetime(day=1, month=1, year=2029).timestamp(),
    assignee_ids=['<annotator1@dataloop.ai>', '<annotator2@dataloop.ai>'],
    # The items will be divided equally between assignments
    filters=filters  # filter by folder directory or use other filters
)
2. By Filters

This example will create a task for items that match a filter. The items will be divided equally between the annotators’ assignments:

Note
These examples are for creating a task from items without annotations.
You can also create tasks based on different filters, learn all about filters here.
import dtlpy as dl
import datetime
if dl.token_expired():
    dl.login()
project = dl.projects.get(project_name='<project_name>')
dataset = project.datasets.get(dataset_name='<dataset_name>')
# filter items without annotations
filters = dl.Filters(field='annotated', values=False)
task = dataset.tasks.create(
    task_name='<task_name>',
    due_date=datetime.datetime(day=1, month=1, year=2029).timestamp(),
    assignee_ids=['<annotator1@dataloop.ai>', '<annotator2@dataloop.ai>'],
    # The items will be divided equally between assignments
    filters=filters  # filter items without annotations or use other filters
)
3. List of Items

Create a task from a list of items. The items will be divided equally between the annotators’ assignments:

import dtlpy as dl
import datetime
if dl.token_expired():
    dl.login()
project = dl.projects.get(project_name='<project_name>')
dataset = project.datasets.get(dataset_name='<dataset_name>')
items = dataset.items.list()
items_list = [item for item in items.all()]
task = dataset.tasks.create(
    task_name='<task_name>',
    due_date=datetime.datetime(day=1, month=1, year=2029).timestamp(),
    assignee_ids=['<annotator1@dataloop.ai>', '<annotator2@dataloop.ai>'],
    # The items will be divided equally between assignments
    items=items_list
)
4. Full Dataset

Create a task from all of the items in the dataset. The items will be divided equally between the annotators’ assignments:

import dtlpy as dl
import datetime
if dl.token_expired():
    dl.login()
project = dl.projects.get(project_name='<project_name>')
dataset = project.datasets.get(dataset_name='<dataset_name>')
task = dataset.tasks.create(
    task_name='<task_name>',
    due_date=datetime.datetime(day=1, month=1, year=2029).timestamp(),
    assignee_ids=['<annotator1@dataloop.ai>', '<annotator2@dataloop.ai>']
    # The items will be divided equally between assignments
)
Add items to an existing task

Adding items to an existing task will create new assignments (for new assignees).

1. By Filters
import dtlpy as dl
import datetime
if dl.token_expired():
    dl.login()
project = dl.projects.get(project_name='<project_name>')
dataset = project.datasets.get(dataset_name='<dataset_name>')
task = dl.tasks.get(task_id='<my-task-id>')
filters = dl.Filters(field='metadata.system.refs', values=[])  # filter on unassigned items
task.add_items(
    filters=filters,  # filter on unassigned items or use other filters
    assignee_ids=['<annotator1@dataloop.ai>', '<annotator2@dataloop.ai>'])
2. Single Item
import dtlpy as dl
import datetime
if dl.token_expired():
    dl.login()
project = dl.projects.get(project_name='<project_name>')
dataset = project.datasets.get(dataset_name='<dataset_name>')
task = dl.tasks.get(task_id='<my-task-id>')
item = dataset.items.get(item_id='<my-item-id>')
task.add_items(
    items=[item],
    assignee_ids=['<annotator1@dataloop.ai>', '<annotator2@dataloop.ai>'])
3. List of Items
import dtlpy as dl
import datetime
if dl.token_expired():
    dl.login()
project = dl.projects.get(project_name='<project_name>')
dataset = project.datasets.get(dataset_name='<dataset_name>')
task = dl.tasks.get(task_id='<my-task-id>')
items = dataset.items.list()
items_list = [item for item in items.all()]
task.add_items(
    items=items_list,
    assignee_ids=['<annotator1@dataloop.ai>', '<annotator2@dataloop.ai>']
)

Create Annotation Assignment

Getting started with Annotation Assignment.

Task Assignment

Item Review

The Annotation Studio is built for real-time review, task assignment, and feedback.

Each item can be classified in 3 ways:

  • Discarded: Items that are not relevant for labeling

  • Complete (or an alternate custom status created by the task creator): Items after an annotation process

  • Approved (or an alternate custom status created by the task creator): Completed items after a QA process

Prep
import dtlpy as dl
if dl.token_expired():
    dl.login()
project = dl.projects.get(project_name='<project_name>')
dataset = project.datasets.get(dataset_name='<dataset_name>')
Single status update
# Mark single item as completed
item = dataset.items.get(item_id='<my-item-id>')
item.update_status(status=dl.ItemStatus.COMPLETED)
# In the same way you can update to another status
item.update_status(status=dl.ItemStatus.APPROVED)
item.update_status(status=dl.ItemStatus.DISCARDED)
Clear status
# Clear status for completed/approved/discarded
item.update_status(status=dl.ItemStatus.COMPLETED, clear=True)
Bulk status update
# With items list
filters = dl.Filters(field='annotated', values=True)
items = dataset.items.list(filters=filters)
dataset.items.update_status(status=dl.ItemStatus.APPROVED, items=items)
# With filters
filters = dl.Filters(field='annotated', values=True)
dataset.items.update_status(status=dl.ItemStatus.DISCARDED, filters=filters)
# With list of item ids
item_ids = ['<id1>', '<id2>', '<id3>']
dataset.items.update_status(status=dl.ItemStatus.COMPLETED, item_ids=item_ids)
Example

To mark an entire task as completed, use the following:

task = dataset.tasks.get(task_name='<my-task-name>')
dataset.items.update_status(status=dl.ItemStatus.COMPLETED, items=task.get_items())

Redistribute and Reassign

Redistribute and reassign items from tasks and assignments

Redistributing and Reassigning a Task

Get Task and Assignments
Get Task
Get by ID
task = dl.tasks.get(task_id='<my-task-id>')
Get by name – in a project
project = dl.projects.get(project_name='<project_name>')
task = project.tasks.get(task_name='<my-task-name>')
Get by name – in a dataset
dataset = project.datasets.get(dataset_name='<dataset_name>')
task = dataset.tasks.get(task_name='<my-task-name>')
Get list – in a project
tasks = project.tasks.list()
Get list – in a dataset
tasks = dataset.tasks.list()
Get Task Items
task_items = task.get_items()
Get Assignments
Get by ID
assignment = dl.assignments.get(assignment_id='<my-assignment-id>')
Get by name – in a project
project = dl.projects.get(project_name='<project_name>')
assignment = project.assignments.get(assignment_name='<my-assignment-name>')
Get by name – in a dataset
dataset = project.datasets.get(dataset_name='<dataset_name>')
assignment = dataset.assignments.get(assignment_name='<my-assignment-name>')
Get by name – in a task
task = project.tasks.get(task_name='<my-task-name>')
assignment = task.assignments.get(assignment_name='<my-assignment-name>')
Get list – in a project
assignments = project.assignments.list()
Get list – in a dataset
assignments = dataset.assignments.list()
Get list – in a task
assignments = task.assignments.list()
Get Assignment Items
assignment_items = assignment.get_items()
Redistribute and Reassign the Assignment
Prep
import dtlpy as dl
import datetime
if dl.token_expired():
    dl.login()
project = dl.projects.get(project_name='<project_name>')
dataset = project.datasets.get(dataset_name='<dataset_name>')
task = dl.tasks.get(task_id='<my-task-id>')
assignment = task.assignments.get(assignment_name='<my-assignment-name>')
Redistribute
# load is the workload percentage for each annotator
assignment.redistribute(dl.Workload([dl.WorkloadUnit(assignee_id='<annotator1@dataloop.ai>', load=50),
                                     dl.WorkloadUnit(assignee_id='<annotator2@dataloop.ai>', load=50)]))
Reassign
assignment.reassign(assignee_ids=['<annotator1@dataloop.ai>'])
Delete Task and Assignments
Delete Task
Note
Deleting a task will also delete all of its assignments.
task.delete()

QA Tasks Management

Create QA tasks and annotation-qa flows

Create QA Task

Getting started with QA Tasks.

Create a QA Task

In Dataloop there are two ways to create a QA task:

  1. You can create a QA task from the annotation task. This will collect all completed Items and create a QA Task.

  2. You can create a standalone QA task.

QA task from the annotation task
1. Prep
import dtlpy as dl
import datetime
if dl.token_expired():
    dl.login()
project = dl.projects.get(project_name='<project_name>')
dataset = project.datasets.get(dataset_name='<dataset_name>')
# Get the annotation task, you can also get a task by name or from a list
task = project.tasks.get(task_id='<my-task-id>')
2. Create a QA Task

This action will collect all completed Items and create a QA Task under the annotation task.

Note
Adding filters is optional. Learn all about filters here.
# Add filter for completed items
filters = dl.Filters()
filters.add(field='metadata.system.annotationStatus', values='completed')
# create a QA task - fill in the due date and assignees.
QAtask = dataset.tasks.create_qa_task(task=task,
                                      due_date=datetime.datetime(day=1, month=1, year=2029).timestamp(),
                                      assignee_ids=['<annotator1@dataloop.ai>', '<annotator2@dataloop.ai>'],
                                      filters=filters  # this filter is for "completed items"
                                      )
A standalone QA task
1. Prep
import dtlpy as dl
import datetime
if dl.token_expired():
    dl.login()
project = dl.projects.get(project_name='<project_name>')
dataset = project.datasets.get(dataset_name='<dataset_name>')
2. Add filter by directory
Note
Adding filters is optional. Learn all about filters here.
filters = dl.Filters(field='metadata.system.annotationStatus', values='completed')
filters.add(field='dir', values='/my/folder/directory')
3. Create a QA Task

This action will collect all items in the folder and create a QA Task from them.

QAtask = dataset.tasks.create(
    task_type='qa',
    due_date=datetime.datetime(day=1, month=1, year=2029).timestamp(),
    assignee_ids=['<annotator1@dataloop.ai>', '<annotator2@dataloop.ai>'],
    filters=filters  # filter by folder directory or use other filters
)

Create QA Assignment

Getting started with QA Assignment.

Create Note annotation on items

Note Annotation

The Annotation Studio also enables real-time dialog in the studio. The note annotation gives annotators and reviewers the option to add an issue directly to the item as an annotation.

Prep
import dtlpy as dl
if dl.token_expired():
    dl.login()
Init Note

A note is initialized with a message and top, bottom, left, and right positioning. Using the annotation definitions classes, you can create, edit, view, and upload platform annotations.

annotation_definition = dl.Note(top=10, left=10, bottom=100, right=100, label='my-label')
annotation_definition.assignee = "user@dataloop.ai"
annotation_definition.add_message("this is a message 1")
annotation_definition.add_message("this is a message 2")
Create Note Annotation
1. Get project and dataset
project = dl.projects.get(project_name='project_name')
dataset = project.datasets.get(dataset_name='dataset_name')
2. Get item from the platform
item = dataset.items.get(filepath='/your-image-file-path.jpg')
3. Create a builder instance
builder = item.annotations.builder()
4. Add a note
annotation_definition = dl.Note(top=10, left=10, bottom=100, right=100, label='my-label')
annotation_definition.assignee = "user@dataloop.ai"
annotation_definition.add_message("this is a message 1")
annotation_definition.add_message("this is a message 2")
builder.add(annotation_definition=annotation_definition)
5. Upload annotations to the item
item.annotations.upload(builder)

Annotation level QA

QA on Annotation Level

Item Annotations Review

The Annotation Studio also enables direct feedback on specific annotations. To enable real-time review, a Reviewer can open an issue on an Annotation. The Annotator (the person who annotated the issued Annotation) then receives the issue, fixes it, and sends it back for a second review. The Reviewer may approve the fix or return it as an issue.

We also support a real-time dialog on items as an annotation, go to Note Annotation to learn more.

Prep
import dtlpy as dl
if dl.token_expired():
    dl.login()
project = dl.projects.get(project_name='project_name')
dataset = project.datasets.get(dataset_name='dataset_name')
Single status update
# Mark a single annotation with an open issue
item = dataset.items.get(item_id='my-item-id')
annotation = item.annotations.get(annotation_id='your-annotation-id-number')
annotation.update_status(dl.AnnotationStatus.ISSUE)
# In the same way you can update to another status
annotation.update_status(dl.AnnotationStatus.APPROVED)
annotation.update_status(dl.AnnotationStatus.REVIEW)
annotation.update_status(dl.AnnotationStatus.CLEAR)  # Have the annotation without status
Bulk status update
# Get Task
task = project.tasks.get(task_id='my_task_id')
# Add filters for items in the task who have annotations with issues
filters = dl.Filters()
filters.add_join(field='metadata.system.status', values='issue')
items = task.get_items(filters=filters)
# Go over all of the items
for page in items:
    for item in page:
        # Add filter for annotations with issues
        filters = dl.Filters()
        filters.resource = dl.FiltersResource.ANNOTATION
        filters.add(field='metadata.system.status', values='issue')
        annotations = item.annotations.list(filters=filters)
        # For every annotation in the item that has an issue, update its status to "review"
        for annotation in annotations:
            annotation.update_status(dl.AnnotationStatus.REVIEW)

Item level QA

QA on Item Level

Item Review

The Annotation Studio is built for real-time review, task assignment, and feedback.

Each item can be classified in 3 ways:

  • Discarded: Items that are not relevant for labeling

  • Complete (or an alternate custom status created by the task creator): Items after an annotation process

  • Approved (or an alternate custom status created by the task creator): Completed items after a QA process

Prep
import dtlpy as dl
if dl.token_expired():
    dl.login()
project = dl.projects.get(project_name='<project_name>')
dataset = project.datasets.get(dataset_name='<dataset_name>')
Single status update
# Mark single item as completed
item = dataset.items.get(item_id='<my-item-id>')
item.update_status(status=dl.ItemStatus.COMPLETED)
# In the same way you can update to another status
item.update_status(status=dl.ItemStatus.APPROVED)
item.update_status(status=dl.ItemStatus.DISCARDED)
Clear status
# Clear status for completed/approved/discarded
item.update_status(status=dl.ItemStatus.COMPLETED, clear=True)
Bulk status update
# With items list
filters = dl.Filters(field='annotated', values=True)
items = dataset.items.list(filters=filters)
dataset.items.update_status(status=dl.ItemStatus.APPROVED, items=items)
# With filters
filters = dl.Filters(field='annotated', values=True)
dataset.items.update_status(status=dl.ItemStatus.DISCARDED, filters=filters)
# With list of item ids
item_ids = ['id1', 'id2', 'id3']
dataset.items.update_status(status=dl.ItemStatus.COMPLETED, item_ids=item_ids)
Example

To mark an entire task as completed, use the following:

task = dataset.tasks.get(task_name='my-task-name')
dataset.items.update_status(status=dl.ItemStatus.COMPLETED, items=task.get_items())

Redistribute and Reassign

Redistribute and reassign items from tasks and assignments

Redistributing and Reassigning a QA Task

Get QA Task and Assignments
Get Task
Get by ID
QAtask = dl.tasks.get(task_id='<my-task-id>')
Get by name – in a project
project = dl.projects.get(project_name='<project_name>')
QAtask = project.tasks.get(task_name='<my-qa-task-name>')
Get by name – in a dataset
dataset = project.datasets.get(dataset_name='<dataset_name>')
QAtask = dataset.tasks.get(task_name='<my-qa-task-name>')
Get list – in a project
tasks = project.tasks.list()
Get list – in a dataset
tasks = dataset.tasks.list()
Get Task Items
qa_task_items = QAtask.get_items()
Get Assignments
Get by ID
assignment = dl.assignments.get(assignment_id='<my-assignment-id>')
Get by name – in a project
project = dl.projects.get(project_name='<project_name>')
assignment = project.assignments.get(assignment_name='<my-assignment-name>')
Get by name – in a dataset
dataset = project.datasets.get(dataset_name='<dataset_name>')
assignment = dataset.assignments.get(assignment_name='<my-assignment-name>')
Get by name – in a task
task = project.tasks.get(task_name='<my-task-name>')
assignment = task.assignments.get(assignment_name='<my-assignment-name>')
Get list – in a project
assignments = project.assignments.list()
Get list – in a dataset
assignments = dataset.assignments.list()
Get list – in a task
assignments = task.assignments.list()
Get Assignment Items
assignment_items = assignment.get_items()
Redistribute and Reassign the QA Assignment
Prep
import dtlpy as dl
import datetime
if dl.token_expired():
    dl.login()
project = dl.projects.get(project_name='<project_name>')
dataset = project.datasets.get(dataset_name='<dataset_name>')
QAtask = dl.tasks.get(task_id='<my-task-id>')
assignment = QAtask.assignments.get(assignment_name='<my-assignment-name>')
Redistribute
# load is the workload percentage for each annotator
assignment.redistribute(dl.Workload([dl.WorkloadUnit(assignee_id='<annotator1@dataloop.ai>', load=50),
                                     dl.WorkloadUnit(assignee_id='<annotator2@dataloop.ai>', load=50)]))
Reassign
assignment.reassign(assignee_ids=['<annotator1@dataloop.ai>'])
Delete Task and Assignments
Delete Task
Note
Deleting a task will also delete all of its assignments.
QAtask.delete()
Delete Assignment
assignment.delete()

Image Annotations

Tutorials for creating all types of image annotations

Setup

Setup environment before starting

This tutorial guides you through using the Dataloop SDK to create and upload annotations to items. The tutorial includes chapters covering the different tools, and the last chapter includes various more advanced scripts:

  • Classification & Pose

  • Bounding box & Cuboid

  • Polygon & Polyline

  • Ellipse & Item-Description

  • Advanced tutorials

    • Copy Annotations Between Items

    • Show Images & Annotations

    • Show Annotations from JSON file

    • Count the Total Number of Annotations in a Dataset

    • Parenting Annotations

    • Change Annotation’s Label to a New Label

    • Append Attribute to an Existing Label

Setup

import dtlpy as dl
if dl.token_expired():
    dl.login()
project = dl.projects.get(project_name='project_name')
dataset = project.datasets.get(dataset_name='dataset_name')

Initiation

Using the annotation definitions classes you can create, edit, view and upload platform annotations. Each annotation init receives the coordinates for the specific type, label, and optional attributes.

Optional Plotting

Before updating items with annotations, you can optionally plot the annotation you created and review it before uploading it. This applies to all annotations described in the following section.

import matplotlib.pyplot as plt
plt.figure()
plt.imshow(builder.show())
for annotation in builder:
    plt.figure()
    plt.imshow(annotation.show())
    plt.title(annotation.label)

Classification, Point and Pose

Classification, Point and Pose annotations types

Classification

Classify a single item

# Get item from the platform
item = dataset.items.get(filepath='/your-image-file-path.jpg')
# Create a builder instance
builder = item.annotations.builder()
# Classify
builder.add(annotation_definition=dl.Classification(label='my-label'))
# Upload classification to the item
item.annotations.upload(builder)
Classify Multiple Items

Classifying multiple items requires using an Items entity with a filter.

# multiple items classification using filters
...
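A minimal sketch, assuming the items to classify sit under a hypothetical folder '/my/folder':

# classify every item in a folder
filters = dl.Filters(field='dir', values='/my/folder')
pages = dataset.items.list(filters=filters)
for page in pages:
    for item in page:
        builder = item.annotations.builder()
        builder.add(annotation_definition=dl.Classification(label='my-label'))
        item.annotations.upload(builder)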
Create a Point Annotation
# Get item from the platform
item = dataset.items.get(filepath='/your-image-file-path.jpg')
# Create a builder instance
builder = item.annotations.builder()
# Create point annotation with label and attribute
builder.add(annotation_definition=dl.Point(x=100,
                                           y=100,
                                           label='my-label',
                                           attributes={'color': 'red'}))
# Upload point to the item
item.annotations.upload(builder)
Pose Annotation
# Pose annotation is based on a pose template. Create the pose template in the platform UI and use it in the script by its ID
recipe = dataset.recipes.list()[0]  # get the recipe that holds the template
template_id = recipe.get_annotation_template_id(template_name="my_template_name")
# Get item
item = dataset.items.get(filepath='/your-image-file-path.jpg')
# Define the Pose parent annotation and upload it to the item
parent_annotation = item.annotations.upload(
    dl.Annotation.new(annotation_definition=dl.Pose(label='my_parent_label',
                                                    template_id=template_id,
                                                    # instance_id is optional
                                                    instance_id=None)))[0]
# Add child points
builder = item.annotations.builder()
builder.add(annotation_definition=dl.Point(x=100,
                                           y=100,
                                           label='my_point_label'),
            parent_id=parent_annotation.id)
builder.upload()

Bounding Box and Cuboid

Bounding Box and Cuboid annotations types

Create Box Annotation

# Get item from the platform
item = dataset.items.get(filepath='/your-image-file-path.jpg')
# Create a builder instance
builder = item.annotations.builder()
# Create box annotation with label
builder.add(annotation_definition=dl.Box(top=10,
                                         left=10,
                                         bottom=100,
                                         right=100,
                                         label='my-label'))
# Upload box to the item
item.annotations.upload(builder)

Create a Rotated Bounding Box Annotation

A rotated box is created by setting its top-left and bottom-right coordinates, and providing its rotation angle.

# Get item from the platform
item = dataset.items.get(filepath='/your-image-file-path.jpg')
# Create a builder instance
builder = item.annotations.builder()
# Create a rotated box annotation with label and angle
builder.add(annotation_definition=dl.Box(top=10,
                                         left=10,
                                         bottom=100,
                                         right=100,
                                         angle=80,
                                         label='my-label'))
# Upload box to the item
item.annotations.upload(builder)

Convert Semantic Segmentation to Bounding Box

Convert all semantic segmentation annotations in an item into box annotation

annotations = item.annotations.list()
builder = item.annotations.builder()
# run over all annotation in item
for annotation in annotations:
    if annotation.type == dl.AnnotationType.SEGMENTATION:
        print("Found binary annotation - id:", annotation.id)
        builder.add(annotation_definition=annotation.annotation_definition.to_box())
item.annotations.upload(annotations=builder)

Create Cuboid (3D Box) Annotation

Create a cuboid annotation in one of two ways:

# A. Provide the front and back rectangles and the angle of the cuboid
builder.add(annotation_definition=dl.Cube.from_boxes_and_angle(label="label",
                                                               front_top=100,
                                                               front_left=100,
                                                               front_right=300,
                                                               front_bottom=300,
                                                               back_top=200,
                                                               back_left=200,
                                                               back_right=400,
                                                               back_bottom=400,
                                                               angle=0
                                                               ))
# B. Provide all 8 points of the cuboid
builder.add(annotation_definition=dl.Cube(label="label",
                                          # front top left point coordinates
                                          front_tl=[200, 200],
                                          # front top right point coordinates
                                          front_tr=[500, 250],
                                          # front bottom right point coordinates
                                          front_br=[500, 550],
                                          # front bottom left point coordinates
                                          front_bl=[200, 500],
                                          # back top left point coordinates
                                          back_tl=[300, 300],
                                          # back top right point coordinates
                                          back_tr=[600, 350],
                                          # back bottom right point coordinates
                                          back_br=[600, 650],
                                          # back bottom left point coordinates
                                          back_bl=[300, 600]
                                          ))
item.annotations.upload(builder)

Polygon and Polyline

Polygon and Polyline annotations types

Create Single Polygon/Polyline Annotation

# Get item from the platform
item = dataset.items.get(filepath='/your-image-file-path.jpg')
# Create a builder instance
builder = item.annotations.builder()
# Create polygon annotation with label
# with array of points: [[x1, y1], [x2, y2], ..., [xn, yn]]
builder.add(annotation_definition=dl.Polygon(geo=[[100, 50],
                                                  [80, 120],
                                                  [110, 130]],
                                             label='my-label'))
# create Polyline annotation with label
builder.add(annotation_definition=dl.Polyline(geo=[[100, 50],
                                                   [80, 120],
                                                   [110, 130]],
                                              label='my-label'))
# Upload polygon to the item
item.annotations.upload(builder)

Create Multiple Polygons from Mask

annotations = item.annotations.list()
mask_annotation = annotations[0]
builder = item.annotations.builder()
builder.add(dl.Polygon.from_segmentation(mask_annotation.geo,
                                         max_instances=2,
                                         label=mask_annotation.label))
item.annotations.upload(builder)

Convert Mask Annotations to Polygon

More about the from_segmentation() function can be found in the SDK reference.

annotations = item.annotations.list()
builder = item.annotations.builder()
# run over all annotation in item
for annotation in annotations:
    if annotation.type == dl.AnnotationType.SEGMENTATION:
        print("Found binary annotation - id:", annotation.id)
        builder.add(dl.Polygon.from_segmentation(mask=annotation.annotation_definition.geo,
                                                 # binary mask of the annotation
                                                 label=annotation.label,
                                                 max_instances=None))
        annotation.delete()
item.annotations.upload(annotations=builder)

Convert Polygon Annotation to Mask

More about the from_polygon() function can be found in the SDK reference. This script uses the cv2 module (OpenCV); install it with pip install opencv-python.

from PIL import Image
annotations = item.annotations.list()
builder = item.annotations.builder()
# download the image to get its dimensions for the output mask
buffer = item.download(save_locally=False)
img = Image.open(buffer)
# run over all annotations in the item
for annotation in annotations:
    if annotation.type == dl.AnnotationType.POLYGON:
        print("Found polygon annotation - id:", annotation.id)
        builder.add(dl.Segmentation.from_polygon(geo=annotation.annotation_definition.geo,
                                                 # the polygon coordinates of the annotation
                                                 label=annotation.label,
                                                 shape=img.size[::-1]  # (h,w)
                                                 ))
        annotation.delete()
item.annotations.upload(annotations=builder)

Ellipse and Item Description

Ellipse and Item Description annotations types

Create Ellipse Annotation

# Get item from the platform
item = dataset.items.get(filepath='/your-image-file-path.jpg')
# Create a builder instance
builder = item.annotations.builder()
# Create ellipse annotation with label - With params for an ellipse; x and y for the center, rx, and ry for the radius and rotation angle:
builder.add(annotation_definition=dl.Ellipse(x=100,
                                             y=100,
                                             rx=50,
                                             ry=80,
                                             angle=30,
                                             label='my-label'))
# Upload the ellipse to the item
item.annotations.upload(builder)

Item Description

Item description is added as a “system annotation” and serves as a way to save information about the item that can be seen by anyone accessing it.

# Get item from the platform
item = dataset.items.get(filepath='/your-image-file-path.jpg')
# Add description (update if already exists)- if text is empty it will remove the description from the item
item.set_description(text="this is item description")

Advanced Tutorials

Copy, count, show and annotation parenting.

Copy Annotations Between Items

By taking the annotations entity from one item and uploading it into another, we can copy annotations between items. Running through all items that match a filter allows us to copy from one item into multiple items, for example video snapshots with the same object.

# Set the source item with the annotations we want to copy
project = dl.projects.get(project_name='second-project_name')
dataset = project.datasets.get(dataset_name='second-dataset_name')
item = dataset.items.get(item_id='first-id-number')
annotations = item.annotations.list()
# Set the target item where we want to copy to. If located on a different Project or Dataset, set these accordingly
item = dataset.items.get(item_id='second-id-number')
item.annotations.upload(annotations=annotations)
# Copy the annotation into multiple items, based on a filter entity. In this example, the filter is based on directory
filters = dl.Filters()
filters.add(field='filename', values='/fighting/**')  # take files from the directory only (recursive)
filters.add(field='type', values='file')  # only files
pages = dataset.items.list(filters=filters)
for page in pages:
    for item in page:
        # upload annotations
        item.annotations.upload(annotations=annotations)

Show Images & Annotations

This script uses the cv2 module (OpenCV); install it with pip install opencv-python.

from PIL import Image
import numpy as np
# Get item
item = dataset.items.get(item_id='write-your-id-number')
# download item as a buffer
buffer = item.download(save_locally=False)
# open image
image = Image.open(buffer)
# draw the annotations into a numpy array
annotations = item.annotations.show(width=image.size[0],
                                    height=image.size[1],
                                    thickness=3)
annotations = Image.fromarray(annotations.astype(np.uint8))
# show the annotations and the image separately
annotations.show()
image.show()
# Show the annotations with the image
image.paste(annotations, (0, 0), annotations)
image.show()

Show Annotations from JSON file (Dataloop format)

Please note that directory paths look different on Windows and Linux, and Linux paths do not require the r prefix at the beginning.

from PIL import Image
import json
import numpy as np
with open(r'C:/home/project/images/annotation.json', 'r') as f:
    data = json.load(f)
for annotation in data['annotations']:
    annotations = dl.Annotation.from_json(annotation)
    mask = annotations.show(width=640,
                            height=480,
                            thickness=3,
                            color=(255, 0, 0))
    mask = Image.fromarray(mask.astype(np.uint8))
    mask.show()

Count total number of annotations

The following script counts the number of annotations in a filter. The filter can be set to any context - Dataset, folder or any specific criteria. In the following example, it is set to a dataset.

# Create annotations filters instance
filters = dl.Filters(resource=dl.FiltersResource.ANNOTATION)
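# page_size=0 returns only the count, without fetching the annotations themselves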
filters.page_size = 0
# Count the annotations
annotations_count = dataset.annotations.list(filters=filters).items_count

Parenting Annotations

Parenting establishes a relation between two annotations, executed by setting the parent_id parameter. The Dataloop system will reject an attempt to set circular parenting. The following script demonstrates setting a parenting relation while uploading/creating annotations:

builder = item.annotations.builder()
builder.add(annotation_definition=dl.Box(top=10, left=10, bottom=100, right=100,
                                         label='my-parent-label'))
# upload parent annotation
annotations = item.annotations.upload(annotations=builder)
# create the child annotation
builder = item.annotations.builder()
builder.add(annotation_definition=dl.Box(top=10, left=10, bottom=100, right=100,
                                         label='my-child-label'),
            parent_id=annotations[0].id)
# upload annotations to item
item.annotations.upload(annotations=builder)

The following script demonstrates setting a parenting relation on existing annotations:

# create and upload parent annotation
builder = item.annotations.builder()
builder.add(annotation_definition=dl.Box(top=10, left=10, bottom=100, right=100,
                                         label='my-parent-label'))
parent_annotation = item.annotations.upload(annotations=builder)[0]
# create and upload child annotation
builder = item.annotations.builder()
builder.add(annotation_definition=dl.Box(top=10, left=10, bottom=100, right=100,
                                         label='my-child-label'))
child_annotation = item.annotations.upload(annotations=builder)[0]
# set the child parent ID to the parent
child_annotation.parent_id = parent_annotation.id
# update the annotation
child_annotation.update(system_metadata=True)

Change Annotations’ Label

The following example creates a new label in the recipe (an optional step; you can also use an existing label), then applies it to all annotations matching a certain filter.

# Create a new label
dataset.add_label(label_name='newLabel', color=(2, 43, 123))
# Filter annotations with the "oldLabel" label.
filters = dl.Filters()
filters.resource = dl.FiltersResource.ANNOTATION
filters.add(field='label', values='oldLabel')
pages = dataset.annotations.list(filters=filters)
# Change the label of the annotations - for every annotation we filtered, change its label to "newLabel"
for annotation in pages.all():
    annotation.label = 'newLabel'
    annotation.update()

Video Annotations

Tutorials for annotating videos

Video Annotations

Upload and work with video annotations

In this tutorial we create and upload annotations into a video item. Video annotations differ from image annotations, since they span over frames and need to be set with their scope. This script uses the cv2 module (OpenCV); install it with pip install opencv-python.

Setup

import dtlpy as dl
if dl.token_expired():
    dl.login()
project = dl.projects.get(project_name='project_name')
dataset = project.datasets.get(dataset_name='dataset_name')
item = dataset.items.get(filepath='/my_item.mp4')

Create A Single annotation

Create a single annotation for a video item and upload it:

annotation = dl.Annotation.new(item=item)
# Span the annotation over 100 frames. Change this or use a different approach based on your context
for i_frame in range(100):
    # go over 100 frames
    annotation.add_frame(annotation_definition=dl.Box(top=2 * i_frame,
                                                      left=2 * (i_frame + 10),
                                                      bottom=2 * (i_frame + 50),
                                                      right=2 * (i_frame + 100),
                                                      label="my-label"),
                         frame_num=i_frame,  # set the frame for the annotation
                         )
# upload to platform
annotation.upload()

Adding Multiple Annotations Using Annotation Builder

The following script demonstrates adding 10 annotations to each frame:

# create annotation builder
builder = item.annotations.builder()
for i_frame in range(100):
    # go over 100 frames
    for i_detection in range(10):
        # for each frame we have 10 different detections (location is just for the example)
        builder.add(annotation_definition=dl.Box(top=2 * i_frame,
                                                 left=2 * i_detection,
                                                 bottom=2 * i_frame + 10,
                                                 right=2 * i_detection + 100,
                                                 label="my-label"),
                    # set the frame for the annotation
                    frame_num=i_frame,
                    # need to input the element id to create the connection between frames
                    object_id=i_detection + 1,
                    )
# Upload the annotations to platform
item.annotations.upload(builder)

Read Frames of an Annotation

The following example reads all the frames an annotation exists in, i.e. the frame range the annotation spans:

for annotation in item.annotations.list():
    print(annotation.object_id)
    for key in annotation.frames:
        frame = annotation.frames[key]
        print(frame.left, frame.right, frame.top, frame.bottom)

Create Frame Snapshots from Video

One of Dataloop’s video utilities enables creating a frame snapshot from a video item every X frames (frame_interval). FFmpeg needs to be installed on your system; it can be downloaded from the official FFmpeg website.

dl.utilities.Videos.video_snapshots_generator(item=item, frame_interval=30)

Play An Item In Video Player

Play a video item with its annotations and labels with a video player

from dtlpy.utilities.videos.video_player import VideoPlayer
VideoPlayer(project_name='project_name',
            dataset_name='dataset_name',
            item_filepath='/my_item.mp4')

Show Annotations in a Specified Frame

import matplotlib.pyplot as plt
# Get from platform
annotations = item.annotations.list()
# Plot the annotations in frame 55 of the created annotations
frame_annotation = annotations.get_frame(frame_num=55)
plt.figure()
plt.imshow(frame_annotation.show())
plt.title(frame_annotation.label)
# Play video with the Dataloop video player
annotations.video_player()

Recipe and Ontology

Tutorials for managing ontologies, labels, and recipes

Concepts

What are Recipe and Ontology

The Dataloop Recipe & Ontology concepts are detailed in our documentation. In short:

  • Ontology - an entity that contains labels and attributes. An attribute is linked to a label

  • Recipe - An entity that ties an ontology with labeling instructions

    • Linked with an ontology

    • Labeling tools (e.g. box, polygon etc)

    • Optional PDF instructions

    • And more…

Ontology

Create and manage Ontology, Labels and Attributes

In this chapter we will create an ontology and populate it with labels

Preparing - Entities setup

import dtlpy as dl
if dl.token_expired():
    dl.login()
project = dl.projects.get(project_name='project_name')
dataset = project.datasets.get(dataset_name='dataset_name')
# Get recipe from list
recipe = dataset.recipes.list()[0]
# Or get specific recipe:
recipe = dataset.recipes.get(recipe_id='id')
# Get ontology from list or create it using the "Create Ontology" script
ontology = recipe.ontologies.list()[0]
# Or get specific ontology:
ontology = recipe.ontologies.get(ontology_id='id')
# Print entities:
recipe.print()
ontology.print()

Create an Ontology

project = dl.projects.get(project_name='project_name')
ontology = project.ontologies.create(title="your_created_ontology_title",
                                     labels=[dl.Label(tag="Chameleon", color=(255, 0, 0))])

Labels

The ontology uses the ‘Labels’ entity, which is a Python list object, and as such you can use Python list methods such as sort(). Be sure to call ontology.update() after each Python list action.

ontology.add_labels(label_list=['Shark', 'Whale', 'Animal.Donkey'], update_ontology=True)
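For example, a minimal sketch of applying a Python list method to the labels and persisting the change:

# sort the ontology labels alphabetically by tag, then persist the change
ontology.labels.sort(key=lambda label: label.tag)
ontology.update()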

Labels can be added with a branched hierarchy, facilitating sub-labels up to 5 levels deep. Label hierarchy is created by adding ‘.’ between parent and child labels. Continuing the above example, this script will get the Donkey label:

child_label = ontology.labels[-1].children[0]
print(child_label.tag, child_label.rgb)

Attributes

An attribute describes a label without having to add more labels. For example, “Car” is a label, but its color is an attribute. You can add multiple attributes to the ontology and map them to labels; for example, create the “color” attribute once, but have multiple labels use it. Attributes can be multiple-selection (e.g. checkbox), single-selection (radio button), a value on a slider, a yes/no question, or free text. An attribute can be set as mandatory, so annotators have to answer it before they can complete the item.

Add attributes to the ontology

The following example adds one attribute of each type, all as mandatory attributes:

  • Multiple-choice attribute

  • Single-choice attributes

  • Slider attribute

  • Yes/no question attribute

  • Free text attribute

# This option is not available yet
...

Read Ontology Attributes

Read and print all the ontology attributes:

print(ontology.metadata['attributes'])
keys = [att['key'] for att in ontology.metadata['attributes']]

Get all labels (including children):

print(ontology.labels_flat_dict)

Recipe

Create and manage Recipe and Annotations Instructions

Since a recipe is linked with an ontology, it allows making changes to labels and attributes. When the recipe is set as the default for a dataset, the same applies to the dataset entity: it can be used to change the labels and attributes that are ultimately linked to it through the recipe and its ontology.

Working With Recipes

# Get recipe from a list
recipe = dataset.recipes.list()[0]
# Get recipe by ID - ID can be retrieved from the page URL when opening the recipe in the platform
recipe = dataset.recipes.get(recipe_id='your-recipe-id')
# Delete recipe - applies only for deleted datasets
dataset.recipes.get(recipe_id='your-recipe-id').delete()

Cloning Recipes

When you want to create a new recipe that’s only slightly different from an existing recipe, it can be easier to start by cloning the original recipe and then making changes to its clone. The shallow parameter controls how ontologies are handled: if True, the clone links to the existing ontology; if False, all ontologies linked to the recipe are cloned as well.

dataset = project.datasets.get(dataset_name="myDataSet")
recipe = dataset.recipes.get(recipe_id="recipe_id")
recipe2 = recipe.clone(shallow=False)
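Conversely, a shallow clone keeps linking to the existing ontology instead of cloning it:

# shallow clone: the new recipe links to the same ontology
recipe3 = recipe.clone(shallow=True)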

View Dataset Labels

# as objects
labels = dataset.labels
# as instance map
labels = dataset.instance_map

Add Labels by Dataset

Working with dataset labels can be done one-by-one or as a list. The Dataset entity documentation details all label options.

# Add multiple labels
dataset.add_labels(label_list=['person', 'animal', 'object'])
# Add single label with specific color and attributes
dataset.add_label(label_name='person', color=(34, 6, 231))
# Add single label with a thumbnail/icon
dataset.add_label(label_name='person', icon_path='/home/project/images/icon.jpg')

Add Labels Using Label Object

# Create Labels list using Label object
labels = [
    dl.Label(tag='Donkey', color=(255, 100, 0)),
    dl.Label(tag='Mammoth', color=(34, 56, 7)),
    dl.Label(tag='Bird', color=(100, 14, 150))
]
# Add Labels to Dataset
dataset.add_labels(label_list=labels)
# or you can also create a recipe from the label list
recipe = dataset.recipes.create(recipe_name='My-Recipe-name', labels=labels)

Add a Label and Sub-Labels

label = dl.Label(tag='Fish',
                 color=(34, 6, 231),
                 children=[dl.Label(tag='Shark',
                                    color=(34, 6, 231)),
                           dl.Label(tag='Salmon',
                                    color=(34, 6, 231))]
                 )
dataset.add_labels(label_list=label)
# or you can also create a recipe from the label
recipe = dataset.recipes.create(recipe_name='My-Recipe-name', labels=[label])

Add Hierarchical Labels with Nesting

There are different options for hierarchical label creation:

# Option A
# add parent label
labels = dataset.add_label(label_name="animal", color=(123, 134, 64))
# add child label
labels = dataset.add_label(label_name="animal.Dog", color=(45, 34, 164))
# add grandchild label
labels = dataset.add_label(label_name="animal.Dog.poodle")
# Option B: only if you don't have attributes
# parent and grandparent (animal and dog) will be generated automatically
labels = dataset.add_label(label_name="animal.Dog.poodle")
# Option C: with the Big Dict
nested_labels = [
    {'label_name': 'animal.Dog',
     'color': '#220605',
     'children': [{'label_name': 'poodle',
                   'color': '#298345'},
                  {'label_name': 'labrador',
                   'color': '#298651'}]},
    {'label_name': 'animal.cat',
     'color': '#287605',
     'children': [{'label_name': 'Persian',
                   'color': '#298345'},
                  {'label_name': 'Balinese',
                   'color': '#298651'}]}
]
# Add Labels to the dataset:
labels = dataset.add_labels(label_list=nested_labels)

Delete Labels by Dataset

dataset.delete_labels(label_names=['Cat', 'Dog'])

Update Label Features

# Update an existing label; fails if the label does not exist
dataset.update_label(label_name='Cat', color="#000080")
# Update a label; if it does not exist, add it
dataset.update_label(label_name='Cat', color="#fcba03", upsert=True)

Model Management

Tutorials for creating and managing models and snapshots

Introduction

Getting started with Model Management.

Model Management

Introduction

Dataloop’s Model Management provides Machine Learning engineers the ability to manage their research and production processes.

It introduces Dataloop entities to create, manage, view, compare, restore, and deploy training sessions.

Our Model Management separates the model code, the weights and configuration, and the data.

In Offline mode, there is no need for any code integration with Dataloop - just create Model and Snapshot entities and you can start managing your work on the platform and create reproducible training:

  • same configurations and dataset to reproduce the training

  • view project/org models and snapshots in the platform

  • view training metrics and results

  • compare experiments

NOTE: In Offline mode, functions from the codebase can be used in FaaS and pipelines only as custom functions. Users must create a FaaS and expose those functions however they’d like.

Online Mode: In Online mode, you can easily train and deploy your models anywhere on the platform. All you need to do is create a Model Adapter class that exposes some functions, building an API between Dataloop and your model. After that, you can easily add model blocks to pipelines, add UI slots in the studio, enable one-button training, etc.

Model and Snapshot entities

Model

The Model entity is essentially the algorithm - the architecture of the model, e.g. YOLOv5, Inception, SVM, etc.

  • In Online mode, it should contain the Model Adapter to create a Dataloop API

Snapshot

Using the Model (architecture), Dataset and Ontology (data and labels), and a configuration (a dictionary), we can create a Snapshot of a training process. The Snapshot contains the weights and any other artifacts needed to load the trained model.

A snapshot can be used as the parent of another snapshot - to continue from that point (fine-tuning and transfer learning).
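
A minimal sketch of browsing a model’s snapshots (assuming snapshots expose a name attribute; the snapshot name below is a placeholder):

# List a model's snapshots and fetch one by name
for snapshot in model.snapshots.list():
    print(snapshot.name)
snapshot = model.snapshots.get(snapshot_name='tutorial-snapshot')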

Buckets and Codebase

A Model’s code (Codebase) and a Snapshot’s artifacts (Bucket) can be stored in any of the following locations (a short sketch follows the list):

  1. local

  2. item

  3. git

  4. GCS
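
A sketch of a few of these options, using only calls that appear later in this tutorial (paths and URLs are placeholders):

# Item codebase: pack a local directory and store it as an item on the platform
item_codebase = project.codebases.pack(directory='/path/to/codebase')
# Git codebase: point at a repository URL and tag
git_codebase = dl.GitCodebase(git_url='github.com/mygit', git_tag='v25.6.93')
# Item bucket: platform-hosted storage for snapshot weights
item_bucket = dl.buckets.create(dl.BucketType.ITEM)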

The Model Adapter

The Model Adapter is a Python class that creates a single API between Dataloop’s platform and your model. It covers:

  1. Train

  2. Predict

  3. Load/save model weights

  4. Annotation conversion (if needed)

We enable two modes of work. In Offline mode, everything is local: you don’t have to upload any model code or weights to the platform, so the platform integration is minimal. For example, you cannot use the Model Management components in a pipeline, cannot easily create a button interface for your model’s inference, and more. In Online mode, once you build an Adapter, our platform can interact with your model and trained snapshots: you can connect buttons and slots inside the platform to create, train, run inference, etc., and connect the model and any trained snapshot to the UI or add them to a pipeline.

Create a Model and Snapshot

Create a Model with a Dataloop Model Adapter

Create Your own Model and Snapshot

We will create a dummy model adapter in order to build our Model and Snapshot entities.

NOTE: This is an example of a torch model adapter. It will NOT run as-is. For working examples, please refer to our models on GitHub.

The following class inherits from dl.BaseModelAdapter, which has all the Dataloop methods for interacting with the Model and Snapshot. There are four model-related methods that the creator must implement for the adapter to expose the Dataloop API:

import dtlpy as dl
import torch
import os
class SimpleModelAdapter(dl.BaseModelAdapter):
    def load(self, local_path, **kwargs):
        print('loading a model')
        self.model = torch.load(os.path.join(local_path, 'model.pth'))
    def save(self, local_path, **kwargs):
        print('saving a model to {}'.format(local_path))
        torch.save(self.model, os.path.join(local_path, 'model.pth'))
    def train(self, data_path, output_path, **kwargs):
        print('running a training session')
    def predict(self, batch, **kwargs):
        print('predicting batch of size: {}'.format(len(batch)))
        preds = self.model(batch)
        return preds

Now we can create our Model entity with an Item codebase.

project = dl.projects.get('MyProject')
codebase: dl.ItemCodebase = project.codebases.pack(directory='/path/to/codebase')
model = project.models.create(model_name='first-model',
                              description='Example from model creation tutorial',
                              output_type=dl.AnnotationType.CLASSIFICATION,
                              tags=['torch', 'inception', 'classification'],
                              codebase=codebase,
                              entry_point='dataloop_adapter.py',
                              )

To create a Model with a Git codebase, simply change the codebase to a Git one:

project = dl.projects.get('MyProject')
codebase: dl.GitCodebase = dl.GitCodebase(git_url='github.com/mygit', git_tag='v25.6.93')
model = project.models.create(model_name='first-git-model',
                              description='Example from model creation tutorial',
                              output_type=dl.AnnotationType.CLASSIFICATION,
                              tags=['torch', 'inception', 'classification'],
                              codebase=codebase,
                              entry_point='dataloop_adapter.py',
                              )

Creating a snapshot whose weights are stored in an Item bucket:

bucket = dl.buckets.create(dl.BucketType.ITEM)
bucket.upload('/path/to/weights')
snapshot = model.snapshots.create(snapshot_name='tutorial-snapshot',
                                  description='first snapshot we uploaded',
                                  tags=['pretrained', 'tutorial'],
                                  dataset_id=None,
                                  configuration={'weights_filename': 'model.pth'
                                                 },
                                  project_id=model.project.id,
                                  bucket=bucket,
                                  labels=['car', 'fish', 'pizza']
                                  )

Building the model adapter and calling one of the adapter’s methods:

adapter = model.build()
adapter.load_from_snapshot(snapshot=snapshot)
adapter.train()
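
Once loaded, the adapter can run inference on platform items. A hedged sketch - predict_items and its with_upload flag are assumptions about dl.BaseModelAdapter (check your dtlpy version), and the item ID is a placeholder:

# Predict on a platform item and upload the results as annotations
item = dl.items.get(item_id='my-item-id')
annotations = adapter.predict_items(items=[item], with_upload=True)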

Using Dataloop’s Dataset Generator

Use the SDK and the Dataset Tools to iterate, augment and serve the data to your model

Dataloop Dataloader

A dl.Dataset image and annotation generator for training and for item visualization.

We can visualize the data with augmentation for debugging and exploration. After that, we will use the Data Generator as an input to the training functions.

# matplotlib notebook
import matplotlib.pyplot as plt
import logging
from dtlpy.utilities import DatasetGenerator
import dtlpy as dl
logging.basicConfig(level='INFO')
dataset = dl.datasets.get(dataset_id='611b86e647fe2f865323007a')
datagen = DatasetGenerator(data_path='train',
                           dataset_entity=dataset,
                           annotation_type=dl.AnnotationType.BOX)

Object Detection Examples

We can visualize a random item from the dataset:

for i in range(5):
    datagen.visualize()

Or get the same item using its index:

for i in range(5):
    datagen.visualize(10)

Adding augmentations using the imgaug package:

from imgaug import augmenters as iaa
import numpy as np
augmentation = iaa.Sequential([
    iaa.Resize({"height": 256, "width": 256}),
    # iaa.Superpixels(p_replace=(0, 0.5), n_segments=(10, 50)),
    iaa.flip.Fliplr(p=0.5),
    iaa.flip.Flipud(p=0.5),
    iaa.GaussianBlur(sigma=(0.0, 0.8)),
])
tfs = [
    augmentation,
    np.copy,
    # transforms.ToTensor()
]
datagen = DatasetGenerator(data_path='train',
                           dataset_entity=dataset,
                           annotation_type=dl.AnnotationType.BOX,
                           transforms=tfs)
datagen.visualize()
datagen.visualize(10)

All of the Data Generator options (from the function docstring):

  • dataset_entity: dl.Dataset entity

  • annotation_type: dl.AnnotationType - type of annotation to load from the annotated dataset

  • filters: dl.Filters - filtering entity to filter the dataset items

  • data_path: path to Dataloop annotations (root to “item” and “json”)

  • overwrite:

  • label_to_id_map: dict - {label_string: id} dictionary

  • transforms: optional transform to be applied on a sample. list or torchvision.Transform

  • num_workers:

  • shuffle: whether to shuffle the data (default: True). If set to False, sorts the data in alphanumeric order

  • seed: optional random seed for shuffling and transformations

  • to_categorical: convert label id to categorical format

  • class_balancing: if True - performs random over-sampling with class ids as the target to balance the training data

  • return_originals: bool - if True, return ALSO images and annotations before transformations (for debug)

  • ignore_empty: bool - if True, the generator will NOT collect items without annotations
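
For example, a sketch of the filters option, assuming the dataset keeps its training items under a /train remote folder:

# Serve only the items in the dataset's /train folder
filters = dl.Filters(field='dir', values='/train')
datagen = DatasetGenerator(data_path='train',
                           dataset_entity=dataset,
                           filters=filters,
                           annotation_type=dl.AnnotationType.BOX)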

The output of a single element is a dictionary holding all the relevant information. The keys for the DataGen above are: ['image_filepath', 'item_id', 'box', 'class', 'labels', 'annotation_filepath', 'image', 'annotations', 'orig_image', 'orig_annotations']

print(list(datagen[0].keys()))

We’ll add the return_originals flag to better understand what the augmentations look like. Let’s set the flag so we can plot before and after:

import matplotlib.pyplot as plt
datagen = DatasetGenerator(data_path='train',
                           dataset_entity=dataset,
                           annotation_type=dl.AnnotationType.BOX,
                           return_originals=True,
                           shuffle=False,
                           transforms=tfs)
fig, ax = plt.subplots(2, 2)
for i in range(2):
    item_element = datagen[np.random.randint(len(datagen))]
    ax[i, 0].imshow(item_element['image'])
    ax[i, 0].set_title('After Augmentations')
    ax[i, 1].imshow(item_element['orig_image'])
    ax[i, 1].set_title('Before Augmentations')

Segmentation Examples

First we’ll load a semantic dataset and view some images and the output structure:

dataset = dl.datasets.get(dataset_id='6197985a104eb81cb728e4ac')
datagen = DatasetGenerator(data_path='semantic',
                           dataset_entity=dataset,
                           transforms=tfs,
                           return_originals=True,
                           annotation_type=dl.AnnotationType.SEGMENTATION)
for i in range(5):
    datagen.visualize()

Visualize original vs augmented image and annotations mask:

fig, ax = plt.subplots(2, 4)
for i in range(2):
    item_element = datagen[np.random.randint(len(datagen))]
    ax[i, 0].imshow(item_element['orig_image'])
    ax[i, 0].set_title('Original Image')
    ax[i, 1].imshow(item_element['orig_annotations'])
    ax[i, 1].set_title('Original Annotations')
    ax[i, 2].imshow(item_element['image'])
    ax[i, 2].set_title('Augmented Image')
    ax[i, 3].imshow(item_element['annotations'])
    ax[i, 3].set_title('Augmented Annotations')

Converting to a 3D one-hot encoding to visualize the binary mask per label. We will plot only the first 8 labels (there might be more in the item):

item_element = datagen[np.random.randint(len(datagen))]
annotations = item_element['annotations']
unique_labels = np.unique(annotations)
one_hot_annotations = np.arange(len(datagen.id_to_label_map)) == annotations[..., None]
print('unique label indices in the item: {}'.format(unique_labels))
print('unique labels in the item: {}'.format([datagen.id_to_label_map[i] for i in unique_labels]))
plt.figure()
plt.imshow(item_element['image'])
fig = plt.figure()
for i_label_ind, label_ind in enumerate(unique_labels[:8]):
    ax = fig.add_subplot(2, 4, i_label_ind + 1)
    ax.imshow(one_hot_annotations[:, :, label_ind])
    ax.set_title(datagen.id_to_label_map[label_ind])

Setting a Label Map

One of the inputs to the DatasetGenerator is ‘label_to_id_map’. This variable can be used to change the label mapping for the annotations, allowing the dataset ontology to be used in a greater variety of cases. For example, you can map multiple labels to a single id, or add a default value for all the unlabeled pixels in segmentation annotations. This is what the annotation looks like without any mapping:

# project = dl.projects.get(project_name='Semantic')
# dataset = project.datasets.get(dataset_name='Hamster')
# dataset.items.upload(local_path='assets/images/hamster.jpg',
#                      local_annotations_path='assets/images/hamster.json')
dataset = dl.datasets.get(dataset_id='621ddc855c2a3d151451ec58')
datagen = DatasetGenerator(data_path='semantic',
                           dataset_entity=dataset,
                           return_originals=True,
                           overwrite=True,
                           annotation_type=dl.AnnotationType.SEGMENTATION)
datagen.visualize()
data_item = datagen[0]
plt.imshow(data_item['annotations'])
print('BG value: {}'.format(data_item['annotations'][0, 0]))

Now, we’ll map both the ‘eye’ label and the background to 2 and the ‘fur’ to 1:

label_to_id_map = {'fur': 1,
                   'eye': 2,
                   '$default': 2}
datagen = DatasetGenerator(data_path='semantic',
                           dataset_entity=dataset,
                           return_originals=True,
                           overwrite=True,
                           label_to_id_map=label_to_id_map,
                           annotation_type=dl.AnnotationType.SEGMENTATION)
datagen.visualize()
data_item = datagen[0]
plt.imshow(data_item['annotations'])
print('BG value: {}'.format(data_item['annotations'][0, 0]))

batch_size and collate_fn

If batch_size is not None, the returned structure will be a list of batch_size data items. Setting a collate function will convert the returned structure into a tensor of the desired kind. The default collate converts everything to ndarrays. We also have tensorflow and torch collates to convert to the corresponding tensors.

dataset = dl.datasets.get(dataset_id='611b86e647fe2f865323007a')
datagen = DatasetGenerator(data_path='train',
                           dataset_entity=dataset,
                           batch_size=10,
                           annotation_type=dl.AnnotationType.BOX)
batch = datagen[0]
print('type: {}, len: {}'.format(type(batch), len(batch)))
print('single element in the list: {}'.format(batch[0]['image']))
# with collate
from dtlpy.utilities.dataset_generators import collate_default
datagen = DatasetGenerator(data_path='train',
                           dataset_entity=dataset,
                           collate_fn=collate_default,
                           batch_size=10,
                           annotation_type=dl.AnnotationType.BOX)
batch = datagen[0]
print('type: {}, len: {}, shape: {}'.format(type(batch['images']), len(batch['images']), batch['images'].shape))
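
A hedged sketch of the torch variant - the collate_torch name is an assumption (the text above only says a torch collate exists), so verify it in your dtlpy version:

# Same generator, collated into torch tensors instead of ndarrays
from dtlpy.utilities.dataset_generators import collate_torch
datagen = DatasetGenerator(data_path='train',
                           dataset_entity=dataset,
                           collate_fn=collate_torch,
                           batch_size=10,
                           annotation_type=dl.AnnotationType.BOX)
batch = datagen[0]
print(type(batch['images']))  # expected: a torch tensor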