<!-- Autogenerated by `scripts/make_examples.py` -->
<table align="left">
    <td>
        <a target="_blank" href="https://colab.research.google.com/github/voxel51/fiftyone-examples/blob/master/examples/image_uniqueness.ipynb">
            <img src="https://user-images.githubusercontent.com/25985824/104791629-6e618700-5769-11eb-857f-d176b37d2496.png" height="32" width="32">
            Try in Google Colab
        </a>
    </td>
    <td>
        <a target="_blank" href="https://nbviewer.jupyter.org/github/voxel51/fiftyone-examples/blob/master/examples/image_uniqueness.ipynb">
            <img src="https://user-images.githubusercontent.com/25985824/104791634-6efa1d80-5769-11eb-8a4c-71d6cb53ccf0.png" height="32" width="32">
            Share via nbviewer
        </a>
    </td>
    <td>
        <a target="_blank" href="https://github.com/voxel51/fiftyone-examples/blob/master/examples/image_uniqueness.ipynb">
            <img src="https://user-images.githubusercontent.com/25985824/104791633-6efa1d80-5769-11eb-8ee3-4b2123fe4b66.png" height="32" width="32">
            View on GitHub
        </a>
    </td>
    <td>
        <a href="https://github.com/voxel51/fiftyone-examples/raw/master/examples/image_uniqueness.ipynb" download>
            <img src="https://user-images.githubusercontent.com/25985824/104792428-60f9cc00-576c-11eb-95a4-5709d803023a.png" height="32" width="32">
            Download notebook
        </a>
    </td>
</table>


# Exploring Image Uniqueness

This example provides a brief overview of using FiftyOne's [image uniqueness method](https://voxel51.com/docs/fiftyone/user_guide/brain.html#image-uniqueness) to analyze and extract insights from unlabeled datasets.

For more details, check out the in-depth [image uniqueness tutorial](https://voxel51.com/docs/fiftyone/tutorials/uniqueness.html).

## Setup

If you haven't already, install FiftyOne:


In [None]:
!pip install fiftyone

## Load dataset

We'll work with the test split of the [CIFAR-10 dataset](https://www.cs.toronto.edu/~kriz/cifar.html), which is
conveniently available in the [FiftyOne Dataset Zoo](https://voxel51.com/docs/fiftyone/user_guide/dataset_creation/zoo.html):

In [1]:
import fiftyone as fo
import fiftyone.zoo as foz

# Load the CIFAR-10 test split
# This will download the dataset from the web, if necessary
dataset = foz.load_zoo_dataset("cifar10", split="test")
dataset.name = "image-uniqueness-example"

print(dataset)

Split 'test' already downloaded
Loading existing dataset 'cifar10-test'. To reload from disk, either delete the existing dataset or provide a custom `dataset_name` to use
Name:           image-uniqueness-example
Media type:     None
Num samples:    10000
Persistent:     True
Info:           {'classes': ['airplane', 'automobile', 'bird', ...]}
Tags:           ['test']
Sample fields:
    media_type:   fiftyone.core.fields.StringField
    filepath:     fiftyone.core.fields.StringField
    tags:         fiftyone.core.fields.ListField(fiftyone.core.fields.StringField)
    metadata:     fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.metadata.Metadata)
    ground_truth: fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.labels.Classification)
    uniqueness:   fiftyone.core.fields.FloatField


## Index by visual uniqueness

Next, we'll index the dataset by visual uniqueness using a
[built-in method](https://voxel51.com/docs/fiftyone/user_guide/brain.html#image-uniqueness)
from the FiftyOne Brain:

In [2]:
import fiftyone.brain as fob

fob.compute_uniqueness(dataset)

print(dataset)

Loading uniqueness model...
Loaded default deployment config for model 'simple_resnet_cifar10'
Applied 0 setting(s) from default deployment config
Preparing data...
Generating embeddings...
 100% |████████████████████████████████████████████████████████████████████████████████████████████████████████| 10000/10000 [2.5m elapsed, 0s remaining, 56.4 samples/s]      
Computing uniqueness...
Saving results...
 100% |████████████████████████████████████████████████████████████████████████████████████████████████████████| 10000/10000 [18.3s elapsed, 0s remaining, 559.9 samples/s]      
Uniqueness computation complete
Name:           image-uniqueness-example
Media type:     None
Num samples:    10000
Persistent:     True
Info:           {'classes': ['airplane', 'automobile', 'bird', ...]}
Tags:           ['test']
Sample fields:
    media_type:   fiftyone.core.fields.StringField
    filepath:     fiftyone.core.fields.StringField
    tags:         fiftyone.core.fields.ListField(fiftyone.core.fie

Note that the dataset now has a `uniqueness` field that contains a numeric measure of the visual uniqueness of each sample:

In [4]:
# View a sample from the dataset
print(dataset.first())

<Sample: {
    'id': '5f89c1e54937ecdaa3ffa1a4',
    'media_type': 'image',
    'filepath': '/Users/Brian/fiftyone/cifar10/test/data/000001.jpg',
    'tags': BaseList(['test']),
    'metadata': None,
    'ground_truth': <Classification: {
        'id': '5f89c1e54937ecdaa3ffa1a3',
        'label': 'cat',
        'confidence': None,
        'logits': None,
    }>,
    'uniqueness': 0.4978481475892659,
}>


## Visualize near-duplicate samples in the App

Let's open the dataset in the App:

In [3]:
# View dataset in the App
session = fo.launch_app(dataset)

App launched


![uniqueness-01](https://user-images.githubusercontent.com/25985824/97113820-1adc2180-16c3-11eb-97b8-474878099522.png)

From the App, we can show the least unique (most visually similar) images in the dataset first by creating a `SortBy("uniqueness", reverse=False)` stage in the [view bar](https://voxel51.com/docs/fiftyone/user_guide/app.html#using-the-view-bar).

Alternatively, this same operation can be performed programmatically via Python:

In [5]:
# Show least unique images first
least_unique_view = dataset.sort_by("uniqueness", reverse=False)

# Open view in App
session.view = least_unique_view

![uniqueness-02](https://user-images.githubusercontent.com/25985824/97113818-1a438b00-16c3-11eb-96a7-4307d65ddc1f.png)
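
To make the ordering concrete, here is a minimal sketch, independent of FiftyOne, of what an ascending sort on uniqueness selects. The `least_unique_ids` helper and the `example_scores` values are hypothetical and exist only for illustration; in practice the scores would come from the `uniqueness` field computed above.

```python
import heapq


def least_unique_ids(scores, k):
    """Return the IDs of the k entries with the lowest uniqueness scores."""
    # nsmallest orders ascending by score, like sort_by(..., reverse=False)
    lowest = heapq.nsmallest(k, scores.items(), key=lambda item: item[1])
    return [sample_id for sample_id, _ in lowest]


# Made-up sample IDs and scores, for illustration only
example_scores = {"a": 0.91, "b": 0.12, "c": 0.50, "d": 0.08}
print(least_unique_ids(example_scores, 2))  # ['d', 'b']
```

The returned IDs are in the same least-unique-first order that the sorted view shows, and a list like this could be passed to `dataset.select()` to build a view of near-duplicate candidates.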

## Omit near-duplicate samples from the dataset

Next, we'll show how to omit visually similar samples from a dataset.

First, use the App to select visually similar samples.

![uniqueness-03](https://user-images.githubusercontent.com/25985824/97113816-19125e00-16c3-11eb-856d-8720d4bf50df.png)

Assuming the visually similar samples are currently selected in the App, we can easily add a `duplicate` tag to these samples via Python:

In [6]:
# Get currently selected images from App
dup_ids = session.selected
print(dup_ids)

# Get view containing selected samples
dups_view = dataset.select(dup_ids)

# Mark as duplicates
for sample in dups_view:
    sample.tags.append("duplicate")
    sample.save()

['5f89c1f54937ecdaa3fffb11', '5f89c1f04937ecdaa3ffde28', '5f89c1eb4937ecdaa3ffc52f', '5f89c1f84937ecdaa30010ec', '5f89c1f94937ecdaa3001458', '5f89c1f24937ecdaa3ffe959', '5f89c1ec4937ecdaa3ffcd45', '5f89c1ec4937ecdaa3ffce32', '5f89c1f04937ecdaa3ffe0da']
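
Note that re-running the cell above appends `duplicate` a second time to any sample that already carries it. One way to avoid that is to check before appending. The `tag_once` helper below is a hypothetical sketch, not part of FiftyOne; it assumes only that each sample exposes a `tags` list and a `save()` method, as the samples in the loop above do, and it is demonstrated here on stand-in objects rather than on real samples.

```python
from types import SimpleNamespace


def tag_once(samples, tag):
    """Append `tag` to each sample that lacks it; return the number changed."""
    changed = 0
    for sample in samples:
        if tag not in sample.tags:
            sample.tags.append(tag)
            sample.save()  # persist the edit, as in the loop above
            changed += 1
    return changed


# Stand-in objects (hypothetical) that mimic only `.tags` and `.save()`
fake_samples = [
    SimpleNamespace(tags=["test"], save=lambda: None),
    SimpleNamespace(tags=["test", "duplicate"], save=lambda: None),
]
print(tag_once(fake_samples, "duplicate"))  # 1: only the first sample changed
print(tag_once(fake_samples, "duplicate"))  # 0: a second run changes nothing
```

With a live session, the same helper could be called as `tag_once(dataset.select(session.selected), "duplicate")`.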


We can then use, for example, the `MatchTags("duplicate")` stage in the [view bar](https://voxel51.com/docs/fiftyone/user_guide/app.html#using-the-view-bar) to isolate the duplicate samples again.

Alternatively, this same operation can be performed programmatically via Python:

In [7]:
# Select samples with `duplicate` tag
dups_tag_view = dataset.match_tags("duplicate")

# Open view in App
session.view = dups_tag_view

![uniqueness-04](https://user-images.githubusercontent.com/25985824/97113813-16b00400-16c3-11eb-9031-a097e24ecd5a.png)

## Export de-duplicated dataset

Now let's [create a view](https://voxel51.com/docs/fiftyone/user_guide/using_views.html#filtering)
that omits samples with the `duplicate` tag, and then export the remaining samples to disk as an [image classification directory tree](https://voxel51.com/docs/fiftyone/user_guide/export_datasets.html#imageclassificationdirectorytree):

In [8]:
from fiftyone import ViewField as F

# Get samples that do not have the `duplicate` tag
no_dups_view = dataset.match(~F("tags").contains("duplicate"))

# Export dataset to disk as a classification directory tree
no_dups_view.export(
    "/tmp/fiftyone-examples/cifar10-no-dups",
    fo.types.ImageClassificationDirectoryTree
)

 100% |██████████████████████████████████████████████████████████████████████████████████████████████████████████| 9991/9991 [13.1s elapsed, 0s remaining, 779.2 samples/s]       


Let's list the contents of the exported dataset on disk to verify the export:

In [9]:
# Check the top-level directory structure
!ls -lah /tmp/fiftyone-examples/cifar10-no-dups

total 0
drwxr-xr-x    12 Brian  wheel   384B Oct 25 13:03 .
drwxr-xr-x     3 Brian  wheel    96B Oct 25 13:03 ..
drwxr-xr-x  1001 Brian  wheel    31K Oct 25 13:03 airplane
drwxr-xr-x   995 Brian  wheel    31K Oct 25 13:03 automobile
drwxr-xr-x  1002 Brian  wheel    31K Oct 25 13:03 bird
drwxr-xr-x  1002 Brian  wheel    31K Oct 25 13:03 cat
drwxr-xr-x  1002 Brian  wheel    31K Oct 25 13:03 deer
drwxr-xr-x  1002 Brian  wheel    31K Oct 25 13:03 dog
drwxr-xr-x  1002 Brian  wheel    31K Oct 25 13:03 frog
drwxr-xr-x  1001 Brian  wheel    31K Oct 25 13:03 horse
drwxr-xr-x  1002 Brian  wheel    31K Oct 25 13:03 ship
drwxr-xr-x  1002 Brian  wheel    31K Oct 25 13:03 truck


In [10]:
# View the contents of a class directory
!ls -lah /tmp/fiftyone-examples/cifar10-no-dups/airplane | head

total 7992
drwxr-xr-x  1001 Brian  wheel    31K Oct 25 13:03 .
drwxr-xr-x    12 Brian  wheel   384B Oct 25 13:03 ..
-rw-r--r--     1 Brian  wheel   1.2K Oct 25 13:03 000004.jpg
-rw-r--r--     1 Brian  wheel   1.1K Oct 25 13:03 000011.jpg
-rw-r--r--     1 Brian  wheel   1.1K Oct 25 13:03 000022.jpg
-rw-r--r--     1 Brian  wheel   1.3K Oct 25 13:03 000028.jpg
-rw-r--r--     1 Brian  wheel   1.2K Oct 25 13:03 000045.jpg
-rw-r--r--     1 Brian  wheel   1.2K Oct 25 13:03 000053.jpg
-rw-r--r--     1 Brian  wheel   1.3K Oct 25 13:03 000075.jpg
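
The same check can be done from Python, which also makes it easy to compare the total against the number of exported samples. The sketch below uses only the standard library; `count_files_per_class` is a hypothetical helper, and it assumes the export produced one subdirectory per class containing only image files, as the listings above suggest.

```python
from pathlib import Path


def count_files_per_class(export_dir):
    """Map each class subdirectory name to the number of files it contains."""
    root = Path(export_dir)
    return {
        class_dir.name: sum(1 for path in class_dir.iterdir() if path.is_file())
        for class_dir in sorted(root.iterdir())
        if class_dir.is_dir()
    }


export_dir = Path("/tmp/fiftyone-examples/cifar10-no-dups")
if export_dir.is_dir():  # skipped if the export above has not been run
    counts = count_files_per_class(export_dir)
    print(counts)
    print("total:", sum(counts.values()))
```

If the export went as shown above, the total should equal the 9991 samples reported by the export progress bar.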
