<!-- Autogenerated by `scripts/make_examples.py` -->
<table align="left">
    <td>
        <a target="_blank" href="https://colab.research.google.com/github/voxel51/fiftyone-examples/blob/master/examples/wildme_conservation_datasets.ipynb">
            <img src="https://user-images.githubusercontent.com/25985824/104791629-6e618700-5769-11eb-857f-d176b37d2496.png" height="32" width="32">
            Try in Google Colab
        </a>
    </td>
    <td>
        <a target="_blank" href="https://nbviewer.jupyter.org/github/voxel51/fiftyone-examples/blob/master/examples/wildme_conservation_datasets.ipynb">
            <img src="https://user-images.githubusercontent.com/25985824/104791634-6efa1d80-5769-11eb-8a4c-71d6cb53ccf0.png" height="32" width="32">
            Share via nbviewer
        </a>
    </td>
    <td>
        <a target="_blank" href="https://github.com/voxel51/fiftyone-examples/blob/master/examples/wildme_conservation_datasets.ipynb">
            <img src="https://user-images.githubusercontent.com/25985824/104791633-6efa1d80-5769-11eb-8ee3-4b2123fe4b66.png" height="32" width="32">
            View on GitHub
        </a>
    </td>
    <td>
        <a href="https://github.com/voxel51/fiftyone-examples/raw/master/examples/wildme_conservation_datasets.ipynb" download>
            <img src="https://user-images.githubusercontent.com/25985824/104792428-60f9cc00-576c-11eb-95a4-5709d803023a.png" height="32" width="32">
            Download notebook
        </a>
    </td>
</table>


# Load WildMe Conservation Data into FiftyOne

This notebook walks you through loading data from the WildMe collection of the [Labeled Information Library of Alexandria: Biology and Conservation (LILA BC)](https://lila.science/datasets) into FiftyOne.

First, we'll download the data. Then, we'll load the data into FiftyOne. Finally, as a bonus, we'll add an embedding visualization and similarity indexes to the data.

**Note**: You can also browse this dataset for free at [try.fiftyone.ai](https://try.fiftyone.ai/datasets/wildme/samples)!

![WildMe Thumbnail](https://user-images.githubusercontent.com/12500356/260644073-17d5612a-aca3-45bb-b221-c429f3b7c545.gif)


## Setup

To run this code, you will need to install the [FiftyOne open source library](https://github.com/voxel51/fiftyone) for dataset curation.

In [None]:
!pip install fiftyone

We will import all of the necessary modules:

In [None]:
from datetime import datetime
import json
import numpy as np 
import os

import fiftyone as fo
import fiftyone.brain as fob
from fiftyone import ViewField as F

## Downloading Data

All of the raw data is hosted in Google Cloud buckets. We will be creating one combined dataset out of three collections:

- [Leopard ID 2022](https://lila.science/datasets/leopard-id-2022/)
- [Hyena ID 2022](https://lila.science/datasets/hyena-id-2022/)
- [Beluga ID 2022](https://lila.science/datasets/beluga-id-2022/)

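Run the following cell to download and extract the archive for each collection, then rearrange the files into the layout that FiftyOne's COCO importer expects (a `data/` directory of images and a `labels.json` file):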

In [1]:
## 3 collections of images and annotations from WildMe
subsets = ["beluga", "hyena", "leopard"]

for s in subsets:
    ## Download the data
    !wget https://storage.googleapis.com/public-datasets-lila/wild-me/{s}.coco.tar.gz
    ## Unzip the data
    !gunzip {s}.coco.tar.gz
    ## Untar the data
    !tar -xvf {s}.coco.tar

    ## Move the data to the correct location for COCO import.
    ## Assumes each archive extracts to `{s}.coco/`, the directory
    ## that the loading step reads from.
    !mkdir {s}.coco/data
    !mv {s}.coco/images/train2022* {s}.coco/data/
    ## Keep one labels file per collection instead of overwriting ./labels.json
    !mv {s}.coco/annotations/instances_train2022.json {s}.coco/labels.json
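Before importing, it can help to confirm that each collection ended up in the expected layout. The following is a minimal sketch using only the standard library. The helper name `check_coco_layout` is hypothetical, and the `<subset>.coco` directory names are an assumption taken from the `dataset_dir` used in the loading step.

```python
from pathlib import Path


def check_coco_layout(dataset_dir):
    """Return a list of problems found with a COCO-style directory layout."""
    root = Path(dataset_dir)
    problems = []
    # The COCO importer reads images from `data/`...
    if not (root / "data").is_dir():
        problems.append(f"missing image directory: {root / 'data'}")
    # ...and annotations from `labels.json`
    if not (root / "labels.json").is_file():
        problems.append(f"missing annotations file: {root / 'labels.json'}")
    return problems


# Assumed directory names: `<subset>.coco`, as used by the loading step
for name in ["beluga", "hyena", "leopard"]:
    for problem in check_coco_layout(f"{name}.coco"):
        print(f"{name}: {problem}")
```

If the loop prints nothing, all three directories contain both a `data/` folder and a `labels.json` file.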

## Loading Data

Now that we have the data downloaded, we can create a FiftyOne dataset for all of it. First, let's create an empty dataset:

In [2]:
dataset = fo.Dataset("WildMe")

Then, we can loop over the three collections, import each one in COCO format, and add its samples to the main dataset. We also delete the `segmentations` sample field from each imported collection, as it is not used.

In [None]:
DATASET_TYPE = fo.types.COCODetectionDataset

for subset in subsets:
    dataset_dir = f"{subset}.coco/"

    ## Import this collection as its own temporary dataset
    subset_dataset = fo.Dataset.from_dir(
        dataset_dir=dataset_dir,
        dataset_type=DATASET_TYPE,
    )

    ## delete unused segmentations field
    subset_dataset.delete_sample_field("segmentations")
    dataset.add_samples(subset_dataset)
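One detail of the COCO import worth knowing: COCO annotations store bounding boxes as `[x, y, width, height]` in absolute pixels, while FiftyOne stores them as `[top-left-x, top-left-y, width, height]` in relative coordinates between 0 and 1. The importer handles this conversion for you. The sketch below only illustrates the arithmetic. The function name and the example numbers are made up for illustration and are not part of the FiftyOne API.

```python
def coco_to_relative_bbox(bbox, image_width, image_height):
    """Convert a COCO pixel box [x, y, w, h] to relative [0, 1] coordinates."""
    x, y, w, h = bbox
    return [
        x / image_width,   # left edge as a fraction of image width
        y / image_height,  # top edge as a fraction of image height
        w / image_width,   # box width as a fraction of image width
        h / image_height,  # box height as a fraction of image height
    ]


# Illustrative values: a 200x150 pixel box in an 800x600 image
box = coco_to_relative_bbox([100, 50, 200, 150], image_width=800, image_height=600)
print(box)
```

Relative coordinates are independent of image resolution, which is why the same box format works across the differently sized images in the three collections.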

Now, we can make the dataset persistent so it can be used in the future without having to re-download the data.

In [None]:
dataset.persistent = True

Additionally, we can use the `add_dynamic_sample_fields()` method to make all of the non-standard attributes on the dataset visible and filterable in the FiftyOne App:

In [None]:
dataset.add_dynamic_sample_fields()

In order to easily differentiate between the sub-collections in the dataset, we will save them each as their own view, and also tag samples with the sub-collection name. This will allow us to easily filter the dataset by sub-collection.

In [None]:
beluga_view = dataset.match_labels(filter = F("label") == "beluga_whale")
dataset.save_view("beluga_view", beluga_view)
beluga_view.tag_samples("beluga")

hyena_view = dataset.match_labels(filter = F("label") == "hyena")
dataset.save_view("hyena_view", hyena_view)
hyena_view.tag_samples("hyena")

leopard_view = dataset.match_labels(filter = F("label") == "leopard")
dataset.save_view("leopard_view", leopard_view)
leopard_view.tag_samples("leopard")

## Add Embeddings, Similarity, and Visualization

In order to capture visual and conceptual similarity, we will use [DreamSim](https://dreamsim-nights.github.io/). We will compute embeddings once so that we can use them for the rest of the notebook. If you would like, you can swap out DreamSim for another embedding model, such as ResNet50.

In [None]:
!pip install dreamsim

In [None]:
from dreamsim import dreamsim
from PIL import Image
model, preprocess = dreamsim(pretrained=True)

Iterate through the samples in the dataset, adding a DreamSim embedding to each:

In [None]:
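dataset.add_sample_field("dreamsim_embedding", fo.ArrayField)
for sample in dataset.iter_samples(autosave=True, progress=True):
    ## Preprocess the image and move it to the GPU (requires CUDA)
    img1 = preprocess(Image.open(sample.filepath)).to("cuda")
    ## Detach from the autograd graph before converting to a NumPy array
    sample["dreamsim_embedding"] = model.embed(img1).detach().cpu().numpy()[0]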

Now we can use these embeddings to compute an [image similarity index](https://docs.voxel51.com/user_guide/app.html#image-similarity) on the dataset:

In [None]:
fob.compute_similarity(
    dataset,
    embeddings = "dreamsim_embedding",
    brain_key = "dreamsim_sim",
)

![WildMe Image Sim](https://user-images.githubusercontent.com/12500356/260641532-e7ff0833-9d93-4d79-93f6-d9f08e4985d8.gif)
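To build intuition for what a similarity index does with these embeddings, here is a dependency-free sketch of nearest-neighbor lookup by cosine similarity. It is not how FiftyOne implements its index. The function names and the toy 3-dimensional vectors are made up for illustration, and cosine similarity is assumed as the metric.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(query_id, embeddings, k=2):
    """Return the ids of the k embeddings closest to the query embedding."""
    query = embeddings[query_id]
    scores = [
        (other_id, cosine_similarity(query, vec))
        for other_id, vec in embeddings.items()
        if other_id != query_id  # skip the query itself
    ]
    # Highest similarity first
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return [other_id for other_id, _ in scores[:k]]


# Toy embeddings standing in for real model outputs
toy_embeddings = {
    "leopard_a": [0.9, 0.1, 0.0],
    "leopard_b": [0.8, 0.2, 0.1],
    "hyena_a": [0.1, 0.9, 0.2],
    "beluga_a": [0.0, 0.1, 0.95],
}
print(most_similar("leopard_a", toy_embeddings, k=2))
```

In this toy data, the second leopard vector points in nearly the same direction as the query, so it ranks first. Real embeddings have far more dimensions, but the ranking principle is the same.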

We can also generate an embedding visualization by running UMAP on the embeddings to reduce them to two dimensions:

In [3]:
fob.compute_visualization(
    dataset,
    embeddings = "dreamsim_embedding",
    brain_key = "dreamsim_vis",
)

![WildMe Vis](https://user-images.githubusercontent.com/12500356/260641522-100cebf7-c6d2-4dbb-ac19-50d9f8aec0e0.gif)

We can also add a similarity index to the detection patches, making them searchable as well. Let's use a CLIP model so that we can search through the object detection patches with natural language queries:

In [4]:
fob.compute_similarity(
    dataset,
    patches_field = "detections",
    model = "clip-vit-base32-torch",
    brain_key = "clip_sim"
)

![WildMe Text Sim](https://user-images.githubusercontent.com/12500356/260641528-5fe9f705-8896-4051-814b-06c5e30ac9de.gif)