{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<!-- Autogenerated by `scripts/make_examples.py` -->\n",
    "<table align=\"left\">\n",
    "    <td>\n",
    "        <a target=\"_blank\" href=\"https://colab.research.google.com/github/voxel51/fiftyone-examples/blob/master/examples/wrangling_datasets.ipynb\">\n",
    "            <img src=\"https://user-images.githubusercontent.com/25985824/104791629-6e618700-5769-11eb-857f-d176b37d2496.png\" height=\"32\" width=\"32\">\n",
    "            Try in Google Colab\n",
    "        </a>\n",
    "    </td>\n",
    "    <td>\n",
    "        <a target=\"_blank\" href=\"https://nbviewer.jupyter.org/github/voxel51/fiftyone-examples/blob/master/examples/wrangling_datasets.ipynb\">\n",
    "            <img src=\"https://user-images.githubusercontent.com/25985824/104791634-6efa1d80-5769-11eb-8a4c-71d6cb53ccf0.png\" height=\"32\" width=\"32\">\n",
    "            Share via nbviewer\n",
    "        </a>\n",
    "    </td>\n",
    "    <td>\n",
    "        <a target=\"_blank\" href=\"https://github.com/voxel51/fiftyone-examples/blob/master/examples/wrangling_datasets.ipynb\">\n",
    "            <img src=\"https://user-images.githubusercontent.com/25985824/104791633-6efa1d80-5769-11eb-8ee3-4b2123fe4b66.png\" height=\"32\" width=\"32\">\n",
    "            View on GitHub\n",
    "        </a>\n",
    "    </td>\n",
    "    <td>\n",
    "        <a href=\"https://github.com/voxel51/fiftyone-examples/raw/master/examples/wrangling_datasets.ipynb\" download>\n",
    "            <img src=\"https://user-images.githubusercontent.com/25985824/104792428-60f9cc00-576c-11eb-95a4-5709d803023a.png\" height=\"32\" width=\"32\">\n",
    "            Download notebook\n",
    "        </a>\n",
    "    </td>\n",
    "</table>\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Wrangling Datasets\n",
    "\n",
    "This example provides a brief overview of loading datasets in common formats\n",
    "into FiftyOne, manipulating them, and then exporting them (or subsets of them)\n",
    "to disk (in possibly different formats).\n",
    "\n",
    "For more details, check out the resources below:\n",
    "\n",
    "-   [Loading data into FiftyOne](https://voxel51.com/docs/fiftyone/user_guide/dataset_creation/index.html)\n",
    "-   [Dataset basics](https://voxel51.com/docs/fiftyone/user_guide/basics.html)\n",
    "-   [Using dataset views](https://voxel51.com/docs/fiftyone/user_guide/using_views.html)\n",
    "-   [Exporting FiftyOne datasets](https://voxel51.com/docs/fiftyone/user_guide/export_datasets.html)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Setup\n",
    "\n",
    "If you haven't already, install FiftyOne:\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!pip install fiftyone"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's prepare some datasets to work with. Don't worry about the details for now."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Split 'test' already downloaded\n",
      "Loading existing dataset 'cifar10-test'. To reload from disk, either delete the existing dataset or provide a custom `dataset_name` to use\n",
      " 100% |████████████████████████████████████████████████████████████████████████████████████████████████████████████| 250/250 [494.2ms elapsed, 0s remaining, 505.9 samples/s]      \n",
      "Split 'validation' already downloaded\n",
      "Loading existing dataset 'coco-2017-validation'. To reload from disk, either delete the existing dataset or provide a custom `dataset_name` to use\n",
      " 100% |████████████████████████████████████████████████████████████████████████████████████████████████████████████| 250/250 [3.0s elapsed, 0s remaining, 90.6 samples/s]      \n"
     ]
    }
   ],
   "source": [
    "import fiftyone as fo\n",
    "import fiftyone.zoo as foz\n",
    "\n",
    "# ImageClassificationDirectoryTree\n",
    "dataset = foz.load_zoo_dataset(\"cifar10\", split=\"test\")\n",
    "dataset.take(250).export(\n",
    "    \"/tmp/fiftyone-examples/image-classification-directory-tree\",\n",
    "    fo.types.ImageClassificationDirectoryTree,\n",
    ")\n",
    "\n",
    "# CVATImageDataset\n",
    "dataset = foz.load_zoo_dataset(\"coco-2017\", split=\"validation\")\n",
    "dataset.take(250).export(\n",
    "    \"/tmp/fiftyone-examples/cvat-image-dataset\",\n",
    "    fo.types.CVATImageDataset,\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Loading data into FiftyOne\n",
    "\n",
    "FiftyOne provides support for loading [many common dataset formats](https://voxel51.com/docs/fiftyone/user_guide/dataset_creation/datasets.html#supported-formats) out-of-the-box."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Image classification directory tree\n",
    "\n",
    "You can load a classification dataset stored as a directory tree whose subfolders define the classes of the images.\n",
    "\n",
    "The relevant dataset type is [ImageClassificationDirectoryTree](https://voxel51.com/docs/fiftyone/user_guide/dataset_creation/datasets.html#imageclassificationdirectorytree):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      " 100% |████████████████████████████████████████████████████████████████████████████████████████████████████████████| 250/250 [354.0ms elapsed, 0s remaining, 706.2 samples/s]      \n",
      "Name:           2020.10.25.13.23.47\n",
      "Media type:     None\n",
      "Num samples:    250\n",
      "Persistent:     False\n",
      "Info:           {'classes': ['airplane', 'automobile', 'bird', ...]}\n",
      "Tags:           []\n",
      "Sample fields:\n",
      "    media_type:   fiftyone.core.fields.StringField\n",
      "    filepath:     fiftyone.core.fields.StringField\n",
      "    tags:         fiftyone.core.fields.ListField(fiftyone.core.fields.StringField)\n",
      "    metadata:     fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.metadata.Metadata)\n",
      "    ground_truth: fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.labels.Classification)\n"
     ]
    }
   ],
   "source": [
    "import fiftyone as fo\n",
    "\n",
    "DATASET_DIR = \"/tmp/fiftyone-examples/image-classification-directory-tree\"\n",
    "\n",
    "classification_dataset = fo.Dataset.from_dir(\n",
    "    DATASET_DIR, fo.types.ImageClassificationDirectoryTree\n",
    ")\n",
    "\n",
    "print(classification_dataset)"
   ]
  },
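  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To make the expected layout concrete, here is a minimal pure-Python sketch that walks a tree of the form `<dataset_dir>/<class_name>/<image>` and pairs each image with the name of its parent folder, which is the label that `ImageClassificationDirectoryTree` assigns. The `list_classification_tree()` helper and its file-extension filter are hypothetical illustrations, not FiftyOne APIs, and FiftyOne's own importer may handle edge cases (e.g., other image formats) differently:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "\n",
    "# Hypothetical helper (not part of FiftyOne): walks a tree laid out as\n",
    "# <dataset_dir>/<class_name>/<image_file> and returns (filepath, label)\n",
    "# pairs, where the label is the name of the image's parent folder\n",
    "def list_classification_tree(dataset_dir, exts=(\".jpg\", \".jpeg\", \".png\")):\n",
    "    pairs = []\n",
    "    for label in sorted(os.listdir(dataset_dir)):\n",
    "        class_dir = os.path.join(dataset_dir, label)\n",
    "        if not os.path.isdir(class_dir):\n",
    "            continue  # ignore stray files at the top level\n",
    "        for filename in sorted(os.listdir(class_dir)):\n",
    "            if filename.lower().endswith(exts):\n",
    "                pairs.append((os.path.join(class_dir, filename), label))\n",
    "    return pairs\n",
    "\n",
    "\n",
    "DATASET_DIR = \"/tmp/fiftyone-examples/image-classification-directory-tree\"\n",
    "\n",
    "if os.path.isdir(DATASET_DIR):\n",
    "    pairs = list_classification_tree(DATASET_DIR)\n",
    "    print(len(pairs))\n",
    "    print(pairs[:2])"
   ]
  },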
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### CVAT image dataset\n",
    "\n",
    "You can load a set of object detections stored in [CVAT image format](https://github.com/openvinotoolkit/cvat).\n",
    "\n",
    "The relevant dataset type is [CVATImageDataset](https://voxel51.com/docs/fiftyone/user_guide/dataset_creation/datasets.html#cvatimagedataset):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      " 100% |████████████████████████████████████████████████████████████████████████████████████████████████████████████| 250/250 [3.5s elapsed, 0s remaining, 72.3 samples/s]      \n",
      "Name:           2020.10.25.13.23.54\n",
      "Media type:     image\n",
      "Num samples:    250\n",
      "Persistent:     False\n",
      "Info:           {'created': '2020-10-25T13:23:43.618971', 'dumped': '2020-10-25T13:23:43.618971', 'task_labels': [{...}, {...}, {...}, ...], ...}\n",
      "Tags:           []\n",
      "Sample fields:\n",
      "    media_type:              fiftyone.core.fields.StringField\n",
      "    filepath:                fiftyone.core.fields.StringField\n",
      "    tags:                    fiftyone.core.fields.ListField(fiftyone.core.fields.StringField)\n",
      "    metadata:                fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.metadata.Metadata)\n",
      "    ground_truth_detections: fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.labels.Detections)\n"
     ]
    }
   ],
   "source": [
    "import fiftyone as fo\n",
    "\n",
    "DATASET_DIR = \"/tmp/fiftyone-examples/cvat-image-dataset\"\n",
    "\n",
    "detection_dataset = fo.Dataset.from_dir(\n",
    "    DATASET_DIR, fo.types.CVATImageDataset\n",
    ")\n",
    "\n",
    "print(detection_dataset)"
   ]
  },
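  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "CVAT's XML format stores each box by its absolute corner coordinates (`xtl`, `ytl`, `xbr`, `ybr`), whereas FiftyOne detections use relative coordinates. The sketch below parses a tiny hand-written excerpt in that layout with Python's standard `xml.etree.ElementTree` and performs the conversion. It is a simplified illustration that assumes only `<image>` and `<box>` elements; `parse_cvat_boxes()` is hypothetical and is not how `CVATImageDataset` is implemented internally:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import xml.etree.ElementTree as ET\n",
    "\n",
    "# Hand-written excerpt in the CVAT-for-images XML layout (illustrative values)\n",
    "CVAT_XML = \"\"\"\n",
    "<annotations>\n",
    "  <image id=\"0\" name=\"000001.jpg\" width=\"640\" height=\"480\">\n",
    "    <box label=\"cat\" occluded=\"0\" xtl=\"64\" ytl=\"48\" xbr=\"320\" ybr=\"288\"/>\n",
    "  </image>\n",
    "</annotations>\n",
    "\"\"\"\n",
    "\n",
    "# Hypothetical helper: converts each <box> from absolute corner coordinates to\n",
    "# a label plus a relative [top-left-x, top-left-y, width, height] box\n",
    "def parse_cvat_boxes(xml_text):\n",
    "    results = {}\n",
    "    for image in ET.fromstring(xml_text.strip()).iter(\"image\"):\n",
    "        width, height = float(image.get(\"width\")), float(image.get(\"height\"))\n",
    "        boxes = []\n",
    "        for box in image.iter(\"box\"):\n",
    "            xtl, ytl = float(box.get(\"xtl\")), float(box.get(\"ytl\"))\n",
    "            xbr, ybr = float(box.get(\"xbr\")), float(box.get(\"ybr\"))\n",
    "            rel_box = [xtl / width, ytl / height, (xbr - xtl) / width, (ybr - ytl) / height]\n",
    "            boxes.append((box.get(\"label\"), rel_box))\n",
    "        results[image.get(\"name\")] = boxes\n",
    "    return results\n",
    "\n",
    "\n",
    "print(parse_cvat_boxes(CVAT_XML))"
   ]
  },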
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Adding samples to datasets\n",
    "\n",
    "Adding new samples to datasets is easy.\n",
    "\n",
    "You can [create new samples](https://voxel51.com/docs/fiftyone/user_guide/basics.html#samples):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<Sample: {\n",
      "    'id': None,\n",
      "    'media_type': 'image',\n",
      "    'filepath': '/path/to/image.jpg',\n",
      "    'tags': [],\n",
      "    'metadata': None,\n",
      "}>\n"
     ]
    }
   ],
   "source": [
    "import fiftyone as fo\n",
    "\n",
    "sample = fo.Sample(filepath=\"/path/to/image.jpg\")\n",
    "\n",
    "print(sample)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "... [add fields dynamically](https://voxel51.com/docs/fiftyone/user_guide/basics.html#fields) to them:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<Sample: {\n",
      "    'id': None,\n",
      "    'media_type': 'image',\n",
      "    'filepath': '/path/to/image.jpg',\n",
      "    'tags': [],\n",
      "    'metadata': None,\n",
      "    'quality': 89.7,\n",
      "    'keypoints': [[31, 27], [63, 72]],\n",
      "    'geo_json': {\n",
      "        'type': 'Feature',\n",
      "        'geometry': {'type': 'Point', 'coordinates': [125.6, 10.1]},\n",
      "        'properties': {'name': 'camera'},\n",
      "    },\n",
      "}>\n"
     ]
    }
   ],
   "source": [
    "sample[\"quality\"] = 89.7\n",
    "sample[\"keypoints\"] = [[31, 27], [63, 72]]\n",
    "sample[\"geo_json\"] = {\n",
    "    \"type\": \"Feature\",\n",
    "    \"geometry\": {\"type\": \"Point\", \"coordinates\": [125.6, 10.1]},\n",
    "    \"properties\": {\"name\": \"camera\"},\n",
    "}\n",
    "\n",
    "print(sample)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "... [add labels](https://voxel51.com/docs/fiftyone/user_guide/basics.html#labels) that can be rendered on the media in the App:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<Sample: {\n",
      "    'id': None,\n",
      "    'media_type': 'image',\n",
      "    'filepath': '/path/to/image.jpg',\n",
      "    'tags': [],\n",
      "    'metadata': None,\n",
      "    'quality': 89.7,\n",
      "    'keypoints': [[31, 27], [63, 72]],\n",
      "    'geo_json': {\n",
      "        'type': 'Feature',\n",
      "        'geometry': {'type': 'Point', 'coordinates': [125.6, 10.1]},\n",
      "        'properties': {'name': 'camera'},\n",
      "    },\n",
      "    'weather': <Classification: {\n",
      "        'id': '5f95b732ad3e1adb146596d8',\n",
      "        'label': 'sunny',\n",
      "        'confidence': 0.95,\n",
      "        'logits': None,\n",
      "    }>,\n",
      "    'animals': <Detections: {\n",
      "        'detections': BaseList([\n",
      "            <Detection: {\n",
      "                'id': '5f95b732ad3e1adb146596d9',\n",
      "                'attributes': BaseDict({}),\n",
      "                'label': 'cat',\n",
      "                'bounding_box': BaseList([0.5, 0.5, 0.4, 0.3]),\n",
      "                'mask': None,\n",
      "                'confidence': 0.75,\n",
      "                'index': None,\n",
      "            }>,\n",
      "            <Detection: {\n",
      "                'id': '5f95b732ad3e1adb146596da',\n",
      "                'attributes': BaseDict({}),\n",
      "                'label': 'dog',\n",
      "                'bounding_box': BaseList([0.2, 0.2, 0.2, 0.4]),\n",
      "                'mask': None,\n",
      "                'confidence': 0.51,\n",
      "                'index': None,\n",
      "            }>,\n",
      "        ]),\n",
      "    }>,\n",
      "}>\n"
     ]
    }
   ],
   "source": [
    "sample[\"weather\"] = fo.Classification(label=\"sunny\", confidence=0.95)\n",
    "sample[\"animals\"] = fo.Detections(\n",
    "    detections=[\n",
    "        fo.Detection(\n",
    "            label=\"cat\", bounding_box=[0.5, 0.5, 0.4, 0.3], confidence=0.75\n",
    "        ),\n",
    "        fo.Detection(\n",
    "            label=\"dog\", bounding_box=[0.2, 0.2, 0.2, 0.4], confidence=0.51\n",
    "        )\n",
    "    ]\n",
    ")\n",
    "\n",
    "print(sample)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "...and add them to [datasets](https://voxel51.com/docs/fiftyone/user_guide/using_datasets.html):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Name:           2020.10.25.13.34.44\n",
      "Media type:     None\n",
      "Num samples:    0\n",
      "Persistent:     False\n",
      "Info:           {}\n",
      "Tags:           []\n",
      "Sample fields:\n",
      "    media_type: fiftyone.core.fields.StringField\n",
      "    filepath:   fiftyone.core.fields.StringField\n",
      "    tags:       fiftyone.core.fields.ListField(fiftyone.core.fields.StringField)\n",
      "    metadata:   fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.metadata.Metadata)\n"
     ]
    }
   ],
   "source": [
    "dataset = fo.Dataset()\n",
    "print(dataset)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Name:           2020.10.25.13.34.44\n",
      "Media type:     image\n",
      "Num samples:    1\n",
      "Persistent:     False\n",
      "Info:           {}\n",
      "Tags:           []\n",
      "Sample fields:\n",
      "    media_type: fiftyone.core.fields.StringField\n",
      "    filepath:   fiftyone.core.fields.StringField\n",
      "    tags:       fiftyone.core.fields.ListField(fiftyone.core.fields.StringField)\n",
      "    metadata:   fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.metadata.Metadata)\n",
      "    quality:    fiftyone.core.fields.FloatField\n",
      "    keypoints:  fiftyone.core.fields.ListField\n",
      "    geo_json:   fiftyone.core.fields.DictField\n",
      "    weather:    fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.labels.Classification)\n",
      "    animals:    fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.labels.Detections)\n"
     ]
    }
   ],
   "source": [
    "dataset.add_sample(sample)\n",
    "print(dataset)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<Sample: {\n",
      "    'id': '5f95b738ad3e1adb146596dc',\n",
      "    'media_type': 'image',\n",
      "    'filepath': '/path/to/image.jpg',\n",
      "    'tags': BaseList([]),\n",
      "    'metadata': None,\n",
      "    'quality': 89.7,\n",
      "    'keypoints': BaseList([BaseList([31, 27]), BaseList([63, 72])]),\n",
      "    'geo_json': BaseDict({\n",
      "        'type': 'Feature',\n",
      "        'geometry': BaseDict({\n",
      "            'type': 'Point',\n",
      "            'coordinates': BaseList([125.6, 10.1]),\n",
      "        }),\n",
      "        'properties': BaseDict({'name': 'camera'}),\n",
      "    }),\n",
      "    'weather': <Classification: {\n",
      "        'id': '5f95b732ad3e1adb146596d8',\n",
      "        'label': 'sunny',\n",
      "        'confidence': 0.95,\n",
      "        'logits': None,\n",
      "    }>,\n",
      "    'animals': <Detections: {\n",
      "        'detections': BaseList([\n",
      "            <Detection: {\n",
      "                'id': '5f95b732ad3e1adb146596d9',\n",
      "                'attributes': BaseDict({}),\n",
      "                'label': 'cat',\n",
      "                'bounding_box': BaseList([0.5, 0.5, 0.4, 0.3]),\n",
      "                'mask': None,\n",
      "                'confidence': 0.75,\n",
      "                'index': None,\n",
      "            }>,\n",
      "            <Detection: {\n",
      "                'id': '5f95b732ad3e1adb146596da',\n",
      "                'attributes': BaseDict({}),\n",
      "                'label': 'dog',\n",
      "                'bounding_box': BaseList([0.2, 0.2, 0.2, 0.4]),\n",
      "                'mask': None,\n",
      "                'confidence': 0.51,\n",
      "                'index': None,\n",
      "            }>,\n",
      "        ]),\n",
      "    }>,\n",
      "}>\n"
     ]
    }
   ],
   "source": [
    "print(dataset.first())"
   ]
  },
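  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `bounding_box` values used above follow FiftyOne's convention of relative `[top-left-x, top-left-y, width, height]` coordinates in `[0, 1]`. When you need pixel coordinates, scale by the image dimensions, which are available as `sample.metadata.width` and `sample.metadata.height` once metadata has been computed (e.g., via `dataset.compute_metadata()`). A minimal sketch; `to_absolute_box()` and the 640 x 480 image size are hypothetical:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Hypothetical helper (not a FiftyOne API): scales a relative\n",
    "# [top-left-x, top-left-y, width, height] box to absolute pixel values\n",
    "def to_absolute_box(bounding_box, img_width, img_height):\n",
    "    x, y, w, h = bounding_box\n",
    "    return [x * img_width, y * img_height, w * img_width, h * img_height]\n",
    "\n",
    "\n",
    "# The \"cat\" box from above, on a hypothetical 640 x 480 image\n",
    "print(to_absolute_box([0.5, 0.5, 0.4, 0.3], 640, 480))"
   ]
  },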
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Working with datasets\n",
    "\n",
    "You can access samples in datasets by iterating over them:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      " 100% |████████████████████████████████████████████████████████████████████████████████████████████████████████████| 250/250 [831.6ms elapsed, 0s remaining, 300.4 samples/s]      \n"
     ]
    }
   ],
   "source": [
    "dataset = classification_dataset.clone()\n",
    "dataset.compute_metadata()\n",
    "\n",
    "for sample in dataset:\n",
    "    # Do something with the sample here\n",
    "\n",
    "    sample.tags.append(\"processed\")\n",
    "    sample.save()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<Sample: {\n",
      "    'id': '5f95b4a3ad3e1adb14658378',\n",
      "    'media_type': 'image',\n",
      "    'filepath': '/tmp/fiftyone-examples/image-classification-directory-tree/airplane/000045.jpg',\n",
      "    'tags': BaseList(['processed']),\n",
      "    'metadata': <ImageMetadata: {\n",
      "        'size_bytes': 1239,\n",
      "        'mime_type': 'image/jpeg',\n",
      "        'width': 32,\n",
      "        'height': 32,\n",
      "        'num_channels': 3,\n",
      "    }>,\n",
      "    'ground_truth': <Classification: {\n",
      "        'id': '5f95b4a3ad3e1adb14658377',\n",
      "        'label': 'airplane',\n",
      "        'confidence': None,\n",
      "        'logits': None,\n",
      "    }>,\n",
      "}>\n"
     ]
    }
   ],
   "source": [
    "print(dataset.first())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "...or access them directly by ID:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "True"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "sample = dataset.first()\n",
    "\n",
    "same_sample = dataset[sample.id]\n",
    "\n",
    "same_sample is sample  # True: samples are singletons!"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can also [create views](https://voxel51.com/docs/fiftyone/user_guide/using_views.html) into your datasets that slice and dice your data in interesting ways:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Dataset:        2020.10.25.13.34.56\n",
      "Media type:     None\n",
      "Num samples:    250\n",
      "Tags:           ['processed']\n",
      "Sample fields:\n",
      "    media_type:   fiftyone.core.fields.StringField\n",
      "    filepath:     fiftyone.core.fields.StringField\n",
      "    tags:         fiftyone.core.fields.ListField(fiftyone.core.fields.StringField)\n",
      "    metadata:     fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.metadata.Metadata)\n",
      "    ground_truth: fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.labels.Classification)\n",
      "Pipeline stages:\n",
      "    1. SortBy(field_or_expr='filepath', reverse=False)\n"
     ]
    }
   ],
   "source": [
    "# Sort by filepath\n",
    "view1 = dataset.sort_by(\"filepath\")\n",
    "print(view1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "/tmp/fiftyone-examples/image-classification-directory-tree/airplane/000045.jpg\n"
     ]
    }
   ],
   "source": [
    "print(view1.first().filepath)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Dataset:        2020.10.25.13.34.56\n",
      "Media type:     None\n",
      "Num samples:    100\n",
      "Tags:           ['processed']\n",
      "Sample fields:\n",
      "    media_type:   fiftyone.core.fields.StringField\n",
      "    filepath:     fiftyone.core.fields.StringField\n",
      "    tags:         fiftyone.core.fields.ListField(fiftyone.core.fields.StringField)\n",
      "    metadata:     fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.metadata.Metadata)\n",
      "    ground_truth: fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.labels.Classification)\n",
      "Pipeline stages:\n",
      "    1. Take(size=100, seed=None)\n"
     ]
    }
   ],
   "source": [
    "# Random subset of 100 samples from the dataset\n",
    "view2 = dataset.take(100)\n",
    "print(view2)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<SampleView: {\n",
      "    'id': '5f95b4a3ad3e1adb146585d0',\n",
      "    'media_type': 'image',\n",
      "    'filepath': '/tmp/fiftyone-examples/image-classification-directory-tree/horse/005678.jpg',\n",
      "    'tags': BaseList(['processed']),\n",
      "    'metadata': <ImageMetadata: {\n",
      "        'size_bytes': 1324,\n",
      "        'mime_type': 'image/jpeg',\n",
      "        'width': 32,\n",
      "        'height': 32,\n",
      "        'num_channels': 3,\n",
      "    }>,\n",
      "    'ground_truth': <Classification: {\n",
      "        'id': '5f95b4a3ad3e1adb146585cf',\n",
      "        'label': 'horse',\n",
      "        'confidence': None,\n",
      "        'logits': None,\n",
      "    }>,\n",
      "}>\n"
     ]
    }
   ],
   "source": [
    "print(view2.first())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Dataset:        2020.10.25.13.34.56\n",
      "Media type:     None\n",
      "Num samples:    20\n",
      "Tags:           ['processed']\n",
      "Sample fields:\n",
      "    media_type:   fiftyone.core.fields.StringField\n",
      "    filepath:     fiftyone.core.fields.StringField\n",
      "    tags:         fiftyone.core.fields.ListField(fiftyone.core.fields.StringField)\n",
      "    metadata:     fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.metadata.Metadata)\n",
      "    ground_truth: fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.labels.Classification)\n",
      "Pipeline stages:\n",
      "    1. Skip(skip=10)\n",
      "    2. Limit(limit=20)\n"
     ]
    }
   ],
   "source": [
    "# Extract a slice of a dataset\n",
    "view3 = dataset[10:30]\n",
    "print(view3)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<SampleView: {\n",
      "    'id': '5f95b4a3ad3e1adb14658396',\n",
      "    'media_type': 'image',\n",
      "    'filepath': '/tmp/fiftyone-examples/image-classification-directory-tree/airplane/004375.jpg',\n",
      "    'tags': BaseList(['processed']),\n",
      "    'metadata': <ImageMetadata: {\n",
      "        'size_bytes': 1267,\n",
      "        'mime_type': 'image/jpeg',\n",
      "        'width': 32,\n",
      "        'height': 32,\n",
      "        'num_channels': 3,\n",
      "    }>,\n",
      "    'ground_truth': <Classification: {\n",
      "        'id': '5f95b4a3ad3e1adb14658395',\n",
      "        'label': 'airplane',\n",
      "        'confidence': None,\n",
      "        'logits': None,\n",
      "    }>,\n",
      "}>\n"
     ]
    }
   ],
   "source": [
    "print(view3.first())"
   ]
  },
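  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Judging from the pipeline printed above, the slice `dataset[10:30]` is translated into a `Skip` stage followed by a `Limit` stage. The arithmetic is sketched below with a hypothetical `slice_to_skip_limit()` helper that only covers non-negative `start <= stop`; FiftyOne's own slicing may handle other cases (such as negative indices) differently:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Hypothetical helper (not a FiftyOne API): dataset[start:stop] corresponds to\n",
    "# skipping `start` samples and then keeping at most `stop - start` of them\n",
    "def slice_to_skip_limit(start, stop):\n",
    "    if start < 0 or stop < start:\n",
    "        raise ValueError(\"this sketch only supports 0 <= start <= stop\")\n",
    "    return {\"skip\": start, \"limit\": stop - start}\n",
    "\n",
    "\n",
    "print(slice_to_skip_limit(10, 30))"
   ]
  },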
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "View operations can be chained together:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Dataset:        2020.10.25.13.34.56\n",
      "Media type:     None\n",
      "Num samples:    5\n",
      "Tags:           ['processed']\n",
      "Sample fields:\n",
      "    media_type:   fiftyone.core.fields.StringField\n",
      "    filepath:     fiftyone.core.fields.StringField\n",
      "    tags:         fiftyone.core.fields.ListField(fiftyone.core.fields.StringField)\n",
      "    metadata:     fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.metadata.Metadata)\n",
      "    ground_truth: fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.labels.Classification)\n",
      "Pipeline stages:\n",
      "    1. MatchTag(tag='processed')\n",
      "    2. Exists(field='metadata', bool=True)\n",
      "    3. Match(filter={'$expr': {'$gte': [...]}})\n",
      "    4. SortBy(field_or_expr='filepath', reverse=False)\n",
      "    5. Limit(limit=5)\n"
     ]
    }
   ],
   "source": [
    "from fiftyone import ViewField as F\n",
    "\n",
    "complex_view = (\n",
    "    dataset\n",
    "    .match_tag(\"processed\")\n",
    "    .exists(\"metadata\")\n",
    "    .match(F(\"metadata.size_bytes\") >= 1024)  # >= 1 KiB\n",
    "    .sort_by(\"filepath\")\n",
    "    .limit(5)\n",
    ")\n",
    "\n",
    "print(complex_view)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<SampleView: {\n",
      "    'id': '5f95b4a3ad3e1adb14658378',\n",
      "    'media_type': 'image',\n",
      "    'filepath': '/tmp/fiftyone-examples/image-classification-directory-tree/airplane/000045.jpg',\n",
      "    'tags': BaseList(['processed']),\n",
      "    'metadata': <ImageMetadata: {\n",
      "        'size_bytes': 1239,\n",
      "        'mime_type': 'image/jpeg',\n",
      "        'width': 32,\n",
      "        'height': 32,\n",
      "        'num_channels': 3,\n",
      "    }>,\n",
      "    'ground_truth': <Classification: {\n",
      "        'id': '5f95b4a3ad3e1adb14658377',\n",
      "        'label': 'airplane',\n",
      "        'confidence': None,\n",
      "        'logits': None,\n",
      "    }>,\n",
      "}>\n"
     ]
    }
   ],
   "source": [
    "print(complex_view.first())"
   ]
  },
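  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Conceptually, each stage of `complex_view` narrows or reorders the output of the previous one. The plain-Python analogy below applies the same five steps, in the same order, to a few hypothetical records (all paths and values are made up). It is only an analogy: FiftyOne compiles view stages into a MongoDB aggregation pipeline that the database executes, and the underlying dataset is not modified:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Hypothetical records standing in for samples (illustrative values only)\n",
    "records = [\n",
    "    {\"filepath\": \"/data/cat/002.jpg\", \"tags\": [\"processed\"], \"metadata\": {\"size_bytes\": 1300}},\n",
    "    {\"filepath\": \"/data/airplane/001.jpg\", \"tags\": [\"processed\"], \"metadata\": {\"size_bytes\": 1239}},\n",
    "    {\"filepath\": \"/data/dog/003.jpg\", \"tags\": [], \"metadata\": {\"size_bytes\": 2048}},\n",
    "    {\"filepath\": \"/data/frog/004.jpg\", \"tags\": [\"processed\"], \"metadata\": None},\n",
    "    {\"filepath\": \"/data/bird/005.jpg\", \"tags\": [\"processed\"], \"metadata\": {\"size_bytes\": 898}},\n",
    "]\n",
    "\n",
    "\n",
    "# Each line mirrors one stage of `complex_view`, applied in the same order\n",
    "def complex_filter(records, limit=5):\n",
    "    selected = [r for r in records if \"processed\" in r[\"tags\"]]  # match_tag()\n",
    "    selected = [r for r in selected if r[\"metadata\"] is not None]  # exists()\n",
    "    selected = [r for r in selected if r[\"metadata\"][\"size_bytes\"] >= 1024]  # match()\n",
    "    selected = sorted(selected, key=lambda r: r[\"filepath\"])  # sort_by()\n",
    "    return selected[:limit]  # limit()\n",
    "\n",
    "\n",
    "# `records` itself is left unchanged, much like a dataset under a view\n",
    "print([r[\"filepath\"] for r in complex_filter(records)])"
   ]
  },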
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "See the other examples in this folder for more sophisticated view operations!"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Exporting datasets\n",
    "\n",
    "You can [export samples](https://voxel51.com/docs/fiftyone/user_guide/export_datasets.html) to disk in any of the many formats that FiftyOne supports:"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Exporting a classification dataset\n",
    "\n",
    "FiftyOne natively supports exporting classification datasets as [directory trees](https://voxel51.com/docs/fiftyone/user_guide/export_datasets.html#imageclassificationdirectorytree) whose subfolders encode the class labels:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      " 100% |████████████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [135.0ms elapsed, 0s remaining, 741.0 samples/s]     \n"
     ]
    }
   ],
   "source": [
    "# Create a view\n",
    "view = classification_dataset.take(100)\n",
    "\n",
    "# Export as a classification directory tree using the labels in the\n",
    "# `ground_truth` field as classes\n",
    "view.export(\n",
    "    \"/tmp/fiftyone-examples/export-classification-directory-tree\",\n",
    "    fo.types.ImageClassificationDirectoryTree,\n",
    "    label_field=\"ground_truth\"\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "total 0\r\n",
       "drwxr-xr-x  12 Brian  wheel   384B Oct 25 13:36 .\r\n",
       "drwxr-xr-x   5 Brian  wheel   160B Oct 25 13:36 ..\r\n",
       "drwxr-xr-x  19 Brian  wheel   608B Oct 25 13:36 airplane\r\n",
       "drwxr-xr-x  11 Brian  wheel   352B Oct 25 13:36 automobile\r\n",
       "drwxr-xr-x  11 Brian  wheel   352B Oct 25 13:36 bird\r\n",
       "drwxr-xr-x  18 Brian  wheel   576B Oct 25 13:36 cat\r\n",
       "drwxr-xr-x  10 Brian  wheel   320B Oct 25 13:36 deer\r\n",
       "drwxr-xr-x  14 Brian  wheel   448B Oct 25 13:36 dog\r\n",
       "drwxr-xr-x  13 Brian  wheel   416B Oct 25 13:36 frog\r\n",
       "drwxr-xr-x   7 Brian  wheel   224B Oct 25 13:36 horse\r\n",
       "drwxr-xr-x  10 Brian  wheel   320B Oct 25 13:36 ship\r\n",
       "drwxr-xr-x   7 Brian  wheel   224B Oct 25 13:36 truck\r\n"
     ]
    }
   ],
   "source": [
    "!ls -lah /tmp/fiftyone-examples/export-classification-directory-tree"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "total 136\r\n",
      "drwxr-xr-x  19 Brian  wheel   608B Oct 25 13:36 .\r\n",
      "drwxr-xr-x  12 Brian  wheel   384B Oct 25 13:36 ..\r\n",
      "-rw-r--r--   1 Brian  wheel   1.1K Oct 25 13:36 002718.jpg\r\n",
      "-rw-r--r--   1 Brian  wheel   1.3K Oct 25 13:36 003006.jpg\r\n",
      "-rw-r--r--   1 Brian  wheel   898B Oct 25 13:36 003279.jpg\r\n",
      "-rw-r--r--   1 Brian  wheel   1.3K Oct 25 13:36 003343.jpg\r\n",
      "-rw-r--r--   1 Brian  wheel   1.3K Oct 25 13:36 003498.jpg\r\n",
      "-rw-r--r--   1 Brian  wheel   1.2K Oct 25 13:36 005828.jpg\r\n",
      "-rw-r--r--   1 Brian  wheel   1.1K Oct 25 13:36 006222.jpg\r\n"
     ]
    }
   ],
   "source": [
    "!ls -lah /tmp/fiftyone-examples/export-classification-directory-tree/airplane | head"
   ]
  },
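  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As the listings show, the classification directory tree format writes one subdirectory per class, with that class's images stored directly inside it. The sketch below uses only the Python standard library to tally how many images landed in each class, which is a quick sanity check on an export. It is not part of the FiftyOne API; the `count_images_per_class()` helper and its `extensions` parameter are hypothetical:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "from collections import Counter\n",
    "\n",
    "\n",
    "def count_images_per_class(dataset_dir, extensions=(\".jpg\", \".jpeg\", \".png\")):\n",
    "    \"\"\"Counts the images in each class subdirectory of a directory tree.\n",
    "\n",
    "    Hypothetical helper. Assumes the layout shown above: one subdirectory\n",
    "    per class, each containing that class's images directly (no nesting).\n",
    "    \"\"\"\n",
    "    counts = Counter()\n",
    "    for class_name in sorted(os.listdir(dataset_dir)):\n",
    "        class_dir = os.path.join(dataset_dir, class_name)\n",
    "        if not os.path.isdir(class_dir):\n",
    "            continue  # skip stray files at the top level\n",
    "\n",
    "        for filename in os.listdir(class_dir):\n",
    "            # Case-insensitive extension match, so \".JPG\" also counts\n",
    "            if filename.lower().endswith(extensions):\n",
    "                counts[class_name] += 1\n",
    "\n",
    "    return counts\n",
    "\n",
    "\n",
    "export_dir = \"/tmp/fiftyone-examples/export-classification-directory-tree\"\n",
    "if os.path.isdir(export_dir):\n",
    "    print(count_images_per_class(export_dir))"
   ]
  },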
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Exporting a detection dataset\n",
    "\n",
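    "FiftyOne natively supports exporting object detection datasets in [COCO format](https://voxel51.com/docs/fiftyone/user_guide/export_datasets.html#cocodetectiondataset). The export writes the images to a `data/` subdirectory and all of the labels to a single `labels.json` file:"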
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      " 100% |████████████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [738.2ms elapsed, 0s remaining, 135.5 samples/s]      \n"
     ]
    }
   ],
   "source": [
    "# Create a view containing 100 randomly chosen samples\n",
    "view = detection_dataset.take(100)\n",
    "\n",
    "# Export in COCO format with detections from the `ground_truth_detections` field of\n",
    "# the samples\n",
    "view.export(\n",
    "    \"/tmp/fiftyone-examples/export-coco\",\n",
    "    fo.types.COCODetectionDataset,\n",
    "    label_field=\"ground_truth_detections\"\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "total 224\r\n",
      "drwxr-xr-x    4 Brian  wheel   128B Oct 25 13:39 \u001b[34m.\u001b[m\u001b[m\r\n",
      "drwxr-xr-x    6 Brian  wheel   192B Oct 25 13:39 \u001b[34m..\u001b[m\u001b[m\r\n",
      "drwxr-xr-x  102 Brian  wheel   3.2K Oct 25 13:39 \u001b[34mdata\u001b[m\u001b[m\r\n",
      "-rw-r--r--    1 Brian  wheel   112K Oct 25 13:39 labels.json\r\n"
     ]
    }
   ],
   "source": [
    "!ls -lah /tmp/fiftyone-examples/export-coco"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "total 25760\r\n",
      "drwxr-xr-x  102 Brian  wheel   3.2K Oct 25 13:39 .\r\n",
      "drwxr-xr-x    4 Brian  wheel   128B Oct 25 13:39 ..\r\n",
      "-rw-r--r--    1 Brian  wheel    84K Oct 25 13:39 000014.jpg\r\n",
      "-rw-r--r--    1 Brian  wheel   128K Oct 25 13:39 000111.jpg\r\n",
      "-rw-r--r--    1 Brian  wheel   101K Oct 25 13:39 000172.jpg\r\n",
      "-rw-r--r--    1 Brian  wheel   130K Oct 25 13:39 000173.jpg\r\n",
      "-rw-r--r--    1 Brian  wheel   107K Oct 25 13:39 000270.jpg\r\n",
      "-rw-r--r--    1 Brian  wheel   185K Oct 25 13:39 000272.jpg\r\n",
      "-rw-r--r--    1 Brian  wheel   102K Oct 25 13:39 000316.jpg\r\n"
     ]
    }
   ],
   "source": [
    "!ls -lah /tmp/fiftyone-examples/export-coco/data | head"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "        {\r\n",
      "            \"id\": 745,\r\n",
      "            \"image_id\": 99,\r\n",
      "            \"category_id\": 52,\r\n",
      "            \"bbox\": [\r\n",
      "                0.0,\r\n",
      "                56.0,\r\n",
      "                427.0,\r\n",
      "                479.0\r\n",
      "            ],\r\n",
      "            \"segmentation\": null,\r\n",
      "            \"area\": 76225.0,\r\n",
      "            \"iscrowd\": 1\r\n",
      "        }\r\n",
      "    ]\r\n",
      "}\r\n"
     ]
    }
   ],
   "source": [
    "!python -m json.tool /tmp/fiftyone-examples/export-coco/labels.json | tail -n 16"
   ]
  },
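  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `bbox` values in `labels.json` follow the COCO convention `[top-left-x, top-left-y, width, height]` in absolute pixels, whereas FiftyOne stores `Detection.bounding_box` as `[top-left-x, top-left-y, width, height]` in relative coordinates in `[0, 1]` (compare the `bounding_box` values in the `samples.json` output further down). The exporter performs this conversion for you; the hypothetical helpers below only sketch the arithmetic involved, assuming you know each image's width and height in pixels:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def fiftyone_to_coco_bbox(bounding_box, width, height):\n",
    "    \"\"\"Converts a relative FiftyOne box to an absolute COCO box.\n",
    "\n",
    "    Hypothetical helper. Both boxes are [top-left-x, top-left-y, width,\n",
    "    height]; `width` and `height` are the image dimensions in pixels.\n",
    "    \"\"\"\n",
    "    x, y, w, h = bounding_box\n",
    "    return [x * width, y * height, w * width, h * height]\n",
    "\n",
    "\n",
    "def coco_to_fiftyone_bbox(bbox, width, height):\n",
    "    \"\"\"Inverse of `fiftyone_to_coco_bbox()`: absolute pixels to relative.\"\"\"\n",
    "    x, y, w, h = bbox\n",
    "    return [x / width, y / height, w / width, h / height]\n",
    "\n",
    "\n",
    "# Hypothetical box in a hypothetical 640 x 480 image\n",
    "coco_bbox = fiftyone_to_coco_bbox([0.5, 0.25, 0.1, 0.2], 640, 480)\n",
    "print(coco_bbox)\n",
    "print(coco_to_fiftyone_bbox(coco_bbox, 640, 480))"
   ]
  },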
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
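    "### Exporting entire samples\n",
    "\n",
    "You can also export entire samples, including every field and label, in FiftyOne's own `fo.types.FiftyOneDataset` format. The export writes the media to a `data/` subdirectory, the dataset's name, schema, and info to `metadata.json`, and the serialized samples to `samples.json`:"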
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 38,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      " 100% |████████████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [835.9ms elapsed, 0s remaining, 119.6 samples/s]      \n"
     ]
    }
   ],
   "source": [
    "# Create a view containing 100 randomly chosen samples\n",
    "view = detection_dataset.take(100)\n",
    "\n",
    "# Export entire samples (media plus all fields) in FiftyOne's native format\n",
    "view.export(\n",
    "    \"/tmp/fiftyone-examples/export-fiftyone-dataset\",\n",
    "    fo.types.FiftyOneDataset\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 39,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "total 496\r\n",
      "drwxr-xr-x    5 Brian  wheel   160B Oct 25 13:43 \u001b[34m.\u001b[m\u001b[m\r\n",
      "drwxr-xr-x    7 Brian  wheel   224B Oct 25 13:43 \u001b[34m..\u001b[m\u001b[m\r\n",
      "drwxr-xr-x  102 Brian  wheel   3.2K Oct 25 13:43 \u001b[34mdata\u001b[m\u001b[m\r\n",
      "-rw-r--r--    1 Brian  wheel   3.2K Oct 25 13:43 metadata.json\r\n",
      "-rw-r--r--    1 Brian  wheel   242K Oct 25 13:43 samples.json\r\n"
     ]
    }
   ],
   "source": [
    "!ls -lah /tmp/fiftyone-examples/export-fiftyone-dataset"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 40,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "total 24944\r\n",
      "drwxr-xr-x  102 Brian  wheel   3.2K Oct 25 13:43 .\r\n",
      "drwxr-xr-x    5 Brian  wheel   160B Oct 25 13:43 ..\r\n",
      "-rw-r--r--    1 Brian  wheel   127K Oct 25 13:43 000026.jpg\r\n",
      "-rw-r--r--    1 Brian  wheel    90K Oct 25 13:43 000067.jpg\r\n",
      "-rw-r--r--    1 Brian  wheel    41K Oct 25 13:43 000107.jpg\r\n",
      "-rw-r--r--    1 Brian  wheel   128K Oct 25 13:43 000111.jpg\r\n",
      "-rw-r--r--    1 Brian  wheel    96K Oct 25 13:43 000154.jpg\r\n",
      "-rw-r--r--    1 Brian  wheel   101K Oct 25 13:43 000172.jpg\r\n",
      "-rw-r--r--    1 Brian  wheel   185K Oct 25 13:43 000272.jpg\r\n"
     ]
    }
   ],
   "source": [
    "!ls -lah /tmp/fiftyone-examples/export-fiftyone-dataset/data | head"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 45,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{\r\n",
      "    \"name\": \"2020.10.25.13.23.54-view\",\r\n",
      "    \"media_type\": \"image\",\r\n",
      "    \"sample_fields\": {\r\n",
      "        \"media_type\": \"fiftyone.core.fields.StringField\",\r\n",
      "        \"filepath\": \"fiftyone.core.fields.StringField\",\r\n",
      "        \"tags\": \"fiftyone.core.fields.ListField(fiftyone.core.fields.StringField)\",\r\n",
      "        \"metadata\": \"fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.metadata.Metadata)\",\r\n",
      "        \"ground_truth_detections\": \"fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.labels.Detections)\"\r\n",
      "    },\r\n",
      "    \"info\": {\r\n",
      "        \"task_labels\": [\r\n",
      "            {\r\n",
      "                \"name\": \"airplane\",\r\n",
      "                \"attributes\": []\r\n",
      "            },\r\n"
     ]
    }
   ],
   "source": [
    "!python -m json.tool /tmp/fiftyone-examples/export-fiftyone-dataset/metadata.json | head -n 16"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 51,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "                    {\r\n",
      "                        \"_id\": {\r\n",
      "                            \"$oid\": \"5f95b4abad3e1adb14658d5f\"\r\n",
      "                        },\r\n",
      "                        \"_cls\": \"Detection\",\r\n",
      "                        \"attributes\": {\r\n",
      "                            \"area\": {\r\n",
      "                                \"_cls\": \"NumericAttribute\",\r\n",
      "                                \"value\": 534.3859500000002\r\n",
      "                            },\r\n",
      "                            \"iscrowd\": {\r\n",
      "                                \"_cls\": \"NumericAttribute\",\r\n",
      "                                \"value\": 0.0\r\n",
      "                            }\r\n",
      "                        },\r\n",
      "                        \"label\": \"person\",\r\n",
      "                        \"bounding_box\": [\r\n",
      "                            0.5515625,\r\n",
      "                            0.42516268980477223,\r\n",
      "                            0.04375,\r\n",
      "                            0.06073752711496746\r\n",
      "                        ]\r\n",
      "                    }\r\n",
      "                ]\r\n",
      "            }\r\n",
      "        }\r\n",
      "    ]\r\n",
      "}\r\n"
     ]
    }
   ],
   "source": [
    "!python -m json.tool /tmp/fiftyone-examples/export-fiftyone-dataset/samples.json | tail -n 28"
   ]
  },
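  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Because `samples.json` is plain JSON, you can also inspect an export without FiftyOne. The sketch below is a hypothetical helper, not a FiftyOne API. Based on the output above, it assumes that the file holds a top-level `samples` list and that each sample's `ground_truth_detections` field is a serialized `Detections` object with a `detections` list. It counts how often each class label appears:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import json\n",
    "import os\n",
    "from collections import Counter\n",
    "\n",
    "\n",
    "def count_labels(samples_json, field=\"ground_truth_detections\"):\n",
    "    \"\"\"Counts detection labels in a parsed FiftyOneDataset `samples.json`.\n",
    "\n",
    "    Hypothetical helper. Assumes a top-level `samples` list in which `field`\n",
    "    of each sample is a serialized `Detections` object with a `detections` list.\n",
    "    \"\"\"\n",
    "    counts = Counter()\n",
    "    for sample in samples_json[\"samples\"]:\n",
    "        # The field may be missing or null on unlabeled samples\n",
    "        detections = sample.get(field) or {}\n",
    "        for detection in detections.get(\"detections\", []):\n",
    "            counts[detection[\"label\"]] += 1\n",
    "\n",
    "    return counts\n",
    "\n",
    "\n",
    "samples_path = \"/tmp/fiftyone-examples/export-fiftyone-dataset/samples.json\"\n",
    "if os.path.isfile(samples_path):\n",
    "    with open(samples_path) as f:\n",
    "        print(count_labels(json.load(f)).most_common(5))"
   ]
  },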
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Cleanup"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 52,
   "metadata": {},
   "outputs": [],
   "source": [
    "!rm -rf /tmp/fiftyone-examples"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.13"
  },
  "nbsphinx": {
   "execute": "never"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}