{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "
\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Wrangling Datasets\n", "\n", "This example provides a brief overview of loading datasets in common formats\n", "into FiftyOne, manipulating them, and then exporting them (or subsets of them)\n", "to disk (in possibly different formats).\n", "\n", "For more details, check out the resources below:\n", "\n", "- [Loading data into FiftyOne](https://voxel51.com/docs/fiftyone/user_guide/dataset_creation/index.html)\n", "- [Dataset basics](https://voxel51.com/docs/fiftyone/user_guide/basics.html)\n", "- [Using dataset views](https://voxel51.com/docs/fiftyone/user_guide/using_views.html)\n", "- [Exporting FiftyOne datasets](https://voxel51.com/docs/fiftyone/user_guide/export_datasets.html)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Setup\n", "\n", "If you haven't already, install FiftyOne:\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!pip install fiftyone" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's prepare some datasets to work with. Don't worry about the details for now." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Split 'test' already downloaded\n", "Loading existing dataset 'cifar10-test'. To reload from disk, either delete the existing dataset or provide a custom `dataset_name` to use\n", " 100% |████████████████████████████████████████████████████████████████████████████████████████████████████████████| 250/250 [494.2ms elapsed, 0s remaining, 505.9 samples/s] \n", "Split 'validation' already downloaded\n", "Loading existing dataset 'coco-2017-validation'. 
To reload from disk, either delete the existing dataset or provide a custom `dataset_name` to use\n", " 100% |████████████████████████████████████████████████████████████████████████████████████████████████████████████| 250/250 [3.0s elapsed, 0s remaining, 90.6 samples/s] \n" ] } ], "source": [ "import fiftyone as fo\n", "import fiftyone.zoo as foz\n", "\n", "# ImageClassificationDirectoryTree\n", "dataset = foz.load_zoo_dataset(\"cifar10\", split=\"test\")\n", "dataset.take(250).export(\n", " \"/tmp/fiftyone-examples/image-classification-directory-tree\",\n", " fo.types.ImageClassificationDirectoryTree,\n", ")\n", "\n", "# CVATImageDataset\n", "dataset = foz.load_zoo_dataset(\"coco-2017\", split=\"validation\")\n", "dataset.take(250).export(\n", " \"/tmp/fiftyone-examples/cvat-image-dataset\",\n", " fo.types.CVATImageDataset,\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Loading data into FiftyOne\n", "\n", "FiftyOne provides support for loading [many common dataset formats](https://voxel51.com/docs/fiftyone/user_guide/dataset_creation/datasets.html#supported-formats) out-of-the-box." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Image classification directory tree\n", "\n", "You can load a classification dataset stored as a directory tree whose subfolders define the classes of the images.\n", "\n", "The relevant dataset type is [ImageClassificationDirectoryTree](https://voxel51.com/docs/fiftyone/user_guide/dataset_creation/datasets.html#imageclassificationdirectorytree):" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 100% |████████████████████████████████████████████████████████████████████████████████████████████████████████████| 250/250 [354.0ms elapsed, 0s remaining, 706.2 samples/s] \n", "Name: 2020.10.25.13.23.47\n", "Media type: None\n", "Num samples: 250\n", "Persistent: False\n", "Info: {'classes': ['airplane', 'automobile', 'bird', ...]}\n", "Tags: []\n", "Sample fields:\n", " media_type: fiftyone.core.fields.StringField\n", " filepath: fiftyone.core.fields.StringField\n", " tags: fiftyone.core.fields.ListField(fiftyone.core.fields.StringField)\n", " metadata: fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.metadata.Metadata)\n", " ground_truth: fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.labels.Classification)\n" ] } ], "source": [ "import fiftyone as fo\n", "\n", "DATASET_DIR = \"/tmp/fiftyone-examples/image-classification-directory-tree\"\n", "\n", "classification_dataset = fo.Dataset.from_dir(\n", " DATASET_DIR, fo.types.ImageClassificationDirectoryTree\n", ")\n", "\n", "print(classification_dataset)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### CVAT image dataset\n", "\n", "You can load a set of object detections stored in [CVAT image format](https://github.com/openvinotoolkit/cvat).\n", "\n", "The relevant dataset type is [CVATImageDataset](https://voxel51.com/docs/fiftyone/user_guide/dataset_creation/datasets.html#cvatimagedataset):" ] }, { "cell_type": "code", "execution_count": 
5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 100% |████████████████████████████████████████████████████████████████████████████████████████████████████████████| 250/250 [3.5s elapsed, 0s remaining, 72.3 samples/s] \n", "Name: 2020.10.25.13.23.54\n", "Media type: image\n", "Num samples: 250\n", "Persistent: False\n", "Info: {'created': '2020-10-25T13:23:43.618971', 'dumped': '2020-10-25T13:23:43.618971', 'task_labels': [{...}, {...}, {...}, ...], ...}\n", "Tags: []\n", "Sample fields:\n", " media_type: fiftyone.core.fields.StringField\n", " filepath: fiftyone.core.fields.StringField\n", " tags: fiftyone.core.fields.ListField(fiftyone.core.fields.StringField)\n", " metadata: fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.metadata.Metadata)\n", " ground_truth_detections: fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.labels.Detections)\n" ] } ], "source": [ "import fiftyone as fo\n", "\n", "DATASET_DIR = \"/tmp/fiftyone-examples/cvat-image-dataset\"\n", "\n", "detection_dataset = fo.Dataset.from_dir(\n", " DATASET_DIR, fo.types.CVATImageDataset\n", ")\n", "\n", "print(detection_dataset)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Adding samples to datasets\n", "\n", "Adding new samples to datasets is easy.\n", "\n", "You can [create new samples](https://voxel51.com/docs/fiftyone/user_guide/basics.html#samples):" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n" ] } ], "source": [ "import fiftyone as fo\n", "\n", "sample = fo.Sample(filepath=\"/path/to/image.jpg\")\n", "\n", "print(sample)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "... 
[add fields dynamically](https://voxel51.com/docs/fiftyone/user_guide/basics.html#fields) to them:" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n" ] } ], "source": [ "sample[\"quality\"] = 89.7\n", "sample[\"keypoints\"] = [[31, 27], [63, 72]]\n", "sample[\"geo_json\"] = {\n", " \"type\": \"Feature\",\n", " \"geometry\": {\"type\": \"Point\", \"coordinates\": [125.6, 10.1]},\n", " \"properties\": {\"name\": \"camera\"},\n", "}\n", "\n", "print(sample)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "... [add labels](https://voxel51.com/docs/fiftyone/user_guide/basics.html#labels) that can be rendered on the media in the App:" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ ",\n", " 'animals': ,\n", " ,\n", " ]),\n", " }>,\n", "}>\n" ] } ], "source": [ "sample[\"weather\"] = fo.Classification(label=\"sunny\", confidence=0.95)\n", "sample[\"animals\"] = fo.Detections(\n", " detections=[\n", " fo.Detection(\n", " label=\"cat\", bounding_box=[0.5, 0.5, 0.4, 0.3], confidence=0.75\n", " ),\n", " fo.Detection(\n", " label=\"dog\", bounding_box=[0.2, 0.2, 0.2, 0.4], confidence=0.51\n", " )\n", " ]\n", ")\n", "\n", "print(sample)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "...and add them to [datasets](https://voxel51.com/docs/fiftyone/user_guide/using_datasets.html):" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Name: 2020.10.25.13.34.44\n", "Media type: None\n", "Num samples: 0\n", "Persistent: False\n", "Info: {}\n", "Tags: []\n", "Sample fields:\n", " media_type: fiftyone.core.fields.StringField\n", " filepath: fiftyone.core.fields.StringField\n", " tags: fiftyone.core.fields.ListField(fiftyone.core.fields.StringField)\n", " metadata: 
fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.metadata.Metadata)\n" ] } ], "source": [ "dataset = fo.Dataset()\n", "print(dataset)" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Name: 2020.10.25.13.34.44\n", "Media type: image\n", "Num samples: 1\n", "Persistent: False\n", "Info: {}\n", "Tags: []\n", "Sample fields:\n", " media_type: fiftyone.core.fields.StringField\n", " filepath: fiftyone.core.fields.StringField\n", " tags: fiftyone.core.fields.ListField(fiftyone.core.fields.StringField)\n", " metadata: fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.metadata.Metadata)\n", " quality: fiftyone.core.fields.FloatField\n", " keypoints: fiftyone.core.fields.ListField\n", " geo_json: fiftyone.core.fields.DictField\n", " weather: fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.labels.Classification)\n", " animals: fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.labels.Detections)\n" ] } ], "source": [ "dataset.add_sample(sample)\n", "print(dataset)" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ ",\n", " 'animals': ,\n", " ,\n", " ]),\n", " }>,\n", "}>\n" ] } ], "source": [ "print(dataset.first())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Working with datasets\n", "\n", "You can access samples in datasets by iterating over them:" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 100% |████████████████████████████████████████████████████████████████████████████████████████████████████████████| 250/250 [831.6ms elapsed, 0s remaining, 300.4 samples/s] \n" ] } ], "source": [ "dataset = classification_dataset.clone()\n", "dataset.compute_metadata()\n", "\n", "for sample in dataset:\n", " # Do something with the sample here\n", "\n", " 
sample.tags.append(\"processed\")\n", " sample.save()" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ ",\n", " 'ground_truth': ,\n", "}>\n" ] } ], "source": [ "print(dataset.first())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "...or access them directly by ID:" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sample = dataset.first()\n", "\n", "same_sample = dataset[sample.id]\n", "\n", "same_sample is sample # True: samples are singletons!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can also [create views](https://voxel51.com/docs/fiftyone/user_guide/using_views.html) into your datasets that slice and dice your data in interesting ways:" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Dataset: 2020.10.25.13.34.56\n", "Media type: None\n", "Num samples: 250\n", "Tags: ['processed']\n", "Sample fields:\n", " media_type: fiftyone.core.fields.StringField\n", " filepath: fiftyone.core.fields.StringField\n", " tags: fiftyone.core.fields.ListField(fiftyone.core.fields.StringField)\n", " metadata: fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.metadata.Metadata)\n", " ground_truth: fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.labels.Classification)\n", "Pipeline stages:\n", " 1. 
SortBy(field_or_expr='filepath', reverse=False)\n" ] } ], "source": [ "# Sort by filepath\n", "view1 = dataset.sort_by(\"filepath\")\n", "print(view1)" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "/tmp/fiftyone-examples/image-classification-directory-tree/airplane/000045.jpg\n" ] } ], "source": [ "print(view1.first().filepath)" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Dataset: 2020.10.25.13.34.56\n", "Media type: None\n", "Num samples: 100\n", "Tags: ['processed']\n", "Sample fields:\n", " media_type: fiftyone.core.fields.StringField\n", " filepath: fiftyone.core.fields.StringField\n", " tags: fiftyone.core.fields.ListField(fiftyone.core.fields.StringField)\n", " metadata: fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.metadata.Metadata)\n", " ground_truth: fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.labels.Classification)\n", "Pipeline stages:\n", " 1. 
Take(size=100, seed=None)\n" ] } ], "source": [ "# Random sample from a dataset\n", "view2 = dataset.take(100)\n", "print(view2)" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ ",\n", " 'ground_truth': ,\n", "}>\n" ] } ], "source": [ "print(view2.first())" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Dataset: 2020.10.25.13.34.56\n", "Media type: None\n", "Num samples: 20\n", "Tags: ['processed']\n", "Sample fields:\n", " media_type: fiftyone.core.fields.StringField\n", " filepath: fiftyone.core.fields.StringField\n", " tags: fiftyone.core.fields.ListField(fiftyone.core.fields.StringField)\n", " metadata: fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.metadata.Metadata)\n", " ground_truth: fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.labels.Classification)\n", "Pipeline stages:\n", " 1. Skip(skip=10)\n", " 2. 
Limit(limit=20)\n" ] } ], "source": [ "# Extract slice of a dataset\n", "view3 = dataset[10:30]\n", "print(view3)" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ ",\n", " 'ground_truth': ,\n", "}>\n" ] } ], "source": [ "print(view3.first())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "View operations can be chained together:" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Dataset: 2020.10.25.13.34.56\n", "Media type: None\n", "Num samples: 5\n", "Tags: ['processed']\n", "Sample fields:\n", " media_type: fiftyone.core.fields.StringField\n", " filepath: fiftyone.core.fields.StringField\n", " tags: fiftyone.core.fields.ListField(fiftyone.core.fields.StringField)\n", " metadata: fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.metadata.Metadata)\n", " ground_truth: fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.labels.Classification)\n", "Pipeline stages:\n", " 1. MatchTag(tag='processed')\n", " 2. Exists(field='metadata', bool=True)\n", " 3. Match(filter={'$expr': {'$gte': [...]}})\n", " 4. SortBy(field_or_expr='filepath', reverse=False)\n", " 5. Limit(limit=5)\n" ] } ], "source": [ "from fiftyone import ViewField as F\n", "\n", "complex_view = (\n", " dataset\n", " .match_tag(\"processed\")\n", " .exists(\"metadata\")\n", " .match(F(\"metadata.size_bytes\") >= 1024) # >= 1 kB\n", " .sort_by(\"filepath\")\n", " .limit(5)\n", ")\n", "\n", "print(complex_view)" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ ",\n", " 'ground_truth': ,\n", "}>\n" ] } ], "source": [ "print(complex_view.first())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "See the other examples in this folder for more sophisticated view operations!" 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Exporting datasets\n", "\n", "You can easily [export samples](https://voxel51.com/docs/fiftyone/user_guide/export_datasets.html) in whatever format suits your fancy:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Exporting a classification dataset\n", "\n", "FiftyOne natively supports exporting classification datasets as [directory trees](https://voxel51.com/docs/fiftyone/user_guide/export_datasets.html#imageclassificationdirectorytree) whose subfolders encode the class labels:" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 100% |████████████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [135.0ms elapsed, 0s remaining, 741.0 samples/s] \n" ] } ], "source": [ "# Create a view\n", "view = classification_dataset.take(100)\n", "\n", "# Export as a classification directory tree using the labels in the\n", "# `ground_truth` field as classes\n", "view.export(\n", " \"/tmp/fiftyone-examples/export-classification-directory-tree\",\n", " fo.types.ImageClassificationDirectoryTree,\n", " label_field=\"ground_truth\"\n", ")" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "total 0\r\n", "drwxr-xr-x 12 Brian wheel 384B Oct 25 13:36 \u001b[34m.\u001b[m\u001b[m\r\n", "drwxr-xr-x 5 Brian wheel 160B Oct 25 13:36 \u001b[34m..\u001b[m\u001b[m\r\n", "drwxr-xr-x 19 Brian wheel 608B Oct 25 13:36 \u001b[34mairplane\u001b[m\u001b[m\r\n", "drwxr-xr-x 11 Brian wheel 352B Oct 25 13:36 \u001b[34mautomobile\u001b[m\u001b[m\r\n", "drwxr-xr-x 11 Brian wheel 352B Oct 25 13:36 \u001b[34mbird\u001b[m\u001b[m\r\n", "drwxr-xr-x 18 Brian wheel 576B Oct 25 13:36 \u001b[34mcat\u001b[m\u001b[m\r\n", "drwxr-xr-x 10 Brian wheel 320B Oct 25 13:36 \u001b[34mdeer\u001b[m\u001b[m\r\n", "drwxr-xr-x 
14 Brian wheel 448B Oct 25 13:36 \u001b[34mdog\u001b[m\u001b[m\r\n", "drwxr-xr-x 13 Brian wheel 416B Oct 25 13:36 \u001b[34mfrog\u001b[m\u001b[m\r\n", "drwxr-xr-x 7 Brian wheel 224B Oct 25 13:36 \u001b[34mhorse\u001b[m\u001b[m\r\n", "drwxr-xr-x 10 Brian wheel 320B Oct 25 13:36 \u001b[34mship\u001b[m\u001b[m\r\n", "drwxr-xr-x 7 Brian wheel 224B Oct 25 13:36 \u001b[34mtruck\u001b[m\u001b[m\r\n" ] } ], "source": [ "!ls -lah /tmp/fiftyone-examples/export-classification-directory-tree" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "total 136\r\n", "drwxr-xr-x 19 Brian wheel 608B Oct 25 13:36 .\r\n", "drwxr-xr-x 12 Brian wheel 384B Oct 25 13:36 ..\r\n", "-rw-r--r-- 1 Brian wheel 1.1K Oct 25 13:36 002718.jpg\r\n", "-rw-r--r-- 1 Brian wheel 1.3K Oct 25 13:36 003006.jpg\r\n", "-rw-r--r-- 1 Brian wheel 898B Oct 25 13:36 003279.jpg\r\n", "-rw-r--r-- 1 Brian wheel 1.3K Oct 25 13:36 003343.jpg\r\n", "-rw-r--r-- 1 Brian wheel 1.3K Oct 25 13:36 003498.jpg\r\n", "-rw-r--r-- 1 Brian wheel 1.2K Oct 25 13:36 005828.jpg\r\n", "-rw-r--r-- 1 Brian wheel 1.1K Oct 25 13:36 006222.jpg\r\n" ] } ], "source": [ "!ls -lah /tmp/fiftyone-examples/export-classification-directory-tree/airplane | head" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Exporting a detection dataset\n", "\n", "FiftyOne natively supports exporting object detection datasets in [COCO format](https://voxel51.com/docs/fiftyone/user_guide/export_datasets.html#cocodetectiondataset):" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 100% |████████████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [738.2ms elapsed, 0s remaining, 135.5 samples/s] \n" ] } ], "source": [ "# Create a view\n", "view = detection_dataset.take(100)\n", "\n", "# Export in COCO format with detections from 
the `ground_truth_detections` field of\n", "# the samples\n", "view.export(\n", " \"/tmp/fiftyone-examples/export-coco\",\n", " fo.types.COCODetectionDataset,\n", " label_field=\"ground_truth_detections\"\n", ")" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "total 224\r\n", "drwxr-xr-x 4 Brian wheel 128B Oct 25 13:39 \u001b[34m.\u001b[m\u001b[m\r\n", "drwxr-xr-x 6 Brian wheel 192B Oct 25 13:39 \u001b[34m..\u001b[m\u001b[m\r\n", "drwxr-xr-x 102 Brian wheel 3.2K Oct 25 13:39 \u001b[34mdata\u001b[m\u001b[m\r\n", "-rw-r--r-- 1 Brian wheel 112K Oct 25 13:39 labels.json\r\n" ] } ], "source": [ "!ls -lah /tmp/fiftyone-examples/export-coco" ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "total 25760\r\n", "drwxr-xr-x 102 Brian wheel 3.2K Oct 25 13:39 .\r\n", "drwxr-xr-x 4 Brian wheel 128B Oct 25 13:39 ..\r\n", "-rw-r--r-- 1 Brian wheel 84K Oct 25 13:39 000014.jpg\r\n", "-rw-r--r-- 1 Brian wheel 128K Oct 25 13:39 000111.jpg\r\n", "-rw-r--r-- 1 Brian wheel 101K Oct 25 13:39 000172.jpg\r\n", "-rw-r--r-- 1 Brian wheel 130K Oct 25 13:39 000173.jpg\r\n", "-rw-r--r-- 1 Brian wheel 107K Oct 25 13:39 000270.jpg\r\n", "-rw-r--r-- 1 Brian wheel 185K Oct 25 13:39 000272.jpg\r\n", "-rw-r--r-- 1 Brian wheel 102K Oct 25 13:39 000316.jpg\r\n" ] } ], "source": [ "!ls -lah /tmp/fiftyone-examples/export-coco/data | head" ] }, { "cell_type": "code", "execution_count": 37, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " {\r\n", " \"id\": 745,\r\n", " \"image_id\": 99,\r\n", " \"category_id\": 52,\r\n", " \"bbox\": [\r\n", " 0.0,\r\n", " 56.0,\r\n", " 427.0,\r\n", " 479.0\r\n", " ],\r\n", " \"segmentation\": null,\r\n", " \"area\": 76225.0,\r\n", " \"iscrowd\": 1\r\n", " }\r\n", " ]\r\n", "}\r\n" ] } ], "source": [ "!python -m json.tool /tmp/fiftyone-examples/export-coco/labels.json 
| tail -n 16" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Exporting entire samples" ] }, { "cell_type": "code", "execution_count": 38, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 100% |████████████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [835.9ms elapsed, 0s remaining, 119.6 samples/s] \n" ] } ], "source": [ "# Create a view\n", "view = detection_dataset.take(100)\n", "\n", "# Export entire samples\n", "view.export(\n", " \"/tmp/fiftyone-examples/export-fiftyone-dataset\",\n", " fo.types.FiftyOneDataset\n", ")" ] }, { "cell_type": "code", "execution_count": 39, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "total 496\r\n", "drwxr-xr-x 5 Brian wheel 160B Oct 25 13:43 \u001b[34m.\u001b[m\u001b[m\r\n", "drwxr-xr-x 7 Brian wheel 224B Oct 25 13:43 \u001b[34m..\u001b[m\u001b[m\r\n", "drwxr-xr-x 102 Brian wheel 3.2K Oct 25 13:43 \u001b[34mdata\u001b[m\u001b[m\r\n", "-rw-r--r-- 1 Brian wheel 3.2K Oct 25 13:43 metadata.json\r\n", "-rw-r--r-- 1 Brian wheel 242K Oct 25 13:43 samples.json\r\n" ] } ], "source": [ "!ls -lah /tmp/fiftyone-examples/export-fiftyone-dataset" ] }, { "cell_type": "code", "execution_count": 40, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "total 24944\r\n", "drwxr-xr-x 102 Brian wheel 3.2K Oct 25 13:43 .\r\n", "drwxr-xr-x 5 Brian wheel 160B Oct 25 13:43 ..\r\n", "-rw-r--r-- 1 Brian wheel 127K Oct 25 13:43 000026.jpg\r\n", "-rw-r--r-- 1 Brian wheel 90K Oct 25 13:43 000067.jpg\r\n", "-rw-r--r-- 1 Brian wheel 41K Oct 25 13:43 000107.jpg\r\n", "-rw-r--r-- 1 Brian wheel 128K Oct 25 13:43 000111.jpg\r\n", "-rw-r--r-- 1 Brian wheel 96K Oct 25 13:43 000154.jpg\r\n", "-rw-r--r-- 1 Brian wheel 101K Oct 25 13:43 000172.jpg\r\n", "-rw-r--r-- 1 Brian wheel 185K Oct 25 13:43 000272.jpg\r\n" ] } ], "source": [ "!ls -lah 
/tmp/fiftyone-examples/export-fiftyone-dataset/data | head" ] }, { "cell_type": "code", "execution_count": 45, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{\r\n", " \"name\": \"2020.10.25.13.23.54-view\",\r\n", " \"media_type\": \"image\",\r\n", " \"sample_fields\": {\r\n", " \"media_type\": \"fiftyone.core.fields.StringField\",\r\n", " \"filepath\": \"fiftyone.core.fields.StringField\",\r\n", " \"tags\": \"fiftyone.core.fields.ListField(fiftyone.core.fields.StringField)\",\r\n", " \"metadata\": \"fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.metadata.Metadata)\",\r\n", " \"ground_truth_detections\": \"fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.labels.Detections)\"\r\n", " },\r\n", " \"info\": {\r\n", " \"task_labels\": [\r\n", " {\r\n", " \"name\": \"airplane\",\r\n", " \"attributes\": []\r\n", " },\r\n" ] } ], "source": [ "!python -m json.tool /tmp/fiftyone-examples/export-fiftyone-dataset/metadata.json | head -n 16" ] }, { "cell_type": "code", "execution_count": 51, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " {\r\n", " \"_id\": {\r\n", " \"$oid\": \"5f95b4abad3e1adb14658d5f\"\r\n", " },\r\n", " \"_cls\": \"Detection\",\r\n", " \"attributes\": {\r\n", " \"area\": {\r\n", " \"_cls\": \"NumericAttribute\",\r\n", " \"value\": 534.3859500000002\r\n", " },\r\n", " \"iscrowd\": {\r\n", " \"_cls\": \"NumericAttribute\",\r\n", " \"value\": 0.0\r\n", " }\r\n", " },\r\n", " \"label\": \"person\",\r\n", " \"bounding_box\": [\r\n", " 0.5515625,\r\n", " 0.42516268980477223,\r\n", " 0.04375,\r\n", " 0.06073752711496746\r\n", " ]\r\n", " }\r\n", " ]\r\n", " }\r\n", " }\r\n", " ]\r\n", "}\r\n" ] } ], "source": [ "!python -m json.tool /tmp/fiftyone-examples/export-fiftyone-dataset/samples.json | tail -n 28" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Cleanup" ] }, { "cell_type": "code", "execution_count": 52, "metadata": {}, "outputs": [], "source": [ 
"!rm -rf /tmp/fiftyone-examples" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.13" }, "nbsphinx": { "execute": "never" } }, "nbformat": 4, "nbformat_minor": 4 }