Importing Datasets

Importing datasets lets you reuse datasets within the same tenancy, or merge and replace content, without creating a dataset from scratch.

Note

From a local directory, you can import a maximum of 201 files in a dataset, and the dataset can be no more than 4.9 GB in size. If the number of files or the dataset size exceeds these values, upload the folder to Object Storage.

The following formats are supported:
Supported Dataset Formats and Structures

Format: JSONL
  • Metadata: Data Labeling JSONL consolidated, Compact JSONL
  • Record: JPEG, JPG
Dataset Type: Image
Annotation Type: Single label, Multi-label, Object detection
File Structure:
prefix/
├── *.jsonl
├── image-1.jpg
├── image-2.jpg
└── ...
Maximum File Count and File Size: Metadata: 1 file, 15 MB

Format: JSONL
  • Metadata: Data Labeling JSONL consolidated, Compact JSONL
  • Record: TXT
Dataset Type: Text
Annotation Type: Single label, Multi-label, NER
File Structure:
prefix/
├── *.jsonl
├── Textfile-1.txt
├── Textfile-2.txt
└── ...
Maximum File Count and File Size: Metadata: 1 file, 15 MB

Format: JSONL
  • Metadata: Data Labeling JSONL consolidated, Compact JSONL
  • Record: PDF, TIF, TIFF
Dataset Type: Document
Annotation Type: Single label, Multi-label
File Structure:
prefix/
├── *.jsonl
├── document-1.pdf
├── document-2.pdf
└── ...
Maximum File Count and File Size: Metadata: 1 file, 15 MB

Format: COCO
  • Metadata: JSON
  • Record: JPEG, JPG
Dataset Type: Image
Annotation Type: Object detection
File Structure:
prefix/
├── *.json
├── image-1.jpg
├── image-2.jpg
└── ...
Maximum File Count and File Size: Metadata: 1 file, 9 MB

Format: YOLO v5
  • Metadata: YAML, YML
  • Record: Image (JPEG, JPG, TIFF), Label (TXT)
Dataset Type: Image
Annotation Type: Object detection
File Structure:
prefix/
├── *.yml
├── train
│   ├── images
│   │   ├── image-1.jpg
│   │   ├── image-2.jpg
│   │   └── ...
│   ├── labels
│   │   ├── image-1.txt
│   │   ├── image-2.txt
│   │   └── ...
Maximum File Count and File Size: Metadata: 1 file, 5 MB

Format: PASCAL VOC
  • Metadata: XML
  • Record: JPEG, JPG
Dataset Type: Image
Annotation Type: Object detection
File Structure:
prefix/
├── annotation1.xml
├── annotation2.xml
├── annotation3.xml
├── ...
├── image-1.jpg
├── image-2.jpg
├── image-3.jpg
└── ...
Maximum File Count and File Size: Metadata: 100 files, 5 MB each

Format: spaCy
Dataset Type: Text
Annotation Type: NER
File Structure:
prefix/
└── dataset-file.json
Maximum File Count and File Size: JSON: 1 file, 210 MB

Format: CoNLL 2003
Dataset Type: Text
Annotation Type: NER
File Structure:
prefix/
└── dataset-file.conll
Maximum File Count and File Size: CONLL: 1 file, 75 MB

For more information on supported file types and sizes, see Supported File Formats.
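
As a rough illustration of the local-directory limits in the note at the top of this page, the following Python sketch counts the files in a folder and compares the totals with those limits. The function names are hypothetical, and treating 1 GB as 10**9 bytes is an assumption; the service might measure size differently.

```python
import os

# Limits quoted in the note at the top of this page.
# Assumption: 1 GB is taken as 10**9 bytes.
MAX_LOCAL_FILES = 201
MAX_LOCAL_BYTES = 4_900_000_000


def summarize_folder(folder):
    """Return (file_count, total_bytes) for every file under folder."""
    count, total = 0, 0
    for root, _dirs, files in os.walk(folder):
        for name in files:
            count += 1
            total += os.path.getsize(os.path.join(root, name))
    return count, total


def fits_local_import(file_count, total_bytes):
    """True when both values are within the local-directory limits."""
    return file_count <= MAX_LOCAL_FILES and total_bytes <= MAX_LOCAL_BYTES


if __name__ == "__main__":
    count, size = summarize_folder(".")
    if fits_local_import(count, size):
        print("Folder is within the local import limits.")
    else:
        print("Folder exceeds the limits; consider uploading it to Object Storage.")
```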

Sample Metadata File Contents

The following are sample file contents for each of the metadata file options.

Data Labeling JSONL Consolidated
{"id":"<Dataset OCID>",
"compartmentId":"<Compartment OCID>",
"displayName":"<Dataset Name>",
"description":"<Dataset Description>",
"labelsSet":[{"name":"<Label Name>"},{"name":"<Label Name>"},...],
"annotationFormat":"<SINGLE_LABEL/MULTI_LABEL/BOUNDING_BOX/ENTITY_EXTRACTION>",
"datasetSourceDetails":{"namespace":"<Namespace>","bucket":"<Bucket>"},
"datasetFormatDetails":{"formatType":"<IMAGE/TEXT/DOCUMENT>"}
}
 
{"id":"<Record OCID>",
"timeCreated":"<Created datetime>",
"sourceDetails":{"sourceType":"OBJECT_STORAGE","path":"<Path of record file>"},
"annotations":[{"id":"<Annotation OCID>",
"timeCreated":"<Created datetime>",
"createdBy":"<User OCID>",
"entities":[{"entityType":"<GENERIC/IMAGEOBJECTSELECTION...>",
"labels":[{"label_name":"<Label Name>"},{"label_name":"<Label Name>"},...],
"boundingPolygon<IN CASE OF BOUNDING_BOX>":{"normalizedVertices":[{"x":"0.1752872","y":"0.18566811"},...]}}]}]
}
 
...other record objects
Compact JSONL
{"labelsSet":[{"name":"<Label Name>"},
{"name":"<Label Name>"},...],
"annotationFormat":"<SINGLE_LABEL/MULTI_LABEL/ENTITY_EXTRACTION>",
"datasetFormatDetails":{"formatType":"TEXT"}
}
 
{"sourceDetails":{"path":"<Path of text record file>"},
"annotations":[{"entities":[{"entityType":"GENERIC","labels":[{"label_name":"<Label Name>"},...]}]}]
}
 
...other record objects 
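
A Compact JSONL file for a text dataset could be generated along the following lines. This is a minimal sketch: the function name and its parameters are hypothetical, and it assumes that the header object and each record object each occupy one line of the file (the sample above is wrapped for readability).

```python
import json


def build_compact_jsonl(label_names, annotation_format, records):
    """Return Compact JSONL text: one header line, then one line per record.

    records is a list of (path, labels) pairs. These parameter names are
    illustrative only and are not part of the format.
    """
    header = {
        "labelsSet": [{"name": name} for name in label_names],
        "annotationFormat": annotation_format,
        "datasetFormatDetails": {"formatType": "TEXT"},
    }
    lines = [json.dumps(header)]
    for path, labels in records:
        record = {
            "sourceDetails": {"path": path},
            "annotations": [{
                "entities": [{
                    "entityType": "GENERIC",
                    "labels": [{"label_name": label} for label in labels],
                }]
            }],
        }
        lines.append(json.dumps(record))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_compact_jsonl(
        ["Label1", "Label2"],
        "SINGLE_LABEL",
        [("Textfile-1.txt", ["Label1"]), ("Textfile-2.txt", ["Label2"])],
    ))
```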
COCO
{
  "info": {
    "year": "<Year>",
    "version": "1",
    "description": "<Dataset description>",
    "contributor": "",
    "url": "<URL>",
    "date_created": "<Created datetime>"
  },
  "licenses": [
    {
      "id": 1,
      "url": "",
      "name": "Unknown"
    }
  ],
  "categories": [
    {
      "id": 0,
      "name": "animals",
      "supercategory": "none"
    },
    {
      "id": 1,
      "name": "cat",
      "supercategory": "animals"
    },
    {
      "id": 2,
      "name": "dog",
      "supercategory": "animals"
    }
  ],
  "images": [
    {
      "id": 1,
      "license": 1,
      "file_name": "<Record file path>",
      "height": 500,
      "width": 400,
      "date_captured": "<Captured datetime>"
    },
    ...
  ],
  "annotations": [
    {
      "id": 1,
      "image_id": 1,
      "category_id": 1,
      "bbox": [84, 44, 282.5, 143],
      "area": 40397.5,
      "segmentation": [],
      "iscrowd": 0
    },
    ...
  ]
}
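
In the COCO format, bbox holds [x, y, width, height], with x and y marking the top-left corner in pixels. The sketch below uses hypothetical helper names to convert the sample box to corner coordinates and to confirm that the sample area equals width times height. For annotations that carry segmentation masks, area can differ from the box area.

```python
def bbox_to_corners(bbox):
    """Convert a COCO [x, y, width, height] box to (xmin, ymin, xmax, ymax)."""
    x, y, width, height = bbox
    return (x, y, x + width, y + height)


def bbox_area(bbox):
    """Area of the box itself (width * height)."""
    _x, _y, width, height = bbox
    return width * height


# Values copied from the sample annotation above.
annotation = {"bbox": [84, 44, 282.5, 143], "area": 40397.5}

print(bbox_to_corners(annotation["bbox"]))
print(bbox_area(annotation["bbox"]) == annotation["area"])
```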
YOLO v5
train: ../train/images
nc: 4
names: ["Label1", "Label2", "Label3", "Label4"]
PASCAL VOC
<annotation>
    <folder/>
    <filename>recordFile.jpg</filename>
    <path>/n/Namespace/b/Bucket/o/recordFile.jpg</path>
    <source>
        <database>Unknown</database>
    </source>
    <size>
        <width>3800</width>
        <height>2534</height>
        <depth>3</depth>
    </size>
    <segmented>0</segmented>
    <object>
        <name>LabelName</name>
        <pose>Unspecified</pose>
        <truncated>0</truncated>
        <difficult>0</difficult>
        <occluded>0</occluded>
        <bndbox>
            <xmin>186.94249</xmin>
            <xmax>1878.6903</xmax>
            <ymin>330.67606</ymin>
            <ymax>1396.7037</ymax>
        </bndbox>
    </object>
    <object>....</object>
    ...
</annotation>
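
To inspect a PASCAL VOC annotation before importing it, the bounding boxes can be read with Python's standard xml.etree.ElementTree module. This is a sketch only: the function name is hypothetical, and it assumes every bndbox element contains all four coordinate tags.

```python
import xml.etree.ElementTree as ET


def read_voc_boxes(xml_text):
    """Return (filename, [(label, xmin, ymin, xmax, ymax), ...]) for one annotation."""
    root = ET.fromstring(xml_text)
    filename = root.findtext("filename")
    boxes = []
    for obj in root.findall("object"):
        box = obj.find("bndbox")
        if box is None:
            continue  # skip objects that carry no bounding box
        coords = tuple(
            float(box.findtext(tag)) for tag in ("xmin", "ymin", "xmax", "ymax")
        )
        boxes.append((obj.findtext("name"),) + coords)
    return filename, boxes


if __name__ == "__main__":
    sample = (
        "<annotation><filename>recordFile.jpg</filename>"
        "<object><name>LabelName</name><bndbox>"
        "<xmin>186.94249</xmin><xmax>1878.6903</xmax>"
        "<ymin>330.67606</ymin><ymax>1396.7037</ymax>"
        "</bndbox></object></annotation>"
    )
    print(read_voc_boxes(sample))
```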
spaCy
Example 1:
[
  {
    "content": "<Text Content>",
    "entities": [
       {
        "start": 0,
        "end": 29,
        "labelName": "<Label Name>"
      },
      {
        "start": 65,
        "end": 86,
        "labelName": "<Label Name>"
      },
      {
        "start": 80,
        "end": 104,
        "labelName": "<Label Name>"
      },
      ...
    ]
  },
  ...
]
Example 2:
[
  {
    "text": "<Text Content>",
    "entities": [
      [0, 12, "<Label Name>"],
      [78, 91, "<Label Name>"],
      ...
    ]
  },
  ...
]
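
The two spaCy layouts above carry the same information, so one can be derived from the other. The following sketch, with a hypothetical function name, converts documents in the Example 1 layout (content, start, end, labelName) into the Example 2 layout (text with [start, end, label] triples).

```python
def to_offset_triples(documents):
    """Convert Example 1 style documents to the Example 2 layout."""
    converted = []
    for doc in documents:
        triples = [
            [entity["start"], entity["end"], entity["labelName"]]
            for entity in doc["entities"]
        ]
        converted.append({"text": doc["content"], "entities": triples})
    return converted


if __name__ == "__main__":
    docs = [{
        "content": "Sample text content",
        "entities": [{"start": 0, "end": 6, "labelName": "Label1"}],
    }]
    print(to_offset_triples(docs))
```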
CoNLL 2003
-DOCSTART- -X- O
This -X- _ B-Label1
is -X- _ I-Label1
sample -X- _ I-Label1
data, -X- _ I-Label1
and -X- _ O
new -X- _ O
data -X- _ O
  
information -X- _ O
new -X- _ B-Label1
sample -X- _ I-Label1
Data -X- _ O
...
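
A file in this layout can be checked before import with a small parser. The sketch below is illustrative only. It assumes that the first column is the token, the last column is the tag, blank lines separate sentences, and the -DOCSTART- line marks a document boundary. The function name is hypothetical.

```python
def parse_conll(text):
    """Split CoNLL-style text into sentences of (token, tag) pairs."""
    sentences, current = [], []
    for line in text.splitlines():
        parts = line.split()
        if not parts or parts[0] == "-DOCSTART-":
            # Blank line or document marker: close the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        # Assumption: token is the first column, tag is the last column.
        current.append((parts[0], parts[-1]))
    if current:
        sentences.append(current)
    return sentences


if __name__ == "__main__":
    sample = (
        "-DOCSTART- -X- O\n"
        "This -X- _ B-Label1\n"
        "is -X- _ I-Label1\n"
        "\n"
        "new -X- _ O\n"
    )
    for sentence in parse_conll(sample):
        print(sentence)
```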