Creating a Dataset

Follow these steps to create a dataset in Data Labeling.

    1. Open the navigation menu and click Analytics and AI. Under Machine Learning, click Data Labeling.
    2. Click Datasets.
    3. Click Create dataset.
    4. On the Add dataset details page, populate the fields as follows:
      • Name: Give the dataset a suitable name.
      • Description: (Optional) Give the dataset a relevant description that you can use to help search for it.
      • Labeling instructions: (Optional) Enter instructions and directions for the team labeling the data.
      • Dataset format: Click Images, Text, or Documents, depending on whether you want to label images, pieces of text, or documents.
      • File type: If you select Text as the dataset format, this field is displayed. Select TXT or CSV, depending on whether you want to label a text file or a CSV file.
      • Annotation class: Select how to annotate the images, text, or documents.
        • Single labels: Categorizes images, text, or documents into one class.
        • Multiple Labels: Categorizes images, text, or documents into one or more classes.
        • Object Detection: For images only. Draws bounding boxes around objects in the images.
        • Entity Extraction: For text only. Highlights spans of text and labels them with one or more classes.
        • Key Value: For documents only. Uses Document Understanding's Optical Character Recognition (OCR) to identify and extract information from documents.
      • Tags: (Optional) To apply tags to the dataset, select a tag namespace (for defined tags), and then specify a tag key and value. Add more tags as needed. For more information about tagging, see Overview of Tagging.
      Note

      The system generates two tags, CreatedBy and CreatedOn, when you create the dataset.
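The Dataset format, File type, and Annotation class fields constrain one another: Object Detection is for images only, Entity Extraction for text only, Key Value for documents only, and File type applies only to Text datasets. If you plan datasets from a script, a check like the following Python sketch can catch invalid combinations before you fill in the form. It encodes only the rules listed in step 4; the strings such as "Object detection" are illustrative labels chosen for this example, not Console identifiers or API enum values.

```python
# Annotation classes allowed for each dataset format, per the field
# descriptions in step 4. The strings are illustrative labels only.
ALLOWED_CLASSES = {
    "Images": {"Single label", "Multiple labels", "Object detection"},
    "Text": {"Single label", "Multiple labels", "Entity extraction"},
    "Documents": {"Single label", "Multiple labels", "Key value"},
}

# File types offered only when the dataset format is Text.
TEXT_FILE_TYPES = {"TXT", "CSV"}


def check_dataset_choices(dataset_format, annotation_class, file_type=None):
    """Return a list of problems with the chosen combination (empty if valid)."""
    allowed = ALLOWED_CLASSES.get(dataset_format)
    if allowed is None:
        return [f"unknown dataset format: {dataset_format!r}"]
    problems = []
    if annotation_class not in allowed:
        problems.append(
            f"{annotation_class!r} is not available for {dataset_format} datasets"
        )
    if dataset_format == "Text":
        if file_type not in TEXT_FILE_TYPES:
            problems.append("Text datasets need a file type of TXT or CSV")
    elif file_type is not None:
        problems.append("File type applies only to Text datasets")
    return problems


if __name__ == "__main__":
    print(check_dataset_choices("Images", "Object detection"))  # prints []
    print(check_dataset_choices("Text", "Key value", "CSV"))
```

Returning a list of problems, rather than raising on the first one, lets a script report every mistake in a planned dataset at once.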
    5. Click Next.
      On the Add files and labels page, specify whether to upload the files for the dataset to Object Storage (go to step 6) or to use files that are already in Object Storage (skip to step 7).
    6. To upload the files for the dataset to Object Storage, click Upload local files and follow these steps:
      Note

      You can load no more than 100 local files at a time in the Console. The number of files selected is displayed. To load more files at a time, either load them into Object Storage before creating the dataset, or use the CLI or SDK.
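To stage more than 100 files, you can upload them to a bucket with an SDK first and then choose Select from Object Storage in step 7. The following is a minimal sketch, assuming the OCI Python SDK (`oci` package) is installed and configured in `~/.oci/config`. It uses `ObjectStorageClient.put_object`, and it prepends the same kind of prefix that the Prefix field adds. The bucket name `my-labeling-bucket`, the `./images` directory, and the `cats/` prefix are hypothetical placeholders.

```python
from pathlib import Path


def object_names(local_dir, prefix=""):
    """Map each file under local_dir to the object name it gets in the bucket."""
    root = Path(local_dir)
    return {
        str(p): prefix + p.relative_to(root).as_posix()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def upload_dir(client, namespace, bucket, local_dir, prefix=""):
    """Upload every file under local_dir with client.put_object; return object names."""
    uploaded = []
    for path, name in object_names(local_dir, prefix).items():
        with open(path, "rb") as body:
            client.put_object(namespace, bucket, name, body)
        uploaded.append(name)
    return uploaded


if __name__ == "__main__":
    import oci  # OCI Python SDK: pip install oci

    config = oci.config.from_file()  # reads the DEFAULT profile in ~/.oci/config
    client = oci.object_storage.ObjectStorageClient(config)
    namespace = client.get_namespace().data
    # Hypothetical bucket, directory, and prefix; replace with your own.
    names = upload_dir(client, namespace, "my-labeling-bucket", "./images", prefix="cats/")
    print(f"uploaded {len(names)} files")
```

Passing the client in as a parameter keeps the upload logic testable without network access; any object with a compatible `put_object` method works.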
      1. Under Object Storage location, specify the Object Storage destination (bucket) into which to load the local files:
        • Object Storage URL: A read-only field, already populated.
        • Compartment: Select the compartment that contains the bucket.
        • Namespace: Automatically populated based on the compartment selected.
        • Bucket: Select the bucket from the list. If the list is long, you can choose to view all buckets, which opens a panel listing all the available buckets. If you need to create a bucket, click the link in the tooltip next to the Bucket label, which takes you to the Buckets list page in the Object Storage service. See Creating a Bucket.
        • (Optional) Prefix: Enter a prefix string to add to the start of the files' names or paths.
      2. If the files to load are in CSV format, provide the following information under Delimiter:
        • Column delimiter: Select the type of delimiter for columns. Comma is the default. If you choose Custom, enter the delimiter in Custom column delimiter.
        • Line delimiter: (Optional) Select this check box and then enter a line delimiter in Custom line delimiter. If you don’t enter a value, the delimiter is detected from the CSV file.
        • Escape character: (Optional) Select this check box and then select an escape character. If you choose Custom, enter the character in Custom escape character. If you don’t enter a value, then none of the text is escaped.
      3. Under Selected files, drag or select the files that you want to load to the bucket.
        Note

        All the files must be UTF-8 encoded and have the same column headers and indexes. If not, the dataset goes into the Needs Attention state. See Supported File Formats for the list of allowed file formats.
      4. Select a file to display a preview of its contents.
        Note

        For CSV files, only the first five columns and rows are displayed.
      5. (For CSV files only) For the column that you want to label, select its column name. If the column has no name, the index number is displayed instead.
      6. Under Add labels, enter the labels to use to annotate the dataset. After entering each label, press Enter.
      7. Click Next and skip to step 8.
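The note in sub-step 3 says a dataset goes into the Needs Attention state if the CSV files are not all UTF-8 encoded or do not share the same column headers and indexes. Both conditions can be checked locally before upload. The following Python sketch uses only the standard library; the function name is hypothetical, and it treats "same headers and indexes" as "identical header row, in the same order". A UTF-8 byte-order mark, if present, stays attached to the first header and therefore shows up as a mismatch.

```python
import csv
import io


def check_csv_files(files, delimiter=","):
    """Check that every CSV is UTF-8 and shares the first file's header row.

    `files` maps a file name to its raw bytes. Returns a list of problems.
    """
    problems = []
    expected = None
    for name, raw in files.items():
        try:
            text = raw.decode("utf-8")
        except UnicodeDecodeError as err:
            problems.append(f"{name}: not valid UTF-8 ({err.reason})")
            continue
        # The header is the first parsed row; an empty file yields [].
        header = next(csv.reader(io.StringIO(text), delimiter=delimiter), [])
        if expected is None:
            expected = header
        elif header != expected:
            problems.append(f"{name}: header {header} differs from {expected}")
    return problems


if __name__ == "__main__":
    import sys
    from pathlib import Path

    paths = [Path(p) for p in sys.argv[1:]]
    for problem in check_csv_files({p.name: p.read_bytes() for p in paths}):
        print(problem)
```

For example, `python check_csv.py data/*.csv` (hypothetical script name) prints one line per file that would cause trouble and nothing if all files pass.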
    7. To load files that already exist in an Object Storage bucket, click Select from Object Storage and follow these steps:
      1. Under Object Storage location, specify the Object Storage bucket that contains the files to use for the dataset:
        • Object Storage URL: A read-only field, already populated.
        • Compartment: Select the compartment that contains the bucket.
        • Namespace: Automatically populated based on the compartment selected.
        • Bucket: Select the bucket from the list. If the list is long, you can choose to view all buckets, which opens a panel listing all the available buckets. If you need to create a bucket, click the link in the tooltip next to the Bucket label, which takes you to the Buckets list page in the Object Storage service. See Creating a Bucket.
        • (Optional) Prefix: Enter a prefix string to add to the start of the files' names or paths.
        The files are listed under Selected files. See Supported File Formats for the list of allowed file formats.
      2. If the files are in CSV format, provide the following information under Delimiter:
        • Column delimiter: Select the type of delimiter for columns. Comma is the default. If you choose Custom, enter the delimiter in Custom column delimiter.
        • Line delimiter: (Optional) Select this check box and then enter a line delimiter in Custom line delimiter. If you don’t enter a value, the delimiter is detected from the CSV file.
        • Escape character: (Optional) Select this check box and then select an escape character. If you choose Custom, enter the character in Custom escape character. If you don’t enter a value, then none of the text is escaped.
      3. Under Selected files, select a file to display a preview of its contents.
        Note

        Only the first five columns and rows are displayed for CSV files.
      4. (For CSV files only) For the column that you want to label, select its column name. If the column has no name, the index number is displayed instead.
        Note

        All the files must be UTF-8 encoded and have the same column headers and indexes. If not, the dataset goes into the Needs Attention state. See Supported File Formats for the list of allowed file formats.
      5. Under Add labels, enter the labels to use to annotate the dataset. After entering each label, press Enter.
      6. Click Next.
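The Delimiter settings and the column selection in steps 6 and 7 amount to parsing each CSV with a given column delimiter, line delimiter, and escape character, and then picking one column by name or index. The following Python sketch mimics that locally so you can preview a file before creating the dataset. The function names are hypothetical; column indexes are zero-based here (the Console may number them differently); the preview counts the header as one of the five rows; and Python's `csv` module accepts only a single-character column delimiter. Splitting on a custom line delimiter does not handle that delimiter inside quoted fields.

```python
import csv


def read_rows(text, column_delimiter=",", line_delimiter=None, escape_character=None):
    """Parse CSV text using the three Delimiter settings."""
    # With no custom line delimiter, fall back to standard line splitting.
    # This sketch does not handle line delimiters inside quoted fields.
    lines = text.split(line_delimiter) if line_delimiter else text.splitlines()
    lines = [line for line in lines if line]  # drop empty records
    reader = csv.reader(lines, delimiter=column_delimiter, escapechar=escape_character)
    return list(reader)


def preview(rows, size=5):
    """First `size` rows and columns (the header counts as a row here)."""
    return [row[:size] for row in rows[:size]]


def column_values(rows, column):
    """Values of one column, chosen by header name or by zero-based index."""
    header = rows[0]
    index = header.index(column) if isinstance(column, str) else column
    return [row[index] for row in rows[1:] if index < len(row)]


if __name__ == "__main__":
    text = "id|text\n1|hello\\|world\n2|bye\n"
    rows = read_rows(text, column_delimiter="|", escape_character="\\")
    print(preview(rows))
    print(column_values(rows, "text"))
```

In the usage example the escape character turns `\|` into a literal `|` inside a field, which is the situation the Escape character setting is meant for.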
    8. On the Review page, verify the information that you entered. If you need to go back and change the dataset details or any other values, click Edit.
    9. To create the dataset now, click Create.
      The records are generated when the dataset is created, and the dataset state changes to Updating while they are generated. The files used appear on the dataset details page only after the records have been created.
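Because the dataset stays in the Updating state while records are generated, a script that creates datasets usually has to wait before using them. The following Python sketch polls a caller-supplied function until the state leaves the in-progress set. `get_state` is a hypothetical callable, for example one that wraps the GetDataset operation and returns the lifecycle state; the state names `CREATING` and `ACTIVE` are assumptions, so check the API reference for the authoritative list.

```python
import time

# Assumed in-progress lifecycle states; verify against the API reference.
IN_PROGRESS = {"CREATING", "UPDATING"}


def wait_for_dataset(get_state, timeout=600.0, interval=10.0,
                     sleep=time.sleep, clock=time.monotonic):
    """Poll get_state() until the dataset leaves CREATING/UPDATING.

    Returns the final state (for example "ACTIVE" or "NEEDS_ATTENTION").
    Raises TimeoutError if the dataset is still in progress after `timeout` seconds.
    """
    deadline = clock() + timeout
    while True:
        state = get_state()
        if state not in IN_PROGRESS:
            return state
        if clock() >= deadline:
            raise TimeoutError(f"dataset still {state} after {timeout} s")
        sleep(interval)
```

The `sleep` and `clock` parameters exist so the loop can be tested without real delays; in normal use you pass only `get_state` and, if needed, a longer `timeout` for large datasets.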
    10. To create the dataset later using Resource Manager and Terraform, click Save as stack to save the resource definition as a Terraform configuration.
      For information about saving stacks from resource definitions, see Creating a Stack from a Resource Creation Page.
  • Use the dataset create command and required parameters to create a dataset:
    oci data-labeling-service dataset create [OPTIONS]
    For a complete list of flags and variable options for CLI commands, see the CLI Command Reference.
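Because the create command takes several structured parameters, one workable pattern is to let the CLI print a JSON template of all its parameters with the general `--generate-full-command-json-input` option, fill it in, and pass it back with `--from-json`. The following Python sketch wraps those two invocations; it assumes the `oci` CLI is installed and on `PATH`, and the file name `dataset.json` and script name are hypothetical. The keys to fill in come from the template the CLI prints, not from this sketch.

```python
import subprocess
import sys

BASE = ["oci", "data-labeling-service", "dataset", "create"]


def template_command():
    """Command that prints a JSON template of every `dataset create` parameter."""
    return BASE + ["--generate-full-command-json-input"]


def create_command(input_path):
    """Command that creates the dataset from a filled-in JSON input file."""
    return BASE + ["--from-json", f"file://{input_path}"]


if __name__ == "__main__":
    # Usage (hypothetical script name):
    #   python dataset_cli.py template > dataset.json   then edit dataset.json
    #   python dataset_cli.py create dataset.json
    if sys.argv[1] == "template":
        cmd = template_command()
    else:
        cmd = create_command(sys.argv[2])
    subprocess.run(cmd, check=True)
```

Keeping the parameters in a JSON file avoids quoting nested JSON on the command line and leaves a record of exactly what was submitted.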
  • Run the CreateDataset operation to create a dataset.
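The Console fields map onto the request body that the CreateDataset operation accepts. The following Python sketch builds such a body as a plain dictionary. The field names (`compartmentId`, `annotationFormat`, `datasetSourceDetails`, `labelSet`, and so on) and the enum values in the usage example are assumptions modeled on the CreateDatasetDetails resource, so confirm them in the API reference. The sketch covers only an Object Storage source and omits the extra format details that CSV text datasets need; every OCID and name is a placeholder.

```python
import json


def create_dataset_body(compartment_id, display_name, annotation_format,
                        format_type, namespace, bucket, labels, prefix=None,
                        description=None, labeling_instructions=None):
    """Build a CreateDataset request body from Console-style values.

    Field names are assumptions modeled on CreateDatasetDetails.
    """
    source = {"sourceType": "OBJECT_STORAGE", "namespace": namespace, "bucket": bucket}
    if prefix:
        source["prefix"] = prefix
    body = {
        "compartmentId": compartment_id,
        "displayName": display_name,
        "annotationFormat": annotation_format,
        "datasetFormatDetails": {"formatType": format_type},
        "datasetSourceDetails": source,
        # One entry per label entered under Add labels.
        "labelSet": {"items": [{"name": label} for label in labels]},
    }
    # Optional Console fields are included only when given.
    optional = {"description": description, "labelingInstructions": labeling_instructions}
    body.update({k: v for k, v in optional.items() if v is not None})
    return body


if __name__ == "__main__":
    body = create_dataset_body(
        compartment_id="ocid1.compartment.oc1..example",  # placeholder OCID
        display_name="pet-images",
        annotation_format="SINGLE_LABEL",  # assumed enum value
        format_type="IMAGE",               # assumed enum value
        namespace="my-namespace",          # placeholder
        bucket="my-labeling-bucket",       # placeholder
        labels=["cat", "dog"],
    )
    print(json.dumps(body, indent=2))
```

Building the body as plain data keeps it independent of any particular SDK, so the same dictionary can be logged, reviewed, or translated into SDK model objects.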