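Preparing Model Metadata

Model metadata is optional, though recommended.

Model Provenance Metadata

You can document the provenance of a model. This is optional. The following table lists the supported model provenance metadata: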


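Metadata	Description
`git_branch`	Branch of the Git repository.
`git_commit`	Commit ID.
`repository_url`	URL of the remote Git repository.
`script_dir`	Local path to the artifact directory.
`training_id`	OCID of the resource used to train the model, either a notebook session or a job run. You can use the following environment variable when you save a model with the OCI SDK: `NB_SESSION_OCID`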

Example

from oci.data_science.models import CreateModelProvenanceDetails

provenance_details = CreateModelProvenanceDetails(
    repository_url="EXAMPLE-repositoryUrl-Value",
    git_branch="EXAMPLE-gitBranch-Value",
    git_commit="EXAMPLE-gitCommit-Value",
    script_dir="EXAMPLE-scriptDir-Value",
    # OCID of the ML job run or notebook session on which this model was trained
    training_id="<<Notebooksession or ML Job Run OCID>>",
)
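
The `training_id` value does not have to be hard-coded, because the table above names `NB_SESSION_OCID` as an environment variable that is available when saving a model. The following is a minimal sketch, assuming the code runs in a notebook session where that variable is set. The helper name `build_provenance_kwargs` is hypothetical and is not part of the OCI SDK.

```python
import os


def build_provenance_kwargs(repository_url, git_branch, git_commit, script_dir):
    """Collect provenance fields; training_id is read from NB_SESSION_OCID."""
    return {
        "repository_url": repository_url,
        "git_branch": git_branch,
        "git_commit": git_commit,
        "script_dir": script_dir,
        # None when the variable is absent, for example outside a notebook session.
        "training_id": os.environ.get("NB_SESSION_OCID"),
    }


kwargs = build_provenance_kwargs(
    repository_url="EXAMPLE-repositoryUrl-Value",
    git_branch="EXAMPLE-gitBranch-Value",
    git_commit="EXAMPLE-gitCommit-Value",
    script_dir="EXAMPLE-scriptDir-Value",
)
print(kwargs["training_id"])
```

The resulting dictionary could then be unpacked into the constructor, as in `CreateModelProvenanceDetails(**kwargs)`.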

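Model Taxonomy Metadata

You can document the model taxonomy. This is optional.

The metadata fields associated with the model taxonomy let you describe the machine learning use case and framework behind a model. The defined metadata tags are the lists of allowed values for the use case type and framework in defined metadata, and for the category values in custom metadata.

Preset Model Taxonomy

The following table lists the supported model taxonomy metadata: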


Metadata	Description
`UseCaseType`	Describes the machine learning use case associated with the model, using one of the following values: `binary_classification`, `regression`, `multinomial_classification`, `clustering`, `recommender`, `dimensionality_reduction/representation`, `time_series_forecasting`, `anomaly_detection`, `topic_modeling`, `ner`, `sentiment_analysis`, `image_classification`, `object_localization`, `other`
`Framework`	The machine learning framework associated with the model, using one of the following values: `scikit-learn`, `xgboost`, `tensorflow`, `pytorch`, `mxnet`, `keras`, `lightGBM`, `pymc3`, `pyOD`, `spacy`, `prophet`, `sktime`, `statsmodels`, `cuml`, `oracle_automl`, `h2o`, `transformers`, `nltk`, `emcee`, `pystan`, `bert`, `gensim`, `flair`, `word2vec`, `ensemble` (more than one library), `other`
`FrameworkVersion`	The machine learning framework version. This is a free-text value. For example, `PyTorch 1.9`.
`Algorithm`	The algorithm or model instance class. This is a free-text value. For example, `CART algorithm`.
`Hyperparameters`	The hyperparameters of the model object. This is in JSON format.
`ArtifactTestResults`	The JSON output of the artifact tests run on the client side.

Example

This example shows how to document the model taxonomy by capturing each key-value pair in a list of Metadata() objects:

from oci.data_science.models import Metadata

# Create the list of defined metadata around the model taxonomy:
defined_metadata_list = [
    Metadata(key="UseCaseType", value="image_classification"),
    Metadata(key="Framework", value="keras"),
    Metadata(key="FrameworkVersion", value="0.2.0"),
    Metadata(key="Algorithm", value="ResNet"),
    # The key matches the `Hyperparameters` entry in the table; the value is a JSON string.
    Metadata(key="Hyperparameters", value="{\"max_depth\":\"5\",\"learning_rate\":\"0.08\",\"objective\":\"gradient descent\"}")
]
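
Because the `Hyperparameters` value is in JSON format, it may be easier to build it from a dictionary than to escape the quotes by hand. The following is a small sketch using the standard `json` module. The dictionary reuses the illustrative values from the example above.

```python
import json

# Illustrative hyperparameters, taken from the example above.
hyperparameters = {
    "max_depth": "5",
    "learning_rate": "0.08",
    "objective": "gradient descent",
}

# json.dumps handles quoting and escaping, so no manual backslashes are needed.
hyperparameters_value = json.dumps(hyperparameters)

print(hyperparameters_value)
```

The resulting string can be passed as the `value` argument of the `Hyperparameters` metadata entry.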

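Custom Model Taxonomy

You can add your own custom metadata to document a model. The maximum allowed size for the defined and custom metadata combined is 32000 bytes.

Each custom metadata entry has the following four attributes: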


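Field or Key	Required?	Description
`key`	Required	The key and label of the custom metadata.
`value`	Required	The value attached to the key.
`category`	Optional	The category of the metadata. Select one of the following values: `Performance`, `Training Profile`, `Training and Validation Datasets`, `Training Environment`, `other`. The category attribute is useful for filtering custom metadata, which helps when a model has a large number of custom metadata entries.
`description`	Optional	A description of the custom metadata.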

Example

The following example shows how to add custom metadata to capture the model accuracy, the environment, and the source of the training data:

# Adding your own custom metadata:
custom_metadata_list = [
    Metadata(key="Image Accuracy Limit", value="70-90%", category="Performance",
             description="Performance accuracy accepted"),
    Metadata(key="Pre-trained environment",
             value="https://blog.floydhub.com/guide-to-hyperparameters-search-for-deep-learning-models/",
             category="Training Environment", description="Environment link for pre-trained model"),
    Metadata(key="Image Sourcing", value="https://lionbridge.ai/services/image-data/", category="other",
             description="Source for image training data")
]
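
The attribute rules and the size limit described above can be checked before saving a model. The following is a minimal sketch that works on plain dictionaries rather than SDK objects. The function name `validate_custom_metadata` is hypothetical, and measuring the size as the UTF-8 length of the JSON encoding is an assumption about how the 32000-byte limit is applied.

```python
import json

ALLOWED_CATEGORIES = {
    "Performance",
    "Training Profile",
    "Training and Validation Datasets",
    "Training Environment",
    "other",
}
MAX_METADATA_BYTES = 32000


def validate_custom_metadata(entries):
    """Return a list of problems found in a list of metadata dictionaries."""
    problems = []
    for i, entry in enumerate(entries):
        # key and value are required for every entry.
        for field in ("key", "value"):
            if not entry.get(field):
                problems.append(f"entry {i}: missing required field '{field}'")
        # category is optional, but must be one of the listed values when present.
        category = entry.get("category")
        if category is not None and category not in ALLOWED_CATEGORIES:
            problems.append(f"entry {i}: unknown category '{category}'")
    # Assumption: size is approximated by the UTF-8 length of the JSON encoding.
    size = len(json.dumps(entries).encode("utf-8"))
    if size > MAX_METADATA_BYTES:
        problems.append(f"metadata is {size} bytes, limit is {MAX_METADATA_BYTES}")
    return problems


print(validate_custom_metadata([
    {"key": "Image Accuracy Limit", "value": "70-90%", "category": "Performance"},
]))
```

Because the limit covers defined and custom metadata together, a real check would pass both lists combined.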

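Model Data Schema Definition

You can document the input and output data schemas of a model. The input data schema definition provides the blueprint of the data parameter of the predict() function in the score.py file. You can think of the input data schema as the definition of the input feature vector that the model requires to make a successful prediction. The output schema definition documents what the predict() function returns.

Important

The maximum allowed size for the input and output schemas combined is 32000 bytes.

The schema definitions for both the input feature vector and the model predictions are used for documentation purposes. This guideline applies only to tabular datasets.

The schemas of the model input feature vector and output predictions are JSON objects. Each object has a top-level list with a key named schema. The schema definition of each column is a separate entry in the list.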
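
The structure described above, a JSON object whose top-level `schema` key holds one list entry per column, can be sketched as follows. The column names and descriptions are hypothetical, the helper `combined_size` is not part of any SDK, and measuring size as the UTF-8 length of the serialized JSON is an assumption about how the 32000-byte limit is applied.

```python
import json

MAX_SCHEMA_BYTES = 32000  # combined limit for the input and output schemas

# Hypothetical input schema: one list entry per column.
input_schema = {
    "schema": [
        {"name": "Id", "dtype": "int64", "required": False,
         "description": "Row identifier"},
        {"name": "MSZoning", "dtype": "category", "required": False,
         "description": "Zoning class"},
    ]
}


def combined_size(*schemas):
    """Total UTF-8 byte length of the JSON serialization of each schema."""
    return sum(len(json.dumps(schema).encode("utf-8")) for schema in schemas)


column_names = [column["name"] for column in input_schema["schema"]]
print(column_names)
print(combined_size(input_schema) <= MAX_SCHEMA_BYTES)
```

In practice, both the input and the output schema would be passed to `combined_size` together, because the limit applies to their sum.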

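Tip

You can use ADS to automatically extract the schema definition from a specific training dataset.

For each column, you can fully define the schema by assigning values to all of the following attributes: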


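Field or Key	Type	Required?	Description
`name`	`STRING`	Required	The name of the column.
`description`	`STRING`	Optional	The description of the column.
`required`	`BOOL`	Required	Whether the column is a required input feature for making a model prediction.
`dtype`	`STRING`	Required	The data type of the column.
`domain`	`OBJECT`	Optional	The range of allowed values that the feature can take.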
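
The required and optional attributes in the table above can be checked with a few lines of code. The following is a minimal sketch that validates one column definition held as a dictionary. The function name `check_column` is hypothetical, and the mapping from `STRING`, `BOOL`, and `OBJECT` to Python's `str`, `bool`, and `dict` is an assumption.

```python
# Attribute names follow the table above; the Python types are assumed equivalents.
REQUIRED_ATTRIBUTES = {"name": str, "required": bool, "dtype": str}
OPTIONAL_ATTRIBUTES = {"description": str, "domain": dict}


def check_column(column):
    """Return a list of problems for one column definition (a dict)."""
    problems = []
    for attr, expected in REQUIRED_ATTRIBUTES.items():
        if attr not in column:
            problems.append(f"missing required attribute '{attr}'")
        elif not isinstance(column[attr], expected):
            problems.append(f"'{attr}' should be of type {expected.__name__}")
    for attr, expected in OPTIONAL_ATTRIBUTES.items():
        # Optional attributes are only type-checked when they are present.
        if attr in column and not isinstance(column[attr], expected):
            problems.append(f"'{attr}' should be of type {expected.__name__}")
    return problems


print(check_column({"name": "Id", "required": False, "dtype": "int64"}))
print(check_column({"name": "Id"}))
```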

The domain field is a dictionary containing the following keys:


Field or Key	Type	Required?	Description	Notes
`domain.constraints`	`LIST`	Optional	A list of predicates that constrain the range of allowed values for the feature. You can enter a language-specific string expression template that the language interpreter or compiler can evaluate. For Python, the string is expected to follow the `string.Template` format. Constraints are expressed as a list of expressions. For example, `constraints=[Expression('$x > 5')]`.	You can apply more than one constraint. An example of expressions follows this table.
`domain.stats`	`OBJECT`	Optional	A dictionary of summary statistics that describe the feature. For `float64` and `int64` types: `X%` (where X is a percentile value between 1 and 99; more than one percentile value can be captured), `count`, `max`, `mean`, `median`, `min`, `std`. For category types: `count`, `unique`, `mode`.	In ADS, the statistics are generated automatically based on the `feature_stat` of the feature type.
`domain.values`	`STRING`	Optional	Represents the semantic type of the column. Supported values are: discrete numbers, numbers, category, free text.
`domain.name`	`STRING`	Optional	The name of the attribute.
`domain.dtype`	`STRING`	Required	The Pandas data type of the data. For example: `int64`, `float`, `category`, `datetime`
`domain.dtype`	`STRING`	Required	The feature type of the data. For example: `Category`, `Integer`, `LatLong`

Example of expressions:

schema:
- description: Id
  domain:
    constraints: []
    stats:
      25%: 365.75
      50%: 730.5
      75%: 1095.25
      count: 1460.0
      max: 1460.0
      mean: 730.5
      min: 1.0
      std: 421.6100093688479
    values: Discrete numbers
  name: Id
  required: false
  type: int64
- description: MSSubClass
  domain:
    constraints: []
    stats:
      25%: 20.0
      50%: 50.0
      75%: 70.0
      count: 1460.0
      max: 190.0
      mean: 56.897260273972606
      min: 20.0
      std: 42.300570993810425
    values: Discrete numbers
  name: MSSubClass
  required: false
  type: int64
- description: MSZoning
  domain:
    constraints:
    - expression: '$x in ["RL", "RM", "C (all)", "FV", "RH"]'
      - RL
      - RM
      - C (all)
      - FV
      - RH
    stats:
      count: 1460
      unique: 5
    values: Category
  name: MSZoning
  required: false
  type: category
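
To illustrate how a constraint expression in the `string.Template` format might be evaluated in Python, the following sketch substitutes a candidate value for `$x` and evaluates the result. The function name `satisfies` is hypothetical, and this is not how ADS itself necessarily evaluates constraints. Because `eval()` executes arbitrary code, it should only be used on expressions from a trusted source.

```python
from string import Template


def satisfies(expression, value):
    """Substitute value for $x in the expression and evaluate the result."""
    # repr() keeps strings quoted, so "RL" becomes 'RL' inside the expression.
    source = Template(expression).substitute(x=repr(value))
    # Built-ins are removed to narrow what the expression can reach;
    # this is not a security boundary, so only evaluate trusted expressions.
    return bool(eval(source, {"__builtins__": {}}, {}))


print(satisfies("$x > 5", 7))
print(satisfies('$x in ["RL", "RM", "C (all)", "FV", "RH"]', "RL"))
print(satisfies("($x > 10 and $x < 100) or ($x < -1 and $x > -500)", 5))
```

The first two calls evaluate to `True`, and the third to `False`, because 5 falls outside both ranges.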

Example Input Data Schema

schema:
- description: Description of the column
  domain:
    constraints:
    # A language-specific string expression template that the language
    # interpreter or compiler can evaluate. For Python, the string is
    # expected to follow the string.Template format.
    - expression: '($x > 10 and $x < 100) or ($x < -1 and $x > -500)'
      language: python
    # Flexible key-value pairs; the stats depend on what you want to save.
    # By default, they are generated automatically based on the
    # `feature_stat` of the feature type.
    stats:
      mean: 20
      median: 21
      min: 5
    # Communicates the domain of acceptable values, for example rational
    # numbers, discrete numbers, or a list of values.
    values: numbers
  name: MSZoning # Name of the attribute
  required: false # Whether the column is nullable

Example Output Data Schema

{
"predictionschema": [
    {
    "description": "Category of SR",
    "domain": {
    "constraints": [],
    "stats": [],
    "values": "Free text"
    },
    "name": "category",
    "required": true,
    "type": "category"
    }
    ]
}

Model Introspection Testing

1. Copy artifact_introspection_test from the model artifact to the top-level directory of the artifact.
2. Install a Python version later than 3.5.
3. Install the pyyaml and requests Python libraries. This installation is required only once.
4. Go to the artifact directory and install the artifact introspection tests.
```
python3 -m pip install --user -r artifact_introspection_test/requirements.txt
```

5. Set the artifact path and run the introspection test.

python3 artifact_introspection_test/model_artifact_validate.py --artifact

The introspection test generates local test_json_output.json and test_json_output.html files. The following is an example of the introspection test results in JSON format:

{
    "score_py": {
        "category": "Mandatory Files Check",
        "description": "Check that the file \"score.py\" exists and is in the top level directory of the artifact directory",
        "error_msg": "File score.py is not present.",
        "success": true
    },
    "runtime_yaml": {
        "category": "Mandatory Files Check",
        "description": "Check that the file \"runtime.yaml\" exists and is in the top level directory of the artifact directory",
        "error_msg": "File runtime.yaml is not present.",
        "success": true
    },
    "score_syntax": {
        "category": "score.py",
        "description": "Check for Python syntax errors",
        "error_msg": "Syntax error in score.py: ",
        "success": true
    },
    "score_load_model": {
        "category": "score.py",
        "description": "Check that load_model() is defined",
        "error_msg": "Function load_model is not present in score.py.",
        "success": true
    },
    "score_predict": {
        "category": "score.py",
        "description": "Check that predict() is defined",
        "error_msg": "Function predict is not present in score.py.",
        "success": true
    },
    "score_predict_data": {
        "category": "score.py",
        "description": "Check that the only required argument for predict() is named \"data\"",
        "error_msg": "Function predict in score.py should have argument named \"data\".",
        "success": true
    },
    "score_predict_arg": {
        "category": "score.py",
        "description": "Check that all other arguments in predict() are optional and have default values",
        "error_msg": "All other arguments in predict function in score.py should have default values.",
        "success": true
    },
    "runtime_version": {
        "category": "runtime.yaml",
        "description": "Check that field MODEL_ARTIFACT_VERSION is set to 3.0",
        "error_msg": "In runtime.yaml field MODEL_ARTIFACT_VERSION should be set to 3.0",
        "success": true
    },
    "runtime_env_type": {
        "category": "conda_env",
        "description": "Check that field MODEL_DEPLOYMENT.INFERENCE_ENV_TYPE is set to a value in (published, data_science)",
        "error_msg": "In runtime.yaml field MODEL_DEPLOYMENT.INFERENCE_ENV_TYPE should be set to a value in (published, data_science)",
        "success": true,
        "value": "published"
    },
    "runtime_env_slug": {
        "category": "conda_env",
        "description": "Check that field MODEL_DEPLOYMENT.INFERENCE_ENV_slug is set",
        "error_msg": "In runtime.yaml field MODEL_DEPLOYMENT.INFERENCE_ENV_slug should be set.",
        "success": true,
        "value": "mlgpuv1"
    },
    "runtime_env_path": {
        "category": "conda_env",
        "description": "Check that field MODEL_DEPLOYMENT.INFERENCE_ENV_PATH is set",
        "error_msg": "In runtime.yaml field MODEL_DEPLOYMENT.INFERENCE_ENV_PATH should be set.",
        "success": true,
        "value": "oci://service_conda_packs@ociodscdev/service_pack/gpu/General Machine Learning for GPUs/1.0/mlgpuv1"
    },
    "runtime_path_exist": {
        "category": "conda_env",
        "description": "If MODEL_DEPLOYMENT.INFERENCE_ENV_TYPE is data_science and MODEL_DEPLOYMENT.INFERENCE_ENV_slug is set, check that the file path in MODEL_DEPLOYMENT.INFERENCE_ENV_PATH is correct.",
        "error_msg": "In runtime.yaml field MODEL_DEPLOYMENT.INFERENCE_ENV_PATH doesn't exist.",
        "success": true
    },
    "runtime_slug_exist": {
        "category": "conda_env",
        "description": "If MODEL_DEPLOYMENT.INFERENCE_ENV_TYPE is data_science, check that the slug listed in MODEL_DEPLOYMENT.INFERENCE_ENV_slug exists.",
        "error_msg": "In runtime.yaml the value of the fileld INFERENCE_ENV_slug doesn't exist in the given bucket."
    }
}
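
When the result file is long, it can help to list only the tests that need attention. The following is a minimal sketch that reads a result file in the format shown above. The function names are hypothetical, and treating entries without a `success` key as "undetermined" is an assumption, not documented behavior of the test tool.

```python
import json


def summarize_results(results):
    """Split test names into failed and undetermined lists."""
    failed = [name for name, result in results.items()
              if result.get("success") is False]
    # Some entries carry no "success" key; report them separately.
    undetermined = [name for name, result in results.items()
                    if "success" not in result]
    return failed, undetermined


def load_and_summarize(path="test_json_output.json"):
    """Read the JSON result file and summarize it."""
    with open(path, encoding="utf-8") as handle:
        return summarize_results(json.load(handle))


sample = {
    "score_py": {"category": "Mandatory Files Check", "success": True},
    "score_syntax": {"category": "score.py", "success": False},
    "runtime_slug_exist": {"category": "conda_env"},
}
print(summarize_results(sample))
```

For a failed test, the `error_msg` field of the corresponding entry describes what to fix.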

Repeat steps 4 and 5 until no errors occur.

Introspection Testing with ADS

To invoke introspection manually, call the .introspect() method on the ModelArtifact object.

rf_model_artifact.introspect()
rf_model_artifact.metadata_taxonomy['ArtifactTestResults']

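The results of model introspection are automatically saved to the taxonomy metadata and to the model artifact. Model introspection is triggered automatically when the .prepare() method is invoked to prepare the model artifact.

The .save() method does not normally run model introspection, because that is typically done during the model artifact preparation stage. However, setting ignore_introspection to False causes model introspection to run during the save operation.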
