Metric Component
Metric components are responsible for calculating all statistical metrics and algorithms on the data. These include metrics from different groups, such as data integrity, summary or quality, drift score, and performance. Metrics are calculated on the input data and can produce output that is a scalar, a list, or a dictionary. Different metrics might expect a specific type of input data (for example, a model performance metric needs columns of the prediction and target column types) or a specific variable type (for example, regression performance expects the prediction and target to be continuous variables). They might be incompatible with specific data types (for example, a count metric can't work with the string data type), and they might require data from different dimensional partitions (for example, drift detection might need data from a different time frame to generate a score).
While metrics can be quite diverse and fall into different groups based on their use case, they all share the same interface. The following sections describe the types of metric, show how to use them, and briefly discuss how they work internally.
Types of Metric
Currently, ML Insights supports two distinct types of metric: univariate metrics and dataset metrics. Univariate metrics expect only a single feature as input and compute a statistic on it, for example, sum, mean, or kurtosis. Dataset metrics take more than one input feature (up to all columns). Examples of dataset metrics include metrics that describe the entire dataset, such as the number of rows or the data or column types of the dataset, and multivariate metrics such as correlation or model performance.
The types of metric are captured in the MetricDetail class.
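from typing import Dict, List, Optional

from mlm_insights.core.metrics.metric_metadata import MetricMetadata

class MetricDetail:
    univariate_metric: Dict[str, List[MetricMetadata]]
    dataset_metrics: Optional[List[MetricMetadata]]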
These metrics must be passed in the correct place for the framework to behave correctly: dataset-level metrics go in the dataset_metrics list, and univariate metrics go in the univariate_metric dictionary, keyed by feature name.
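To make the distinction concrete, here is a minimal pure-Python sketch of how a runner might dispatch the two types of metric. It does not use ML Insights: `ToyMetricDetail`, `run_metrics`, `row_count`, and the dict-of-lists `Table` are hypothetical stand-ins that only mirror the shape of `MetricDetail`. Univariate metrics are keyed by feature name and see one column; dataset metrics see the whole table.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import Any, Callable, Dict, List

# Hypothetical stand-in for tabular data: column name -> list of values.
Table = Dict[str, List[Any]]


@dataclass
class ToyMetricDetail:
    # Same shape as MetricDetail: feature name -> metrics for that feature.
    univariate_metric: Dict[str, List[Callable[[List[Any]], Any]]]
    # Dataset-level metrics receive the whole table.
    dataset_metrics: List[Callable[[Table], Any]] = field(default_factory=list)


def row_count(table: Table) -> int:
    # A dataset metric; assumes every column has the same length.
    return len(next(iter(table.values()), []))


def run_metrics(detail: ToyMetricDetail, table: Table) -> Dict[str, Any]:
    results: Dict[str, Any] = {}
    # Univariate metrics only ever see their own feature's column.
    for feature, fns in detail.univariate_metric.items():
        for fn in fns:
            results[f"{feature}.{fn.__name__}"] = fn(table[feature])
    # Dataset metrics are evaluated once, on the full table.
    for fn in detail.dataset_metrics:
        results[fn.__name__] = fn(table)
    return results


table = {
    "sepal length (cm)": [5.1, 4.9, 4.7],
    "sepal width (cm)": [3.5, 3.0, 3.2],
}
detail = ToyMetricDetail(
    univariate_metric={"sepal length (cm)": [max, min], "sepal width (cm)": [mean]},
    dataset_metrics=[row_count],
)
results = run_metrics(detail, table)
print(results)
```

If `row_count` were placed in `univariate_metric` instead, this toy runner would hand it a single column (a list) rather than the table, and it would fail. That mirrors why each metric has to go in the right container.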
How to use
This section shows how to construct metrics and pass them to the builder object.
Hint
If no metrics are passed to the builder, the builder heuristically selects a set of metrics for each feature, based on the feature's data type, variable type, and column type.
- Import the metric classes you want to use:
from mlm_insights.core.metrics.kurtosis import Kurtosis
from mlm_insights.core.metrics.max import Max
from mlm_insights.core.metrics.mean import Mean
from mlm_insights.core.metrics.min import Min
from mlm_insights.core.metrics.mode import Mode
from mlm_insights.core.metrics.range import Range
from mlm_insights.core.metrics.is_quasi_constant_feature import IsQuasiConstantFeature
- Import the required dependencies:
from mlm_insights.core.metrics.metric_metadata import MetricMetadata
from mlm_insights.builder.builder_component import MetricDetail
- Construct a MetricMetadata object for each metric you want to use, with the appropriate parameters (if any), and collect them in a list:
metrics = [
    MetricMetadata(klass=Max),
    MetricMetadata(klass=Min),
    MetricMetadata(klass=Mean),
    MetricMetadata(klass=IsQuasiConstantFeature),
    MetricMetadata(klass=Kurtosis),
    MetricMetadata(klass=Mode),
    MetricMetadata(klass=Range),
]
- Create a dictionary whose keys are the feature names and whose values are the MetricMetadata list (the example features are taken from the Iris dataset):
uni_variate_metrics = {
    "sepal length (cm)": metrics,
    "sepal width (cm)": metrics,
    "petal length (cm)": metrics,
    "petal width (cm)": metrics,
}
- Create the MetricDetail object from the univariate metric dictionary and the dataset metric list (empty in this example):
metric_details = MetricDetail(univariate_metric=uni_variate_metrics, dataset_metrics=[])
- Pass the MetricDetail object to the builder's with_metrics API:
InsightsBuilder().with_metrics(metrics=metric_details)
Note
Instead of creating the actual metric objects, we passed a different construct called MetricMetadata. This is because, while the logic for calculating a metric lives in its Metric class, the framework controls the entire lifecycle of the metric. The runner object therefore constructs the actual metric objects itself, at the point when they are needed.
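The following sketch illustrates the idea behind passing metadata rather than metric instances. `ToyMetricMetadata`, `ToyMean`, and `create_metrics` are hypothetical and are not the ML Insights implementation. The metadata records only which class to build and its constructor arguments, and the runner builds a fresh, independent metric object for each feature when it needs one.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Type


class ToyMean:
    """Hypothetical metric: keeps a running sum and count, not raw values."""

    def __init__(self) -> None:
        self.total = 0.0
        self.count = 0

    def update(self, values: List[float]) -> None:
        self.total += sum(values)
        self.count += len(values)

    def result(self) -> float:
        return self.total / self.count if self.count else float("nan")


@dataclass
class ToyMetricMetadata:
    # Records *what* to build (a class plus constructor arguments), not a live object.
    klass: Type[Any]
    config: Dict[str, Any] = field(default_factory=dict)


def create_metrics(
    features: List[str], metadata: List[ToyMetricMetadata]
) -> Dict[str, List[Any]]:
    # The runner decides when to instantiate. Each feature gets its own
    # objects, so state accumulated for one feature never leaks into another.
    return {f: [m.klass(**m.config) for m in metadata] for f in features}


metadata = [ToyMetricMetadata(klass=ToyMean)]
instances = create_metrics(["sepal length (cm)", "sepal width (cm)"], metadata)
instances["sepal length (cm)"][0].update([2.0, 4.0])
instances["sepal width (cm)"][0].update([3.5])
print(instances["sepal length (cm)"][0].result())  # 3.0
print(instances["sepal width (cm)"][0].result())  # 3.5
```

This is also why the same `metrics` list can be reused for several features in the how-to example: under a scheme like this sketch, each feature would get its own metric objects built from the shared metadata.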
How metrics work
This section briefly discusses some important properties of metrics, which give more insight into how metrics work and how they scale.
All metrics in ML Insights must have the following properties:
Mergeable - Two metrics of the same type calculated on two sets of data (say D1 and D2) can be merged into a metric that represents the metric score of the combined data (D1 + D2).
Serializable - Metrics can serialize their current state.
De-serializable - Metrics can be de-serialized to a previously stored state.
Single pass - All metrics are computed in a single pass, because the runner reads through the input data only once.
Approximate or accurate - Metrics can be approximate or accurate. The type of score a metric produces can be identified from the API documentation.
No input data persistence - Metrics do not keep subsets of raw data in memory, except for the part currently being processed. If the input data is split into several parts, only one part is in memory at any time.
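As an illustration of these properties, here is a minimal sketch of a mean metric written to satisfy them. `MergeableMean` is hypothetical, and its use of JSON for serialization is an assumption; it is not how ML Insights implements its `Mean` metric. The key design choice is to store a small summary (a running sum and count) instead of raw values. That makes the metric single pass, keeps no input data in memory, lets two partial results be merged exactly, and gives it a small state that is easy to serialize and restore.

```python
import json
from dataclasses import dataclass
from typing import Iterable


@dataclass
class MergeableMean:
    total: float = 0.0
    count: int = 0

    def update(self, values: Iterable[float]) -> None:
        # Single pass: each value is seen once and then discarded.
        for v in values:
            self.total += v
            self.count += 1

    def merge(self, other: "MergeableMean") -> "MergeableMean":
        # Mean(D1 + D2) is recovered exactly from the two partial summaries.
        return MergeableMean(self.total + other.total, self.count + other.count)

    def result(self) -> float:
        return self.total / self.count if self.count else float("nan")

    def serialize(self) -> str:
        return json.dumps({"total": self.total, "count": self.count})

    @classmethod
    def deserialize(cls, payload: str) -> "MergeableMean":
        state = json.loads(payload)
        return cls(total=state["total"], count=state["count"])


# Each partition is processed independently, one at a time.
part1, part2 = MergeableMean(), MergeableMean()
part1.update([1.0, 2.0, 3.0])
part2.update([4.0, 5.0])

# Round-trip one partial state through serialization, then merge.
restored = MergeableMean.deserialize(part2.serialize())
combined = part1.merge(restored)
print(combined.result())  # 3.0
```

Statistics such as exact quantiles cannot be recovered from a fixed-size summary like this one, which is one reason a library may offer approximate versions of some metrics.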