Details for Data Flow

Logging details for Data Flow Spark diagnostic logs.

Resources

  • applications

Log Categories

API value (ID): all
Console (Display Name): Diagnostic
Description: Includes all logs generated by the Apache Spark framework (driver and executors).

Availability

Data Flow logging is available in all commercial realm regions.

Comments

Spark diagnostic logs can be enabled only at the Data Flow application level; the setting can't be overridden for individual runs.

If you enable logging for a Data Flow application, Spark diagnostic logs are streamed for any new Data Flow run submission. Already accepted or in-progress runs aren't updated.
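
As a rough illustration of enabling these logs programmatically, the following Python sketch creates a SERVICE-type log in Oracle Cloud Infrastructure Logging that captures Spark diagnostic logs for one Data Flow application, using the OCI Python SDK. Treat it as a minimal sketch rather than the canonical procedure: the OCIDs are placeholders, and the "dataflow" service source name is an assumption; the "all" category is the Diagnostic category listed earlier.

import oci

# Minimal sketch: create a SERVICE-type log that captures Spark diagnostic logs
# for one Data Flow application. All OCIDs are placeholders, and the "dataflow"
# service source name is an assumption made for illustration.
config = oci.config.from_file()
logging_client = oci.logging.LoggingManagementClient(config)

details = oci.logging.models.CreateLogDetails(
    display_name="dataflow-spark-diagnostic-logs",
    log_type="SERVICE",
    is_enabled=True,
    configuration=oci.logging.models.Configuration(
        compartment_id="ocid1.compartment.oc1..<unique_ID>",
        source=oci.logging.models.OciService(
            service="dataflow",  # assumed Logging source name for Data Flow
            resource="ocid1.dataflowapplication.oc1..<unique_ID>",  # the application
            category="all",  # "all" = Diagnostic (driver and executor logs)
        ),
    ),
)

# New run submissions against the application stream their logs to this log object;
# already accepted or in-progress runs are unaffected.
response = logging_client.create_log(
    log_group_id="ocid1.loggroup.oc1..<unique_ID>",
    create_log_details=details,
)
print(response.status)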

Contents of a Data Flow Spark Diagnostic Log

specversion: Oracle Cloud Infrastructure Logging schema version of the log.
type: Log category, following the com.oraclecloud.{service}.{resource-type}.{category} convention. For Spark diagnostic logs the values are com.oraclecloud.dataflow.run.driver and com.oraclecloud.dataflow.run.executor.
source: Name of the resource that generated the message.
subject: The specific subresource that generated the event.
id: A source-unique identifier for this batch ingestion.
time: Time the log content was generated, in RFC 3339 timestamp format.
oracle.logid: OCID of the Oracle Cloud Infrastructure Logging log object.
oracle.loggroupid: OCID of the Oracle Cloud Infrastructure Logging log group.
oracle.compartmentid: OCID of the compartment that contains the Oracle Cloud Infrastructure Logging log group.
oracle.tenantid: OCID of the tenant.
oracle.ingestedtime: Time the log line was ingested by Oracle Cloud Infrastructure Logging, in RFC 3339 timestamp format.
data[i].id: Unique identifier for this log event.
data[i].time: Time this specific log entry was generated, in RFC 3339 timestamp format.
data[i].data: Non-empty data representing a log event.
data.data[i].level: The logging level of the log event.
data.data[i].message: A message describing the event details.
data.data[i].opcRequestId: A unique Oracle-assigned request ID, generated when the Data Flow run was submitted and returned in the createRun response.
data.data[i].runId: The OCID of the Data Flow run whose resource (a Spark driver or executor) generated the message.
data.data[i].thread: The name of the thread that generated the logging event.

Example Data Flow Spark Diagnostic Log

{
  "datetime": 1687551602245,
  "logContent": {
    "data": {
      "logLevel": "INFO",
      "message": "Execution complete.",
      "opcRequestId": "<unique_ID>",
      "runId": "ocid1.dataflowrun.oc1.ca-toronto-1.<unique_ID>",
      "thread": "shaded.dataflow.oracle.dfcs.spark.wrapper.DataflowWrapper"
    },
    "id": "<unique_ID>",
    "oracle": {
      "compartmentid": "ocid1.tenancy.oc1..<unique_ID>",
      "ingestedtime": "2023-06-23T20:20:06.974Z",
      "loggroupid": "ocid1.loggroup.oc1.ca-toronto-1.<unique_ID>",
      "logid": "ocid1.log.oc1.ca-toronto-1.<unique_ID>",
      "tenantid": "ocid1.tenancy.oc1..<unique_ID>"
    },
    "source": "Sample CSV Processing App",
    "specversion": "1.0",
    "subject": "spark-driver",
    "time": "2023-06-23T20:20:02.245Z",
    "type": "com.oraclecloud.dataflow.run.driver"
  },
  "regionId": "ca-toronto-1"
}
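
To show how the properties described above map onto a delivered record, the short Python sketch below parses an entry shaped like this example and pulls out the fields that identify the run and the event. The record is pasted in as a trimmed literal for illustration; in practice it would come from a Logging search result or an export.

import json

# Minimal parsing sketch: the record below is a trimmed copy of the example above.
record = json.loads("""
{
  "datetime": 1687551602245,
  "logContent": {
    "data": {
      "logLevel": "INFO",
      "message": "Execution complete.",
      "runId": "ocid1.dataflowrun.oc1.ca-toronto-1.<unique_ID>"
    },
    "source": "Sample CSV Processing App",
    "subject": "spark-driver",
    "time": "2023-06-23T20:20:02.245Z",
    "type": "com.oraclecloud.dataflow.run.driver"
  }
}
""")

content = record["logContent"]
event = content["data"]

# type distinguishes driver from executor logs; subject names the subresource.
print(content["type"], content["subject"])
# runId ties the line to the Data Flow run; logLevel and message describe the event.
print(event["runId"], event["logLevel"], event["message"])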

Using the CLI

See Enable Oracle Cloud Infrastructure Logging Spark Diagnostic Logs for an example command to enable Data Flow Spark diagnostic logging.