Initialization

init ( engine = 'ray' , engine_opts = None , logger = None , loglevel = 30 , cache_directory = None , dataset_cache_enabled = True )

Initialize the AutoMLx framework’s execution engine. AutoMLx can work with a variety of parallelization platforms.

Parameters
  • engine ( str , default='ray' ) –

    Name of the parallelization framework. Can be one of:
    • 'ray' : Use the Ray multiprocessing framework.
    • 'local' : Use Python’s built-in multiprocessing framework.
    • 'threading' : Use Python’s built-in multithreading framework.

  • engine_opts ( dict or None , default=None ) –

    Options for the parallelization framework. When engine is:
    • 'ray' : a dictionary with the following keys:
      • "n_jobs" ( int ), the degree of inter-model parallelism.
      • "model_n_jobs" ( int ), the degree of intra-model parallelism.
      • "ray_setup" ( dict ), the arguments to pass to ray.init .
      • "cluster_mode" ( bool ), whether Ray should detect a running cluster on the node and connect to it. Needs to be set for both head and worker nodes.
      • "enable_object_spilling" ( bool , default False ), whether Ray object spilling is enabled. If object spilling is enabled and no further object spilling configuration is provided in ray_setup , the object spilling directory is automatically set to the secure AutoMLx caching directory.

    • 'local' : engine_opts is of the form {'n_jobs' : val1, 'model_n_jobs' : val2} , where val1 is the degree of inter-model parallelism and val2 is the degree of intra-model parallelism.

    • 'threading' : engine_opts is of the form {'n_jobs' : val} , where val is the degree of parallelism.
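The three engine_opts shapes described above can be sketched as plain dictionaries. The helper names build_engine_opts and start_engine are hypothetical, the numeric values are arbitrary, and the sketch assumes init is exposed as automlx.init . Whether any of the 'ray' keys may be omitted, or whether an empty ray_setup is accepted, is not stated in this section.

```python
def build_engine_opts(engine, n_jobs=2, model_n_jobs=1):
    """Return an engine_opts dict shaped as described for the given engine.

    Hypothetical helper for illustration; it is not part of AutoMLx.
    """
    if engine == "ray":
        return {
            "n_jobs": n_jobs,                 # inter-model parallelism
            "model_n_jobs": model_n_jobs,     # intra-model parallelism
            "ray_setup": {},                  # arguments forwarded to ray.init (empty here)
            "cluster_mode": False,            # do not look for a running Ray cluster
            "enable_object_spilling": False,  # documented default
        }
    if engine == "local":
        return {"n_jobs": n_jobs, "model_n_jobs": model_n_jobs}
    if engine == "threading":
        # The threading engine only takes a single degree of parallelism.
        return {"n_jobs": n_jobs}
    raise ValueError(f"unknown engine: {engine!r}")


def start_engine(engine, **kwargs):
    """Initialize AutoMLx with the options built above (assumes automlx is installed)."""
    import automlx  # imported lazily so the sketch loads without the package

    automlx.init(engine=engine, engine_opts=build_engine_opts(engine, **kwargs))


local_opts = build_engine_opts("local", n_jobs=4, model_n_jobs=2)
# With AutoMLx installed: start_engine("local", n_jobs=4, model_n_jobs=2)
```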

  • logger ( logging.Logger , str or None , default=None ) –

    Logging mode. One of
    • None : Log to console with the specified loglevel (by default logging.WARNING ).
    • str : Log to the provided file path and to the console.
    • logging.Logger : Use an existing Logger object.

  • loglevel ( int or None , default=logging.WARNING ) –

    The log level is taken from the Python logging module and controls logging verbosity. In order of increasing verbosity:

    • logging.CRITICAL < logging.WARNING < logging.INFO < logging.DEBUG .

    • Set to None to skip logging initialization and use the current logging module configuration.

    • Setting the loglevel here has no effect if the root logger already has handlers configured. The parameter is also ignored if a logging.Logger object is passed to the logger parameter, or if the AutoMLx package has already been configured with a different loglevel.
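As an illustration of the logger and loglevel parameters, the sketch below uses only the standard logging module. It shows that the default loglevel of 30 in the signature is the numeric value of logging.WARNING , and builds a Logger object of the kind that could be passed as logger . The helper make_logger and the logger name "automlx_run" are hypothetical.

```python
import logging

# Numeric values of the levels named above. In the logging module a
# numerically lower level is more verbose.
levels = {name: getattr(logging, name) for name in ("CRITICAL", "WARNING", "INFO", "DEBUG")}


def make_logger(name="automlx_run", level=logging.INFO):
    """Create (or reuse) a Logger with a single console handler."""
    logger = logging.getLogger(name)
    logger.setLevel(level)
    if not logger.handlers:  # avoid stacking handlers on repeated calls
        handler = logging.StreamHandler()
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")
        )
        logger.addHandler(handler)
    return logger


run_logger = make_logger()
# With AutoMLx installed (assuming init is exposed as automlx.init):
#   automlx.init(logger=run_logger)
# Per the description above, loglevel is ignored when a Logger object is passed.
```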

  • cache_directory ( str or None , default=None ) –

    Cache directory to be used to store intermediate results of AutoMLx.

    • If a path is provided here, the user is responsible for managing the directory.

    • If cache_directory is None , the cache is created as a temporary directory and cleaned up by AutoMLx.

    • The caching directory location may also be controlled by setting the TMPDIR environment variable, which serves as the parent directory of the AutoMLx cache. Ensure the environment variable is set before AutoMLx is imported, for example by running your Python script as TMPDIR=/path/to/dir python3 run_automlx.py .

    • The caching directory is cleared at the end of the Python process or when the AutoMLx engine is explicitly shut down via automlx.shutdown() . The cache may not be cleared if the process is terminated abruptly (for example, by a SIGTERM event).

    • If guaranteed cleanup of the temporary files and directories is desired, a cleanup EXIT trap may be used. For example, if the AutoMLx cache_directory is set to /tmp/mydir , a cleanup EXIT trap can be defined at the top of a shell script running the AutoMLx Python scripts as trap "rm -rf /tmp/mydir" EXIT .

  • dataset_cache_enabled ( bool , default=True ) – If the dataset cache is enabled, transformed versions of the data may be stored to disk (in the AutoMLx cache directory) to speed up subsequent transformations of the same data.
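Because a user-supplied cache_directory is managed by the user, one possible approach is to create the directory from Python and register its removal at interpreter exit. This is a minimal sketch using only the standard library. The helper make_cache_dir and the prefix "automlx_cache_" are hypothetical, and the commented call assumes init is exposed as automlx.init .

```python
import atexit
import shutil
import tempfile
from pathlib import Path


def make_cache_dir(prefix="automlx_cache_"):
    """Create a fresh cache directory and schedule its removal at normal exit."""
    path = Path(tempfile.mkdtemp(prefix=prefix))
    # Remove the directory tree when the interpreter exits normally.
    atexit.register(shutil.rmtree, path, ignore_errors=True)
    return path


cache_dir = make_cache_dir()
# With AutoMLx installed:
#   automlx.init(cache_directory=str(cache_dir), dataset_cache_enabled=True)
```

Handlers registered with atexit are not called when the process is killed by a signal that Python does not handle, so this sketch does not replace the shell EXIT trap described above for guaranteed cleanup.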