Using Apache Flume

Apache Flume efficiently collects and moves large amounts of streaming event data.

For more information, see the Apache Flume documentation at https://flume.apache.org.

Apache Flume Source, Channel, and Sink Configurations

A Flume event is a unit of data flow having a byte payload and an optional set of string attributes. A Flume agent is a (JVM) process that hosts the components through which events flow from an external source to the next destination (hop).
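For example, an agent's components are declared by name in its configuration file. This is a minimal sketch; the agent name agent1 and the component names are hypothetical:

    # A hypothetical agent named agent1 declaring one source, one channel, and one sink
    agent1.sources = src1
    agent1.channels = ch1
    agent1.sinks = snk1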

A Flume source consumes events from an outside producer, for example, an IoT device, in a format the source can read. Depending on the configured source type, that format might be Avro, JSON, plain text, and so on. The data then flows through the agent until it reaches the Flume sink.
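As a sketch, a source that accepts Avro events over the network might be configured as follows, continuing the hypothetical agent1 example; the port number is an assumption for illustration:

    # Avro source listening for events from external clients
    agent1.sources.src1.type = avro
    agent1.sources.src1.bind = 0.0.0.0
    agent1.sources.src1.port = 4141
    # Connect the source to the channel it writes to
    agent1.sources.src1.channels = ch1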

When a Flume source receives an event, it stores the event in a channel. The most commonly used channels are the memory channel, the file channel, and the Kafka channel. The channel holds the data until the sink reads it.
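A minimal sketch of a memory channel for the same hypothetical agent follows; the capacity values are illustrative, not recommendations:

    # Memory channel: fast, but events are lost if the agent process dies
    agent1.channels.ch1.type = memory
    # Maximum number of events held in the channel
    agent1.channels.ch1.capacity = 10000
    # Maximum number of events per transaction between source/sink and channel
    agent1.channels.ch1.transactionCapacity = 1000

A file channel (type = file, with checkpointDir and dataDirs properties) trades some throughput for durability by persisting events to disk.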

The Flume sink removes the data from the channel and forwards it to another Flume source or to external storage, for example, HDFS or object storage, for downstream processes to consume.
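An HDFS sink for the same hypothetical agent might look like the following; the namenode host, path, and roll interval are placeholders:

    # HDFS sink: drains the channel and writes events to HDFS
    agent1.sinks.snk1.type = hdfs
    # Connect the sink to the channel it reads from (note: singular 'channel')
    agent1.sinks.snk1.channel = ch1
    # Target directory; %Y-%m-%d expands from the event timestamp
    agent1.sinks.snk1.hdfs.path = hdfs://namenode:8020/flume/events/%Y-%m-%d
    # Use local time rather than a timestamp header for the path escapes
    agent1.sinks.snk1.hdfs.useLocalTimeStamp = true
    # Write raw events instead of SequenceFiles
    agent1.sinks.snk1.hdfs.fileType = DataStream
    # Roll to a new file every 300 seconds
    agent1.sinks.snk1.hdfs.rollInterval = 300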

The following is a complete example combining the source, channel, and sink configurations shown above.
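As with the snippets above, the agent name, port, host, and paths are hypothetical; a single-agent configuration file, saved as, say, example.conf, might look like this:

    # example.conf: Avro source -> memory channel -> HDFS sink
    agent1.sources = src1
    agent1.channels = ch1
    agent1.sinks = snk1

    agent1.sources.src1.type = avro
    agent1.sources.src1.bind = 0.0.0.0
    agent1.sources.src1.port = 4141
    agent1.sources.src1.channels = ch1

    agent1.channels.ch1.type = memory
    agent1.channels.ch1.capacity = 10000
    agent1.channels.ch1.transactionCapacity = 1000

    agent1.sinks.snk1.type = hdfs
    agent1.sinks.snk1.channel = ch1
    agent1.sinks.snk1.hdfs.path = hdfs://namenode:8020/flume/events/%Y-%m-%d
    agent1.sinks.snk1.hdfs.useLocalTimeStamp = true
    agent1.sinks.snk1.hdfs.fileType = DataStream

The agent can then be started with the flume-ng command, naming the agent to run:

    flume-ng agent --conf conf --conf-file example.conf --name agent1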
