Perform Advanced Analytics with Cluster Compare

The following are typical scenarios for using the Cluster Compare utility. You can compare two sets of log data by removing the clusters that are common to both sets and showing only the clusters that are unique to each set. Eliminating the common clusters can help you find the root cause of an issue.

Topics:

  • Cluster Compare by Time Shift
  • Cluster Compare by Custom Time
  • Cluster Compare by Current Time

For steps to use the Cluster Compare utility, see Use Cluster Compare Utility.

For the syntax and other details of the clustercompare command, see clustercompare.

Cluster Compare by Time Shift

To reduce the number of clusters to only those that are unique to the current time period, use the Time Shift option. This is the default option in the Cluster Compare utility.

Suppose you want to compare the log data from the source Linux Syslog Logs collected over the current week with the data collected over the past week.

|========================|========================|
   Baseline Time Range      Current Time Range
<----Use the same query in both the time ranges---->

Select the current time range from the time selector as Last 7 days and specify the query 'Log Source' = 'Linux Syslog Logs' | cluster. For the cluster compare utility, this qualifies as the current time range and current query.

Click Cluster Compare and notice that the baseline query is the same as the current query. Also, note that the baseline time range is already selected by default, which is a week before the current week. Click Compare.


Description of cluster_compare_case1.png follows

The Cluster Compare summary is displayed as follows:

  • 10 clusters are found only in the current range
  • 248 clusters are found only in the baseline range
  • 13 common clusters are found in both the ranges

Description of cluster_compare_case1_result.png follows

Using this data, you can identify the potential issues that are unique to the current week and find a root cause. Narrow down your selection of log records to those that are the cause of the potential issue.
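The three counts in the summary amount to a set comparison of the clusters found in each range. The following Python sketch illustrates the idea only. The function name `compare_clusters` and the sample cluster signatures are hypothetical and are not part of the product.

```python
def compare_clusters(current, baseline):
    """Split cluster signatures into current-only, baseline-only, and common."""
    current, baseline = set(current), set(baseline)
    return {
        "current_only": current - baseline,
        "baseline_only": baseline - current,
        "common": current & baseline,
    }

# Hypothetical cluster signatures, for illustration only.
current_week = {
    "sshd: Failed password for <user>",
    "kernel: Out of memory",
    "cron: job started",
}
past_week = {
    "cron: job started",
    "systemd: Started session <id>",
}

result = compare_clusters(current_week, past_week)
for name, clusters in result.items():
    print(name, len(clusters))
```

Only the clusters in `current_only` need a closer look when you investigate an issue that first appeared in the current range.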

Note: The time shift value is subtracted from the start and end of the current time range to obtain the baseline time range. If the time shift is less than the duration of the current time range, the two ranges overlap, and all the clusters from the overlap period are shown as common (duplicate) clusters. A message is displayed when an overlap is detected. In such a case, the baseline query is the same as the current query.
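The arithmetic behind the time shift and the overlap check can be sketched as follows. This is a minimal illustration that assumes a positive shift value. The function name `shifted_baseline` and the sample dates are hypothetical.

```python
from datetime import datetime, timedelta

def shifted_baseline(current_start, current_end, shift):
    """Return the baseline range and the length of any overlap with the current range."""
    baseline_start = current_start - shift
    baseline_end = current_end - shift
    # The ranges overlap when the baseline ends after the current range starts.
    overlap = max(baseline_end - current_start, timedelta(0))
    return baseline_start, baseline_end, overlap

# Hypothetical current range of 7 days.
current_start = datetime(2019, 6, 20)
current_end = datetime(2019, 6, 27)

# Shift equal to the duration: the ranges are adjacent and do not overlap.
print(shifted_baseline(current_start, current_end, timedelta(days=7)))

# Shift shorter than the duration: the ranges overlap by 4 days.
print(shifted_baseline(current_start, current_end, timedelta(days=3)))
```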

Cluster Compare by Custom Time

If you want to compare the log data from the same source but over two custom time ranges, then use the Custom Time option in the cluster compare utility.

Consider that we want to compare the log data of the entity type Host (Linux) collected over the current time range in the month June 2019 and the baseline time range in the month August 2016.

|========================|                          |========================|
   Baseline Time Range                                   Current Time Range
<---------------->Use the same query in both the time ranges<---------------->

Select the current time range from the time selector for the period June 1, 2019 12:00 AM to June 27, 2019 8:21 PM and specify the query 'Entity Type' = 'Host (Linux)' | cluster. For the cluster compare utility, this qualifies as the current time range and current query.

Click Cluster Compare and notice that the baseline query is the same as the current query. Click the edit icon next to the Baseline Time Range and select Use Custom Time. Specify the custom time range Aug 15, 2016 12:00 AM to Aug 20, 2016 12:00 AM. Click Compare.

The Cluster Compare summary is displayed as follows:

  • 278 clusters are found only in the current range
  • 7 clusters are found only in the baseline range
  • 4 common clusters are found in both the ranges

Description of cluster_compare_case2_result.png follows

This analysis can enable you to compare the syslog data from the entity type over the two periods, eliminate the common clusters, and view the unique clusters. In this case, the increase in the number of potential issues from the baseline range to current time range can be analyzed by viewing the logs pertaining to the potential issues in the current time range.

Cluster Compare by Current Time

If you want to compare the logs from different sources in the same time range, then use Cluster Compare by current time and select the logs from different entity types or sources.

Consider a case where an error is reported on the node of a Rideshare application rs_host01 but not on the node rs_host03. Both the nodes can then be compared using the same time range Aug 14, 2016, 9:30:00 AM to Aug 20, 2016, 9:30:00 AM to detect variations and identify issues which can then be root-caused. Both the nodes have approximately 20,000 log records to compare and analyze.

|=================================================|
<----Baseline Time Range = Current Time Range----->
<-----------------Baseline Query------------------>
<------------------Current Query------------------>

Select the current time range from the time selector as Aug 14, 2016, 9:30:00 AM to Aug 20, 2016, 9:30:00 AM and specify the query Entity = rs_host01. For the cluster compare utility, this qualifies as the current time range and current query.

Click Cluster Compare and notice that the baseline query is the same as the current query. Click the edit icon and modify the baseline query to Entity = rs_host03. By default, the baseline time range is time shifted. Click the edit icon next to the baseline time range and select the option Use Current Time. Click Compare.

The Cluster Compare summary is displayed as follows:

  • 2 clusters are found only in the current range
  • 0 clusters are found only in the baseline range
  • 9 common clusters are found in both the ranges

Description of cluster_compare_case3_result.png follows

Note that in the same time range, the two Rideshare nodes have 9 common clusters, and the node rs_host01 has 2 unique clusters. Evidently, the cluster table lists the fatal error which caused the issue in the node that's analyzed.

This analysis eliminates the complexity of comparing 20,000 records from both the nodes by removing the common clusters and identifying the unique clusters, which leaves a smaller number of records to analyze.
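To see why clustering shrinks the comparison, consider the following Python sketch. It groups messages that differ only in numeric values under one signature, and then compares the signatures of the two nodes. Masking digits is a simplistic stand-in and is not the clustering method that the product uses. The function names and the sample records are hypothetical.

```python
import re
from collections import defaultdict

def signature(message):
    """Mask numbers so that messages differing only in values share a signature."""
    return re.sub(r"\d+", "<n>", message)

def clusters_by_entity(records):
    """Group (entity, message) records into a set of signatures per entity."""
    grouped = defaultdict(set)
    for entity, message in records:
        grouped[entity].add(signature(message))
    return grouped

# Hypothetical records, for illustration only.
records = [
    ("rs_host01", "Request 1041 served in 12 ms"),
    ("rs_host01", "Request 1042 served in 15 ms"),
    ("rs_host01", "FATAL: connection pool exhausted"),
    ("rs_host03", "Request 2210 served in 11 ms"),
]

grouped = clusters_by_entity(records)

# Signatures seen on rs_host01 but not on rs_host03.
only_host01 = grouped["rs_host01"] - grouped["rs_host03"]
print(only_host01)
```

The three request messages collapse into a single signature shared by both nodes, so only the fatal error remains as unique to rs_host01.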