Character classes specify the characters that you are, or are not, trying to match. Replace the . in each .* with a more specific character class. A .* invariably runs to the end of your input and then backtracks, that is, returns to a previously saved state to continue the search for a match. With a specific character class, you control how many characters the * causes the regex engine to consume, which lets you stop the rampant backtracking.
Consider the following example regular
expression:
With this regular expression, the match can run into excessive backtracking. Oracle Logging Analytics detects this situation and aborts the match operation.
By changing the regular expression to the example
below, you can ensure that the match completes faster. Notice that [\S\s]*
is changed to [^,] which avoids unnecessary
backtracking.
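The effect of swapping a catch-all class for a negated class can be sketched in Python. This is only an illustration: the input line and both patterns are hypothetical and are not the expressions from the example above, and Python's re module stands in for the product's regex engine.

```python
import re

# Hypothetical comma-separated input, invented for this sketch.
line = "alpha,beta,gamma,delta"

# Catch-all version: [\s\S]* can also match commas, so each group first
# runs to the end of the input and then backtracks to find its comma.
broad = re.compile(r"([\s\S]*),([\s\S]*),([\s\S]*),([\s\S]*)")

# Specific version: [^,]* cannot cross a comma, so each group stops
# exactly where the next literal comma is expected.
specific = re.compile(r"([^,]*),([^,]*),([^,]*),([^,]*)")

broad_fields = broad.fullmatch(line).groups()
specific_fields = specific.fullmatch(line).groups()

print(broad_fields)
print(specific_fields)
```

Both patterns extract the same four fields here; the difference is in how much work the engine does to get there, which grows with the length of the input and the number of catch-all groups.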
In many regexes, greedy quantifiers (.*s) can be safely replaced by lazy quantifiers (.*?s), thus giving the regex a performance boost without changing the result.
Consider the input:
Trace file /u01/app/oracle/diag/rdbms/navisdb/NAVISDB/trace/NAVISDB_arc0_3941.trc
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
ORACLE_HOME = /u01/app/oracle/product/11.2.0/db_1
System name: Linux
Node name: NAVISDB
Release: 2.6.18-308.el5
Version: #1 SMP Fri Jan 27 17:17:51 EST 2012
Machine: x86_64
Instance name: NAVISDB
Redo thread mounted by this instance: 1
Oracle process number: 21
Unix process pid: 3941, image: oracle@NAVISDB (ARC0)
Consider the following greedy regex for the given input:
The regex engine runs to the end of the input every time it encounters .*. The first time that .* appears, it consumes all the input and then backtracks until it reaches ORACLE_HOME. This is an inefficient way of matching. The alternative lazy regex is as shown below:
The lazy regex consumes characters from the beginning of the string only until it reaches ORACLE_HOME, at which point it can proceed to match the rest of the string.
Note: If the ORACLE_HOME field appears toward the beginning of the input, the lazy quantifier should be used. If the ORACLE_HOME field appears toward the end, it might be appropriate to use the greedy quantifier.
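As a rough Python sketch of the greedy and lazy variants, the snippet below runs both against a shortened copy of the trace header. The two patterns are assumptions written for this illustration, not the product's expressions, and re.DOTALL is used so that . can cross line breaks.

```python
import re

# Shortened copy of the trace-file header shown earlier.
text = (
    "Trace file /u01/app/oracle/diag/rdbms/navisdb/NAVISDB/trace/NAVISDB_arc0_3941.trc\n"
    "Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production\n"
    "ORACLE_HOME = /u01/app/oracle/product/11.2.0/db_1\n"
    "System name: Linux\n"
    "Node name: NAVISDB\n"
)

# Greedy: each .* first consumes the rest of the input, then backtracks.
greedy = re.compile(r".*ORACLE_HOME = (\S+).*Node name:\s*(\S+)", re.DOTALL)

# Lazy: each .*? grows one character at a time until the next literal fits.
lazy = re.compile(r".*?ORACLE_HOME = (\S+).*?Node name:\s*(\S+)", re.DOTALL)

greedy_fields = greedy.search(text).groups()
lazy_fields = lazy.search(text).groups()

print(greedy_fields)
print(lazy_fields)
```

Both variants capture the same two values on this input. Which one does less work depends on where the fields sit in the input, as the note above describes.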
Anchors
Anchors tell the regex engine that you intend the cursor to be in a particular place in the input. The most common anchors are ^ and $, indicating the beginning and end of the input.
Consider the following regexes to find an IPv4 address:
\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}
^\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}
Notice that the second regex begins with ^ and is specific about the IP address appearing at the beginning of the input.
We're searching for the regex in input that looks like the following example:
The second regex (which starts with ^) runs faster on the non-matching input because it discards the non-matching input immediately.
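The behavior of the two IPv4 regexes can be sketched in Python as follows. The three input lines are hypothetical and chosen only to show when each regex matches; timings are not measured here.

```python
import re

unanchored = re.compile(r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}")
anchored = re.compile(r"^\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}")

# Hypothetical inputs, invented for this sketch.
starts_with_ip = "192.0.2.10 GET /index.html 200"
ip_in_middle = "client 192.0.2.10 connected"
no_ip = "kernel: FS-Cache: Loaded"

def first_match(pattern, text):
    """Return the matched text, or None when the pattern does not match."""
    m = pattern.search(text)
    return m.group() if m else None

for text in (starts_with_ip, ip_in_middle, no_ip):
    print(first_match(unanchored, text), first_match(anchored, text))
```

The anchored regex can only succeed at the start of the input, so on a line without a leading IP address it has nothing further to try. The unanchored regex must attempt a match at every position before giving up, and it also accepts an address that appears in the middle of a line.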
The Importance of Alternation
The order of alternation counts, so place the more common options in the front so they can be matched faster. If the rarer options are placed first, then the regex engine will waste time in checking those before checking the more common options which are likelier to succeed. Also, try to extract common patterns. For example, instead of (abcd|abef) use ab(cd|ef).
2014-06-16 12:13:46.743: [UiServer][1166092608] {0:7:2} Done for ctx=0x2aaab45d8330
The second regex matches faster because its alternation tries the [ character first and the empty (null) alternative second. Because the input contains [, the first alternative succeeds immediately.
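The advice about extracting common patterns can be checked with a short Python sketch. It compares (abcd|abef) with ab(cd|ef), written with non-capturing groups, on a few hypothetical strings to confirm that factoring out the shared prefix does not change what is accepted.

```python
import re

# Same alternatives, written two ways. (?:...) groups without capturing.
plain = re.compile(r"(?:abcd|abef)")
factored = re.compile(r"ab(?:cd|ef)")

# Hypothetical test strings, invented for this sketch.
samples = ["abcd", "abef", "abxy", "ab"]

plain_results = [bool(plain.fullmatch(s)) for s in samples]
factored_results = [bool(factored.fullmatch(s)) for s in samples]

print(plain_results)
print(factored_results)
```

In the factored form the engine matches the shared prefix ab once and only then chooses between the two suffixes, instead of re-reading ab for each alternative.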
Sample Parse Expressions
You can refer to the following sample parse expressions to create a suitable parse expression for extracting values from your log file.
A log file comprises entries that are generated by concatenating multiple field values. You may not need to view all the field values for analyzing a log file of a particular format. Using a parser, you can extract the values from only those fields that you want to view.
A parser extracts fields from a log file based on the parse expression that you’ve defined. A parse expression is written in the form of a regular expression that defines a search pattern. In a parse expression, you enclose search patterns with parentheses (), for each matching field that you want to extract from a log entry. Any value that matches a search pattern that’s outside the parentheses isn’t extracted.
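The role of the parentheses can be illustrated with a minimal Python sketch. The log entry and the pattern are hypothetical, and Python's re module stands in for the parser here; an actual Oracle Logging Analytics parse expression may use its own syntax and field mapping.

```python
import re

# Hypothetical log entry: date, severity, message.
entry = "2024-01-15 ERROR disk quota exceeded"

# Only the parenthesized parts are extracted. The severity is matched by
# \w+ but sits outside the parentheses, so it is not returned.
pattern = re.compile(r"(\d{4}-\d{2}-\d{2})\s+\w+\s+(.+)")

date, message = pattern.fullmatch(entry).groups()

print(date)
print(message)
```

The whole entry still has to match the expression; the parentheses only decide which matched parts are kept as field values.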
Example 1
If you want to parse the following sample log entries:
Jun 20 15:19:29 hostabc rpc.gssd[2239]: ERROR: can't open clnt5aa9: No such file or directory
Jul 29 11:26:28 hostabc kernel: FS-Cache: Loaded
Jul 29 11:26:28 hostxyz kernel: FS-Cache: Netfs 'nfs' registered for caching