Grok Constructor

What is this about?

GrokConstructor is a helper for testing and incremental construction of regular expressions for the grok filter that parses logfile lines for Logstash.

Logstash, part of the ELK stack, is a tool to collect log files from various sources, parse them into a JSON format and put them into one or more databases, index engines and so forth - often Elasticsearch. In the simplest case you can slurp log files from the filesystem, parse them using grok - a collection of named regular expressions - and put them into the integrated Elasticsearch engine with a simple web frontend to search them.

In my experience the hardest part is getting the regular expressions for parsing the log files right. The Grok debugger can help you test your regular expressions and provides Grok Discovery, which can sometimes suggest regular expressions. This site, GrokConstructor, goes beyond that: it provides an incremental construction process that helps you build a regular expression that matches every line of a given set of log lines, and a matcher where you can try out your regular expression on several log lines simultaneously.

You can find the source on GitHub. If you are not comfortable with running this on a public platform, or your internet access is restricted, you can also run it locally or deploy it as a WAR somewhere.
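To make "a collection of named regular expressions" a bit more concrete, here is a minimal Python sketch of how a grok expression such as %{INT:code} can be expanded into an ordinary regular expression with named groups. The tiny pattern table, the name LOGLEVEL_SIMPLE and the helper expand are made up for illustration; the real grok libraries are much larger, and Logstash uses a different regex engine with (?<name>...) instead of Python's (?P<name>...).

```python
import re

# Tiny stand-in for a grok pattern library (illustrative, not the real library).
PATTERNS = {
    "INT": r"[+-]?\d+",
    "LOGLEVEL_SIMPLE": r"DEBUG|INFO|WARN|ERROR",  # hypothetical pattern name
    "GREEDYDATA": r".*",
}

# Matches %{NAME} and %{NAME:field}
GROK_REF = re.compile(r"%\{(\w+)(?::(\w+))?\}")

def expand(grok):
    """Replace every %{NAME} or %{NAME:field} reference by its regular expression."""
    def replace(m):
        name, field = m.group(1), m.group(2)
        body = expand(PATTERNS[name])  # patterns may themselves reference patterns
        # A field name becomes a named group, otherwise a plain non-capturing group.
        return f"(?P<{field}>{body})" if field else f"(?:{body})"
    return GROK_REF.sub(replace, grok)

regex = expand("%{LOGLEVEL_SIMPLE:level} %{INT:code} %{GREEDYDATA:message}")
fields = re.match(regex, "WARN 404 page not found").groupdict()
```

Here fields holds one entry per named group (level, code and message), which is roughly what ends up as JSON fields after parsing.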

Impatient? Know it?

TLDR? If you want to get started right now you can just use the main menu to go to the applications. There is a short explanation at the top, and you can call up a random example to try things out. If you want to know more beforehand, read on.

Please contact Hans-Peter Störr for bugs, suggestions or praise, or create an issue on GitHub.

The applications

Incremental Construction

The incremental construction of grok expressions aids you in the step-by-step construction of a grok regular expression that simultaneously matches all of a given set of log lines.

As input you provide those lines to match and select the libraries of grok patterns you want to choose from, and possibly give additional patterns.

The construction starts with \A (beginning of string) as an expression. In each step you are prompted to select either a common prefix of the yet unmatched rests of the log lines, or select one of the patterns from the grok library that matches a start segment of all rests of the log lines, or input a pattern that matches the next segment. In each case the next segment is usually the next logical entity to be parsed in the lines.
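The steps described above can be sketched in a few lines of Python. This is only an illustration of the idea under simplifying assumptions, not the code behind this site: the helper consume and the sample log lines are invented, and plain Python regular expressions stand in for grok patterns.

```python
import os
import re

def consume(pattern, rests):
    """Match pattern at the start of every rest and return what is left over.

    Returns None if the pattern fails on any line, i.e. it is not a valid step.
    """
    leftovers = []
    for rest in rests:
        m = re.match(pattern, rest)
        if m is None:
            return None
        leftovers.append(rest[m.end():])
    return leftovers

lines = ["2024-01-01 INFO started", "2024-01-02 INFO halted"]
expression = r"\A"      # the construction starts at the beginning of the string
rests = list(lines)     # nothing has been matched yet

# Step 1: a pattern that matches a start segment of all rests.
step = r"(?P<date>\d{4}-\d{2}-\d{2})"
rests = consume(step, rests)
expression += step

# Step 2: a common prefix of the still unmatched rests, taken literally.
prefix = os.path.commonprefix(rests)
step = re.escape(prefix)
rests = consume(step, rests)
expression += step

# Step 3: another pattern for the next logical entity.
step = r"(?P<event>\w+)"
rests = consume(step, rests)
expression += step
```

After the third step every rest is empty, so the accumulated expression matches all of the given lines completely.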

Matcher

The matcher allows you to try out grok expressions on several log lines simultaneously. For every line it shows whether the line is matched, and displays the named groups that result from matching it.

If the expression matches only a start segment of the line, the unmatched rest is displayed. If a line is not matched at all, we show the longest prefix of the expression that matches the line.
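As a rough illustration of this behaviour, here is a small Python sketch. It assumes that the expression is available as a list of parts, so that "the longest prefix of the expression" simply means the longest leading run of parts that still matches; the function report, its result keys and the sample lines are invented for this example.

```python
import re

def report(parts, line):
    """Try the whole expression first, then ever shorter prefixes of it."""
    for n in range(len(parts), 0, -1):
        candidate = "".join(parts[:n])
        m = re.match(candidate, line)
        if m:
            return {
                "full_match": n == len(parts),
                "matched_expression": candidate,
                "groups": m.groupdict(),   # the named groups found in the line
                "rest": line[m.end():],    # the unmatched rest of the line
            }
    return {"full_match": False, "matched_expression": "", "groups": {}, "rest": line}

# An expression split into parts: a level, a space, a numeric code.
parts = [r"(?P<level>[A-Z]+)", r" ", r"(?P<code>\d+)"]

matched = report(parts, "WARN 404 not found")   # whole expression matches
partial = report(parts, "ERROR disk full")      # only the first two parts match
```

For the second line the numeric code is missing, so only the level and the following space are matched and the rest of the line is reported as unmatched.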

As input you provide those lines to match and select the libraries of grok patterns you want to choose from, and possibly give additional patterns.

(New!) Pattern Translator

This experimental service tries to generate a grok regular expression from a log4j PatternLayout format that parses the logfile output generated by that format. You will want to check and refine the pattern with the Matcher.

Warning: this is in alpha state, so don't expect it to work or anything. :-) Please report problems and, if possible, suggest how to translate troublesome placeholders to appropriate grok expressions.

It would be comparatively easy to extend this to other logging libraries like logback etc. if someone comes up with good suggestions on how to translate the different placeholders.
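To give an idea of what such a translation involves, here is a minimal Python sketch that handles a handful of log4j conversion characters. The mapping table is my assumption for illustration and not necessarily what the Pattern Translator produces; the field names (timestamp, level and so on) are made up, and many placeholders, options in braces and escapes such as %% are not handled properly.

```python
import re

# Assumed mapping from a few log4j conversion characters to grok patterns.
CONVERSIONS = {
    "d": "%{TIMESTAMP_ISO8601:timestamp}",
    "p": "%{LOGLEVEL:level}",
    "c": "%{JAVACLASS:logger}",
    "t": "%{DATA:thread}",
    "m": "%{GREEDYDATA:message}",
    "n": "",  # the line separator is not part of the parsed line
}

# %[-][min width][.max width]<letter>[{option}]
PLACEHOLDER = re.compile(r"%(-?\d+)?(?:\.\d+)?([a-zA-Z])(?:\{[^}]*\})?")

def escape_literal(text):
    """Backslash-escape regex metacharacters in literal layout text."""
    return re.sub(r"([\\.\[\]{}()*+?^$|])", r"\\\1", text)

def translate(layout):
    out, pos = [], 0
    for m in PLACEHOLDER.finditer(layout):
        out.append(escape_literal(layout[pos:m.start()]))
        width, conversion = m.group(1), m.group(2)
        grok = CONVERSIONS.get(conversion)
        if grok is None:
            raise ValueError(f"no translation known for %{conversion}")
        if width and not width.startswith("-"):
            out.append(" *")  # right-justified output is padded on the left
        out.append(grok)
        if width and width.startswith("-"):
            out.append(" *")  # left-justified output is padded on the right
        pos = m.end()
    out.append(escape_literal(layout[pos:]))
    return "".join(out)

grok_expression = translate("%d %-5p [%t] %c - %m%n")
```

The result still needs checking against real log lines, for instance because %d with a custom date format will not match TIMESTAMP_ISO8601.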

Automatic Construction

My first attempt was to try an automatic construction of grok regular expressions. Unfortunately, the algorithm generates too many results to be usable on real-world examples, so this is included more for fun than for usefulness.

The algorithm starts with \A as an expression. If all the not yet matched parts of the log lines start with identical strings that are not alphanumeric, the longest string of such characters is appended to the expression. If not, we find all grok patterns from the library that match a start segment of all unmatched rests of the log lines simultaneously. If several patterns match exactly the same strings in every log line, they are grouped together and presented as a drop down list. Since the number of possible regular expressions grows exponentially with the length of the lines, the result list is cut off at 200 results.
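The following Python sketch shows the general shape of such a search. It is a simplified illustration, not the actual implementation: the four-entry library uses plain regular expressions, a result is represented as a tuple of steps (a string for literal text, a tuple of pattern names for a group of interchangeable patterns), and all names are invented for this example.

```python
import os
import re

# Tiny stand-in for the grok pattern library (illustrative only).
LIBRARY = {
    "INT": r"\d+",
    "WORD": r"[A-Za-z]+",
    "NOTSPACE": r"\S+",
    "GREEDYDATA": r".*",
}
MAX_RESULTS = 200  # the result list is cut off here

def separator_prefix(rests):
    """Longest run of non-alphanumeric characters that all rests start with."""
    common = os.path.commonprefix(rests)
    return re.match(r"[^A-Za-z0-9]*", common).group(0)

def construct(rests, steps=(r"\A",), results=None):
    results = [] if results is None else results
    if len(results) >= MAX_RESULTS:
        return results
    if all(rest == "" for rest in rests):
        results.append(steps)  # every line is fully matched
        return results
    sep = separator_prefix(rests)
    if sep:
        # Identical non-alphanumeric start: append it literally.
        return construct([r[len(sep):] for r in rests], steps + (sep,), results)
    # Group the patterns by the exact strings they match in every line.
    groups = {}
    for name, regex in LIBRARY.items():
        matches = [re.match(regex, rest) for rest in rests]
        if all(m and m.end() > 0 for m in matches):
            key = tuple(m.group(0) for m in matches)
            groups.setdefault(key, []).append(name)
    for matched, names in groups.items():
        new_rests = [rest[len(s):] for rest, s in zip(rests, matched)]
        construct(new_rests, steps + (tuple(names),), results)
    return results

results = construct(["42 apples", "7 pears"])
```

Even for these two short lines there is already more than one result, because GREEDYDATA can swallow the whole line in a single step; with longer lines and a full library the number of combinations grows very quickly, which is why the cut-off is needed.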

Common features

All services are session-less. Thus, you can open arbitrarily many windows simultaneously without conflicts.

Applying a multiline filter or multiline codec to collect several lines that comprise one log message (e.g. the message and a stacktrace) is also supported.
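As an illustration of what collecting several lines into one message means, here is a small Python sketch. It loosely mimics the multiline codec options pattern, negate and what => "previous"; the function join_multiline and the sample lines are made up, and this is not how Logstash or this site implements it.

```python
import re

def join_multiline(lines, pattern=r"^\s", negate=False):
    """Append every continuation line to the message that precedes it."""
    continuation = re.compile(pattern)
    messages = []
    for line in lines:
        # With negate, lines that do NOT match the pattern are continuations.
        is_continuation = bool(continuation.search(line)) != negate
        if is_continuation and messages:
            messages[-1] += "\n" + line
        else:
            messages.append(line)
    return messages

log = [
    "ERROR boom",
    "  at Foo.bar(Foo.java:1)",
    "  at Main.main(Main.java:5)",
    "INFO ok",
]
messages = join_multiline(log)  # indented stacktrace lines belong to the error
```

The grok expression is then applied to each joined message rather than to each physical line.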

Contact

If you like it, please tell me, as this mightily encourages further improvements to this service. If you want to make me really happy, please include suggestions, bugs, regular expressions you found useful, or real-life examples where the service gives particularly good or bad results. You can also create an issue on GitHub.

DISCLAIMER

This is done just for fun and you don't pay me for this, thus you get absolutely and utterly and completely no warranties of any kind to the extent permitted by law. Even if the program jumps out of your screen and chews at your leg. ;-)