regular expression (e?grep) log search

Wednesday, February 27th, 2008

Regular expressions are a big topic. They have become very popular in recent years and are used in most programming languages (Perl, PHP), Unix tools (grep, egrep, awk, sed), Unix text editors (ed, vi, vim), and servers (Apache, MySQL, nginx).

I will demonstrate a method of searching log files using the grep/egrep tool. Most of us are familiar with the ‘grep’ tool found on Linux/Unix systems (Debian ships the GNU version of grep/egrep). We usually use grep when we want to find a word or a sentence in a text file. As you know, grep scans the file line by line and applies the match to each line. For instance, if we want to find accesses to the web server on 24 Feb 2008:

grep 24/Feb/2008 /var/log/httpd/access_log yields:

84.95.86.128 - - [24/Feb/2008:14:17:42 +0200] "GET / HTTP/1.0" 200 198 "-" "-"
87.68.37.48 - - [24/Feb/2008:14:50:32 +0200] "GET / HTTP/1.0" 200 198 "-" "-"
87.68.37.48 - - [24/Feb/2008:15:28:51 +0200] "GET / HTTP/1.0" 200 198 "-" "-"
84.95.106.251 - - [24/Feb/2008:16:41:20 +0200] "GET / HTTP/1.0" 200 198 "-" "-"
84.95.106.251 - - [24/Feb/2008:17:15:06 +0200] "GET / HTTP/1.0" 200 198 "-" "-"
84.95.106.251 - - [24/Feb/2008:17:24:35 +0200] "GET / HTTP/1.0" 200 198 "-" "-"
84.95.106.251 - - [24/Feb/2008:17:41:05 +0200] "GET / HTTP/1.0" 200 198 "-" "-"
84.95.106.251 - - [24/Feb/2008:18:03:52 +0200] "GET / HTTP/1.0" 200 198 "-" "-"
84.95.106.251 - - [24/Feb/2008:18:24:45 +0200] "GET / HTTP/1.0" 200 198 "-" "-"
84.95.106.251 - - [24/Feb/2008:19:08:57 +0200] "GET / HTTP/1.0" 200 198 "-" "-"
84.95.106.251 - - [24/Feb/2008:19:58:26 +0200] "GET / HTTP/1.0" 200 198 "-" "-"
84.95.106.251 - - [24/Feb/2008:20:00:48 +0200] "GET / HTTP/1.0" 200 198 "-" "-"
84.95.106.251 - - [24/Feb/2008:20:37:34 +0200] "GET / HTTP/1.0" 200 198 "-" "-"
84.95.106.251 - - [24/Feb/2008:20:53:12 +0200] "GET / HTTP/1.0" 200 198 "-" "-"
84.95.106.251 - - [24/Feb/2008:21:21:16 +0200] "GET / HTTP/1.0" 200 198 "-" "-"
84.95.106.251 - - [24/Feb/2008:21:24:54 +0200] "GET / HTTP/1.0" 200 198 "-" "-"
84.95.106.251 - - [24/Feb/2008:21:46:11 +0200] "GET / HTTP/1.0" 200 198 "-" "-"
84.95.106.251 - - [24/Feb/2008:22:11:31 +0200] "GET / HTTP/1.0" 200 198 "-" "-"
84.95.106.251 - - [24/Feb/2008:22:20:48 +0200] "GET / HTTP/1.0" 200 198 "-" "-"
84.95.106.251 - - [24/Feb/2008:22:29:12 +0200] "GET / HTTP/1.0" 200 198 "-" "-"

If we want to match accesses on 24 Feb 2008, but only at 17 o’clock:

grep 24/Feb/2008:17 /var/log/httpd/access_log yields:

84.95.106.251 - - [24/Feb/2008:17:15:06 +0200] "GET / HTTP/1.0" 200 198 "-" "-"
84.95.106.251 - - [24/Feb/2008:17:24:35 +0200] "GET / HTTP/1.0" 200 198 "-" "-"
84.95.106.251 - - [24/Feb/2008:17:41:05 +0200] "GET / HTTP/1.0" 200 198 "-" "-"

But what if we want to find matches from both 17 and 18 o’clock? We can use a regex character class:

grep 24/Feb/2008:1[78] /var/log/httpd/access_log

This matches both 17 and 18, because the class [78] accepts either of those characters in that position.
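To try the character class without touching a real server log, here is a minimal sketch that runs the same pattern against a temporary file. The four entries are copied from the output shown earlier; the temporary file and the variable names `log` and `count` are only for illustration.

```shell
# Write four of the log entries shown earlier to a temporary file.
log=$(mktemp)
cat > "$log" <<'EOF'
84.95.106.251 - - [24/Feb/2008:16:41:20 +0200] "GET / HTTP/1.0" 200 198 "-" "-"
84.95.106.251 - - [24/Feb/2008:17:15:06 +0200] "GET / HTTP/1.0" 200 198 "-" "-"
84.95.106.251 - - [24/Feb/2008:18:03:52 +0200] "GET / HTTP/1.0" 200 198 "-" "-"
84.95.106.251 - - [24/Feb/2008:19:08:57 +0200] "GET / HTTP/1.0" 200 198 "-" "-"
EOF

# [78] matches exactly one character, either 7 or 8,
# so only the entries at 17 and 18 o'clock are selected.
# -c prints the number of matching lines instead of the lines themselves.
count=$(grep -c '24/Feb/2008:1[78]' "$log")
echo "$count"

rm -f "$log"
```

The pattern is single-quoted here so the shell cannot treat the square brackets as a filename glob.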

What about matches between 17 and 20 o’clock?

grep 24/Feb/2008:[12][07-9] /var/log/httpd/access_log

You may notice that this regex also matches the hours 10, 27, 28 and 29. The last three are not valid hours, but 10 o’clock is, so those entries will be matched too. Sometimes it is critical to match exactly what we need, so we extend the regex with alternation:

grep "24/Feb/2008:\(1[7-9]\|20\)" /var/log/httpd/access_log
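The difference between the loose class and the alternation can be checked with a small sketch like the one below. The 10 o’clock entry is made up for the demonstration (no such line appears in the log above); the other entries are taken from the earlier output. Note that `\|` in plain grep is a GNU extension, so this assumes GNU grep.

```shell
# Sample log: the first entry (10:00:00) is hypothetical, added only to
# show the false positive; the rest come from the output shown earlier.
log=$(mktemp)
cat > "$log" <<'EOF'
84.95.106.251 - - [24/Feb/2008:10:00:00 +0200] "GET / HTTP/1.0" 200 198 "-" "-"
84.95.106.251 - - [24/Feb/2008:16:41:20 +0200] "GET / HTTP/1.0" 200 198 "-" "-"
84.95.106.251 - - [24/Feb/2008:17:15:06 +0200] "GET / HTTP/1.0" 200 198 "-" "-"
84.95.106.251 - - [24/Feb/2008:20:00:48 +0200] "GET / HTTP/1.0" 200 198 "-" "-"
84.95.106.251 - - [24/Feb/2008:21:21:16 +0200] "GET / HTTP/1.0" 200 198 "-" "-"
EOF

# Loose version: two independent classes, so 10 slips through
# alongside 17 and 20.
loose=$(grep -c '24/Feb/2008:[12][07-9]' "$log")

# Strict version: either 1 followed by 7, 8 or 9, or the literal 20.
strict=$(grep -c '24/Feb/2008:\(1[7-9]\|20\)' "$log")

echo "loose=$loose strict=$strict"

rm -f "$log"
```

With this sample the loose pattern counts one line more than the strict one, and that extra line is the 10 o’clock entry.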

In plain grep you have to escape these special characters with a backslash, or you can switch to egrep or ‘grep -E’, where they are special without escaping. The double quotes keep the shell from interpreting characters such as ( and | itself, so that they reach grep as part of the regex.

egrep "24/Feb/2008:(1[7-9]|20)" /var/log/httpd/access_log

84.95.106.251 - - [24/Feb/2008:17:15:06 +0200] "GET / HTTP/1.0" 200 198 "-" "-"
84.95.106.251 - - [24/Feb/2008:17:24:35 +0200] "GET / HTTP/1.0" 200 198 "-" "-"
84.95.106.251 - - [24/Feb/2008:17:41:05 +0200] "GET / HTTP/1.0" 200 198 "-" "-"
84.95.106.251 - - [24/Feb/2008:18:03:52 +0200] "GET / HTTP/1.0" 200 198 "-" "-"
84.95.106.251 - - [24/Feb/2008:18:24:45 +0200] "GET / HTTP/1.0" 200 198 "-" "-"
84.95.106.251 - - [24/Feb/2008:19:08:57 +0200] "GET / HTTP/1.0" 200 198 "-" "-"
84.95.106.251 - - [24/Feb/2008:19:58:26 +0200] "GET / HTTP/1.0" 200 198 "-" "-"
84.95.106.251 - - [24/Feb/2008:20:00:48 +0200] "GET / HTTP/1.0" 200 198 "-" "-"
84.95.106.251 - - [24/Feb/2008:20:37:34 +0200] "GET / HTTP/1.0" 200 198 "-" "-"
84.95.106.251 - - [24/Feb/2008:20:53:12 +0200] "GET / HTTP/1.0" 200 198 "-" "-"
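As a quick check that the escaped grep form and the unescaped ‘grep -E’ form really select the same lines, the following sketch runs both against a temporary file and compares the results. The entries are copied from the output above; the file and the variable names `bre`, `ere` and `same` are only illustrative, and the `\|` escape again assumes GNU grep.

```shell
# A few entries from the output above, one of them (16:41:20) outside
# the 17-20 range so that there is something to filter out.
log=$(mktemp)
cat > "$log" <<'EOF'
84.95.106.251 - - [24/Feb/2008:16:41:20 +0200] "GET / HTTP/1.0" 200 198 "-" "-"
84.95.106.251 - - [24/Feb/2008:17:41:05 +0200] "GET / HTTP/1.0" 200 198 "-" "-"
84.95.106.251 - - [24/Feb/2008:19:58:26 +0200] "GET / HTTP/1.0" 200 198 "-" "-"
84.95.106.251 - - [24/Feb/2008:20:53:12 +0200] "GET / HTTP/1.0" 200 198 "-" "-"
EOF

# Basic syntax: grouping and alternation need backslashes.
bre=$(grep '24/Feb/2008:\(1[7-9]\|20\)' "$log")

# Extended syntax: the same operators are written without backslashes.
ere=$(grep -E '24/Feb/2008:(1[7-9]|20)' "$log")

# Compare the two results as plain strings.
if [ "$bre" = "$ere" ]; then same=yes; else same=no; fi
echo "identical output: $same"

rm -f "$log"
```

Which form to use is mostly a matter of readability; the extended form tends to be easier to read once a pattern has several groups.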

It is also useful to add the --color option for colorized match output.

egrep "24/Feb/2008:(1[7-9]|20)" /var/log/httpd/access_log --color

The program’s name derives from the command used to perform a similar operation in the Unix text editor ‘ed’: g/re/p. This command searches a file globally for lines matching a given regular expression and prints them.