One-line log parser

ruby-one-liners

ruby global variables

The log file looks like this:

method=GET path=/v1/foods/near_by_want1 format=html controller=api/v1/foods action=near_by_want status=200 duration=2005.69 view=0.20
method=GET path=/v1/foods/near_by_want2 format=html controller=api/v1/foods action=near_by_want status=200 duration=2005.69 view=0.20
method=GET path=/v1/foods/near_by_want3 format=html controller=api/v1/foods action=near_by_want status=200 duration=2005.69 view=0.20
method=GET path=/v1/foods/near_by_want  format=html controller=api/v1/foods action=near_by_want status=200 duration=2005.69 view=0.20
method=GET path=/v1/foods/near_by_want4 format=html controller=api/v1/foods action=near_by_want status=200 duration=2005.69 view=0.20
method=GET path=/v1/foods/near_by_want  format=html controller=api/v1/foods action=near_by_want status=200 duration=2005.69 view=0.20

To get the average processing time for near_by_want:

grep 'near_by_want' production.log-20130827 | awk '{print $7}' | ruby -n -e '@sum ||= 0; $_ =~ /^duration=(.+)/; @sum += $1.to_f; END { puts @sum / $. }'
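The one-liner above can be expanded into a small standalone Ruby method, which is easier to read and test. This is a sketch under the same assumptions as the pipeline: key=value formatted lines with a `duration=` field; the name `average_duration` is chosen here for illustration.

```ruby
# Sketch of the grep/awk/ruby pipeline as one Ruby method.
# Averages the duration= field of every line matching a pattern.
def average_duration(lines, pattern)
  durations = lines.grep(pattern).filter_map do |line|
    # Extract the numeric part of "duration=2005.69"; nil if absent.
    line[/duration=([\d.]+)/, 1]&.to_f
  end
  return 0.0 if durations.empty?
  durations.sum / durations.size
end

sample = [
  "method=GET path=/v1/foods/near_by_want format=html duration=2005.69 view=0.20",
  "method=GET path=/v1/foods/near_by_want format=html duration=1005.69 view=0.20",
]
puts average_duration(sample, /near_by_want/).round(2)   # => 1505.69
```

For a real log file, `average_duration(ARGF.readlines, /near_by_want/)` reads from the filenames given on the command line, much like the grep stage. Note `filter_map` requires Ruby 2.7+.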

gawk analyze log file

Today I had a task: analyze a log file on the server and find the requests whose response time exceeded 400ms.

The log entries look roughly like this:

Started GET "/xxx/xxx/l1/pnxx9" for 218.107.18.137 at Mon May 07 10:32:00 +0800 2012
  Processing by XXController#action as HTML
  Parameters: {"x"=>"xxx", "xx"=>"xxxx", "x"=>"xxxx", "x"=>nil}
...
...
...
... lot of render
Completed 200 OK in 222ms (Views: 166.1ms | ActiveRecord: 0.0ms)

Each record is separated by a blank line, and the file is fairly large, around 200 MB.

I processed it with gawk:

gawk -vRS= -F'\n' 'split($NF,a," ") match(a[5],/([0-9]+)/,b) {if (b[1]+0 > 400) print $1,"\n",$2,"\n",$NF,"\n"}' production.log > analyze.log

It took under a minute to process, which is acceptable.
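For comparison, the same scan can be sketched in Ruby. Splitting on runs of blank lines mimics gawk's `-vRS=` paragraph mode, so each chunk is one request's block of log lines; the 400ms threshold and the `Completed ... in NNNms` line format come from the log sample above, while the method name `slow_requests` and the sample data are made up here for illustration.

```ruby
# Paragraph-mode scan: split the log on blank lines (like gawk -vRS=),
# then keep records whose final "Completed ... in NNNms" line exceeds
# the threshold, printing the request, controller, and timing lines.
def slow_requests(text, threshold_ms = 400)
  text.split(/\n{2,}/).filter_map do |record|
    lines = record.lines.map(&:chomp).reject(&:empty?)
    next if lines.empty?
    # The last line looks like: Completed 200 OK in 222ms (Views: ...)
    ms = lines.last[/\bin (\d+)ms\b/, 1]
    next unless ms && ms.to_i > threshold_ms
    [lines[0], lines[1], lines.last].join("\n")
  end
end

log = <<~LOG
  Started GET "/fast" for 1.2.3.4 at Mon May 07 10:32:00 +0800 2012
    Processing by FastController#show as HTML
  Completed 200 OK in 222ms (Views: 166.1ms | ActiveRecord: 0.0ms)

  Started GET "/slow" for 1.2.3.4 at Mon May 07 10:33:00 +0800 2012
    Processing by SlowController#show as HTML
  Completed 200 OK in 812ms (Views: 700.0ms | ActiveRecord: 2.0ms)
LOG

puts slow_requests(log)
```

The gawk version is likely faster on a 200 MB file, but the Ruby version is easier to extend, e.g. to also report the Views/ActiveRecord breakdown.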