Monday, July 28, 2014

Analyzing logs

While coding my new server for apartmentsml, I redirected all of the server's exceptions to a log file. After a while I remembered the file, and by the time I decided to check it, it was already 30 MB... So in come the Linux tools for viewing logs. So far the most useful is

tail -n 100 -f /myserver/logs/mylogfile

This command prints the last 100 lines of the log and then keeps following it in real time as people hit your server. Well, maybe not useful, but quite interesting. The most traditional way to view the log is just to

cat /myserver/logs/mylogfile

However, when the log is too long you might want to page through it with

less /myserver/logs/mylogfile
more /myserver/logs/mylogfile

Well, maybe just less, since "less is more, and more". It lets you scroll up and down as if the file were open in a read-only vim editor. Remember to exit by pressing 'q'!

If you just want to see the beginning or the end you can also do

tail -n 10 /myserver/logs/mylogfile
head -n 10 /myserver/logs/mylogfile

As you can imagine, tail gives you the last n lines (10 in this case) and head the first n.
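
Since head and tail both read from a pipe, they can be chained to pull a slice out of the middle of a file. The sketch below is only an illustration: it builds a throwaway 50-line file with mktemp (the file and its contents are made up, not the real server log) and extracts lines 21 to 25.

```shell
# Create a temporary 50-line sample file (hypothetical data, not the real log).
log=$(mktemp)
awk 'BEGIN { for (i = 1; i <= 50; i++) print "line " i }' > "$log"

# Lines 21-25: keep the first 25 lines, then the last 5 of those.
slice=$(head -n 25 "$log" | tail -n 5)
printf '%s\n' "$slice"

# Remember the first and last line of the slice to check the boundaries.
first=$(printf '%s\n' "$slice" | head -n 1)
last=$(printf '%s\n' "$slice" | tail -n 1)

rm -f "$log"
```

The same idea works on the real log by replacing "$log" with /myserver/logs/mylogfile.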
Now this is all fine, but how about actually crunching some numbers on the logs? Imagine that for each call to your page you logged the IP address as IP: xxx.xxx.xxx.xxx. Now you want to compute some stats about your page. First let's count how many impressions your page has gotten:

grep -c "IP:" /myserver/logs/mylogfile

For this example we used a new command called 'grep', which is designed to match lines of text against the regular expression you provide. With -c it counts the lines that contain 'IP:', which equals the number of impressions of your webpage as long as each request is logged on its own line. Now, more interesting yet, let's grab all the unique IPs:
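
To make the counting behaviour concrete, here is a minimal sketch on a made-up sample log (the file contents and the "GET ..." suffixes are assumptions for illustration, not the real log format). It shows that grep -c reports matching lines and skips lines without 'IP:'.

```shell
# Build a small sample log in a temporary file (made-up data).
log=$(mktemp)
printf '%s\n' \
  'IP: 10.0.0.1 GET /' \
  'some exception trace' \
  'IP: 10.0.0.2 GET /' \
  'IP: 10.0.0.1 GET /about' > "$log"

# grep -c prints the number of lines that match, not the number of matches.
impressions=$(grep -c 'IP:' "$log")
echo "impressions: $impressions"

rm -f "$log"
```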

grep -Eo '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}' /myserver/logs/mylogfile | sort -u

You can now use these IPs to track where people are accessing your page from. The regex might seem complicated, but it is simple once you understand that '[]' lists which characters to match and '{}' says how many times they may repeat. So in principle the regex will also match 999.999.999.999, which is not a valid address, but I think we can live with it. Finally, sort -u sorts your IPs and deletes repetitions. Note that uniq on its own only removes adjacent duplicate lines, so on unsorted input it collapses consecutive repeats but leaves the rest.
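
A few variations on the same pipeline may be handy. The sketch below, run against a made-up sample log (the addresses and line format are assumptions), drops matches with an octet above 255, finds the busiest IP with sort | uniq -c, and lists unique IPs in order of first appearance using an awk idiom instead of sorting.

```shell
# Build a small sample log in a temporary file (made-up data).
log=$(mktemp)
printf '%s\n' \
  'IP: 10.0.0.1 GET /' \
  'IP: 10.0.0.2 GET /' \
  'IP: 10.0.0.1 GET /about' \
  'IP: 999.999.999.999 GET /' \
  'IP: 10.0.0.3 GET /' \
  'IP: 10.0.0.1 GET /' > "$log"

ip_re='[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}'

# Extract the addresses and keep only those whose four octets are all <= 255.
ips=$(grep -Eo "$ip_re" "$log" | awk -F. '$1<=255 && $2<=255 && $3<=255 && $4<=255')

# Hits per IP, busiest first. uniq -c only merges adjacent lines, so sort first.
printf '%s\n' "$ips" | sort | uniq -c | sort -rn
top=$(printf '%s\n' "$ips" | sort | uniq -c | sort -rn | head -n 1 | awk '{print $2}')

# Unique IPs in order of first appearance, without sorting:
# awk prints a line only the first time it sees it.
first_seen=$(printf '%s\n' "$ips" | awk '!seen[$0]++' | paste -sd' ' -)
echo "first seen: $first_seen"

rm -f "$log"
```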
