This is a quick post about one of many ways you might parse Microsoft DNS server logs. In this case, I simply wanted to know the top talkers. In this entry we use shell commands and Python on a Linux host, and we follow up with an all-in-one Python script if you want to skip to the end.
Here is the example data, or you can follow along with your own:
DNS Server log file creation at 6/15/2014 6:11:48 PM UTC
Log file wrap at 6/15/2014 5:00:23 PM
Message logging key (for packets - other items use a subset of these fields):
    Field #  Information         Values
    -------  -----------         ------
       1     Date
       2     Time
       3     Thread ID
       4     Context
       5     Internal packet identifier
       6     UDP/TCP indicator
       7     Send/Receive indicator
       8     Remote IP
       9     Xid (hex)
      10     Query/Response      R = Response
                                 blank = Query
      11     Opcode              Q = Standard Query
                                 N = Notify
                                 U = Update
                                 ? = Unknown
      12     [ Flags (hex)
      13     Flags (char codes)  A = Authoritative Answer
                                 T = Truncated Response
                                 D = Recursion Desired
                                 R = Recursion Available
      14     ResponseCode ]
      15     Question Type
      16     Question Name
20140816 16:08:57 588 PACKET 019B99F0 UDP Rcv 192.168.0.2 80fd Q [0001 D NOERROR] A (3)www(1)l(6)google(3)com(0)
20140816 16:08:57 588 PACKET 019CEFF0 UDP Snd 192.168.0.2 622d Q [0001 D NOERROR] A (3)www(1)l(6)google(3)com(0)
20140816 16:08:57 588 PACKET 01C61480 UDP Rcv 192.168.0.2 622d R Q [8081 DR NOERROR] A (3)www(1)l(6)google(3)com(0)
20140816 16:08:57 588 PACKET 01C61480 UDP Snd 192.168.0.2 80fd R Q [8081 DR NOERROR] A (3)www(1)l(6)google(3)com(0)
20140816 15:51:47 588 PACKET 02131B00 UDP Snd 192.168.0.2 1b77 Q [0001 D NOERROR] A (9)messaging(9)microsoft(3)com(0)
20140816 15:51:47 588 PACKET 0242BD70 UDP Rcv 192.168.0.2 1b77 R Q [8081 DR NOERROR] A (9)messaging(9)microsoft(3)com(0)
20140816 16:28:56 588 PACKET 02447E50 UDP Rcv 192.168.0.2 6a24 Q [0001 D NOERROR] A (10)akamaiedge(3)net(0)
20140816 16:28:56 588 PACKET 01E8B070 UDP Snd 192.168.0.2 f11d Q [0001 D NOERROR] A (10)akamaiedge(3)net(0)
20140816 16:28:56 588 PACKET 01BDA5A0 UDP Rcv 192.168.0.2 f11d R Q [8081 DR NOERROR] A (10)akamaiedge(3)net(0)
20140816 16:28:56 588 PACKET 01BDA5A0 UDP Snd 192.168.0.2 6a24 R Q [8081 DR NOERROR] A (10)akamaiedge(3)net(0)
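The header above documents the field order of each PACKET line. As a minimal sketch, assuming every PACKET line follows the layout of the sample lines (24-hour time, no spaces inside a field; real logs with other settings may differ), here is one way to split a line into named fields with a regular expression. The field names such as `remote_ip` and `qname`, and the function `parse_packet_line`, are our own hypothetical labels, not anything Microsoft defines.

```python
import re

# Field layout follows the "Message logging key" header above. Assumes PACKET
# lines look like the sample (24-hour time, no spaces inside a field); other
# log configurations may need a different pattern.
PACKET_RE = re.compile(
    r'^(?P<date>\d{8})\s+(?P<time>\d{1,2}:\d{2}:\d{2})\s+'
    r'(?P<thread_id>\S+)\s+(?P<context>PACKET)\s+(?P<packet_id>[0-9A-Fa-f]+)\s+'
    r'(?P<protocol>UDP|TCP)\s+(?P<direction>Snd|Rcv)\s+(?P<remote_ip>\S+)\s+'
    r'(?P<xid>[0-9A-Fa-f]+)\s+(?:(?P<response>R)\s+)?(?P<opcode>[QNU?])\s+'
    r'\[(?P<flags_hex>[0-9A-Fa-f]+)\s+(?:(?P<flags_char>[ATDR]+)\s+)?'
    r'(?P<rcode>\w+)\]\s+(?P<qtype>\w+)\s+(?P<qname>\S+)$'
)


def parse_packet_line(line):
    """Return a dict of named fields for one PACKET line, or None if it doesn't match."""
    match = PACKET_RE.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    # Field 10 is "R" for a response and blank for a query.
    fields['is_response'] = fields.pop('response') == 'R'
    return fields


sample = ('20140816 16:08:57 588 PACKET 01C61480 UDP Rcv 192.168.0.2 '
          '622d R Q [8081 DR NOERROR] A (3)www(1)l(6)google(3)com(0)')
record = parse_packet_line(sample)
print(record['direction'], record['is_response'], record['qname'])
# prints: Rcv True (3)www(1)l(6)google(3)com(0)
```

A regex is used rather than a plain `line.split()` because field 10 (Query/Response) is blank for queries, so the number of whitespace-separated tokens differs between query and response lines.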
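Since the file starts with a header, cut it off first. In our file the header took up the first 29 lines; check where the first log entry starts in yours and adjust the range. The -i option makes GNU sed edit the file in place, so work on a copy of the original log:
$ sed -i '1,29d' log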
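Hard-coding the line count breaks if your header has a different length. A minimal Python sketch of an alternative, assuming (as in the sample) that every log entry starts with an eight-digit YYYYMMDD date and no header line does; `skip_header` is a hypothetical helper name:

```python
import re
from itertools import dropwhile

# Assumption: every log entry starts with an eight-digit YYYYMMDD date, as in
# the sample, and no header line does.
ENTRY_START = re.compile(r'^\d{8}\s')


def skip_header(lines):
    """Drop lines up to (not including) the first one that looks like a log entry."""
    return dropwhile(lambda line: not ENTRY_START.match(line), lines)


example = [
    'DNS Server log file creation at 6/15/2014 6:11:48 PM UTC\n',
    '  16     Question Name\n',
    '\n',
    '20140816 16:08:57 588 PACKET 019B99F0 UDP Rcv 192.168.0.2 80fd Q '
    '[0001 D NOERROR] A (3)www(1)l(6)google(3)com(0)\n',
]
body = list(skip_header(example))
print(len(body))  # prints: 1
```

With a real file, open it with `with open('log') as f:` and iterate over `skip_header(f)`. Like `sed '1,29d'`, this only removes the leading block; anything further down the file is left alone.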
Convert the log from Windows (CRLF) to Unix (LF) line endings to get rid of the pesky carriage returns:
$ awk '{ sub("\r$", ""); print }' log > log.wintounix
Get rid of blank lines (the Python one-liner below takes the last field of every line, which fails on an empty line):
$ sed '/^$/d' log.wintounix > log.nolines
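The two cleanup steps can also be folded into Python instead of writing intermediate files. A minimal sketch, with `clean_lines` as a hypothetical helper name. Note that Python 3 opens files in text mode with universal newlines by default, so `\r\n` is normally already translated to `\n`; the explicit `rstrip` matters when the file is opened some other way, for example with `newline=''`.

```python
def clean_lines(lines):
    """Yield lines without trailing CR/LF characters, skipping blank lines.

    Roughly the awk (CR removal) and sed (blank-line removal) steps above,
    except that whitespace-only lines are dropped too.
    """
    for line in lines:
        line = line.rstrip('\r\n')
        if line.strip():  # skip empty and whitespace-only lines
            yield line


raw = ['20140816 16:08:57 588 PACKET 019B99F0 UDP Rcv\r\n', '\r\n', '   \n', 'last line\r\n']
print(list(clean_lines(raw)))
```

Usage with a file: `with open('log', newline='') as f:` and then loop over `clean_lines(f)`.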
Here is the Python code we use to parse the cleaned-up file:
import re
from collections import Counter

# Key each line by the last two labels of its question name, e.g. "google(3).com(0)"
with open('log.nolines') as f:
    c = Counter('.'.join(re.findall(r'(\w+\(\d+\))', line.split()[-1])[-2:]) for line in f)
for domain, count in c.most_common():
    print(domain, count)
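The question name is logged in a length-prefixed form: each label is preceded by its length in parentheses and the name ends with `(0)`, so `(3)www(1)l(6)google(3)com(0)` is `www.l.google.com`. That is why the one-liner above produces keys such as `google(3).com(0)`. Its `\w+` pattern also mangles labels that contain a hyphen: `(5)my-co(3)com(0)` comes out as `co(3).com(0)`. A minimal sketch of decoding the name first; `decode_qname` and `base_domain` are hypothetical helper names, and keeping two labels is a simplification (names under suffixes like `co.uk` would need a public-suffix list).

```python
import re


def decode_qname(qname):
    """Turn '(3)www(1)l(6)google(3)com(0)' into 'www.l.google.com'."""
    # Split on the (N) length markers; the empty strings at both ends are dropped.
    labels = [label for label in re.split(r'\(\d+\)', qname) if label]
    return '.'.join(labels)


def base_domain(qname, levels=2):
    """Keep only the last `levels` labels, e.g. 'google.com' for levels=2."""
    return '.'.join(decode_qname(qname).split('.')[-levels:])


print(decode_qname('(3)www(1)l(6)google(3)com(0)'))  # www.l.google.com
print(base_domain('(5)my-co(3)com(0)'))              # my-co.com
```

To use it in the one-liner, replace the `'.'.join(re.findall(...)[-2:])` expression with `base_domain(line.split()[-1])`.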
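Redirect the script's output to a file named parsed, then sort it numerically by the count in the second field, largest first; modify the key as needed. Since most_common() already returns entries in this order, this step only matters if you change the output format:
$ sort -t" " -k2 -n -r parsed > parsed.sorted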
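If you only want the top N entries, you can also skip the external sort: `Counter.most_common(n)` returns the n highest counts in descending order, and entries with equal counts keep the order in which they were first seen. A small sketch using plain domain names for readability, with one list entry per PACKET line in the sample above:

```python
from collections import Counter

# One entry per PACKET line in the sample above (plain names for readability).
domains = ['google.com'] * 4 + ['microsoft.com'] * 2 + ['akamaiedge.net'] * 4
counts = Counter(domains)

# most_common(n) returns the n largest counts, highest first; ties keep
# first-seen order.
for domain, count in counts.most_common(2):
    print(domain, count)
# google.com 4
# akamaiedge.net 4
```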
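That was a lot of work to parse a file, so let's make it a little easier. Save the following as parseMSDNS.py and run it with the log file as its argument: python3 parseMSDNS.py log. It works on the raw log, since header lines and blank lines simply don't match the pattern.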
#!/usr/bin/env python3
import re
import sys
import time

ret = {}
filename = sys.argv[1]
start_time = time.time()

with open(filename, 'r') as the_file:
    for line in the_file:
        # strip() also removes the trailing \r of Windows line endings.
        # re.search returns a match object, or None if the pattern is absent.
        match = re.search(r'Q \[.+\].+\(\d+\)([^\(]+)\(\d+\)([^\(]+)', line.strip())
        if match is not None:
            # The two groups are the last two labels of the name, e.g. "google com".
            key = ' '.join(match.groups())
            # Count how many times each key appears.
            ret[key] = ret.get(key, 0) + 1

for k in sorted(ret, key=ret.get, reverse=True):
    print("{:15} - {}".format(k, ret[k]))
print(time.time() - start_time, "seconds")
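The script above counts every line that contains `Q [`, so a single lookup can show up on several lines: in the sample, www.l.google.com appears four times (two queries and two responses, sent and received). If you want one count per client request, plus a list of the busiest clients, here is a minimal variation. It assumes, based only on the sample, that client requests appear as `Rcv` lines with a blank Query/Response field; `count_client_queries`, `base_domain`, and `main` are hypothetical names, and the two-label domain cut is a simplification.

```python
import re
import sys
from collections import Counter

# Assumption based on the sample: a query received from a client looks like
# "... Rcv <ip> <xid> Q [...] <type> <name>", while responses carry "R Q".
CLIENT_QUERY_RE = re.compile(
    r'\sRcv\s+(?P<ip>\S+)\s+[0-9A-Fa-f]+\s+Q\s+\[.*\]\s+\w+\s+(?P<qname>\S+)\s*$'
)


def base_domain(qname, levels=2):
    """'(3)www(1)l(6)google(3)com(0)' -> 'google.com' (last `levels` labels)."""
    labels = [label for label in re.split(r'\(\d+\)', qname) if label]
    return '.'.join(labels[-levels:])


def count_client_queries(lines):
    """Count received client queries per base domain and per remote IP."""
    by_domain, by_client = Counter(), Counter()
    for line in lines:
        match = CLIENT_QUERY_RE.search(line)
        if match:
            by_domain[base_domain(match.group('qname'))] += 1
            by_client[match.group('ip')] += 1
    return by_domain, by_client


def main(path, top=10):
    with open(path) as log:
        by_domain, by_client = count_client_queries(log)
    print('Top domains:')
    for name, count in by_domain.most_common(top):
        print('{:30} {}'.format(name, count))
    print('Top clients:')
    for ip, count in by_client.most_common(top):
        print('{:30} {}'.format(ip, count))


if __name__ == '__main__' and len(sys.argv) > 1:
    main(sys.argv[1])
```

Run against the sample lines, this counts google.com and akamaiedge.net once each; the messaging.microsoft.com lines are not counted, because the sample has no received client query for that name.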
That should do it. Leave a comment if something is not working as expected.