Analyzing HTTP packets with Wireshark and Python
I’m doing some reverse-engineering stuff and it has been quite fun so far (hopefully I’ll blog more about why I’m doing this in the future). I needed to dump some HTTP traffic and analyse the data. Of course, Wireshark comes straight to mind for something like this and it is indeed really useful. It took me some time to understand the Wireshark interface and I still think it’s hiding some great functionality from me. But anyway, I was able to set the filters I wanted and it was showing me exactly the data I wanted. But I still had to right-click the data I wanted and save it to disk, which was not ideal.
Then I thought, if people were smart enough to build such a powerful tool, they probably created a command-line interface as well, probably with scripting. Indeed they did! The command-line interface is called Tshark and the scripting is done in Lua. But I don’t know Lua and it would take too much time to learn it for this task. So I started to look for a way to dump everything and then write a small script in Python to extract the data I really want. It took some time, but the solution was much simpler than I thought (by the way, there are probably other solutions for this, but my Google skills were not good enough to find anything obvious).
First you run Tshark to dump any HTTP traffic to an XML file (I usually hate XML, but this time it was useful). This is what I used:
sudo tshark -i wlan0 "host 192.168.1.100 and port 45000" -d tcp.port==45000,http -T pdml > dump.xml
Of course, it all depends on what you want to dump. You should read “man pcap-filter” to get the capture filter right; it is really useful (sometimes crucial) to capture only the traffic you want. I also wanted to treat traffic on port 45000 as HTTP, which is what the -d (“decode as”) switch is for.
The most important thing is “-T pdml”, which tells tshark to dump in this XML format.
The next thing is to analyze the dump in Python, which was much easier than I thought. I only cared about the data field in the HTTP packets, but if you take a look at the dumped file, you’ll see it has information about all kinds of things. My script turned out to be this:
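If the capture itself should be started from Python as well, the command above can be assembled as an argument list. This is only a sketch: build_tshark_command and capture_to_file are hypothetical helper names, the capture filter is passed through tshark's -f option instead of as a bare argument, and capturing usually still needs the same privileges that sudo provided.

```python
import subprocess


def build_tshark_command(interface, host, port):
    """Return the tshark argument list for a PDML capture (hypothetical helper)."""
    capture_filter = f"host {host} and port {port}"
    return [
        "tshark",
        "-i", interface,                 # capture interface, e.g. wlan0
        "-f", capture_filter,            # capture filter in pcap-filter syntax
        "-d", f"tcp.port=={port},http",  # decode traffic on this TCP port as HTTP
        "-T", "pdml",                    # write PDML (XML) to stdout
    ]


def capture_to_file(path, interface, host, port):
    """Run tshark until it exits, writing its PDML output to `path`."""
    with open(path, "wb") as out:
        subprocess.run(build_tshark_command(interface, host, port), stdout=out)
```

Calling capture_to_file("dump.xml", "wlan0", "192.168.1.100", 45000) would then play the role of the shell redirection to dump.xml.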
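from lxml import etree
import binascii

# Parse the PDML dump produced by tshark
tree = etree.parse('dump.xml')
# Select the "data" field of every HTTP packet and decode its hex string into bytes
fields = tree.xpath('/pdml/packet/proto[@name="http"]/field[@name="data"]')
data = [binascii.unhexlify(e.get("value")) for e in fields]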
I used lxml because I found it has great support for XPath, which is quite useful here. Also, the HTTP data is stored as a hex string, which you can easily convert with unhexlify. So, in the end I was able to automate an annoying process with just a few lines of code. And if I need anything else, it’s quite easy to expand the script. I’m quite happy with the results!
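To try the extraction without a real capture, here is a minimal sketch that runs the same kind of query against a tiny hand-written PDML fragment. The fragment is made up and far simpler than what tshark really emits, and the sketch swaps lxml for the standard library's xml.etree.ElementTree (which supports this limited subset of XPath) so that it runs without extra dependencies.

```python
import binascii
import xml.etree.ElementTree as ET

# Hand-written, heavily simplified stand-in for tshark's PDML output.
SAMPLE_PDML = """<pdml>
  <packet>
    <proto name="tcp">
      <field name="tcp.port" show="45000" value="afc8"/>
    </proto>
    <proto name="http">
      <field name="data" value="68656c6c6f"/>
    </proto>
  </packet>
  <packet>
    <proto name="http">
      <field name="data" value="776f726c64"/>
    </proto>
  </packet>
</pdml>"""


def extract_http_data(root):
    """Return the decoded bytes of every HTTP "data" field under a <pdml> root."""
    fields = root.findall("./packet/proto[@name='http']/field[@name='data']")
    return [binascii.unhexlify(f.get("value")) for f in fields]


root = ET.fromstring(SAMPLE_PDML)
payloads = extract_http_data(root)
print(payloads)  # [b'hello', b'world']
```

The TCP field in the first packet is skipped because the path only matches fields under the proto element named "http".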
Update: Someone pointed out Scapy (http://www.secdev.org/projects/scapy/) in the comments, which, judging by its documentation, seems awesome!

Hi,
this looks to me like one of those examples for:
“If the only tool you have is a hammer, suddenly everything starts to look like a nail …”
Try scapy (http://www.secdev.org/projects/scapy/)
I knew I was missing something from my Google searches. But it’s too late for me.
Thanks!
Scapy doesn’t do what I want at all. At least I wasn’t able to analyze HTTP traffic with it; I could only analyze lower-level protocols, such as TCP.
Hi,
Good work…
I also need something similar with Python and PDML. Could you send me the full Python source code with sample input files? It would be really useful for me.

Thanks in advance
Abel
That is the full Python source code
The magic is in the XPath (/pdml/packet/proto[@name="http"]/field[@name="data"]). You can access almost any XML field by XPath. If you don’t know XPath already, I recommend you read something about it; it’s worth the effort.
And any PDML file is too big to send here. Open any PDML file in a text editor, you’ll see all the XML and then you can write the XPath for the fields you want to access.
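For dumps that are awkward to browse by hand, the available fields can also be listed programmatically. The following is a rough sketch, not part of the original script: list_field_names is a hypothetical helper that streams the file with the standard library's iterparse, so the whole document is never held in memory at once, and it assumes the usual packet/proto/field nesting of PDML.

```python
import io
import xml.etree.ElementTree as ET


def list_field_names(source):
    """Return the sorted (protocol, field) name pairs seen in a PDML stream."""
    seen = set()
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "packet":
            continue  # wait until a whole <packet> has been read
        for proto in elem.iter("proto"):
            for field in proto.iter("field"):
                seen.add((proto.get("name", ""), field.get("name", "")))
        elem.clear()  # discard the packet's children to keep memory use low
    return sorted(seen)


# Tiny made-up sample; a real call would pass a filename such as "dump.xml".
sample = io.BytesIO(
    b'<pdml><packet><proto name="http">'
    b'<field name="data" value="00"/>'
    b'<field name="http.host" show="example"/>'
    b"</proto></packet></pdml>"
)
names = list_field_names(sample)
```

Each pair maps directly onto an XPath of the form /pdml/packet/proto[@name="..."]/field[@name="..."].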
Hey Alex,
I am trying to do something similar… the trouble is that I am trying to do it in C. I am looking at the code of Snort (http://www.snort.org/) for the client side. I need your help, if you could tell me how to write code to get specific fields out of the HTTP packet…
Thank you!
Saad Rehman