networking - How do I capture HTTP header traffic in my home network

2014-04-07
  • sfactor

    I'm trying to experiment with some HTTP data for a personal project. What I would like to do is capture all the HTTP traffic that passes through my Wi-Fi router at home and use that data for some analysis.

    Capturing only the HTTP header information will suffice for this task. I want to monitor traffic from everything: my laptop, phone, and iPad. I don't have a dedicated server at home for capturing this data, so I want to use Amazon's cloud servers for this. I have an AWS account.

    So, how would I go about doing this? I'm guessing I'll need to set up a proxy of some sort, but I'm a novice at networking things.

    I have an FTTH modem from my ISP, and the Wi-Fi router connects to it.

  • Answers
  • Scott Chamberlain

    Inserting something in your network between the router and the modem seems ideal. This could be achieved with a Raspberry Pi. It only has a single Ethernet interface, but you could add a USB NIC so that you have an incoming and an outgoing interface.

    These two interfaces should be bridged so that the Pi does not need to act as a router: anything coming into the Pi on one interface goes out the other. You may need a third USB NIC to act as a management port that you can connect to the inside of your network.
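
    As a rough illustration of the bridging step, the sketch below uses the iproute2 `ip` tool. The interface names eth0 and eth1, the bridge name br0, and the function name setup_bridge are assumptions made for this example; check your own names with `ip link` first.

```shell
#!/bin/sh
# Sketch: bridge two NICs so the Pi forwards frames transparently.
# eth0, eth1 and br0 are assumed names, not taken from a real setup.
setup_bridge() {
    wan_if="${1:-eth0}"   # side facing the modem (assumed name)
    lan_if="${2:-eth1}"   # side facing the Wi-Fi router (assumed name)
    bridge="${3:-br0}"    # bridge name used by the tcpdump command

    ip link add name "$bridge" type bridge    # create the bridge device
    ip link set "$wan_if" master "$bridge"    # enslave both NICs to it
    ip link set "$lan_if" master "$bridge"
    ip link set "$wan_if" up                  # bring everything up
    ip link set "$lan_if" up
    ip link set "$bridge" up
}
```

    Run it as root, for example `setup_bridge eth0 eth1 br0`. The settings made this way do not survive a reboot, so a permanent setup would go into the distribution's network configuration instead.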

    One approach is to run a netcat listener on AWS, then run tcpdump on the Pi with a filter that keeps the port 80 packets carrying HTTP data, and send those packets to AWS.

    tcpdump -s 0 -U -n -w - -i br0 'tcp port 80 and (((ip[2:2] - ((ip[0]&0xf)<<2)) - ((tcp[12]&0xf0)>>2)) != 0)' | nc AWS_IP 10000
    

    The options mean:

    -s 0   Capture the whole packet (no truncation)
    -U     Write each packet out immediately instead of buffering
    -n     Don't convert addresses or ports to names (no DNS lookups)
    -w -   Write the raw capture to standard output
    -i br0 Listen for packets on the br0 interface (assumes the Ethernet ports are bridged)
    

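    The filter matches port 80 TCP packets whose payload is not empty (IP total length minus the IP and TCP header lengths), so bare handshake and ACK packets are dropped. Because -s 0 keeps whole packets, the capture holds the HTTP headers along with any body data in the same packets. The output is piped into netcat, which sends it to port 10000 on the AWS IP address.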
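
    To make the arithmetic in the filter concrete, here is a small worked sketch in shell. The function name payload_len and the sample byte values are made up for illustration; the three arguments stand for ip[2:2], ip[0] and tcp[12] from the filter expression.

```shell
#!/bin/sh
# payload_len TOTAL_LEN IP_BYTE0 TCP_BYTE12
# Mirrors the BPF expression:
#   ip[2:2] - ((ip[0]&0xf)<<2) - ((tcp[12]&0xf0)>>2)
payload_len() {
    ip_hdr=$(( ($2 & 0xf) << 2 ))    # low nibble = IP header length in 32-bit words
    tcp_hdr=$(( ($3 & 0xf0) >> 2 ))  # high nibble = TCP data offset in 32-bit words
    echo $(( $1 - ip_hdr - tcp_hdr ))
}

payload_len 40 0x45 0x50     # bare ACK: 40 - 20 - 20, prints 0 (filtered out)
payload_len 1500 0x45 0x80   # full segment: 1500 - 20 - 32, prints 1448 (kept)
```

    Only packets where this value is non-zero pass the filter, which is why empty control packets never reach the capture file.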

    And on AWS

    nc -l -p 10000 > http.pcap
    

    This sets up a listener on port 10000 and outputs anything that arrives on this port to a file called http.pcap.

    This file can then be opened with something like Wireshark.

    Note that netcat sends the capture unencrypted over the Internet. To secure this traffic, look into tunnelling the data over SSH.
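
    One way the SSH variant could look is sketched below: the tcpdump output is piped straight into ssh, which writes it to a file on the remote side, so no netcat listener or open port 10000 is needed. The login user@AWS_IP, the variable names and the function name stream_capture are placeholders for this example, and it assumes key-based SSH login to the instance already works.

```shell
#!/bin/sh
# Placeholders: override REMOTE and IFACE for your own setup.
REMOTE="${REMOTE:-user@AWS_IP}"   # hypothetical login on the AWS instance
IFACE="${IFACE:-br0}"             # bridge interface from the answer
# Same capture filter as in the netcat version.
FILTER='tcp port 80 and (((ip[2:2] - ((ip[0]&0xf)<<2)) - ((tcp[12]&0xf0)>>2)) != 0)'

stream_capture() {
    # tcpdump writes raw pcap data to stdout; ssh carries it encrypted
    # and the remote `cat` stores it in http.pcap.
    tcpdump -s 0 -U -n -w - -i "$IFACE" "$FILTER" |
        ssh "$REMOTE" 'cat > http.pcap'
}
```

    Calling `stream_capture` as root starts the capture; stop it with Ctrl-C. Since the filter only matches port 80, the SSH session's own packets are not captured.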

  • Malt

    There's a program called Fiddler which might suit your needs without being overkill.


  • Related Question

    command line - How do I return just the Http header from tshark?
  • tzenes

    I'm using tshark to sniff my packets and I'm only concerned with the HTTP header (preferably in the form it's sent, but I'll take what I can get).

    I tried using:

    tshark tcp port 80 or tcp port 443 -V -R "http"
    

    This gave me the header, but also the content, which I don't want because it's a large amount of garbage to parse. I really only care about the header. Is there any easy way to get just that, other than parsing the data myself?

    Edit: I should add that I also care about host/port, so I can keep track of requests across multiple packets.


  • Related Answers
  • heavyd

    You can use the specific HTTP header display filters to show either just the request headers, just the response headers or both.

    For just the request headers:

    tshark tcp port 80 or tcp port 443 -V -R "http.request"
    

    For just the response headers:

    tshark tcp port 80 or tcp port 443 -V -R "http.response"
    

    And for both the request and response headers:

    tshark tcp port 80 or tcp port 443 -V -R "http.request || http.response"
    

    Note: This does not extract only the headers; it selects the packets that contain the headers. You will likely still get some body data, but less than you would otherwise.
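
    If a few specific header values plus host and port are enough, tshark's field output may be a simpler route than trimming the verbose dump. The sketch below is an assumption-laden example rather than part of the original answer: the function name http_request_fields and the choice of fields are mine, and it uses -Y, which newer tshark releases (1.10 and later) use for single-pass display filters where older ones used -R.

```shell
#!/bin/sh
# http_request_fields PCAP_FILE
# Prints one tab-separated line per HTTP request: endpoints first,
# then a handful of request header values.
http_request_fields() {
    tshark -r "$1" -Y 'http.request' -T fields \
        -e ip.src -e tcp.srcport -e ip.dst -e tcp.dstport \
        -e http.request.method -e http.host -e http.request.uri \
        -e http.user_agent
}
```

    For example, `http_request_fields http.pcap` reads a saved capture. Further `-e` options can be added for other header fields; the IP and port columns make it possible to group requests by connection.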

  • quickshiftin

    I was able to combine the answer from @heavyd with a sed filter I found in an SO answer (F.J.'s) to cook up this baby, which filters out just the headers :)

    sudo tshark tcp port 80 or tcp port 443 -V -R "http.request || http.response" | sed -rn '/Hypertext Transfer Protocol/{:a;N;/    \\r\\n:?/{/.*/p;d};ba}' >> /tmp/filtered
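
    For readers who find the sed loop hard to follow, the same idea can be sketched in awk: start printing at the "Hypertext Transfer Protocol" line and stop after the line that tshark prints for the empty line ending the headers, shown as spaces followed by a literal \r\n. The function name http_headers_only is made up for this example, and it assumes the same `tshark -V` text layout that the sed expression relies on.

```shell
#!/bin/sh
# Reads `tshark -V` text on stdin and prints only the HTTP header part
# of each packet dissection.
http_headers_only() {
    awk '
        /^Hypertext Transfer Protocol/ { in_http = 1 }   # header section starts
        in_http                         { print }
        in_http && /^ +\\r\\n/          { in_http = 0 }   # literal "\r\n" line ends it
    '
}
```

    It is used as the last stage of a pipeline: pipe the output of a `tshark -V` command such as the one above into `http_headers_only`.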
    
  • Miquel Adrover

    My own filter version for easy reading:

    tshark -V -R "tcp.port == 80 && (http.request || http.response)" | awk "/Hypertext Transfer Protocol/,/Frame/ { print };/Transmission Control Protocol/{print};/Internet Protocol/{print}" | grep -v Frame
    

    This way I see only the relevant IP and TCP information, without all the low-level detail, plus the complete HTTP info.