USPTO - Second Generation DPI-transcription - Gerald Combs

Hello everyone. I’m Gerald Combs. I’m the creator and lead developer of Wireshark. I’m also the CFO of the Wireshark Foundation, which is a nonprofit set up to support Wireshark. Today I will be talking about second-generation DPI and how Wireshark fits into that field.

Wireshark and second-generation DPI start where Sniffer left off, which, if you recall, was discussed in Len Shustek’s presentation, and the story starts in the nineteen nineties. In the nineties, two important developments shaped not only Wireshark but computing in general. Off-the-shelf hardware became much more powerful and much more capable. Because of that, you had less of a need for specialized hardware to capture network packets. As an example, on the screen, you can see a title from a bug report that the Wireshark project received. Somebody was complaining that they were unable to open a capture file with more than fifty-four million packets. Now when I first started the project, the very notion of having that many packets in a capture file was pretty much unheard of. But over time computers have become more capable, and that’s a reasonable expectation nowadays. And we did end up fixing the bug, so you can open and inspect as many packets as your computer’s memory will support.

Another thing that happened was that the internet became usable in the nineties, and that allowed for large-scale distributed collaboration on various projects. This had a big effect on how software was developed, because you weren’t limited to developing software with just a few people together in the same room. You could develop software in a distributed fashion, with people around the world working on the same code. This allowed open-source development itself to arise, and you ended up with projects that are still around today, like the Linux kernel, the Apache web server, or the VLC video player. Wireshark is one of those projects.

Here’s a quick history of Wireshark. I used Sniffer, and as I said, Len covered that well in the previous presentation. I used Sniffer early on in my career, when I worked at a university and needed to manage the network there and help troubleshoot it. At one point I changed jobs and went to work for a small ISP that didn’t have a budget for a Sniffer. I needed an analyzer that ran on Linux and Solaris, and there weren’t many products that supported those two platforms very well, at least not products that had a graphical user interface, so I ended up writing my own. It was called Ethereal, and it ultimately became Wireshark when we ended up having to change the name. And since this talk is for the Patent and Trademark Office, I will point out that this is when I learned about the importance of retaining your trademark and then protecting it.

The project grew after I released it. It became very popular initially with developers, but then we built up a very strong user community and it grew into what we have today. The numbers on this slide show two things. One is that Wireshark is very popular: we get about a million downloads a month, and we have a couple of yearly conferences where people show up and are very enthusiastic about the project. The other thing they show is that the product is very powerful. The numbers here are kind of old. We support close to three thousand protocols now and over two hundred and fifty thousand display filter fields, and we’ve gotten contributions from, I think, twenty-two or twenty-three hundred authors. I’ll explain what protocols and fields mean in a bit, but these numbers just show that the product has grown and has a lot of capabilities now that people find very useful.

So, what does Wireshark do? Wireshark takes packets, and packets are just blobs of numbers that look like this if you display them as hexadecimal. If you look at this packet, there are different numbers in different places, and those numbers determine how the packet moves across the network and how it’s handled along the way. These numbers also determine what happens when the machine at the end, the recipient, receives this packet. You can memorize what all these numbers in all these different places do, and some people do memorize it, or you can take packets like this and feed them to Wireshark.
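As a rough illustration of the "blobs of numbers" view, here is a minimal Python sketch that prints bytes as rows of hexadecimal values. The function name `hex_dump` and the sample bytes are made up for illustration; this is not Wireshark code.

```python
def hex_dump(data: bytes, width: int = 16) -> str:
    """Format bytes as rows of an offset followed by two-digit hex values."""
    rows = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        rows.append(f"{offset:04x}  {hex_part}")
    return "\n".join(rows)


# Hypothetical sample: 14 bytes laid out like an Ethernet header.
sample = bytes.fromhex("00112233445566778899aabb0800")
print(hex_dump(sample))
```

Without a tool to interpret them, the reader has to know by heart which positions in each row carry which meaning.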

Wireshark takes all these numbers and uses a process that we call dissection. Through that process, Wireshark creates a bunch of display filter fields; I’ll talk about the fields in a bit. It shows you all the numbers in this packet in a format that humans can work with, analyze, and use to troubleshoot and gain knowledge about a network. This also lets us add powerful features within the Wireshark application that let users do higher-level analysis. In this particular example, we’ve taken the first six bytes, which are the destination Ethernet address, and that’s shown in the destination field. A little further in, the bytes zero eight zero zero are the Ethernet type, which in this case is IPv4. A little further down, there’s a hex value zero six, which is the protocol value in the IPv4 header, and in this case it’s TCP, and so on. Wireshark takes all these numbers and breaks them down into all these fields.
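The byte positions described here can be sketched in a few lines of Python. This is only an illustration of the idea of dissection, not how Wireshark itself is implemented; the function `dissect` and the sample frame are hypothetical, and the dictionary keys are simply modeled on Wireshark display filter field names. It assumes an untagged Ethernet frame of at least 14 bytes.

```python
import struct


def dissect(frame: bytes) -> dict:
    """Break an Ethernet + IPv4 header into named fields (sketch only)."""
    fields = {}
    # Ethernet header: 6-byte destination, 6-byte source, 2-byte type.
    dst, src, eth_type = struct.unpack("!6s6sH", frame[:14])
    fields["eth.dst"] = dst.hex(":")
    fields["eth.src"] = src.hex(":")
    fields["eth.type"] = eth_type
    # 0x0800 means IPv4; only read the header if all 20 fixed bytes are present.
    if eth_type == 0x0800 and len(frame) >= 34:
        ip = frame[14:34]
        fields["ip.proto"] = ip[9]  # 6 means TCP
        fields["ip.src"] = ".".join(str(b) for b in ip[12:16])
        fields["ip.dst"] = ".".join(str(b) for b in ip[16:20])
    return fields


# Hypothetical frame: Ethernet header followed by a 20-byte IPv4 header.
frame = bytes.fromhex(
    "001122334455" "66778899aabb" "0800"
    "450000280001000040060000c0a80001c0a80002"
)
for name, value in dissect(frame).items():
    print(name, value)
```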

Wireshark will also show you fields that aren’t in the packet. By that I mean you can do things like take an IP address and look it up in a geolocation database, and Wireshark will show you the location information for that packet. Wireshark can compare this packet with other packets in the capture and show you how far into a TCP stream you are and what the next TCP sequence number is, in the case of that particular protocol. The goal here is to add as much useful information as possible to the view that people are looking at so that they can do more effective and more informed analysis.

I keep talking about protocols and fields. Network protocols are a defined way for computers to talk to each other. TCP lets you send streams of data back and forth, HTTP lets you deliver the information that’s in a web page, and so on. Wireshark takes all those bits that are described in protocol specifications and assigns fields to them. Fields have a bunch of information attached to them, like the name, the type of the field (whether it’s text or a number or an address), the size of the field, and all sorts of other information: as much as we can reasonably assign to each field while retaining performance and still being useful to users. The number you see on the screen is two hundred thousand, but the source code that I have on my machine right now supports over two hundred and fifty thousand fields. Wireshark is continually growing, so we’re constantly adding display filter fields to the product.
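One way to picture the information attached to a field is as a small record in a lookup table. The sketch below is an assumption made for illustration: the `FieldInfo` class, its attribute names, and the type labels are invented here and are not Wireshark's internal data structures. Only the filter names are modeled on real display filter fields.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldInfo:
    abbrev: str  # name used in filters, e.g. "ip.src"
    name: str    # human-readable label
    ftype: str   # kind of value: address, number, text, ...
    size: int    # width of the field in bytes


# A tiny, hypothetical registry; the real product has hundreds of thousands.
REGISTRY = {
    f.abbrev: f
    for f in [
        FieldInfo("eth.dst", "Destination", "ethernet address", 6),
        FieldInfo("ip.proto", "Protocol", "number", 1),
        FieldInfo("ip.src", "Source Address", "IPv4 address", 4),
        FieldInfo("tcp.srcport", "Source Port", "number", 2),
    ]
}

info = REGISTRY["ip.src"]
print(f"{info.abbrev}: {info.name} ({info.ftype}, {info.size} bytes)")
```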

If we take that process of dissection and place it in a typical Wireshark workflow, you end up with something like this. The network packets that come to Wireshark are shown on the left; they’re just these blobs of bytes. They are first pulled into a capture library called libpcap. libpcap is a separate library that a lot of projects use, and it supports its own filtering language and filtering engine. You can restrict what packets come into Wireshark at that stage if you want to. Within Wireshark, we go through the process of dissection, which I talked about earlier, and we break down each packet into all its component fields. At that point, the user can look at those fields directly, or they can use a higher-level analysis feature within Wireshark. A lot of those features use those display filter fields and leverage them. At that point you can do whatever analysis task you need to do.
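The three stages of that workflow (filter at capture time, dissect, then filter or analyze on fields) can be sketched as a small Python pipeline. Everything here is a stand-in: the functions `capture`, `dissect`, and `analyze` are hypothetical, the filters are plain Python callables rather than libpcap or Wireshark filter expressions, and the sample frames are made up.

```python
from typing import Callable, Dict, Iterable, Iterator, List


def capture(packets: Iterable[bytes],
            capture_filter: Callable[[bytes], bool]) -> Iterator[bytes]:
    """Stage 1: drop packets early, before any dissection happens."""
    for packet in packets:
        if capture_filter(packet):
            yield packet


def dissect(packet: bytes) -> Dict[str, int]:
    """Stage 2: turn raw bytes into named fields (only two, for brevity)."""
    return {
        "frame.len": len(packet),
        "eth.type": int.from_bytes(packet[12:14], "big"),
    }


def analyze(packets: Iterable[bytes],
            capture_filter: Callable[[bytes], bool],
            display_filter: Callable[[Dict[str, int]], bool]) -> List[Dict[str, int]]:
    """Stage 3: keep only the dissected packets the user asked to see."""
    dissected = (dissect(p) for p in capture(packets, capture_filter))
    return [fields for fields in dissected if display_filter(fields)]


# Hypothetical traffic: one IPv4 frame, one ARP frame, one truncated frame.
ipv4_frame = bytes(12) + b"\x08\x00" + bytes(20)
arp_frame = bytes(12) + b"\x08\x06" + bytes(28)
runt = bytes(10)

shown = analyze(
    [ipv4_frame, arp_frame, runt],
    capture_filter=lambda p: len(p) >= 14,            # raw bytes only
    display_filter=lambda f: f["eth.type"] == 0x0800,  # named fields
)
print(shown)
```

The point of the split is that the capture-stage filter only ever sees raw bytes, while the later filter works on named fields.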

Who uses Wireshark? Well, as it says on the slide, computer networks don’t make themselves fast, reliable, and secure. We depend on people to do that, and those people oftentimes use Wireshark to do those jobs. Wireshark is used in a lot of fields, such as network administration and security research, where whether you’re on the penetration side or the defense side of that equation, you’re typically going to use Wireshark. Educators use Wireshark to teach people how computer networks work, and product developers use it as well. An example of that is shown here on the screen. There is a relatively new protocol called QUIC, which modern web browsers use to transfer web data. That protocol went through different draft stages as it was developed, refined, and enhanced, and as problems were fixed. Wireshark followed along in the development of this protocol and supported each draft as it was added. This allowed people who were developing web browsers and web servers to test, verify, and troubleshoot the products that they were making and ensure that they were working correctly and securely. This has been the case with a bunch of different protocols, but I think QUIC is probably the most recent famous example of that.

As I mentioned, educators use Wireshark. If you look on YouTube, you can find all sorts of tutorials on how to use Wireshark and use it effectively. For instance, DPI Consortium Advisory Board members like Laura Chappell and Mike Pennacchi are great examples of educators who have Wireshark videos on YouTube, but there are others as well. There are different blog posts. Julia Evans makes what are called zines, and she has a couple about Wireshark and how to use it. As I said, you can find a lot of examples if you just do a simple Google search on how to use Wireshark, and you’ll find different tutorials and informative blog posts.

Earlier I mentioned that Wireshark shows you what’s in network packets, and on the screen I showed just a blob of arbitrary bytes. But the thing is, Wireshark’s dissection engine doesn’t care if you’re looking at network data or not. You can also feed it non-network data. As shown in the top example, you can feed it a JPEG and see the contents and the structure of that file. If you have a corrupted JPEG, you can look at it and maybe find where things are going wrong and fix that. The same goes for PNG files.

Wireshark supports USB and Bluetooth analysis. We don’t think of them that way, but the USB and Bluetooth buses on our computers are little, tiny networks. They’re constantly sending messages back and forth, and Wireshark can capture those and analyze that data. A lot of people do use Wireshark for USB and Bluetooth troubleshooting.

Finally, Wireshark can look at log data. One of the things I’m working on in my day job is adapting Wireshark to look at log data. For instance, in this case you can see Wireshark analyzing AWS CloudTrail log data. CloudTrail is a product that Amazon offers its customers that lets you track what’s going on in your computing environment in AWS, and a lot of these environments are very large and complex. You can pull that CloudTrail data down and then look at it within Wireshark and its sister application, called Logray.

I’ve been talking about Wireshark, but I haven’t shown you what it does, so maybe I should rectify that. Let me switch over to the application. This is what Wireshark looks like if you open it up. I have opened some captures before, so up here at the top there’s a list of recently opened capture files, and down below there is a list of interfaces. As you can see, some of the interfaces have traffic, and you can capture on those interfaces if you want just by double-clicking. But for this demonstration I’m going to go ahead and open up a capture file. I’m going to maximize the screen as well to make things a little bit easier to look at.

This is what a network capture looks like within Wireshark. Up at the top, you have a list of packets. Packet number one came in at relative time zero, the next one came in at, I think, eleven milliseconds, and so on. I can click on a packet and use my up and down arrows to move through the packet list and go from packet to packet. The bottom half of the screen is divided into two halves. On the left is the packet detail, which shows you all the display filter fields. I was talking about each packet having all these different fields, and I can browse through those and inspect them. On the right side, we’re looking at the actual raw bytes that the packet is composed of. I can highlight them, and I can click on one and see what field it corresponds to. If I go back to the packet detail, I can filter on things. For instance, if I go to the source address on this packet, I can right-click on it and show only packets with that source IP address. I know that I’m doing that because if I go up to the top of the window, I can see the filter in this little text area called the display filter entry field. I can also type in filters that are built on these display filter fields, and I can build up very complex and hairy display filters to get down to the exact traffic that I’m looking for.

I can also clear out that display filter. You have no doubt noticed that the packet list is shown in all these different garish colors. That’s because Wireshark uses those display filter fields to apply what are called coloring rules. If I bring up the coloring rules list, you can see that I have all these different rules defined, and each one has a name and a filter. As Wireshark reads through the capture, it checks each packet against this list of coloring rules and shows the packet in any matching colors.
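The idea of a named rule with a filter and a color can be sketched as an ordered list that is checked from the top. This is a simplified model and not Wireshark's implementation: the rule names, colors, and the `color_for` function are invented for illustration, the filters are Python callables instead of display filter expressions, and the sketch assumes that the first matching rule decides the color.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Each rule is (name, filter, color). Order matters: earlier rules win.
Rule = Tuple[str, Callable[[Dict[str, int]], bool], str]

RULES: List[Rule] = [
    ("HTTP error", lambda f: f.get("http.response.code", 0) >= 400, "red"),
    ("TCP", lambda f: f.get("ip.proto") == 6, "lavender"),
    ("UDP", lambda f: f.get("ip.proto") == 17, "light blue"),
]


def color_for(fields: Dict[str, int], rules: List[Rule] = RULES) -> Optional[str]:
    """Return the color of the first rule whose filter matches, if any."""
    for _name, matches, color in rules:
        if matches(fields):
            return color
    return None


# Hypothetical dissected packets, expressed as field dictionaries.
print(color_for({"ip.proto": 6, "http.response.code": 404}))
print(color_for({"ip.proto": 17}))
```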

TCP lets you send streams of data that are broken down into separate packets. In this case, if I right-click on a TCP packet, I can look at the contents of a stream and see that I’m getting a lot of what is basically binary data here. But if I go down to the very bottom, I can see that the client sent the data in red and the web server sent a 404 response saying, in effect, I received this, but I have no idea what you’re talking about, and I don’t know what to do here. That might be useful information and it might not, but it does show you where a web server connection failed. I’ll go ahead and clear out that display filter. As I mentioned, down in the bottom right corner of the window we can see the packet bytes. If I don’t care about that and I want to see a higher-level view, I can change that to a packet diagram. Instead of packet bytes, I can see the packet broken down into a diagram that looks very similar to what you would see in a networking textbook. Wireshark has all sorts of features like this.
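Putting a stream back together from separate packets can be sketched as sorting each direction's segments by sequence number and joining the payloads. This is a deliberately naive illustration: the function `follow_stream` and the sample segments are hypothetical, and the sketch ignores sequence number wraparound, overlapping segments, and missing data, all of which a real analyzer has to handle.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# A segment is (direction, sequence number, payload).
Segment = Tuple[str, int, bytes]


def follow_stream(segments: Iterable[Segment]) -> Dict[str, bytes]:
    """Rebuild each direction's byte stream from its TCP segments."""
    by_direction: Dict[str, Dict[int, bytes]] = defaultdict(dict)
    for direction, seq, payload in segments:
        # Keep the first copy of each sequence number; skip retransmissions.
        by_direction[direction].setdefault(seq, payload)
    return {
        direction: b"".join(payload for _seq, payload in sorted(segs.items()))
        for direction, segs in by_direction.items()
    }


# Hypothetical capture: out-of-order client data, one retransmission,
# and a server response.
segments = [
    ("client", 6, b"missing HTTP/1.1\r\n\r\n"),
    ("client", 1, b"GET /"),
    ("client", 1, b"GET /"),
    ("server", 1, b"HTTP/1.1 404 Not Found\r\n\r\n"),
]
streams = follow_stream(segments)
print(streams["client"])
print(streams["server"])
```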

Thank you for joining me today, and thank you for supporting the DPI Consortium.

NOTE: This transcript has received minor edits from the original to improve readability.