Fiddler is an awesome web debugging proxy. It allows you to see and drill into each HTTP request as it happens. It shows the URL, headers, content, results and much more.
I started using Fiddler when I was bound by a very strict internet policy. Fiddler acts as a proxy that sits between your browser and the network, and it has the ability to auto-respond to HTTP requests from my computer. These auto-responders proved very useful when I was trying to load a page that had several blocked items on it. I could first use Fiddler to identify which requests were showing the generic blocked page message, then set up an auto-responder to have Fiddler return a simple blank.gif instead of my computer actually making the HTTP request. This prevented my computer from showing up in a list in some network admin’s software. The website also loaded a bunch of unnecessary images, and I was only concerned with the text, so I set up *.jpg and *.gif auto-responders to speed up the load time as well.
I hadn’t found a good use for Fiddler in a while until I was working on a project to make several HTTP requests. I started off single-threaded and got the application working. Then I decided to add multi-threading and it appeared to go a little faster. I was happy until I needed to add a longer list of HTTP requests to the program. I wasn’t quite sure I was getting a performance benefit from the multi-threading. Stepping through the code in the debugger didn’t really show me exactly when the requests were being made because the breakpoints would get hit one at a time.
I had started reading up on all the ways VB.NET can do multi-threading to see if there was an easier way. I looked into Parallel.ForEach loops but read that they would only spawn extra threads if there were extra processors available. I am interested in getting that particular parallel loop working so I can visualize it with Fiddler as well. The other threading tool I looked into was the BackgroundWorker.
Background workers seem to be designed to help with Windows Forms. This type of threading keeps forms from showing the classic “Not Responding” message, which happens when you tie up the thread that draws the form and handles its controls.
Thread pooling is the route I want to take in the future. It gives you the ability to limit the number of threads running at a time. This would be perfect for making HTTP requests because you don’t want to overload a server by spawning an unlimited number of threads, a form of “free threading”. Fire and Forget is another name for a particular style of threading where the calling code doesn’t wait for or care about the thread’s result. Fire and Forget can be used when the results are logged to a database.
My program was based around one function. It would simply take a URL parameter and return a string of all the HTML on the page. Each URL in my list would take 1-5 seconds to respond, depending on the server load. So a single-threaded program would take at least 5 seconds, and up to 25 seconds, to check just 5 URLs.
This was an obvious candidate for multi-threading. If those same 5 HTTP requests were all sent in the first second of execution, the program could complete in 1-5 seconds instead of 5-25 seconds. The scalability would also be much better because the execution time wouldn’t go up that much if I added 100 URLs.
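The arithmetic behind that estimate can be sketched in a few lines. This is only a rough model under stated assumptions: the response times are made-up values, the function names are hypothetical, and it ignores thread start-up cost and bandwidth limits. Sequential time is the sum of the response times, while fully parallel time is roughly the slowest single response.

```vbnet
Imports System
Imports System.Linq

Module TimingEstimate
    ' Sequential: each request waits for the previous one, so the times add up.
    Function SequentialSeconds(ByVal latencies As Double()) As Double
        Return latencies.Sum()
    End Function

    ' Fully parallel: all requests start together, so the slowest one sets the total.
    Function ParallelSeconds(ByVal latencies As Double()) As Double
        Return latencies.Max()
    End Function

    Sub Main()
        ' Hypothetical response times, in seconds, for five URLs.
        Dim latencies As Double() = {1.0, 3.0, 5.0, 2.0, 4.0}
        Console.WriteLine("Sequential: " & SequentialSeconds(latencies) & " s")
        Console.WriteLine("Parallel:   " & ParallelSeconds(latencies) & " s")
    End Sub
End Module
```

With these sample numbers the sequential estimate is 15 seconds and the parallel estimate is 5 seconds, which falls inside the 5-25 and 1-5 second ranges described above.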
So I wrote the program. The keys to multi-threading success were:
1. Set up a class wrapper around my function.
2. Pass the URL in with the Thread.Start overload that takes a parameter.
3. Create an event to handle the function response.
4. Use SyncLock so the threads don’t run into each other when writing their results.
5. Join every thread in the thread list so you don’t continue on in your code before the multi-threading part is complete.
    Imports System.Collections
    Imports System.Diagnostics
    Imports System.IO
    Imports System.Net
    Imports System.Security.Cryptography
    Imports System.Text
    Imports System.Threading

    'Note: relies on Option Strict Off (the VB default) for the
    'late-bound t.Join() call and the AddressOf delegate conversion.
    Module Module1
        Dim WithEvents oReadSiteHTML As New ReadSiteHTMLClass

        Sub Main()
            Dim URLlist() As String = System.IO.File.ReadAllLines("sitelist.txt")

            'Multi-threaded: one thread per URL
            Dim ThreadList As New ArrayList
            Dim stopper As New Stopwatch
            stopper.Start()
            For Each tempurl In URLlist
                Dim t As New Thread(AddressOf oReadSiteHTML.GetHTML)
                ThreadList.Add(t)
                t.Start(tempurl)
            Next
            'Wait for every thread to finish before stopping the clock
            For Each t In ThreadList
                t.Join()
            Next
            stopper.Stop()
            Dim timetook As String = stopper.ElapsedMilliseconds.ToString()
            Console.WriteLine("Multi-threaded: " & timetook & " ms")

            'Single-threaded: one request at a time
            Dim stopper2 As New Stopwatch
            stopper2.Start()
            For Each tempurl In URLlist
                oReadSiteHTML.GetHTML(tempurl)
            Next
            stopper2.Stop()
            Dim timetook2 As String = stopper2.ElapsedMilliseconds.ToString()
            Console.WriteLine("Single-threaded: " & timetook2 & " ms")
        End Sub

        'Returns a Base64-encoded MD5 hash of the given text
        Private Function GenerateHash(ByVal SourceText As String) As String
            Dim Uni As New UnicodeEncoding()
            Dim ByteSourceText() As Byte = Uni.GetBytes(SourceText)
            Dim Md5 As New MD5CryptoServiceProvider()
            Dim ByteHash() As Byte = Md5.ComputeHash(ByteSourceText)
            Return Convert.ToBase64String(ByteHash)
        End Function

        Public Class ReadSiteHTMLClass
            Public Event ThreadHash(ByVal SiteHtmlHash2 As String)

            'Downloads the page, hashes its text and raises ThreadHash with "hash:URL"
            Public Sub GetHTML(ByVal URL As String)
                Dim request As HttpWebRequest = CType(WebRequest.Create(URL), HttpWebRequest)
                Dim sitefulltext As String = ""
                Using response As HttpWebResponse = CType(request.GetResponse(), HttpWebResponse)
                    Using reader As New StreamReader(response.GetResponseStream())
                        Dim str As String = reader.ReadLine()
                        Do While str IsNot Nothing
                            sitefulltext = String.Concat(sitefulltext, str)
                            str = reader.ReadLine()
                        Loop
                    End Using
                End Using
                'Keep the hash in a local variable: all threads share this one
                'object, so a shared field could be overwritten by another thread
                Dim hash As String = GenerateHash(sitefulltext)
                RaiseEvent ThreadHash(hash & ":" & URL)
            End Sub
        End Class

        'Runs on whichever thread raised the event, so the file write is locked
        Sub ThreadHash(ByVal SiteHtmlHash3 As String) Handles oReadSiteHTML.ThreadHash
            SyncLock GetType(ReadSiteHTMLClass)
                File.AppendAllText("hashes.txt", SiteHtmlHash3 & System.Environment.NewLine)
            End SyncLock
        End Sub
    End Module
Timing the code, I could tell that the multi-threaded part was faster. But I was concerned that the SyncLock was somehow forcing my threads to run one at a time. I am not sure why I thought this, since the lock only wraps the file write and not the request itself, but it was easy to prove myself wrong by opening up Fiddler. When I ran the code above, I saw the multi-threaded requests all pop in simultaneously and respond at almost the same time. Then, in the single-threaded pass, each subsequent HTTP request had to wait for the previous response. It was a real “ah ha” moment where I realized my code was working just the way I wanted it to… well almost. Once I get into really high numbers of URLs, I want to add the thread pooling. Hopefully then I can use Fiddler and see only 8 or 10 active HTTP requests at a time. Right now the program will launch a thread for every URL, which can be called “free threading”.
One thing I have explained a couple of times before is the difference between threads and cores. Hyper-threading is like a thread pool of 2, but at the hardware level: each physical core presents itself as two logical processors. A 4-core box with hyper-threading on will show 8 logical processors in Task Manager. You can have a multi-threaded app perform much better than a single-threaded app even if there is only 1 CPU, because a thread that is waiting on the network uses almost no CPU time. Furthermore, you can have far more threads than cores (within memory limits), and the operating system scheduler will give each one small slices of CPU time, which also allows other operations to interrupt your app. Back to the example, the key performance gain is the fact that you don’t have to wait for one response to send another request.
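The point about waiting can be seen without any network at all. The sketch below is an illustration only: Thread.Sleep stands in for waiting on a server, and the module and function names are hypothetical. It times five 200 ms waits run one after another, then the same five waits each on their own thread.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Diagnostics
Imports System.Threading

Module WaitOverlapDemo
    ' Stand-in for waiting on a server; a sleeping thread uses no CPU time.
    Sub FakeWait()
        Thread.Sleep(200)
    End Sub

    ' Runs the waits one after another and returns elapsed milliseconds.
    Function TimeSequential(ByVal count As Integer) As Long
        Dim sw As Stopwatch = Stopwatch.StartNew()
        For i As Integer = 1 To count
            FakeWait()
        Next
        sw.Stop()
        Return sw.ElapsedMilliseconds
    End Function

    ' Starts one thread per wait, joins them all and returns elapsed milliseconds.
    Function TimeThreaded(ByVal count As Integer) As Long
        Dim sw As Stopwatch = Stopwatch.StartNew()
        Dim threads As New List(Of Thread)
        For i As Integer = 1 To count
            Dim t As New Thread(AddressOf FakeWait)
            threads.Add(t)
            t.Start()
        Next
        For Each t As Thread In threads
            t.Join()
        Next
        sw.Stop()
        Return sw.ElapsedMilliseconds
    End Function

    Sub Main()
        Console.WriteLine("Sequential: " & TimeSequential(5) & " ms")
        Console.WriteLine("Threaded:   " & TimeThreaded(5) & " ms")
    End Sub
End Module
```

The sequential run should take roughly five times 200 ms, while the threaded run should finish in a little over 200 ms regardless of how many cores the machine has, because the threads spend their time waiting rather than computing.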
The screen capture below shows multiple requests downloading simultaneously, with several already finished.
One thing to be cautious of is overloading a server. Not all servers are supercomputers. If the server is doing all the work and all you have to do is send a request, it’s easy to overload a server with just one client. The trick is to find the sweet spot in performance and be able to adjust it. Thread pooling gives you a single number that you can adjust with each change in the environment.
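As a rough idea of what that single adjustable number could look like, here is a minimal sketch of a fixed set of worker threads pulling URLs from a shared queue. It is not the program from this post: it is a hand-rolled pool rather than the built-in .NET ThreadPool, FakeRequest only sleeps instead of making an HTTP call, and the names RunAll and workerCount and the example.com URLs are made up for illustration.

```vbnet
Imports System
Imports System.Collections.Concurrent
Imports System.Collections.Generic
Imports System.Threading

Module FixedWorkerPool
    ' Shared work queue; each worker pulls the next URL until none remain.
    Private queue As ConcurrentQueue(Of String)
    Private processed As Integer = 0

    ' Stand-in for the real request: sleeps instead of touching the network.
    Sub FakeRequest(ByVal url As String)
        Thread.Sleep(50)
        Interlocked.Increment(processed)
    End Sub

    ' Body of each worker thread: keep taking URLs until the queue is empty.
    Sub Worker()
        Dim url As String = Nothing
        While queue.TryDequeue(url)
            FakeRequest(url)
        End While
    End Sub

    ' Processes every URL with at most workerCount threads active at once.
    ' Returns how many URLs were processed.
    Function RunAll(ByVal urls As IEnumerable(Of String), ByVal workerCount As Integer) As Integer
        queue = New ConcurrentQueue(Of String)(urls)
        processed = 0
        Dim workers As New List(Of Thread)
        For i As Integer = 1 To workerCount
            Dim t As New Thread(AddressOf Worker)
            workers.Add(t)
            t.Start()
        Next
        For Each t As Thread In workers
            t.Join()
        Next
        Return processed
    End Function

    Sub Main()
        Dim urls As New List(Of String)
        For i As Integer = 1 To 20
            urls.Add("http://example.com/page" & i)
        Next
        ' workerCount is the one number to tune for each environment.
        Console.WriteLine(RunAll(urls, 8) & " URLs processed with 8 workers")
    End Sub
End Module
```

With this shape, no more than workerCount requests are ever in flight, so raising or lowering that one argument is how you would look for the sweet spot between speed and server load.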
There are also some automatic security switches that can be used on networks and inside website code to identify malicious behavior. Don’t be the one responsible for getting your company on some blacklist.