Alp Mestanogullari's Blog

Playing around with Control.Concurrent and Network.Curl.Download

Posted in Uncategorized by alpmestan on 2009/09/27

Hi,

I’ve been playing with Control.Concurrent and Network.Curl.Download today, hoping to write a program that spawns threads to download web pages… It’s now done!

Here is the Haskell code, minimally commented (I think the Control.Concurrent documentation is explicit enough, and my explanations wouldn’t be any better).

module Main where

import Control.Concurrent -- multithreading related functions and types
import Control.Exception
import Network.Curl.Download -- HTTP page download related functions and types
import System.IO
import System.Time

-- as described at
-- http://www.haskell.org/ghc/docs/latest/html/libraries/base/Control-Concurrent.html
-- this blocks the main thread until all the children have terminated
waitForChildren :: MVar [MVar ()] -> IO ()
waitForChildren children = do
  cs <- takeMVar children
  case cs of
    []   -> return ()
    m:ms -> do
       putMVar children ms
       takeMVar m
       waitForChildren children

-- creates a new thread and registers it with the thread synchronization mechanism
forkChild :: MVar [MVar ()] -> IO () -> IO ThreadId
forkChild children io = do
    mvar <- newEmptyMVar
    childs <- takeMVar children
    putMVar children (mvar:childs)
    forkIO (io `finally` putMVar mvar ())

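-- downloads the content of the web page and then saves it into a file in the current directory
-- (the file name is the host part of the URL, assuming it starts with "http://")
doDl :: String -> IO ()
doDl url = do
  result <- openURIString url
  case result of
    -- report a failed download instead of crashing on a pattern match failure
    Left err -> hPutStrLn stderr ("Could not download " ++ url ++ ": " ++ err)
    Right content -> do
      let filename = (takeWhile (/= '/') . drop 7 $ url) ++ ".html"
      writeFile filename content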
     
-- spawns 8 threads to download the corresponding web pages
-- and then waits for the 8 threads to terminate before exiting
main :: IO ()
main = do
  children <- newMVar []
  mapM_ (forkChild children . doDl)
    [ "http://www.haskell.org/"
    , "http://java.sun.com/"
    , "http://www.developpez.com/"
    , "http://xkcd.com/"
    , "http://donsbot.wordpress.com"
    , "http://comonad.com/reader/"
    , "http://blog.mestan.fr/"
    , "http://alpmestan.wordpress.com/"
    ]
  waitForChildren children
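
A quick aside on the file names: the expression in doDl drops the first 7 characters of the URL (the length of "http://") and keeps everything up to the next slash. Here is a small sketch that pulls this expression out into a pure function so it can be tried without any network access. The name fileNameFor and the example.org URL are made up for this illustration; they are not part of the program above.

```haskell
module Main where

-- Same expression as in doDl, extracted as a pure function.
-- `fileNameFor` is a hypothetical name used only in this sketch.
fileNameFor :: String -> String
fileNameFor url = (takeWhile (/= '/') . drop 7 $ url) ++ ".html"

main :: IO ()
main = mapM_ (putStrLn . fileNameFor)
  [ "http://www.haskell.org/"     -- prints www.haskell.org.html
  , "http://comonad.com/reader/"  -- prints comonad.com.html (the path is ignored)
  , "https://example.org/"        -- prints .html (drop 7 leaves "/example.org/")
  ]
```

The last case shows the limit of this shortcut: it only works for URLs that start with exactly "http://", which is true of all 8 URLs used here.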

Now, let’s compile it:

ghc -threaded --make Main.hs -o hsmultidl

and run it, passing the runtime system the -N2 option (my computer has 2 cores) and the -s option, which prints RTS statistics:

$ time ./hsmultidl +RTS -N2 -s
./hsmultidl +RTS -N2 -s 
      11,470,748 bytes allocated in the heap
      11,930,464 bytes copied during GC
       1,726,380 bytes maximum residency (4 sample(s))
          85,004 bytes maximum slop
               5 MB total memory in use (0 MB lost due to fragmentation)

  Generation 0:    17 collections,     0 parallel,  0.02s,  0.03s elapsed
  Generation 1:     4 collections,     1 parallel,  0.02s,  0.06s elapsed

  Parallel GC work balance: 1.00 (155513 / 155242, ideal 2)

  Task  0 (worker) :  MUT time:   0.00s  (  0.00s elapsed)
                      GC  time:   0.00s  (  0.00s elapsed)

  Task  1 (worker) :  MUT time:   0.00s  (  0.00s elapsed)
                      GC  time:   0.00s  (  0.00s elapsed)

  Task  2 (worker) :  MUT time:   0.00s  (  1.60s elapsed)
                      GC  time:   0.00s  (  0.05s elapsed)

  Task  3 (worker) :  MUT time:   0.01s  (  1.60s elapsed)
                      GC  time:   0.02s  (  0.02s elapsed)

  Task  4 (worker) :  MUT time:   0.00s  (  1.62s elapsed)
                      GC  time:   0.00s  (  0.00s elapsed)

  Task  5 (worker) :  MUT time:   0.00s  (  1.62s elapsed)
                      GC  time:   0.00s  (  0.00s elapsed)

  Task  6 (worker) :  MUT time:   0.00s  (  1.62s elapsed)
                      GC  time:   0.01s  (  0.01s elapsed)

  Task  7 (worker) :  MUT time:   0.00s  (  1.61s elapsed)
                      GC  time:   0.00s  (  0.00s elapsed)

  Task  8 (worker) :  MUT time:   0.01s  (  1.61s elapsed)
                      GC  time:   0.00s  (  0.01s elapsed)

  Task  9 (worker) :  MUT time:   0.00s  (  1.62s elapsed)
                      GC  time:   0.00s  (  0.00s elapsed)

  Task 10 (worker) :  MUT time:   0.00s  (  1.61s elapsed)
                      GC  time:   0.00s  (  0.00s elapsed)

  Task 11 (worker) :  MUT time:   0.00s  (  1.61s elapsed)
                      GC  time:   0.00s  (  0.00s elapsed)

  Task 12 (worker) :  MUT time:   0.00s  (  1.61s elapsed)
                      GC  time:   0.00s  (  0.00s elapsed)

  Task 13 (worker) :  MUT time:   0.00s  (  1.61s elapsed)
                      GC  time:   0.00s  (  0.00s elapsed)

  INIT  time    0.00s  (  0.00s elapsed)
  MUT   time    0.02s  (  1.61s elapsed)
  GC    time    0.04s  (  0.09s elapsed)
  EXIT  time    0.00s  (  0.01s elapsed)
  Total time    0.05s  (  1.71s elapsed)

  %GC time      80.0%  (5.4% elapsed)

  Alloc rate    1,147,304,260 bytes per MUT second

  Productivity  13.3% of total user, 0.4% of total elapsed

recordMutableGen_sync: 0
gc_alloc_block_sync: 0
whitehole_spin: 0
gen[0].steps[0].sync_todo: 0
gen[0].steps[0].sync_large_objects: 0
gen[0].steps[1].sync_todo: 0
gen[0].steps[1].sync_large_objects: 0
gen[1].steps[0].sync_todo: 0
gen[1].steps[0].sync_large_objects: 0

real	0m1.714s
user	0m0.050s
sys	0m0.037s

(There is no significant difference with or without the -N2 option for 8 pages, which fits the numbers above: MUT time is 0.02s out of 1.61s elapsed, so the program mostly waits. I guess there would be a difference for 100, 1000, … pages. Maybe more on that soon!)
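
To get a feel for why the number of cores matters so little when threads mostly wait, here is a rough sketch that needs no network. It is not the program above: threadDelay stands in for a download (an assumption made purely for illustration), and timeWaiters is a name invented for this sketch. It forks n threads that each sleep, waits for all of them through one MVar per thread, and reports the elapsed wall-clock time.

```haskell
module Main where

import Control.Concurrent (forkIO, threadDelay)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Exception (finally)
import Control.Monad (replicateM)
import Data.Time.Clock (diffUTCTime, getCurrentTime)

-- Forks n threads that each wait `micros` microseconds (a stand-in for a
-- download), blocks until all of them are done, and returns the elapsed
-- wall-clock time in seconds. `timeWaiters` is a hypothetical helper.
timeWaiters :: Int -> Int -> IO Double
timeWaiters n micros = do
  start <- getCurrentTime
  dones <- replicateM n $ do
    done <- newEmptyMVar
    -- signal completion even if the thread's action throws
    _ <- forkIO (threadDelay micros `finally` putMVar done ())
    return done
  mapM_ takeMVar dones
  end <- getCurrentTime
  return (realToFrac (diffUTCTime end start))

main :: IO ()
main = do
  elapsed <- timeWaiters 8 200000
  putStrLn ("8 waiting threads finished in " ++ show elapsed ++ " s")
```

If the waits overlap, the elapsed time should be close to a single 0.2s wait rather than 8 times that, with or without +RTS -N2. Real downloads go through libcurl rather than threadDelay, so take this only as an illustration of waiting-dominated threads, not as a measurement of the program above.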

I’m now wondering whether it would be all that insane to use my 3D Text Rendering application to render the HTML code of the pages in a 3D OpenGL/GLUT context. Would it? :)

One Response

  1. brian said, on 2009/09/27 at 11:33 pm

    Yes.

