## Why expression templates matter ?

Imagine we are using a matrix library, with a naive implementation of +, that simply adds column and row-wise each coefficient. Thus, for a, m1, m2, m3 being for example square matrices of order n, if we want to compute the sum of m1, m2 and m3 and to put the result in a, corresponding to the following code :

a = m1 + m2 + m3

then the computation will look like the following tree :

Thus, there will be a first for loop to compute *m1 + m2* which will result in a matrix we’ll call *t*, then another one to compute *t + m3*. For 3×3 matrices, it’s ok. Imagine n is actually 40000. Doing two for loops of 40000 iterations where we would do a single one… quite annoying, isn’t it ? Actually, we would rather want an evaluation tree like the following :

Here comes expression templates ðŸ˜‰

It’ll need a bit of refactoring though. First, let’s say our matrix type is the following (we’ll only deal with float to avoid ambedding an additional typename template parameter representing the number type we use) :

template <unsigned int N> class matrix { float data[N*N]; public: float operator()(unsigned int row, unsigned int col) const { return data[row + N*col]; } float& operator()(unsigned int row, unsigned int col) { return data[row + N*col]; } }; template <unsigned int N> std::ostream & operator<<(std::ostream& o, const matrix<N>& m) { o << "["; for(unsigned int row = 0; row < N; row++) { for(unsigned int col = 0; col < N; col++) { o << m(row,col); if(col != N-1) o << ";"; } if(row != N-1) o << "\n"; } o << "]"; return o; }

Now, we’ll have to define a **Domain Specific Embedded Language** for matrix operations. Like in any language people design, we will have a tree representing what’s happening in the code. In our case, it’ll represent the evaluation tree of the matrix operations we’re dealing with. Thus, like in any expression tree, we need an *Expression* type. Ours will look like this :

template <typename LeftOperandType, typename OperationTag, typename RightOperandType> struct Expression { const LeftOperandType& l; const RightOperandType& r; Expression(const LeftOperandType& l_, const RightOperandType& r_) : l(l_), r(r_) { } float operator() (unsigned int row, unsigned int col) const { return OperationTag::apply(l(row, col), r(row,col)); } };

Ok, now, what should an *OperationTag* look like ? Well, we’ll implement the operation tag corresponding to additions :

struct plus { static float apply(float a, float b) { return a+b; } };

and the *+ operator* that’ll let us create an expression with a ‘plus’ operation.

template <typename L, typename R> Expression<L, plus, R> operator+(const L& l, const R& r) { return Expression<L, plus, R>(l, r); }

Thanks to the definition of *Expression*, we can embed matrix operations in Expressions, but for the moment we can’t do the reverse way, that is we can’t convert an *Expression* to a *matrix*. So let’s write an *operator=* in the *matrix* class.

// inside the matrix class template <typename ExprType> matrix<N>& operator=(const ExprType& e) { for(unsigned int row = 0; row < N; row++) { for(unsigned int col = 0; col < N; col++) { (*this)(row, col) = e(row, col); } } return (*this); }

So you see, this is the only moment where we call the *operator()(unsigned int, unsigned int)* of the *Expression* type. Thus, this is the only moment when the whole expression is being evaluated. Let’s study the following code.

#include <iostream> #include "matrix.h" #include "expression.h" int main() { matrix<2> m1; m1(0,0) = 1.0; m1(1,0) = 0.0; m1(0,1) = 4.0; m1(1,1) = 1.0; matrix<2> m2; m2(0,0) = 0.0; m2(1,0) = -1.0; m2(0,1) = 1.0; m2(1,1) = 2.0; matrix<2> m3; m3(0,0) = 1.0; m3(1,0) = -2.0; m3(0,1) = 3.0; m3(1,1) = 5.0; matrix<2> a; a = m1 + m2 + m3; std::cout << a << std::endl; }

Here, “m1 + m2 + m3” crates an

Expression< Expression<matrix<2>, plus, matrix<2> >, plus, matrix<2> >

instance. The coefficients aren’t computed until *operator=* is called. Indeed, we have the following expression tree.

That is, we know which operations we have to call, on which matrices, but nothing is evaluated. The only place where we call *operator() (unsigned int, unsigned int)* on a value of type *Expression* is in *matrix<N>::operator=*, and this call actually computes each coefficient, one by one, applying the whole computation tree (here, two calls to ‘+’) for each coefficient. This way, we only execute the two for loops once, and it’ll remain the same whatever the number of computations is. Moreover, the only change we had to make to our matrix class was to add an operator= to be able to assign an expression to a matrix.

By the way, the output of our main function is the following.

[2;8

-3;8]

I hope you enjoyed this post ðŸ˜‰

## Templates for unique id generation for types : a beginning

After reading one of the comments on this blog post, I decided to publish here a proof of concept for the generation of unique ids for types.

Basically, it’s just about one template structure, using very basical templates techniques : specialization, metaprogramming-level recursion and nested enum for the computation. Either a familiarity with template metaprogramming or with functional programming is enough to understand the following code.

If you’re familiar with any scheme of functional recursion, or recursion in its mathematical sense, you’ll get it quite easily. Mathematically, we’re somehow defining a recursive function *generate_id* this way :

Which, in C++, becomes :

template <typename T> struct id_generator; template <typename T> struct id_generator<T*> { enum { result = id_generator<T>::result*2 }; }; template <typename T> struct id_generator<T&> { enum { result = id_generator<T>::result*3 }; }; template <typename T> struct id_generator< std::vector<T> > { enum { result = id_generator<T>::result*11 }; }; template <> struct id_generator<int> { enum { result = 5 }; }; template <> struct id_generator<char> { enum { result = 7 }; };

And a minimal example of use :

int main() { std::cout << "int : " << id_generator< int >::result << std::endl << "char**** : " << id_generator< char**** >::result << std::endl << "vector<int> : " << id_generator< std::vector<int> >::result << std::endl; return 0; } /* Output : int : 5 char**** : 112 vector<int> : 55 */

To make it really usable, you’ll have to add much (much much much) more :

- base types, like int and char here, which help for terminating the recursive call
- composition rules, like vector, references and pointers here, which help for propagating the recursion

To ensure the uniqueness here, I use prime numbers, simply [1]. Indeed, for the base types, I give a value being a prime number. For the different compositions, I multiply by a different prime number for each. However, most of you may have noticed that the id only lets you know about what type compositions and base types are used, but gives nothing about the order. If we add a rule for std::list, multiplying by 13 for example, then *a list of vectors of ints* will have the same id as *a vector of lists of int*. It’d need additional code and workarounds to take the order of the compositions in account — I guess it is possible though. If any reader here comes up with a solution, let us know. A possible solution would be to drop prime numbers, since they rely on a commutative (abelian) ring [2].

[1] http://en.wikipedia.org/wiki/Fundamental_theorem_of_arithmetic

[2] http://en.wikipedia.org/wiki/Commutative_ring

## Introduction to SFINAE

Basically, SFINAE (for the smart guys who can remember the whole name : *Substitution Failure Is Not An Error*) is what makes this code compile.

struct Test { typedef int type; }; template < typename T > void f( typename T::type ) {} // definition #1 template < typename T > void f( T ) {} // definition #2 f< Test > ( 10 ); // calls #1 f< int > ( 10 ); //calls #2 without error, thanks to SFINAE

To make it short (but not too much), if you have several overloads for a function, but particularly one of them being template, then if the template one doesn’t match the call you’re currently doing, the compiler — instead of raising a bright, clear and boring error — will try the other ones. In our case, the first definition of **f** asks for the type parameter to have a nested *type* type defined (either via class, struct, enum or typedef). Fortunately, *Test* has one ! But our lonely *int* type hasn’t. Again, fortunately, the compiler, thanks to SFINAE, will not issue an error and will call *#2*.

Ok, you got it, but now wonder how can this be useful ? Okay, imagine we basically are working on a widget library. After a huge amount of work, we came up with the following code :

class label { }; class button { };

Yeah, impressive, isn’t it ?

Okay, now, say we want to write *show* functions, to be able to show our two widgets. Basically, we can just write two overloads. But it’ll base the overload resolution on the strict name of the type. What if we rather want the overload resolution to be done using the structure of our type ? For example, depending on the presence of a ‘label_tag’ inner typedef, or ‘button_tag’, for button.

Then,, first, we’d modify our two classes this way :

class label { public: typedef void label_tag; }; class button { public: typedef void button_tag; };

Then, how can we apply the SFINAE principle here ? Well, here is one way :

template <typename widget_type> typename widget_type::label_tag show(widget_type& w) { std::cout<< "I'm a label" << std::endl; } template <typename widget_type> typename widget_type::button_tag show(widget_type& w) { std::cout << "I'm a button" << std::endl; }

And then, the main function and its output :

int main() { label l; button b; show(l); show(b); return 0; } /* Output : * I'm a label * I'm a button */

Got it ? ðŸ˜‰

Maybe I’ll dive into more advanced uses of SFINAE (more complex and powerful interface detection) in further posts.

Stay tuned !

## Can I haz teh boost::any C++ class plz ?

Welcome,

This blog post will be about rewriting the boost::any class. For those of you who don’t know it, it lets you assign a value of any type to your *boost::any* instance, like in the following code :

boost::any a = 42; a = "C++"; a = 3.14; // and so on

First, ok, we must have templates somewhere, since there isn’t any base class of all the possible types, even less when we deal with primitive types. So, our *any* cass must hold a value than actually can hold any kind of stuffs… Some value_type template class then ! But wait, we must be able to store value_type<int>, value_type<std::string> and so on ! OK, fortunately, **type erasure** will save us here. We’ll create a base class *value_base* and then make a template class value_type inherit it.

class value_base { public: virtual ~value_base() { } }; template <class T> class value_type : public value_base { T t; public: value_type(const T& t_) : t(t_) { } };

Thus, we can write the any class this way :

class any { value_base* v; public: any() : v(0) { } template <class value_t> any(const value_t& v_) : v(new value_type<value_t>(v_)) { } ~any() { delete v; } };

Ok, now we miss a copy constructor and an overload for the “=” operator.

We must first add a cloning function in value_base (pure virtual) and value_type for being able to write a good copy constructor.

class value_base { public: virtual ~value_base() { } virtual value_base* clone() const = 0; /* new */ }; template <class T> class value_type : public value_base { T t; public: value_type(const T& t_) : t(t_) { } value_base* clone() const /* new */ { return new value_type(t); } };

And now, *any(const any&)* and *any& operator=(const any&)* can be implemented ðŸ˜‰

class any { value_base* v; public: any() : v(0) { } template <class value_t> any(const value_t& v_) : v(new value_type<value_t>(v_)) { } /* new */ any(any const & other) : v(other.v ? other.v->clone() : 0) {} /* new */ any& operator=(const any& other) { if(&other != this) { any copy(other); swap(copy); } return *this; } /* new */ void swap(any& other) { std::swap(v, other.v); } ~any() { delete v; } };

And you’re done !

For those of you who either already knew any or just read a bit its documentation, you may be interested in the any cast functions, that let you get back a value from an *any* instance. Here is the code (it requires very little modifications to our previous code).

class any; template <class T> T any_cast(any& a); // declaration for being able to make classes friend of it // value_type template class modification template <class T> class value_type : public value_base { friend T any_cast<>(any& a); // ... }; // any class modification class any { template <class T> friend T any_cast(any& a); // ... }; // bad_any_cast exception class class bad_any_cast : public std::exception { public: const char* what() const throw() { return "Bad any_cast exception"; } }; template <class T> T any_cast(any& a) { value_type<T>* v = dynamic_cast<value_type<T>*>(a.v); if(v == 0) { throw bad_any_cast(); } else { return v->t; } }

I haven’t implemented all the cast functions. The ones with constant reference, pointer, and constant pointer are left as exercise for the reader.

Here is a code using all the things we’ve done so far.

int main() { any a = 42; any b = 'c'; std::cout << "[1] a=" << any_cast<int>(a) << " b='" << any_cast<char>(b) << "'" << std::endl; a.swap(b); std::cout << "[2] a='" << any_cast<char>(a) << "' b=" << any_cast<int>(b) << std::endl; try { std::string s = any_cast<std::string>(b); } catch(const std::exception& e) { std::cout << "[3] " << e.what() << std::endl; } any c(a); std::cout << "[4] c='" << any_cast<char>(c) << "'" << std::endl; return 0; } /* alp@mestan:~/cpp$ g++ -o te3 type_erasure3.cpp alp@mestan:~/cpp$ ./te3 [1] a=42 b='c' [2] a='c' b=42 [3] Bad any_cast exception [4] c='c' */

Enjoy ðŸ˜‰

## Playing around Control.Concurrent and Network.Curl.Download

Hi,

I’ve been playing with Control.Concurrent and Network.Curl.Download today, willing to write a program that would spawn threads to download web pages… It’s now done !

Here is the Haskell code, minimally commented (I think the Control.Concurrent doc is enough explicit, and my explanations wouldn’t be better).

module Main where import Control.Concurrent -- multithreading related functions and types import Control.Exception import Network.Curl.Download -- HTTP page download related functions and types import System.IO import System.Time -- like it is said on -- http://www.haskell.org/ghc/docs/latest/html/libraries/base/Control-Concurrent.html -- it lets you block the main thread until all the children terminates waitForChildren :: MVar [MVar ()] -> IO () waitForChildren children = do cs return () m:ms -> do putMVar children ms takeMVar m waitForChildren children -- creates a new thread within the thread syncrhonization mechanism forkChild :: MVar [MVar ()] -> IO () -> IO ThreadId forkChild children io = do mvar <- newEmptyMVar childs <- takeMVar children putMVar children (mvar:childs) forkIO (io `finally` putMVar mvar ()) -- downloads the content of the web page and then saves it into a file in the current directory doDl url = do Right content <- openURIString url let filename = (takeWhile (/= '/') . drop 7 $ url) ++ ".html" writeFile filename content -- spawns 8 threads to download the corresponding web pages and then waits for the 8 threads to terminate before exiting main = do children <- newMVar [] mapM_ (forkChild children . doDl) ["http://www.haskell.org/", "http://java.sun.com/", "http://www.developpez.com/", "http://xkcd.com/", "http://donsbot.wordpress.com", "http://comonad.com/reader/", "http://blog.mestan.fr/", "https://alpmestan.wordpress.com/"] waitForChildren children

Now, let’s compile it :

ghc -threaded --make Main.hs -o hsmultidl

and execute it, with the -N2 option (2 cores on my computer here) to the RunTime System, and RTS informations (-s option) :

$ time ./hsmultidl +RTS -N2 -s ./hsmultidl +RTS -N2 -s 11,470,748 bytes allocated in the heap 11,930,464 bytes copied during GC 1,726,380 bytes maximum residency (4 sample(s)) 85,004 bytes maximum slop 5 MB total memory in use (0 MB lost due to fragmentation) Generation 0: 17 collections, 0 parallel, 0.02s, 0.03s elapsed Generation 1: 4 collections, 1 parallel, 0.02s, 0.06s elapsed Parallel GC work balance: 1.00 (155513 / 155242, ideal 2) Task 0 (worker) : MUT time: 0.00s ( 0.00s elapsed) GC time: 0.00s ( 0.00s elapsed) Task 1 (worker) : MUT time: 0.00s ( 0.00s elapsed) GC time: 0.00s ( 0.00s elapsed) Task 2 (worker) : MUT time: 0.00s ( 1.60s elapsed) GC time: 0.00s ( 0.05s elapsed) Task 3 (worker) : MUT time: 0.01s ( 1.60s elapsed) GC time: 0.02s ( 0.02s elapsed) Task 4 (worker) : MUT time: 0.00s ( 1.62s elapsed) GC time: 0.00s ( 0.00s elapsed) Task 5 (worker) : MUT time: 0.00s ( 1.62s elapsed) GC time: 0.00s ( 0.00s elapsed) Task 6 (worker) : MUT time: 0.00s ( 1.62s elapsed) GC time: 0.01s ( 0.01s elapsed) Task 7 (worker) : MUT time: 0.00s ( 1.61s elapsed) GC time: 0.00s ( 0.00s elapsed) Task 8 (worker) : MUT time: 0.01s ( 1.61s elapsed) GC time: 0.00s ( 0.01s elapsed) Task 9 (worker) : MUT time: 0.00s ( 1.62s elapsed) GC time: 0.00s ( 0.00s elapsed) Task 10 (worker) : MUT time: 0.00s ( 1.61s elapsed) GC time: 0.00s ( 0.00s elapsed) Task 11 (worker) : MUT time: 0.00s ( 1.61s elapsed) GC time: 0.00s ( 0.00s elapsed) Task 12 (worker) : MUT time: 0.00s ( 1.61s elapsed) GC time: 0.00s ( 0.00s elapsed) Task 13 (worker) : MUT time: 0.00s ( 1.61s elapsed) GC time: 0.00s ( 0.00s elapsed) INIT time 0.00s ( 0.00s elapsed) MUT time 0.02s ( 1.61s elapsed) GC time 0.04s ( 0.09s elapsed) EXIT time 0.00s ( 0.01s elapsed) Total time 0.05s ( 1.71s elapsed) %GC time 80.0% (5.4% elapsed) Alloc rate 1,147,304,260 bytes per MUT second Productivity 13.3% of total user, 0.4% of total elapsed recordMutableGen_sync: 0 gc_alloc_block_sync: 0 whitehole_spin: 0 gen[0].steps[0].sync_todo: 0 gen[0].steps[0].sync_large_objects: 0 gen[0].steps[1].sync_todo: 0 gen[0].steps[1].sync_large_objects: 0 gen[1].steps[0].sync_todo: 0 gen[1].steps[0].sync_large_objects: 0 real 0m1.714s user 0m0.050s sys 0m0.037s

(there isn’t a significant difference whether I activate the -N2 option or not, for 8 pages, but I guess there would be for 100, 1000, … — maybe more on that soon !)

I’m now wondering if it would be that much insane to use my 3D Text Rendering application to render the HTML code of the pages in a 3D OpenGL/GLUT context. Would it ? ðŸ™‚

## 3D Text Rendering in Haskell with FTGL

Hi,

While I was writing a sort of password manager for my personal use (in Haskell) — actually, a friend of mine was doing the same in Java, so there was some competition ðŸ™‚ — I decided that just printing the password in the terminal wasn’t enough, at all. That’s why I decided to look at how it was possible to render text easily inside an OpenGL context, with possibly some depth given to each letter.

You know what ? Nearly like every time you look for some library in Haskell, you find it ! Mine was ftgl. You can, with only few lines, while in an OpenGL context, create some 3D text from given text and (TrueType) font and render it.

Since it is really easy to get working, here’s a little howto.

– be sure to have libftgl (the C version of the library), glut (often *freeglut3* in distro repositories) and opengl correctly installed.

– be sure to have the Haskell OpenGL binding installed, and also GLUT (thus we can get a working 3D context in a (GLUT) window quite easily).

– get the Haskell binding of FTGL from hackage using cabal : *cabal install ftgl*.

Then you can try to play around the following code, that you may want to compile with *ghc –make hs3dtext.hs -o hs3dtext*.

module Main where import Data.List import Data.IORef import Graphics.Rendering.OpenGL import Graphics.Rendering.FTGL import Graphics.UI.GLUT import System.IO main :: IO () main = do (_, _) <- getArgsAndInitialize initialDisplayMode $= [DoubleBuffered] createWindow "3D Text Renderer" angle <- newIORef 0.0 font <- createExtrudeFont "FreeSans.ttf" setFontFaceSize font 7 7 setFontDepth font 1.0 displayCallback $= display "Haskell" angle font idleCallback $= Just (animate angle) mainLoop display text angle font = do clear [ColorBuffer] loadIdentity scale 0.04 0.05 (0.5 :: GLfloat) a > 179 of True -> angle $= -180 False -> angle $= a + 0.03 postRedisplay Nothing

Don’t forget to get FreeSans.ttf from anywhere for it to work. The depth of the letters here is 1.0, you can play with it, but it doesn’t look as well as it does like I pasted.

Here is a cool screenshot ðŸ™‚

Enjoy !

8comments