Alp Mestanogullari's Blog

Why expression templates matter ?

Posted in Uncategorized by alpmestan on 2009/11/01

Imagine we are using a matrix library, with a naive implementation of +, that simply adds column and row-wise each coefficient. Thus, for a, m1, m2, m3 being for example square matrices of order n, if we want to compute the sum of m1, m2 and m3 and to put the result in a, corresponding to the following code :

a = m1 + m2 + m3

then the computation will look like the following tree :

Evaluation tree of m1 + m2 + m3

Evaluation tree of m1 + m2 + m3

Thus, there will be a first for loop to compute m1 + m2 which will result in a matrix we’ll call t, then another one to compute t + m3. For 3×3 matrices, it’s ok. Imagine n is actually 40000. Doing two for loops of 40000 iterations where we would do a single one… quite annoying, isn’t it ? Actually, we would rather want an evaluation tree like the following :

The desired evaluation tree of m1 + m2 + m3

The desired evaluation tree of m1 + m2 + m3

Here comes expression templates ;)

It’ll need a bit of refactoring though. First, let’s say our matrix type is the following (we’ll only deal with float to avoid ambedding an additional typename template parameter representing the number type we use) :

template <unsigned int N>
class matrix
{
    float data[N*N];

public:
    float operator()(unsigned int row, unsigned int col) const
    {
        return data[row + N*col];
    }

    float& operator()(unsigned int row, unsigned int col)
    {
        return data[row + N*col];
    }
};

template <unsigned int N>
std::ostream & operator<<(std::ostream& o, const matrix<N>& m)
{
    o << "[";
    for(unsigned int row = 0; row < N; row++)
    {
        for(unsigned int col = 0; col < N; col++)
        {
            o << m(row,col);
            if(col != N-1) o << ";";
        }
        if(row != N-1) o << "\n";
    }
    o << "]";
    return o;
}

Now, we’ll have to define a Domain Specific Embedded Language for matrix operations. Like in any language people design, we will have a tree representing what’s happening in the code. In our case, it’ll represent the evaluation tree of the matrix operations we’re dealing with. Thus, like in any expression tree, we need an Expression type. Ours will look like this :

template <typename LeftOperandType, typename OperationTag, typename RightOperandType>
struct Expression
{
    const LeftOperandType& l;
    const RightOperandType& r;

    Expression(const LeftOperandType& l_, const RightOperandType& r_) : l(l_), r(r_) { }

    float operator() (unsigned int row, unsigned int col) const
    {
        return OperationTag::apply(l(row, col), r(row,col));
    }
};

Ok, now, what should an OperationTag look like ? Well, we’ll implement the operation tag corresponding to additions :

struct plus
{
    static float apply(float a, float b)
    {
        return a+b;
    }
};

and the + operator that’ll let us create an expression with a ‘plus’ operation.

template <typename L, typename R>
Expression<L, plus, R> operator+(const L& l, const R& r)
{
    return Expression<L, plus, R>(l, r);
}

Thanks to the definition of Expression, we can embed matrix operations in Expressions, but for the moment we can’t do the reverse way, that is we can’t convert an Expression to a matrix. So let’s write an operator= in the matrix class.

    // inside the matrix class
    template <typename ExprType>
    matrix<N>& operator=(const ExprType& e)
    {
        for(unsigned int row = 0; row < N; row++)
        {
            for(unsigned int col = 0; col < N; col++)
            {
                (*this)(row, col) = e(row, col);
            }
        }
        return (*this);
    }

So you see, this is the only moment where we call the operator()(unsigned int, unsigned int) of the Expression type. Thus, this is the only moment when the whole expression is being evaluated. Let’s study the following code.

#include <iostream>
#include "matrix.h"
#include "expression.h"

int main()
{
    matrix<2> m1;
    m1(0,0) = 1.0;
    m1(1,0) = 0.0;
    m1(0,1) = 4.0;
    m1(1,1) = 1.0;

    matrix<2> m2;
    m2(0,0) = 0.0;
    m2(1,0) = -1.0;
    m2(0,1) = 1.0;
    m2(1,1) = 2.0;

    matrix<2> m3;
    m3(0,0) = 1.0;
    m3(1,0) = -2.0;
    m3(0,1) = 3.0;
    m3(1,1) = 5.0;

    matrix<2> a;
    a = m1 + m2 + m3; 
    std::cout << a << std::endl;
}

Here, “m1 + m2 + m3″ crates an

Expression< Expression<matrix<2>, plus, matrix<2> >, plus, matrix<2> >

instance. The coefficients aren’t computed until operator= is called. Indeed, we have the following expression tree.

Expression tree

Expression tree

That is, we know which operations we have to call, on which matrices, but nothing is evaluated. The only place where we call operator() (unsigned int, unsigned int) on a value of type Expression is in matrix<N>::operator=, and this call actually computes each coefficient, one by one, applying the whole computation tree (here, two calls to ‘+’) for each coefficient. This way, we only execute the two for loops once, and it’ll remain the same whatever the number of computations is. Moreover, the only change we had to make to our matrix class was to add an operator= to be able to assign an expression to a matrix.

By the way, the output of our main function is the following.

[2;8
-3;8]

I hope you enjoyed this post ;)

Templates for unique id generation for types : a beginning

Posted in Uncategorized by alpmestan on 2009/10/31

After reading one of the comments on this blog post, I decided to publish here a proof of concept for the generation of unique ids for types.

Basically, it’s just about one template structure, using very basical templates techniques : specialization, metaprogramming-level recursion and nested enum for the computation. Either a familiarity with template metaprogramming or with functional programming is enough to understand the following code.

If you’re familiar with any scheme of functional recursion, or recursion in its mathematical sense, you’ll get it quite easily. Mathematically, we’re somehow defining a recursive function generate_id this way :

generateid(int) = 5
generateid(char) = 7
\forall T,\, generateid(\textit{pointer to T}) = 2 \cdot generateid(T)
\forall T,\, generateid(\textit{ref to T}) = 3 \cdot generateid(T)
\forall T,\, generateid(\textit{vector of T}) = 11 \cdot generateid(T)

Which, in C++, becomes :

template <typename T>
struct id_generator;

template <typename T>
struct id_generator<T*>
{
    enum { result = id_generator<T>::result*2 };
};

template <typename T>
struct id_generator<T&>
{
    enum { result = id_generator<T>::result*3 };
};

template <typename T>
struct id_generator< std::vector<T> >
{
    enum { result = id_generator<T>::result*11 };
};

template <>
struct id_generator<int>
{
    enum { result = 5 };
};

template <>
struct id_generator<char>
{
    enum { result = 7 };
};

And a minimal example of use :

int main()
{
    std::cout << "int : " << id_generator< int >::result << std::endl
              << "char**** : " << id_generator< char**** >::result << std::endl
              << "vector<int> : " << id_generator< std::vector<int> >::result << std::endl;
    return 0;
}
/* Output :
int : 5
char**** : 112
vector<int> : 55
*/

To make it really usable, you’ll have to add much (much much much) more :

  • base types, like int and char here, which help for terminating the recursive call
  • composition rules, like vector, references and pointers here, which help for propagating the recursion

To ensure the uniqueness here, I use prime numbers, simply [1]. Indeed, for the base types, I give a value being a prime number. For the different compositions, I multiply by a different prime number for each. However, most of you may have noticed that the id only lets you know about what type compositions and base types are used, but gives nothing about the order. If we add a rule for std::list, multiplying by 13 for example, then a list of vectors of ints will have the same id as a vector of lists of int. It’d need additional code and workarounds to take the order of the compositions in account — I guess it is possible though. If any reader here comes up with a solution, let us know. A possible solution would be to drop prime numbers, since they rely on a commutative (abelian) ring [2].

[1] http://en.wikipedia.org/wiki/Fundamental_theorem_of_arithmetic
[2] http://en.wikipedia.org/wiki/Commutative_ring

Tagged with: , ,

Introduction to SFINAE

Posted in Uncategorized by alpmestan on 2009/10/29

Basically, SFINAE (for the smart guys who can remember the whole name : Substitution Failure Is Not An Error) is what makes this code compile.

struct Test 
{
    typedef int type;
};
 
template < typename T > 
void f( typename T::type ) {} // definition #1
 
template < typename T > 
void f( T ) {}                // definition #2
 
f< Test > ( 10 ); // calls #1 
f< int > ( 10 );  //calls #2 without error, thanks to SFINAE

To make it short (but not too much), if you have several overloads for a function, but particularly one of them being template, then if the template one doesn’t match the call you’re currently doing, the compiler — instead of raising a bright, clear and boring error — will try the other ones. In our case, the first definition of f asks for the type parameter to have a nested type type defined (either via class, struct, enum or typedef). Fortunately, Test has one ! But our lonely int type hasn’t. Again, fortunately, the compiler, thanks to SFINAE, will not issue an error and will call #2.

Ok, you got it, but now wonder how can this be useful ? Okay, imagine we basically are working on a widget library. After a huge amount of work, we came up with the following code :

class label
{
};

class button
{
};

Yeah, impressive, isn’t it ?
Okay, now, say we want to write show functions, to be able to show our two widgets. Basically, we can just write two overloads. But it’ll base the overload resolution on the strict name of the type. What if we rather want the overload resolution to be done using the structure of our type ? For example, depending on the presence of a ‘label_tag’ inner typedef, or ‘button_tag’, for button.

Then,, first, we’d modify our two classes this way :

class label
{
public:
    typedef void label_tag;
};

class button
{
public:
    typedef void button_tag;
};

Then, how can we apply the SFINAE principle here ? Well, here is one way :

template <typename widget_type>
typename widget_type::label_tag show(widget_type& w)
{
    std::cout<< "I'm a label" << std::endl;
}

template <typename widget_type>
typename widget_type::button_tag show(widget_type& w)
{
    std::cout << "I'm a button" << std::endl;
}

And then, the main function and its output :

int main()
{
    label l; 
    button b;
    show(l);
    show(b);
    return 0;
}
/* Output :
 * I'm a label
 * I'm a button
 */

Got it ? ;)

Maybe I’ll dive into more advanced uses of SFINAE (more complex and powerful interface detection) in further posts.

Stay tuned !

Tagged with: , ,

Can I haz teh boost::any C++ class plz ?

Posted in Uncategorized by alpmestan on 2009/10/18

Welcome,

This blog post will be about rewriting the boost::any class. For those of you who don’t know it, it lets you assign a value of any type to your boost::any instance, like in the following code :

boost::any a = 42;
a = "C++";
a = 3.14;
// and so on

First, ok, we must have templates somewhere, since there isn’t any base class of all the possible types, even less when we deal with primitive types. So, our any cass must hold a value than actually can hold any kind of stuffs… Some value_type template class then ! But wait, we must be able to store value_type<int>, value_type<std::string> and so on ! OK, fortunately, type erasure will save us here. We’ll create a base class value_base and then make a template class value_type inherit it.

class value_base
{
	public:
	virtual ~value_base() { }
};

template <class T>
class value_type : public value_base
{
	T t;
	
	public:
	value_type(const T& t_) : t(t_) { }
};

Thus, we can write the any class this way :

class any
{
	value_base* v;
	
	public:
	any() : v(0) { }
	
	template <class value_t>
	any(const value_t& v_) : v(new value_type<value_t>(v_)) { }
	
	~any() { delete v; }
};

Ok, now we miss a copy constructor and an overload for the “=” operator.
We must first add a cloning function in value_base (pure virtual) and value_type for being able to write a good copy constructor.

class value_base
{
  	public:
  	virtual ~value_base() { }
  	virtual value_base* clone() const = 0; /* new */
};

template <class T>
class value_type : public value_base
{
	T t;
	
	public:
	value_type(const T& t_) : t(t_) { }
	value_base* clone() const /* new */
	{
		return new value_type(t);
	}
};

And now, any(const any&) and any& operator=(const any&) can be implemented ;)

class any
{
	value_base* v;
	
	public:
	any() : v(0) { }
	
	template <class value_t>
	any(const value_t& v_) : v(new value_type<value_t>(v_)) { }

        /* new */
	any(any const & other) : v(other.v ? other.v->clone() : 0) {}

        /* new */
	any& operator=(const any& other)
  	{
  		if(&other != this)
  		{
  			any copy(other);
  			swap(copy);
  		}
    	        return *this;
  	}

        /* new */
  	void swap(any& other)
  	{
    	        std::swap(v, other.v);
  	}

	~any() { delete v; }
};

And you’re done !

For those of you who either already knew any or just read a bit its documentation, you may be interested in the any cast functions, that let you get back a value from an any instance. Here is the code (it requires very little modifications to our previous code).

class any;

template <class T>
T any_cast(any& a); // declaration for being able to make classes friend of it 		
		
// value_type template class modification
template <class T>
class value_type : public value_base
{
  	friend T any_cast<>(any& a);
  	// ...
};

// any class modification
class any
{
  	template <class T>
  	friend T any_cast(any& a);
  	// ...
};

// bad_any_cast exception class
class bad_any_cast : public std::exception
{
  	public:
  	const char* what() const throw()
  	{
    	return "Bad any_cast exception";
  	}
};

template <class T>
T any_cast(any& a)
{
  	value_type<T>* v = dynamic_cast<value_type<T>*>(a.v);
  	if(v == 0)
	{
		throw bad_any_cast();
	}
  	else
	{
		return v->t;
	}
}

I haven’t implemented all the cast functions. The ones with constant reference, pointer, and constant pointer are left as exercise for the reader.

Here is a code using all the things we’ve done so far.

int main()
{
  	any a = 42;
  	any b = 'c';
  	std::cout << "[1] a=" << any_cast<int>(a) << " b='" << any_cast<char>(b) << "'" << std::endl;
  	a.swap(b);
  	std::cout << "[2] a='" << any_cast<char>(a) << "' b=" << any_cast<int>(b) << std::endl;
  	try
	{
		std::string s = any_cast<std::string>(b);
	}
  	catch(const std::exception& e)
	{
		std::cout << "[3] " << e.what() << std::endl;
	}
  	any c(a);
  	std::cout << "[4] c='" << any_cast<char>(c) << "'" << std::endl;
  	return 0;
}
/* 
alp@mestan:~/cpp$ g++ -o te3 type_erasure3.cpp
alp@mestan:~/cpp$ ./te3
[1] a=42 b='c'
[2] a='c' b=42	
[3] Bad any_cast exception
[4] c='c'
*/

Enjoy ;)

Tagged with: , ,

A little WIP Haskell editor

Posted in Uncategorized by alpmestan on 2009/10/04
haskelled

haskelled

Tagged with: ,

Playing around Control.Concurrent and Network.Curl.Download

Posted in Uncategorized by alpmestan on 2009/09/27

Hi,

I’ve been playing with Control.Concurrent and Network.Curl.Download today, willing to write a program that would spawn threads to download web pages… It’s now done !

Here is the Haskell code, minimally commented (I think the Control.Concurrent doc is enough explicit, and my explanations wouldn’t be better).

module Main where

import Control.Concurrent -- multithreading related functions and types
import Control.Exception
import Network.Curl.Download -- HTTP page download related functions and types
import System.IO
import System.Time

-- like it is said on 
-- http://www.haskell.org/ghc/docs/latest/html/libraries/base/Control-Concurrent.html
-- it lets you block the main thread until all the children terminates     
waitForChildren :: MVar [MVar ()] -> IO ()
waitForChildren children = do
  cs  return ()
    m:ms -> do
       putMVar children ms
       takeMVar m
       waitForChildren children

-- creates a new thread within the thread syncrhonization mechanism
forkChild :: MVar [MVar ()] -> IO () -> IO ThreadId
forkChild children io = do
    mvar <- newEmptyMVar
    childs <- takeMVar children
    putMVar children (mvar:childs)
    forkIO (io `finally` putMVar mvar ())

-- downloads the content of the web page and then saves it into a file in the current directory
doDl url = do
  Right content <- openURIString url
  let filename = (takeWhile (/= '/') . drop 7 $ url) ++ ".html"
  writeFile filename content
     
-- spawns 8 threads to download the corresponding web pages and then waits for the 8 threads to terminate before exiting
main = do
  children <- newMVar []
  mapM_ (forkChild children . doDl) ["http://www.haskell.org/", "http://java.sun.com/", "http://www.developpez.com/", "http://xkcd.com/", "http://donsbot.wordpress.com", "http://comonad.com/reader/", "http://blog.mestan.fr/", "http://alpmestan.wordpress.com/"]       
  waitForChildren children

Now, let’s compile it :

ghc -threaded --make Main.hs -o hsmultidl

and execute it, with the -N2 option (2 cores on my computer here) to the RunTime System, and RTS informations (-s option) :

$ time ./hsmultidl +RTS -N2 -s
./hsmultidl +RTS -N2 -s 
      11,470,748 bytes allocated in the heap
      11,930,464 bytes copied during GC
       1,726,380 bytes maximum residency (4 sample(s))
          85,004 bytes maximum slop
               5 MB total memory in use (0 MB lost due to fragmentation)

  Generation 0:    17 collections,     0 parallel,  0.02s,  0.03s elapsed
  Generation 1:     4 collections,     1 parallel,  0.02s,  0.06s elapsed

  Parallel GC work balance: 1.00 (155513 / 155242, ideal 2)

  Task  0 (worker) :  MUT time:   0.00s  (  0.00s elapsed)
                      GC  time:   0.00s  (  0.00s elapsed)

  Task  1 (worker) :  MUT time:   0.00s  (  0.00s elapsed)
                      GC  time:   0.00s  (  0.00s elapsed)

  Task  2 (worker) :  MUT time:   0.00s  (  1.60s elapsed)
                      GC  time:   0.00s  (  0.05s elapsed)

  Task  3 (worker) :  MUT time:   0.01s  (  1.60s elapsed)
                      GC  time:   0.02s  (  0.02s elapsed)

  Task  4 (worker) :  MUT time:   0.00s  (  1.62s elapsed)
                      GC  time:   0.00s  (  0.00s elapsed)

  Task  5 (worker) :  MUT time:   0.00s  (  1.62s elapsed)
                      GC  time:   0.00s  (  0.00s elapsed)

  Task  6 (worker) :  MUT time:   0.00s  (  1.62s elapsed)
                      GC  time:   0.01s  (  0.01s elapsed)

  Task  7 (worker) :  MUT time:   0.00s  (  1.61s elapsed)
                      GC  time:   0.00s  (  0.00s elapsed)

  Task  8 (worker) :  MUT time:   0.01s  (  1.61s elapsed)
                      GC  time:   0.00s  (  0.01s elapsed)

  Task  9 (worker) :  MUT time:   0.00s  (  1.62s elapsed)
                      GC  time:   0.00s  (  0.00s elapsed)

  Task 10 (worker) :  MUT time:   0.00s  (  1.61s elapsed)
                      GC  time:   0.00s  (  0.00s elapsed)

  Task 11 (worker) :  MUT time:   0.00s  (  1.61s elapsed)
                      GC  time:   0.00s  (  0.00s elapsed)

  Task 12 (worker) :  MUT time:   0.00s  (  1.61s elapsed)
                      GC  time:   0.00s  (  0.00s elapsed)

  Task 13 (worker) :  MUT time:   0.00s  (  1.61s elapsed)
                      GC  time:   0.00s  (  0.00s elapsed)

  INIT  time    0.00s  (  0.00s elapsed)
  MUT   time    0.02s  (  1.61s elapsed)
  GC    time    0.04s  (  0.09s elapsed)
  EXIT  time    0.00s  (  0.01s elapsed)
  Total time    0.05s  (  1.71s elapsed)

  %GC time      80.0%  (5.4% elapsed)

  Alloc rate    1,147,304,260 bytes per MUT second

  Productivity  13.3% of total user, 0.4% of total elapsed

recordMutableGen_sync: 0
gc_alloc_block_sync: 0
whitehole_spin: 0
gen[0].steps[0].sync_todo: 0
gen[0].steps[0].sync_large_objects: 0
gen[0].steps[1].sync_todo: 0
gen[0].steps[1].sync_large_objects: 0
gen[1].steps[0].sync_todo: 0
gen[1].steps[0].sync_large_objects: 0

real	0m1.714s
user	0m0.050s
sys	0m0.037s

(there isn’t a significant difference whether I activate the -N2 option or not, for 8 pages, but I guess there would be for 100, 1000, … — maybe more on that soon !)

I’m now wondering if it would be that much insane to use my 3D Text Rendering application to render the HTML code of the pages in a 3D OpenGL/GLUT context. Would it ? :)

Tagged with: ,

3D Text Rendering in Haskell with FTGL

Posted in Uncategorized by alpmestan on 2009/09/27

Hi,

While I was writing a sort of password manager for my personal use (in Haskell) — actually, a friend of mine was doing the same in Java, so there was some competition :-) — I decided that just printing the password in the terminal wasn’t enough, at all. That’s why I decided to look at how it was possible to render text easily inside an OpenGL context, with possibly some depth given to each letter.

You know what ? Nearly like every time you look for some library in Haskell, you find it ! Mine was ftgl. You can, with only few lines, while in an OpenGL context, create some 3D text from given text and (TrueType) font and render it.

Since it is really easy to get working, here’s a little howto.

- be sure to have libftgl (the C version of the library), glut (often freeglut3 in distro repositories) and opengl correctly installed.
– be sure to have the Haskell OpenGL binding installed, and also GLUT (thus we can get a working 3D context in a (GLUT) window quite easily).
– get the Haskell binding of FTGL from hackage using cabal : cabal install ftgl.

Then you can try to play around the following code, that you may want to compile with ghc –make hs3dtext.hs -o hs3dtext.

module Main where

import Data.List
import Data.IORef
import Graphics.Rendering.OpenGL
import Graphics.Rendering.FTGL
import Graphics.UI.GLUT
import System.IO

main :: IO ()
main = do
  (_, _) <- getArgsAndInitialize
  initialDisplayMode $= [DoubleBuffered]
  createWindow "3D Text Renderer"
  angle <- newIORef 0.0
  font <- createExtrudeFont "FreeSans.ttf"
  setFontFaceSize font 7 7
  setFontDepth font 1.0
  displayCallback $= display "Haskell" angle font
  idleCallback $= Just (animate angle)
  mainLoop

display text angle font = do
  clear [ColorBuffer]
  loadIdentity
  scale 0.04 0.05 (0.5 :: GLfloat)
  a > 179 of
    True  -> angle $= -180
    False -> angle $= a + 0.03
  postRedisplay Nothing

Don’t forget to get FreeSans.ttf from anywhere for it to work. The depth of the letters here is 1.0, you can play with it, but it doesn’t look as well as it does like I pasted.

Here is a cool screenshot :-)

Screen

Enjoy !

Tagged with: ,
Follow

Get every new post delivered to your Inbox.

Join 251 other followers