Kean Walmsley

July 2009

Sun Mon Tue Wed Thu Fri Sat
      1 2 3 4
5 6 7 8 9 10 11
12 13 14 15 16 17 18
19 20 21 22 23 24 25
26 27 28 29 30 31  

Twitter Updates

    follow me on Twitter



    « Pointing at clouds: more random musings on AutoCAD and F# | Main | Recursive F# code to generate random point clouds inside AutoCAD »

    February 27, 2008

    Using Reflector to diagnose tail call optimization in F#

    I've talked about Lutz Roeder's Reflector tool a couple of times and it's proven to be very useful to me, once again.

    I mentioned in my last post about some problems I was having with tail recursion, and my choice to replace certain recursive functions with iterative versions. Today we're going to use Reflector to take a look under the hood of some compiled assemblies, to determine which recursive functions have been optimised correctly and which have not.

    Let's start by taking a look at the two recursive functions I mentioned last time, starting with the most simple:

    // A recursive function to show the contents of a list


    let rec drawPointList (x:Point3d list) =

    match x with

    | [] -> ()

      | h :: t ->

      ed.DrawVector(h,h,1,true)

        drawPointList t

    Here's what we see when we open the compiled assembly in Reflector and disassemble the equivalent function into C#:

    [Serializable]

    internal class drawPointList@140 : FastFunc<List<Point3d>, Unit>

    {

      // Fields

      public Editor ed;


      // Methods

      public drawPointList@140(Editor ed)

      {

        this.ed = ed;

      }


      public override Unit Invoke(List<Point3d> x)

      {

        while (true)

        {

          List<Point3d> list = x;

          if (!(list is List<Point3d>._Cons))

          {

              break;

          }

          List<Point3d> list2 = ((List<Point3d>._Cons) list)._Cons2;

          Point3d from = ((List<Point3d>._Cons) list)._Cons1;

          this.ed.DrawVector(from, from, 1, true);

          x = list2;

        }

        return null;

      }

    }



    You can see that the implementation of Invoke contains a while loop, which loops forever (until the list is empty), rather than calling recursively into itself.

    Now let's take a look at the other, more complex function, which generates a list of points via recursion:

    // A function that accepts an ObjectId and returns

    // a list of random points on its surface


    let rec getNPoints n (sol:Solid3d) =

      if n <= 0 then

        []

      else

        let mp = sol.MassProperties


        let pl = new Plane()


        pl.Set(mp.Centroid,randomVector3d sol.ObjectId.OldId)

        let reg = sol.GetSection(pl)

        let ray = new Ray()

        ray.BasePoint <- mp.Centroid

        ray.UnitDir <- randomVectorOnPlane pl


        let pts = new Point3dCollection()

        reg.IntersectWith

          (ray,

           Intersect.OnBothOperands,

           pts,

           0, 0)


        pl.Dispose()

        reg.Dispose()

        ray.Dispose()


        let ptlist = Seq.untyped_to_list pts

        ptlist @ getNPoints (n - pts.Count) sol

    This function, when compiled by the F# compiler integrated into Visual Studio and disassembled by Reflector, equates to this C# code:

    [Serializable]

    internal class getNPoints@101T<T> : OptimizedClosures.FastFunc2<int, Solid3d, List<T>>

    {

      // Fields

      public TypeFunc _self0;


      // Methods

      public getNPoints@101T(TypeFunc _self0)

      {

        this._self0 = _self0;

      }


      public override List<T> Invoke(int n, Solid3d sol)

      {

        TypeFunc func = this._self0;

        int num = n;

        int t = 0;

        if ((num > t) ^ true)

        {

          return List<T>.Nil_uniq;

        }

        Solid3dMassProperties h = sol.MassProperties;

        Plane plane = new Plane();

        plane.Set(h.Centroid, MyApplication.randomVector3d<int>(sol.ObjectId.OldId));

        Region section = sol.GetSection(plane);

        Ray entityPointer = new Ray();

        entityPointer.BasePoint = h.Centroid;

        entityPointer.UnitDir = MyApplication.randomVectorOnPlane<Plane>(plane);

        Point3dCollection points = new Point3dCollection();

        section.IntersectWith(entityPointer, Intersect.OnBothOperands, points, 0, 0);

        plane.Dispose();

        section.Dispose();

        entityPointer.Dispose();

        List<T> list = Seq.untyped_to_list<Point3dCollection, T>(points);

        Solid3d u = sol;

        int num2 = n - points.Count;

        return Pervasives.op_Append<T>(FastFunc<int, Solid3d>.InvokeFast2<List<T>>((FastFunc<int, FastFunc<Solid3d, List<T>>>) func.Specialize<T>(), num2, u), list);

      }

    }

    While a little harder to follow (please don't worry about the details), we can see that the recursive tail call has not been optimised out, as we don't have an iterative construct (e.g. a while loop). Instead we see a call to InvokeFast2 (which will result in Invoke being called recursively), prior to the value being returned.

    Why exactly this case hasn't been handled isn't clear to me - in fact I'm pretty sure it's a bug in F#, which I will go ahead and report - but it is clear that (at least for now) we need to change the implementation if we're to avoid a stack overflow. And that's fine: the last post contains an iterative version we can use.

    Finallly we're going to take a look at parts of the assembly created by building the code in the recent Robotic Hatching posts. As I mentioned previously, I ended up hitting problems when creating 400 or so segments in the polyline hatch when using my F# implementation. Rather than automatically changing each of the recursive functions (those prefixed by "let rec") to iterative versions, I checked the disassembly listings one by one until I found the problematic one:

    // A recursive function to get the points

    // for our path


    let

      if times <= 0 then

        []

      else

        try

          let pt =

            nextBoundaryPoint cur start doTrace

          (pt :: definePath pt (times-1))

        with exn ->

          if exn.Message = "Could not get point" then

            definePath start times

          else

            failwith exn.Message

    rec definePath start times =

    Here's how the compiled (and then disassembled/reassembled) function looks in C#:

    [Serializable]

    internal class definePath@281 : OptimizedClosures.FastFunc2<Point3d, int, List<Point3d>>

    {

      // Fields

      public Curve cur;

      public bool doTrace;


      // Methods

      public definePath@281(bool doTrace, Curve cur)

      {

        this.doTrace = doTrace;

        this.cur = cur;

      }


      public override List<Point3d> Invoke(Point3d start, int times)

      {

        int num = times;

        int num2 = 0;

        if ((num > num2) ^ true)

        {

          return List<Point3d>.Nil_uniq;

        }

        try

        {

          Point3d pt = Commands.nextBoundaryPoint(this.cur, start, this.doTrace);

          return new List<Point3d>._Cons(pt, FastFunc<Point3d, int>.InvokeFast2<List<Point3d>>(this, pt, times - 1));

        }

        catch (object obj1)

        {

          Exception exn = (Exception) obj1;

          string message = exn.Message;

          string b = "Could not get point";

          if (string.Equals(message, b))

          {

            return FastFunc<Point3d, int>.InvokeFast2<List<Point3d>>(this, start, times);

          }

          return Operators.failwith<List<Point3d>>(exn.Message);

        }

      }

    }

    This one is actually only recurses when we catch an exception we expect to occur (and actually have thrown ourselves elsewhere in the code) occasionally during run-time. A better implementation - one that F#'s compiler is able to tail call optimise and meets my aesthetic requirements - is here:

    // An iterative function to get the points

    // for our path


    let definePath start times =

      let pts = new Point3dCollection()

      pts.Add(start) |> ignore


      let mutable i = 0

      while i < times do

        try

          let pt =

            nextBoundaryPoint

              cur pts.[pts.Count-1] doTrace runAsync

          pts.Add(pt) |> ignore

          i <- i + 1

        with exn ->

          if exn.Message <> "Could not get point" then

            failwith exn.Message


      Seq.untyped_to_list pts

    I won't bother showing this function's C# equivalent, as determined by Reflector, you should just trust me that it's better. :-)

    I did change the implementation slightly, so we no longer add our first vertex when we create the polyline - we simply add it to the array of vertices, which is cleaner - and we also have to pass a start index of 0 into the definePath function, rather than 1. Very minor changes, but they actually do make the code slightly simpler.

    Anyway, this kind of analysis and control is made possible by the fact that F# compiles to IL, just like the other .NET languages. This allows us to take advantage of existing tools for code analysis & obfuscation, and ultimately give us better, more consistent control than we would get from a functional programming implementation outside of .NET.

    TrackBack

    TrackBack URL for this entry:
    http://www.typepad.com/services/trackback/6a00d83452464869e200e5509a59e98834

    Listed below are links to weblogs that reference Using Reflector to diagnose tail call optimization in F#:

    Comments

    In order for a recursive call to be tail call, there needs to be no pending operations when the recursive call is made. Wouldn't that explain why getNPoints isn't tail-call optimized? Indeed, the @ operation must happen after the recursive call:

    ptlist @ getNPoints (n - pts.Count) sol

    It's clearer, if you write this:
    let recres = getNPoints (n - pts.Count) sol

    ptlist @ recres

    You see clearly that the @ must happen after the recursive call to getNPoints, so it cannot be tail-call optimized, because of this pending operation.

    You could re-write getNPoints to be iterative, accumulating the list of points in an extra parameter.

    It does indeed seem to be the reason - I wasn't aware of the pending operation restriction, although I think I suspected something like this at the root of it.

    If I pass an accumulator list it can indeed be defined recursively and get optimized to an iterative operation:

    let rec getNPoints n (sol:Solid3d) ptlist =
    if n <= 0 then
    ptlist
    else
    ...

    getNPoints (n - pts.Count) sol (ptlist @ Seq.untyped_to_list pts)

    Obviously the initial call is made with an empty list, but then we're in business.

    Thanks for your comment, namin - nice to know someone's listening! :-)

    Kean

    Kean,

    I'm listening but don't have time to try out the new F# code. C# works just fine for me. Maybe if you could demonstrate in F# post the c# along with example would be nice and easier for me to test?

    Hi Kean,

    Yup, I am really enjoying your F# series :)

    Just to develop a bit on tail-call, the idea is basically that if there are no pending operations, then there's no need to push the recursive call on the stack, because we know that we can just jump away and we'll be in the right place. However, as soon as there are pending operations, we need to remember what we need to do once we're done with the recursive call -- hence, we cannot just "jump".

    Jose - thanks, I understand it's very different for those used to "classic" languages. From time to time I will try to post both C# and F#, but it takes time & effort to do, of course (the recent Robotic Hatching series is an example where I chose to provide both).

    Namin - thanks, that does indeed make sense. I just needed to take the time to think about it.

    To the best of my knowledge, C# cannot represent tail calls so your approach of decompiling compiled F# code back into C# code will not work reliably. What you're seeing in the first case is probably the F# compiler generating a loop and not using the tail call ILX op (because the current .NET JIT compiles loops much more efficiently than tail call ops). So you see a faithful decompilation to C# in that specific case but not in general.

    Hi Jon,

    You can indeed tell (at least I could) from the C# code generated by Reflector from the IL whether the tail call has been optimized or not.

    The ultimate intent is to see a difference in the way the F# compiler optimizes your code: this approach does help you spot when code gets optimized to an iterative construct rather than remaining recursive.

    Or maybe I've missed your point?

    Regards,

    kean

    Verify your Comment

    Previewing your Comment

    This is only a preview. Your comment has not yet been posted.

    Working...
    Your comment could not be posted. Error type:
    Your comment has been posted. Post another comment

    The letters and numbers you entered did not match the image. Please try again.

    As a final step before posting your comment, enter the letters and numbers you see in the image below. This prevents automated programs from posting comments.

    Having trouble reading this image? View an alternate.

    Working...

    Post a comment

    Feed & Share

    Search