More fun with F# and AutoCAD: string extraction and manipulation
I've been working through some draft chapters of Don Syme's Expert F# book (posted here, while the final version will be available in hardcover from early December). I'm definitely enjoying working with F#: the beauty of functional programming combined with the flexibility of .NET is a killer combination.
Before I dive into the sample I put together for today's post, I thought I'd scribble down some musings on the language, to help position the technology in comparison with more popular imperative/object-oriented languages...
Functional programming is great for deep mathematical problems, and so will play well with developers needing to perform complex scientific calculations (for example in the fields of analysis and simulation - domains that are increasingly converging with and integrating into design). As I showed in this previous post, you can very easily represent - and even display - complicated scientific functions using a functional language such as F#.
The other aspect of great interest to me is how this fits with the increasing need to harness multi-processor & multi-core environments: another of my fields of study (many moons ago) was parallel computing, especially using occam 2 to program transputers. That was great fun, but as a programmer working in this type of environment you end up spending a lot of time deciding which tasks are to be executed in parallel. What makes functional programming interesting with respect to concurrency - and this is clearly a major driver behind Microsoft's interest in developing languages in this area - is the ability to harness the power of multiple processing cores to run, in parallel, code that adopts the purely functional paradigm. "Pure" functional code does not create "side-effects", such as when maintaining internal program state, which makes it perfect for distributing across multiple processors/cores. It's this ability to automatically farm out computing operations across your processing resources that is interesting: the manual way is simply too laborious to scale to real-world needs.
While F# is not a "pure" language (and frankly this lack of purity is what makes F# most interesting for AutoCAD programming, as it gives us the freedom to do fun things inside AutoCAD using F#), it is no doubt possible to enforce higher levels of purity in sections of code that can then make use of the increasingly parallel capabilities of modern processors. At least that's what I would expect. :-)
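To make the point about purity a little more concrete, here is a minimal sketch. It uses Array.Parallel.map from the current F# core library, which is an illustrative choice and not something the code in this post relies on; the function and value names are hypothetical. Because square touches no shared state, the sequential and parallel maps should agree.

```fsharp
// A side-effect-free function: its result depends only on its
// argument, and it modifies no shared state.
let square (x : int) = x * x

let inputs = [| 1 .. 10 |]

// Ordinary sequential map over the array
let sequentialResult = Array.map square inputs

// Parallel map: safe here because square has no side-effects
let parallelResult = Array.Parallel.map square inputs

printfn "Results agree: %b" (sequentialResult = parallelResult)
```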
OK, so that's a look at the "high end" of functional programming. I also see a great deal of relevance for this type of technology at the "low end", where you simply want to throw together some quick code to do something simple. And that's the case I'm presenting today.
As I've mentioned before, functional programming can be incredibly succinct. Here's some F# code that goes through the modelspace and paperspace of the current drawing inside AutoCAD, and prints a list of all the distinct words used inside the various MText objects:
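// Use lightweight F# syntax
#light
(* Declare a specific namespace
   and module name
*)
// Import managed assemblies
#I @"C:\Program Files\Autodesk\AutoCAD 2008"
#r "acdbmgd.dll"
#r "acmgd.dll"
open Autodesk.AutoCAD.ApplicationServices
open Autodesk.AutoCAD.DatabaseServices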
// Now we declare our command
let listWords () =
  // Let's get the usual helpful AutoCAD objects
  let doc = Application.DocumentManager.MdiActiveDocument
  let ed = doc.Editor;
  let db = doc.Database;
  // "use" has the same effect as "using" in C#
  use tr = db.TransactionManager.StartTransaction();
  // Get appropriately-typed BlockTable and BTRs
  let bt =
    tr.GetObject(db.BlockTableId,OpenMode.ForRead)
    :?> BlockTable
  let ms =
    tr.GetObject(bt.[BlockTableRecord.ModelSpace],OpenMode.ForRead)
    :?> BlockTableRecord
  let ps =
    tr.GetObject(bt.[BlockTableRecord.PaperSpace],OpenMode.ForRead)
    :?> BlockTableRecord
  // Now the fun starts...
  // A function that accepts an ObjectId and returns
  // a list of the text contents, or an empty list.
  // Note the valid use of tr, as it is in scope
  let extractText (x : ObjectId) =
    let obj = tr.GetObject(x,OpenMode.ForRead);
    match obj with
    | :? MText -> [(obj :?> MText).Contents];
    | _ -> []
  // A recursive function to print the contents of a list
  let rec printList x =
    match x with
    | h :: t -> ed.WriteMessage("\n" + h); printList t
    | [] -> ed.WriteMessage("\n")
  // Partial application of split which can then be
  // applied to a string to retrieve the contained words
  let words = String.split [' ']
  // And here's where we plug everything together...
  Seq.untyped_to_list ms @ Seq.untyped_to_list ps |>
  List.map extractText |> List.flatten |> List.map words |>
  List.flatten |> Set.of_list |> Set.to_list |> printList
  // As usual, committing is cheaper than aborting
  tr.Commit()
Hopefully the beginning section is understandable, given what we usually need to do from a C# or VB.NET program. So I'll start my descriptions from the extractText function.
extractText takes an ObjectId and uses the open transaction to open it for read. Then, depending on the type of the object, it gets the contained text and returns it within a list (containing either 0 or 1 members). Currently this is only implemented for MText objects, but it could very easily be extended to handle other textual objects. For non-MText objects (matched by the wildcard pattern '_') an empty list ([]) is returned.
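The type-test pattern described above can be tried outside AutoCAD. The sketch below is only an illustration: FakeMText and FakeDBText are hypothetical stand-ins for real entity types, and the second match case shows one way the function might be extended to another kind of text object.

```fsharp
// Hypothetical stand-ins for AutoCAD entity types, so the
// sketch runs without AutoCAD loaded.
type FakeMText(contents : string) =
  member this.Contents = contents

type FakeDBText(textString : string) =
  member this.TextString = textString

// Return the text in a one-element list, or an empty list
// for any object that carries no text.
let extractText (o : obj) =
  match o with
  | :? FakeMText as m -> [m.Contents]
  | :? FakeDBText as t -> [t.TextString]
  | _ -> []

// A mixed bag of objects, one of which has no text at all
let samples : obj list =
  [ box (FakeMText "Some text"); box (FakeDBText "More text"); box 42 ]

// One list of strings per object
let texts = samples |> List.map extractText
```

Returning a list in every case, rather than a string or null, is what lets the results be flattened together later without any special handling for objects that have no text.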
printList is a recursive function which uses our old friend Editor.WriteMessage() to write the "head" of the list (h) to the command-line, and then recurses to print the "tail" of the list (t). When the list is empty, we simply print a newline character and return.
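Here is a minimal, AutoCAD-free sketch of the same head/tail recursion. The write parameter is an assumption added for illustration: it stands in for Editor.WriteMessage() so the output can be captured and inspected.

```fsharp
// Walk a list recursively, passing each element to write.
// write is a hypothetical stand-in for ed.WriteMessage.
let rec printList (write : string -> unit) (items : string list) =
  match items with
  | h :: t ->
      // Write the head, then recurse on the tail
      write ("\n" + h)
      printList write t
  | [] ->
      // Nothing left: finish with a newline
      write "\n"

// Collect the output in a buffer instead of a command-line
let buffer = System.Text.StringBuilder()
printList (fun s -> buffer.Append(s) |> ignore) ["one"; "two"]
printfn "%s" (buffer.ToString())
```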
words is a function defined by partial application of String.split, which typically takes two arguments - a list of characters to consider delimiters, plus a string to split. So you would call it using:
String.split [' '] "This is my string"
which would return a list of strings:
["This"; "is"; "my"; "string"]
Our definition of the function words actually allows us to get the same results by passing one argument:
words "This is my string"
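Partial application can be sketched without the library function itself. The curried split below is an assumed stand-in for String.split, written against the standard .NET String.Split method so that it runs on a current F# installation.

```fsharp
// A curried split: delimiters first, then the string.
// An assumed stand-in for the String.split used in the post.
let split (delimiters : char list) (s : string) =
  s.Split(List.toArray delimiters) |> List.ofArray

// Supplying only the first argument gives a new function
// that is still waiting for the string.
let words = split [' ']

// Both calls produce the same list of strings
let direct = split [' '] "This is my string"
let viaWords = words "This is my string"
```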
Now we get to the guts of the command, which I'm going to break down call-by-call:
Seq.untyped_to_list ms @ Seq.untyped_to_list ps |>
Here we get a list of the ObjectIds from the modelspace and append it (@) to the ObjectIds of the contents of the first paperspace layout. BlockTableRecord implements IEnumerable - aka "seq" in F# - so we use untyped_to_list. If it implemented the more modern, generics-derived IEnumerable<ObjectId> we would use Seq.to_list instead. We then pass the results of this operation - a list of ObjectIds - to the next in the chain using the pipeline operator (|>).
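The same idea can be tried with a standard .NET collection. In this sketch ArrayList plays the part of BlockTableRecord, since it too only implements the non-generic IEnumerable, and Seq.cast followed by List.ofSeq is used as an assumed rough equivalent of untyped_to_list in the current F# core library.

```fsharp
open System.Collections

// ArrayList only implements the non-generic IEnumerable,
// much like BlockTableRecord in the post.
let first = ArrayList()
first.Add(box 1) |> ignore
first.Add(box 2) |> ignore

let second = ArrayList()
second.Add(box 3) |> ignore

// Cast each element to the expected type and build a list
let toList (items : IEnumerable) : int list =
  items |> Seq.cast<int> |> List.ofSeq

// Append the two lists with @
let combined = toList first @ toList second
```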
List.map extractText |>
Here we call the extractText function on each of the items in the list passed in (the list of ObjectIds), and the results of this operation get returned in a new list. The extractText function returns a list of strings for each ObjectId, so the result of this "map" is a list of a list of strings. Which gets piped to the next function.
List.flatten |>
As we have a list of a list of strings, we "flatten" it to only have a list of strings. And we pipe it on.
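A small sketch of the map-then-flatten step, using plain integers. It assumes the current F# core library, where flattening is spelled List.concat, and keepEven is a hypothetical function that, like extractText, returns either a one-element list or an empty one.

```fsharp
// Each input yields a list with zero or one elements
let keepEven (n : int) = if n % 2 = 0 then [n] else []

// Mapping gives a list of lists...
let nested = [1; 2; 3; 4] |> List.map keepEven

// ...which flattening collapses into a single list
let flat = nested |> List.concat

// List.collect performs both steps in one call
let collected = [1; 2; 3; 4] |> List.collect keepEven
```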
List.map words |>
Now we map our words function to return the list of words contained in each string in the list. This - once again - gives us a list of a list of strings.
List.flatten |>
Which - once again - we flatten to a list of strings (now a list of individual words).
Set.of_list |>
We create a "set" of this list, which creates an alphabetized, de-duplicated set of the words contained in our various MText objects.
Set.to_list |>
We now create a list of this set, which ultimately means we now have an ordered, minimal list of the words contained in the drawing.
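The round trip through a set is easy to try on its own. This sketch assumes the current core library names, Set.ofList and Set.toList, and a made-up list of words.

```fsharp
// A word list with duplicates, in no particular order
let wordList = ["some"; "text"; "in"; "some"; "text"]

// Building a set removes duplicates and orders the elements;
// converting back gives an ordinary list again.
let distinctSorted = wordList |> Set.ofList |> Set.toList
```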
printList
And finally we print the contents to the command-line using our recursive function.
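The whole chain can also be sketched on plain strings, with no drawing involved. The names below are hypothetical, the two sample sentences simply stand in for MText contents, and the library calls are the current core library spellings rather than the ones used in the command itself.

```fsharp
// Split a string into words on spaces
let words (s : string) = s.Split([| ' ' |]) |> List.ofArray

// Texts -> lists of words -> one list -> set -> ordered list
let distinctWords (texts : string list) =
  texts
  |> List.map words
  |> List.concat
  |> Set.ofList
  |> Set.toList

let result =
  distinctWords
    [ "Here is some text in modelspace"
      "And now some in paperspace" ]

// Print one word per line
result |> List.iter (printfn "%s")
```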
To test the code, I created an MText object in modelspace with the imaginative contents "Here is some text in modelspace", followed by one in paperspace containing "And now some in paperspace".
Here's what happens when we run our command:
That's it for this post. Hopefully you're able to see that F# is not only interesting for developing mathematics-intensive applications, but also for simpler operations on data such as lists and sets (it's ideal for text processing, for instance).