Late last week I received an interesting email from Bruno Saboia, who’s been experiencing some performance issues with code he’d written to return all the objects of a particular type – in his case, Lines – from the model-space of the active drawing.
We exchanged a few emails on the topic, and Bruno kindly allowed me to post his code and my suggestions for changes. In today’s post, we’re going to look at both the code and my suggestions, while in the next post (in this mini-series, at least) we’ll look at a simple execution framework that can allows you to record crude performance timings of different command implementations.
Here’s Bruno’s original code, for us to look at in detail.
1 public static IEnumerable<Line> GetAllLines(Database db)
2 {
3 using (
4 var docLock =
5 Application.DocumentManager.MdiActiveDocument.LockDocument()
6 )
7 {
8 using (var tr = db.TransactionManager.StartTransaction())
9 {
10 var bt =
11 tr.GetObject(db.BlockTableId, OpenMode.ForRead)
12 as BlockTable;
13 var btr =
14 tr.GetObject(
15 bt[BlockTableRecord.ModelSpace],
16 OpenMode.ForRead
17 ) as BlockTableRecord;
18
19 foreach (var obj in btr)
20 {
21 var line =
22 tr.GetObject(obj, OpenMode.ForRead) as Line;
23
24 if (line != null)
25 {
26 yield return line;
27 }
28 }
29 }
30 }
31 }
The first thing to notice is that the function returns an IEnumerable. This is a “lazy” type – which actually just exposes an enumerator back to the calling function that can then be used to step through the results set – and this may actually have an impact on the logic of the code.
Skipping ahead, you’ll see on line 26 that we “yield” the results to the calling function. What actually happens is that the calling function, as it steps through the code, essentially only executes the loop one iteration at a time. The reason IEnumerable is lazy – and in many situations therefore more efficient – is that we’re executing “when needed” rather than “in case it’s needed”. This is a very good strategy if wanting to pass back what might be a very long – perhaps even infinite – set of data, such as paging through search results. It’s also a very good way to spread the load of performing complex calculations throughout the execution of a loop. If the calling function is actually going to want to enumerate the whole list – which is likely in this case – then it may prove a better strategy to return a List<>. This thread provides some useful background information on this.
In this case, the code is returning an IEnumerable of Lines. I strongly recommend not doing this: the Lines have been opened by the current transaction, and so will become invalid as soon as the transaction completes. Because we’re actually getting the results lazily – and therefore the calling function is getting the results one at a time, and so any function you call right afterwards will still have the transaction in scope – then this should be OK in this specific context. But code written in this way ends up being much less composable and reusable: it’s far safer to have the function return a set of ObjectIds that can be opened and queried for information when needed.
There are different strategies for this, too: you might pass in a Transaction object into the function, to make sure it’s clearly in scope, or you might pass in a lambda to evaluate and return the information you need to collect for each Line. In this case I’m going to go with returning an IEnumerable<ObjectId> and leave it to the calling function to (re)open the objects as it needs to.
Right then – onwards to lines 3-6: I would tend to avoid explicit locking of a document in a lower-level function. This is done automatically for the active document in all commands that are not registered as “session context” commands (i.e. any “document context” command has an implicit lock on the active document). I would assume that any session context command that calls this function would have locked the document manually beforehand. To be clear: this is not going to cause a measurable performance impact, but it’s cleaner to remove it.
On line 8: we’re using a traditional Transaction (started via StartTransaction()) to get the information we need for the contents of the model-space. Standard transactions provide some very helpful capabilities – for instance, they’re nestable and can be aborted to roll-back any changes that have been made within the scope of that particular transaction – but they come with some performance overhead. As we’re not making use of any advanced capabilities, we can gain some additional performance by using an OpenCloseTransaction (started via StartOpenCloseTransaction()) instead. See this post on the AutoCAD DevBlog for a more detailed discussion of the performance benefits of this change.
It should also be noted – and this can cause a big impact – that you should always call Commit() on a successful Transaction, even when it’s nominally read-only in nature. If Commit() is not called then the Transaction will be aborted, and depending on what you’re doing this can be a performance killer. This call would be added between lines 28 and 29, just before the Transaction drops out of scope.
Lines 10-17 are OK, although for the sake of being thorough I’ll suggest a few minor changes. I very rarely bother doing this, but rather than opening the BlockTable to get the ObjectId of the model-space from it, it is also possible to use SymbolUtilityServices.GetBlockModelSpaceId(), passing in the database object. This calls through into native code to get the ObjectId of the model-space for that database, which may shave off a few virtual function calls, here and there. Once again, probably not a change with a measurable impact, but hey.
Another somewhat pedantic change would be to use a cast rather than using the “as” keyword (which checks runtime type information before performing any reference conversion) on lines 12 and 17. As we’re not actually checking the pointer returned by “as” to see whether it’s null, we may as well perform a traditional cast (which will avoid a check or two).
On lines 21 and 22, we open the object to check whether it is a Line or not. I tend to avoid this unless specifically opening the object for another reason (which admittedly Bruno wants to do, as he wants to collect some information about each of the Lines in the model-space). My personal preference is to change this line to check the ObjectClass property on the ObjectId, to see whether it IsDerivedFrom() the Line class. This avoids opening the object – which clearly will improve performance – although it shifts responsibility to the calling function to go ahead and open it, as needed.
So here’s my version of the function. Remember that this doesn’t do as much as the prior version – which leaves the Line open – and so any performance comparison will, of course, be apples-to-oranges:
public static IEnumerable<ObjectId> ObjectsOfType1(
Database db, RXClass cls
)
{
using (
var tr = db.TransactionManager.StartOpenCloseTransaction()
)
{
var btr =
(BlockTableRecord)tr.GetObject(
SymbolUtilityServices.GetBlockModelSpaceId(db),
OpenMode.ForRead
);
foreach (ObjectId id in btr)
{
if (id.ObjectClass.IsDerivedFrom(cls))
{
yield return id;
}
}
tr.Commit();
}
}
You’ll notice that the function has been generalised to take an RXClass for the type of object for which we wish to query. This just makes it a bit more flexible.
Building on this function, here’s a slightly different version that functions nearly as quickly (I found it to run a few percent slower, on average) but uses LINQ:
public static IEnumerable<ObjectId> ObjectsOfType2(
Database db, RXClass cls
)
{
using (
var tr = db.TransactionManager.StartOpenCloseTransaction()
)
{
var btr =
(BlockTableRecord)tr.GetObject(
SymbolUtilityServices.GetBlockModelSpaceId(db),
OpenMode.ForRead
);
var lineIds =
from ObjectId id in btr
where id.ObjectClass.IsDerivedFrom(cls)
select id;
tr.Commit();
return lineIds;
}
}
We could also force the ObjectsOfType2() function to return a List<ObjectId> by changing the last line to “return lineIds.ToList<ObjectId>();”, but I found that this additional step slows down execution marginally. My somewhat limited knowledge of the underpinnings of LINQ leads me to believe that, as they stand, ObjectsOfType1() and ObjectsOfType2() are basically equivalent – you apparently also have lazy evaluation of select statements in LINQ – but I’d be very happy for someone to educate me if that isn’t the case. For that matter, I’d be happy to hear from anyone who has further suggestions for Bruno – I may very well have missed something.
Here’s the complete listing, with some commands implemented to call the various functions and measure/report the time taken for their execution:
using Autodesk.AutoCAD.ApplicationServices;
using Autodesk.AutoCAD.DatabaseServices;
using Autodesk.AutoCAD.Runtime;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System;
namespace ObjectEnumeration
{
public class Commands
{
public void MeasureTime(Document doc, Func<int> func)
{
// Get the name of the running command(s)
// (might also have queried the CommandMethod attribute
// via reflection, but that would be a lot more work)
var cmd = (string)Application.GetSystemVariable("CMDNAMES");
// Start a Stopwatch to time the execution
var sw = new Stopwatch();
sw.Start();
// Run the function, getting back the count of the results
var cnt = func();
// Stop the Stopwatch and print the results to the command-line
sw.Stop();
doc.Editor.WriteMessage(
"\n{0}: found {1} lines in {2}.", cmd, cnt, sw.Elapsed
);
}
[CommandMethod("LL1")]
public void ListLines1()
{
var doc = Application.DocumentManager.MdiActiveDocument;
MeasureTime(
doc,
() =>
{
var ids = GetAllLines(doc.Database);
return ids.Count<Line>();
}
);
}
[CommandMethod("LL2")]
public void ListLines2()
{
var doc = Application.DocumentManager.MdiActiveDocument;
MeasureTime(
doc,
() =>
{
var ids =
ObjectsOfType1(
doc.Database, RXObject.GetClass(typeof(Line))
);
return ids.Count<ObjectId>();
}
);
}
[CommandMethod("LL3")]
public void ListLines3()
{
var doc = Application.DocumentManager.MdiActiveDocument;
MeasureTime(
doc,
() =>
{
var ids =
ObjectsOfType2(
doc.Database, RXObject.GetClass(typeof(Line))
);
return ids.Count<ObjectId>();
}
);
}
public static IEnumerable<Line> GetAllLines(Database db)
{
using (
var docLock =
Application.DocumentManager.MdiActiveDocument.LockDocument()
)
{
using (var tr = db.TransactionManager.StartTransaction())
{
var bt =
tr.GetObject(db.BlockTableId, OpenMode.ForRead)
as BlockTable;
var btr =
tr.GetObject(
bt[BlockTableRecord.ModelSpace],
OpenMode.ForRead
) as BlockTableRecord;
foreach (var obj in btr)
{
var line =
tr.GetObject(obj, OpenMode.ForRead) as Line;
if (line != null)
{
yield return line;
}
}
}
}
}
public static IEnumerable<ObjectId> ObjectsOfType1(
Database db, RXClass cls
)
{
using (
var tr = db.TransactionManager.StartOpenCloseTransaction()
)
{
var btr =
(BlockTableRecord)tr.GetObject(
SymbolUtilityServices.GetBlockModelSpaceId(db),
OpenMode.ForRead
);
foreach (ObjectId id in btr)
{
if (id.ObjectClass.IsDerivedFrom(cls))
{
yield return id;
}
}
tr.Commit();
}
}
public static IEnumerable<ObjectId> ObjectsOfType2(
Database db, RXClass cls
)
{
using (
var tr = db.TransactionManager.StartOpenCloseTransaction()
)
{
var btr =
(BlockTableRecord)tr.GetObject(
SymbolUtilityServices.GetBlockModelSpaceId(db),
OpenMode.ForRead
);
var lineIds =
from ObjectId id in btr
where id.ObjectClass.IsDerivedFrom(cls)
select id;
tr.Commit();
return lineIds;
}
}
}
}
Bear in mind that as we’re dealing with lazy results, we need to include the Count<>() method in the measurement for the results to be representative (even if, as we’ve mentioned, Bruno’s code includes the operation to open the Lines rather than just deal with ObjectIds). This clearly defeats the object of using a lazy structure in the first place, but it’s ultimately more fair.
Now I don’t actually think that these changes will have helped resolve the core problem in Bruno’s application – for some reason he’s seeing querying of 1K entities taking seconds to complete (whereas his code executes on 100K+ entities in tenths of a second on my system). I’m looking forward to understanding more about Bruno’s specific problem, but felt that posting this in the meantime would be of interest to people.
On a side note, due to the performance issues he’s been experiencing, Bruno looked into using Parallel.For – from the TPL – to harness multiple cores to speed up execution. Unfortunately this is not possible with AutoCAD: as we’ve seen in a number of previous posts, AutoCAD is not thread-safe and its APIs really need to be called on the UI thread.
That’s it for today: please post comments if you have any further suggestions (or find any of the suggestions I’ve made to be inappropriate). In the post that follows from this, we’ll look at extending the MeasureTime() method to keep track of execution information so we can (for instance) more easily bring it across into Excel for analysis.