In the last post we saw some code that downloaded data - serially - from a number of websites via RSS and created AutoCAD entities linking to the various posts.
As promised, in today's post we take that code and enable it to query the same data in parallel by using Asynchronous Workflows in F#. Asynchronous Workflows are an easy-to-use yet powerful mechanism for enabling concurrent programming in F#.
Firstly, a little background as to why this type of technique is important. As many - if not all - of you are aware, the days of raw processor speed doubling every couple of years are over. The technical innovations that enabled Moore's Law to hold true for half a century are - at least in the area of silicon-based microprocessor design - hitting a wall (it's apparently called the Laws of Physics :-). Barring some disruptive technological development, the future gains in computing performance are to be found in the use of parallel processing, whether via multiple cores, processors or distributed clouds of computing resources.
Additionally, with an increasing focus on distributed computing and information resources, managing tasks asynchronously becomes more important, as information requests across a network inevitably introduce a latency that can be mitigated by the tasks being run in parallel.
The big problem is that concurrent programming is - for the most-part - extremely difficult to do, and even harder to retro-fit into existing applications. Traditional lock-based parallelism (where locks are used to control access to shared computing resources) is both unwieldy and prone to blocking. New technologies, such as Asynchronous Workflows and Software Transactional Memory, provide considerable hope (and this is a topic I have on my list to cover at some future point).
Today's post looks at a relatively simple scenario, in the sense that we want to perform a set of discrete tasks in parallel, harnessing those fancy multi-core systems for those of you lucky enough to have them (I'm hoping to get one when I next replace my notebook, sometime in March), but that these tasks are indeed independent: we want to wait until they are all complete, but we do not have the additional burden of them communicating amongst themselves or using shared resources (e.g. accessing shared memory) during their execution.
We are also going to be very careful only to run parallel tasks unrelated to AutoCAD. Any access made into AutoCAD's database, for instance, needs to be performed in series: AutoCAD is not thread-safe when it comes to the vast majority of its programmatically-accessible functionality. So we're going to run a set of asynchronous, parallel tasks to query our various RSS feeds, and combine the results before creating the corresponding geometry in AutoCAD. This all sounds very complex, but the good (actually great) news is that Asynchronous Workflows does all the heavy lifting. Phew.
Here's the modified F# code, with the modified/new lines coloured in red:
1 // Use lightweight F# syntax
2
3 #light
4
5 // Declare a specific namespace and module name
6
7 module MyNamespace.MyApplication
8
9 // Import managed assemblies
10
11 #I @"C:\Program Files\Autodesk\AutoCAD 2008"
12
13 #r "acdbmgd.dll"
14 #r "acmgd.dll"
15
16 open Autodesk.AutoCAD.Runtime
17 open Autodesk.AutoCAD.ApplicationServices
18 open Autodesk.AutoCAD.DatabaseServices
19 open Autodesk.AutoCAD.Geometry
20 open System.Xml
21 open System.Collections
22 open System.Collections.Generic
23 open System.IO
24 open System.Net
25 open Microsoft.FSharp.Control.CommonExtensions
26
27 // The RSS feeds we wish to get. The first two values are
28 // only used if our code is not able to parse the feed's XML
29
30 let feeds =
31 [ ("Through the Interface",
32 "http://blogs.autodesk.com/through-the-interface",
33 "http://through-the-interface.typepad.com/through_the_interface/rss.xml");
34
35 ("Don Syme's F# blog",
36 "http://blogs.msdn.com/dsyme/",
37 "http://blogs.msdn.com/dsyme/rss.xml");
38
39 ("Shaan Hurley's Between the Lines",
40 "http://autodesk.blogs.com/between_the_lines",
41 "http://autodesk.blogs.com/between_the_lines/rss.xml");
42
43 ("Scott Sheppard's It's Alive in the Lab",
44 "http://blogs.autodesk.com/labs",
45 "http://labs.blogs.com/its_alive_in_the_lab/rss.xml");
46
47 ("Lynn Allen's Blog",
48 "http://blogs.autodesk.com/lynn",
49 "http://lynn.blogs.com/lynn_allens_blog/index.rdf");
50
51 ("Heidi Hewett's AutoCAD Insider",
52 "http://blogs.autodesk.com/autocadinsider",
53 "http://heidihewett.blogs.com/my_weblog/index.rdf") ]
54
55 // Fetch the contents of a web page, asynchronously
56
57 let httpAsync(url:string) =
58 async { let req = WebRequest.Create(url)
59 use! resp = req.GetResponseAsync()
60 use stream = resp.GetResponseStream()
61 use reader = new StreamReader(stream)
62 return reader.ReadToEnd() }
63
64 // Load an RSS feed's contents into an XML document object
65 // and use it to extract the titles and their links
66 // Hopefully these always match (this could be coded more
67 // defensively)
68
69 let titlesAndLinks (name, url, xml) =
70 let xdoc = new XmlDocument()
71 xdoc.LoadXml(xml)
72
73 let titles =
74 [ for n in xdoc.SelectNodes("//*[name()='title']")
75 -> n.InnerText ]
76 let links =
77 [ for n in xdoc.SelectNodes("//*[name()='link']") ->
78 let inn = n.InnerText
79 if inn.Length > 0 then
80 inn
81 else
82 let href = n.Attributes.GetNamedItem("href").Value
83 let rel = n.Attributes.GetNamedItem("rel").Value
84 if href.Contains("feedburner") then
85 ""
86 else
87 href ]
88
89 let descs =
90 [ for n in xdoc.SelectNodes
91 ("//*[name()='description' or name()='content' or name()='subtitle']")
92 -> n.InnerText ]
93
94 // A local function to filter out duplicate entries in
95 // a list, maintaining their current order.
96 // Another way would be to use:
97 // Set.of_list lst |> Set.to_list
98 // but that results in a sorted (probably reordered) list.
99
100 let rec nub lst =
101 match lst with
102 | a::[] -> [a]
103 | a::b ->
104 if a = List.hd b then
105 nub b
106 else
107 a::nub b
108 | [] -> []
109
110 // Filter the links to get (hopefully) the same number
111 // and order as the titles and descriptions
112
113 let real = List.filter (fun (x:string) -> x.Length > 0)
114 let lnks = real links |> nub
115
116 // Return a link to the overall blog, if we don't have
117 // the same numbers of titles, links and descriptions
118
119 let lnum = List.length lnks
120 let tnum = List.length titles
121 let dnum = List.length descs
122
123 if tnum = 0 || lnum = 0 || lnum <> tnum || dnum <> tnum then
124 [(name,url,url)]
125 else
126 List.zip3 titles lnks descs
127
128 // For a particular (name,url) pair,
129 // create an AutoCAD HyperLink object
130
131 let hyperlink (name,url,desc) =
132 let hl = new HyperLink()
133 hl.Name <- url
134 hl.Description <- desc
135 (name, hl)
136
137 // Use asynchronous workflows in F# to download
138 // an RSS feed and return AutoCAD HyperLinks
139 // corresponding to its posts
140
141 let hyperlinksAsync (name, url, feed) =
142 async { let! xml = httpAsync feed
143 let tl = titlesAndLinks (name, url, xml)
144 return List.map hyperlink tl }
145
146 // Now we declare our command
147
148 [<CommandMethod("rss")>]
149 let createHyperlinksFromRss() =
150
151 // Let's get the usual helpful AutoCAD objects
152
153 let doc =
154 Application.DocumentManager.MdiActiveDocument
155 let db = doc.Database
156
157 // "use" has the same effect as "using" in C#
158
159 use tr =
160 db.TransactionManager.StartTransaction()
161
162 // Get appropriately-typed BlockTable and BTRs
163
164 let bt =
165 tr.GetObject
166 (db.BlockTableId,OpenMode.ForRead)
167 :?> BlockTable
168 let ms =
169 tr.GetObject
170 (bt.[BlockTableRecord.ModelSpace],
171 OpenMode.ForWrite)
172 :?> BlockTableRecord
173
174 // Add text objects linking to the provided list of
175 // HyperLinks, starting at the specified location
176
177 // Note the valid use of tr and ms, as they are in scope
178
179 let addTextObjects pt lst =
180 // Use a for loop, as we care about the index to
181 // position the various text items
182
183 let len = List.length lst
184 for index = 0 to len - 1 do
185 let txt = new DBText()
186 let (name:string,hl:HyperLink) = List.nth lst index
187 txt.TextString <- name
188 let offset =
189 if index = 0 then
190 0.0
191 else
192 1.0
193
194 // This is where you can adjust:
195 // the initial outdent (x value)
196 // and the line spacing (y value)
197
198 let vec =
199 new Vector3d
200 (1.0 * offset,
201 -0.5 * (Int32.to_float index),
202 0.0)
203 let pt2 = pt + vec
204 txt.Position <- pt2
205 ms.AppendEntity(txt) |> ignore
206 tr.AddNewlyCreatedDBObject(txt,true)
207 txt.Hyperlinks.Add(hl) |> ignore
208
209 // Here's where we do the real work, by firing
210 // off - and coordinating - asynchronous tasks
211 // to create HyperLink objects for all our posts
212
213 let links =
214 Async.Run
215 (Async.Parallel
216 [ for (name,url,feed) in feeds ->
217 hyperlinksAsync (name,url,feed) ])
218
219 // Add the resulting objects to the model-space
220
221 let len = Array.length links
222 for index = 0 to len - 1 do
223
224 // This is where you can adjust:
225 // the column spacing (x value)
226 // the vertical offset from origin (y axis)
227
228 let pt =
229 new Point3d
230 (15.0 * (Int32.to_float index),
231 30.0,
232 0.0)
233 addTextObjects pt (Array.get links index)
234
235 tr.Commit()
You can download the new F# source file from here.
A few comments on the changes:
Lines 57-62 define our new httpAsync() function, which uses GetResponseAsync() - a function exposed in F# 1.9.3.7 - to download the contents of a web-page asynchronously [and which I stole shamelessly from Don Syme, who presented the code last summer at Microsoft's TechEd].
Lines 141-144 define another asynchronous function, hyperlinksAsync(), which calls httpAsync() and then - as before - extracts the feed information and creates a corresponding list of HyperLinks. This is significant: creation of AutoCAD HyperLink objects will be done on parallel; it is the addition of these objects to the drawing database that needs to be performed serially.
Lines 214-217 replace our very simple "map" with something slightly more complex: this code runs a list of tasks in parallel and waits for them all to complete before continuing. What is especially cool about this implementation is the fact that exceptions in individual tasks result in the overall task failing (a good thing, believe it or not :-), and the remaining tasks being terminated gracefully.
Lines 221 and 233 change our code to handle an array, rather than a list (while "map" previously returned a list, Async.Run returns an array).
When run, the code creates exactly the same thing as last time (although there are a few more posts in some of the blogs ;-)
A quick word on timing: I used "F# Interactive" to do a little benchmarking on my system, and even though it's single-core, single-processor, there was a considerable difference between the two implementations. I'll talk more about F# Interactive at some point, but think of it to F# in Visual Studio as the command-line is to LISP in AutoCAD: you can very easily test out fragments of F#, either by entering them directly into the F# Interactive window or highlighting them in Visual Studio's text editor and hitting Alt-Enter.
To enable function timing I entered "#time;;" (without the quotations marks) in the F# Interactive window. I then selected and loaded the supporting functions needed for each test - not including the code that adds the DBText objects with their HyperLinks to the database, as we're only in Visual Studio, not inside AutoCAD - and executed the "let links = ..." assignment in our two implementations of the createHyperlinksFromRss() function (i.e. the RSS command). These functions do create lists of AutoCAD HyperLinks, but that's OK: this is something works even outside AutoCAD, although we wouldn't be able to do anything much with them. Also, the fact we're not including the addition of the entities to the AutoCAD database is not relevant: by then we should have identical data in both versions, which would be added in exactly the same way.
Here are the results:
I executed the code for serial querying and parallel querying twice (to make sure there were no effects from page caching on the measurement):
val links : (string * HyperLink) list list
Real: 00:00:14.636, CPU: 00:00:00.15, GC gen0: 5, gen1: 1, gen2: 0
val links : (string * HyperLink) list array
Real: 00:00:06.245, CPU: 00:00:00.31, GC gen0: 3, gen1: 0, gen2: 0
val links : (string * HyperLink) list list
Real: 00:00:15.45, CPU: 00:00:00.46, GC gen0: 5, gen1: 1, gen2: 0
val links : (string * HyperLink) list array
Real: 00:00:03.832, CPU: 00:00:00.62, GC gen0: 2, gen1: 1, gen2: 0
So the serial execution took 14.5 to 15.5 seconds, while the parallel execution took 3.8 to 6.3 seconds.