Hunt,
I looked through the XML and I don't see any issues, so I'm not sure what's going on.
"You are not the only one.....!"  

I tried the standard screenset (I have to say I found it really clunky), but it did exactly the same thing.
I also tried the onboard parallel port (PP), just in case it was something to do with the 2 parallel port cards I have installed to operate the mill and lathe - same issue.
I then tried something that seems to help but does not cure the problem: I changed LookAhead from 20 down to 1. This makes the feed rate errors go away, but the G32 problem still exists.
To recap, since I think there is a danger of my core problem getting buried:
When I run a simple 'G32, retract, return Z, return X' program, on some occasions Mach initiates the G32 and then jumps straight through to the end of the return X without moving the axes more than a microscopic amount. The DROs, however, move normally: Z moves from its start at 30 to somewhere between 27 and 25, and X sometimes moves from its start at 0 to about 0.3, but not on every occasion.
Points to note are that the G32, retract, return Z and return X are ALL apparently executed, as far as Mach3 is concerned anyway. It seems to me that Mach has a problem with the G32 that causes it to leap through the code until it hits a firm stop (an M1 in this case).
Furthermore, when LookAhead is set to 20, the feed rates for the G0 and G1 moves are not the ones that are set: G0 runs at 400 (I think), and the final G1 that returns X to the start is defined at 40 but actually runs at 33 mm/min. Change the LookAhead parameter to 1 and this problem goes away; perhaps linked, perhaps not.
Searching through other threads, including another recent one you commented on, suggests using the LastErrors file. This (see my earlier upload) generally shows the threading calculation correctly as a 1.5 mm pitch screw needing 350 mm/min on Z. The failing runs show the thread pitch as 40 mm, needing 11000 or so mm/min.
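For synchronized threading, the Z feed should follow feed (mm/min) = pitch (mm/rev) x spindle speed (rev/min), so each LastErrors entry can be turned back into the spindle speed it implies. The Python sketch below does that with the approximate figures quoted above; the helper names are hypothetical and not anything inside Mach3.

```python
# Sketch: back out the spindle speed implied by a LastErrors threading entry.
#   feed (mm/min) = pitch (mm/rev) * spindle speed (rev/min)
# implied_rpm and required_feed are hypothetical helpers for illustration only.

def implied_rpm(pitch_mm: float, feed_mm_per_min: float) -> float:
    """Spindle speed (rev/min) implied by a pitch and a synchronized Z feed."""
    if pitch_mm <= 0:
        raise ValueError("pitch must be positive")
    return feed_mm_per_min / pitch_mm


def required_feed(pitch_mm: float, rpm: float) -> float:
    """Z feed (mm/min) needed to cut the given pitch at the given spindle speed."""
    return pitch_mm * rpm


good_rpm = implied_rpm(1.5, 350.0)    # good run: 1.5 mm pitch at 350 mm/min
bad_rpm = implied_rpm(40.0, 11000.0)  # failing run: 40 mm pitch at ~11000 mm/min

print(f"good run: ~{good_rpm:.0f} rpm")
print(f"failing run: ~{bad_rpm:.0f} rpm")

# At the same spindle speed, the feed scales directly with the pitch.
ratio = required_feed(40.0, good_rpm) / required_feed(1.5, good_rpm)
print(f"feed ratio at equal rpm: {ratio:.1f}x")
```

Both runs come out in the low-to-mid 200s rpm (the 11000 figure is only approximate), which is consistent with the spindle speed being read much the same each time and the pitch value being what changes in the failing runs.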
One other point: I see that a recent version of Mach (066, I think) lists 'incorrect transfer of parameters for threading' as one of its bug fixes. Is it possible that this bug is not actually fixed?
So my own conclusion is that the problem is not a setup issue (as you say, the XML seems OK) but a bug buried in Mach. Maybe it is exacerbated by my setup, but I can't understand how Mach thinks it has executed 4 lines of code successfully when the DROs are not where the program says they should be, and when 2 of those 4 lines take a visible, finite time to execute.
It is as if Mach has raised the pulse rate to many times what it should be...
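To put a rough number on the "pulse rate" idea: the step frequency the parallel-port driver would need is feed (mm/min) / 60 x steps per mm. The sketch below assumes a hypothetical 200 steps/mm (substitute the value from your own Motor Tuning page) and assumes Mach3's default 25 kHz kernel speed as a rough ceiling on step output.

```python
# Sketch: step pulse frequency needed for a given Z feed.
#   pulses/s = feed (mm/min) / 60 * steps_per_mm
# STEPS_PER_MM is a hypothetical placeholder, not this machine's real setting.
# KERNEL_HZ assumes Mach3's default 25 kHz kernel speed as a rough upper limit.

STEPS_PER_MM = 200.0
KERNEL_HZ = 25_000


def step_rate_hz(feed_mm_per_min: float, steps_per_mm: float) -> float:
    """Step pulses per second required to move at the given feed."""
    return feed_mm_per_min / 60.0 * steps_per_mm


for label, feed in [("1.5 mm pitch", 350.0), ("40 mm pitch", 11000.0)]:
    hz = step_rate_hz(feed, STEPS_PER_MM)
    note = "exceeds kernel speed" if hz > KERNEL_HZ else "within kernel speed"
    print(f"{label}: {feed:.0f} mm/min -> {hz:.0f} steps/s ({note})")
```

Under these assumptions the 40 mm pitch case asks for far more step pulses per second than the 1.5 mm case, and more than the assumed kernel speed can deliver; the real figures depend on the actual steps-per-mm and velocity limits in the motor tuning.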
So that you can see the problem, I shot this wobbly video on my phone (so the sound and picture quality are not good): http://youtu.be/wMDoSYNnRcM. You get the general idea, though; the only failure I captured on video was the first run.
Almost final question: is there a diagnostic file I can look through to see what Mach *thinks* it has told the PP to do, and at what frequency, or anything else I can look at to pin this problem down?
And finally, if this is a genuine bug, is there somewhere I can report it as such?