I have been spending quite a bit of time on a (C#) simulator for distributed systems interaction. The simulator is meant to support investigations into the impact of node and network failures on protocols. I had been banging my head against the wall on some of the visualization support I wanted to build for use with the simulation. This morning Julia kindly helped me clear the hurdle I was stuck behind.
The work was triggered by a suggestion from Jim Gray to collaborate on the simulation of two-phase commit (2PC) and the new Paxos commit protocol that Jim wrote together with Leslie Lamport. Paxos commit is a transaction protocol based on the well-known Paxos consensus protocol [1], intended to overcome the blocking failure scenarios 2PC is known for; Pat Helland used to call 2PC 'The Unavailability Protocol'. I am not necessarily convinced Paxos is the way to go about building fault-tolerance support for transactions, but I am certainly willing to help drive a stake through the heart of two-phase commit.
The current package (all in C#) consists of:
Here is a Windows Media movie with the simulation explorer in action (280K).
Below are some screenshots from the demo app; click on the thumbnails to see the full images. The simulation explorer holds a datagrid with the transactions and the related events. It has scpl graphs for the commit and abort times and a time progress plot. The timeline control shows the visualization of the current transaction. The timeline for each process changes with its state. A gray bar above the timeline indicates a store operation. If the timeline disappears, it means the node has crashed. Text near the timeline indicates the firing of a particular timer; I still need to find icons to go with these events. You can filter on aborted transactions, message loss, and transactions with node failures.
This is the basic view, with the beginning of a normal 2PC transaction in the timeline.
This is the same transaction scrolled to the end of the timeline,
where the commit happens. (The gray bars are store operations).
Here the transactions are filtered to show only the aborted transactions
and the timeline displays the end of a transaction that is aborted
because the second resource manager failed.
[1] The Paxos consensus protocol is described in the paper "The Part-Time Parliament" by Leslie Lamport.
Posted by Werner Vogels at March 25, 2004 02:27 PM