Minutes of the DNA meeting held in Cambridge on 19/09/01
Present: Darren Spruce, Ed Mitchell, Charles Ballard, Dave Love, Graeme Winter, Liz Duke, Steve Kinder, Olof Svensson, Harry Powell, Andrew Leslie.
ACTIONS SUMMARY
ACTION: Graeme Winter
ACTION: Graeme Winter
ACTION: All
ACTION: Andrew Leslie, Harry Powell, Graeme Winter
ACTION: Ed Mitchell, Olof Svensson, Darren Spruce
Meeting Minutes
Some preliminary discussions concerning security issues took place prior to everyone arriving at the meeting.
The ESRF are considering using SSH for security and authorisation.
Harry would like to have a local system working behind the firewall (this system would start off by being a single computer to which more would be added) and then extend the system to cover systems outside the firewall after the set-up inside the firewall has proved to be reliable. Clearly the ability to upgrade things easily must be built in from the start.
Graeme Winter outlined how the mosflm server currently works by giving a standard port to which the GUI can connect. The server, which is running as root, then provides an extra thread belonging to the user to which the GUI reconnects. All authentication is therefore performed before the thread is created. Alternative ways, such as the user starting the server, appear to be more complex. Queries were posed regarding the server running on the beamlines. Would there be one server per beamline or per site? Graeme reported that one server per machine would be required.
A lot of systems gain user information from the /etc/passwd file. However this is not true of all systems. This would need to be investigated – it was felt that Windows NT was a potential problem in this area.
Additionally it was pointed out that the Computing Service (CS) at the ESRF would not like to have a daemon running with root privileges. If this route for running the mosflm server is to be pursued, a strong case will have to be made to the CS. It was pointed out that the purpose of the server running as root is to authenticate users and spin off threads owned by the user (i.e. to change the UID). However the fact that it is there and running as root might be sufficient to cause problems. The SRS requirements at DL are slightly less stringent but this issue still provides cause for concern.
The ESRF are bidding for a cluster of Linux boxes to help reduce the bottleneck produced especially by the new ADSC 210 detector, which is capable of producing 300 Gbytes of data per day. One plan is to distribute processing of the data (the main source of the bottleneck) via a script that would determine which machines to use for processing. It would also be possible to put a script around mosflm so that it could be run in a similar way.
ACTION: Ed Mitchell, Olof Svensson, and Darren Spruce
The basic flowchart is shown above. In the first instance sockets were used to communicate between the GUI and the translation server, using an XML-like syntax. The communication will make use of Conformal XML, passed over sockets using HTTP/1.1 as a communication protocol.
In reality a more general protocol is required:
The server is constantly listening, via a listening port (O), for a GUI trying to connect to it. When a GUI does connect, a thread is started and the GUI is reconnected to the server. It is possible for a second GUI to connect here as well, providing the possibility of remote monitoring, e.g. to assist in solving problems. However, the second GUI could only be started by the active client and would have NO control. The GUI and the server need not be on the same computer. When processing is finished the GUI is disconnected, mosflm is killed and the thread is tidied up via another thread for garbage collection. The expert system would connect into the system as indicated above, in the same manner as the GUI.
In the event of disconnection, the translation thread should finish quietly without disabling the system as a whole. If one thread actually dies, the whole server will die, as it is really only one program with many threads, rather than many processes.
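The listen-then-spawn behaviour described above can be sketched roughly as follows. This is only an illustration in Python using the standard `socketserver` module, not the mosflm server itself (which is C-based); the names `SessionHandler`, `SessionServer` and `start_server`, and the "ok ..." reply format, are assumptions made for the example. Each connection gets its own thread, and a dropped connection ends that thread quietly while the listener carries on.

```python
import socket
import socketserver
import threading


class SessionHandler(socketserver.StreamRequestHandler):
    """Runs in its own thread for each connected client (hypothetical name)."""

    def handle(self):
        try:
            # Iteration stops when the client closes the connection.
            for raw in self.rfile:
                command = raw.decode("utf-8", "replace").strip()
                if command == "quit":
                    break
                # Hypothetical reply format: acknowledge the command by echoing it.
                self.wfile.write(f"ok {command}\n".encode("utf-8"))
        except OSError:
            # A broken connection ends this session only; the listener keeps running.
            pass


class SessionServer(socketserver.ThreadingTCPServer):
    allow_reuse_address = True
    daemon_threads = True  # session threads do not block interpreter exit


def start_server(host="127.0.0.1", port=0):
    """Start listening in a background thread; port 0 lets the OS pick a free port."""
    server = SessionServer((host, port), SessionHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

A client would connect to `server.server_address`, send newline-terminated commands and read one reply line per command. Authentication and the change of UID discussed earlier are deliberately left out of this sketch.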
In the event of a second connection being required: The server would receive a request from the controlling GUI to open a second port on a session. The number of the second port would be returned allowing a dummy session to be connected. There would be no additional security issues as the session would be simply echoing the main session. It would be possible to switch control from the first session to the second session but this would have to be a conscious user choice.
In reality wherever one socket is used, two are in fact used – one for data and the other for control. As it is possible to envisage the situation where information goes in both directions everything is structured in XML to make things easier. The data port is to allow the more bulky information (eg images) to pass without the need for parsing.
The current version of the server can index multiple images and return the cell and relevant matrices.
In the event of a session dying, a file would be written containing information on the current status of processing so that the information can be retrieved and reloaded. It would also be possible to keep a log file of the commands issued.
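A status file of this kind might look something like the sketch below. The file layout (a `session` element holding `status` items and a `commands` log) and the function names are assumptions for illustration; the minutes do not define a format. The file is written to a temporary name and then renamed, so a crash during the write should not leave a half-written checkpoint behind.

```python
import os
import tempfile
import xml.etree.ElementTree as ET


def save_checkpoint(path, status, commands):
    """Write processing status and the command log to an XML file (hypothetical layout)."""
    root = ET.Element("session")
    state = ET.SubElement(root, "status")
    for key, value in status.items():
        ET.SubElement(state, "item", name=key).text = str(value)
    log = ET.SubElement(root, "commands")
    for command in commands:
        ET.SubElement(log, "command").text = command
    # Write to a temporary file in the same directory, then rename over the target.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "wb") as handle:
        ET.ElementTree(root).write(handle, encoding="utf-8", xml_declaration=True)
    os.replace(tmp_path, path)


def load_checkpoint(path):
    """Return (status dict, command list); all values come back as strings."""
    root = ET.parse(path).getroot()
    status = {item.get("name"): item.text or "" for item in root.find("status")}
    commands = [command.text or "" for command in root.find("commands")]
    return status, commands
```

Replaying the returned command list against a fresh session would be one way to reload the state, although whether mosflm commands can be replayed safely is not something the minutes settle.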
After a discussion on distribution of the source of a program which is constantly being updated it was decided that Graeme would issue Version 1.0 of the server by 26th September 01.
ACTION: Graeme Winter
Background indexing (ie non-interactive) has been completed and appears to be working robustly and reliably. Currently the program indexes the image(s) and chooses the solution. The solution chosen is the one that mosflm would currently select. The work was then extended to include non-interactive post-refinement in a robust manner. The next stage is to include a non-interactive, robust, version of integration so that it will be possible to index, post-refine and integrate reasonable data. Socket communication has not been used for post refinement or integration as there is not yet suitable output from these processes but sockets have been used for the output of the results of indexing. This was demonstrated by Graeme Winter.
The speed of mosflm was queried. As both detector areas and data rates are always on the increase, any means to increase the speed of mosflm would be welcomed. Diederichs produced a paper which has enabled (will enable?) a limited amount of parallelisation within the program to be implemented fairly easily. Reflections could be integrated in parallel to help increase the throughput. However it would not be directly scalable: going to a 4 processor machine is likely to give only a factor of 2 increase in speed. It was estimated that it would be a 3-6 month project to investigate speeding up mosflm. Andrew Leslie commented that speeding up the algorithms would not be trivial. Some improvements, though, could be gained from compiler optimisation. Another bottleneck was felt to be the reading and writing of scratch files. It was pointed out that d*trek has apparently been parallelised, and it was wondered whether some of the techniques employed there could be used here.
However it was felt that any increase in speed would be welcomed – even a factor of 2 would be extremely helpful.
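One way to read the "factor of 2 on 4 processors" estimate is through Amdahl's law. That model is an assumption introduced here and was not cited at the meeting. Under it, a speed-up of 2 on 4 processors corresponds to roughly two thirds of the run time being parallelisable, which would cap the speed-up at about 3 however many processors are added. The function names below are illustrative.

```python
def amdahl_speedup(parallel_fraction, processors):
    """Speed-up predicted by Amdahl's law for a given parallelisable fraction."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)


def parallel_fraction_for(speedup, processors):
    """Invert Amdahl's law: the parallel fraction implied by an observed speed-up."""
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / processors)


# Estimate quoted in the minutes: a factor of 2 on a 4 processor machine.
implied_fraction = parallel_fraction_for(2.0, 4)    # about 0.667
eight_way = amdahl_speedup(implied_fraction, 8)     # about 2.4
upper_limit = 1.0 / (1.0 - implied_fraction)        # about 3, for unlimited processors
```

If this reading is right, the serial parts (for example the scratch file I/O mentioned above) would matter more than adding processors beyond a handful.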
In the above scenario no authentication takes place. A command can be sent to collect data – effectively a "start" by a user and a status report will be sent back when the data collection has finished.
However various issues came to light in doing this:
In the scheme illustrated here mosflm would sit further to the right, beyond the client which is the equivalent of the expert system.
The aim is to put a button onto the beamline GUI for the command "Characterise crystal". Clicking on this button would cause an image(s) to be collected, indexed and a data collection strategy and an effective resolution (from I/sigma) be returned. Also the ability to display the results is required – via a log window and output to a standard log file. The collection of non-contiguous wedges of data would be dealt with by issuing a set of collect commands.
The aim would be to put the first stage of this on a beamline by Christmas/January at both the ESRF and DL. This would be an implementation allowing a crystal to be characterised by a user with minimal intervention: images taken, crystal indexed with mosflm and a list of solutions displayed, of which the recommended solution will be written out.
ACTION: All
Current status:
To be developed:
The current situation works fine as long as everything goes OK. However no definitions have been created to deal with errors. For example, what happens if a command is incorrectly spelt? How is it known whether a whole solution has been received, or even that the end of the solution has been reached? In addition the XML being used was not valid XML. Graeme Winter pointed out that he had had problems with XML, so he had written his own wrapper for it to get around the problem that the XML parser was broken.
A problem also exists in that we are dealing here with 2-way traffic: commands are being passed and replies received. XML, as it is a document, is perhaps not best suited to this role. It was suggested to add "magic lines", thus creating a package, and to parse the information line by line.
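The "magic lines" idea could be sketched as below. The marker strings `##DNA-BEGIN` and `##DNA-END` and the function names are invented for the example; no marker text was agreed at the meeting. The reader works line by line and raises an error if the stream ends before the closing marker, which addresses the question of knowing whether a whole solution has been received.

```python
import io

BEGIN = "##DNA-BEGIN"  # hypothetical marker line opening a package
END = "##DNA-END"      # hypothetical marker line closing a package


def write_package(stream, body):
    """Wrap a message body between the two marker lines."""
    stream.write(BEGIN + "\n")
    for line in body.splitlines():
        stream.write(line + "\n")
    stream.write(END + "\n")


def read_packages(stream):
    """Yield the body of each complete package, reading line by line.

    Raises ValueError if a marker is misplaced or the stream ends mid-package.
    """
    body = None  # None means "not currently inside a package"
    for raw in stream:
        line = raw.rstrip("\r\n")
        if line == BEGIN:
            if body is not None:
                raise ValueError("begin marker inside an open package")
            body = []
        elif line == END:
            if body is None:
                raise ValueError("end marker without a begin marker")
            yield "\n".join(body)
            body = None
        elif body is not None:
            body.append(line)
        # Lines outside any package are ignored.
    if body is not None:
        raise ValueError("stream ended before the end marker")
```

A body line that happens to equal one of the markers would break this scheme, so a real design would need escaping or a length prefix.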
Dave Love stated that he had written a document type definition (dtd) and that any XML should align with this dtd. The dtd can be viewed at http://www.dna.ac.uk/xml/dna_dtd.html
It was suggested that using a protocol would provide a solution to these problems. It would be possible to use a protocol such as HTTP to pass XML. HTTP is a generic stateless protocol that can be used for tasks other than hypertext. Two versions are widely in use: HTTP/1.0 and HTTP/1.1. It was suggested to use HTTP/1.1 rather than HTTP/1.0, as it is more rigorously defined and allows persistent connections. The plan would be to implement HTTP in the mosflm server, and any replies sent out would be in XML. The advantage here is that there are unambiguous definitions of common errors (as opposed to scientific errors such as an inability to index an image).
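A rough sketch of the idea, using Python's standard `http.server` and `http.client` modules rather than the C code of the mosflm server: commands are POSTed as XML, a malformed request gets the standard HTTP 400 status, and a good one gets an XML reply. The `reply` element, the handler and function names, and the use of POST to the path `/` are all assumptions for illustration.

```python
import http.client
import threading
import xml.etree.ElementTree as ET
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class CommandHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # allows persistent connections

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        try:
            request = ET.fromstring(body)
        except ET.ParseError:
            # Protocol-level error: reported with a standard HTTP status code.
            self._reply(400, "<error>malformed XML</error>")
            return
        # Hypothetical reply: name the command (the root tag) that was received.
        reply = ET.Element("reply", command=request.tag)
        self._reply(200, ET.tostring(reply, encoding="unicode"))

    def _reply(self, status, text):
        payload = text.encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "text/xml; charset=utf-8")
        # Content-Length is needed so the connection can stay open.
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def start_http_server(host="127.0.0.1", port=0):
    """Serve in a background thread; port 0 lets the OS pick a free port."""
    server = ThreadingHTTPServer((host, port), CommandHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def send_command(connection, xml_text):
    """POST one XML command over an existing connection; return (status, reply text)."""
    connection.request("POST", "/", body=xml_text.encode("utf-8"),
                       headers={"Content-Type": "text/xml"})
    response = connection.getresponse()
    return response.status, response.read().decode("utf-8")
```

Several calls to `send_command` can reuse one `http.client.HTTPConnection`, which is the persistent-connection behaviour mentioned above. Scientific failures, such as an image that cannot be indexed, would presumably be reported inside a 200 reply rather than as an HTTP error.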
Another alternative to HTTP is XML-RPC, which is based on http as well. It's a spec and a set of implementations that allow software running on disparate operating systems, running in different environments to make procedure calls over the Internet.
It's remote procedure calling using HTTP as the transport and XML as the encoding. XML-RPC is designed to be as simple as possible, while allowing complex data structures to be transmitted, processed and returned.
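For comparison, a minimal XML-RPC exchange using Python's standard `xmlrpc` modules might look like this. The procedure `characterise_crystal` and its return value are placeholders invented for the example; it performs no indexing and is not an agreed interface.

```python
import threading
import xmlrpc.client
from xmlrpc.server import SimpleXMLRPCServer


def characterise_crystal(image_numbers):
    """Hypothetical remote procedure; returns a placeholder structure only."""
    return {"images": image_numbers, "status": "not implemented"}


def start_rpc_server(host="127.0.0.1", port=0):
    """Expose the procedure over XML-RPC in a background thread."""
    server = SimpleXMLRPCServer((host, port), logRequests=False)
    server.register_function(characterise_crystal)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def call_characterise(host, port, image_numbers):
    """Client side: the call is encoded as XML and carried over HTTP."""
    proxy = xmlrpc.client.ServerProxy(f"http://{host}:{port}/")
    return proxy.characterise_crystal(image_numbers)
```

The lists and dictionaries passed here are converted to and from XML by the library, which is the "complex data structures" point made above.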
Additionally it would be possible to use CORBA. However that was felt to be overkill. It was pointed out that the final product must work on a number of sites around the world which could result in problems if CORBA were to be used.
The issue of how HTTP (or an alternative) could be incorporated into a C-based mosflm server was discussed. Issues covered were:
A query was raised as to whether the use of HTTP turned the mosflm server into a web server. This was felt not to be the case as the mosflm server does not fulfil all the requirements of a web server. It would be possible to use a web server but not all the functionality of a web server is required.
Is HTTP the best protocol to use? If XML is used everything needs to be thought of in terms of documents as opposed to just streams of information. This can cause parsing problems especially associated with the python data structure which wants everything quantised.
Extensive discussions took place on the merits of using HTTP over other protocols. It was decided that HTTP was probably the best protocol to use and Graeme Winter would go away and investigate how much work would be required to implement it into the existing version of the mosflm server. Dave Love said that he would send Graeme Winter details of HTTP libraries (libwww).
ACTION: Graeme Winter
ACTION: Andrew Leslie, Harry Powell, Graeme Winter
The next meeting will take place in November to report progress achieved in including the "characterise crystal" button in the beamline data collection software. Progress with the mosflm server will also be reported. The suggested date of this meeting is Wednesday 7th November at a location to be decided. Alternative dates for this meeting can be suggested.
The next full meeting to report how the release of the "characterise crystal" button went and how it was received by the users will take place at Daresbury Laboratory on Friday 1st February. Again alternative dates (and locations) will be considered.