Dahu: online data analysis server

Dahu is a JSON-RPC server, operated over Tango, that executes Python code remotely for low-latency online data analysis.

The dahu server executes jobs:

  • Each job lives in its own thread (yes, a thread, not a process: it is up to the plugin's developer to ensure the work performed is GIL-friendly).

  • Each job executes one plugin, provided by the plugin's developer (i.e. the scientist).

  • The job de/serialises JSON strings coming from/returning to Tango

  • Jobs are executed asynchronously: the request for a calculation is answered immediately with a jobid.

  • The jobid can be used to poll the server for the status of the job or for manual synchronization (mind that Tango can time-out!).

  • When a job finishes, the client is notified of its status via Tango events.

  • Results can be retrieved after the job has finished.
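The job lifecycle above can be sketched in plain Python. This is an illustrative model of the pattern, not dahu's actual API: submitting a job answers immediately with a jobid, the work runs in its own thread, and the caller polls the status or retrieves the result once the job has finished.

```python
import json
import threading
import uuid


class JobServer:
    """Sketch of the asynchronous job pattern (hypothetical names)."""

    def __init__(self):
        self._jobs = {}            # jobid -> {"status", "result", "thread"}
        self._lock = threading.Lock()

    def start_job(self, plugin, payload):
        """Deserialise the JSON payload, launch the plugin in its own
        thread, and answer immediately with a jobid."""
        jobid = str(uuid.uuid4())
        data = json.loads(payload)
        t = threading.Thread(target=self._run, args=(jobid, plugin, data))
        with self._lock:
            self._jobs[jobid] = {"status": "running", "result": None,
                                 "thread": t}
        t.start()
        return jobid

    def _run(self, jobid, plugin, data):
        try:
            result = plugin(data)
            with self._lock:
                self._jobs[jobid].update(status="success",
                                         result=json.dumps(result))
        except Exception as exc:
            with self._lock:
                self._jobs[jobid].update(status="failure", result=str(exc))

    def get_status(self, jobid):
        """Poll the server for the status of a job."""
        with self._lock:
            return self._jobs[jobid]["status"]

    def get_result(self, jobid):
        """Block until the job has finished, then return its JSON result."""
        self._jobs[jobid]["thread"].join()
        with self._lock:
            return self._jobs[jobid]["result"]
```

A client would submit a job, keep the returned jobid, and come back later for the result, exactly as described for the Tango interface.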

Each job executes a plugin:

  • Plugins are written in Python (extensions in Cython or OpenCL are common)

  • Plugins can be classes or simple functions

  • The input and output MUST be JSON-serialisable, simple dictionaries

  • Plugins are dynamically loaded from Python modules

  • Plugins can be profiled for performance analysis
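A function-style plugin can be as small as the sketch below. The function name and keys are hypothetical and the real dahu plugin API may differ; the point is the contract stated above: input and output are plain dictionaries that round-trip through JSON without loss.

```python
import json


def integrate_1d(inputs):
    """Hypothetical plugin: compute the sum and mean of a list of
    intensities. Takes a dict, returns a dict, both JSON-serialisable."""
    data = inputs["intensities"]
    return {"sum": sum(data), "mean": sum(data) / len(data)}


# Both sides of the call must survive JSON de/serialisation,
# as the job layer exchanges JSON strings with Tango:
payload = json.dumps({"intensities": [1.0, 2.0, 3.0]})
result = integrate_1d(json.loads(payload))
```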

Offline processing

All jobs can be run offline using the dahu-reprocess command-line tool. This tool is not multithreaded and runs plugins directly; it is intended for:

  • offline developments

  • re-processing failed online processing (where performance is less critical).

Dahu is light!

Dahu is a small project started at ESRF in 2013, with fewer than 1000 lines of code. It has been used in production on a couple of beamlines since then. With its FIFO scheduler, dahu is very fast (about 1 µs of overhead locally, 0.3 ms through Tango).