Writing fast multi-threaded Delphi applications

While working out how to pass messages carrying enough information between threads in a fast, real-time application, I was weighing a cyclic message queue built on a pre-allocated ring buffer against a different approach such as the producer-consumer pattern. Searching for a suitable pattern for this kind of construct, I came across this bullet list at Synopse:


How to write fast multi-thread Delphi applications


I already follow these approaches when writing real-time software, but the folks at Synopse have put together a list of advice that I would like to share:


  • Always use const for string or dynamic array parameters, as in MyFunc(const AString: string), to avoid the hidden reference counting (and its implicit try..finally block) on each call.

  • Avoid string concatenation (S := S + 'Blabla' + IntToStr(I)), and rely instead on buffered writing, such as the TStringBuilder available in recent versions of Delphi.

  • TStringBuilder is not perfect either: for instance, it will create a lot of temporary strings for appending some numerical data, and will use the awfully slow SysUtils.IntToStr() function when you add some integer value.

  • Don't overuse critical sections: keep them as small as possible, and rely on atomic operations where you need concurrent access, e.g. InterlockedIncrement / InterlockedExchangeAdd.

  • InterlockedExchange (from SysUtils.pas) is a good way of updating a buffer or a shared object. You create an updated version of some content in your thread, then you exchange a shared pointer to the data (e.g. a TObject instance) in one low-level CPU operation. This makes the change visible to the other threads, with very good multi-thread scaling. You have to take care of data integrity and of releasing the memory of the old item, but it works very well in practice.

  • Don't share data between threads; rather make your own private copy or rely on read-only buffers (the RCU pattern scales best).

  • Don't use indexed access to string characters, but rely on some optimized functions like PosEx() for instance.

  • Don't mix AnsiString/UnicodeString kind of variables/functions, and check the generated asm code via Alt-F2 to track any hidden unwanted conversion (e.g. call UStrFromPCharLen).

  • Prefer var parameters in a procedure over a function returning a string (a function returning a string adds a UStrAsg/LStrAsg call, which contains a LOCK that will flush all CPU cores).

  • If you can, for your data or text parsing, use pointers and some static stack-allocated buffers instead of temporary strings or dynamic arrays.

  • Don't create a TMemoryStream each time you need one. Rely on a private instance in your class, already sized with enough memory, and write your data into it, using Position to find the end of the data instead of changing its Size property (which stays the size of the memory block allocated by the MM).

  • Limit the number of class instances you create: try to reuse the same instance, and if you can, use some record/object pointers on already allocated memory buffers, mapping the data without copying it into temporary memory.

  • Always use test-driven development, with dedicated multi-threaded tests that try to reach the worst-case limit (increase the number of threads and the data content, add some incoherent data, pause at random, try to stress network or disk access, benchmark with timing on real data...).

  • Never trust your instinct: we are humans, not computers. Use accurate timing on real data and processes.
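
Two of the points above, atomic counters in place of critical sections and swapping a shared pointer in one interlocked operation, can be sketched roughly as follows. This is only a minimal sketch, assuming a recent Delphi where TInterlocked from System.SyncObjs is available (I use it here in place of the bare InterlockedIncrement / InterlockedExchange calls named in the list). TConfigSnapshot, PublishConfig, TCounterThread, SharedConfig and HitCount are hypothetical names made up for the illustration.

```pascal
program AtomicSwapSketch;

{$APPTYPE CONSOLE}

uses
  System.Classes, System.SyncObjs;

type
  // Hypothetical snapshot object: built privately, then only read by others.
  TConfigSnapshot = class
  public
    Version: Integer;
    constructor Create(AVersion: Integer);
  end;

  // Hypothetical worker that bumps a shared counter without any lock.
  TCounterThread = class(TThread)
  protected
    procedure Execute; override;
  end;

var
  SharedConfig: TConfigSnapshot; // published pointer, replaced atomically
  HitCount: Integer;             // shared counter, updated atomically

constructor TConfigSnapshot.Create(AVersion: Integer);
begin
  inherited Create;
  Version := AVersion;
end;

procedure TCounterThread.Execute;
var
  I: Integer;
begin
  for I := 1 to 100000 do
    TInterlocked.Increment(HitCount); // atomic, no critical section needed
end;

// Builds the new snapshot privately, then publishes it in one atomic step.
// Returns the previous snapshot; the caller decides when freeing it is safe.
function PublishConfig(AVersion: Integer): TConfigSnapshot;
var
  Fresh: TConfigSnapshot;
begin
  Fresh := TConfigSnapshot.Create(AVersion);
  Result := TInterlocked.Exchange<TConfigSnapshot>(SharedConfig, Fresh);
end;

var
  Workers: array[0..3] of TCounterThread;
  Old: TConfigSnapshot;
  I: Integer;
begin
  for I := Low(Workers) to High(Workers) do
    Workers[I] := TCounterThread.Create(False);
  for I := Low(Workers) to High(Workers) do
  begin
    Workers[I].WaitFor;
    Workers[I].Free;
  end;
  Writeln('HitCount = ', HitCount);

  SharedConfig := TConfigSnapshot.Create(1);
  Old := PublishConfig(2);
  // No other thread is reading Old in this demo, so it is freed right away.
  // With live readers the release must be deferred until they are done.
  Old.Free;
  Writeln('Current version = ', SharedConfig.Version);
  SharedConfig.Free;
end.
```

The hard part the list warns about, releasing the old item, is deliberately sidestepped here: the sketch frees the old snapshot immediately only because nothing else can still be holding it.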


For the problem I had to solve, I went for a lock-free message queue construct. It's not on the list, but you can find approaches to lock-free queues for different kinds of purposes on the Internet.
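
To give an idea of what such a construct can look like, here is a minimal sketch of a single-producer / single-consumer ring buffer over pre-allocated memory. It is not the queue I ended up with, just one common shape, and it assumes exactly one producer thread and one consumer thread plus a recent Delphi with TInterlocked (System.SyncObjs) and TThread.Yield. TMessage, TSpscQueue, TProducer, QueueSize and MessageCount are hypothetical names for the illustration.

```pascal
program SpscQueueSketch;

{$APPTYPE CONSOLE}

uses
  System.Classes, System.SyncObjs;

const
  QueueSize = 1024;      // one slot stays unused to tell full from empty
  MessageCount = 100000; // number of messages sent in the demo

type
  // Hypothetical fixed-size message; a real one carries whatever is needed.
  TMessage = record
    Id: Integer;
    Value: Double;
  end;

  // Ring buffer over pre-allocated storage. Only the producer writes FTail,
  // only the consumer writes FHead, so no lock is required.
  TSpscQueue = class
  private
    FItems: array[0..QueueSize - 1] of TMessage;
    FHead: Integer; // next slot to read
    FTail: Integer; // next slot to write
  public
    function TryPush(const AMsg: TMessage): Boolean;
    function TryPop(out AMsg: TMessage): Boolean;
  end;

  TProducer = class(TThread)
  protected
    procedure Execute; override;
  end;

var
  Queue: TSpscQueue;

function TSpscQueue.TryPush(const AMsg: TMessage): Boolean;
var
  Tail, Next: Integer;
begin
  Tail := FTail; // owned by the producer, a plain read is enough
  Next := (Tail + 1) mod QueueSize;
  // CompareExchange with equal values leaves FHead unchanged and serves
  // as an atomic read with a full barrier.
  if Next = TInterlocked.CompareExchange(FHead, 0, 0) then
    Exit(False); // queue is full
  FItems[Tail] := AMsg;
  TInterlocked.Exchange(FTail, Next); // publish only after the payload is stored
  Result := True;
end;

function TSpscQueue.TryPop(out AMsg: TMessage): Boolean;
var
  Head: Integer;
begin
  Head := FHead; // owned by the consumer
  if Head = TInterlocked.CompareExchange(FTail, 0, 0) then
    Exit(False); // queue is empty
  AMsg := FItems[Head];
  TInterlocked.Exchange(FHead, (Head + 1) mod QueueSize); // release the slot
  Result := True;
end;

procedure TProducer.Execute;
var
  I: Integer;
  Msg: TMessage;
begin
  for I := 1 to MessageCount do
  begin
    Msg.Id := I;
    Msg.Value := I * 0.5;
    while not Queue.TryPush(Msg) do
      TThread.Yield; // queue full: give the consumer a chance to run
  end;
end;

var
  Producer: TProducer;
  Msg: TMessage;
  Received: Integer;
  InOrder: Boolean;
begin
  Queue := TSpscQueue.Create;
  Producer := TProducer.Create(False);
  Received := 0;
  InOrder := True;
  while Received < MessageCount do
    if Queue.TryPop(Msg) then
    begin
      Inc(Received);
      InOrder := InOrder and (Msg.Id = Received);
    end
    else
      TThread.Yield; // queue empty: let the producer run
  Producer.WaitFor;
  Producer.Free;
  Queue.Free;
  Writeln('Received ', Received, ' messages, in order: ', InOrder);
end.
```

The interlocked calls are heavier than strictly necessary for this case; I chose them in the sketch because they make the ordering between writing the payload and publishing the index explicit. Queues with several producers or consumers need a different design.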


Best regards,

Normann

