Sunday, 26 January 2014

DWS Mandelbrot Explorer Mark II, and random notes about FireMonkey and threads

The DWS Mandelbrot Explorer, which renders tiles generated by Eric Grange's tile server, has been updated.

The following is a braindump of information about the app, about FireMonkey, about threading, and about how they all interrelate. I think some points will be interesting to you.

Miscellaneous things I've learned

  • It turns out that you cannot reliably use two FireMonkey Direct2D TBitmaps in two different threads at the same time. The code had a background thread to download the fractal data and create a tile bitmap, which was then passed to the main thread to draw. (Only one thread accessed each TBitmap at a time, but several threads used independent TBitmaps concurrently.) Sometimes this results in silent failures to update the texture object backing the TBitmap data. The DirectX methods CopyResource and DrawBitmap, and possibly others, can fail.
    I spent a long time investigating this, seeing what I might be able to do to hack enough thread-awareness into the library that using two theoretically-independent bitmaps at the same time would work. The code is strongly designed around shared objects, used for all operations. I've looked into:
    • Direct3D factories and using the ID2D1Multithread interface to synchronize (only Windows 7 and up though);
    • the (different) ID3D10Multithread interface for synchronization, which works brilliantly right up until it deadlocks;
    • surface sharing between APIs;
    • DXGI shared surfaces;
    • per-thread instances of render targets and textures;
    • hand-rolled synchronization around specific areas; 
    • ...you name it. Several things have been almost successful but nothing is reliable, and the more I read, especially when there are caveats about certain functionality not working on Vista but only on 7, or only on hardware and not on WARP, the more I understand why it makes sense for the FireMonkey Direct2D canvas implementation not to have even tried to implement it.
      Because of this, there's a lot more processing in the app's main thread, generating the bitmaps as well as just drawing everything onscreen. This is not ideal. It seems FMX apps will unfortunately have to stay away from second-thread graphics processing, but this is due to the underlying graphics libraries, specifically here Direct2D. I.e., it's not really FireMonkey's fault. If you really need to, you can use a specific graphics library in your other threads; just don't use lots of plain TCanvases and TBitmaps and expect it to work - keep them in one thread. Graphics32 and VPR might be worth investigating.
  • Also, hacking thread support into a non-threaded library you're not overly familiar with is a hard task. Just saying. Fun though.
  • There seems to be no cross-platform Delphi RTL equivalent of WaitForMultipleObjects. The TEvent class wraps an event, but what happens when you want to wait for two of them? This would make a good open source class, I think (a list of events, with wait methods?) but right now this app's code re-uses one event for several things making it a bit complicated.
  • If you want to interrupt a TIdHTTP download (via the blocking call TIdHTTP.Get), call TIdHTTP.Disconnect. The object seems to be reusable afterwards for a subsequent Get call. I use Disconnect to terminate a download quickly when I want to stop the background thread objects and have to wait on them, so need them to stop quickly.
  • The Quartz canvas on OSX is noticeably slower than Direct2D.  I think, without measuring, it might even be slower than GDI+. I am curious why this is and if anyone else has seen the same thing in their FireMonkey apps.
  • TImageControl on OSX does not render correctly when you write to its Bitmap. I traced through the code and am not sure what it's doing - the code looks valid - but I can state that after drawing onto the Bitmap what was displayed onscreen, a blank solid color, was wrong. Since I just needed a canvas to draw on and a control to give mouse events, I ended up just using a TPanel.
  • A FireMonkey TForm is missing some seemingly obvious events: there is no OnClick or OnDblClick.  I shouldn't need a client-aligned panel to give mouse click events, but I did.
  • The look of FireMonkey on Windows has improved greatly since XE2, and is quite close to how native Windows controls / the VCL looks. The following image has exactly the same controls placed in the same position with the same dimensions on an XE2 FMX form, an XE4 FMX form and a VCL form.  The XE2 one doesn't look very native; the XE4 one is quite similar to the VCL.

    In some cases I think FireMonkey is better. Look how the TTrackBar's edges align nicely in FireMonkey, for example, but don't in the genuine native control - something that bugs me every time I use the real one.
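To illustrate the TIdHTTP point in the list above, here is a minimal sketch of a download thread that can be stopped quickly. The class name TTileDownloadThread and its members are made up for this example and are not the app's actual code. It assumes, as observed above, that calling Disconnect from another thread makes the blocking Get raise an Indy exception, which the thread then swallows.

```pascal
unit DownloadThreadSketch; // hypothetical unit name

interface

uses
  System.Classes, System.SysUtils, IdHTTP, IdException;

type
  // Hypothetical worker thread, for illustration only.
  TTileDownloadThread = class(TThread)
  private
    FHTTP: TIdHTTP;
    FURL: string;
  protected
    procedure Execute; override;
  public
    constructor Create(const AURL: string);
    destructor Destroy; override;
    procedure Cancel; // intended to be called from another thread
  end;

implementation

constructor TTileDownloadThread.Create(const AURL: string);
begin
  FHTTP := TIdHTTP.Create(nil);
  FURL := AURL;
  inherited Create(False); // thread starts once construction completes
end;

destructor TTileDownloadThread.Destroy;
begin
  Cancel;    // interrupt any download still in progress
  inherited; // TThread.Destroy waits for Execute to return
  FHTTP.Free;
end;

procedure TTileDownloadThread.Cancel;
begin
  Terminate;        // sets the Terminated flag
  FHTTP.Disconnect; // assumption: makes a blocked Get raise and return
end;

procedure TTileDownloadThread.Execute;
var
  Data: TMemoryStream;
begin
  Data := TMemoryStream.Create;
  try
    try
      FHTTP.Get(FURL, Data); // blocks until done or disconnected
      // If not Terminated, hand Data over to whoever needs it here.
    except
      on E: EIdException do
        ; // expected when Cancel disconnects mid-download
    end;
  finally
    Data.Free;
  end;
end;

end.
```

With something like this, freeing the thread object from the main thread should return promptly rather than waiting for a slow download to finish.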

Notes about the app itself

FireMonkey in practice

  • The point of the app was to write a non-trivial FireMonkey app and see, in practice, what issues arose. I can confidently state I have learned a lot about FireMonkey building this app. That was the goal.
    Has there been anything particularly bad? I don't think so. There were three areas:
    • Bugs: none serious. Graphics performance is easily fixable.
    • Cross-platform: some code is slightly less clear than it could be, because I've stuck to using the cross-platform RTL and FMX only.  (For example, had I been able to drop back to using WaitForMultipleObjects(Ex) some code might have been clearer.)
    • Framework: FireMonkey is different to the VCL, and I limited myself to sticking to it and the Delphi RTL / libraries only rather than using, say, Windows API code. This is no different to learning and using any other framework: the same problems occur learning the .Net libraries, Cocoa, etc.
  • On the whole, FireMonkey is a good framework the details of which you have to learn, like any other. It's backed up by the Delphi RTL which is fairly comprehensive. Put it this way, if you can write a VCL app using Delphi, you can write a FireMonkey one.

Other

  • Finally, thanks to François Piette, who submitted some changes to port from XE2 to XE5. I integrated his changes which improved the code - especially bitmap generation from the downloaded data - greatly. Thanks François!
  • Download version 1.1 or find the source code here. Windows 32, 64 and OSX 32.

Next up will be something completely different.

Saturday, 25 January 2014

A unit to enable Direct2D in FireMonkey where possible

A few days ago I posted about FireMonkey's choice of canvas classes, where it would choose to render via GDI+ instead of via Direct2D.  There were two fixes: one (untested and possibly dangerous) enabled hardware rendering on DirectX9-class hardware, but required editing the FireMonkey source; the second (known safe and tested) enables optimised software rendering via WARP when DirectX10 hardware support is not available.  This last isn't as good when you have DX9 hardware, but it doesn't require editing any FireMonkey source and WARP renders surprisingly fast - it's certainly good enough for 2D/HD applications.

This code is now available as a unit you can include in your FireMonkey apps.  It is ifdef-ed so it will only function on Windows when compiled with XE4 and XE5. (Thanks to commenter Skamradt in the original post for suggesting this.) That means you can include and use it when compiling for OSX or Android without having to worry about it not compiling on those platforms, and that it will also only apply for known IDE / RTL versions that require this patch.  I have only briefly tested it on XE4 (where it works) and XE2 (where of course it doesn't, but compiles anyway.)  Suggestions / changes are welcome.

To use it, add the unit to your .dpr file and then add a call to TryUseDirect2D before Application.Initialize, like so:
program Project1;

uses
  FMX.Forms,
  Unit1 in 'Unit1.pas' {Form1},
  FMXDirect2DFix in 'FMXDirect2DFix\FMXDirect2DFix.pas';

{$R *.res}

begin
  FMXDirect2DFix.TryUseDirect2D; // <-- The key method

  Application.Initialize;
  Application.CreateForm(TForm1, Form1);
  Application.Run;
end.

The code is currently checked in to the source of my DWS MandelbrotExplorer app - the rest of the code of which is in a halfway state, so no point looking at it right now :)

  • You can find the unit here.
  • It's MPL licensed, so useable in both commercial and open-source software.
  • It makes a big speed difference for FireMonkey apps on my Windows 7, non-DirectX-10-hardware.  It should make a noticeable difference for anyone on a recently patched (with the Platform Update) Vista or 7 without DirectX10 hardware, which includes those running Windows in a virtual machine like Fusion.

Monday, 20 January 2014

FireMonkey canvas classes and a bugfix to speed up your apps

Everything you need to know about FireMonkey canvases - and a performance boost bugfix for some people as well!

I recently posted my first real-world FireMonkey app, which gave a zoomable, scrollable, very interactive view of the Mandelbrot fractal using the precomputed DWS Mandelbrot tiles. It worked fine on my computer.

Those are famous last words.

Soon the comments on that page were filled with people saying it didn't work: the UI said tiles were downloading etc, but it drew only a blank solid colour where the fractal should have been. I made an educated guess that the problem only happened when using the Direct2D canvas, and put out a "fix" that restricted it to drawing using GDI+. This fix worked - it draws - but GDI+ is slow, and the app as it's currently available is not of a quality I feel personally comfortable having publicly available with my name attached. Clearly I need to fix it. But how?

This is a perfect example of why being aware of the different canvases in FireMonkey matters. You need to test with each one that your app could possibly end up using on an end-user's machine, which means you need to know what they are, when they're chosen by FMX to be used, and how to force a specific choice in order to test each case. Moreover, there is (IMO) a bug in FireMonkey's logic about which class to choose when, which results in your apps rendering much more slowly than they need to in some use cases, and you may want to tweak some code in order to fix this and make your app render faster.

What's in this article?

  • The role of canvases in FireMonkey rendering
  • Overview of each possible Windows canvas class: GDI+, Direct2D, and GPU
  • How does FireMonkey choose which canvas class to use?
    • Investigating when Direct2D is chosen vs GDI+, and we find a bug
    • Fixing the bug - three possible solutions
  • For testing: how to force the selection of a specific class
    • Checking what class you are using
  • Summary
This is a long article - two and a half thousand words - so let's get going.

The role of canvases in FireMonkey rendering


FireMonkey is a cross-platform UI toolkit. As such it needs to be able to render everything onscreen independent of the underlying graphics framework - it needs one API you and I can code against that runs on Windows and OSX and iOS and Android.

It achieves this by using a variety of different canvas classes.  That is, when you access a TCanvas such as Form.Canvas or TBitmap.Canvas, due to the wonderfulness of polymorphism the actual class you are using can vary widely.  Here are the possibilities:
  • TCanvasGDIPlus (Windows)
  • TCanvasD2D (Windows)
  • TCanvasGpu (Windows)
  • TCanvasQuartz (OSX)
  • TCanvasQuartz (iOS, implementation appears independent of the OSX class with the same name)
  • At least one more for Android in XE5+. 
Let's write off the platforms that only have a single canvas implementation - OSX, iOS, and probably Android. (I don't have XE5 and googling didn't show much about the underlying code.) If you're using one of those platforms, you are by default testing using the only canvas class and this is a non-issue. But that leaves three possible canvas classes that your app could end up using on Windows. (Even if you know about the Direct2D and GDI+ canvases, I bet you didn't know about the 'GPU' canvas. I sure didn't.)

Each Windows canvas class


GDI+


TCanvasGDIPlus is the default, fallback canvas. It uses GDI+, a software-only, fairly slow, API provided by Microsoft in the Windows XP days. It will run on anything, but your rendering performance may not be great. For example, in my fractal app which draws anywhere from four to a few dozen 256x256 tiles at various scales on the window with every paint, at the default small window size click-dragging to navigate is fast. But if you maximise the window, and the rendering area becomes much larger, scrolling around - which invalidates with every mouse movement, effectively drawing as fast as possible - is painfully laggy. This is not FireMonkey's fault. It is one of the problems with using GDI+, and I have experienced the same problem drawing complex interactive UIs with GDI+ before.

If your app is rendering using this class - I show how to find out which class later - I strongly recommend you find out why and do what you can to fix it. In general, avoid using this class if possible.

You will always end up using this class on Windows XP, since it's the only one supported there. On all other versions of Windows, Vista and above, 99% of the time you will be able to use TCanvasD2D instead (once you fix a problem with when it's chosen) and I highly recommend you do this.

Direct2D


TCanvasD2D uses Direct2D, a 2D API implemented over Direct3D, which is available on Vista SP2 and above. It is hardware-accelerated and fast, and theoretically the default. The quick answer is that you want to use this class if at all possible, but you may need to make some code changes to do it. Without some very small tweaks, there are cases where FireMonkey will choose a GDI+ canvas instead of a D2D one on hardware where D2D would run faster - much, much faster. This is rare, but my setup is one where it occurs.

GPU


TCanvasGpu is turned off by default, and is only used if the global FMX.Types.GlobalUseGPUCanvas is true. (Set this in your project file before Application.Initialize.) It's quite neat in that it uses a base class TContext3D to do its work, which has a very similar system for choosing which subclass is appropriate to instantiate as the canvas system. There are context classes for D3D9, D3D10, GLES and Quartz.

The first time I tried this out, it crashed immediately - FillText ends up calling TCharHelper.ConvertToUtf32 with an empty string, which raises an exception. Reading the preceding code, which seems to implement text wrapping, I don't understand why it's trying to do what it is.

TCanvasGpu running on Windows 7 on DX9 hardware. Yes, there is a whole TTrackBar between those two buttons. (See it? Me either.) This class is turned off by default and I do not recommend manually enabling it.
On my machine, it uses a TDX9Context to draw. There are noticeable severe rendering bugs. In my fractal app one control, a TTrackBar, doesn't draw at all. Text draws 'bold', which looks similar to the effect you get drawing antialiased text over itself many times. Buttons had one-pixel-wide edges missing.

I don't know how much of this is due to the TDX9Context it was using, and how much is due to TCanvasGpu itself. Since on DX10-class hardware FireMonkey will use Direct2D, DirectX9-class hardware is the only use case on Windows for this class. (As it turns out, we should normally use Direct2D even for DirectX9 hardware. More information below.) The severity of the bugs is, I suspect, why it is turned off by default. I do not recommend manually turning it on.

How does FireMonkey choose which canvas class to use?


In FMX.Types.pas is a method TCanvasManager.GetDefaultCanvas. This returns a metaclass which is used to instantiate the actual canvas class. The first time this method is called, it assembles a list of possible, valid canvases which the current platform supports and then from that list it chooses which one is best to instantiate. There are some complex if statements about whether a class is the default and whether to try to use a software canvas, but in my testing these didn't make any practical difference.

The key is in the TPlatformWin.RegisterCanvasClasses method, which tests which of the GPU, D2D and GDI+ canvas classes can be used and, where possible, adds them to this list. It only 'registers' (adds to the list) the GPU canvas if GlobalUseGPUCanvas is true, and by default it is false (see above.) That leaves D2D and GDI+.

Investigating when Direct2D is chosen over GDI+, and a FMX bugfix


First off, the easy case: GDI+ is the fallback, and is always available on machines that meet the FireMonkey requirements. It is always registered. This means that if the Direct2D class is not registered, your app will end up using GDI+.

Direct2D is trickier. And remember, any bug or quirk here that invalidly thinks D2D is not the right choice will cause the GDI+ canvas to be chosen instead, and that's bad.

FMX.Canvas.D2D.pas's RegisterCanvasClasses method checks the Direct3D 10 capabilities reported by DirectX, and registers the D2D canvas if the D3D10 driver type is either hardware or WARP. This latter is interesting: the Windows Advanced Rasterization Platform is a software rasterizer supporting Direct3D 9.1 through 10.1 feature levels, and by all accounts is a very good one.  It is part of the DX11 runtime, which you need to have installed and which comes with the Platform Update for Vista or Windows 7. You should already have these automatically through Windows Update.
Direct2D applications benefit from hardware-accelerated rendering on modern mainstream GPUs. Hardware acceleration is also achieved on earlier Direct3D 9 hardware by using Direct3D 10-level-9 rendering. This combination provides excellent performance on graphics hardware on existing Windows PCs.
...
When rendering in software, applications that use Direct2D experience substantially better rendering performance than with GDI+ and with similar visual quality. 

- MSDN Direct2D page
In other words, on DirectX 9.1 hardware there is a high-performance hardware rasterizer available and on lesser hardware there is still a high-performance software rasterizer available. Now, for DirectX 10 and above, it's simple: Direct2D will be chosen. But for DirectX9-class hardware, there is a choice between two software renderers: GDI+, an old and slow API, or WARP, a speedy, very technically impressive API. Clearly, where possible, FireMonkey should choose WARP, falling back to GDI+ only if nothing else whatsoever is possible. As you've no doubt guessed if you've read this far, it doesn't, and this is what we need to investigate and fix.

The problem lies in TCustomDX10Context.CheckDevice. An edited version of the problematic portion of code is:
if ...{can create a D3D hardware device} then
begin
  FDriverType := D3D10_DRIVER_TYPE_HARDWARE;
end else if
  not TCustomDX9Context.HardwareSupported and
  Succeeded(D3D10CreateDevice1Ex(D3D10_DRIVER_TYPE_WARP, D3D10_CREATE_DEVICE_BGRA_SUPPORT, g_pd3dDevice)) then
begin
  // Switch to software mode
  FDriverType := D3D10_DRIVER_TYPE_WARP;
end;
It's this else if condition that is problematic. It basically says to use WARP if it's supported (fine) but only if Direct3D9-class hardware is not supported (not fine.) Almost all computers since about 2005 will support D3D9, and this API is available on Vista and above. The only reason I can think of for this is that TCanvasGpu with D3D9 support is expected to be the fallback here before GDI+. However, as we've seen, not only is that class buggy but it is disabled by default (probably because it's buggy.) This means that anyone with D3D9 hardware (but not D3D10+ hardware), and that includes people running on virtual machines like VMware Fusion, which only supports D3D9 emulation, will end up using GDI+ when they could be using the much faster WARP.

How can we fix it?


A global switch

Normally the best way to force FireMonkey to choose a particular graphics path is via one of the globals at the top of FMX.Types.pas. There is a potentially suitable one: GlobalUseDX10Software. (Remember you need to set these in your project file before you call Application.Initialize.) It's false by default but if you set it to true, you will get WARP. Unfortunately, this means you will always get WARP when possible, even when hardware DX10 support is available. No matter how good WARP is we should choose the hardware-accelerated option when possible, and so this is a no go.

Edit the FireMonkey source


The second option is to edit the FMX source. To do this, make a local copy of FMX.Context.DX10.pas in your program's source folder. (I do not recommend editing RTL source directly and trying to recompile FMX - leave it alone and make your changes separately. If you add your local file to the project it will be used in preference to the RTL version. Just make sure you document what you've changed for future you.)

Add this local file to your project, and remove the 'not' from the else if statement above. It should look something like this:
end else if {not TCustomDX9Context.HardwareSupported and} ...
Recompile and you should get a Direct2D canvas. If you were using the GDI+ canvas before, you should notice a significant difference.

Manual code


The final - and probably best - option is to add some code to try to create a D3D10 context and check the driver type, and if it returns WARP then turn on the above switch. The following slightly ugly method (it's 1AM...) does the trick; call it before Application.Initialize in the project file. This method depends on FMX.Types, Winapi.D3D10_1, Winapi.D3D10, and Winapi.Windows.

Because of the method's dependencies and for code cleanliness, I would suggest putting this in a separate unit from the main project source, and ifdef both the unit being included, and the method being called, out completely if you are not compiling for Windows (the MSWINDOWS constant.) As suggested by a commenter below, it is probably also a good idea to ifdef for your specific Delphi version in case this is fixed in future.

The below code doesn't check for D3D9 hardware support, assuming that if it can create a WARP device that's enough. Feel free to add back in additional checks.
procedure TryUseWARPCanvas;
var
  DX10Library : THandle;
  TestDevice : ID3D10Device1;
begin
  DX10Library := LoadLibrary(Winapi.D3D10_1.D3D10_1_dll);
  if DX10Library = 0 then Exit;

  try
    SaveClearFPUState; // Copy from FMX.Context.DX10
    try
      if GetProcAddress(DX10Library, 'D3D10CreateDevice1') = nil then Exit;

      // If there's no hardware D3D10 support, but there /is/ WARP (software support)
      // force that to be used. Don't bother checking DX9 support, just go for WARP.
      if not Succeeded(D3D10CreateDevice1(nil, D3D10_DRIVER_TYPE_HARDWARE, 0, D3D10_CREATE_DEVICE_BGRA_SUPPORT, D3D10_FEATURE_LEVEL_10_1, D3D10_1_SDK_VERSION, TestDevice)) and
        Succeeded(D3D10CreateDevice1(nil, D3D10_DRIVER_TYPE_WARP, 0, D3D10_CREATE_DEVICE_BGRA_SUPPORT, D3D10_FEATURE_LEVEL_10_1, D3D10_1_SDK_VERSION, TestDevice))
        then begin
          FMX.Types.GlobalUseDX10Software := true;
        end;
    finally
      TestDevice := nil;
      RestoreFPUState; // Copy from FMX.Context.DX10
    end;
  finally
    FreeLibrary(DX10Library);
  end;
end;
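
The suggestion above about ifdef-ing by platform and Delphi version could be sketched roughly like this. The unit name WarpCanvasSketch and the APPLY_WARP_FIX define are made up for this example; the version range assumes you only want the fix on XE4 (CompilerVersion 25) and XE5 (CompilerVersion 26), so adjust it to whatever versions you have actually checked.

```pascal
unit WarpCanvasSketch; // hypothetical unit name

interface

// Always declared, so the project file can call it on every platform.
procedure TryUseWARPCanvas;

implementation

// Only apply the fix on Windows, and only for compiler versions known
// to need it: 25.0 is XE4, 26.0 is XE5.
{$IF Defined(MSWINDOWS) and (CompilerVersion >= 25.0) and (CompilerVersion <= 26.0)}
  {$DEFINE APPLY_WARP_FIX}
{$IFEND}

{$IFDEF APPLY_WARP_FIX}
uses
  Winapi.Windows, Winapi.D3D10, Winapi.D3D10_1, FMX.Types;
{$ENDIF}

procedure TryUseWARPCanvas;
begin
{$IFDEF APPLY_WARP_FIX}
  // The body of the method shown above goes here. Its local variable
  // declarations need wrapping in the same IFDEF.
{$ENDIF}
  // On other platforms or compiler versions this is a no-op.
end;

end.
```

Written this way, the project file can call TryUseWARPCanvas unconditionally before Application.Initialize, and the call simply does nothing where the fix doesn't apply.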

Tweaks to this code


You might want to change a few things about this code:
  • Editing the FMX code: to match the manual code, I changed it to remove the DX9 check entirely. It either sees if it can create a D3D10 hardware device, or otherwise tries to create a WARP device. Thanks Remy for the suggestion.
  • Untested useful tweak: Direct2D is still hardware-accelerated on DX9-class hardware. Try changing the feature level to D3D10_FEATURE_LEVEL_9_1 to see if it has this level hardware support on your computer. You will need to change both the manual code test (if you use it) and the FireMonkey code creating the devices in the same area as above to match. I haven't tested this and it's just an idea for further investigation; the current code goes with either D3D10 hardware or WARP, which I know for sure will work and are good, safe modifications to make. Changing this will always require editing the FMX source.
  • XP support: with Microsoft dropping XP support on April 8, 2014, it's quite possible the next version of Delphi will not need to support XP at all - or at least, will only do so as a legacy option. I would suggest that Embarcadero make some changes requiring the Platform Update be installed as prerequisite for FireMonkey apps, and then using one of the D3D10, D3D9-feature-level, or software WARP Direct2D canvases as the only option on Vista and above, and only using GDI+ on XP. There should never be a case on Vista or Windows 7 where GDI+ is the chosen canvas.

For testing: how to force the selection of a specific class


I stated at the beginning that you should test with each possible canvas type, in order to catch code that works with one and doesn't with another.  How?
  • To force GDI+, set FMX.Types.GlobalUseDirect2D to false.
  • To force Direct2D (using the WARP software rasterizer even with hardware support - so only for testing) set FMX.Types.GlobalUseDX10Software to true.
  • To force the GPU canvas (unnecessary for testing, since it's off by default) set FMX.Types.GlobalUseGPUCanvas to true.
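
As a rough sketch, a project file set up for testing one canvas at a time might look like this. The project name is made up, and Unit1 / Form1 are just the defaults the IDE generates; the three globals are the ones listed above.

```pascal
program CanvasTestSketch; // hypothetical project name

uses
  FMX.Forms,
  FMX.Types,
  Unit1 in 'Unit1.pas' {Form1};

{$R *.res}

begin
  // Enable exactly one of these per test run. Leave all three commented
  // out to get FireMonkey's default choice.
  FMX.Types.GlobalUseDirect2D := False;        // force GDI+
  // FMX.Types.GlobalUseDX10Software := True;  // force Direct2D on WARP
  // FMX.Types.GlobalUseGPUCanvas := True;     // force the GPU canvas

  // The globals must be set before this call to have any effect.
  Application.Initialize;
  Application.CreateForm(TForm1, Form1);
  Application.Run;
end.
```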

How to check what class you are actually using


This is fairly simple. Find a valid normal canvas (such as Form.Canvas) and check its ClassName. It will be one of the above classes.
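
For example, a small helper along these lines would do. The unit and function names are made up for this sketch, and the nil check is only a precaution in case it is called before the form has a canvas.

```pascal
unit CanvasInfoSketch; // hypothetical helper unit

interface

uses
  FMX.Forms;

// Returns the class name of the canvas the given form is drawing with.
function CanvasClassNameOf(AForm: TCustomForm): string;

implementation

function CanvasClassNameOf(AForm: TCustomForm): string;
begin
  if AForm.Canvas <> nil then
    Result := AForm.Canvas.ClassName
  else
    Result := '(no canvas yet)';
end;

end.
```

Calling something like Caption := CanvasClassNameOf(Self); from a button click handler on the form puts the class name in the title bar, which is handy when switching between the globals above.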

Summary


  • FireMonkey has several underlying graphics classes depending on the platform and, on Windows, on the capabilities of the platform
  • You need to test each one, because code that works on one can fail on another
  • On Windows, if you (or a user) have D3D9 hardware (but not D3D10 or higher hardware) FireMonkey will use GDI+ to render where it probably shouldn't, which will make your program noticeably slower when run on (a) D3D9-class hardware, or (b) in a virtual machine like VMware Fusion. It should use Direct2D's software rasterizer instead. Fix this with one of the three ways above; I recommend the sample code shown above.