2015/06/21

Integrating System.Diagnostics: Profiling

Before we get into profiling, I have one quick thing to note: Assumptions are bad. I knew there was something called Performance Counters in System.Diagnostics, and kind of assumed that I'd use them for my profiler, but as it turns out, they are a Windows thing for measuring your programs performance concerning memory, exceptions and JIT compilation. I might have a use for that, but the important part of the profiler, measuring time, will use a simple System.Diagnostics.Stopwatch after all.

Now, if you've been a programmer for any significant amount of time, you've probably heard this quote by Donald Knuth before:
Premature optimization is the root of all evil.
If you ever played a game that came out anytime in the last decade or so, you also know that performance in modern games is critical, so optimization will have to happen at some point. The trick to optimizing the right code at the right time is profiling. For those unfamiliar with the term, it means measuring how much time is spent executing different portions of your code.

There's different methods to do that, but I won't get into that here; I have no clue how I'd even implement a statistical profiler, so let's talk about why mainstream profilers like JetBrain's dotTrace and red-gate ANTS aren't particularly well suited to games. It all boils down to them being unaware of the game loop and the concept of frames.
Their profile reports primarily consist of a call graph, sorted by how much time each function cumulatively took. And that works fine if you're looking to triple your tech demo's average frame rate, but once different game states come into play and you're trying to even out performance spikes, all the valuable information is averaged out.
Modern profilers come with tools to alleviate similar problems, such as a histogram that allows viewing profile reports for subsections of the entire run, but that's still not enough: I want separate profile reports for individual frames, I want context information on the game state and what the individual functions were doing in that particular frame.
Essentially, I want my profile report to be interspersed with my log data. So I'll build a profiler on top of a TraceSource and a Stopwatch, and, once some actual optimization requires it, a log file viewer that is aware of the performance data inside, and/or a debug menu that allows stepping through individual frames while viewing their profile reports live.

The first draft

My first draft for this sort of profiler looks like this, but it leaves all of the heavy lifting to the hypothetical log viewer and does not allow live viewing of individual frame profile. It is also somewhat error prone in that there is basically no way to verify correct use of the BeginOperation and EndOperation methods. We can do better than that!

One way to ensure correct use, as suggested in Game Engine Architecture, is to create a class, say LogicalOperation, that calls BeginOperation in the constructor and EndOperation in the destructor, and create a local instance within the scope of the operation, and then to hide it all behind macros. That works in C++, since you know that all local variables are allocated on the stack, so you know exactly when their constructors and destructors are called.
In C#, however, where a local variable is allocated is not defined in the language spec, and thus is an implementation detail of the compiler. I considered counter-balancing this with IDisposable and using statements, but still, that's at least a couple dozen unnecessary allocations every frame, which will increase the frequency of GC induced lags.
Also, since we can't hide the local variable behind a macro, instrumenting the code would litter it with unused local variables, which would make static analysis a pain to use, and also masquerade the intent of the code to some extent.

Building a call tree

Fixing the other problem first, I decided, was more important, and would actually solve the second problem. If I extended a call tree in the calls to the begin and end methods, I could simply analyze that for completeness at the end of each frame, where I'd have to reset it anyway.
The problem with that, however, is that a straight object oriented tree would still cost many allocations a frame. You could pool them, of course, but I decided to do it simpler still and use a preordered array of nodes. The struct I used for the nodes looked like this at first:
and of course, there is no name for the logical operation yet. This is because it seemed somewhat hypocritical of me to use a string for that, when C# strings are immutable and thus needed to be reallocated every frame, which I'd been fighting against just before. I considered using a StringBuilder for that, which I will use for most persistent string buffers I might need in this project, but then I realized that I'd have to construct these strings for the log messages anyway, so that idea went straight out the window, and I added the field as a plain old string.
In total, this is what I ended up with. This still is not complete; it is not documented yet, there is no way to disable call tree construction (which is a necessity for live viewing, unless I end up copying out the array...), and I'd like to add some more info to the root node, such as current used memory and possibly a frame ID. But that's peanuts  to what we achieved already, so let's wrap this up here.

No comments:

Post a Comment