Author Topic: AI War 2 v0.748 Released! "Macrophage Teeth"  (Read 1942 times)

Offline x4000

  • Chris McElligott Park, Arcen Founder and Lead Dev
  • Arcen Staff
  • Zenith Council Member Mark III
  • *****
  • Posts: 31,651
AI War 2 v0.748 Released! "Macrophage Teeth"
« on: July 05, 2018, 05:21:59 pm »
Release notes here.

This week hasn't exactly gone as I expected, but it's been very productive.  I had planned on working on the lobby, first of all, but then some performance-unfriendly saves came to light and I decided I'd work on that instead.  The biggest hog in large battles is the vis-layer movement of ships around, and last release I talked about how I was going to look into System.Numerics and DrawMeshInstanced to solve that.  I also basically decided to upgrade to Unity 2018.2, even though that's still in beta, because it has some things we need.

Well, that didn't happen either!

Badger fixed up the "unit testing" program that we have for the sim layer, and for the first time I fired that up.  It's an area that was previously out of my domain, but that's been expanding a bit lately just due to necessity.  At any rate, I spent almost all of this week on performance improvements to the sim layer.


Badger also fixed some notable bugs, such as the Macrophage not actually doing damage when they attack.  That concludes my summary of the release notes other than to talk about performance.

Enjoy!

Chris

I again wanted to mention: we have a new Steam Developer Page.  If you go there and follow us, you'll be notified about other upcoming releases (including this one, of course).

Performance Hunting

I've tried using three different profilers in this period: NProfiler (which is awful despite promising big things), JetBrains dotTrace (which seems fine), and RedGate ANTS (which is maybe a bit better, but it's hard to be sure).

At first these tools were lobbing up really juicy bits for me that I was able to majorly optimize, leading to quite a bit of savings.  I spent way longer than I expected just trying to optimize square root again for our use cases, and finally cut that to a tenth or less of the load it used to represent.

I thought I was going to have to create a new form of data structure for tracking lists of entities in our code, and I came up with one in my head that I haven't implemented yet (a wrapped, pooled, linked-list structure that is super fast at adding, removing, and iterating, but has no random access possible).  It turns out that the things that I thought were going to require that MAY have been a profiling artifact, but the jury is still out on that.  I'm undecided on whether or not I need to make an adjustment there.
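To make that idea a bit more concrete, here is a rough sketch of what such a structure could look like.  This is purely illustrative and not what's in the game: PooledLinkedList, Node, Add, Remove, and ForEach are all made-up names, and it assumes single-threaded use with no removal of the current node during iteration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch: a doubly linked list that recycles its nodes.
public sealed class PooledLinkedList<T>
{
    public sealed class Node
    {
        public T Value;
        internal Node Prev, Next;
    }

    private readonly Stack<Node> pool = new Stack<Node>(); // recycled nodes
    private Node head, tail;
    public int Count { get; private set; }

    // O(1) append; reuses a pooled node when one is available.
    public Node Add(T value)
    {
        Node node = pool.Count > 0 ? pool.Pop() : new Node();
        node.Value = value;
        node.Prev = tail;
        node.Next = null;
        if (tail != null) tail.Next = node; else head = node;
        tail = node;
        Count++;
        return node;
    }

    // O(1) removal given the node handle; the node goes back to the pool.
    public void Remove(Node node)
    {
        if (node.Prev != null) node.Prev.Next = node.Next; else head = node.Next;
        if (node.Next != null) node.Next.Prev = node.Prev; else tail = node.Prev;
        node.Value = default(T);
        node.Prev = node.Next = null;
        pool.Push(node);
        Count--;
    }

    // Forward iteration only; there is deliberately no indexer.
    public void ForEach(Action<T> visit)
    {
        for (Node n = head; n != null; n = n.Next)
            visit(n.Value);
    }
}
```

The caller keeps the Node handle returned by Add, which is what makes removal constant-time without any searching.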

At the moment, what I am winding up finding is a suspicious "speed limit" on the sim code that is related to the framerate in some fashion (and no, it's not any of the obvious things; in this case it's a virtual framerate, but that still adjusts the speed limit).  At any rate, that's the next thing I need to dig into, because I think no other changes I make will show a result at the moment because all the background threads are presently running below that speed limit, making it the limiting factor.  Some of the later performance improvements I made show up with no benefit in actual gameplay yet, but they show up fine in unit testing if I set the virtual framerate really low.  Fun for soon.

One of the things that I've observed is that the background threads aren't hitting the other processors on my CPU as much as I expected, which was suspicious to me.  I've gone in and looked around, and my first thought was that our threads are spending too long transitioning from idle to active.  I'm still not sure that isn't the case.  We're using Thread.Sleep(1) in order for them to wait while being alive and then turn on as soon as a bool is set that says "your data is here -- now go!"

The problem is... apparently Thread.Sleep(1) doesn't guarantee that it will only wait one ms.  Instead it will apparently average 12-15 ms.  That is an eternity!  No wonder things are not very busy on the secondary processors.  So that's a no-go.
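If you want to check this on your own machine, something along these lines should do it.  This is just a sketch: SleepProbe and AverageSleepMs are made-up names, and the number you get will depend on your OS and its current timer resolution.

```csharp
using System.Diagnostics;
using System.Threading;

// Hypothetical helper, not part of the game's code.
public static class SleepProbe
{
    // Returns the average wall-clock milliseconds that Thread.Sleep(1) really takes.
    public static double AverageSleepMs(int iterations)
    {
        var stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
            Thread.Sleep(1); // requests 1 ms; the scheduler decides when we resume
        stopwatch.Stop();
        return stopwatch.Elapsed.TotalMilliseconds / iterations;
    }
}
```

Calling SleepProbe.AverageSleepMs(100) and printing the result gives a quick read on how far the real wait is from the requested 1 ms.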

I started using SpinWait to spin the CPU instead of Thread.Sleep(1), and that does indeed peg the CPU at 100%, but there's 88% wastage on spinning according to the profilers when that happens.  That's going to slow down the main thread and lose framerate as well as making the other threads slower to sync, too.  So that's really kind of a no-go.

I need to figure out what that mysterious "speed limit" in our code is and get rid of that, and that will solve a lot of the problem.  Other than that, though, I've got to figure out a way for the multithreading to be a bit more snappy in when it does things and stops doing things.  Right now it's 12-15ms at best from the word go to it actually doing anything (on almost a dozen background threads, individually).

We could supposedly use the Monitor class to help with synchronization, but I'll be honest that I don't yet fully understand how that would best be used while not pegging the CPU.  Offhand, it sounds like using objects to lock against and monitor instead of using a bool to check against -- still one per thread -- but I'm not positive.  Any multithreading-in-C# experts in the crowd that want to help out?  Either with some explanations or taking a look at our code, or even making some changes on your own?  We're pretty slammed, workload-wise.
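For anyone following along, my rough understanding of the Monitor approach is something like the sketch below: one lock object per thread, with Monitor.Wait blocking the worker (without spinning) until the main thread calls Monitor.Pulse.  WorkSignal, Signal, and WaitForWork are made-up names for illustration, and this is an assumption about how it would be wired up, not our actual code.

```csharp
using System.Threading;

// Hypothetical per-thread signal built on Monitor.Wait / Monitor.Pulse.
public sealed class WorkSignal
{
    private readonly object gate = new object();
    private bool hasWork;

    // Main thread: mark the data as ready and wake the waiting worker.
    public void Signal()
    {
        lock (gate)
        {
            hasWork = true;
            Monitor.Pulse(gate);
        }
    }

    // Worker thread: block without burning CPU until Signal() has been called.
    public void WaitForWork()
    {
        lock (gate)
        {
            while (!hasWork)         // re-check the flag after every wake-up
                Monitor.Wait(gate);  // releases the lock while blocked
            hasWork = false;         // consume the signal
        }
    }
}
```

The bool is still there, but it is only read and written under the lock, so if Signal() happens before the worker starts waiting, the worker sees the flag and never blocks.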

Anyway, another option that is still on the table is potentially just switching to using the ThreadPool or some other form of multithreaded job class rather than threads that we keep warmed up and running and managed on our own.  That might be the simplest approach, we shall see.  I've done this in plenty of applications before, but none with ms-level speed required.  AI War Classic only had one secondary thread, and it didn't block the sim when it was idle, so we never ran into this with it.  With Stars Beyond Reach, we used a ton of threads, but it was done in such a way that a 12-15 ms lag was utterly invisible.

So that's what's going on lately!

 

 
Have ideas or bug reports for one of our games?  Mantis for Suggestions and Bug Reports. Thanks for helping to make our games better!

Offline Toranth

  • Hero Member Mark III
  • *****
  • Posts: 1,244
Re: AI War 2 v0.748 Released! "Macrophage Teeth"
« Reply #1 on: July 06, 2018, 03:20:00 pm »
It's been a while since I did much multithreading stuff, but I've done some work with high-performance threading in C# in the past.

IIRC, the Thread.Sleep(1) problem is because Thread.Sleep always gives up the processor, forcing a context switch.  However, even when it is time for the thread to resume processing, it has to wait for the thread scheduler to get back around to it.  The overhead of the double context switch plus the delays in waiting for the scheduler to realize the time has passed works out to the 12-15 millisecond delays you see on modern Windows OSs.
In fact, in general, for anything in Windows involving times under that 12-15ms threshold, you need to do special work.  High-precision Timestamps, for example, are a common problem.


Without looking at code, this may not be helpful and you probably already know most or all of this, but...
I would strongly recommend using the available C# thread handling stuff - ThreadPools and Tasks - rather than trying to manage your own Threads.  If you have special circumstances, you can still write your own TaskScheduler for your Tasks, but the default one works pretty well.  C# is usually good about standing up new threads in the Pool, while recognizing when doing so would cost more in overhead than it gains in performance.
Tasks are an easy way to toss methods onto the ThreadPool, with simple mechanisms for chaining, waiting/synchronizing, and cancelling them.  I found using Tasks was always simpler and no worse performing than creating threads manually.
TPL Dataflow might also help you... but I don't know if it is sufficiently high-performance.  That's designed for asynchronous and large-scale data processing with longer task times, which isn't exactly what you're doing.
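As a minimal sketch of the Task approach (TaskFanOut and SquareAll are made-up names, and the squaring is just a stand-in for a real unit of work):

```csharp
using System.Linq;
using System.Threading.Tasks;

// Hypothetical example: fan work out onto the ThreadPool and wait for all of it.
public static class TaskFanOut
{
    // Queues one task per input, blocks until every task has finished,
    // and returns the results in the same order as the inputs.
    public static int[] SquareAll(int[] inputs)
    {
        Task<int>[] tasks = inputs
            .Select(value => Task.Run(() => value * value))
            .ToArray();
        Task.WaitAll(tasks);
        return tasks.Select(task => task.Result).ToArray();
    }
}
```

Task.WaitAll is the synchronization point here; there are no hand-managed threads and no Thread.Sleep polling.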


For synchronization, C# has the ManualResetEventSlim, SemaphoreSlim, and ReaderWriterLockSlim which are my favorite synchronization objects for anything but the most basic locking.  Here's a bookmark I had describing the newer ones.  These 'Slim' versions try to stay within the .NET application space (spinning briefly before falling back to the underlying OS wait mechanisms), which means better performance than the standard Semaphores, Mutexes, and the like for short waits.  For the time periods you are talking about (sub-1 ms), these may be useful.
For many uses, the AutoResetEvent is nice, but it has significant overhead compared to the others.
The standard Monitor, of course, is the best performing of all - it's a single call - but the others are not much worse (50-100% longer) for more functionality.
Here's a Microsoft R&D paper (PDF) on performance of some of these types.
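A rough sketch of the "your data is here -- now go!" handoff using ManualResetEventSlim instead of a polled bool.  SlimHandoff and DoubleOnWorker are made-up names, the doubling is a placeholder for real work, and this is a one-shot example rather than a long-lived worker loop.

```csharp
using System.Threading;

// Hypothetical one-shot handoff between a main thread and a worker.
public static class SlimHandoff
{
    public static int DoubleOnWorker(int value)
    {
        int input = 0, output = 0;
        using (var dataReady = new ManualResetEventSlim(false))
        using (var done = new ManualResetEventSlim(false))
        {
            var worker = new Thread(() =>
            {
                dataReady.Wait();    // spins briefly, then blocks until Set()
                output = input * 2;  // placeholder for the real work
                done.Set();          // tell the main thread the result is ready
            });
            worker.Start();

            input = value;
            dataReady.Set();         // "your data is here, now go"
            done.Wait();             // wait for the worker to finish
            worker.Join();
        }
        return output;
    }
}
```

For a worker that stays alive across many frames, you would call Reset() on each event before reusing it, or look at AutoResetEvent-style semantics instead.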

Amusingly, while double-checking my memory, I discovered the Barrier class.  It's been around for a while, but I never noticed it.  TIL.
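From a quick read of the docs, Barrier looks like a fit for "every thread finishes this step before anyone starts the next one".  A small sketch, with BarrierDemo and RunPhases as made-up names and empty phases standing in for real work:

```csharp
using System.Threading;

// Hypothetical example of lock-step phases with System.Threading.Barrier.
public static class BarrierDemo
{
    // Runs `workers` threads through `phases` phases in lock-step and returns
    // how many times the post-phase action fired.
    public static int RunPhases(int workers, int phases)
    {
        int completedPhases = 0;
        // The post-phase action runs once per phase, after all participants
        // have arrived and before any of them is released.
        using (var barrier = new Barrier(workers, _ => completedPhases++))
        {
            var threads = new Thread[workers];
            for (int i = 0; i < workers; i++)
            {
                threads[i] = new Thread(() =>
                {
                    for (int p = 0; p < phases; p++)
                        barrier.SignalAndWait(); // block until every worker arrives
                });
                threads[i].Start();
            }
            foreach (var thread in threads)
                thread.Join();
        }
        return completedPhases;
    }
}
```

The post-phase action would be the natural place for a main-thread-style step such as swapping buffers between sim steps.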

Offline x4000

Re: AI War 2 v0.748 Released! "Macrophage Teeth"
« Reply #2 on: July 06, 2018, 04:42:03 pm »
Here's a huge discussion that spawned. :)  https://docs.google.com/document/d/18uFxUOnlPf5-hoS0wQToPeBuFxdQ1hqpVVSP6XZ0RUM/edit#

Thanks for your notes on it as well!