Author Topic: Performance Bottlenecks  (Read 3476 times)

Offline TechSY730

  • Core Member Mark V
  • *****
  • Posts: 4,570
Performance Bottlenecks
« on: February 08, 2012, 12:17:01 pm »
Just out of curiosity, what are your biggest bottlenecks, aka, the methods that the most CPU time is spent in. (Note, CPU time, meaning time spent waiting on IO is not included)
Maybe like the top 2 or top 3 methods. (Note, when time spent in a method call does not count against the time spent in its caller, otherwise, obviously, the main thread loops would hold this spot)

Have you found that the bottlenecks are in your code, or in your dependencies (Unity or Mono's library)?

Offline x4000

  • Chris McElligott Park, Arcen Founder and Lead Dev
  • Arcen Staff
  • Zenith Council Member Mark III
  • *****
  • Posts: 31,651
Re: Performance Bottlenecks
« Reply #1 on: February 08, 2012, 12:22:50 pm »
The bottlenecks are in simple math.  Doing collision checks and targeting checks are the two big ones.  Unity nor mono have anything to do with it, and our code has incredible numbers of optimizations in there.  The problem isn't that there is a problem, per se, but that when you have a vast number of ships all trying to check something in a loop against all other ships in that planet, then that inherently is millions upon millions of operations to perform.

In other words: the code is extremely tuned at this point, though there's always little things we can do here any there.  It's the task itself that is absolutely incredibly enormous, on a scale unlike that of any other strategy game I know of.

But this also means that the affected methods yield large rewards to very small performance tweaks, which is where we get most of our gains out of -- when you do any operation a million times, making a very small improvement yields a much larger result.  So that's been what most of our improvements have been over the last year and a half or so (though not all of them by any stretch).
Have ideas or bug reports for one of our games?  Mantis for Suggestions and Bug Reports. Thanks for helping to make our games better!

Offline TechSY730

  • Core Member Mark V
  • *****
  • Posts: 4,570
Re: Performance Bottlenecks
« Reply #2 on: February 08, 2012, 12:37:39 pm »
The bottlenecks are in simple math.  Doing collision checks and targeting checks are the two big ones.  Unity nor mono have anything to do with it, and our code has incredible numbers of optimizations in there.  The problem isn't that there is a problem, per se, but that when you have a vast number of ships all trying to check something in a loop against all other ships in that planet, then that inherently is millions upon millions of operations to perform.

In other words: the code is extremely tuned at this point, though there's always little things we can do here any there.  It's the task itself that is absolutely incredibly enormous, on a scale unlike that of any other strategy game I know of.

But this also means that the affected methods yield large rewards to very small performance tweaks, which is where we get most of our gains out of -- when you do any operation a million times, making a very small improvement yields a much larger result.  So that's been what most of our improvements have been over the last year and a half or so (though not all of them by any stretch).

This is an absolutely insane suggestion, but have you considered compiling the hyper-ultra performance sensitive code into assembly, hand tuning the assembly, and then using that assembly for the body of the method? (I think C# has a mechanism where you can give assembly code for a body of a method) Of course, the original code would still be there, commented out, so you can make changes there and redo the compilation, and then hand tuning again.

Offline keith.lamothe

  • Arcen Games Staff
  • Arcen Staff
  • Zenith Council Member Mark III
  • *****
  • Posts: 19,505
Re: Performance Bottlenecks
« Reply #3 on: February 08, 2012, 12:39:25 pm »
MovePlayerUnit - used by every ship, turret, etc; not shots.  Every frame.  Optimization here comes down to swapping * and / for << and >>, having "exotic case" branches grouped together and the whole thing not even entered except for ship types that need it (based on a computed-at-start bool), etc.

UpdateTargetList - used by a ship needing a target list (because theirs is empty or old) when they can't find an aggregate list to copy, it loops over every potentially-autotargetable-by-me unit on the same planet, builds an eligible target list, and then sorts that list using a fairly involved comparison (which potentially involves calling that compare for every ship in the list against every other ship in the list, though quicksort generally comes in under n^2) and stores either the entire list or the first 50 or so in its local target list.  Optimization here was largely getting the aggregate list system in place, and finding good "throttle" settings that tell it not to issue more than X target lists (weighted by how many ships are being evaluated) per Y time such that your units wouldn't look braindead but also wouldn't be getting a ton of updates they don't need.

RunCollisionDetection - kind of what it says :)  We got a lot of gains by pre-emptively dealing with this, for instance each wormhole has an index on it that increments every time a ship comes out "at rest" (the index resets to 0 if nothing comes through in a while, or if it hits the end); that index is used to get a point out of a computed-at-startup array of points that happens to be a fairly large circular pattern plotted around the origin, and that point is added to the wormhole's location to give the ship a destination.  Can still collide, but works wonders compared to everyone trying to be right on top of the wormhole and relying on a full collision detection.
Have ideas or bug reports for one of our games? Mantis for Suggestions and Bug Reports. Thanks for helping to make our games better!

Offline x4000

  • Chris McElligott Park, Arcen Founder and Lead Dev
  • Arcen Staff
  • Zenith Council Member Mark III
  • *****
  • Posts: 31,651
Re: Performance Bottlenecks
« Reply #4 on: February 08, 2012, 12:41:25 pm »
No, I have never considered that.  It would be insanely complex, and I doubt the performance would be any better due to the massive amount of data being crunched and the fact that the data would have to be translated back and forth a lot.  In fact, based on my experience with 8086 assembly in college, I'm pretty sure that performance with assembly would be worse by an enormous margin.  An assembly core is great when you have complex math to do on a limited set of variables, and you want to do it as efficiently as possible.  We have relatively simple math to do on database-scale numbers of variables.  You can't just translate 500,000+ variables to assembly real quick and then get the result back. ;)
Have ideas or bug reports for one of our games?  Mantis for Suggestions and Bug Reports. Thanks for helping to make our games better!

Offline keith.lamothe

  • Arcen Games Staff
  • Arcen Staff
  • Zenith Council Member Mark III
  • *****
  • Posts: 19,505
Re: Performance Bottlenecks
« Reply #5 on: February 08, 2012, 12:43:28 pm »
If we could skip array-bounds-checking in a few places that would help, but there's a tradeoff between performance and safety, and I happen to like that our game doesn't crash (much) :)
Have ideas or bug reports for one of our games? Mantis for Suggestions and Bug Reports. Thanks for helping to make our games better!

Offline TechSY730

  • Core Member Mark V
  • *****
  • Posts: 4,570
Re: Performance Bottlenecks
« Reply #6 on: February 08, 2012, 12:51:31 pm »
Yea, as I said it was a crazy suggestion, and most compilers these days are pretty good at optimizing at that level for you.

Also, as you already mentioned, you have already done many optimizations short of that ultra-low level nonsense. That generally should be enough.

Also, hand-tuning assembly is VERY hard, and as you said, the performance improvements gained would probably not be enough to justify the risk of messing something up.

But hey, thought I would throw it out there. I know of a few scientific/math applications and shared libraries (even a few general purpose shared libraries) that have used hand tuned ASM for the bodies of their most performance intensive methods.

Offline keith.lamothe

  • Arcen Games Staff
  • Arcen Staff
  • Zenith Council Member Mark III
  • *****
  • Posts: 19,505
Re: Performance Bottlenecks
« Reply #7 on: February 08, 2012, 01:05:56 pm »
Yea, as I said it was a crazy suggestion, and most compilers these days are pretty good at optimizing at that level for you.
To some degree.  The compiler's first and only primary responsibility is making sure the generated binary/bytecode follows the same semantics as the source code.  That tends to get in the way :)
Have ideas or bug reports for one of our games? Mantis for Suggestions and Bug Reports. Thanks for helping to make our games better!

Offline junkdog

  • Newbie Mark II
  • *
  • Posts: 12
Re: Performance Bottlenecks
« Reply #8 on: February 08, 2012, 02:13:11 pm »
This is an absolutely insane suggestion, but have you considered compiling the hyper-ultra performance sensitive code into assembly, hand tuning the assembly, and then using that assembly for the body of the method? (I think C# has a mechanism where you can give assembly code for a body of a method) Of course, the original code would still be there, commented out, so you can make changes there and redo the compilation, and then hand tuning again.

Ugh... wouldn't that imply a second subset of assembly for the macs as well? Or are they all running x86 arch by now?

Are there any known tweaks for improving performance when running under linux/Wine? FPS is reduced to a crawl whenever a battle takes place with many ships (>2k) - whereas I've never experienced FPS drops under windows. The load of the CPU core running the main thread rarely exceeds 70-80% - I never checked the GPU load, though I've noticed that AIW never raised my GPU temp under windows.

I read in an older thread that performance characteristics varied across different linux installations, so maybe it boils down to trial and error with a bit of luck. All in all, I was surprised to get it working under linux - I assumed that Unity would be more problematic in concert with wine.
« Last Edit: February 08, 2012, 02:14:44 pm by junkdog »

Offline orzelek

  • Hero Member Mark III
  • *****
  • Posts: 1,096
Re: Performance Bottlenecks
« Reply #9 on: February 08, 2012, 02:19:22 pm »
Actually if you are stuck on doing large amounts of simple math you could try to look in other direction.

Try to optimize code by preparing it in a way that would fit SIMD instructions from various extensions. I'm not sure how good is compiler at detecting this.. but thats one of not many methods known to me to accelerate large amounts of simple processing.

Other way I know has to do with data localization.. but I'm not sure even if something can be done with this in C# (basically data needs to be co-located in memory so that cache hits are optimized since all data required for calculation is already in cache).

Offline TechSY730

  • Core Member Mark V
  • *****
  • Posts: 4,570
Re: Performance Bottlenecks
« Reply #10 on: February 08, 2012, 03:54:04 pm »
Try to optimize code by preparing it in a way that would fit SIMD instructions from various extensions. I'm not sure how good is compiler at detecting this.. but thats one of not many methods known to me to accelerate large amounts of simple processing.

That's a great idea, but its rather brittle. Unless the compiler gives explicit ways of telling it to use SIMD instructions or to at least consider using SIMD instructions, you are basically relying on assumptions about the compiler that may not hold next release, or if you switch to another compiler, or how the compiler feels that day, etc.
Still making it easier for the compiler to see something could be done with a special instruction from an extension may not be a bad idea.

Other way I know has to do with data localization.. but I'm not sure even if something can be done with this in C# (basically data needs to be co-located in memory so that cache hits are optimized since all data required for calculation is already in cache).

I think C# is like most modern languages, there are no guarantees about how your high level code is ordered and/or addressed in the final machine code.
It is perfectly valid for two variables in the same scope that are right next to each other in the source code to be separated by like several megabytes of stuff in the final machine code.
Now of course, most sane compilers won't do this kind of nonsense, and will strive to keep things that are frequently accessed together near each other.

This is VERY much unlike C, where there are some good guarantees about how the code will compile into machine code, including things like ordering and addressing.

EDIT: Clarification, there are no guarantees about ordering short of the minimum necessary to implement proper initialization order that the high level language mandates.
« Last Edit: February 08, 2012, 03:57:13 pm by techsy730 »

Offline orzelek

  • Hero Member Mark III
  • *****
  • Posts: 1,096
Re: Performance Bottlenecks
« Reply #11 on: February 08, 2012, 05:25:56 pm »
Thats why both of this ways can be quite difficult to use - especially in C#. Also performance gain may even vary by CPU type and compiler as you said.
But these two ways can be fun to figure out/discover if and how it can be done :D

(Also doing silly things like lists of small arrays (instead of plain lists) with special small structures containing for example only data needed for targeting and/or distance calculations maybe interesting to try if you have time to play around :D)

Offline Hearteater

  • Core Member
  • *****
  • Posts: 2,334
Re: Performance Bottlenecks
« Reply #12 on: February 08, 2012, 07:01:51 pm »
Maybe this thread should be re-titled: "How to get GUDare's games to work"

Offline keith.lamothe

  • Arcen Games Staff
  • Arcen Staff
  • Zenith Council Member Mark III
  • *****
  • Posts: 19,505
Re: Performance Bottlenecks
« Reply #13 on: February 08, 2012, 07:07:53 pm »
Maybe this thread should be re-titled: "How to get GUDare's games to work"
I'm seriously considering adding a "Ultra Low" (half of Low) option to Unit Cap Scale, just for folks who really like 8xHW games.
Have ideas or bug reports for one of our games? Mantis for Suggestions and Bug Reports. Thanks for helping to make our games better!

Offline Wanderer

  • Master Member Mark II
  • *****
  • Posts: 1,579
  • If you're not drunk you're doing it wrong.
Re: Performance Bottlenecks
« Reply #14 on: February 08, 2012, 07:25:19 pm »
Maybe this thread should be re-titled: "How to get GUDare's games to work"

BWAHAHAHAH, now now, that's not fair!  I just merely took on the challenge!  ;)
... and then we'll have cake.