Topic: Multi-core CPUs are not very well utilized for large fleets of ships.

Offline Captain Cake

  • Newbie Mark III
  • *
  • Posts: 29
Re: Multi-core CPUs are not very well utilized for large fleets of ships.
« Reply #15 on: February 17, 2010, 12:15:55 am »
Yea, maybe we should amend that to say "Oh, we're sorry, you'll generally have to content yourself with 2000 vs. 2000 in actual on-screen fights."  With a picture of 2000 footmen vs. 2000 grunts in Warcraft 2 strategically positioned nearby ;)

You gotta admit it is easily misinterpreted, especially if you've never played the game before. I frequently have more ships in one system, mostly zoomed out so I'm only seeing a handful of ship indicators. It just sucks that there's a performance hit when really there shouldn't be.
I love criticism! Think I'm wrong or something? Let me know!

Offline keith.lamothe

  • Arcen Games Staff
  • Arcen Staff
  • Zenith Council Member Mark III
  • *****
  • Posts: 19,505
Re: Multi-core CPUs are not very well utilized for large fleets of ships.
« Reply #16 on: February 17, 2010, 12:31:27 am »
I do think it could be clearer that it doesn't mean 30k ships on screen at once, yes.
Have ideas or bug reports for one of our games? Mantis for Suggestions and Bug Reports. Thanks for helping to make our games better!

Offline Buttons840

  • Hero Member
  • *****
  • Posts: 559
Re: Multi-core CPUs are not very well utilized for large fleets of ships.
« Reply #17 on: February 17, 2010, 12:32:42 am »
If only the game were open source, then a nice "you're more than welcome to implement the changes yourself" would be appropriate. :)

Offline TheSilverHammer

  • Newbie Mark III
  • *
  • Posts: 32
Re: Multi-core CPUs are not very well utilized for large fleets of ships.
« Reply #18 on: February 17, 2010, 06:25:46 am »
Ok, here is the saved game file, it only has 2900 or so ships that I have under my command.  You can see the slowdown by watching the enemy bullets fire and move in "pulses".  It is here where I notice the game pegging at roughly 25% CPU utilization.

Offline keith.lamothe

  • Arcen Games Staff
  • Arcen Staff
  • Zenith Council Member Mark III
  • *****
  • Posts: 19,505
Re: Multi-core CPUs are not very well utilized for large fleets of ships.
« Reply #19 on: February 17, 2010, 10:41:12 am »
Thanks for the save.  My observations:

1) a bit sluggish after loading, this is typical as the game is still loading a bunch of images and other resources from disk, and the AI is issuing a metric ton of commands to get back into the swing of things, etc.
2) after the initial sluggishness things seem pretty much fine, the 2000+v2000+ battle on a nearby planet ran smoothly (Chris has made some pretty aggressive performance gains since the last official release)
3) doesn't seem to be anything buggy affecting the performance
4) process explorer was showing AIWar as consuming pretty much the entirety of cores 1 and 3, though not much of 2 and 4

Offline x4000

  • Chris McElligott Park, Arcen Founder and Lead Dev
  • Arcen Staff
  • Zenith Council Member Mark III
  • *****
  • Posts: 31,651
Re: Multi-core CPUs are not very well utilized for large fleets of ships.
« Reply #20 on: February 17, 2010, 11:51:06 am »
If only the game were open source, then a nice "you're more than welcome to implement the changes yourself" would be appropriate. :)

In my opinion, if the game were open source people would have killed it by now.  Not enough programmers understand the type of AI that I've implemented (which is rather unique), and not enough are going to understand desyncs.  And the fact that a lot of very smart programmers also don't understand why the simulation can't be split out in a nonlinear fashion (it would make synchronized multiplayer impossible, which is not a concern for games like Quake 4) also would make it pretty much collapse under its own weight, I think.

This isn't a slight on anyone, but working on a game like AI War is far from a casual programming experience, and it is far too easy to break something unintentionally when making a change.  To make meaningful contributions, programmers need not only a detailed knowledge of synced networking models and other technical things that most programmers (who have not already coded an RTS specifically) wouldn't have a reason to know, but also a detailed knowledge of the game design and code architecture itself.

When Keith first started working with Arcen, he already had about 8 months of play experience with AI War, so knew the game very well.  We then spent about 6 hours walking through the code and architecture, discussing the overall layout and how to avoid desyncs in this specific model, and so forth.  Then he started working, and the first week or two had a ton of questions as he was still getting up to speed.  Then, after that period, he was really ready to do more things on his own without my having to review every bit of code.  And I've trained half a dozen programmers on various software architectures in the past, so I can say with confidence that Keith was quite a quick study from my experience.

Most games that are multiplayer -- shooters, etc -- have a certain kind of complexity to their networking that I don't fully understand at the moment from a nuts-and-bolts level, but that tends to be very isolated in network classes from what I can tell, which is awesome.  In an RTS, as you can see from the linked article above, the networking requirements alone pervade the entire structure of the program.  This sort of problem is not limited to AI War -- I personally wouldn't want to go try and program on anyone else's RTS, either, as I'm sure I'd mess it up severely unless I took the needed time to get a really solid grasp on their code architecture.

Or, let me put it another way: at a lot of AAA studios making synced-network games, they have at least one network programmer who is well-versed in desync avoidance, etc, whose job it is to review all of the other professional programmers' work and then yell at them and fix their code when they inevitably introduce desyncs.

At any rate, at some point when I abandon AI War hopefully a decade or more from now, the game will become open source then.  But the only way it will survive any post-Arcen updates is if a few very knowledgeable programmers (about the game and programming in general) act as sort of quality control guardians, as happens with the Linux kernel.

Offline TheSilverHammer

  • Newbie Mark III
  • *
  • Posts: 32
Re: Multi-core CPUs are not very well utilized for large fleets of ships.
« Reply #21 on: February 17, 2010, 02:23:09 pm »
Thanks for the save.  My observations:

1) a bit sluggish after loading, this is typical as the game is still loading a bunch of images and other resources from disk, and the AI is issuing a metric ton of commands to get back into the swing of things, etc.
2) after the initial sluggishness things seem pretty much fine, the 2000+v2000+ battle on a nearby planet ran smoothly (Chris has made some pretty aggressive performance gains since the last official release)
3) doesn't seem to be anything buggy affecting the performance
4) process explorer was showing AIWar as consuming pretty much the entirety of cores 1 and 3, though not much of 2 and 4


2000 v 2000 is fine. I thought I had like 2900+ there.  That is where it slows down, somewhere after 2000 but before 2900.   Now this will depend upon how fast your CPU is.  I have a quad core at 2.6 GHz, and if you have a dual core running at 3 or 4 GHz, you will have a much higher cap before you see a slowdown.   The point is that one core is pegged, but 3 others are asleep.

That is all there is to it.  Originally I just wanted to know if you could parallelize your algorithms to take advantage of N cores.  If this is too hard, I understand.  It is just a suggestion, something for you to consider.  It may look daunting at first, but maybe such a change will not be so bad.   After all, what computer these days does NOT have at least 2 cores?  This would not be wasted effort for a select few.  4 cores / 8 (Hyper-threaded) is going to be common, and 8 cores (maybe 16 Hyper-threaded) is around the corner.  Raw CPU power (per core) is NOT going up anymore.  I think it is worth looking into.

Maybe this can be a longer term project. 

This game is really great, and a major part of it is large fleet battles.  This really is a bottleneck that I think would be worth working through.  Anyone making algorithms where performance is a major concern really must include the multi-core approach.  This is the future of computing.   Back in the good old days, the raw speed would just go up and up. 450 MHz, 900 MHz, 1 GHz, 2 GHz...  etc.   Those days are over.  The definition of "faster CPU" has been changed.   It is no longer clock cycles, but number of cores.

I think there are MANY reasons to rework algorithms to work on N cores.  It *is* different, and different is scary, however, the computing world is changing and you know what happens when you do not adapt?  Ask a dinosaur, they know what happens when you do not adapt.  I am not saying such will happen with your game or company, but it is something to keep in mind when you take the long view.

Offline x4000

  • Chris McElligott Park, Arcen Founder and Lead Dev
  • Arcen Staff
  • Zenith Council Member Mark III
  • *****
  • Posts: 31,651
Re: Multi-core CPUs are not very well utilized for large fleets of ships.
« Reply #22 on: February 17, 2010, 02:46:36 pm »
I think there are MANY reasons to rework algorithms to work on N cores.  It *is* different, and different is scary, however, the computing world is changing and you know what happens when you do not adapt?  Ask a dinosaur, they know what happens when you do not adapt.  I am not saying such will happen with your game or company, but it is something to keep in mind when you take the long view.

It is more than different, is the point -- to our knowledge, this is a problem that no-one, anywhere, has ever solved for synchronized-multiplayer (i.e. RTS) games.  It works for Quake 4 and such because those are entirely different sorts of games.  It would even work for strategy games that have no multiplayer component.  But the problem of splitting a synchronous thread out amongst multiple cores is basically the holy grail of programming at the moment: IBM actually is offering a $1 million prize for the programmer/team that first figures out how to do this in a generalized below-the-programming-language way, for instance.

Obviously this is something that is much easier than what they are trying to do, as game structure can be adjusted on top of the programming language.  But it's still the sort of thing that, for most intents and purposes, is incredibly difficult to the point of being well beyond what we can pursue.  The obvious place to split our simulation algorithm would be on a per-planet basis, but without even getting into the headaches that would cause with random-number-generation (which could be overcome), there are all sorts of thread synchronization issues that then crop up (thread synchronization takes time, often more time than you save by parallelism if you have a large amount of data that has to be synchronized).
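
As a rough illustration of the random-number headache with per-planet splitting, and one way it might be overcome, here is a minimal Python sketch. It is not AI War's code: `MASTER_SEED`, `run_shared`, `run_per_planet`, and the planet ids are all hypothetical names invented for this example. With a single shared generator, the numbers a planet receives depend on the order in which planets are processed; with one generator per planet, derived from the shared seed plus the planet id, they do not.

```python
import random

MASTER_SEED = 12345  # hypothetical seed that every player would share


def run_shared(planet_order):
    """One generator for the whole galaxy: draws depend on processing order."""
    rng = random.Random(MASTER_SEED)
    return {p: [rng.randint(0, 999_999) for _ in range(3)] for p in planet_order}


def run_per_planet(planet_order):
    """One generator per planet, seeded from the master seed and the planet id."""
    results = {}
    for p in planet_order:
        rng = random.Random(MASTER_SEED * 1_000_003 + p)
        results[p] = [rng.randint(0, 999_999) for _ in range(3)]
    return results


# Same planets, processed in two different orders.
shared_same = run_shared([1, 2, 3]) == run_shared([3, 2, 1])
per_planet_same = run_per_planet([1, 2, 3]) == run_per_planet([3, 2, 1])
print(shared_same, per_planet_same)
```

This only addresses the random-number side of things. It does nothing about the thread synchronization cost described above, or about ships that interact across planets.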

Then, to compound this, you have the fact that performance typically only suffers when there are a ton of ships on one planet.  So that means that we can't really split on planets at all, and that means that in order to split for multithreading we'd have to use shared memory and simply a lot of locks.  That would, in turn, prevent the simulation from being deterministic, which breaks multiplayer.  And that could easily cause performance to tank, also.

In other words, you can't generalize about software architecture.  We are really really good at what we do, too -- we've already hit performance and scale benchmarks that no other RTS game in the industry comes close to in terms of simulation complexity, networking, etc.  If splitting RTS simulations across threads were feasible, we'd do it (or at least add it to the schedule to look at as part of a future expansion if time was not permitting).  I appreciate the suggestion, but it is very frustrating to me when people don't seem to listen to our explanations of why something is not possible; we aren't just trying to duck out of this because we're afraid of the issue or something, or because we don't know what we're doing or it hasn't occurred to us to use multithreading or something.  It's a very new area that isn't well adapted to certain kinds of programs, and AI War (and any other multiplayer RTS) is a prime example of that sort.

Cluster-based writes to relational databases are another sort of application where the best and brightest minds around the world are having a hard time getting that to work with any reliability.  Most of them use log shipping instead, or a single write master and multiple read sources, or similar.  Or they use vertical partitioning, which would basically be like us splitting per-planet, or in some cases they even use horizontal partitioning of single tables, which only works when the items in that table aren't interconnected in any way.  And so forth.  In other words, a small relational database that is single-threaded can't scale up at all if there are too many interrelationships in its reads and writes in individual tables or groups of tables.  That's no problem for giant ecommerce stores, etc.  But it puts a finite cap on using RDBs for things like SqlLitebot and such.  Different sort of problem from the AI War side of things, but same general underlying causes: the need for synchronicity amongst many interrelated read/write pairs.
« Last Edit: February 17, 2010, 02:54:45 pm by x4000 »

Offline keith.lamothe

  • Arcen Games Staff
  • Arcen Staff
  • Zenith Council Member Mark III
  • *****
  • Posts: 19,505
Re: Multi-core CPUs are not very well utilized for large fleets of ships.
« Reply #23 on: February 17, 2010, 03:13:56 pm »
I kept thinking I was forgetting something fairly fundamental that makes multi-threading much more than a performance hassle, and Chris just pointed it out.

To boil down one of the big bugaboos:

Each player's simulation thread has to start with the same random seed and use exactly the same number of random items each cycle and use each random "index" for exactly the same purpose as its sibling simulations are using it.
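
A minimal Python sketch of that rule, under stated assumptions: this is not AI War's code, and `run_simulation`, the seed value, and the toy "ship position" update are invented for illustration. Two simulations that share a seed and make the same draws stay identical; a single stray draw on one machine shifts every later random number, so that machine drifts out of sync.

```python
import random


def run_simulation(seed, cycles, extra_draw_on_cycle=None):
    """Toy lockstep simulation: one ship whose position changes randomly each cycle."""
    rng = random.Random(seed)
    ship_x = 0
    history = []
    for cycle in range(cycles):
        if cycle == extra_draw_on_cycle:
            rng.random()  # a stray draw that the other players did not make
        ship_x += rng.randint(0, 999_999)
        history.append(ship_x)
    return history


host = run_simulation(seed=42, cycles=20)
client = run_simulation(seed=42, cycles=20)
buggy = run_simulation(seed=42, cycles=20, extra_draw_on_cycle=3)

print(host == client)          # same seed, same draws: identical histories
print(host[:3] == buggy[:3])   # still in sync before the stray draw
print(host == buggy)           # out of sync from the stray draw onward
```

The point of the sketch is that nothing in `buggy` is "wrong" locally; it simply consumed one random value that its siblings did not, which is enough to desync.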

Offline x4000

  • Chris McElligott Park, Arcen Founder and Lead Dev
  • Arcen Staff
  • Zenith Council Member Mark III
  • *****
  • Posts: 31,651
Re: Multi-core CPUs are not very well utilized for large fleets of ships.
« Reply #24 on: February 17, 2010, 03:17:22 pm »
Each player's simulation thread has to start with the same random seed and use exactly the same number of random items each cycle and use each random "index" for exactly the same purpose as its sibling simulations are using it.

For more information on this approach, if anyone is interested, this article by the developers of the original Age of Empires game is still quite relevant and topical, and is basically still the go-to article for most people in the gaming industry for how to handle synced network multiplayer.

Offline Buttons840

  • Hero Member
  • *****
  • Posts: 559
Re: Multi-core CPUs are not very well utilized for large fleets of ships.
« Reply #25 on: February 17, 2010, 04:03:33 pm »
Great article.

Offline RCIX

  • Core Member Mark II
  • *****
  • Posts: 2,808
  • Avatar credit goes to Spookypatrol on League forum
Re: Multi-core CPUs are not very well utilized for large fleets of ships.
« Reply #26 on: February 17, 2010, 08:12:23 pm »
Ok, I'm speaking as someone who is not familiar with threading problems but has enough knowledge to understand how hard some of them are.

Why can't you split the tasks by kind? as in, have one thread processing one type of thing and another processing another, so that you have a minimum of data to share between threads.
Avid League player and apparently back from the dead!

If we weren't going for your money, you wouldn't have gotten as much value for it!

Oh, wait... *causation loop detonates*

Offline x4000

  • Chris McElligott Park, Arcen Founder and Lead Dev
  • Arcen Staff
  • Zenith Council Member Mark III
  • *****
  • Posts: 31,651
Re: Multi-core CPUs are not very well utilized for large fleets of ships.
« Reply #27 on: February 17, 2010, 08:19:30 pm »
Why can't you split the tasks by kind? as in, have one thread processing one type of thing and another processing another, so that you have a minimum of data to share between threads.

That question basically assumes the associative property applies, which it does not.  In other words, if you reorder the operations in the simulation, you get a subtly different result per frame -- and those subtle differences later add up to major differences.  Just calling Random one extra time, or in a different order in one location, leads to a wholly different result, for instance.

Or, take another example: moving ship A before ship B can lead to different collision detection results, or to different range check results, which then leads to a different simulation.

The examples are nearly endless, honestly, as to which calculations affect which other calculations based on when they occur.  It's all dependent on the current state of the simulation at the time the calculation is run, since it's all hugely multivariate, and so if you change an earlier step even a little that can have far-reaching effects on the end result of your calculations.
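
The ship A / ship B case can be sketched in a few lines of Python. This is a toy model with assumptions that are not from AI War: ships sit on a one-dimensional line of cells, each tries to move one cell to the right, and a move is blocked when the target cell is occupied. The `step` function and the starting positions are hypothetical.

```python
def step(positions, order):
    """Move each ship one cell to the right, in the given order.

    A ship stays where it is if its target cell is occupied at the moment it moves.
    """
    positions = dict(positions)  # work on a copy so the caller's state is untouched
    for name in order:
        target = positions[name] + 1
        if target not in positions.values():
            positions[name] = target
    return positions


start = {"A": 0, "B": 1}

a_first = step(start, ["A", "B"])  # A is blocked by B, then B moves
b_first = step(start, ["B", "A"])  # B moves out of the way, then A moves

print(a_first)  # {'A': 0, 'B': 2}
print(b_first)  # {'A': 1, 'B': 2}
```

Same starting state, same rules, same ships; only the processing order differs, and the frame ends in a different state. If two threads handled A and B without a fixed order, different machines could end up on different branches.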

Offline RCIX

  • Core Member Mark II
  • *****
  • Posts: 2,808
  • Avatar credit goes to Spookypatrol on League forum
Re: Multi-core CPUs are not very well utilized for large fleets of ships.
« Reply #28 on: February 17, 2010, 08:37:03 pm »
Why can't you split the tasks by kind? as in, have one thread processing one type of thing and another processing another, so that you have a minimum of data to share between threads.

That question basically assumes the associative property applies, which it does not.  In other words, if you reorder the operations in the simulation, you get a subtly different result per frame -- and those subtle differences later add up to major differences.  Just calling Random one extra time, or in a different order in one location, leads to a wholly different result, for instance.

Or, take another example: moving ship A before ship B can lead to different collision detection results, or to different range check results, which then leads to a different simulation.

The examples are nearly endless, honestly, as to which calculations affect which other calculations based on when they occur.  It's all dependent on the current state of the simulation at the time the calculation is run, since it's all hugely multivariate, and so if you change an earlier step even a little that can have far-reaching effects on the end result of your calculations.
This is why I like writing single player games :)

Offline Black

  • Full Member
  • ***
  • Posts: 107
Re: Multi-core CPUs are not very well utilized for large fleets of ships.
« Reply #29 on: February 17, 2010, 08:57:45 pm »
Why can't you split the tasks by kind? as in, have one thread processing one type of thing and another processing another, so that you have a minimum of data to share between threads.

That question basically assumes the associative property applies, which it does not.

...

I think you mean commutativity, although associativity may also fail if you interpret grouping of terms as execution on a single thread. Then not only AB != BA, but also (AB)(C) != (A)(BC).
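
A small Python illustration of the two properties, with examples chosen for this sketch rather than taken from the game: `add_ten` and `double` are hypothetical state updates, and floating-point addition stands in for "grouping terms per thread" (each thread summing its own share before the partial results are combined).

```python
def add_ten(x):
    return x + 10


def double(x):
    return x * 2


# Commutativity: does the ORDER of two operations matter?
print(double(add_ten(1)))  # 22
print(add_ten(double(1)))  # 12

# Associativity: does the GROUPING matter when the order is kept?
left = (0.1 + 0.2) + 0.3   # one thread sums the first two terms
right = 0.1 + (0.2 + 0.3)  # one thread sums the last two terms
print(left == right)       # False
```

So even if every machine applied the operations in the same order, handing different groupings of the work to different threads could still change the result.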