Author Topic: Performance Bottlenecks (Read 3480 times)

Farr · « **Reply #15 on:** February 08, 2012, 09:39:35 pm »

Had to register to reply to this thread

I'm not sure if you can do this with C#, but have you considered offloading some of the calculations to the GPU? As mentioned earlier, I think AI War underutilizes newer graphics cards, and graphics cards are mighty good at matrix operations and collision operations.

Just a thought.

TechSY730 · « **Reply #16 on:** February 09, 2012, 12:23:59 am »

Quote from: Farr on February 08, 2012, 09:39:35 pm

Had to register to reply to this thread

I'm not sure if you can do this with C#, but have you considered offloading some of the calculations to the GPU? As mentioned earlier, I think AI War underutilizes newer graphics cards, and graphics cards are mighty good at matrix operations and collision operations.

Just a thought.

If they can figure out a way to pull that off without adding a dependence on graphics cards of a certain capability, graphics cards of a certain brand, or, heck, on having a graphics card at all, then sure, why not.

I'm not sure how you would write code that takes advantage of a nice graphics card for non-graphics math, gets reliable and consistent results even across different cards and platforms, and can still fall back to the CPU if no eligible graphics card is found.

Farr · « **Reply #17 on:** February 09, 2012, 01:50:55 am »

Yeah, how much you can offload would depend on the graphics card. For example, CUDA (http://en.wikipedia.org/wiki/CUDA) is NVIDIA's solution to this. Different NVIDIA graphics cards support different revisions of CUDA, so the code would need to check which type of card you have and then offload only the processing that card supports (otherwise it would fall back to using the standard methods on the CPU).

I think ATI has something equivalent called ATI Stream, but I can't find much on it.
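To make the "check the card, otherwise fall back to the CPU" idea concrete, here is a minimal Python sketch of capability-based dispatch. Everything in it is hypothetical: the feature name `int32_math`, the kernel tables, and `squares_gpu_stub`, which only stands in for a real CUDA or ATI Stream kernel. The point is the shape: the CPU path always exists, and the GPU path is chosen only when a card is present and reports the feature the operation needs.

```python
from typing import Callable, Dict, List, Optional, Set, Tuple

Kernel = Callable[[List[int]], List[int]]

def squares_cpu(values: List[int]) -> List[int]:
    """Reference CPU path: always available."""
    return [v * v for v in values]

def squares_gpu_stub(values: List[int]) -> List[int]:
    """Hypothetical stand-in for a GPU kernel. A real one would have to give
    bit-identical results to squares_cpu, or multiplayer clients would diverge."""
    return [v * v for v in values]

# op name -> (GPU feature it needs, GPU kernel); feature names are made up.
GPU_KERNELS: Dict[str, Tuple[str, Kernel]] = {"squares": ("int32_math", squares_gpu_stub)}
CPU_KERNELS: Dict[str, Kernel] = {"squares": squares_cpu}

def choose_backend(op: str, gpu_features: Optional[Set[str]]) -> str:
    """Pick 'gpu' only if a card is present AND reports the needed feature."""
    if gpu_features is not None and op in GPU_KERNELS:
        needed, _ = GPU_KERNELS[op]
        if needed in gpu_features:
            return "gpu"
    return "cpu"

def run(op: str, values: List[int], gpu_features: Optional[Set[str]]) -> List[int]:
    if choose_backend(op, gpu_features) == "gpu":
        return GPU_KERNELS[op][1](values)
    return CPU_KERNELS[op](values)  # fallback: no card, or card lacks the feature
```

The catch raised earlier in the thread still applies: the GPU kernel has to return exactly what the CPU path returns, on every card, or players end up with different game states.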

Eternaly_Lost · « **Reply #18 on:** February 09, 2012, 07:11:52 am »

Quote from: Farr on February 09, 2012, 01:50:55 am

Yeah, how much you can offload would depend on the graphics card. For example, CUDA (http://en.wikipedia.org/wiki/CUDA) is NVIDIA's solution to this. Different NVIDIA graphics cards support different revisions of CUDA, so the code would need to check which type of card you have and then offload only the processing that card supports (otherwise it would fall back to using the standard methods on the CPU).

I think ATI has something equivalent called ATI Stream, but I can't find much on it.

The other big hurdle with CUDA is how independent each unit of data is, and how well the work can run on the limited instruction set that CUDA gives you. I would like to assume that you could run a few thousand ships each as their own thread on CUDA, so you could easily do them in batches of several hundred at the same time: rather than looping over all the ships in one thread, each ship gets a thread. But from what my friend, who wrote and maintains a password cracker that runs on CUDA, told me, CUDA really does not like it when a thread has to worry about what the other threads are doing. I am not so sure that ships could be handled that way, as they have to worry about running into each other.
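The "each ship gets a thread" model only works when each ship's update is independent. Here is a minimal sketch of that easy case, using CPU threads in Python rather than CUDA (the `Ship` fields and the movement rule are made up for illustration): each update reads only its own ship, and results come back in input order, so the outcome doesn't depend on scheduling. CPython's GIL means this won't actually speed up pure-Python math; it only shows the structure.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class Ship:
    ship_id: int
    x: int
    vx: int

def step_ship(ship: Ship) -> Ship:
    # Reads and writes only this ship's data, so no worker ever needs to
    # know what another worker is doing.
    return Ship(ship.ship_id, ship.x + ship.vx, ship.vx)

def step_all(ships: List[Ship], workers: int = 4) -> List[Ship]:
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map returns results in input order, whichever worker
        # happens to finish first.
        return list(pool.map(step_ship, ships))

fleet = [Ship(i, 10 * i, 1) for i in range(5)]
print(step_all(fleet)[0])  # Ship(ship_id=0, x=1, vx=1)
```

Once `step_ship` needs to look at other ships (collision avoidance, targeting), this stops being safe, which is exactly the concern above.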

Mánagarmr · « **Reply #19 on:** February 09, 2012, 09:37:44 am »

Chris already explained this in another thread. GPUs are great at crunching floating point maths, but not so much integers. Most of the maths used in AI War is integer arithmetic, and as such would not benefit from being offloaded to the GPU. Graphics, on the other hand, is mostly floating point and is therefore excellent for a GPU.

TechSY730 · « **Reply #20 on:** February 09, 2012, 10:22:47 am »

Quote from: Moonshine Fox on February 09, 2012, 09:37:44 am

Chris already explained this in another thread. GPUs are great at crunching floating point maths, but not so much integers. Most of the maths used in AI War is integer arithmetic, and as such would not benefit from being offloaded to the GPU. Graphics, on the other hand, is mostly floating point and is therefore excellent for a GPU.

True, but GPUs are also great at matrix operations, though I don't know how good they are at integer matrix operations. That is, do they expect matrices of floating point numbers, or can they use matrices of integers and stay consistent with the CPU's integer arithmetic?
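One reason integer math matters for consistency: floating-point addition is not associative, so the same numbers summed in a different order (as a parallel GPU reduction might do) can differ in the last bits, while integer addition gives the same answer in any order as long as nothing overflows. A tiny Python illustration (Python integers never overflow, unlike the fixed-width integers on a GPU):

```python
from typing import List, Sequence

def sum_in_order(values: Sequence, order: List[int]):
    """Add up values[i] for i in the given order, one at a time."""
    total = type(values[0])(0)  # 0 or 0.0, matching the element type
    for i in order:
        total += values[i]
    return total

floats = [0.1, 0.2, 0.3]
ints = [1, 2, 3]
forward, backward = [0, 1, 2], [2, 1, 0]

print(sum_in_order(floats, forward))   # 0.6000000000000001
print(sum_in_order(floats, backward))  # 0.6
print(sum_in_order(ints, forward) == sum_in_order(ints, backward))  # True
```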

keith.lamothe · « **Reply #21 on:** February 09, 2012, 10:27:51 am »

Offloading computations to the GPU, among other issues, has similar problems to offloading computations to another thread: we can't guarantee order of operations, and thus multiplayer desyncs would be inevitable.

TechSY730 · « **Reply #22 on:** February 09, 2012, 10:38:05 am »

Quote from: keith.lamothe on February 09, 2012, 10:27:51 am

Offloading computations to the GPU, among other issues, has similar problems to offloading computations to another thread: we can't guarantee order of operations, and thus multiplayer desyncs would be inevitable.

Does it matter what the order of operations is, as long as:
1. Dependency of operations is satisfied (if B depends on a complete answer for A, code to solve B will not even start in any thread until A is done, or at least the part of B that needs A will wait until A completes)
2. Consistency of operations is satisfied (if A has multiple right answers, then no matter how the code parallelizes it, offloads it to another thread, offloads it to other devices, etc., it will not only return a right result for A, it will return the same result for A, even across different platforms or different timings.)
3. Consistency even among cross-dependencies (let's say parts of A rely on some computations made while solving parts of B, and vice versa. Can not only a right answer but also a consistent answer for both A and B be assured across different platforms or different timings?)
4. Guarantee of completeness before batch submit (If the network packet for synchronization among players depends on A1, A2,...,An being complete and satisfying all the conditions above, then it will wait to send the batch until it is so)
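For the easy case where the tasks don't depend on each other, conditions 2 and 4 can be sketched like this in Python: tasks finish in an unpredictable order, but nothing is released until the whole batch is done, and results are emitted in a canonical order (sorted by task id) rather than completion order. `compute` and its squaring work are placeholders; conditions 1 and 3 (real dependencies between tasks) are the hard part and aren't addressed here.

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Dict, List, Tuple

def compute(task_id: int) -> Tuple[int, int]:
    # Placeholder work; the random sleep makes completion order vary run to run.
    time.sleep(random.uniform(0, 0.01))
    return task_id, task_id * task_id

def run_batch(task_ids: List[int]) -> List[Tuple[int, int]]:
    results: Dict[int, int] = {}
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = [pool.submit(compute, t) for t in task_ids]
        for fut in as_completed(futures):  # order differs between runs
            tid, value = fut.result()
            results[tid] = value
    # Condition 4: we only get here once every task has finished.
    # Condition 2: emit in canonical (sorted id) order, not completion order.
    return sorted(results.items())
```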

If you can assure that these hold, I don't see why desyncs are an issue, even in the light of different platforms or threading models.

Of course, these guarantees can be tricky in single threaded code, and can become extremely messy in multi-threaded code. The requirement for not just correctness (a right answer), but consistency (the same right answer) is what throws a "monkey wrench" into many of the classic parallelism techniques.

keith.lamothe · « **Reply #23 on:** February 09, 2012, 10:54:54 am »

If ship A runs collision detection before ship B on one machine, and ship B runs collision detection before ship A on another machine in the same game, a desync is very likely inevitable.
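A toy 1-D version of this point, in Python: two ships both want cell 1, and the (made-up) rule is that a ship moves only if its target cell is currently free. Same starting state, same orders, but the result depends on which ship is processed first. This is only an illustration, not AI War's actual collision logic.

```python
from typing import Dict, List

def move_ships(positions: Dict[str, int], targets: Dict[str, int],
               order: List[str]) -> Dict[str, int]:
    """Move each ship, in the given order, to its target cell unless that
    cell is already occupied; a blocked ship stays where it is."""
    pos = dict(positions)
    for ship in order:
        occupied = set(pos.values())
        if targets[ship] not in occupied:
            pos[ship] = targets[ship]
    return pos

start = {"A": 0, "B": 2}
want = {"A": 1, "B": 1}  # both ships want cell 1

print(move_ships(start, want, ["A", "B"]))  # {'A': 1, 'B': 2}
print(move_ships(start, want, ["B", "A"]))  # {'A': 0, 'B': 1}
```

If one machine happens to use the first order and another the second, their game states diverge: that is a desync.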

TechSY730 · « **Reply #24 on:** February 09, 2012, 11:06:57 am »

Just as a clarification, there are four main AI War threads, right?

1. The main thread: The primary thread that runs the main method. It is responsible for maintaining program flow, and for controlling and monitoring the UI at a high level. (It runs on each machine, but by definition the stuff it handles only matters to that particular computer, so it doesn't need much multiplayer synchronization. Managing game state happens in a different thread.)
2. The simulation thread: Where the "magic" happens; the main simulation loop. (This is the only thread whose computations must be consistent among all machines in a multi-player environment)
3. The AI thread: As the name would imply, this is the thread that runs the code associated with the AI, effectively allowing the AI to run independently of the main simulation, just like the player. (Notice that this implies that the AI can abuse pause mode, if the option to allow giving orders during pause is on, just as well as you can. In fact, if I am reading the debugging information right, they do.) (This thread only runs on the host, and only the final orders made by the AI need to be communicated to the other players, which is handled in the simulation thread.)
4. The rendering thread: As the name would imply, this is the thread that is responsible for drawing the in-game graphics, including managing graphics tasks to be done by the GPU, if any. (This thread is run by each computer, but because other players don't care exactly what other computers display, it doesn't need to be synchronized among clients.)

There may be other threads, like one for sound or one for networking. And of course there are ones spawned by the underlying libraries that we don't care about, as those are implementation details.

The nice thing about this separation is that very little cross-communication needs to happen between these threads. Sure, one-way reads are common: the graphics thread reads the game state managed by the simulation thread so it knows what to draw; the main thread reads from the simulation thread so it knows what objects to show in dialogs; the AI thread reads the game state managed by the simulation thread so it knows what the flip is going on with its units; the simulation thread listens for commands issued through the main thread (which handles the UI, which is how players give commands) and the AI thread (so the AI can issue orders); and the main thread keeps "track" of the other threads so that it knows how to manage them or react to their state changes.

However, these types of communication are rather trivial and don't introduce terribly many synchronization issues. Oh sure, sometimes a thread tries to read data that is not yet guaranteed to be ready for consumption, which introduces a race condition, and I think there have been bugs like that before, but for the most part this kind of communication is pretty easy to handle both correctly and consistently.

Much more difficult are threads that need lots of communication between them, like non-trivially parallelizable tasks. That is where it gets nasty to ensure correctness and consistency.

TechSY730 · « **Reply #25 on:** February 09, 2012, 11:19:27 am »

Quote from: keith.lamothe on February 09, 2012, 10:54:54 am

If ship A runs collision detection before ship B on one machine, and ship B runs collision detection before ship A on another machine in the same game, a desync is very likely inevitable.

That is covered by the cross-dependency case I mentioned. The position and collision resolving of ship A depends on the position and collision resolving of ship B, and vice versa. These are the nasty cases to ensure consistency on, though not in all cases impossible.
Basically, can you find a way such that regardless if ship A or ship B is processed first, the same result can be obtained? It may be possible, but I don't know. There are computational problems with cross-dependencies like this one where there is no way to separate the two tasks, make no assumptions about their ordering, and still come out with the same result. However, there are also some problems where you can.
I am not well versed enough to figure out which one the ship collision problem case falls under.
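One standard way to make a step like this order-independent is a two-phase (double-buffered) update: every ship decides based on a snapshot of the old state, and conflicts are settled by a fixed rule such as "lowest id wins". A minimal Python sketch on a toy 1-D setup (the rule that ships blocked in the snapshot simply stay put, and the tie-break, are assumptions for illustration):

```python
from typing import Dict, List

def move_ships_two_phase(positions: Dict[str, int], targets: Dict[str, int],
                         order: List[str]) -> Dict[str, int]:
    # Phase 1: every ship reads only the OLD positions (the snapshot) and
    # records a claim, so the iteration order cannot affect any decision.
    occupied_before = set(positions.values())
    claims: Dict[int, List[str]] = {}
    for ship in order:
        t = targets[ship]
        if t not in occupied_before:
            claims.setdefault(t, []).append(ship)
    # Phase 2: settle each contested cell with a fixed rule (lowest id wins),
    # which ignores the order in which the claims were recorded.
    new_pos = dict(positions)
    for cell, claimants in claims.items():
        new_pos[min(claimants)] = cell
    return new_pos

start = {"A": 0, "B": 2}
want = {"A": 1, "B": 1}  # both ships want cell 1
print(move_ships_two_phase(start, want, ["B", "A"]))  # {'A': 1, 'B': 2}
```

The cost is extra memory for the snapshot and the claims table, and the same treatment would be needed for every system that interacts, not just collisions.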

Though I do have an idea. The collisions of ships on one planet have NO bearing in either direction on ships on a different planet, right? What if you had N threads, where different threads work on collisions on different planets, making sure that no two threads ever work on the same planet at the same time? Care would have to be taken that every computer covers the same set of planets (and the same number of them) between network syncs, and that pacing is consistent. For example, if thread A chose planet 1 and paced itself to only 500 ships on that planet, and thread B chose planet 2 and paced itself to only 100 ships on that planet, then regardless of how the other computers get it done, they had better have dealt with those same 500 ships on planet 1 and the same 100 ships on planet 2. It doesn't matter how (the second computer could be running a single thread for collisions, could finish a different planet first, or could have its thread A work on planet 2 and its thread B work on planet 1), as long as the same planets and the same ships are handled before the next network sync. The order in which ships within a single planet are handled also matters, but that is already covered by the existing logic, which each thread would simply run.
Now of course this all falls apart if the collisions on one planet can influence the ships on a different planet.
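A Python sketch of the per-planet idea, valid only under the assumption raised just above that planets are fully independent. Ships are grouped by planet, each planet's work runs on whatever worker is free, ships within a planet are visited in a fixed (sorted) order, and results are merged in planet-id order so every machine produces the same list. The `(ship_id, planet_id, x)` layout and the "same x means collision" rule are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

Ship = Tuple[int, int, int]  # (ship_id, planet_id, x): hypothetical layout

def collisions_on_planet(ships: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """ships holds (ship_id, x) pairs for ONE planet; returns colliding id pairs.
    Ships are visited in sorted id order so every machine agrees."""
    ordered = sorted(ships)
    hits = []
    for i in range(len(ordered)):
        for j in range(i + 1, len(ordered)):
            if ordered[i][1] == ordered[j][1]:
                hits.append((ordered[i][0], ordered[j][0]))
    return hits

def resolve_all(ships: List[Ship], workers: int = 4) -> List[Tuple[int, List[Tuple[int, int]]]]:
    by_planet: Dict[int, List[Tuple[int, int]]] = {}
    for ship_id, planet_id, x in ships:
        by_planet.setdefault(planet_id, []).append((ship_id, x))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Which worker handles which planet is left to the scheduler.
        futures = {p: pool.submit(collisions_on_planet, s) for p, s in by_planet.items()}
        results = [(p, f.result()) for p, f in futures.items()]
    # Merge in canonical planet order, independent of scheduling.
    return sorted(results)
```

Ship 3 below sits at the same `x` as ships 1 and 2 but on another planet, so it collides with nothing; the moment a cross-planet effect exists, this grouping is no longer valid.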

keith.lamothe · « **Reply #26 on:** February 09, 2012, 11:32:48 am »

There are 2 main threads:

One does the simulation, input, and graphics. Before you ask: splitting the graphics out would require it to have its own copy of a large part of the game data, and we're already pushing the memory wall enough without pole-vaulting it like that.

The other does (most) AI decision making, and has its own copy of (a subset of) the game data to do so without causing cross-thread computation dependencies, etc. Basically all the AI thread does is issue orders in roughly the same way a human player does, rather than directly impacting the sim at all.

And there are a couple of extras for sound and networking, yes, but those are trivial in load and not at all related to the sim.
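A rough Python sketch of the arrangement described here: the AI works on its own copy of the state and only emits orders, which the sim applies at a scheduled tick in a canonical order, just like player commands. The names, the order tuple, the "send everything to planet 3" goal, and the two-tick delay (standing in for network scheduling) are all assumptions for illustration, not Arcen's code. The AI thread is joined immediately to keep the demo deterministic; in a real game it would run alongside the sim.

```python
import copy
import queue
import threading
from typing import Dict, List, Tuple

Order = Tuple[int, str, int]  # (tick_to_apply, ship_name, target_planet): hypothetical

def ai_think(snapshot: Dict[str, int], tick: int, out: "queue.Queue[Order]") -> None:
    """AI thread: reads only its own copy of the state, never the live sim,
    and issues orders the way a player would."""
    for ship in sorted(snapshot):
        if snapshot[ship] != 3:            # made-up goal: send everything to planet 3
            out.put((tick + 2, ship, 3))   # scheduled for a future tick

def sim_tick(state: Dict[str, int], tick: int, pending: List[Order]) -> None:
    """Sim thread: applies orders due this tick, in a canonical sorted order."""
    for _, ship, planet in sorted(o for o in pending if o[0] == tick):
        state[ship] = planet
    pending[:] = [o for o in pending if o[0] != tick]

def demo() -> Dict[str, int]:
    state = {"raider": 1, "bomber": 2}
    orders: "queue.Queue[Order]" = queue.Queue()
    ai = threading.Thread(target=ai_think, args=(copy.deepcopy(state), 0, orders))
    ai.start()
    ai.join()  # joined at once to keep the demo deterministic
    pending: List[Order] = []
    for tick in range(4):
        while not orders.empty():
            pending.append(orders.get())
        sim_tick(state, tick, pending)
    return state

print(demo())  # {'raider': 3, 'bomber': 3}
```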

Quote from: techsy730 on February 09, 2012, 11:19:27 am

Basically, can you find a way such that regardless if ship A or ship B is processed first, the same result can be obtained?

No, that's not really possible, no. Perhaps in the collision case it could be done at the expense of a lot of RAM for intermediate results, but it would also have to be done for movement, getting target lists, firing, etc. The code would get a lot harder to understand and maintain, and the memory costs would balloon, without even getting into the memory costs of having copies of the main data for other threads to use.

Quote

Though I do have an idea. The collisions of ships on one planet have NO bearing in either direction on ships on a different planet, right?

No, that's not true. If a zenith autobomb collides with an Interplanetary Munitions Booster and kills it, that impacts calculations on bordering planets. There are various other examples.

TechSY730 · « **Reply #27 on:** February 09, 2012, 11:46:42 am »

Quote from: keith.lamothe on February 09, 2012, 11:32:48 am

There are 2 main threads:

One does the simulation, input, and graphics. Before you ask: splitting the graphics out would require it to have its own copy of a large part of the game data, and we're already pushing the memory wall enough without pole-vaulting it like that.
The other does (most) AI decision making, and has its own copy of (a subset of) the game data to do so without causing cross-thread computation dependencies, etc. Basically all the AI thread does is issue orders in roughly the same way a human player does, rather than directly impacting the sim at all.

And there are a couple of extras for sound and networking, yes, but those are trivial in load and not at all related to the sim.

So the simulation thread handles the input handling, graphics, and the simulation. Makes sense.

Quote from: keith.lamothe on February 09, 2012, 11:32:48 am

Quote from: techsy730 on February 09, 2012, 11:19:27 am
Basically, can you find a way such that regardless if ship A or ship B is processed first, the same result can be obtained?
No, that's not really possible, no. Perhaps in the collision case it could be done at the expense of a lot of RAM for intermediate results, but it would also have to be done for movement, getting target lists, firing, etc. The code would get a lot harder to understand and maintain, and the memory costs would balloon, without even getting into the memory costs of having copies of the main data for other threads to use.

Drat. It will be nice once more research comes along on maintaining correctness and consistency for cross-dependent calculations without ordering guarantees, followed by better tools to help deal with it (keeping it all straight in your head and by hand is NASTY), and ways (through tools or better education/reference materials) to help recognize which sets of cross-dependent calculations cannot be handled this way while still maintaining correctness and/or consistency.

Quote from: keith.lamothe on February 09, 2012, 11:32:48 am

Quote from: techsy730 on February 09, 2012, 11:19:27 am
Though I do have an idea. The collisions of ships on one planet have NO bearing in either direction on ships on a different planet, right?
No, that's not true. If a zenith autobomb collides with an Interplanetary Munitions Booster and kills it, that impacts calculations on bordering planets. There are various other examples.

Double Drat. Well you can't fault a guy for trying to find independent sub-problems, one of the key ways to aid parallelizing tasks.

eRe4s3r · « **Reply #28 on:** February 09, 2012, 12:17:30 pm »

Quote from: keith.lamothe on February 09, 2012, 10:54:54 am

If ship A runs collision detection before ship B on one machine, and ship B runs collision detection before ship A on another machine in the same game, a desync is very likely inevitable.

There was this thing called Flowfield, and it seems to entirely eliminate the collision bottleneck and desync problem: you shouldn't check for "full-stop" collisions anyway; you should simply have units strive to "aim" for a non-collision without specifically enforcing it. Though I wonder how exactly they did that in Supreme Commander 2; it's by far the only thing that's really cool about that game. I'm guessing it's done by having the path calculation always aim for the path of least resistance and, should there be none, take the path of lesser resistance until the units merge into a line and ignore collision as long as the units in that line all move in the same direction (hence "flow field"). Units moving in the opposite direction are a path block and get pathed around, because units moving against the flow have a higher weight than those moving with it.

One has to wonder how you check for weightings and assign them... sounds like fluid dynamics ;p
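For reference, the core of a basic grid flow field is small: a breadth-first "integration field" of distances outward from the goal, after which each unit simply steps to the neighboring cell with the lowest distance, so one computation serves every unit heading to that goal. This Python sketch (the grid, `#` walls, and 4-way movement are assumptions) is not how Supreme Commander 2 implements it, and on its own it does nothing about units blocking each other; that is the weighting part speculated about above.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

Cell = Tuple[int, int]
STEPS = ((1, 0), (-1, 0), (0, 1), (0, -1))  # fixed order keeps ties deterministic

def integration_field(grid: List[str], goal: Cell) -> Dict[Cell, int]:
    """BFS outward from the goal: steps from each reachable open cell to it."""
    rows, cols = len(grid), len(grid[0])
    dist = {goal: 0}
    frontier = deque([goal])
    while frontier:
        r, c = frontier.popleft()
        for dr, dc in STEPS:
            n = (r + dr, c + dc)
            if 0 <= n[0] < rows and 0 <= n[1] < cols and grid[n[0]][n[1]] != "#" and n not in dist:
                dist[n] = dist[(r, c)] + 1
                frontier.append(n)
    return dist

def next_step(dist: Dict[Cell, int], cell: Cell) -> Optional[Cell]:
    """Flow direction: the neighbor closest to the goal, or None at the goal."""
    here = dist.get(cell)
    if here is None:
        return None  # unreachable cell: no flow
    best = None
    for dr, dc in STEPS:
        n = (cell[0] + dr, cell[1] + dc)
        if n in dist and dist[n] < here and (best is None or dist[n] < dist[best]):
            best = n
    return best

def follow(dist: Dict[Cell, int], start: Cell) -> List[Cell]:
    path = [start]
    step = next_step(dist, start)
    while step is not None:  # distance strictly drops, so this terminates
        path.append(step)
        step = next_step(dist, step)
    return path

GRID = ["....", ".##.", "...."]
print(follow(integration_field(GRID, (0, 3)), (2, 0)))
# [(2, 0), (1, 0), (0, 0), (0, 1), (0, 2), (0, 3)]
```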

keith.lamothe · « **Reply #29 on:** February 09, 2012, 12:21:19 pm »

Collision detection was just the first example that came to mind (we don't do actual pathfinding except on the galaxy map, btw). Every bit of the sim has to happen in the same order; ship movement, ship firing, shots moving, shots hitting, ships being built, etc, etc. Not only in the same order within themselves, but also with respect to each other.
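A minimal Python sketch of what "the same order, within and across systems" can look like: fixed phases that always run in the same sequence, each walking entities in ascending id order, plus a checksum of the state serialized canonically. Comparing such checksums between machines each turn is a common way to catch desyncs early; the thread doesn't say whether AI War does this, and the state layout and phase contents here are made up.

```python
import hashlib
from typing import Dict, List, Tuple

State = Dict[int, Tuple[int, int]]  # ship_id -> (x, hp): hypothetical layout

def tick(state: State, shots: List[Tuple[int, int]]) -> State:
    """One sim step: phases always run in this sequence, and each phase
    visits entities in ascending id order, whatever order they arrived in."""
    new = dict(state)
    for sid in sorted(new):               # phase 1: movement (+1 per tick)
        x, hp = new[sid]
        new[sid] = (x + 1, hp)
    for target, dmg in sorted(shots):     # phase 2: shots hit
        if target in new:
            x, hp = new[target]
            new[target] = (x, hp - dmg)
    for sid in sorted(new):               # phase 3: remove dead ships
        if new[sid][1] <= 0:
            del new[sid]
    return new

def checksum(state: State) -> str:
    """Hash of the state in canonical (sorted) form, for cross-machine comparison."""
    canon = ";".join(f"{sid}:{x}:{hp}" for sid, (x, hp) in sorted(state.items()))
    return hashlib.sha256(canon.encode()).hexdigest()[:16]
```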
