Topic: [Solved] Game Stop at 1:02 (Read 10366 times)

Spikey00

  • Lord of just 5 Colony Ships
  • Master Member Mark II
  • *****
  • Posts: 1,704
  • And he sayeth to sea worm, thou shalt wriggle
[Solved] Game Stop at 1:02
« on: November 11, 2009, 05:46:29 pm »
Game Stopping at 1:02 (Waiting for Players)
Vaos and I-KP have tried playing together, and later on I joined them.

The phenomenon is that in every game, the game halted at 1:02, exactly one minute and two seconds in.  When I joined, it still stopped at 1:02, but when Vaos and I played on our own we did not experience the problem (meaning we only had the problem when I-KP was in the game).

Same game versions, expansion disabled in the lobby.


No crash, just "Waiting for Players".  I have no personal input on why this may be happening.
« Last Edit: November 13, 2009, 11:06:31 pm by x4000 »
I'd take a sea worm any time over a hundred emotionless spinning carriers.
irc.appliedirc.com / #aiwar
AI War Facebook
AI War Steam Group

x4000

  • Chris McElligott Park, Arcen Founder and Lead Dev
  • Arcen Staff
  • Zenith Council Member Mark III
  • *****
  • Posts: 31,651
Re: Game Stop at 1:02
« Reply #1 on: November 11, 2009, 05:49:25 pm »
Please see this thread, and let me know if you have any more info, if you don't mind.
Have ideas or bug reports for one of our games?  Mantis for Suggestions and Bug Reports. Thanks for helping to make our games better!

Spikey00
Re: Game Stop at 1:02
« Reply #2 on: November 11, 2009, 05:57:52 pm »
All I can say is that there were at least two games attempted by Vaos and I-KP, both ending up with this issue.  There was also one game with me that hit the same problem.

All connections went through Hamachi; the stall happened very early in the game (1:02).


Vaos + Me = No issue.
Vaos + I-KP = Problem.
Vaos + I-KP + Me = Problem.

Everything was fine until 1:02 (at least with my game with them both).

x4000
Re: Game Stop at 1:02
« Reply #3 on: November 11, 2009, 05:59:36 pm »
Without more information, there is absolutely nothing I can do, unfortunately.  I need to know if there were errors logged on one of your machines, or if there was funky happenstance with the networking (via the Shift+F3 method).  As it is, I don't have any actionable information.

Vaos

  • Full Member
  • ***
  • Posts: 107
Re: Game Stop at 1:02
« Reply #4 on: November 11, 2009, 06:04:05 pm »
Do you want a recording of the Shift-F3 menu from start to end? We're using 2.001D, by the way.

Edit: I recorded from when we started the game until it stopped. Encoding/uploading that.
« Last Edit: November 11, 2009, 06:17:27 pm by Vaos »

Vaos
Re: Game Stop at 1:02
« Reply #5 on: November 11, 2009, 06:32:17 pm »
Here is the video. Link (Save as)

Spikey00
Re: Game Stop at 1:02
« Reply #6 on: November 11, 2009, 06:37:33 pm »
Mysteriously, all three of us could still chat in-game... There are no error logs because the game didn't crash.

Along with Vaos' video, here are a few screenshots of another game (same result, same players).


http://img5.imageshack.us/g/aiwar2009111116055887.png/

I-KP

  • Hero Member
  • *****
  • Posts: 681
  • Caveat Pactor
Re: Game Stop at 1:02
« Reply #7 on: November 11, 2009, 07:06:38 pm »
What isn't shown in the above screenshots and video (both gratefully compiled by the erstwhile 1-minute-duel compatriots, V and S) is that from my point of view everything ran fine, with no lag, and played almost as if local (aside from the last 1-minute game with S, which did lag considerably).  But once the game paused with "Waiting For Players..." at the 62-second mark, the 'Withheld Messages' count climbed to about 30, then the 'Stored Messages' count climbed to about 20, and then the host reset the server.  This event was repeatable pretty much to the second.

HTH
« Last Edit: November 11, 2009, 07:08:24 pm by I-KP »
Atmospheric & Lithospheric Reticulator,
Post-accretion Protoplanet Aesthetic Seeding Team,
Celestial Body Design & Procurement Division,
Magrathea Pan-Galactic Planets Corp.,
Magrathea.

x4000
Re: Game Stop at 1:02
« Reply #8 on: November 12, 2009, 12:15:55 am »
Okay, so what is going on is basically the same issue as here:  http://arcengames.com/forums/index.php/topic,2096.0.html and here: http://arcengames.com/forums/index.php/topic,2023.0.html

The timing of this is consistently at 1:02 or so because that's when the AI first starts issuing commands, and when it issues a significant number of commands at once, that causes someone's network card, router, or something else to get overloaded and drop critical information, which stalls the game.  Other than these three cases, I've not heard of this issue elsewhere -- other players play under much more trying conditions (cross-continent with more players) without incident, so it's not simply a bug with the game.  It's possible there is an issue with the network library, but I've been through that code (it's open source), and I've also talked to the library author, and it really seems unlikely.

My general comments here are the same as from the other threads:

I've been emailing with the author of the network library, and he had this to say initially:

Quote
It's hard to say what the problem is; but if withheld packets are growing, it means either an earlier packet is not coming thru for some reason, or, possibly more likely, that an acknowledge isn't coming thru in the other direction.

Let's assume A is communicating with B; and suddenly the host at A is starting to queue up withheld messages. This could mean that B has lost the ability to successfully send messages (such as acknowledges) to A, but A can still send messages to B. It will mean tho, that both peers will time out their connection after a while. This may be the result of one host switching IP or losing router information when behind NAT.

It should be easy to check if this is really the case (unidirectional failure in the "link") - if other packets are coming thru, it's not the case.

Any kind of exception in the library could also be a problem; skipping the acknowledge part or something - but the log should show if this is the case.
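To make the withheld-message behavior in the quote concrete, here is a minimal Python sketch of a receiver for reliable, ordered messages. It assumes each message carries a consecutive sequence number; the class and method names are hypothetical and are not the network library's actual API. A message that arrives ahead of a missing earlier one is withheld until the gap is filled, so a single lost message makes the withheld count climb even while later traffic keeps arriving.

```python
class OrderedReceiver:
    """Delivers messages strictly in sequence order (hypothetical sketch)."""

    def __init__(self) -> None:
        self.next_expected = 0              # sequence number that must be delivered next
        self.withheld: dict[int, str] = {}  # arrived early, waiting for a gap to be filled
        self.delivered: list[str] = []      # handed to the game, in order

    def on_message(self, seq: int, payload: str) -> None:
        if seq < self.next_expected or seq in self.withheld:
            return  # duplicate of something already seen
        self.withheld[seq] = payload
        # Release every message that is now contiguous with what was delivered.
        while self.next_expected in self.withheld:
            self.delivered.append(self.withheld.pop(self.next_expected))
            self.next_expected += 1


rx = OrderedReceiver()
for seq in range(40):
    if seq == 5:
        continue  # message 5 is lost on the wire
    rx.on_message(seq, f"cmd-{seq}")

print(len(rx.delivered), len(rx.withheld))  # 5 34
```

If a resent copy of message 5 finally arrives, the whole backlog is released at once; if it never does, the backlog only grows, which is the pattern of a climbing 'Withheld Messages' count.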

So, this pretty much lines up with what I was thinking, except that it could also be that ACKs from the client are not making it back to the server (for a transmission to be complete, the packet not only has to be sent, but also has to be acknowledged as received, given that this is built on an "unreliable" protocol).
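The acknowledgement side can be sketched the same way. In this hypothetical Python model (not the library's real API, and assuming "stored" means sent but not yet acknowledged), the sender keeps every message stored until the peer acknowledges it and periodically resends whatever is still stored. If ACKs stop getting back, the packets-sent counter keeps rising because of the resends, while the stored count grows as well.

```python
import itertools


class ReliableSender:
    """Keeps each sent message stored until the peer acknowledges it (hypothetical sketch)."""

    def __init__(self) -> None:
        self._next_seq = itertools.count()
        self.stored: dict[int, str] = {}  # sent but not yet acknowledged
        self.packets_sent = 0             # first transmissions plus resends

    def send(self, payload: str) -> int:
        seq = next(self._next_seq)
        self.stored[seq] = payload
        self.packets_sent += 1
        return seq

    def on_ack(self, seq: int) -> None:
        self.stored.pop(seq, None)  # acknowledged, so it never needs resending

    def resend_unacked(self) -> None:
        # A real library would do this on a retry timer; here it is called by hand.
        self.packets_sent += len(self.stored)


tx = ReliableSender()
for i in range(30):
    seq = tx.send(f"cmd-{i}")
    if i < 10:
        tx.on_ack(seq)  # the first 10 ACKs arrive; after that, ACKs are lost
tx.resend_unacked()

print(tx.packets_sent, len(tx.stored))  # 50 20
```

This is why a rising sent count on its own does not prove the link is healthy in both directions.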

I checked out your video again to see if it looks like the client has stopped sending ACKs to the server, but the sent and received message/packet counts keep climbing for the server, even while the Withheld message count also climbs.  I'm still talking with the network library author, but so far this is pretty puzzling, especially since you don't have any error messages logged.  Can you be sure to check on both the client and the server machines to make sure there is nothing at all being logged?  I know you checked the server, but if the client is the one with some sort of exception occurring, then there would be an error on the client side only.  Be sure to check the Game Data Directory in the Settings window on both machines, since the location where those errors are logged might differ between the two machines, especially if you are on two different OSes.

It is still looking like a router is the most likely culprit, but it's hard to be certain of much at the moment.  I'll let you know when I hear back from the network library author again.

So here's what to try:

I wonder about QoS logic on the routers, or a bad driver for one of the network cards.

Generally speaking, the first thing I would do with that sort of thing would be to make sure that the network drivers for the cards on both computers are up to date.  You might take a look at the router configuration and see if there is some sort of application-level content filtering going on, or if it has QoS services turned on (common when on a VOIP-enabling router, like those from Vonage).  If so, you might try turning those features off to see if that helps. 

You might also try looking to see if there is a later firmware version for the routers (often that is simple to update, but there may be risks that the manufacturer will inform you of).  Sometimes that can make for an improvement of your networking in general, as bugs do happen even on router firmware, and that would potentially resolve the issue here if it's related to outdated router firmware.  I've never heard of that issue happening with AI War before, but I have heard of it being an issue with certain other games and software.  Ironically, in the links for the first two games there, router firmware updates didn't solve the problems, but often it does, and hopefully it will here.

My best bet in your specific case is that the network card or router on the host is the problem, but it's a good idea to check those out for all the participants to make sure.  A few other notes for your specific case:

1. Chat uses a different sort of sending, so wouldn't be affected by the main game logic being held up.

2. The fact that this happens with three players but not with two is again indicative of the host router/NIC being the issue.  Basically, twice as much data has to be sent when you have two clients instead of one, and it seems the router or NIC is dropping some data in that case.
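The scaling claim in point 2 is simple arithmetic if we assume, as the post implies, that the host sends the same game stream separately to each client. The 400 kbps figure below is a made-up placeholder for the size of the spike, not a measured AI War number.

```python
def host_upload_kbps(clients: int, per_client_kbps: float) -> float:
    """Upload the host needs if it sends the same stream to each client separately."""
    return clients * per_client_kbps


SPIKE_KBPS = 400.0  # hypothetical per-client spike size, for illustration only

for clients in (1, 2):
    print(clients, host_upload_kbps(clients, SPIKE_KBPS))
# 1 400.0
# 2 800.0
```

A host link or NIC that copes with the first figure but not the second would fail only in the larger game, which is the pattern point 2 reasons about.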

It's definitely below the level of the game itself, and I'm almost positive it is also below the level of the network library used by the game.  Given that the people in the other two threads never got back to me about whether their problem was resolved, I can't be sure.  But this sort of thing happens with games other than just AI War, and it's usually outside the game itself (supported by having hundreds of players playing successfully, but a handful getting failures that seem to be below the network library level).  Hopefully the driver or firmware updates will fix this up for you guys, and you'll let me know, and then I'll have a definite answer on this (and you guys will be able to play).  If not, we'll see what we can figure out from there, but that's where my money is at the moment, so to speak.

Hope that helps!

I-KP
Re: Game Stop at 1:02
« Reply #9 on: November 12, 2009, 06:49:12 am »
Cheers for the swift and informative reply.

My router is a rock-steady Netgear DG384 (old but reliable) and is running the latest firmware.  I have no issues with any other kind of connection for games (or otherwise), be that via VPN, direct IP, in the wild, or wherever.  Hamachi is the latest version and seems to connect okay for other things.

At some point soon I'll try to connect to someone else's game (via IRC) to rule out a possible one-off anomaly, but there is nothing more I can do on my side to address what this might be.

I-KP
Re: Game Stop at 1:02
« Reply #10 on: November 12, 2009, 06:56:21 am »
Incidentally, I am playing Borderlands at the moment (one of the listed possible games where this kind of error can occur) with no issue.

x4000
Re: Game Stop at 1:02
« Reply #11 on: November 12, 2009, 09:34:39 am »
No problem.

Bear in mind that the network requirements for most games are lower than AI War's -- many of them can be played over non-broadband connections or close to it, which is of course not the case for AI War.  There's a lot more data being transmitted in AI War, during spikes in particular (as with the one causing the foul-up here).  Borderlands and other action-oriented games use a different style of networking, where there is basically a long, constant stream of a smaller amount of data.  Most FPSes are not deterministic, which is why players can seem to "jump" from place to place during heavy lag, for instance.

An RTS game is typically a deterministic affair, which means that all of the computers involved are running the relevant portions of the simulation in lock-step, and if one gets out of sync they all have to stop.  This also keeps the amount of data being passed around lower in general, relative to how much is changing in the game simulations.  Given the increase in scale with AI War (games that are typically 10x to 100x larger than any other RTS I'm aware of), the load on the network is attendantly larger -- we combat that with compression and other tricks that often bring it down to only 2x to 4x more load on the network, but it is still a ton of data.
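The lock-step idea can be sketched in a few lines of Python. This is a generic illustration, not AI War's actual code, and all names are hypothetical; it assumes each player sends its commands for a given turn and that no machine advances until it has every player's commands for that turn. Only commands travel over the network, and a single missing message stops everyone.

```python
from collections import defaultdict


class LockstepSimulation:
    """Advances a turn only once every player's commands for it have arrived."""

    def __init__(self, players: list[str]) -> None:
        self.players = players
        self.turn = 0
        self.inbox: defaultdict[int, dict[str, list[str]]] = defaultdict(dict)
        self.applied: list[tuple[int, str, str]] = []  # (turn, player, command)

    def receive(self, player: str, turn: int, commands: list[str]) -> None:
        self.inbox[turn][player] = commands

    def waiting_on(self) -> list[str]:
        arrived = self.inbox[self.turn]
        return [p for p in self.players if p not in arrived]

    def try_advance(self) -> bool:
        if self.waiting_on():
            return False  # the "Waiting for Players" state
        commands = self.inbox.pop(self.turn)
        # Every machine applies the same commands in the same fixed order,
        # so the simulations stay identical without sending game state.
        for player in self.players:
            for cmd in commands[player]:
                self.applied.append((self.turn, player, cmd))
        self.turn += 1
        return True


sim = LockstepSimulation(["host", "client_a", "client_b"])
for p in sim.players:
    sim.receive(p, 0, ["move fleet"])
sim.receive("host", 1, [])
sim.receive("client_a", 1, [])  # client_b's turn-1 commands never arrive

print(sim.try_advance(), sim.try_advance(), sim.waiting_on())
# True False ['client_b']
```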

All of this background is basically to explain why a comparison with Borderlands or another game is apples-to-oranges.  As near as I can tell, AI War is in a league of its own when it comes to the amount of data being passed in a multiplayer game, so there is a decent chance of exposing an underlying network issue via this game that is not exposed via other "more intense" games like Borderlands.  Bear in mind that while Borderlands is more graphically intense, AI War's network needs are probably the more intense of the two.

It's not definite that the host is the problem; that was just the most likely candidate -- having a larger game means the networking load is attendantly greater on the clients as well, though not nearly doubled.  So it's possible that a driver or router issue on one of the clients is the cause.  As the host, you can see which client the game is waiting on (it says "WAIT" next to their name down in the scores area).  If it's waiting on all clients, the issue is most likely with the host itself.  The screenshots were from a non-host machine, so I can't pinpoint it from those.
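The host-side rule of thumb in this reply can be written down as a tiny helper. It assumes you have read the list of players marked "WAIT" off the host's scores area; the function name and its return strings are hypothetical, for illustration only.

```python
def likely_culprit(host: str, clients: list[str], waiting_on: list[str]) -> str:
    """Rule of thumb: one stalled client points at that client; all of them point at the host."""
    if not waiting_on:
        return "none: the game is not stalled"
    if set(clients) <= set(waiting_on):
        return f"{host}: the host is waiting on every client"
    return "client(s): " + ", ".join(waiting_on)


print(likely_culprit("host", ["client_a", "client_b"], ["client_b"]))
# client(s): client_b
print(likely_culprit("host", ["client_a", "client_b"], ["client_a", "client_b"]))
# host: the host is waiting on every client
```

With only one client the rule cannot tell the two cases apart, since that single client is also "every client".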

Hope that helps...

I-KP
Re: Game Stop at 1:02
« Reply #12 on: November 12, 2009, 10:55:01 am »
I only mentioned Borderlands because it was cited as being a candidate for this kind of problem in one of your quoted replies.

Having played Supreme Commander over the net without any trouble (aside from local performance issues - it is something of a monster after all) I'm not really seeing much in the way of evidence to suggest that my connection is now somehow incapable of handling heavy loads.  With the router running up to scratch and all other net traffic activities performing without fault I'm not really sure what else there is to try.

I'll still try and join a game with someone other than V and see how that goes.

x4000
Re: Game Stop at 1:02
« Reply #13 on: November 12, 2009, 11:31:20 am »
Quote
I only mentioned Borderlands because it was cited as being a candidate for this kind of problem in one of your quoted replies.

Fair enough -- I'm not trying to argue with you.  What I'm saying is that while a number of other games also have this sort of problem under lesser load, the fact that they don't have it for a given individual doesn't necessarily mean there are no problems, given the differing scale of AI War.

Quote
Having played Supreme Commander over the net without any trouble (aside from local performance issues - it is something of a monster after all) I'm not really seeing much in the way of evidence to suggest that my connection is now somehow incapable of handling heavy loads.  With the router running up to scratch and all other net traffic activities performing without fault I'm not really sure what else there is to try.

From a networking standpoint, SupCom passes a lot less data than AI War; it is a beast, for sure, but it was the game I was playing most immediately prior to creating AI War, and I've seen the difference first hand.  However, I'm also not suggesting that your network connection is incapable of handling heavy loads -- far from it.

What I'm saying is, when AI War's networking library dumps a lot of data into your network adapter, not all of that data is making it to the other end, and/or the other end isn't sending acks back.  So this may not be your issue at all, it may be the other end has a router or network card issue that is preventing those acks.  Some possibilities:

1. A firewall, QoS service on a router, or some other software sees the huge spike in transfer traffic and smothers it.  QoS can be implemented in a variety of ways, but it is often geared towards smoothing out spikes in network traffic to ensure that VOIP still runs smoothly, for instance (on the Vonage routers).  I doubt that any other game has quite the same sorts of spikes that AI War does, because usually you have X number of entities all moving around fairly constantly, rather than a low number of entities moving around and then a player suddenly giving 1000 or 4000 commands to units all at once (as with a big bulk move, or what the AI is doing here).  So depending on the QoS algorithms on the various firewalls in use, they may be actively disrupting your AI War session even though they are not "broken" and do not interfere with other network traffic.
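Point 1 is easiest to see with a token-bucket policer, one common building block for traffic policing and QoS. Real routers implement QoS in many different ways, so this Python sketch is a generic model with made-up rates, not the behavior of any particular router. A steady stream below the refill rate passes untouched, while a single burst of bulk commands overflows the bucket and is mostly dropped, even though the spiky pattern sends less data in total.

```python
class TokenBucketPolicer:
    """Generic token-bucket policer: packets beyond the available tokens are dropped."""

    def __init__(self, rate_per_tick: int, burst: int) -> None:
        self.rate = rate_per_tick  # tokens (packets) added each tick
        self.burst = burst         # bucket depth: the largest burst allowed through
        self.tokens = burst        # start with a full bucket

    def tick(self, arriving: int) -> tuple[int, int]:
        """Handle one tick's worth of packets and return (passed, dropped)."""
        self.tokens = min(self.burst, self.tokens + self.rate)
        passed = min(arriving, self.tokens)
        self.tokens -= passed
        return passed, arriving - passed


def total_dropped(pattern: list[int]) -> int:
    """Packets dropped for a traffic pattern given as packets per tick."""
    policer = TokenBucketPolicer(rate_per_tick=100, burst=300)
    return sum(policer.tick(n)[1] for n in pattern)


steady = [90] * 20                     # constant stream, 1800 packets in total
spiky = [10] * 10 + [1000] + [10] * 9  # quiet, then one bulk-command burst; 1190 in total

print(total_dropped(steady), total_dropped(spiky))  # 0 700
```

Nothing in the model is "broken": the policer does exactly what it is configured to do, and only the bursty traffic suffers.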

2. Some sort of faulty network driver can't cope with that amount of data being put into its queue, so the queue overflows and loses a bit of the data before passing on the rest.
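Point 2 describes a drop-tail overflow. As an illustration only (real drivers and NICs differ, and the 256-slot capacity is an arbitrary assumption), here is a bounded transmit queue in Python that silently discards packets once it is full.

```python
from collections import deque


class BoundedTxQueue:
    """Fixed-size transmit queue that silently drops new packets once full (drop-tail)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.queue: deque[str] = deque()
        self.dropped = 0

    def enqueue(self, packet: str) -> None:
        if len(self.queue) >= self.capacity:
            self.dropped += 1  # no error is raised anywhere; the packet just vanishes
        else:
            self.queue.append(packet)

    def drain(self, n: int) -> list[str]:
        """Hand up to n queued packets to the wire, oldest first."""
        return [self.queue.popleft() for _ in range(min(n, len(self.queue)))]


q = BoundedTxQueue(capacity=256)
for i in range(1000):  # a burst arrives before the link can drain any of it
    q.enqueue(f"pkt-{i}")
sent = q.drain(256)

print(len(sent), q.dropped)  # 256 744
```

Because the overflow raises no error, a loss like this would not necessarily leave anything in the game's own error log.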

3. Some sort of filtering at the ISP level is freaking out.

The underlying network library that we are using is supposed to cope with that sort of thing, but I haven't gotten much response from the library author about any potential issues there.  As far as he is concerned, I think, the issue is below his layer, since it works for the vast majority of people but some specific computers have issues.  This creates a lot of challenges for me, as thus far I can't disprove that it is someone's hardware, and I don't see any issues with his software or mine (the game itself is super simple in how it handles this, and all transmissions are identical, so it's almost guaranteed to be the network library or lower).

Quote
I'll still try and join a game with someone other than V and see how that goes.

Sounds good.  Again, there might be no problems on your end at all -- it's possible that one of the other players is the one with the QoS interfering, or bad network driver, or whatever issue.  From the sounds of it your stuff is well maintained and up to date, and the age of your router is more of a plus than anything in my mind (since QoS was less standard until recently).  Generally speaking, the vast majority of setups don't have any issues with AI War -- and in one of the other two reported instances of this issue, they were able to play just fine over the LAN, and saw less incidence of this over Hamachi compared to direct-over-internet connections.  So there again, that points to some sort of filtering or throttling or QoS at the router or ISP level, where something either is failing because of, or actively rebelling against, having a data stream with periodic spikes above normal.

This is a very tough sort of situation for me, to be honest, because while part of my background is as an IT admin, and I've got some experience coding TCP sockets and such, I'm by no means an expert on all things networking.  My expertise is in most of the other areas of creating games -- hence why I'm using an external networking library at all.  So I have to coordinate with the author of the network library for some of the nitty gritty with this, and his library is being used in a variety of products, including AI War, without any known incidence of failures of this sort.  By the same token, there's a ton of different hardware and driver software out there, all of which has its pluses and minuses, and different versions of each (some with bugs, some without), and different sorts of filtering software, ISP policies, firewalls of both the software and hardware variety... this makes it a real challenge to find out what the root problem actually is, and generally with this sort of issue (which shows up in various games for various reasons with various different hardware) it is something outside the game or network library itself.  It's the sort of thing where, if the game or network library couldn't handle it, it would fail for everyone or at least a majority of players, rather than an extremely tiny minority, right?  So the trick is figuring out what else is in the pipeline between the affected players, and what fix to recommend.  There's always the possibility of some sort of odd edge-case bug in my game or the network library, but I've been all through that to the extent that I can, and there's not much left for me or the network library author to check at the moment with the data we have.

I really hate it when these sorts of issues come up, because of course the tendency for a lot of companies is to point the finger of blame at some other company, and I hate doing that even when it is true (or likely true), because it looks bad in general.  So I always look internally first, but when all of those avenues seem to be exhausted, there's nowhere else to look but at the huge stack of other software (drivers, firewalls, etc.) that is responsible for transferring data.  Hope that makes sense, and I hope that the comments above about QoS help you diagnose the issue if it is indeed on your end, or perhaps the issue is on the client end and one of them can simply update their NIC driver or router firmware and get the issue solved that way...

I-KP
Re: Game Stop at 1:02
« Reply #14 on: November 12, 2009, 12:03:34 pm »
I'll try to get onto other hosts and see what comes of that. 

PS.  My router is a DG834, not a 384.  Probably an obvious typo but worth mentioning just in case.  And yes QoS was but a twinkle in its father's eye when this router came out.