Generally, if I'm balance testing, it's some of my stuff vs. some mobile AI force. So I just set it up on one planet and enter "warp in the clowns,Bomber,500" in the chat, and it sends in a wave of about 700 Mk I bombers (the exact number depends on cap scale; the third parameter is a fairly coarse one).
For more general balance questions, I usually rely on numerical/statistical analysis.
For more complex scenario testing, it's generally necessary to have a "valid" gamestate, i.e. one arrived at through more-or-less natural means (tons of "activate the omega", "engies", and "give me k" activations generally still fit within that). A scenario builder or the like is likely to generate gamestates that differ significantly from what players would actually see.
And tons of stuff can really only be tested if players give us a save where something is about to happen.
Even if it's something really simple or early, it's just impossible to know whether the key factor behind the oddness was one of the things left out of our attempted re-creation.