Author Topic: ProcGen Scoring Metrics  (Read 2046 times)

Offline Draco18s

  • Resident Velociraptor
  • Core Member Mark IV
  • *****
  • Posts: 3,925
ProcGen Scoring Metrics
« on: July 14, 2014, 01:15:42 PM »
I need a little help working on a tough nut.  Well, more like I have half of what I need, trying to work out the other half and that's what's giving me trouble.

What I need are fitness metrics for procedurally generated enemies.

Specifically, what are the various metrics I can use to score creeps by in order to divide them into rough classes, so that when a wave happens I can pick one or more classes, take the best unit from that class and that's the creep that gets spawned.

I want at least six different metrics, I've got...four.  I can figure out how to score basically any metric I want (i.e. assume that for any adjective I can write a function that produces a quantified value for it), it's more a matter of figuring out what those metrics are.

So right now I've got:
  • Distance Traveled (before dying)
  • Damage Dealt to towers (before dying)
  • Damage Taken (before dying)
  • Damage Avoided (before dying)
I want at least two more that fall into that broad category of "survival."  I've also got two others that would be in addition to those, "Cheapest" to make chaff units (acting as distraction targets), and "Utility" (healers, buffers, etc.) that aren't good at getting through the maze themselves, but which act as a force multiplier.  Unit cost isn't programmed in any capacity yet, but it is something that will be doable and eventually implemented.  Ditto buffing.  I've not actually gotten around to the ECS generated creep stuff yet, but I've got a large variety for towers and skipped over the "towers that buff other towers" for the time being.  So I'll be skipping over the same components for creeps on the first pass.
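
A rough sketch of the "score every creep, then take the best unit per class" idea, using the four metrics listed above. The `CreepScore` record, the creep names, and the numbers are all made up for illustration; the real scoring functions would fill these fields in.

```python
from dataclasses import dataclass

@dataclass
class CreepScore:
    name: str
    distance_traveled: float
    damage_dealt: float
    damage_taken: float
    damage_avoided: float

# The four metrics from the list above; more can be appended later.
METRICS = ("distance_traveled", "damage_dealt", "damage_taken", "damage_avoided")

def best_per_metric(scores):
    """Return {metric: name of the creep with the highest value for that metric}."""
    return {m: max(scores, key=lambda s: getattr(s, m)).name for m in METRICS}

# Hypothetical sample data, not real simulation output.
candidates = [
    CreepScore("runner", 95.0, 2.0, 10.0, 40.0),
    CreepScore("bruiser", 40.0, 30.0, 80.0, 5.0),
    CreepScore("dodger", 60.0, 5.0, 15.0, 70.0),
]
picks = best_per_metric(candidates)
```

Note that one creep can win more than one class (the "bruiser" here tops both damage metrics), so a real picker may want to skip a creep that is already assigned.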

Offline Mick

  • Hero Member Mark II
  • *****
  • Posts: 911
Re: ProcGen Scoring Metrics
« Reply #1 on: July 14, 2014, 02:55:38 PM »
Wouldn't damage taken be based directly on the creep's hit points? Or are there mechanisms that allow creeps to regenerate/heal?

If some creeps have the ability to heal other creeps, damage healed could be a metric. You could also make a distinction between self-healed damage (regeneration) and other-healed damage (support/healing) to get two different metrics.

Offline Draco18s

  • Resident Velociraptor
  • Core Member Mark IV
  • *****
  • Posts: 3,925
Re: ProcGen Scoring Metrics
« Reply #2 on: July 14, 2014, 03:23:28 PM »
Quote
Wouldn't damage taken be based directly on the creep's hit points? Or are there mechanisms that allow creeps to regenerate/heal?

There are going to be stats like armor and abilities like regen, yes.  So raw hitpoint total doesn't quite tell the whole story.

Quote
If some creeps have the ability to heal other creeps, damage healed could be a metric. You could also make a distinction between self-healed damage (regeneration) and other-healed damage (support/healing) to get two different metrics.

Heal-other falls into the "utility buff" category, which would also cover things like "makes other creeps faster" or "summons weak minions" or "stuns towers."  As such abilities are so diverse and wide ranging, it's better to classify them all along the same metric, rather than breaking each one out into its own metric.

Offline ptarth

  • Global Moderator
  • Hero Member Mark III
  • *****
  • Posts: 1,127
  • I'm probably joking.
Re: ProcGen Scoring Metrics
« Reply #3 on: July 14, 2014, 06:19:55 PM »
Can you describe the system you are trying to evaluate these for?

From the context and replies it seems to be:
  • Development of an intelligent system for the selection of most effective creeps
  • Tower defense game
  • Ranking creeps by characteristics

When are these metrics updated? Never? Per wave? Per mob death?
If they aren't updated, then you have to deal with creep metrics that are independent of the player's tower choices. For example, creeps facing a player who uses poison towers (low tick damage that is unblockable) versus a player who uses high-damage, low-firing-rate weapons. If they are updated, then when? If it is after every creep death, then doesn't that require a creep to be spawned and "evaluated" before it can be spawned again? If a creep is marked as very bad in its metrics initially, then it has a very low chance of being spawned again, which lets the player switch tower arrangements to be more efficient, given that they can manipulate the spawning logic. If you do some sort of universal per-wave manipulation, then you have to abstract out many things, for example, how effective an AOE healer creep is. I'm not sure that is very viable either.

In summary, more details about the problem please.
Note: This post contains content that is meant to be whimsical. Any belittlement or trivialization of complex issues is only intended to lighten the mood and does not reflect upon the merit of those positions.

Offline Draco18s

  • Resident Velociraptor
  • Core Member Mark IV
  • *****
  • Posts: 3,925
Re: ProcGen Scoring Metrics
« Reply #4 on: July 14, 2014, 08:21:26 PM »
I'll have to write a longer reply later.  The metrics themselves don't change, but the creep lists and their scores are updated every wave.

The goal is to adapt against the player's setup.

Offline keith.lamothe

  • Arcen Games Staff
  • Administrator
  • Zenith Council Member Mark III
  • *****
  • Posts: 19,318
Re: ProcGen Scoring Metrics
« Reply #5 on: July 15, 2014, 09:23:22 AM »
Bear in mind that the performance of a creep type may vary wildly with the types of other creeps in the wave and the ordering.

Also, varying the weighting of metrics from wave to wave can assist in procedurally finding the "chinks" in the player's defenses.  Like going for 100% Distance-Traveled one wave, and 100% Damage-Taken in the next.  And maybe in the one after that starting with a group of Damage-Taken creeps, followed by a group of Damage-To-Towers creeps, followed by a group of Distance-Traveled creeps.
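
To make the per-wave weighting concrete, here is a small sketch. It assumes each metric has already been normalized to a 0..1 score; the creep names, the numbers, and the `wave_weights` schedule are invented for the example.

```python
def weighted_score(metrics, weights):
    """Combine per-metric scores (assumed normalized to 0..1) using this wave's weights."""
    return sum(weights.get(name, 0.0) * value for name, value in metrics.items())

def pick_for_wave(creeps, weights):
    """creeps: {creep_name: {metric_name: score}}. Returns the top-scoring creep name."""
    return max(creeps, key=lambda name: weighted_score(creeps[name], weights))

# Hypothetical scores for two creep types.
creeps = {
    "runner": {"distance_traveled": 0.9, "damage_taken": 0.2},
    "tank": {"distance_traveled": 0.4, "damage_taken": 0.95},
}

# 100% Distance-Traveled one wave, 100% Damage-Taken the next.
wave_weights = [
    {"distance_traveled": 1.0},
    {"damage_taken": 1.0},
]
picks = [pick_for_wave(creeps, w) for w in wave_weights]
```

Mixed weights (say 0.5/0.5) would fall out of the same function, so the schedule can probe one metric at a time or blend them.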


Another survival-oriented-metric might be "time-on-target absorbed", in other words, how much dedicated attention from towers that creep took up before it died.  So if a single-target tower with a 0.5 second reload fired 3 shots at it, that's 1.5s total.  If it took 2 such towers, then 3s, and so on.  Possibly with multipliers for the total resource cost of the tower.  AOE towers might not count towards this at all, depending on the mechanic involved and whether it can really be saturated or not.
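
The arithmetic above, as a tiny sketch. The tuple layout and the cost multiplier argument are just one way to pass the data in; AOE towers are assumed to be filtered out by the caller.

```python
def time_on_target(engagements):
    """Sum the tower attention a creep absorbed.

    engagements: list of (reload_seconds, shots_fired, cost_multiplier),
    one entry per single-target tower that fired at the creep.
    """
    return sum(reload * shots * mult for reload, shots, mult in engagements)

# One tower with a 0.5 s reload firing 3 shots: 1.5 s.
one_tower = time_on_target([(0.5, 3, 1.0)])
# Two such towers: 3 s.
two_towers = time_on_target([(0.5, 3, 1.0), (0.5, 3, 1.0)])
```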

Another metric which seems weaker but might fill out the set is "time alive".  That's less important than distance traveled or how much tower time it soaked, but it helps catch creep types that are surviving by not being eligible for incoming fire, and even aside from that sometimes you can throw a player off by just having a lot of creeps still in the maze, etc.


Another thing which you might measure predictively rather than experientially is "is this creep type vulnerable to any of the towers covering the first X distance-units of the path?".  In the sense of those towers getting a multiplier or whatever.  A combination of low-vulnerability and damage-to-towers could make a good pick-style for the first group of maybe 20% of waves.
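
A possible shape for that predictive check, as a sketch only. The `Tower` fields, the idea of a creep having a set of "weaknesses", and the sample values are all assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass
class Tower:
    damage_type: str
    covers_from: float  # path distance where this tower's coverage starts
    covers_to: float    # path distance where its coverage ends

def vulnerable_early(creep_weaknesses, towers, first_x):
    """True if any tower covering the first `first_x` distance-units hits a weakness."""
    early = [t for t in towers if t.covers_from < first_x]
    return any(t.damage_type in creep_weaknesses for t in early)

# Hypothetical layout: a fire tower near the entrance, a poison tower further in.
towers = [Tower("fire", 0.0, 10.0), Tower("poison", 50.0, 60.0)]
fire_weak = vulnerable_early({"fire"}, towers, first_x=20.0)
poison_weak = vulnerable_early({"poison"}, towers, first_x=20.0)
```

Because nothing is simulated, this can run on every candidate before the more expensive scoring pass.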

Offline Draco18s

  • Resident Velociraptor
  • Core Member Mark IV
  • *****
  • Posts: 3,925
Re: ProcGen Scoring Metrics
« Reply #6 on: July 15, 2014, 10:34:18 AM »
Quote
When are these metrics updated? Never? Per wave? Per mob death?
If they aren't updated, then you have to deal with creep metrics that are independent of the player's tower choices. For example, creeps facing a player who uses poison towers (low tick damage that is unblockable) versus a player who uses high-damage, low-firing-rate weapons. If they are updated, then when? If it is after every creep death, then doesn't that require a creep to be spawned and "evaluated" before it can be spawned again? If a creep is marked as very bad in its metrics initially, then it has a very low chance of being spawned again, which lets the player switch tower arrangements to be more efficient, given that they can manipulate the spawning logic. If you do some sort of universal per-wave manipulation, then you have to abstract out many things, for example, how effective an AOE healer creep is. I'm not sure that is very viable either.

The basic plan is to run evaluation tests in the background between waves, while the player is building towers.  So while the creeps will always be compared against what the player is doing, the information the game reacts to will be slightly out of date.  How far out of date depends on how many waves into the future the plans are made (not yet determined), and the evaluation explicitly won't take into account the towers the player just built.

Creeps will be continually generated and have their metrics scored.  There's little to no need to worry about having a creep get devalued and "never spawned again" as there's not going to be much "ongoing memory of what we tried before."  Just "here's 30 random results, score them."  They might be the same, they might be different.  Next time we score, we throw out everything we have and start from scratch.  Or I might throw out the "oldest 30" and generate a new 30 and score a running collection of 90.*

*How many I can score at a given time isn't known yet.  Based on previous experience I should be able to simulate an entire wave in a few fractions of a second.  I'll have to refactor some code from what I have right now, but I'm prepared to do that.
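
The "throw out the oldest 30, generate a new 30, score a running collection of 90" variant could look something like this. `generate_creep` is a hypothetical stand-in for the real generator, and the stat names are invented.

```python
import random
from collections import deque

BATCH_SIZE = 30   # "30 random results" per scoring pass
POOL_BATCHES = 3  # 3 x 30 = the running collection of 90

def generate_creep(rng):
    """Hypothetical stand-in for the real generator: a creep is just a stat dict here."""
    return {"speed": rng.uniform(1.0, 3.0), "hp": rng.randint(10, 100)}

# A deque with maxlen drops its oldest batch automatically when a new one is appended.
pool = deque(maxlen=POOL_BATCHES)
rng = random.Random(0)
for _wave in range(5):
    pool.append([generate_creep(rng) for _ in range(BATCH_SIZE)])
    # Everything currently eligible for scoring before the next wave.
    candidates = [creep for batch in pool for creep in batch]
```

Setting `POOL_BATCHES = 1` gives the "start from scratch every time" version with no other changes.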

Quote
Bear in mind that the performance of a creep type may vary wildly with the types of other creeps in the wave and the ordering.

I am certainly aware!  Just that there isn't a simple method of testing for stuff like that.  Changing one factor can completely throw a wrench in what worked well, so figuring out why a given mix worked is an impossible task, programmatically.

Quote
Also, varying the weighting of metrics from wave to wave can assist in procedurally finding the "chinks" in the player's defenses.  Like going for 100% Distance-Traveled one wave, and 100% Damage-Taken in the next.  And maybe in the one after that starting with a group of Damage-Taken creeps, followed by a group of Damage-To-Towers creeps, followed by a group of Distance-Traveled creeps.

That's why I want at least six different metrics. :)  The point is to only send 2 to 4 different creep types per wave, so with six metrics, that does exactly that.  As well as mixing up the order.  In a multi-path map, while you'll be getting creep types A, B, and C the different paths might get them in a different order.  ABC vs. CAB vs. BCA.
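
Sketching that out: pick 2 to 4 of the six metrics for a wave, then give each path a rotated order of the resulting groups. The metric list here assumes the two extra slots get filled by "time on target" and "time alive" from earlier in the thread, which isn't decided; the function names are made up.

```python
import random

# Four original metrics plus two assumed additions to reach six.
METRICS = [
    "distance_traveled", "damage_dealt", "damage_taken",
    "damage_avoided", "time_on_target", "time_alive",
]

def pick_roles(rng, metrics, low=2, high=4):
    """Choose between `low` and `high` distinct metrics for this wave."""
    return rng.sample(metrics, rng.randint(low, high))

def path_orders(groups, path_count):
    """Give each path the same groups, rotated by one position per path."""
    n = len(groups)
    return [groups[i % n:] + groups[:i % n] for i in range(path_count)]

roles = pick_roles(random.Random(1), METRICS)
orders = path_orders(["A", "B", "C"], 3)
# orders is [["A", "B", "C"], ["B", "C", "A"], ["C", "A", "B"]]
```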

Quote
Another survival-oriented-metric might be "time-on-target absorbed", in other words, how much dedicated attention from towers that creep took up before it died.  So if a single-target tower with a 0.5 second reload fired 3 shots at it, that's 1.5s total.  If it took 2 such towers, then 3s, and so on.  Possibly with multipliers for the total resource cost of the tower.  AOE towers might not count towards this at all, depending on the mechanic involved and whether it can really be saturated or not.

Ahh, good good.  I think that might end up being some kind of hybrid between "damage taken" and "damage avoided" but it is a different score.  Worth seeing the results to determine if it's a useful metric at the very least.

Quote
Another metric which seems weaker but might fill out the set is "time alive".  That's less important than distance traveled or how much tower time it soaked, but it helps catch creep types that are surviving by not being eligible for incoming fire, and even aside from that sometimes you can throw a player off by just having a lot of creeps still in the maze, etc.

That one might be a hard metric to actually test for, but I'm writing it down nonetheless.  It's definitely a metric that depends on other creeps being around.  I'll think on how to test for that.  Might end up being tested during a wave, so that it's a creep that shows up from the last wave based on that metric, while the others get shuffled about.

Quote
Another thing which you might measure predictively rather than experientially is "is this creep type vulnerable to any of the towers covering the first X distance-units of the path?".  In the sense of those towers getting a multiplier or whatever.  A combination of low-vulnerability and damage-to-towers could make a good pick-style for the first group of maybe 20% of waves.

Multipliers are based on an elemental system: water beats fire beats wood, etc.  Elemental types won't be considered during the testing phase anyway, as that'll be a modifier that will get applied to the group such that a given group isn't always the same elemental type.  So you'll get "Fire bruisers" and "water bruisers" in the same miniwave, and while that'll give the creeps slightly different stats (fire increasing damage, earth creeps having more health) it'll provide a mix of elements to fill that particular role.

I'm also pondering a modifier based on which metric a creep scores well in.  So the creeps that get a good score in Distance Traveled and are chosen as the "fast creep" (regardless of their actual speed value) might get a modifier that boosts their armor, allowing them to survive just a little longer.  "Damage dealt" creeps might get a boost to their speed, allowing them to get to a target sooner.  And so on: the modifier doesn't directly affect the attribute primarily responsible for the high score in the metric (so "damage taken" doesn't get a boost to HP) but still allows them to perform better in the category.
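
As a sketch, that could be a simple lookup from the metric a creep was picked for to the stat that gets boosted. The 1.25 factor and the stat names are placeholders, not tuned values.

```python
# Role -> (stat to boost, multiplier). Only the two pairings described above
# are filled in; the multiplier is an arbitrary placeholder.
ROLE_BONUS = {
    "distance_traveled": ("armor", 1.25),
    "damage_dealt": ("speed", 1.25),
}

def apply_role_bonus(stats, role):
    """Return a copy of `stats` with the role's bonus applied, if the role has one."""
    boosted = dict(stats)
    if role in ROLE_BONUS:
        stat, factor = ROLE_BONUS[role]
        boosted[stat] = stats[stat] * factor
    return boosted

base = {"armor": 8.0, "speed": 4.0, "hp": 50.0}
fast_creep = apply_role_bonus(base, "distance_traveled")  # armor goes up, speed untouched
damage_creep = apply_role_bonus(base, "damage_dealt")     # speed goes up
```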

Offline ptarth

  • Global Moderator
  • Hero Member Mark III
  • *****
  • Posts: 1,127
  • I'm probably joking.
Re: ProcGen Scoring Metrics
« Reply #7 on: July 15, 2014, 04:21:39 PM »
I'd like to propose a thought exercise to consider.
  • Assumptions
    • Assume that the player makes good build choices.
    • Assume that the AI makes good creep distribution choices.
  • Development
    • The player chooses the most powerful defenses available to them, given the playing field.
    • The AI chooses the creeps that are best suited to defeating those defenses. Its "decisions" don't incorporate new defenses.
    • The player chooses new defenses based upon the current or previous creep waves (these would be defenses that are strong against creeps that are strong against their previous defenses).
    • On the next wave the AI then adapts to the defenses built during the previous wave.
    • Return to step 1 and repeat.
  • Experience Analogy
    • This is in contrast to typical Tower defense games which present the challenge as a puzzle.
      • You have X waves consisting of K, L, M units at A, B, C frequencies.
      • Solve for a reasonable defense.
      • In this case you try to build a defense that just accomplishes X damage, however you want.
    • In this version however, we would get cyclic behaviors, the length of the cycle depending upon the type of balance (e.g., rock, paper, and scissors versus infantry, ranged, and cavalry vs fire, water, earth, and wind vs strong and fast, etc)
    • So you'd get into a pattern wherein it would be: tower 1: high aoe damage, tower 2: rapid fire damage, tower 3: Armor piercing, return to tower 1.
    • Is that what you want and is that fun?
    • Gratuitous Tank Battles does something like this.
      • The best strategy is to hold a reserve of money and alternate between defense types, entirely changing them from one to another. Then, when the AI runs out of money, spend your savings on units that are the exact opposite of what you already have.
      • I don't really enjoy Gratuitous Tank Battles, even though it has seemingly all the parts that would make a good Tower Defense game.
      • I believe the problem is that you don't develop a strategy that is unique to you, instead you develop this cyclic system and you don't feel invested or like you have choices.
      • For example, you can't base your defenses on AOE damage, because the AI will send anti-AOE troops. You have to follow the cyclic pattern.

Offline Draco18s

  • Resident Velociraptor
  • Core Member Mark IV
  • *****
  • Posts: 3,925
Re: ProcGen Scoring Metrics
« Reply #8 on: July 15, 2014, 04:47:25 PM »
I'll certainly look at that being a problem.  I do agree that it is a pitfall and do agree that it wouldn't be fun.

I am not going to be writing the system to go "oh there's a lot of AOE, I need AOE immune units," but more like "this unit survived well when used, let's use it."  Not because it counters what the player built, but because it provides a reasonable challenge to the player.  I definitely don't want this to happen (I had to leave the computer running for four hours unattended once that was built before the game ended).

I'm trying to aim for something that falls somewhere between Dungeon Defenders and Loadout: strong emphasis on making choices and sticking with them, but a wide variety of choices to make, also knowing that your opponent can do the same.

The reason I said to not consider the towers the player just built is twofold:
1) the player should have a reasonable expectation that the computer is not all-knowing
2) inability to process the changes quickly enough.

Offline ptarth

  • Global Moderator
  • Hero Member Mark III
  • *****
  • Posts: 1,127
  • I'm probably joking.
Re: ProcGen Scoring Metrics
« Reply #9 on: July 15, 2014, 04:52:00 PM »
  • I'm assuming that the goal is to develop the best performing AI, and not that you want to implement a system (for whatever reason).
  • You claim that evaluating complex cases is impossible, so the individual metrics, taken from units in isolation, may not be that useful.
  • Given that you can simulate a run very quickly, why not do away with algorithmic unit selection and go with simulation-based selection?
  • Generate X cases of randomly generated creep waves consisting of Z different creeps in Y different sequences.
  • Score a creep wave based upon how many get through the defenses (or how far it got before dying).
  • Pick the best creep wave after all X cases have been simulated.
  • Alternatively, you can develop a neural network with several layers of hidden nodes. Use the previous procedure to identify the better creeps to use in conjunction given a defense, and then use that information to create more targeted creep waves to test than the random selection initially proposed.
  • Another approach would be an iterative genetic algorithm to creep-wave selection.
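
The generate / score / pick-best loop from the bullets above, in sketch form. `toy_simulate` is a placeholder for the real wave simulation (here it just pretends "B" creeps are the ones that leak), and all names are made up for the example.

```python
import random

def random_wave(rng, creep_types, size):
    """Build one random wave: `size` creeps drawn from the available types."""
    return [rng.choice(creep_types) for _ in range(size)]

def best_wave(simulate, creep_types, cases, size, seed=0):
    """Generate `cases` random waves, score each with `simulate`, return the best one."""
    rng = random.Random(seed)
    waves = [random_wave(rng, creep_types, size) for _ in range(cases)]
    return max(waves, key=simulate)

def toy_simulate(wave):
    """Stand-in for the real simulation: counts how many creeps 'got through'."""
    return wave.count("B")

best = best_wave(toy_simulate, ["A", "B", "C"], cases=50, size=6)
```

The only game-specific piece is `simulate`; it could return creeps leaked, or total distance traveled before dying, without changing the selection code.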

  • With the metric approach, we create variables that we think are important for creep wave success. However, since creep wave success is really due to complex interactions between different creep waves, the metrics aren't capturing the important aspects of the simulation.

  • The only other metrics that come to mind are a variety of cost-effectiveness metrics.
  • One example would be a variant on Keith's "time-on-target-absorbed" metric, wherein not only do we factor in the $value of the towers kept busy, but we also factor in the cost of the unit we send AND the monetary gain the player gets.
  • Effectively, the creeps are hard to kill and don't give the player very much money for doing so.
  • The opposite of this would be the point effectiveness metric for the attacker. Weighting metrics based upon how much the attacker would get for successfully getting the unit through the defenses.
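
One possible reading of the cost-effectiveness variant, as a sketch. The ratio used here (tower value kept busy, divided by what the creep costs the attacker plus what it pays the defender) is an assumption; the post doesn't specify how the factors combine.

```python
def cost_effectiveness(busy_seconds, tower_cost, creep_cost, bounty):
    """Tower-value-seconds soaked per unit of money the creep costs or gives away.

    busy_seconds: time-on-target the creep absorbed
    tower_cost:   value of the towers kept busy
    creep_cost:   what the attacker paid for the creep
    bounty:       what the player earns for killing it
    """
    spent = creep_cost + bounty
    if spent <= 0:
        raise ValueError("creep_cost + bounty must be positive")
    return (busy_seconds * tower_cost) / spent

# A cheap, low-bounty creep that ties up an expensive tower scores high.
score = cost_effectiveness(busy_seconds=3.0, tower_cost=100, creep_cost=20, bounty=10)
```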

Offline Draco18s

  • Resident Velociraptor
  • Core Member Mark IV
  • *****
  • Posts: 3,925
Re: ProcGen Scoring Metrics
« Reply #10 on: July 15, 2014, 05:07:29 PM »
Quote
I'm assuming that the goal is to develop the best performing AI, and not that you want to implement a system (for whatever reason).
You claim that evaluating complex cases is impossible, so the individual metrics, taken from units in isolation, may not be that useful.
Given that you can simulate a run very quickly, why not do away with algorithmic unit selection and go with simulation-based selection?
Generate X cases of randomly generated creep waves consisting of Z different creeps in Y different sequences.
Score a creep wave based upon how many get through the defenses (or how far it got before dying).
Pick the best creep wave after all X cases have been simulated.
Alternatively, you can develop a neural network with several layers of hidden nodes. Use the previous procedure to identify the better creeps to use in conjunction given a defense, and then use that information to create more targeted creep waves to test than the random selection initially proposed.
Another approach would be an iterative genetic algorithm to creep-wave selection.

I am not looking for the "best performing AI."  I'm looking for a "Good enough AI."
The complex case is not difficult to evaluate, but rather difficult for an automated system to determine what went right and what could do better.  Which is why I broke the problem down into something that is much easier to test and measure.  It might not be perfect but it should be good enough.

The problem with the genetic version is that the creeps here aren't...genetically defined.

Offline ptarth

  • Global Moderator
  • Hero Member Mark III
  • *****
  • Posts: 1,127
  • I'm probably joking.
Re: ProcGen Scoring Metrics
« Reply #11 on: July 15, 2014, 07:11:22 PM »
There are two issues: 1. understanding what made a creep wave successful and 2. making a successful creep wave. If you only care about 2, then 1 isn't important. As I understand your position, you want to do 1 so that you can then do 2 (or possibly make a series of "interesting" creep waves). On the other hand, if you create X permutations of creep waves and just evaluate them based upon success rate, you can skip right ahead to making "good enough" creep waves.

In a genetic algorithm approach, the unit of interest isn't the characteristics of a creep. It is the makeup of the creep wave. So you run creep waves of AAA, BBB, CCC, DDD. You find that AAA & CCC did the best. You then run permutations of AAA and CCC, so AAC, ACA, CAA, etc. By starting with a reasonable, but not exhaustive initial population of creep waves, and using the genetic iterations to refine the best ones, you can then end up with a "good enough" creep wave.
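
A minimal sketch of that loop, treating a wave as a list of creep-type letters. The keep-the-top-two selection, single-point crossover, and one-slot mutation are just one common set of choices, and `toy_fitness` is a placeholder for the real wave simulation.

```python
import random

def mutate(wave, creep_types, rng):
    """Swap one randomly chosen slot for a random creep type (any type, not only winners)."""
    child = list(wave)
    child[rng.randrange(len(child))] = rng.choice(creep_types)
    return child

def crossover(a, b, rng):
    """Single-point crossover: front of one parent, back of the other."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]

def evolve(population, fitness, creep_types, generations, rng, keep=2):
    """Keep the best `keep` waves each generation and refill with mutated offspring."""
    for _ in range(generations):
        parents = sorted(population, key=fitness, reverse=True)[:keep]
        children = []
        while len(parents) + len(children) < len(population):
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, rng), creep_types, rng))
        population = parents + children
    return max(population, key=fitness)

def toy_fitness(wave):
    """Stand-in for simulating the wave: rewards mixing A and C."""
    return 3 * len(set(wave) & {"A", "C"}) + wave.count("A")

types = ["A", "B", "C", "D"]
start = [list("AAA"), list("BBB"), list("CCC"), list("DDD")]
best = evolve(start, toy_fitness, types, generations=20, rng=random.Random(0))
```

Because the parents survive each generation unchanged, the best score never gets worse from one generation to the next.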

Offline Draco18s

  • Resident Velociraptor
  • Core Member Mark IV
  • *****
  • Posts: 3,925
Re: ProcGen Scoring Metrics
« Reply #12 on: July 15, 2014, 07:38:06 PM »
Oh, I see.  Yes, the genetic thing would work for the wave that way.  And probably quite well.

What I remember of genetic algorithms is that even though A and C did well, you still want to keep B and D in the mix.

Offline ptarth

  • Global Moderator
  • Hero Member Mark III
  • *****
  • Posts: 1,127
  • I'm probably joking.
Re: ProcGen Scoring Metrics
« Reply #13 on: July 15, 2014, 08:42:20 PM »
That's the idea. There are bunches of variations on how you do selection and permutation. The basic idea is just selection of a "better" group of agents, mutation of that group, and then iterate.