One thing that could possibly be done, though I know very little about it, is to apply some sort of adaptive audio effect to the voices (or maybe the sounds too).
That is, apply some modulation at runtime to a smaller selection of sound files to produce a wider variety of sounds. I know that lots of games do this in situations where a single sound effect plays many times in a row, to avoid it sounding too samey. For example, footstep or gunfire sounds might be slightly modulated each time they're played, so it doesn't sound like you're just playing one sound effect over and over.
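As a rough illustration of the idea (this is just a sketch, not how any particular engine does it), a tiny random pitch shift can be applied by resampling the clip by a small random factor each time it's played. The `vary_pitch` helper and the sine-tone "footstep" stand-in below are made up for the example; a real game would do this on the mixer thread with proper resampling.

```python
import random
import numpy as np

def vary_pitch(samples: np.ndarray, max_semitones: float = 1.0) -> np.ndarray:
    """Nudge a clip's pitch by resampling it by a small random factor.

    A shift within +/- one semitone is barely noticeable on its own,
    but it breaks up the "same sound on repeat" effect.
    (Hypothetical helper, not from any actual engine.)
    """
    semitones = random.uniform(-max_semitones, max_semitones)
    factor = 2.0 ** (semitones / 12.0)  # pitch ratio for that many semitones
    # Resampling by `factor` raises the pitch and shortens the clip
    # (or lowers and lengthens it), just like speeding up a tape.
    old_idx = np.arange(len(samples))
    new_idx = np.arange(0, len(samples), factor)
    return np.interp(new_idx, old_idx, samples)

# Example: a 0.1 s 440 Hz tone standing in for a footstep sample.
rate = 44100
t = np.linspace(0, 0.1, int(rate * 0.1), endpoint=False)
clip = np.sin(2 * np.pi * 440 * t)
varied = vary_pitch(clip)  # slightly different every call
```

Each playback gets a fresh `varied` copy, so no two footsteps sound exactly identical even though there's only one source file.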
Maybe this could be applied to the voice files, but instead of random variation, it could pick a different minor variation per grouping. I know that many of the lines are only said by the small selection of bots to whom they are relevant (e.g. only bots that heal ever say anything about healing). However, the greeting & death lines (and maybe a few others) are globally shared, so this might be worth investigating.
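One cheap way to get "a different minor variation per grouping" instead of pure randomness would be to derive the variant from a stable hash of the speaker, so the same bot always gets the same flavor of a shared line. The names below (`MedicBot`, `greeting`) are purely hypothetical; this is just a sketch of the selection logic, not anything from the game.

```python
import hashlib

def variant_for(bot_name: str, line_id: str, n_variants: int) -> int:
    """Pick a stable variant index for a globally shared voice line.

    Hashing the bot name together with the line id means a given bot
    always maps to the same minor variation, instead of rolling a new
    random one every playback. (Hypothetical helper for illustration.)
    """
    digest = hashlib.sha256(f"{bot_name}:{line_id}".encode()).digest()
    return digest[0] % n_variants

# The same bot picks the same variant of "greeting" every time,
# while different bots tend to spread across the variants.
chosen = variant_for("MedicBot", "greeting", 4)
```

The modulation parameters (pitch offset, EQ, whatever) could then be looked up per variant index, so shared lines still sound consistent for each individual bot.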
EDIT: Since Pepisolo gave his take on reengineering the voices, I figured I'd give mine. I'm a noob at audio engineering, but it was fun to mess around with the various effects and parameters.