To clarify, I wasn't proposing taking score into account, but probably going by win percentage. There are all sorts of easy and hard ways to go at it, but one of the easy ways might be:
Take any given binary setting, say Medium Golems. Look at all games with that turned on and turned off, and check the win % in both cases. If win % is higher with it on, we know it probably makes the game easier overall; if lower, then harder.
Now just weight each of these settings by the disparity between the two win percentages -- e.g., if a setting gives you 54% with it on and 51% with it off, you know it's not much of a factor, but if it's 70% and 40%, that's a different story. I'm not a statistician, but you probably couldn't just take the raw linear difference of the percentages... however, I'm sure you could take the sqrt or log or what have you of it and use that as your weighting factor.
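To make that concrete, here's a rough sketch in Python of the on/off comparison and the weighting. Everything about the data layout is my assumption: I'm pretending each game record is a dict with a "settings" map and a "won" flag, and the names win_rate and setting_weight are made up for illustration. I picked sqrt as the squashing function purely because it's the simplest of the options; something else may well fit better.

```python
import math

def win_rate(games):
    """Fraction of games won, or None if the list is empty."""
    if not games:
        return None
    return sum(1 for g in games if g["won"]) / len(games)

def setting_weight(games, setting):
    """Signed weight for one binary setting.

    Positive means the win % is higher with the setting on (easier),
    negative means it is lower (harder). Returns None when there are
    no games on one side, since there is nothing to compare.
    """
    # Hypothetical record layout: {"settings": {name: bool}, "won": bool}
    on = [g for g in games if g["settings"].get(setting)]
    off = [g for g in games if not g["settings"].get(setting)]
    p_on, p_off = win_rate(on), win_rate(off)
    if p_on is None or p_off is None:
        return None
    diff = p_on - p_off
    # Squash the raw difference with sqrt, keeping its sign.
    return math.copysign(math.sqrt(abs(diff)), diff)

if __name__ == "__main__":
    # Made-up data: 70% wins with the setting on, 40% with it off.
    games = (
        [{"settings": {"medium_golems": True}, "won": i < 7} for i in range(10)]
        + [{"settings": {"medium_golems": False}, "won": i < 4} for i in range(10)]
    )
    print(setting_weight(games, "medium_golems"))
```

With a small sample on either side the win % will be noisy, so you'd probably want some minimum game count before trusting a weight.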
For the actual difficulty ratings, I think you could get away with the same approach, since there are only a handful of them. Yes, these factors can interrelate in complex ways, so by considering them one at a time you are necessarily losing something, but it would be dead simple, and as I said, I think it'd get you 90% of the way there.
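For a setting with more than two values, the same idea just becomes a win % per level. Again a sketch only: the "difficulty" key and the record layout are my assumptions, and win_rate_by_level is a made-up name.

```python
from collections import defaultdict

def win_rate_by_level(games, key="difficulty"):
    """Map each level of a multi-valued setting to its win fraction."""
    # Hypothetical record layout: {"difficulty": <level>, "won": bool}
    totals = defaultdict(lambda: [0, 0])  # level -> [wins, games played]
    for g in games:
        entry = totals[g[key]]
        if g["won"]:
            entry[0] += 1
        entry[1] += 1
    return {level: wins / played for level, (wins, played) in totals.items()}
```

You could then compare each level's win % against the overall win % the same way the on/off case compares the two sides.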
Better yet: do that AND do the community rating thing. See which results seem to track best, and use those.