Hi, my name is Romain `Sylver` Durban and I am from France. I made this small project in an attempt to analyse and evaluate the behaviour and effectiveness of the matchmaker in World of Tanks. I play the game myself and I have repeatedly seen teams that looked unfair, not only because of player skill (which should always be the first criterion for balancing teams) but also because of the distribution of tanks. Since nobody else seemed to have conducted such an analysis, I decided to look into it myself.

Prologue

It is a fact: some tanks are better than others within the same tier. The purpose of the game is not to make all tanks of a tier equal, otherwise it would most likely be boring. Wargaming has tried to respect the historical specifications of the real tanks as much as possible, which is commendable. Because everyone wanted to win the war, tanks were designed to be as efficient as possible (under various constraints such as complexity, weight, maintenance or production cost), with the objective of surpassing the enemy's tanks. Consequently, a lack of balance between tanks is to be expected and we have to deal with it. However, to make the game fun and interesting for everyone, each player has to be able to contribute and each team must have the same chance to win. While the first is nearly impossible to achieve, the second should be much easier. But does Wargaming really succeed at making teams balanced?

In this "blog", I will try to tackle the question of team balance in World of Tanks and share my own opinion about it. To do so, I will first describe how Wargaming claims to balance teams. Then I will present my own vision of team balance. I will continue with my approach to the analysis and the data I used. Finally, I will comment on the results I obtained and conclude on how effective the matchmaker seems to be and what could be improved.

NOTE: All the results used in the analysis are available on this website, hopefully in a practical form.

I - Wargaming's matchmaker

Even though Wargaming does not share details about how the matchmaker works and how it has evolved, some guidelines have been given in the past (a Wiki page and a YouTube video, dating from 2013 and probably out of date now). From these, we can learn about the ruleset that the matchmaker theoretically uses. Whether these rules are still applied is not known.

I.a - Tank weights

The first thing to know is that to balance teams, you need a metric to measure the total strength of one team compared to the other. In World of Tanks, this is done with balance weights. These weights are hidden, but an idea of the values and of the logic has been given. Each tier has its own assigned weight. Each tank receives a multiplier that increases its weight depending on its class or on other specific cases. Heavy tanks, scouts and SPGs have an increased weight (values differ between sources). High tier tanks (most likely tiers 9 and 10) all have the same multiplier, making them equal. Due to their specificity, some tanks are weighted manually. See the linked pages for more details.

Using the weights, the matchmaker tries to:

  • Limit the total weight difference to 10%
  • Limit the number of SPGs per team to 5; the SPG weight difference between the two teams cannot exceed 20% and the count difference should not be more than 1
  • Limit the scout weight difference between the two teams to 30%; the count difference should not be more than 1
  • Limit the count difference of platooned tanks to 3
  • Limit the count difference of any given tank to 1
  • Limit the weight difference of top tier tanks to 25%

I.b - Battle tiers

Balancing teams using weights is a good thing, but comparing only the total weight of each team could hide a large disparity between tanks within the same team (for instance a tier 10 and a tier 1 tank in the same team). Another set of rules is thus necessary to limit the range of tiers possible in a battle. To do so, Wargaming uses the battle tier, a value indicating the tier of the current battle. The battle tier is usually the highest tier in the battle, with the exception of battle tier 11, which contains only tier 9 and 10 tanks, and battle tier 12, which contains only tier 10 tanks. Each tank has its own set of possible battle tiers. Most tanks can meet tanks up to 2 tiers higher, except light tanks, which can go up to 3 tiers higher, and some premium tanks, which only go 1 tier higher.

The only exception to this rule is platoons. The highest weighted tank sets the possible battle tiers for all tanks of the platoon, which can lead to awkward situations. This was used in the past to interfere with the matchmaker and force it to add a stronger tank to balance the very weak one. It seems that measures have since been taken to prevent this abuse: the matchmaker now simply leaves the platoon's team much weaker, punishing that team rather than the other, for the sole purpose of discouraging players from doing it.

I.c - Is that all?

Wargaming does not claim to use other parameters to balance teams. They officially ignore tank modules, equipment and consumables. They also claim not to take player skill into consideration. Some people say that Wargaming does, or at least that it considers the recent win rate (see this article), to make sure everyone keeps a win rate close to 50%. I myself think this is not the case, for the following reasons:

  • Why would they hide a feature that players are asking for? Then again, nobody will complain about a feature that does not officially exist
  • Lots of players have a win rate much higher than 50% (60% or more) over a large number of battles; this would not be possible in a system that really balanced teams by player skill

I.d - In short

The matchmaker tries to build teams of a similar total weight. It also tries to balance SPGs, scouts and top tier tanks more specifically. It does not seem to consider heavy and medium tanks or tank destroyers, nor the number of bottom tier tanks. But the most important point here is that only the nature of the tanks is considered; no statistics are used in any way (or at least not officially). This means that all tanks having the same multiplier are considered strictly equal, which is quite questionable.

II - Evaluating the matchmaker

The primary goal of this project is to evaluate the efficiency of the matchmaker by defining and using another approach to verify whether teams are actually well balanced. I am not trying to verify whether Wargaming does what it claims to do; instead, I will define my own vision of how teams should be balanced.

II.a - Balance rules

In my opinion, the rules to balance teams should be the following:

  • The weight difference should not exceed 25% of the weight of a top tier tank
  • The number of top tier and bottom tier tanks should be strictly the same
  • The total weight of top tier tanks should not differ by more than 25% of the weight of a top tier tank
  • The number of tanks of a given type should not differ by more than 1
  • The weight of tanks of a given type should not differ by more than 25% of the weight of a top tier tank
  • The number of SPGs should be exactly the same in both teams and not exceed 3 in general
  • Do not allow platoons with a tier difference greater than 2
  • Do not allow platoons containing the same tank more than twice
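
To make the rules above concrete, here is a minimal Python sketch of how they could be checked for a pair of teams. The tuple layouts, the function names and the rule labels are assumptions made for illustration only; top_tier_base_weight stands for the weight of a top tier tank, which is defined in II.b.

```python
from collections import Counter


def check_team_balance(team_a, team_b, top_tier_base_weight, max_spgs=3):
    """Return the labels of the balance rules broken by two teams.

    Hypothetical layout: each team is a list of (tier, tank_type, weight).
    """
    threshold = 0.25 * top_tier_base_weight
    both = team_a + team_b
    top = max(tier for tier, _, _ in both)
    bottom = min(tier for tier, _, _ in both)
    broken = []

    def weight(team, keep):
        return sum(w for tier, kind, w in team if keep(tier, kind))

    def count(team, keep):
        return sum(1 for tier, kind, _ in team if keep(tier, kind))

    def compare(label, keep, max_count_diff):
        if abs(count(team_a, keep) - count(team_b, keep)) > max_count_diff:
            broken.append(label + " count")
        if abs(weight(team_a, keep) - weight(team_b, keep)) > threshold:
            broken.append(label + " weight")

    def everything(tier, kind):
        return True

    # Total weight of the teams.
    if abs(weight(team_a, everything) - weight(team_b, everything)) > threshold:
        broken.append("total weight")
    # Top tiers: same count and similar weight. Bottom tiers: same count.
    compare("top tier", lambda t, k: t == top, 0)
    if count(team_a, lambda t, k: t == bottom) != count(team_b, lambda t, k: t == bottom):
        broken.append("bottom tier count")
    # Each tank type: count within 1 (exactly equal for SPGs), similar weight.
    for kind in sorted({k for _, k, _ in both}):
        compare(kind, lambda t, k, kind=kind: k == kind, 0 if kind == "SPG" else 1)
    # No more than max_spgs SPGs in either team.
    if max(count(team, lambda t, k: k == "SPG") for team in (team_a, team_b)) > max_spgs:
        broken.append("SPG limit")
    return broken


def is_platoon_allowed(platoon):
    """platoon is a list of (tank_name, tier) pairs (hypothetical layout)."""
    tiers = [tier for _, tier in platoon]
    copies = Counter(name for name, _ in platoon)
    return max(tiers) - min(tiers) <= 2 and max(copies.values()) <= 2


# The triple KV-2 platoon discussed in this section is rejected by the last rule.
print(is_platoon_allowed([("KV-2", 6)] * 3))
```

Returning the list of broken rules, rather than a single yes/no, keeps the sketch close to the analysis that follows, where each aspect of a battle is examined separately.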

Why these rules? First, using a basic, static percentage to check the weight difference is a bit too simple. Furthermore, if you think about it, a 10% difference is actually really big. In a team of 15 players, it represents roughly 1.5 tanks, or one top tier tank. In other words, it is almost as if the game were 16 versus 15.

So, why 25% of the weight of a top tier tank? The 25% comes from my own definition of weights, which I will explain later. Why the top tiers? Because they have the most impact on the game and they usually set the battle tier, which represents the level of the battle. This value acts as a static allowed difference instead of a naive percentage.

Why the same number of top and bottom tiers? Again, top tiers have the most impact on the game. Giving more of them to one team will most likely give it an advantage. But this is not enough: these tanks should also have a similar total weight. Similarly, the count of bottom tiers should be the same because they tend to impact the game in a negative way (when the tier range is wide, these tanks have trouble helping). A team with fewer bottom tier tanks has more tanks able to impact the game, and that gives a noticeable advantage.

Why the same number of SPGs, and a maximum of 3? SPGs can have a massive impact on the game and completely prevent some tanks or players from playing. The higher the SPG count, the more shells come from the sky. Having an SPG of a higher tier does not make as big a difference as it does for other types. A higher tier SPG will usually deal more damage but have a lower rate of fire. Consequently, a lower tier SPG can be as impactful as a higher tier one. What makes the difference is the number of SPGs, as more of them can support more often, over a wider area, and compensate for their lack of accuracy with more shots fired. Why a maximum of 3? I believe this limit is already being used and works fine. More would really make the game horrible for the others. 3 has proven to be acceptable; even if fewer could be better, a game has to be found for these players as well.

Why a similar number of tanks of the same type? Checking the count of tanks of a given type looks important to me. It is unacceptable to have, for instance, 5 heavy tanks in one team and 1 in the other; it really makes the game awkward and possibly imbalanced. Furthermore, the set of tanks of a given type should be balanced separately: both teams should have the same potential in each type. Each team should have the same options, and the one that uses them differently and more smartly should win the game. Only this way can we emphasize player skill as the way to victory. Admittedly, if both teams are almost identical, games might look boring and repetitive. A team full of heavy tanks against a team full of medium tanks might end up being very challenging for both teams. But with the random factor of the map, it could also be very one-sided and not fun at all.

Why restrict platoons? There is absolutely no logical and legitimate reason to platoon with tanks having a tier difference greater than 2 (meaning that these two tanks could not be in the same game without a platoon). Allowing this can only introduce imbalance between teams. Furthermore, players platooning with three identical tanks also cause balance issues and can make the game horrible for other players. Think, for instance, of a triple KV-2 platoon in a tier 6 battle.

II.b - Balance weights

Just like Wargaming, we will use balance weights to evaluate each tank and verify the balance of teams. We will also consider only the tanks and not the players, but only because we do not have access to player information in a post-analysis. What we will do differently is use statistics to define the weight of a tank from its efficiency, instead of static and arbitrary values. But which statistics should we use? Which metric really shows the impact and the potential of a tank? I have not found a perfect answer to that; instead, I propose 3 different metrics.

  • Win rate (from WNExpected)
    • This was my first idea because it is used to calculate WN8, a widely accepted indicator of tank and player efficiency. The advantage of this data is that it is heavily processed and refined, which ensures the quality of the values.
    • The win rate is often used to compare the efficiency of tanks in random battles. It makes sense: a tank that wins more often is more likely to be stronger. I personally do not completely agree with this. This indicator shows how often the tank was placed in a team that won, regardless of whether the tank helped to win these games. A tank can often accidentally win a game without helping, or lose after doing an amazing job. Furthermore, some tanks have more favorable "matchmaking" than others, which puts them in games where they can have a bigger impact. These factors can introduce too much imprecision to use the win rate reliably.
    • Another aspect that bothered me is that values from WNExpected have a significant bias. They represent the 50% of players who played the tank best, with the aim of showing how well the tank can perform in good hands. Having all win rates higher than they should be is not a problem for us, but making some tanks look stronger will not reflect the overall behaviour of tanks in random battles. We can reasonably assume that these 50% best players on a tank are more skilled and/or use premium ammo, increasing the efficiency of the tank and compensating for some of its weaknesses. In general, weak tanks will not look so weak and the stronger ones will look slightly weaker than they should.
  • Win rate (from vBAddict)
    • This has the same drawbacks as any use of the win rate.
    • However, this win rate represents all players, the good and the bad ones. In my opinion this makes it a more accurate indicator to evaluate the balance of teams in any battle.
  • Damage dealt (from vBAddict)
    • This is the indicator that I think best represents the impact of a tank in a game. Damage dealt is not accidental. Dealing damage is almost always what has the most impact in a game (though there are always some exceptions, of course). There is no doubt that a tank that deals more damage on average is a tank that has more impact on the game. If we take two heavy tanks of the same tier, the one dealing more damage on average is clearly better. This ignores the outcome of the game completely, and thus greatly reduces the impact of the team (and consequently of the matchmaker) on the stats of a tank. The overall damage dealt also captures indicators of a tank's strength other than raw alpha damage (such as mobility, accuracy, penetration, armor or camouflage), because a tank that survives longer and can deal damage more reliably is a tank that will eventually deal more damage.
    • Dealing damage is admittedly not the only way to contribute to the game; scouting can also help a lot. Unfortunately, assisted damage is very inconsistent, as it can also include damage on opponents that were accidentally tracked or spotted, which makes this indicator unreliable. Consequently, scouts generally get a poor weight despite the fact that they can have a big impact. But do they really do so consistently?
    • I would have liked to combine this indicator with the average damage blocked, because negating the opponents' damage is also a direct contribution. Unfortunately, this indicator is unavailable, even on vBAddict.

But we are not done yet with the balance weights. We have only chosen which indicator to use from the available statistics. Now we have to put these statistics together to calculate the weight of each tank. Starting from the same idea as Wargaming, which uses a value per tier, I defined a range of possible weights for each tier (a minimum and a maximum value) over which all tanks are spread depending on how far they are from the best and the worst. The worst tank gets the minimum weight and the best gets the maximum one; the rest are spread in proportion to the indicator used.
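
One possible reading of "spread in proportion to the indicator" is a linear interpolation between the worst and the best tank of the tier. The Python sketch below assumes that reading. The function name, the choice made when every tank of a tier has the same indicator value, and the damage figures in the example are hypothetical; the 70.2 to 87.8 range is the tier VIII range given in the table further down.

```python
def tank_weight(value, worst, best, min_weight, max_weight):
    """Map an indicator value linearly onto the weight range of its tier.

    value      indicator of the tank (win rate or average damage)
    worst/best lowest and highest indicator values found in the tier
    """
    if best == worst:
        # Assumption: with no spread to interpolate on, use the middle of the range.
        return (min_weight + max_weight) / 2
    share = (value - worst) / (best - worst)
    return min_weight + share * (max_weight - min_weight)


# Hypothetical tier VIII tank averaging 1000 damage, in a tier where the
# worst tank averages 800 and the best 1200: it lands in the middle of the range.
print(round(tank_weight(1000, 800, 1200, 70.2, 87.8), 1))
```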

But how did I set these weight ranges for each tier? First, after briefly analysing the average increase in hitpoints from one tier to the next, I decided that the worst tank of a tier would weigh 33% more than the best tank of the previous tier. Next, I arbitrarily decided that the best tank of a tier would have 25% more weight than the worst one. With this percentage, the worst tank of a tier is at about the same weight distance from the best tank of its own tier as from the best tank of the tier below. This gives the following values (rounded to one decimal):

Tier    Min      Max
I       2        2.5
II      3.3      4.2
III     5.5      6.9
IV      9.2      11.5
V       15.3     19.1
VI      25.4     31.8
VII     42.2     52.8
VIII    70.2     87.8
IX      116.7    145.9
X       194      242.5

minRange_n = maxRange_(n-1) * 1.33
maxRange_n = minRange_n * 1.25

The advantage of defining a rule to set these ranges is that weights are consistent across all tiers and share the same properties. This way, we know that the weight difference within a tier is always 25% of the lowest value (and 20% of the highest one). That is why 25% is used to verify the balance between teams: symbolically, the global imbalance between two teams will be, in the worst case, as if one team had the best tank of the top tier and the other the worst. It could also be the case of one team having the worst tank of the top tier and the other the best tank of the tier below, but if we force the teams to have the same number of top tier tanks, this cannot happen.
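
The two recurrence rules can be replayed in a few lines of Python to regenerate the table. This is only a sketch: the function and parameter names are made up, and the starting value of 2 is simply the tier I minimum read from the table.

```python
def tier_weight_ranges(first_min=2.0, step_up=1.33, spread=1.25, tiers=10):
    """Return a list of (min, max) weights, one pair per tier.

    max_n = min_n * spread          (best tank is 25% above the worst)
    min_n = max_(n-1) * step_up     (worst tank is 33% above the previous best)
    """
    ranges = []
    low = first_min
    for _ in range(tiers):
        high = low * spread
        ranges.append((low, high))
        low = high * step_up
    return ranges


# Print the ranges rounded to one decimal, as in the table.
for tier, (low, high) in enumerate(tier_weight_ranges(), start=1):
    print(f"{tier:>2}  {low:6.1f}  {high:6.1f}")
```

Only the unrounded values are carried from one tier to the next; rounding is applied at display time, which is why the tier X values do not follow exactly from the rounded tier IX values.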

The balance weight values for all three indicators can be found on the tanks page of this website.

II.c - Analysis approach

Now that the balance rules have been established and we have a way to measure the balance of each battle, it is time to run the analysis and see how the teams created by the matchmaker behave. We will not simply verify whether the rules are respected on average; instead, we will try to figure out how many battles break them and by how much.

The analysis will observe several aspects: the entire team, top tiers, bottom tiers and each tank type. On each of these aspects the following analyses will be run:

  • The basic percentage of weight difference
    100 * abs(weight_team1 - weight_team2) / ((weight_team1 + weight_team2) / 2)
  • The count difference (for each type and for top tier)
    abs(count_team1 - count_team2)
  • The count distribution of each type
  • The advanced percentage of weight difference, compared to the top tier weight
    100 * abs(weight_team1 - weight_team2) / topTierBaseWeight

Sometimes it is not even possible to compare a count or a weight between the two teams because one of the teams does not have any tank of the analysed category (for example, only one team has tank destroyers). Such a situation is considered a complete failure; consequently, the worst possible value is assigned, which means that the maximum value might show a significant increase.
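
The three formulas listed above translate directly into Python. The sketch below uses hypothetical function names and adds one assumption that the text does not cover: when neither team has a tank of the category, the basic percentage is reported as 0. Note that when only one team has tanks of the category, the basic formula evaluates to 200 on its own, which appears consistent with the 200% maximum mentioned in the results.

```python
def basic_percentage(weight_1, weight_2):
    """Weight difference as a percentage of the average weight of both teams."""
    mean = (weight_1 + weight_2) / 2
    if mean == 0:
        # Assumption: the category is absent from both teams, nothing to compare.
        return 0.0
    return 100 * abs(weight_1 - weight_2) / mean


def count_difference(count_1, count_2):
    """Absolute difference in the number of tanks."""
    return abs(count_1 - count_2)


def advanced_percentage(weight_1, weight_2, top_tier_base_weight):
    """Weight difference as a percentage of the top tier base weight."""
    return 100 * abs(weight_1 - weight_2) / top_tier_base_weight


# One team has 50 of tank destroyer weight, the other has none.
print(basic_percentage(50, 0))
```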

For each battle, these analyses produce a set of values that we now need to put together to see the big picture. When only a few values are possible, the best representation is a bar chart (for instance for the count of tanks of each type). Otherwise, we will use a Cumulative Distribution Function (CDF) on a line chart. Values range from 0 to 100% of the observed population and indicate the proportion of battles below a given value. For instance, if we observe the percentage of weight difference, the value at x=10 shows the percentage of battles that have less than 10% weight difference.
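
Reading a point off such a CDF amounts to counting how many battles fall below a given value. A minimal Python sketch, with a hypothetical function name and made-up sample values, and assuming a non-empty list of battles:

```python
from bisect import bisect_left


def cdf_at(values, x):
    """Percentage of battles whose value is strictly less than x."""
    ordered = sorted(values)
    # bisect_left returns how many elements of the sorted list are below x.
    return 100 * bisect_left(ordered, x) / len(ordered)


# Made-up weight difference percentages for six battles.
differences = [2.0, 4.5, 9.9, 10.0, 12.3, 18.0]
print(cdf_at(differences, 10))  # share of battles with less than 10% difference
```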

A separate type of tank has been added: scouts. Because not all light tanks are scouts, they are considered separately. However, tanks are not officially tagged as scouts, which makes the selection arbitrary. To use data similar to Wargaming's, the list of scouts is the same as the one provided on their wiki page. This list, which remains questionable (see the tanks marked with (*)), is composed of the following tanks:

  • USSR: T-50, MT-25, LTTB(*), T-54-ltwt(*)
  • Germany: Pz38tnA, Luchs, VK 16.02, VK 28.01, SP I C, Ru 251
  • USA: M5 Stuart, Chaffee, T21, T37, T71, M41 Bulldog, T49(*)
  • France: AMX ELC, AMX 12t, AMX 13 75, AMX 13 90
  • China: M5A1 Stuart, 59-16, WZ-131, WZ-132

III - Results

The analysis described above was run on my own ~4,000 replays, ranging from tier I to tier VIII and covering about 6 months. These replays were first converted to JSON using Phalynx's WoT-Replay-To-JSON. Each JSON replay was then analysed in Python using the approach explained earlier. The analysis used the WNExpected data of version 23 (November 2015) and vBAddict's stats for the month of September.

The results can be found on a dedicated page of this website in the form of charts. Some specific examples can also be found on another page showing practical cases of team imbalances.

III.a - Wargaming rules

Even if the goal here is not to verify whether Wargaming's rules are respected, it is still interesting to check how they behave. First, we can see that the global weight difference exceeds 10% in only 3 to 5% of the games (depending on which indicator we consider), so this rule seems to be effectively applied. However, we can also see that this difference can reach 18%, which is 80% more than the original limit.

If we look at the top tier tanks, the difference grows bigger, but the rule allows up to 25%, and about 90% of the games stay within this limit. Again, this rule seems to be generally well applied, but the difference can reach 80%, which is more than 3 times the original limit. Regarding scouts (not light tanks), about 73% of the games respect the 30% limit, which is not as good as the rest, but this could be due to the matchmaker using a different set of scouts. The last rule we could verify is about SPGs. About 90% of the games have less than 20% weight difference as stated, there are never more than 5 SPGs per team (rarely more than 3) and the count difference never exceeds 1.

In general, we can see that the rules established by Wargaming are reasonably well applied, with some bad exceptions. However, everything else is completely ignored. Heavy and medium tanks, along with tank destroyers, are left chaotic: they are rarely balanced and the difference can grow really big.

Even if Wargaming's rules do seem to be used, this does not mean that they can actually balance teams. As we said earlier, 10% of the entire weight of a team is significant; it represents more than one tank of difference (as if it were 16 vs 15). And that is when the rule is working; it can sometimes get far worse and teams start to look ugly. It is the same for the weight difference of SPGs (25%) or scouts (30%): depending on the number of tanks concerned, the impact can be big. 25% of a single SPG can mean an SPG of a higher tier, or two of a lower tier. If you do not believe these theories, we can check some examples to see what a 10% weight difference can mean, and it can indeed be huge. Global percentages just do not work well. 30% of a single tank and 30% of three tanks are completely different, and the outcome is nearly unpredictable. Furthermore, the percentages seem in general to be too high and thus too permissive. The rules allow imbalanced teams from the start, so imagine what happens when the rules are not fully effective.

III.b - Our rules

We have seen that Wargaming's rules are effectively respected in most cases, with an error rate of about 10%. But how do the teams fare against our stricter rules? For that we need to look at the "Advanced weight balance" charts. Remember that the threshold we are looking at is 25% of the top tier base weight, which represents either the weight difference between the best and the worst tank of a tier, or the weight difference between the worst tank of a tier and the best of the tier below.

Regarding the global weight balance, 39% to 46% of the games still seem to match our strict rules, which is not catastrophic (it is almost half the games) but not good either. This means that about half the games are not strictly balanced. The bad news is that the difference can go really high: 7% to 10% of the games have a difference of more than 100% of the top tier base weight (just like a 16 vs 15). If we look into more detail, starting with top tier tanks, we can see that it is slightly better, as 50% to 54% of the games appear to be fine.

However, when it comes to heavy and medium tanks and tank destroyers, it is chaotic. More than 50% of the games have more than 100% weight difference, and if the difference were not limited to a maximum of 200%, it would go really high. At least, as expected, it gets better when it comes to light tanks, and scouts more precisely: 55% of the games are balanced with regard to light tanks and about 71% with regard to scouts. It is even better for SPGs, as 82% of the games seem fine and 99% of the games have less than 60% weight difference.

In short: this advanced approach confirms most of what we already knew from the more basic analysis. The overall weights, along with top tiers, scouts and SPGs, are only close to balanced, or at least close to what Wargaming claims. While only 10% of the games seemed to go wrong under Wargaming's rules, we can see that more than half of the games are actually imbalanced. But that is not the worst part, because it goes badly wrong more often than is acceptable and produces horrible teams.

Nota bene: the three indicators used to build the balance weights give very similar results. The only noticeable trend is that the WNExpected values look stricter (they produce a bigger imbalance), possibly because some weak tanks get a better rating and greatly increase the team weight when, in general, they perhaps should not. We could also question the value of using statistics to define precise weights. In the current situation it looks almost meaningless, because the weight difference goes way beyond any fine and precise balance, so precise values may well be a waste of effort.

IV - Conclusion

Wargaming claims to balance games, and even provides a set of rules used to reach that goal. While these rules do seem to be respected, with some exceptions, they allow very imbalanced teams from the start. Either Wargaming does not really care about making strictly balanced teams, or it fails terribly while trying. In my opinion, if they really care about it, they should revise their set of rules and closely monitor the results to actually improve the matchmaker. If other reasons prevent them from achieving their goals, such as waiting time or platoon constraints, the least they could do would be to be honest about it.

IV.a - How to improve the current matchmaker?

Increase maximum waiting time. If minimizing waiting time is a reason for poor balance, I am sure almost everybody would agree to wait a little longer for better games overall. Considering how low the current waiting time is (about 5-10 seconds on average, I would say), they can afford to increase it (20-30 seconds is still rather fast). Remember that the short waiting time contributes to the fact that a lot of players do not mind dying quickly in a game and end up being of little use to their team.

Restrict platoons. It is understandable that platoons make teams hard to balance: teams can no longer be balanced tank by tank but by groups of tanks. But, in my opinion, it is also the responsibility of players to help the system and not abuse or misuse features. You want to play with friends? Fine, but make sure to use tanks of the same or similar tiers. That is why Wargaming should, at the very least, forbid all platoons with a tier range greater than 2, that is, tanks that would never meet each other in other circumstances. But even a tier range of 2 can make things complicated. The smaller the allowed tier range, the easier it would be to make balanced teams.

Revise the error margins. Wargaming uses basic percentages to verify the weight difference of a set of tanks, which can be either the whole team or just a specific set such as scouts or SPGs. But a percentage hides a lot of things: 30% of 1 tank is very different from 30% of 3 tanks. The bigger the set, the bigger the error can be. They need to find a more accurate metric, such as the weight of the top tier tanks like I did in my analysis, which behaves the same regardless of the size of the set analysed. But that is not the only problem. Even setting aside the inaccuracy of basic percentages, the values they chose are simply too high and already allow significant imbalances. They should decrease them to make the matchmaker stricter. For instance, 25% of the top tier weight seems to correspond to a difference of about 3% of the total team weight, which is more than 3 times lower than the current limit, so there is a lot of room for improvement.
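
The difference between the two kinds of margin can be shown with a short Python sketch. The function names are hypothetical, and set_weight is taken here as the average weight of the set across both teams, matching the basic percentage formula used in the analysis. The weights used in the example (25.4 for a tier VI tank, 70.2 as the top tier base weight) come from the table in II.b; the scenario itself is made up.

```python
def allowed_gap_percentage(set_weight, percent):
    """Absolute weight gap tolerated by a percentage rule on a set of tanks."""
    return set_weight * percent / 100


def allowed_gap_static(top_tier_base_weight, percent=25):
    """Absolute weight gap tolerated by a rule tied to the top tier weight."""
    return top_tier_base_weight * percent / 100


# A 30% rule tolerates three times more absolute imbalance on three
# tier VI scouts than on a single one ...
print(allowed_gap_percentage(25.4, 30), allowed_gap_percentage(3 * 25.4, 30))
# ... while the static margin in a tier VIII battle does not depend on the set size.
print(allowed_gap_static(70.2))
```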

Use templates more extensively. In their video explaining how the matchmaker works, they show how it builds and reuses team templates to speed up the creation of teams. This is a brilliant idea, but why not use it more often and more reliably? Honestly, if I had not seen it in the video, I would not know they do it, because it is barely visible in the shape of teams. Templates, or even pre-built static templates, would most likely help a lot in making balanced teams. They would remove a lot of randomness from the composition of teams, and thus a lot of possible imbalance. If teams always have a shape that was designed beforehand, then they cannot be highly imbalanced. The only variable factor left that could create imbalance would be the strength of the individual tanks, which can still be handled. There are various ways to make team templates. A template could simply indicate which tiers to put in a team and how many tanks of each. A stricter template could also indicate the types of tanks in the whole team or even in each tier, and possibly their number. The most precise template would specify exactly which tanks to include; it would ensure a perfect balance as long as the template itself was balanced when defined. Many templates could be defined depending on the game mode, the map, the tanks in the queue, etc. These templates would not only greatly improve the balance of teams but also improve their composition by, for instance, not putting many SPGs on a city map, or many heavy tanks on a wide open map. In short, always using pre-defined templates would give Wargaming a lot of control and customization, and thus greatly improve the games proposed by the matchmaker. Furthermore, pre-defined templates would not need any initialization and could be used at any time. The only drawback of templates, if it is one, is that they can make all games look the same, or at least similar.
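
As an illustration of the simplest kind of template described here (which tiers, and how many tanks of each), a template could be a plain mapping from tier to count. This Python sketch is purely hypothetical: it says nothing about how Wargaming's own templates work, and the 3/5/7 composition is an invented example.

```python
from collections import Counter


def matches_template(team_tiers, template):
    """True when the team has exactly the number of tanks per tier the template asks for.

    team_tiers  list with the tier of each tank in the team
    template    dict mapping tier -> required number of tanks
    """
    wanted = {tier: n for tier, n in template.items() if n > 0}
    return dict(Counter(team_tiers)) == wanted


# Invented template for a tier VIII battle: 3 top, 5 middle and 7 bottom tier tanks.
template = {8: 3, 7: 5, 6: 7}
print(matches_template([8] * 3 + [7] * 5 + [6] * 7, template))
```

If both teams are built from the same template, the top and bottom tier counts are equal by construction, which is one of the balance rules proposed in II.a.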

Make use of statistics. They could already be doing it without saying so, but Wargaming does not seem to use any sort of statistics to balance teams. This could simply be the performance of tanks, as in my analysis, but even considering player skill would be an improvement. All these statistics are already available, accurate enough and simple to use, so why not use them? It is really sad to see that the win chance given by XVM is right most of the time, but it is also good news, because it means that statistics are actually very effective at predicting the outcome of a battle and thus at evaluating the balance of teams. When something free is available to improve a system, you need to be either stubborn or a fool to ignore it.