Order of edge list affects betweenness scores in edge.betweenness.community

Asked by Clinton Leach

I'm trying to use edge.betweenness.community to identify community structure in a network with about 400 vertices and 1600 edges. I have the graph stored as an edge list in a csv file, which I read into R and use graph.data.frame to create the graph object which I run edge.betweenness.community on. If I change the order of the links in the original csv file, I get different betweenness scores for the edges that the algorithm removes, when it seems like they should be the same, as the graphs should be identical. Changing the order of the links in the csv file also seems to change the number of groups in the structure that maximizes modularity. What could be causing this?

Question information

Language:
English Edit question
Status:
Solved
For:
igraph Edit question
Assignee:
No assignee Edit question
Solved by:
Tamás Nepusz
Solved:
Last query:
Last reply:
Revision history for this message
Tamás Nepusz (ntamas) said :
#1

I suspect that the following thing happens. At some point during the edge.betweenness.community algorithm, there exist two edges in the graph with exactly the same betweenness centrality. The algorithm is then free to choose one of them, and such choice may depend on the actual ordering of the edges. Of course, the resulting new betweennesses after the two possible removals will be different unless your graph is highly symmetric. The same reasoning applies to the case of modularity optimization, as there might be a point at some stage where there exist two pairs of groups s.t. the merging of any of the pairs would yield exactly the same amount of increase in modularity, and then the algorithm is free to choose again. (I struggled a lot with these when I implemented the fast greedy modularity optimization as I wanted to make sure that the results match the original implementation of Clauset et al exactly until I realized that such ties exist even with simple model networks like the Zachary karate klub network).

By the way, the partition you get after the fast greedy modularity optimization process is not necessarily the one that maximizes the modularity. In fact, it is more often not the actual global maximum. All the modularity optimization algorithms in igraph 0.5.3 are just heuristics that try to find an optimal partition, but determining the actual maximum would be an NP-complete problem.

Revision history for this message
Clinton Leach (clint-leach) said :
#2

Thank you, that makes sense.

So if, in the event of a tie, all of the edges that tied for the highest betweenness score were removed simultaneously, could I expect to see the two networks fragment the same way? Would this fix the problem of getting different results for different link orderings, or is it possible that the difference could come from the actual calculation of betweenness scores rather than the order of removal?

Revision history for this message
Best Tamás Nepusz (ntamas) said :
#3

Indeed, if all the edges that tied for the highest betweenness score were removed simultaneously, the behaviour of the algorithm would be totally deterministic. The reason why we don't do it this way is that the resulting dendrogram could theoretically have branching points that split a cluster into more than one sub-clusters at the same time, and this cannot be represented in the "merges" matrix that is returned by the igraph_community_edge_betweenness function in the C core.

Since the edge betweenness community detection is relatively simple, you can easily implement the variant proposed by you in plain R using igraph's edge.betweenness and delete.edges functions if you want a totally deterministic algorithm that is not affected by the ordering of the edges.

Revision history for this message
Clinton Leach (clint-leach) said :
#4

Great, that's what I was thinking about trying, so I will give that a shot. Thank you for your help.

Revision history for this message
Clinton Leach (clint-leach) said :
#5

Thanks Tamás Nepusz, that solved my question.