An algorithm replicates the ants' behaviour in a computer, proving it can be an accurate way of predicting the population density of a network.

Ants, as it turns out, are really good at a lot of things—and not just lifting and communicating. It appears that ants also have the whole voting thing to select a new nest down to a science. Biologists have long suspected that ants base their population-density estimates on the frequency with which they—literally—bump into other ants while randomly exploring their environments.

This long-studied ability recently received new support from researchers at MIT's Computer Science and Artificial Intelligence Laboratory (CSAIL), who created an algorithm that replicates the ants' behaviour in a computer and proves that it can be an accurate way of predicting the population density of a network.

"It’s intuitive that if a bunch of people are randomly walking around an area, the number of times they bump into each other will be a surrogate of the population density," said Cameron Musco, an MIT graduate student in electrical engineering and computer science and a co-author on the new paper. "What we’re doing is giving a rigorous analysis behind that intuition, and also saying that the estimate is a very good estimate, rather than some coarse estimate. As a function of time, it gets more and more accurate, and it goes nearly as fast as you would expect you could ever do."

**Random walks**

Musco and his co-authors—his advisor, NEC Professor of Software Science and Engineering Nancy Lynch, and Hsin-Hao Su, a postdoc in Lynch’s group—characterise an ant’s environment as a grid, with some number of other ants scattered randomly across it. The ant of interest—call it the explorer—starts at some cell of the grid and, with equal probability, moves to one of the adjacent cells. Then, with equal probability, it moves to one of the cells adjacent to that one, and so on. In statistics, this is referred to as a “random walk.” The explorer counts the number of other ants inhabiting every cell it visits.

In their paper, the researchers compare the random walk to random sampling, in which cells are selected from the grid at random and the number of ants counted. The accuracy of both approaches improves with each additional sample, but remarkably, the random walk converges on the true population density virtually as quickly as random sampling does.

That’s important because in many practical cases, random sampling isn’t an option. Suppose, for instance, that you want to write an algorithm to analyse an online social network—say, to estimate what fraction of the network self-describes as Republican. There’s no publicly available list of the network’s members; the only way to explore it is to pick an individual member and start tracing connections.

Similarly, in ad hoc networks, a given device knows only the locations of the devices in its immediate vicinity; it doesn’t know the layout of the network as a whole. An algorithm that uses random walks to aggregate information from multiple devices would be much easier to implement than one that has to characterise the network as a whole.

Musco said he and his colleagues initially assumed that this was a liability that an algorithm for estimating population density would have to overcome. But their attempts to filter out oversampled data seemed to worsen their algorithm’s performance rather than improve it. Ultimately, they were able to explain why, theoretically.

"If you’re randomly walking around a grid, you’re not going to bump into everybody, because you’re not going to cross the whole grid," Musco said. "So there’s somebody on the far side of the grid that I have pretty much a zero per cent chance of bumping into. But while I’ll bump into those guys less, I’ll bump into local guys more. I need to count all my interactions with the local guys to make up for the fact that there are these faraway guys that I’m never going to bump into. It sort of perfectly balances out. It’s really easy to prove that, but it’s not very intuitive, so it took us a while to realise this.”

The grid that the researchers used to model the ants’ environment is just a special instance of a data structure called a graph. A graph consists of nodes, typically represented by circles, and edges, typically represented as line segments connecting nodes. In the grid, each cell is a node, and it shares edges only with those cells immediately adjacent to it.

But the techniques apply to any graph, such as one describing which members of a social network are connected, or which devices in an ad hoc network are within communication range of each other.

"If they were robots instead of ants, they could get gains by talking to each other and saying, 'Oh, this is my estimate,'" said Musco.