Widely used machine learning method is flawed

A widely used algorithmic approach for modeling and analyzing complex networks is fundamentally flawed and fails to capture important properties of real-world networks, according to a new study published in Proceedings of the National Academy of Sciences.

Specifically, the study finds that the techniques known as low-dimensional embeddings, which are commonly used as input to machine learning models, have significant shortcomings.

"It's not that these techniques are giving you absolute garbage. They probably have some information in them, but not as much information as many people believe," C. "Sesh" Seshadhri, first author of the study and associate professor of computer science and engineering in the Baskin School of Engineering at UC Santa Cruz, said in a statement.

Social networks offer one example of where the method falls short. Many companies want to apply machine learning to these complex networks to predict social behavior and make recommendations for users. Embedding techniques convert a person's position in a social network into a set of coordinates in a geometric space, producing a short list of numbers for each person that can be fed into an algorithm. This conversion allows the system to make predictions based on the relationships between the points in space, according to the press release.

"That's important because something abstract like a person’s ‘position in a social network' can be converted to a concrete list of numbers,” Seshadhri said. “Another important thing is that you want to convert this into a low-dimensional space, so that the list of numbers representing each person is relatively small.”
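The idea Seshadhri describes can be sketched in a few lines. Below is a minimal, illustrative example (not the specific pipeline studied in the paper) of one common embedding approach: taking the top singular vectors of a network's adjacency matrix so that each person becomes a short list of coordinates. The toy 4-person network and the `embed_nodes` helper are assumptions for illustration only.

```python
# Illustrative sketch of a low-dimensional node embedding via truncated SVD
# of a graph's adjacency matrix (one common spectral-style approach; the
# exact pipeline here is an assumption, not the paper's specific method).
import numpy as np

def embed_nodes(adj, dim):
    """Map each node to `dim` coordinates built from the top singular vectors."""
    u, s, _ = np.linalg.svd(adj)
    # Scale each kept direction by its singular value; one row per node.
    return u[:, :dim] * s[:dim]

# Toy 4-person network: two mutual-friend pairs bridged by one tie.
adj = np.array([[0, 1, 1, 0],
                [1, 0, 1, 0],
                [1, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)

coords = embed_nodes(adj, dim=2)
print(coords.shape)  # (4, 2): each person is now just two numbers
```

Keeping `dim` small is exactly the "low-dimensional" part Seshadhri mentions: each person's abstract network position collapses to a handful of numbers that downstream models can consume.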

However, the embedding process loses "significant structural aspects of complex networks." The researchers proved this mathematically and confirmed it through empirical testing. In particular, embeddings fail to preserve triangle structures, the connections among three people that are a key signature of community structure in social networks.
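A toy computation (an illustration of the phenomenon, not the paper's proof) shows what "losing triangles" means. A network of ten disjoint friend-triangles is sparse but rich in triangles; replacing its adjacency matrix with the best low-rank approximation, which is what a low-dimensional embedding effectively does, wipes most of those triangles out. The helper names here are assumptions for illustration.

```python
# Illustrative sketch: a sparse, triangle-rich network loses most of its
# triangles under a low-rank (i.e., low-dimensional) approximation of its
# adjacency matrix. This mirrors the paper's finding but is not its proof.
import numpy as np

def triangle_count(adj):
    """Exact triangle count of a simple undirected graph: trace(A^3) / 6."""
    return round(np.trace(adj @ adj @ adj) / 6)

def low_rank_triangles(adj, dim):
    """Triangle 'mass' surviving in the best rank-`dim` approximation of adj."""
    vals, vecs = np.linalg.eigh(adj)
    # Keep the `dim` eigenvalues of largest magnitude (best low-rank fit).
    top = np.argsort(np.abs(vals))[::-1][:dim]
    a_k = (vecs[:, top] * vals[top]) @ vecs[:, top].T
    return np.trace(a_k @ a_k @ a_k) / 6

# 10 disjoint triangles: 30 people, 30 ties, 10 triangles -- sparse but clustered.
block = np.ones((3, 3)) - np.eye(3)
adj = np.kron(np.eye(10), block)

print(triangle_count(adj))         # 10 triangles in the real network
print(low_rank_triangles(adj, 2))  # roughly 2.7: most triangles vanish at dim=2
```

No matter which two dimensions the approximation keeps, it cannot represent all ten tight communities at once, which is the intuition behind the study's claim that low-dimensional geometry is not expressive enough.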

"We're not saying that certain specific methods fail,” Seshadhri said. “We're saying that any embedding method that gives you a small list of numbers is fundamentally going to fail, because a low-dimensional geometry is just not expressive enough for social networks and other complex networks.”

However, there are many types of embedding techniques, and other research shows that certain techniques may work better for certain tasks. The study underscores that, as machine learning and AI models grow more important in society, the methods behind their predictions should be continually analyzed. It also shines a light on the "black box" problem of AI: it is often difficult to decipher why AI and machine learning models produce the results they do.

"We have all these complicated machines doing things that affect our lives significantly. Our message is just that we need to be more careful about evaluating these techniques," Seshadhri said. "Especially in this day and age when machine learning is getting more and more complicated, it's important to have some understanding of what can and cannot be done."