Algorithms are often represented as flowcharts, also known as algorigrams, in which the conditions (“if…, then…”) appear in diamonds and are connected to rectangles that contain the instructions.
Algorithms: a Threat to Democracy?
Algorithms are increasingly present in our daily lives and decision-making processes. Yet in the era of big data and artificial intelligence, the lack of transparency of some automated data processing could threaten the rule of law and democracy. Computer scientist David Monniaux deciphers the logic at work in the algorithmic method, and points to its potential dangers.
Once unfamiliar to the general public, the term “algorithm” now crystallizes the hopes and fears sparked by the big data revolution. The notion of algorithm, however, was defined and used by mathematicians and computer scientists well before it made the headlines.
Did you say algorithm?
An algorithm is no more than a finite series of instructions for solving a problem, which requires no reflection or inventiveness on the part of those carrying it out. We all use algorithms in our daily lives without necessarily realizing it: when changing a wheel, for instance, or when preparing pancake batter from a recipe. Likewise, software developers use hundreds of algorithms that enable machines to perform geometrical calculations, simulate seismic waves, calculate the shortest path between two points on a map, etc.
Sorting algorithms are a simple example of algorithms that have always been of particular interest to mathematicians, computer scientists, and more recently search engines. As their name suggests, these algorithms are used for automatically sorting and arranging the items in a list according to predetermined criteria: alphabetic order, size of words, or any other property associated with the items in the list, such as a popularity score.
The most basic of these algorithms is probably the selection sort. Sorting a list of names alphabetically by selection can be described through the following series of instructions:
1 scan through the names one by one,
2 find the one that comes first in alphabetical order,
3 exchange it with the first name on the list,
4 repeat from step 1 on the remaining names (starting from the second position, then the third, and so on) until the list is completely sorted.
Using this method to sort the following list: Jean, Fatima, Kévin, Cécile, Anne, involves scanning through the names, finding the first one in alphabetical order (Anne), and swapping it with the first name on the list (Jean), thus ending up with the sequence: Anne, Fatima, Kévin, Cécile, Jean. The same step is repeated for the second item by scanning through the list again, finding the first name in alphabetical order (Cécile), swapping it with the second name (Fatima), and ending up with: Anne, Cécile, Kévin, Fatima, Jean… And so on, starting from the third item, and then the fourth. At the end, the list is sorted!
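The procedure above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `selection_sort` and the `steps` record are choices made for this example, and Python compares strings character by character, which happens to match alphabetical order for these five names but would not handle accented letters correctly in general.

```python
def selection_sort(names):
    """Return a sorted copy of names, plus the state of the list after each swap."""
    items = list(names)  # work on a copy, leave the caller's list untouched
    steps = []
    for start in range(len(items) - 1):
        # Steps 1 and 2: scan the unsorted part and find the name that comes first.
        smallest = start
        for i in range(start + 1, len(items)):
            if items[i] < items[smallest]:
                smallest = i
        # Step 3: exchange it with the first name of the unsorted part.
        items[start], items[smallest] = items[smallest], items[start]
        steps.append(list(items))
        # Step 4: the loop repeats, starting one position further.
    return items, steps


sorted_names, steps = selection_sort(["Jean", "Fatima", "Kévin", "Cécile", "Anne"])
for step in steps:
    print(", ".join(step))
```

The first two lines printed are the two intermediate sequences given in the text, and the last one is the fully sorted list.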
Such selection sorting works very well, but the number of comparisons it performs grows with the square of the list's length, so calculation times roughly quadruple whenever the size of the initial list doubles! Less intensive sorting algorithms do in fact exist. Algorithmics is the science of designing and analyzing algorithms: researchers try to develop the least time- and memory-consuming methods or, in a timely response to current concerns, the most energy-efficient ones.
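The quadrupling can be checked by counting the comparisons a selection sort makes on a list of n names, which comes to n(n-1)/2. The short sketch below does the count by brute force. The function name `count_comparisons` is made up for this illustration, and the count stands in for calculation time, on the assumption that each comparison takes about the same time.

```python
def count_comparisons(n):
    """Count the comparisons a selection sort performs on a list of n items."""
    count = 0
    for start in range(n - 1):
        # Each pass compares the current candidate with every remaining item.
        for _ in range(start + 1, n):
            count += 1
    return count


for n in (100, 200, 400):
    print(n, count_comparisons(n))
```

For 100, 200 and 400 names the counts are 4,950, 19,900 and 79,800: each doubling of the list multiplies the work by slightly more than four.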
The choice of criteria
Sorting items in a list by alphabetical order is an operation that is perfectly defined mathematically. However, we rely on information technology to answer questions that are much more vague: what are the most relevant web pages for a recipe? Or about the Holocaust? Who is a potential terrorist or not? How should students be evenly distributed across various branches?
The development and selection of sorting methods adapted to large volumes of data, which are a subject of research in themselves, is important, although the choice of sorting criteria is possibly even more crucial. For example, what score should be given to a web page to indicate that it is more relevant than another? Each search engine has its own formula. One of the most frequently used criteria for determining the quality of a page is the number and reputation of other pages pointing to it, the idea being that a document must be worthwhile if it is cited in many interesting papers.
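To make the idea of a score based on "the number and reputation of other pages pointing to it" concrete, here is a toy sketch in Python. It is not the formula of any actual search engine: the function `link_scores`, the `damping` value of 0.85, the number of iterations and the four-page `toy_web` are all assumptions chosen for illustration. Each page repeatedly passes a share of its own score to the pages it links to, so that a link from a well-regarded page counts for more.

```python
def link_scores(links, damping=0.85, iterations=50):
    """links maps each page to the list of pages it points to."""
    pages = list(links)
    n = len(pages)
    scores = {page: 1.0 / n for page in pages}  # start with equal scores
    for _ in range(iterations):
        # Every page keeps a small baseline score, whoever links to it.
        new_scores = {page: (1.0 - damping) / n for page in pages}
        for page, targets in links.items():
            # A page with no outgoing links spreads its score over all pages.
            receivers = targets if targets else pages
            share = damping * scores[page] / len(receivers)
            for target in receivers:
                new_scores[target] += share
        scores = new_scores
    return scores


# Hypothetical miniature web: three pages point to "recipe", none to "spam".
toy_web = {
    "recipe": ["blog"],
    "blog": ["recipe"],
    "forum": ["recipe", "blog"],
    "spam": ["recipe"],
}
scores = link_scores(toy_web)
for page in sorted(scores, key=scores.get, reverse=True):
    print(page, round(scores[page], 3))
```

Even in this tiny example the outcome depends on choices made by a person, such as the damping value or the treatment of pages without outgoing links.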
This type of classification gives the illusion of objectivity, although it is important to bear in mind that at one point or another, it is people who choose the classification criteria based on their intuition… and biases. The apparently technical nature of the process can thus conceal political choices, or makeshift arrangements. For that matter, companies specializing in search engine optimization propose methods for artificially improving a page's ranking, while search engines continually change their criteria to counter them.
In some cases, ranking criteria are no longer directly determined by the system developers, but by machine learning instead. On the web, for example, it is possible to observe users’ reactions to the choices offered, and automatically refine the criteria.
At the other extreme, there are deep learning techniques in which the highly complex model has a great many parameters, which are adjusted in light of representative examples of the properties to be assimilated. For example, the system is shown a series of photos that also include the names of the objects they depict. Upon completion of the learning process, the model will be able to provide the description of a photograph it has never seen before.
Yet again, despite a certain objectivity of the criteria, which this time are not chosen by an individual but “emerge” from the data, the choice and adjustment of learning methods, as well as the way the data is presented, significantly affect the type of ranking obtained, and are down to experts who must master both machine learning and the field of application. In fact, it is not necessarily possible, even for specialists, to understand and explain the criteria actually retained by machine learning.
A necessary transparency
The rule of law presupposes that citizens know in advance the rules that will apply to them, and can question the way they are applied, which should prevent the use of obscure algorithms based on unavailable data. Yet as previously mentioned, there are many cases in which arcane automated processing determines what we have access to, whether it be the admission of undergraduate students after their A Levels, or the choice of airline passengers subjected to extended security controls. In such cases, citizens have no control.
The main danger resides in the attraction that magic solutions exert over decision-makers… and sometimes over individuals as well. There are promises of an automatic method, where one would only have to collect data and press a button to get answers. The possibility of processing large volumes with low personnel costs is so attractive that its potential biases are overlooked, along with the corrections they require! In the case of automatic detection of suspicious behavior, for example, one tends to forget that for simple statistical reasons, even a highly accurate method applied to the general population, in which very few individuals are actually being sought, tends to produce more false positives (innocent people wrongly suspected) than true positives.
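The statistical reason can be seen with a short calculation. The figures below are invented purely for illustration (a population of one million containing 100 people actually being sought, and a method that is right 99% of the time both for those sought and for everyone else), and the function name `screening_outcome` is likewise hypothetical.

```python
def screening_outcome(population, sought, detection_rate, false_alarm_rate):
    """Return (true positives, false positives, share of flagged people truly sought)."""
    innocent = population - sought
    true_positives = detection_rate * sought        # sought people correctly flagged
    false_positives = false_alarm_rate * innocent   # innocent people wrongly flagged
    share_sought = true_positives / (true_positives + false_positives)
    return true_positives, false_positives, share_sought


# Hypothetical figures, chosen only to illustrate the effect.
tp, fp, share = screening_outcome(
    population=1_000_000, sought=100, detection_rate=0.99, false_alarm_rate=0.01
)
print(f"true positives: {tp:.0f}, false positives: {fp:.0f}")
print(f"share of flagged people who are actually sought: {share:.1%}")
```

Under these assumptions, about 99 of the people sought are flagged, alongside roughly 10,000 innocent people: fewer than one flagged person in a hundred is actually a target.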
The blind application of criteria deduced through machine learning can lead to undesirable results. For instance, learning could determine (potentially via indirect criteria) that those who commit robbery more frequently come from disadvantaged backgrounds, and that hiring people from underprivileged environments should therefore be avoided. On a large scale, this would obviously entail a counter-productive policy (increased unemployment in sections of the population leading, in a vicious cycle, to exclusion and poverty, and hence rising delinquency).
True enough, our governments did not wait for information technology to ask for irrelevant information, or to apply arbitrary decisions, and despite their shortcomings, search engines are infinitely more powerful than the catalogs of yesteryear. However, the transparency of selection criteria is a prerequisite for democratic debate. Where algorithms are concerned, the notion that “the machine decided” should not conceal unavowed political choices.
source: Centre national de la recherche scientifique