An unfair (algorithmic?) advantage: a popularity premium for low-credibility accounts on YouTube, Facebook and Instagram before the 2024 European Parliament elections

Posted on:  2024-12-08

Cover image generated with AI

EXECUTIVE SUMMARY

Analysis of YouTube, Facebook, and Instagram data in the run-up to the 2024 European Parliament election campaign highlights a substantial “popularity premium” benefitting low-credibility accounts when compared to a control group of high-credibility accounts.

Specifically, after controlling for their number of followers, we find that low-credibility accounts’ posts gather significantly more engagement or views than those of high-credibility accounts.

  • When looking at all content posted between January 1st 2024 and June 14 2024 by the 195 accounts we tracked, we find that posts from low-credibility accounts had 2.3 times more engagement on Instagram, 3.2 times more on Facebook, and 1.4 times more views on YouTube per 1,000 followers compared to high-credibility accounts.
  • Looking specifically at the conversation around elections, in line with the Digital Services Act election guidelines, we find that this “low-credibility premium” holds when isolating EU- and election-related content. Such content from low-credibility accounts had 1.7 times more engagement on Instagram, 7.6 times more on Facebook, and 2.4 times more views on YouTube per 1,000 followers compared to that from high-credibility accounts.

We argue that, once follower numbers are controlled for, the relative over- or underperformance of some accounts is largely in platforms’ hands, as their various recommender systems account for the majority of content views on their services. We therefore hypothesize that the engagement optimization objectives of these recommender systems trump any account-credibility factors, on a systemic level.

These findings appear to run contrary to the Digital Services Act election guidelines and the Commitments taken by these platforms under the Code of Practice on Disinformation.

In addition, these results raise questions about the fairness of these algorithmic recommendations towards high-credibility content creators, who appear to be penalized relative to their low-credibility counterparts.


Protecting election integrity in the EU

The Digital Services Act creates a legal obligation for Very Large Online Platforms to ensure that the functioning of their services does not create systemic risks to EU societies across a number of critical areas including civic discourse, and particularly during election periods when the risks stemming from information manipulation are heightened.

To provide more details as to how platforms can safeguard elections, the European Commission published in the run-up to the European Parliament elections a detailed set of guidelines which codify a number of policies and practices, many of which are already standard practice on some platforms.

Mirroring existing platform policy principles (e.g. X/Twitter’s “freedom of speech, not freedom of reach”), one suggestion from the guidelines is for platforms’ recommender systems not to elevate content “coming from accounts that have been repeatedly found to spread disinformation”.

Although we fully recognize that measuring account-level amplification or downranking of content is challenging, we would still expect at least some of the effects of platforms’ measures to comply with this regulation to be detectable to outside observers. We therefore set out to study whether such effects could be observed.

Account-level responses to online information quality challenges

The credibility of individual pieces of content is in large part determined by source-level factors, as:

  • Editorial standards are set at the organization level and apply to all content published by the organization. For instance, academic journals require articles to undergo peer review before publication, European fact-checkers must comply with a strict quality-control methodology for each of their articles, and all BBC publications must abide by the organization’s editorial guidelines.
  • Motivations of the account owner can systematically bias the content published by the account at the expense of information credibility. For example, state-controlled media in authoritarian countries promote the regime’s preferred narratives regardless of their accuracy, various media groups owned by a politically-motivated actor can republish the same unsupported information, and some profit-driven social media accounts optimize exclusively for financial incentives without taking information credibility into account.
  • Organizations have some direct legal and regulatory responsibility for ensuring the accuracy of the content they produce. In France, TV channel C8 lost its cable license in large part due to the audiovisual regulator finding a number of its broadcasts in breach of its content quality obligations (C8 has appealed the decision). In the US, Fox News had to settle a defamation lawsuit for USD 787 million for broadcasting falsehoods about a company providing voting machines.

This focus on taking a source’s track record into account when assessing the trustworthiness of the individual pieces of content it publishes is hardly new: it is among the basic tenets of information literacy programs, from Slovakia to France. A number of projects, such as the Journalism Trust Initiative or NewsGuard, are built around the idea that “who” publishes a piece of information is critical in evaluating whether “what” is said can be trusted.

This principle is recognized in one way or another by all major social media platforms in their moderation policies, which include some provisions that apply to repeat spreaders of misinformation at the account level:

  • About half of the publicly-accessible measures taken by Meta to protect the integrity of the digital information environment around the US 2020 elections related to actor-level action.
  • YouTube has a strikes system under which an entire channel is penalized if its content breaks the platform’s rules more than once over a 90-day period.
  • TikTok “remov[es] accounts that repeatedly post misinformation that violates [their] policies”.
  • X (formerly Twitter) takes account-level action for content-level violations, and its users can be (shadow)banned for sharing content the platform deems unsuitable, including posting misinformation about content moderation practices at X.

This account-level moderation approach is, on paper, both fair and effective:

  • Fair, as prominent misinformers are easily identified and a policy targeting repeat misinformation offenders has a low likelihood of harming other users. Although a source rated as being overall untrustworthy might not share exclusively false or misleading content (and conversely, a usually high-quality source can sometimes make mistakes), prominent misinformers are quite stable over time. For example, DeVerna et al. (2024) find that so-called misinformation superspreaders (accounts that repeatedly post misinformation and garner many interactions) remained stable from one time period to the next on Twitter/X[1].
  • Effective, as social media platforms display strong power-law dynamics: a handful of users account for the vast majority of content views. A policy targeting a very small percentage of accounts would therefore result in an outsized drop in overall levels of misinformation on the platform. For instance, DeVerna et al.’s modeling finds that removing just one hundred accounts from Twitter would result in a 50% drop in low-credibility content on the platform.

The EU regulatory framework also recognizes the need for actor-level measures

The first set of guidelines published by the European Commission under the Digital Services Act (DSA) recommends a set of measures that Very Large Online Platforms (VLOPs) can take to help safeguard the integrity of elections in the EU, thereby going some way towards complying with their obligations to mitigate the risks of “actual or foreseeable negative effects on civic discourse and electoral processes” linked to their services (DSA Article 34).

The guidelines, among many best practices, lay out some recommendations that platforms should adopt at the account level so as to reduce the prevalence and impact of mis- or disinformation on their services:

  • (27).c.v. recommends that platforms attach user-facing signals as to a source’s trustworthiness (as assessed by an external third-party using a transparent methodology),
  • (27).d.ii. suggests that recommender systems should reduce the prominence of election-related content “coming from accounts that have been repeatedly found to spread disinformation”.

These actor-level measures in the guidelines echo a corresponding set of measures in the voluntary Code of Practice on Disinformation, to which all major content-sharing platforms are signatories (except for Twitter/X, which left it in May 2023):

  • Measure 18.2., in which platforms commit to “enforce […] policies to limit the spread of harmful false or misleading information […] and take action on webpages or actors that persistently violate these policies” (commitment signed on to by Facebook, Instagram, TikTok, YouTube and LinkedIn).
  • Measure 22.1., under which platforms “will make it possible for users of their services to access indicators of trustworthiness” (out of major social media platforms however, only LinkedIn signed up to this measure).

Auditing platforms’ enforcement of actor-level measures during the European Parliament elections campaign

Social media platforms’ stated content moderation policies appear aligned with the EU regulatory framework on the relevance and feasibility of curbing disinformation in part through reducing the prominence of accounts that have repeatedly spread misinformation.

However, a policy is only as useful as its implementation. The enforcement of social media platforms’ content moderation policies has repeatedly been found lacking, particularly in a non-English-language context. As a recent illustration, Alliance4Europe and Science Feedback found that sanctioned Russian media organizations and personalities maintained a significant EU presence on Facebook, Twitter/X, YouTube, TikTok and Telegram, despite EU legislation unambiguously prohibiting it.

We therefore set out to look at whether, in line with the DSA electoral guidelines, the actions taken by some platforms on repeat spreaders of misinformation ahead of the 2024 European Parliament vote had any noticeable impact at a systemic level.

A central methodological question: how to measure reduction in prominence?

Measuring the extent to which platforms are reducing the reach of, or at least not amplifying, content posted by given accounts assumes the existence of a baseline of what the ‘natural’ reach of such content should be.

Such a baseline is elusive. Previous efforts have notably centered on comparing a chronological feed to an algorithmically-curated one to isolate the factor over which the platform has full control, i.e. its recommendation system(s)[2] [3]. This approach, while conceptually sound, requires rarely-available access to large-scale data from two comparable sets of users, or to both a chronological and an algorithmically-curated feed for the same set of users.

Using only publicly-available data, another method can be used to measure on-platform amplification: comparing the reach per post per follower of low-credibility accounts to that of high-credibility accounts, used as a control group. This is the approach this investigation followed.

The intuition behind this metric is quite straightforward: on a given platform, assume that two accounts, both dedicated to sharing news, each have 10,000 followers. One account consistently spreads conspiracy theories, displays extreme partisan bias and has opaque ownership. The other is the official outlet of a reputable news organization, with strong editorial standards, a due fact-checking process, an ethics charter and employs trained journalists. 

In line with the DSA, the Code of Practice, platforms’ moderation policies against repeat spreaders of disinformation and their public commitments to information integrity, platforms’ various recommender systems should, on balance, promote content coming from high-credibility accounts. Symmetrically, they are expected to downrank the content coming from low-credibility accounts.

In consequence, all else being equal (see the “Possible explanations, limitations and hypotheses” section below for scrutiny of that assumption), we would expect the high-credibility account’s posts to perform much better than those of the low-credibility one, given that the same number of users – 10,000 in this example – have expressed their interest in each account’s output.

Should this popularity premium for high-credibility accounts fail to materialize, it would suggest that account credibility signals, although they might exist, are not critical in choosing what to display to users, and that other factors, most likely the engagement potential of the content, dominate.
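
Formally, the quantity compared throughout this report can be written as follows (our own notation, introduced purely for clarity; the factor of 1,000 simply rescales the numbers reported in the tables below). For a post $p$ published by account $a(p)$, and with $L$ and $H$ denoting the sets of posts from low- and high-credibility accounts:

$$
\mathrm{popularity}_{1000}(p) = 1000 \cdot \frac{\text{views or interactions on } p}{\text{followers of } a(p)},
\qquad
\mathrm{premium\ factor} = \frac{\tfrac{1}{|L|}\sum_{p \in L} \mathrm{popularity}_{1000}(p)}{\tfrac{1}{|H|}\sum_{p \in H} \mathrm{popularity}_{1000}(p)}.
$$

A premium factor above 1 means that, per follower, posts from low-credibility accounts are more popular than those from high-credibility accounts.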

Methodological approach

1- Running data access checks to delineate which platforms can be audited

Using data access tools at our disposal (official platform APIs, third-party Python libraries, Bright Data access obtained under the Bright Data Initiative), we checked which of the platforms offered sufficient data access to conduct our study.

Facebook, Instagram and YouTube were the only ones to offer practicable access. While our approach could have been applied to LinkedIn, TikTok and Twitter/X, lack of access to publicly-available data, despite it being a requirement under DSA Article 40.12, prevented us from implementing this study on those platforms.

Telegram did not have sufficient and active high-credibility accounts to allow us to build a satisfactory control group and was consequently left out of this study.

2- Curating a dataset of high- and low-credibility accounts across the EU

Science Feedback used its Consensus Credibility Scores, a database of over 20,000 domains with an attached credibility rating drawn from the aggregation of publicly-available expert assessments of news domains.

We extracted domains that:

  • had either a very high (> 0.7) or very low (< 0.3) credibility rating, so as to minimize controversy over whether specific domains are indeed repeat disinformers. To align with the DSA election guidelines, we make the assumption that domains with very low (high) credibility frequently (rarely) publish misinformation;
  • had most of their traffic coming from one of the following EU countries: Germany, France, Poland, Slovakia, Spain, Sweden. These countries were chosen to balance EU subregions and member state population sizes.
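
As an illustration, this selection step amounts to a simple filter over the scores database. The sketch below is a minimal example; the column names (domain, credibility_score, main_traffic_country) and the file name are illustrative placeholders rather than the actual schema of the Consensus Credibility Scores.

```python
# Minimal sketch of the domain selection step. Column and file names are
# illustrative placeholders, not the actual Consensus Credibility Scores schema.
import pandas as pd

SELECTED_COUNTRIES = {"DE", "FR", "PL", "SK", "ES", "SE"}

scores = pd.read_csv("consensus_credibility_scores.csv")  # hypothetical export

in_scope = scores[scores["main_traffic_country"].isin(SELECTED_COUNTRIES)]
low_credibility_domains = in_scope.loc[in_scope["credibility_score"] < 0.3, "domain"]
high_credibility_domains = in_scope.loc[in_scope["credibility_score"] > 0.7, "domain"]
# Domains with intermediate scores (0.3-0.7) are deliberately excluded to avoid edge cases.
```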

From this list, we identified each domain’s official social media accounts, resulting in the following coverage by platform:

Platform  | Number of low-credibility accounts | Number of high-credibility accounts
Instagram | 21                                 | 45
Facebook  | 34                                 | 33
YouTube   | 25                                 | 37

Table 1 – Summary statistics of account numbers per platform

Examples of low-credibility outlets in our sample include the official accounts of InfoVojna, FranceSoir and UncutNews.

Examples of high-credibility outlets in our sample include the official accounts of El País, Dagens Nyheter and Die Zeit.

The full list of accounts is available on demand.

3- Collecting all content posted in the months before the election

We then collected all posts, along with their numbers of views or interactions, from these 195 accounts over the period January 1 – June 14, 2024 (the elections were held on June 6-9).

4- Identifying EU- and election-related content

To match the DSA election guidelines’ focus, we then used GPT-4o to identify which posts related to the EU or the EU elections.

Specifically, we submitted all post data (depending on availability: content title, description, content of the post itself, transcript of the video) to the OpenAI API, asking the following question:

Is the post talking about the European Union or EU parliamentary elections or EU institutions or EU political figures or European Parliament elections campaigns or EU policies or voting in the period June 6 to 9? (answer from 0-10)

We considered any answer strictly greater than 5 as affirmative. A human inspection of 50 randomly-selected posts confirmed 49 as election-related, with the remaining post being ambiguous.
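
For illustration, the classification step can be reproduced with a call along the following lines. This is a minimal sketch: the helper name, the concatenation of post fields and the parsing of the model’s reply are simplifications of our pipeline, and the snippet assumes the openai Python client with an API key configured in the environment.

```python
# Minimal sketch of the EU-relevance classification step. Names are illustrative;
# the actual pipeline also handles video transcripts, missing fields and rate limits.
import re
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

QUESTION = (
    "Is the post talking about the European Union or EU parliamentary elections "
    "or EU institutions or EU political figures or European Parliament elections "
    "campaigns or EU policies or voting in the period June 6 to 9? (answer from 0-10)"
)

def is_eu_related(post_text: str) -> bool:
    """Return True when the model's 0-10 score is strictly greater than 5."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": f"{post_text}\n\n{QUESTION}"}],
    )
    reply = response.choices[0].message.content
    match = re.search(r"\d+", reply)  # take the first number in the reply as the score
    return bool(match) and int(match.group()) > 5
```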

The proportion of posts related to the EU over time further confirms that the overall topic identification is correct, with a low and stable prevalence in the beginning of the year followed by a steady rise in the weeks before the election, culminating with a very sharp increase on the election weekend (see Figure 1).

Figure 1 – Proportion of EU-related posts over time (all platforms)

Platform  | Number of posts from high-credibility accounts | Of which, EU- or EP election-related | Number of posts from low-credibility accounts | Of which, EU- or EP election-related
Instagram | 40,009  | 1,148 | 11,369 | 542
Facebook  | 160,607 | 4,617 | 71,871 | 2,062
YouTube   | 12,011  | 1,256 | 6,676  | 331

Table 2 – Summary statistics of sample size

5- Processing data

For each post, we then calculated a metric of post popularity per account follower. Popularity was defined as the number of views where available (YouTube) and as the total number of interactions otherwise (Facebook and Instagram). Interactions were defined as the sum of the numbers of comments, reshares, likes and other reactions (each with a weight of one).

The average of content popularity per account follower was then calculated across all posts coming from high-credibility accounts and all posts coming from low-credibility accounts. The same calculation was conducted for the EU-related subsamples.
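
The sketch below illustrates this calculation, together with the bootstrap confidence intervals reported in the figures and tables. Column names, the input file and the number of resamples are illustrative placeholders rather than the exact parameters of our pipeline.

```python
# Minimal sketch of the popularity-per-follower metric and its bootstrap 95% CI.
# Column names, file name and resample count are illustrative placeholders.
import numpy as np
import pandas as pd

def popularity_per_1000_followers(posts: pd.DataFrame) -> pd.Series:
    """Views (YouTube) or summed interactions (Facebook, Instagram) per 1,000 followers."""
    interactions = posts[["comments", "reshares", "likes", "other_reactions"]].sum(axis=1)
    popularity = posts["views"].fillna(interactions)  # views where available, else interactions
    return 1000 * popularity / posts["account_followers"]

def bootstrap_mean_ci(values: np.ndarray, n_resamples: int = 10_000, seed: int = 0):
    """Mean with a 95% percentile-bootstrap confidence interval."""
    rng = np.random.default_rng(seed)
    means = [rng.choice(values, size=len(values), replace=True).mean() for _ in range(n_resamples)]
    return values.mean(), np.percentile(means, 2.5), np.percentile(means, 97.5)

posts = pd.read_csv("posts.csv")  # hypothetical file: one row per post, with account metadata
posts["popularity_1000"] = popularity_per_1000_followers(posts)

for group, subset in posts.groupby("account_credibility"):  # e.g. "low" / "high"
    mean, low, high = bootstrap_mean_ci(subset["popularity_1000"].to_numpy())
    print(f"{group}: {mean:.2f} [{low:.2f}-{high:.2f}] per post per 1,000 followers")
```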

Result 1: On YouTube, Facebook and Instagram, low-credibility accounts benefited from a popularity premium in H1 2024

Figure 2 – Mean popularity per post per 1,000 followers (with 95% bootstrap confidence interval) for the two account categories, including all their posts.

Platform  | Mean popularity per post per 1,000 followers – low-credibility | Mean popularity per post per 1,000 followers – high-credibility | Low-credibility premium factor
Instagram | 18.77 [17.91-19.63]    | 8.35 [8.12-8.60]       | 2.25
YouTube   | 200.71 [172.54-238.12] | 141.05 [119.58-165.69] | 1.42
Facebook  | 1.03 [0.99-1.07]       | 0.32 [0.31-0.33]       | 3.22

Table 3 – Mean popularity per post per 1,000 followers for the two account categories (with 95% bootstrap confidence interval) and ratio of the two metrics. All posts.

Low-credibility accounts’ posts performed 42% to 222% better than their high-credibility counterparts’. High-credibility accounts did not outperform low-credibility ones on any platform.

Figure 3 – Mean popularity per post per 1,000 followers for the two account categories (with 95% bootstrap confidence interval). EU- and EP-related posts.

Platform  | Mean popularity per post per 1,000 followers – low-credibility | Mean popularity per post per 1,000 followers – high-credibility | Low-credibility premium factor
Instagram | 13.55 [12.09-15.03]    | 8.00 [7.31-8.75]      | 1.69
YouTube   | 262.33 [192.02-344.20] | 114.00 [60.70-180.21] | 2.30
Facebook  | 1.95 [1.77-2.12]       | 0.26 [0.24-0.27]      | 7.50

Table 4 – Mean popularity per post per 1,000 followers for the two account categories (with 95% bootstrap confidence interval) and ratio of the two metrics. EU- and EP-related posts.

On a per-follower basis, in the run-up to the European Parliament elections, high-credibility accounts’ EU-related posts were significantly less popular than those of the low-credibility group. High-credibility accounts did not outperform low-credibility ones on any platform.

Discussion

If platforms’ content recommendation systems prioritized account and content credibility among the signals they use to choose which content to promote, we would have expected – all else being equal (see the discussion of these assumptions below) – high-credibility accounts’ content to perform better than that of their low-credibility counterparts, for two reasons:

  • By definition, high-credibility accounts publish, on balance, high-credibility content. Content from such accounts is therefore expected to be overrepresented in any recommendation system that prioritizes content quality, which should translate into a higher ‘popularity per follower’ metric for those accounts.
  • In addition to, or in the absence of, a ‘quality score’ for a given piece of content, recommender systems can also use the account’s overall ‘quality score’ as an input. Likewise, all else being equal, this would result in more impressions of and engagement with content from high-credibility accounts, hence higher ‘popularity per follower’ scores.

This study, however, paints the opposite picture: it is low-credibility outlets that perform disproportionately better.

This result is both surprising and concerning. Surprising, given the regulatory obligations and public commitments taken by these platforms to reduce the reach of content from accounts that repeatedly share misinformation and to promote authoritative information. Concerning, given the outsized impact of platforms’ design choices on the spread of disinformation and the harm they can cause to high-credibility publishers.

Importantly, a different approach is not only possible, but has already been implemented. To protect the US 2020 elections, Facebook reportedly tweaked its content recommendation system to give more weight to what it called “News Ecosystem Quality”, an opaque internal metric assigned at the page level to assess a news publisher’s quality, resulting in greater prominence for high-credibility media outlets.

Possible explanations, limitations and hypotheses

We do not suggest that platforms willingly promote low-credibility accounts at the partial expense of high-credibility ones. The most likely explanation is that some other confounding factor(s) are correlated with account credibility and explain our observations.

  • Hypothesis 1: Engagement signals dominate account or content credibility in recommender systems’ outputs

Social media platforms have been accused of prioritizing user engagement over other concerns (e.g. 1, 2, 3). This emphasis on engagement appears to feed, at a technical level, into the recommender systems used to choose which content to promote. For instance, out of the 39 signals disclosed by Facebook in its explainer of the workings of one of the systems it uses to suggest content in users’ feed, at least 35 are engagement- (or view-)based.

If engagement-based signals are indeed dominant in recommender systems, and if content from low-credibility accounts is more engaging (possibly due to its simplistic nature, appeal to emotion, or polarizing effect), it would be logical that content from low-credibility accounts overperforms.

While this would explain our findings, whether this complies with the letter and the spirit of the DSA and the Code of Practice should be assessed.

  • Hypothesis 2: High-credibility accounts have other characteristics, in particular their size, that make their content perform comparatively poorly

The high-credibility accounts in our sample tend to belong to established media, such as a country’s newspaper of record or major public service media. Because these accounts belong to large media organizations, they tend to post very frequently. They also have very large follower bases, possibly due to their preexisting reputation for reliability.

Account size appears to be negatively correlated with algorithmic amplification in one study of Twitter/X. If this effect were also present on Facebook, Instagram and YouTube, it could explain why these accounts’ content performs worse than that of lower-credibility, but smaller, accounts. Our sample size was insufficient to conduct robust tests of the impact of account size on the amplification factor, and we would welcome further research investigating this relationship.

However, regardless of whether this hypothesis holds true, we do not believe it materially impacts the core question of platforms’ responsibility to promote high-credibility accounts. We do not see any compelling reason to effectively penalize a large high-credibility account for its size.

Likewise, the characteristics of these accounts’ audiences might differ. For instance, low-credibility accounts could have audiences that spend much more time on the platform and/or engage at a much higher rate than the audiences of high-credibility accounts. While this would explain the low-credibility performance premium, it would raise the question of the extent to which platforms control what users see on their services (see Hypothesis 4).

  • Hypothesis 3: Disagreements between platforms’ and our estimates of account credibility

This study rests on the credibility assessment of the social media accounts. We took an “abundance of caution” approach by:

  • Leveraging the Consensus Credibility Scores, which are themselves aggregates of various experts’ input, therefore drawing on ensemble methods to minimize bias and maximize accuracy;
  • Setting a very low (high) threshold to label an account as low (high) credibility, to remain uncontroversial and avoid edge cases.

Although we are not making the full list of accounts public as we do not wish it to be used for commercial purposes, interested parties are welcome to request it to advance public-good projects.

  • Hypothesis 4: Factors outside the platforms’ control (in particular, other than the recommender systems) explain this “low-credibility premium”

Should users mostly access content via means other than algorithmically-curated feeds (e.g. via off-platform links to the content), platforms’ ability to steer traffic towards high-credibility accounts would be limited.

To the best of our knowledge, the only reliable public estimate of the share of content views driven by algorithmic systems came from comments by YouTube’s chief product officer in 2018, in which he revealed that more than 70% of content views on the platform were the result of an algorithmic recommendation. We see no reason why that proportion would have dropped since then.

On the contrary, the growing share of social media time spent on mobile devices and the introduction of formats primarily consumed through recommender feeds (Instagram Reels, YouTube Shorts) have likely increased the share of content views happening on algorithmically-curated surfaces.

It hence appears highly likely that platforms do exercise significant control over what their users see, and that they would be able to promote content posted by high-credibility accounts more so than that coming from low-credibility accounts. We would however welcome more extensive per-platform data disclosure on the percentage of content views that occur on such algorithmically-curated surfaces.

Societal implications

Overall, we fail to identify reasons that could justify this “low-credibility premium” while remaining consistent with platforms’ commitments and obligations to promote high-credibility accounts over low-credibility ones, particularly in an electoral context.

This low-credibility premium on YouTube, Facebook and Instagram is a problem on two counts.

First, it effectively contributes to reducing the reach of high-quality content producers, raising basic issues of fairness, freedom of expression and freedom of information. Given finite amounts of user attention, artificially elevating low-credibility accounts comes at the expense of all other accounts on the platform, including high-credibility ones.

Second, it increases the overall prevalence of disinformation and reduces that of high-quality content, with profound systemic ramifications on the overall quality of the online information environment.

We therefore recommend that YouTube, Instagram and Facebook review whether their various recommender systems are effective in promoting high-credibility voices.

We recommend that other platforms (particularly LinkedIn, TikTok and X/Twitter) ensure sufficient data access under DSA Article 40.12 to allow for such studies.

Academic references


The account credibility datasets were developed as part of a project supported by the EMIF.


The sole responsibility for any content supported by the European Media and Information Fund lies with the author(s) and it may not necessarily reflect the positions of the EMIF and the Fund Partners, the Calouste Gulbenkian Foundation and the European University Institute.


This work was supported by The Bright Initiative, powered by Bright Data, which offers public-interest organizations pro bono access to large-scale web data collection tools.

Science Feedback is a non-partisan, non-profit organization dedicated to science education. Our reviews are crowdsourced directly from a community of scientists with relevant expertise. We strive to explain whether and why information is or is not consistent with the science and to help readers know which news to trust.
Please get in touch if you have any comment or think there is an important claim or article that would need to be reviewed.
