Data Scientists Tackle Archives Built by Social Media Companies after Google, Facebook, and Twitter Release Data on Political Ads (More or Less)


The Need for Manual Labor and Artificial Barriers Sometimes Thwarts Transparency:  President Trump and Senate Republican PAC are Big Spenders while Facebook Advertisers Lean Left and Google Advertisers Lean Right…

Using cutting-edge machine learning and data scraping tools, computer scientists at the New York University Tandon School of Engineering today released the first database and analysis of political advertising based on more than 884,000 ads identified by Google, Twitter, and Facebook.

The team launched their user-friendly Online Political Ads Transparency Project in July with data from Facebook, which was the first company to provide it. But the researchers were forced to switch techniques when Facebook blocked their data collection two weeks later. Today’s report is the first to include not only data from Facebook (including Instagram) but also data newly shared by Twitter and Google.

Although they found numerous roadblocks to meaningful transparency – ranging from faulty archives constructed in haste by the social media giants to varying definitions of “political advertising” and throttling of data collection by Facebook – NYU Tandon Computer Science and Engineering Assistant Professor Damon McCoy and his team nonetheless reported meaningful insights:

  • President Donald Trump and his PAC registered the largest number of ads of any candidate, due in large part to the preponderance of small, micro-targeted advertising. Virtually all were aimed at raising funds during the study period, September 9-22, 2018. The researchers found similar dominance by President Trump in their initial, Facebook-only, analysis.
  • The Democratic candidate for Senate from Texas, Beto O’Rourke, continued to be the apparent largest spender, mostly seeking small donations from outside his state via Facebook and Twitter. Although O’Rourke was the rare federal candidate unaffiliated with a PAC, he was like other candidates in using social media to raise funds outside their districts, McCoy noted.
  • The Senate Leadership Fund, a Republican Super PAC, was the largest spender on Google and across all three platforms combined.
  • Priorities USA, a left-leaning PAC, was among the big spenders, but exact figures are not available because it collaborated on ad placements with other PACs.
  • Left-leaning organizations are the big spenders on Facebook and Twitter; on Google, the trend is reversed.
  • Facebook apparently carries the most political ads, but Google apparently ranks higher in impressions and spending. This is due, in part, to the large number of small, micro-targeted ads on Facebook (60 percent) and because the majority of spending on Google (61 percent) is by PACs, which are more likely to have large budgets. But analysis is muddied by the fact that both Google and Facebook disclose only ranges; only Twitter discloses exact spending and impressions. Each of the giants also defines “political advertising” differently. For example, Facebook alone includes non-media for-profit companies promoting slanted political content, companies selling merchandise with political messages, and solar panel firms with environmental messages. Google and Twitter, meanwhile, limited their reporting to only federal candidates, at least initially.
  • PACs accounted for 23 percent of the spending on Facebook during the study period.
  • The very top spenders during the study period on Facebook, though, were Facebook itself and its own Instagram – Facebook to publicize its responses to Russian election hacking and Instagram to spread a get-out-the-vote message. But the researchers pointed out that the company seemed to overcharge itself, based upon impressions.
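Because Google and Facebook disclose spending only as ranges, any per-advertiser total is itself a range rather than a single figure. The following is a minimal sketch of that bookkeeping, not the researchers' actual method: the record fields (`advertiser`, `spend_min`, `spend_max`) and both function names are hypothetical, and a platform that reports exact figures, as Twitter does, can be represented by setting both bounds to the same value.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass
class AdRecord:
    """One ad with a disclosed spend range (hypothetical field names)."""
    advertiser: str
    spend_min: float  # lower bound of the disclosed range
    spend_max: float  # upper bound; equals spend_min when spend is exact


def spend_bounds(ads: Iterable[AdRecord]) -> Dict[str, Tuple[float, float]]:
    """Sum lower and upper bounds separately for each advertiser."""
    totals: Dict[str, Tuple[float, float]] = {}
    for ad in ads:
        low, high = totals.get(ad.advertiser, (0.0, 0.0))
        totals[ad.advertiser] = (low + ad.spend_min, high + ad.spend_max)
    return totals


def definitely_outspends(a: str, b: str,
                         totals: Dict[str, Tuple[float, float]]) -> bool:
    """True only if a's minimum possible total exceeds b's maximum."""
    return totals[a][0] > totals[b][1]
```

When two advertisers' ranges overlap, `definitely_outspends` returns False in both directions, which is one way to see why rankings built on range data carry words like "apparently".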

Collaborators on the Online Political Ads Transparency Project are NYU Tandon doctoral student Laura Edelson, NYU Shanghai visiting undergraduate student Shikhar Sakhuja, and Ratan Dey, a former NYU doctoral student studying under Professor Keith Ross and now an assistant professor of practice in computer science at NYU Shanghai.

McCoy conceived the project to build easy-to-use tools to collect, archive, and analyze political advertising data. Although Facebook became the first major social media company to launch a searchable archive of political advertising, for both Facebook and Instagram, in May 2018, McCoy found the archive difficult to use, requiring time-consuming manual searches. He decided to apply versions of the data scraping techniques he had previously used against criminals, including human traffickers who advertised and used Bitcoin.

Despite the difficulty the team subsequently encountered accessing Facebook data, they report it has by far the most comprehensive political archive among the three social media companies. The report outlines problems with the API – an interface with other platforms – introduced in beta form by Facebook to allow researchers access to its archives.

Google’s data is the easiest for the public to access, as a BigQuery dataset, available in its entirety via the Google Cloud service. But it is updated in real time, with no archiving, so the NYU researchers are capturing the data daily, to share and archive.
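A daily capture of a dataset that keeps no history can be as simple as writing each day's rows to a date-stamped file. The sketch below illustrates that idea only; it is not the NYU team's pipeline. The function name, the one-file-per-day JSON Lines layout, and the row fields in the usage note are assumptions, and the step that actually fetches rows from the live dataset is deliberately left out.

```python
import json
from datetime import date
from pathlib import Path
from typing import Iterable, Mapping, Optional


def archive_snapshot(rows: Iterable[Mapping[str, object]],
                     archive_dir: str,
                     day: Optional[date] = None) -> Path:
    """Write one day's rows to <archive_dir>/<YYYY-MM-DD>.jsonl.

    `rows` is whatever was fetched from the live dataset that day;
    fetching is outside the scope of this sketch.
    """
    day = day or date.today()
    out_dir = Path(archive_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    out_path = out_dir / f"{day.isoformat()}.jsonl"
    with out_path.open("w", encoding="utf-8") as fh:
        for row in rows:
            # One JSON object per line; sorted keys keep diffs between days stable.
            fh.write(json.dumps(dict(row), sort_keys=True) + "\n")
    return out_path
```

Calling `archive_snapshot(rows, "archive")` once per day leaves a series of dated files that can be compared with one another to see which ads appeared, changed, or disappeared.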

Twitter has no easily accessible political ad archive, so the NYU research team is scraping all political advertising data identified by Twitter and sharing and archiving for the public, as well.

Although the researchers used the September period for comparison purposes, they have now compiled data from late May through October 3, with a gap of about six weeks while Facebook blocked the team's data scraping. They praised the social media companies for implementing fixes they recommended and continue to work toward transparency.

The work was funded in part by the National Science Foundation under a grant to McCoy for research that explores bias and the manipulation of online data.

Visit the project and download data at: https://online-pol-ads.github.io.

