misinformation project planning, pt 1

some early thoughts, a work-in-progress (to be updated!)



2019 Canadian federal election sentiment analysis

what are the main sentiments around the election? are there unexpected clusters? how are the hashtags being used? who are the actors? are there clear clusters of controversial or misleading content?
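for the "unexpected clusters" question, here is a minimal sketch of the encode-then-cluster idea. it is only a stand-in: it uses normalized bag-of-words vectors in place of a real sentence encoder, plus a tiny deterministic k-means, and the sample tweets and function names (`embed`, `kmeans`) are made up for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical sample tweets, invented for illustration only.
TWEETS = [
    "advance polls open today #elxn43",
    "where to vote advance polls #elxn43",
    "carbon tax debate tonight #cdnpoli",
    "carbon tax plan debate #cdnpoli",
]


def tokenize(text):
    return re.findall(r"[a-z0-9#]+", text.lower())


def embed(text, vocab):
    """Unit-length bag-of-words vector (a crude stand-in for a sentence encoder)."""
    counts = Counter(tokenize(text))
    vec = [float(counts[word]) for word in vocab]
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=20):
    """Plain k-means with farthest-first initialization (no randomness)."""
    centroids = [points[0]]
    while len(centroids) < k:
        # pick the point farthest from every centroid chosen so far
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        )
    labels = []
    for _ in range(iters):
        labels = [
            min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points
        ]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


vocab = sorted({tok for text in TWEETS for tok in tokenize(text)})
points = [embed(text, vocab) for text in TWEETS]
labels = kmeans(points, k=2)

for label, text in sorted(zip(labels, TWEETS)):
    print(label, text)
```

with real data, the `embed` step is where a proper sentence encoder would go, and labelling the resulting groups would still be a manual step.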

  • collect elections-related tweets and do unsupervised learning on them

    • e.g. for all tweets under #cdnpoli, encode them, run unsupervised clusters, and label these groupings

    • scope: Canada 2019 official hashtags: #cdnpoli, #elxn43, #polcan, #ItsOurVote, #CestNotreVote

    • here's what Twitter is already doing for Canada 2019

    • scope 2: (if the above is interesting) US 2020 hashtags: #2020election, #2020, #presidentialelection, etc...

  • dashboard items to build, inspired by the Hamilton 2.0 dashboard

    • [x] figure out how the hell to use Dash by Plotly (i spent a couple of hours trying to use Mozaik and other JavaScript-heavy frameworks, got super confused, and have a newfound appreciation for the complexity of front-end work)

    • infrastructure + pipelines

      • [ ] schedule daily pipeline to pull more tweets (this is a manual pull for now...)

      • [x] cache to disk whenever possible

      • [ ] deploy to free Google App Engine (using GCP instead)

    • general stats

      • [/] dropdown for week selection (done where it matters)

      • [x] # tweets

      • [x] # distinct accounts

      • [x] # hashtags

      • [ ] # countries (probably not doable due to API limit)

      • [ ] tweet volume week-over-week (captured below)

      • [x] tweet volume by date

    • account-level

      • [x] top 10 accounts by tweets

      • [x] top 10 accounts by likes

      • [x] top 10 accounts by retweets

      • [ ] hook up both to the official Twitter API to get verification status

      • [x] top 10 mentions by Twitter handles

    • tweet-level

      • [x] top 10 tweets by retweets

      • [x] top 10 tweets by likes

      • [ ] hook up to Twitter API to get permalink for tweets

    • language + hashtags

      • [x] top 50 hashtags

      • [ ] top 10 key phrases (too nebulous for now)

      • [ ] top 10 topics discussed (via topic modeling?)

      • [x] top entities (via named entity recognition)

      • [x] top entities mentioned by politician

      • [ ] clusters of most similar tweets, as deemed by Universal Sentence Encoder embeddings + k-means (or UMAP, or sklearn manifold learning techniques like LLE and Isomap)

    • other places on the web

      • [x] top 10 links

      • [x] top 10 domains linked

      • [ ] top 10 linked news articles (it is difficult to identify a news article)

    • politician-level

      • [ ] top 10 tweets retweeted by major politicians

      • [x] num followers, num tweets...

      • [x] top 5 tweets by likes

      • [x] top 5 tweets by retweets

      • [ ] who do they retweet the most? (they rarely do)

      • [x] most common hashtags used

      • [x] tweet volume over time

      • most common hashtags mentioned alongside

    • other

      • [x] tweets by hour in EST

      • top 10 countries
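most of the checked-off dashboard items above boil down to counting things. a rough sketch of a few of them (general stats, top hashtags, top domains, tweets by hour), assuming each tweet is a dict with hypothetical `user`, `text` and `created_at` fields. the sample rows are invented, and the "EST" conversion uses a fixed UTC-5 offset that ignores daylight saving.

```python
import re
from collections import Counter
from datetime import datetime, timedelta, timezone
from urllib.parse import urlparse

# Hypothetical rows; field names are assumptions, not the real pull's schema.
TWEETS = [
    {"user": "alice", "created_at": "2019-10-11T14:05:00+00:00",
     "text": "Advance polls open today #cdnpoli #elxn43 https://www.cbc.ca/news/a"},
    {"user": "bob", "created_at": "2019-10-11T14:40:00+00:00",
     "text": "@alice thanks! #cdnpoli"},
    {"user": "alice", "created_at": "2019-10-12T03:10:00+00:00",
     "text": "Debate recap #CdnPoli #polcan https://www.cbc.ca/news/b https://globalnews.ca/c"},
    {"user": "carol", "created_at": "2019-10-22T01:30:00+00:00",
     "text": "#elxn43 results tonight"},
]

HASHTAG_RE = re.compile(r"#(\w+)")
URL_RE = re.compile(r"https?://\S+")
EST = timezone(timedelta(hours=-5))  # fixed offset, ignores daylight saving


def hashtags(tweet):
    return [tag.lower() for tag in HASHTAG_RE.findall(tweet["text"])]


def general_stats(tweets):
    """Counts of tweets, distinct accounts, and distinct hashtags."""
    return {
        "tweets": len(tweets),
        "accounts": len({t["user"] for t in tweets}),
        "hashtags": len({tag for t in tweets for tag in hashtags(t)}),
    }


def top_hashtags(tweets, n=50):
    return Counter(tag for t in tweets for tag in hashtags(t)).most_common(n)


def top_domains(tweets, n=10):
    counts = Counter()
    for t in tweets:
        for url in URL_RE.findall(t["text"]):
            host = urlparse(url).netloc.lower()
            if host.startswith("www."):
                host = host[4:]
            counts[host] += 1
    return counts.most_common(n)


def tweets_by_hour_est(tweets):
    """Tweet counts keyed by hour of day (0-23) at UTC-5."""
    counts = Counter(
        datetime.fromisoformat(t["created_at"]).astimezone(EST).hour
        for t in tweets
    )
    return dict(sorted(counts.items()))


print(general_stats(TWEETS))
print(top_hashtags(TWEETS, 3))
print(top_domains(TWEETS))
print(tweets_by_hour_est(TWEETS))
```

the same `Counter(...).most_common(n)` pattern would cover the top-10 accounts and mentions lists too.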



misinformation classifier

given the existing data available on the Elections Integrity Hub, can we train a classifier to score individual tweets on their likelihood to be misinformation?

  • train a model on the existing tweets dataset, build pipelines to parse tweets in real-time and score elections-related tweets on the following:

    • likelihood to be from a fake account

    • likelihood to be propaganda

  • note: it would be really neat to make this a fact-checking task (i.e. score tweets with claims based on their truth value) but this is likely not feasible, for many reasons

  • scope: 2020 American election, 2019 Canadian election

  • turn this into a webapp?
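to make the scoring idea above concrete, here is a toy sketch of a text classifier that outputs a probability per tweet. it is a from-scratch multinomial naive Bayes with add-one smoothing, chosen only because it fits in a few lines. the training examples and labels are invented (not from the Elections Integrity Hub data), and `NaiveBayesScorer` is a hypothetical name. in practice one scorer would be trained per label (fake account, propaganda).

```python
import math
import re
from collections import Counter

TOKEN_RE = re.compile(r"[a-z0-9#@']+")


def tokenize(text):
    return TOKEN_RE.findall(text.lower())


class NaiveBayesScorer:
    """Binary multinomial naive Bayes; score() returns P(label == 1 | text)."""

    def fit(self, texts, labels):
        self.word_counts = {0: Counter(), 1: Counter()}
        self.doc_counts = Counter(labels)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set(self.word_counts[0]) | set(self.word_counts[1])
        return self

    def _log_joint(self, tokens, label):
        # log prior + sum of smoothed log likelihoods
        total_docs = sum(self.doc_counts.values())
        log_p = math.log(self.doc_counts[label] / total_docs)
        denom = sum(self.word_counts[label].values()) + len(self.vocab)
        for tok in tokens:
            if tok in self.vocab:  # tokens never seen in training are skipped
                log_p += math.log((self.word_counts[label][tok] + 1) / denom)
        return log_p

    def score(self, text):
        tokens = tokenize(text)
        log0 = self._log_joint(tokens, 0)
        log1 = self._log_joint(tokens, 1)
        m = max(log0, log1)  # subtract the max for numerical stability
        e0, e1 = math.exp(log0 - m), math.exp(log1 - m)
        return e1 / (e0 + e1)


# Invented toy examples: 1 = flagged, 0 = not flagged.
TRAIN = [
    ("patriots unite share this truth now", 1),
    ("share now the truth they hide", 1),
    ("they hide the real numbers share", 1),
    ("advance polls open today", 0),
    ("debate recap and fact check", 0),
    ("polls close at nine tonight", 0),
]

scorer = NaiveBayesScorer().fit([t for t, _ in TRAIN], [y for _, y in TRAIN])
print(round(scorer.score("share the truth now"), 3))
print(round(scorer.score("polls open tonight"), 3))
```

a real version would need far more data and account-level features (age, follower counts, posting cadence), since tweet text alone is a weak signal.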



lower priority

  • heat map of fake accounts around the world



reading material

  • https://medium.com/digintel/beijings-computational-propaganda-goes-global-the-significance-of-china-s-debut-as-a-e220145dc90a

  • https://www.nytimes.com/interactive/2019/09/18/world/asia/hk-twitter.html

  • https://medium.com/digintel/welcome-to-the-party-a-data-analysis-of-chinese-information-operations-6d48ee186939

  • fever.ai (very sad that i missed it this year!)

  • https://popular.info/p/massive-i-love-america-facebook-page

  • https://s3.amazonaws.com/kf-site-legacy-media/feature_assets/www/misinfo/kf-disinformation-report.0cdbb232.pdf
