misinformation project planning, pt 1

some early thoughts, a work-in-progress (to be updated!)
2019 Canadian federal election sentiment analysis
what are the main sentiments around the election? are there unexpected clusters? how are the hashtags being used? who are the actors? are there clear clusters of controversial or misleading content?
collect elections-related tweets and do unsupervised learning on them
e.g. for all tweets under #cdnpoli, encode them, run unsupervised clusters, and label these groupings
scope: Canada 2019 official hashtags: #cdnpoli, #elxn43, #polcan, #ItsOurVote, #CestNotreVote
here's what Twitter is already doing for Canada 2019
scope 2: (if the above is interesting) US 2020 hashtags: #2020election, #2020, #presidentialelection, etc...
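a minimal sketch of the scoping step, i.e. keeping only tweets that use one of the official Canada 2019 hashtags. it assumes each tweet is a dict with a `text` field (a hypothetical schema, not the real API payload), and the sample tweets are made up for illustration.

```python
import re

# Official Canada 2019 hashtags from the scope above, lowercased for matching.
SCOPE_HASHTAGS = {"#cdnpoli", "#elxn43", "#polcan", "#itsourvote", "#cestnotrevote"}

HASHTAG_RE = re.compile(r"#\w+")


def extract_hashtags(text):
    """Return the lowercased hashtags found in a tweet's text."""
    return [tag.lower() for tag in HASHTAG_RE.findall(text)]


def in_scope(tweet, scope=SCOPE_HASHTAGS):
    """True if the tweet uses at least one hashtag from the scope set."""
    return any(tag in scope for tag in extract_hashtags(tweet["text"]))


# Made-up sample tweets (hypothetical "text" field).
tweets = [
    {"text": "Debate night! #cdnpoli #elxn43"},
    {"text": "Nice weather today #toronto"},
    {"text": "Allez voter #CestNotreVote"},
]
scoped = [t for t in tweets if in_scope(t)]
```

matching is case-insensitive on purpose, since people write #CdnPoli, #cdnpoli, etc. interchangeably.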
dashboard items to build, inspired by the Hamilton 2.0 dashboard
[x] figure out how the hell to use Dash by Plotly (i spent a couple of hours trying to use Mozaik and other JavaScript-heavy frameworks, got super confused, and have a newfound appreciation for the complexity of front-end work)
infrastructure + pipelines
[ ] schedule daily pipeline to pull more tweets (this is a manual pull for now...)
[x] cache to disk whenever possible 
[ ] deploy to free Google App Engine (using GCP instead)
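one way the "cache to disk" item could look, as a rough sketch: wrap any expensive pull in a helper that reads a JSON file if it exists and only calls the fetch function otherwise. `cached_json` and `pull_tweets` are hypothetical names, and the record returned is placeholder data.

```python
import json
import os
import tempfile


def cached_json(path, fetch):
    """Return the JSON stored at `path`, calling `fetch()` to create it if missing."""
    if os.path.exists(path):
        with open(path, "r", encoding="utf-8") as f:
            return json.load(f)
    data = fetch()
    with open(path, "w", encoding="utf-8") as f:
        json.dump(data, f)
    return data


calls = []


def pull_tweets():
    # Placeholder for the manual pull; returns made-up records.
    calls.append(1)
    return [{"user": "alice", "text": "hello #cdnpoli"}]


cache_path = os.path.join(tempfile.mkdtemp(), "tweets.json")
first = cached_json(cache_path, pull_tweets)   # hits pull_tweets
second = cached_json(cache_path, pull_tweets)  # served from disk
```

a daily scheduled pull could then key the cache path by date, so each day's pull is fetched once and reused afterwards.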
general stats
[/] dropdown for week selection (done where it matters)
[x] # tweets
[x] # distinct accounts
[x] # hashtags
[ ] # countries (probably not doable due to API limit)
[ ] tweet volume week-over-week (captured below)
[x] tweet volume by date
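a small sketch of how the general stats could be computed from an in-memory list of tweets. the `user`, `text`, and `created_at` fields are an assumed schema, and the sample rows are invented.

```python
from collections import Counter
from datetime import datetime
import re

# Made-up sample tweets with an assumed schema.
tweets = [
    {"user": "alice", "text": "Vote! #cdnpoli #elxn43", "created_at": "2019-10-20T14:05:00"},
    {"user": "bob", "text": "Debate recap #cdnpoli", "created_at": "2019-10-20T23:30:00"},
    {"user": "alice", "text": "Results night #elxn43", "created_at": "2019-10-21T02:15:00"},
]


def general_stats(tweets):
    """Compute tweet, account, and hashtag counts plus tweet volume by date."""
    hashtags = set()
    by_date = Counter()
    for t in tweets:
        hashtags.update(tag.lower() for tag in re.findall(r"#\w+", t["text"]))
        day = datetime.fromisoformat(t["created_at"]).date().isoformat()
        by_date[day] += 1
    return {
        "num_tweets": len(tweets),
        "num_accounts": len({t["user"] for t in tweets}),
        "num_hashtags": len(hashtags),
        "volume_by_date": dict(sorted(by_date.items())),
    }


stats = general_stats(tweets)
```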
account-level
[x] top 10 accounts by tweets 
[x] top 10 accounts by likes 
[x] top 10 accounts by retweets 
[ ] hook up both to the official Twitter API to get verification status
[x] top 10 mentions by Twitter handles
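the account-level rankings are all the same aggregation with a different field, so one helper can probably cover them. a sketch, assuming hypothetical `user`, `text`, `likes`, and `retweets` fields and made-up rows:

```python
from collections import Counter
import re

# Made-up sample tweets with an assumed schema.
tweets = [
    {"user": "alice", "text": "@carol great point", "likes": 5, "retweets": 1},
    {"user": "bob", "text": "cc @carol @dave", "likes": 2, "retweets": 7},
    {"user": "alice", "text": "thanks @dave", "likes": 4, "retweets": 0},
]


def top_accounts(tweets, key=None, n=10):
    """Rank accounts by tweet count (key=None) or by the sum of a numeric field."""
    totals = Counter()
    for t in tweets:
        totals[t["user"]] += 1 if key is None else t[key]
    return totals.most_common(n)


def top_mentions(tweets, n=10):
    """Rank @handles by how often they are mentioned in tweet text."""
    mentions = Counter()
    for t in tweets:
        mentions.update(h.lower() for h in re.findall(r"@\w+", t["text"]))
    return mentions.most_common(n)


by_tweets = top_accounts(tweets)
by_likes = top_accounts(tweets, key="likes")
by_retweets = top_accounts(tweets, key="retweets")
mentioned = top_mentions(tweets)
```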
tweet-level
[x] top 10 tweets by retweets 
[x] top 10 tweets by likes 
[ ] hook up to Twitter API to get permalink for tweets
language + hashtags
[x] top 50 hashtags
[ ] top 10 key phrases (too nebulous for now)
[ ] top 10 topics discussed (via topic modeling?)
[x] top entities (via named entity recognition)
[x] top entities mentioned by politician
[ ] clusters of most similar tweets, as deemed by Universal Sentence Encoder embeddings + k-means (possibly after dimensionality reduction with UMAP, or sklearn manifold methods like LLE and Isomap)
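to make the encode-then-cluster idea concrete, here is a toy sketch in plain Python. note the swap: instead of Universal Sentence Encoder it uses normalized word counts as a stand-in encoder, and the k-means is a bare-bones Lloyd's loop with hand-picked starting centroids. the texts are invented; a real run would use proper embeddings and a library implementation.

```python
import math
import re
from collections import Counter


def embed(text, vocab):
    """Stand-in encoder: L2-normalized word counts over a fixed vocabulary."""
    counts = Counter(re.findall(r"\w+", text.lower()))
    vec = [float(counts[w]) for w in vocab]
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, init_centroids, iters=20):
    """Plain Lloyd's algorithm; returns a cluster index for each point."""
    centroids = [list(c) for c in init_centroids]
    labels = []
    for _ in range(iters):
        # Assignment step: nearest centroid for every point.
        labels = [
            min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties out
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Made-up tweet texts for illustration.
texts = [
    "carbon tax debate tonight",
    "the carbon tax is unfair",
    "advance voting opens friday",
    "advance voting lines are long",
]
vocab = sorted({w for t in texts for w in re.findall(r"\w+", t.lower())})
points = [embed(t, vocab) for t in texts]
labels = kmeans(points, init_centroids=[points[0], points[2]])
```

labeling the groupings would then be a manual pass over a few tweets from each cluster.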
other places on the web 
[x] top 10 links 
[x] top 10 domains linked
[ ] top 10 linked news articles (it is difficult to identify a news article)
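a sketch of the "top domains linked" count using only the standard library. it assumes the links are already expanded (shortened t.co-style links would all collapse into one domain otherwise), and the example URLs are illustrative paths, not real articles.

```python
from collections import Counter
from urllib.parse import urlparse


def domain_of(url):
    """Return the lowercased host of a URL, without a leading 'www.'."""
    host = (urlparse(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def top_domains(urls, n=10):
    """Count links per domain, skipping URLs with no parseable host."""
    return Counter(d for d in map(domain_of, urls) if d).most_common(n)


# Illustrative links only.
links = [
    "https://www.cbc.ca/news/politics/a",
    "https://cbc.ca/news/politics/b",
    "https://www.theglobeandmail.com/politics/c",
]
domains = top_domains(links)
```

a domain allowlist of known news outlets would be one crude way to approximate "is this a news article", though that is a guess at an approach rather than something tried here.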
politician-level
[ ] top 10 tweets retweeted by major politicians
[x] num followers, num tweets...
[x] top 5 tweets by likes
[x] top 5 tweets by retweets
[ ] who do they retweet the most? (they rarely do)
[x] most common hashtags used 
[x] tweet volume over time
most common hashtags mentioned alongside
other
[x] tweets by hour in EST
top 10 countries
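for the tweets-by-hour chart, a sketch that converts UTC timestamps to EST and buckets by hour. it assumes timestamps arrive as naive ISO strings in UTC, and it uses a fixed UTC-5 offset, so it ignores daylight saving time (a named tz database zone would handle that).

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

# Fixed UTC-5 offset for EST; this ignores daylight saving time (EDT is UTC-4).
EST = timezone(timedelta(hours=-5), "EST")


def tweets_by_hour(timestamps_utc):
    """Count tweets per hour of day (0-23) after converting UTC to EST."""
    hours = Counter()
    for ts in timestamps_utc:
        dt = datetime.fromisoformat(ts).replace(tzinfo=timezone.utc)
        hours[dt.astimezone(EST).hour] += 1
    return dict(sorted(hours.items()))


# Made-up UTC timestamps.
stamps = ["2019-10-21T02:15:00", "2019-10-21T02:45:00", "2019-10-21T14:00:00"]
by_hour = tweets_by_hour(stamps)
```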
misinformation classifier 
given the existing data available on the Elections Integrity Hub, can we train a classifier to score individual tweets on their likelihood of being misinformation?
train a model on the existing tweets dataset, build pipelines to parse tweets in real time, and score elections-related tweets on the following:
likelihood of being from a fake account
likelihood of being propaganda
note: it would be really neat to make this a fact-checking task (i.e. score tweets with claims based on their truth value) but this is likely not feasible, for so many reasons
scope: 2020 American election, 2019 Canadian election
turn this into a webapp?
lower priority
heat map of fake accounts around the world
reading material 
https://medium.com/digintel/beijings-computational-propaganda-goes-global-the-significance-of-china-s-debut-as-a-e220145dc90a
https://www.nytimes.com/interactive/2019/09/18/world/asia/hk-twitter.html
https://medium.com/digintel/welcome-to-the-party-a-data-analysis-of-chinese-information-operations-6d48ee186939
fever.ai (very sad that i missed it this year!)
https://popular.info/p/massive-i-love-america-facebook-page
https://s3.amazonaws.com/kf-site-legacy-media/feature_assets/www/misinfo/kf-disinformation-report.0cdbb232.pdf
