IORG studies information manipulation and identifies information operations with publicly verifiable data science methods. We need all kinds of data. Facebook and Weibo are important social platforms in Taiwan and China, respectively, and getting data from them has proven more difficult than we expected. Our scrapers have encountered numerous challenges: acquiring target lists, countering blocking mechanisms, controlling scraping speed, defining data structures, and improving the efficiency of data storage and search. We would like to share our working solutions to these challenges, lessons learned from continuous operation, and how we open-sourced an integrated hardware-software scraper system.
Over the years, the g0v community has launched open data projects that provide highly valuable data for information manipulation researchers. “Cofacts” has suspicious LINE messages, “tvlogger” has TV news data, and “0archive” has pages from static websites and forum articles from PTT. We would like to share how we extended the open data standard from “0archive” to accommodate more sources and platforms. We will also share how we store, index, search, and open up this massive collection of data.
How do you find and observe the life cycle and dissemination network of a rumor in a vast sea of text messages? Aside from being copy-pasted and link-shared, a rumor can also “fork” itself or “merge” with others. Where do we draw the boundary of a rumor? We would like to share our proposed mathematical definition of which messages belong to a rumor, and an algorithm to group them efficiently. Lastly, we have mapped several rumors onto their dissemination networks, and we will share those too.
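To make the grouping question concrete, here is a minimal sketch of one possible approach. It is not IORG's definition or algorithm, which this abstract does not spell out. It assumes that two messages are "linked" when the Jaccard similarity of their character 3-grams reaches a threshold, and that a rumor is a connected component of linked messages, found with union-find. The function names, the 3-gram size, and the 0.5 threshold are all hypothetical choices made for illustration.

```python
from itertools import combinations


def shingles(text, n=3):
    """Return the set of character n-grams of text (lowercased, whitespace collapsed)."""
    normalized = " ".join(text.split()).lower()
    if len(normalized) < n:
        return {normalized}
    return {normalized[i:i + n] for i in range(len(normalized) - n + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: |a & b| / |a | b|."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def group_messages(messages, threshold=0.5, n=3):
    """Group message indices into connected components of similar messages.

    Assumption (illustrative only): messages i and j are linked when their
    shingle sets have Jaccard similarity >= threshold.
    """
    parent = list(range(len(messages)))

    def find(i):
        # Follow parent pointers to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(i, j):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[max(ri, rj)] = min(ri, rj)

    sets = [shingles(m, n) for m in messages]
    # Naive all-pairs comparison: quadratic in the number of messages.
    for i, j in combinations(range(len(messages)), 2):
        if jaccard(sets[i], sets[j]) >= threshold:
            union(i, j)

    groups = {}
    for i in range(len(messages)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

Because grouping is transitive, a message that has drifted far from the original wording can still land in the same group through intermediate variants, which is one way to model a rumor that "forks". The all-pairs loop is only suitable for small collections; a large corpus would need a candidate-selection step before comparison.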
For more information on IORG and open source, please refer to https://iorg.tw/open.
About chihao
ID = chihao. g0ver, IORG co-director. A content management system is the manifestation of organizational governance and values.
About Shao-Hong, Lin
Master of Political Science, Department of Political Science, National Taiwan University, 2019
Data Engineer, IORG