Contemporary Orphan Newspapers: Web Preservation, Research Materials, and Future Prospects

Time
August 9, 2025, 13:00 ~ 13:30
Speaker
Chia Hsun Wang, Tyng-Ruey Chuang, Wu Hung-Yen
Room
TR512
Collaborative Notes
https://hackmd.io/Hkita5-Oex
Mandarin, Elementary
Interdisciplinary Practice of Art, Culture, Local Context, and Open Data.

Abstract

Taiwan’s Apple Daily was founded in 2003. Its website stopped being updated in 2022 and has been unavailable since 2023. As a result, nearly 20 years of the newspaper’s news and commentary on everyday life, society, entertainment, and politics in Taiwan have disappeared from the Internet. As with its sister newspaper, Hong Kong’s Apple Daily, which was dissolved for multiple reasons, the Taiwanese newspaper’s website, which held a large body of records on the lives of ordinary people, is now offline. Coverage of important contemporary events such as the Sunflower and Umbrella movements, including written reports and audio-visual content, is no longer available online to the general public, which has had a significant impact on education, research, and the preservation of contemporary history.

In this talk we share our experience organizing and converting the Archive Team’s 2022 archive of the Taiwan Apple Daily website, stored in WARC (Web ARChive) format, into a research dataset of hundreds of thousands of news articles. We used the IPTC (International Press Telecommunications Council) ninjs (News in JSON) format to transcribe the content of the archive into a searchable database. We hope this effort will revive public access to, and research use of, such orphan newspapers.
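To give a rough sense of what transcribing an archived page into a ninjs-style record can look like, here is a minimal sketch. It is not the pipeline used in this work. It assumes the article HTML has already been pulled out of a WARC record, uses only a small subset of ninjs 1.x-style properties (`uri`, `type`, `language`, `versioncreated`, `headline`, `body_text`), and treats the `<title>` element as the headline and `<p>` elements as the body. The names `ArticleExtractor` and `to_ninjs`, the sample URL, and the timestamp are all hypothetical.

```python
import json
from html.parser import HTMLParser


class ArticleExtractor(HTMLParser):
    """Collect the <title> text and the text of every <p> element."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.paragraphs = []
        self._in_title = False
        self._in_p = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "p":
            self._in_p = True
            self.paragraphs.append("")  # start a new paragraph buffer

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "p":
            self._in_p = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif self._in_p:
            self.paragraphs[-1] += data


def to_ninjs(url, html, version_created, language="zh-Hant-TW"):
    """Build a dict with a small, assumed subset of ninjs 1.x-style fields."""
    parser = ArticleExtractor()
    parser.feed(html)
    parser.close()
    paragraphs = [p.strip() for p in parser.paragraphs if p.strip()]
    return {
        "uri": url,
        "type": "text",
        "language": language,
        "versioncreated": version_created,  # placeholder value in this sketch
        "headline": parser.title.strip(),
        "body_text": "\n\n".join(paragraphs),
    }


# Hypothetical sample input; a real page would come from a WARC response record.
sample_html = (
    "<html><head><title>Sample headline</title></head>"
    "<body><p>First paragraph.</p><p>Second paragraph.</p></body></html>"
)
record = to_ninjs("https://example.org/news/1", sample_html, "2022-08-31T00:00:00Z")
print(json.dumps(record, ensure_ascii=False, indent=2))
```

Records of this shape can be written one per line to a JSON Lines file and loaded into a search index. Real archived pages would need site-specific extraction rules rather than the generic `<title>` and `<p>` handling assumed here.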

This talk is partly based on our April 2025 report at the Web Archiving Conference 2025 (WAC2025): Recently Orphaned Newspapers: From Archived Webpages to Reusable Datasets and Research Outlooks (https://pid.depositar.io/ark:37281/k5p3h9k37). (Notice: The English content is automatically translated and may contain inaccuracies or misinterpretations. Please refer to the original version for the most accurate information.)

About the Speaker

Chia Hsun Wang

Chia Hsun Wang is currently working at the Institute of Information Science, Academia Sinica. Previously she was with Open Source Software Foundry (OSSF) and Creative Commons Taiwan, two projects hosted at Academia Sinica. With an engineering background and a passion for promoting open and free culture, her focus revolves around the topics of digital preservation and research data management.

Tyng-Ruey Chuang

Tyng-Ruey Chuang is an Associate Research Fellow at the Institute of Information Science, Academia Sinica, with joint appointments at both the Research Center for Humanities and Social Sciences (Center for GIS) and the Research Center for Information Technology Innovation.

Wu Hung-Yen

I’m Hung-Yen (Jimmy) Wu, a computer science student at NYCU with a strong passion for reinforcement learning, AI, and mathematical thinking. I enjoy solving tough problems through both theory and engineering, from network debugging to research in imitation learning (ICML 2025).

I’ve interned at Academia Sinica, contributed to AI and archiving projects, and founded the Math Department soccer team. Curious, hands-on, and research-driven, I’m always eager to learn and collaborate on meaningful tech challenges.