In the era of generative AI, non-dominant languages around the world risk being marginalized. How did we gather local-language corpora and bring together the IMA Association, the literary community (writer Chang-Song Hu and many other literary figures), academia (Professor Yuan-Fu Liao of National Yang Ming Chiao Tung University), and the AI engineering industry (Phison Electronics) to build an LLM that better understands Taiwan's local languages and contexts?
Join us to hear about our journey to open-source Taiwan's local-language resources. You will hear about:
The unlikely story of rallying Taiwan's literary community to open-source local-language corpora.
An introduction to Taiwan Tongues (Taiwan General Language Corpus), an open-source corpus.
How to use Taiwan's local-language corpora, with Taiwanese Hokkien as an example, to train an LLM that deeply understands Taiwan's languages.
Our open-source training methods and code, which let you build your own Taiwan-specific LLM.
Ideas for building your own applications on top of Taiwan's local-language LLMs.