Building a Korean Twitter Corpus using Python

Wonbin Kim

University of Georgia

Abstract

A corpus is a collection of texts and an important language resource for observing how people actually use language. Corpora are widely used in linguistics as well as in fields such as lexicography and natural language processing (e.g., Hanks, 2012; Pustejovsky & Stubbs, 2012). Despite their importance, however, the Korean national corpora have not been updated since 2007, and the Yonsei Twitter Corpus, a large-scale Korean Twitter corpus, consists of old data. This paper therefore aims to build a new Korean Twitter corpus from up-to-date data and to show how such a corpus can be created using Python.

 

Kim, Wonbin. 2022. Building a Korean Twitter Corpus using Python. UGA Working Papers in Linguistics 5, 123-141. The Linguistics Society at UGA: Athens, GA.

 
