Research area

Natural Language Processing

Information Extraction

Research projects

Synonym discovery (Naver)

VNLP NLP in specific domains
  • Spelling correction
  • Co-reference resolution
  • Relation extraction
  • Text summarization
  • Dependency parsing
  • News tag generation
  • Social part-of-speech tagging
  • Named entity recognition
  • (Samsung) SMS prediction
  • Information extraction: (Viettel) Cyber security, healthcare
  • Chatbots: E-commerce, telco, banking, healthcare

Publications

Thi-Nhung Nguyen, Kiem-Hieu Nguyen, Tuan-Dung Cao and Young-In Song. An Uncertainty-Aware Encoder for Aspect Detection. Findings of ACL: EMNLP 2021

Thi-Thanh Ha, Van-Nha Nguyen, Kiem-Hieu Nguyen, Tien-Thanh Nguyen and Kim-Anh Nguyen. Utilizing Bert for Question Retrieval on Vietnamese E-commerce Sites. PACLIC 2020

Thi-Trang Nguyen, Huu-Hoang Nguyen and Kiem-Hieu Nguyen. A Study on Seq2seq for Sentence Compression in Vietnamese. PACLIC 2020

Anh-Duong Nguyen, Kiem-Hieu Nguyen and Van-Vi Ngo. Neural Sequence Labeling for Vietnamese POS Tagging and NER. 2019 IEEE-RIVF International Conference on Computing and Communication Technologies (RIVF)

Ba-Long Bui, Thi-Trang Nguyen, Huu-Hoang Nguyen, Kiem-Hieu Nguyen. HMMs for Unsupervised Vietnamese Word Segmentation. 2019 IEEE-RIVF International Conference on Computing and Communication Technologies (RIVF)

Kiem-Hieu Nguyen. BKTreebank: Building a Vietnamese Dependency Treebank. In Proceedings of 12th Language Resources and Evaluation Conference, LREC 2018 paper poster data demo

Viet-Trung Tran, Kiem-Hieu Nguyen and Duc-Hanh Bui. A Vietnamese Language Model based on Recurrent Neural Network. In Proceedings of 8th International Conference on Knowledge and System Engineering, KSE 2016 pdf slides dataset (email me)

Kiem-Hieu Nguyen, Xavier Tannier, Olivier Ferret and Romaric Besancon. A Dataset for Open Event Extraction in English. In Proceedings of 10th Language Resources and Evaluation Conference, LREC 2016 pdf dataset (email me)

Kiem-Hieu Nguyen, Xavier Tannier, Olivier Ferret and Romaric Besancon. Generative Event Schema Induction with Entity Disambiguation. Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) pdf sourcecode (email me)

Kiem-Hieu Nguyen, Xavier Tannier, Olivier Ferret and Romaric Besançon. Désambiguïsation d’entités pour l’induction non supervisée de schémas événementiels. 22ème Traitement Automatique des Langues Naturelles

Kiem-Hieu Nguyen, Xavier Tannier and Veronique Moriceau. Ranking Multidocument Event Descriptions for Building Thematic Timelines. Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers pdf poster

Kiem-Hieu Nguyen and Cheol-Young Ock. Semantic Relatedness for Biomedical Word Sense Disambiguation. Workshop Proceedings of TextGraphs-7: Graph-based Methods for Natural Language Processing

Kiem-Hieu Nguyen and Cheol-Young Ock. Margin perceptron for word sense disambiguation. Proceedings of the 2010 Symposium on Information and Communication Technology


Van-Hai Vu, Quang-Phuoc Nguyen, Kiem-Hieu Nguyen, Joon-Choul Shin, Cheol-Young Ock. Korean-Vietnamese Neural Machine Translation with Named Entity Recognition and Part-of-Speech Tags. IEICE Transactions on Information and Systems. 2020.

Thanh Thi Ha, Atsuhiro Takasu, Thanh Chinh Nguyen, Kiem Hieu Nguyen, Van Nha Nguyen, Kim Anh Nguyen, Son Giang Tran. Supervised attention for answer selection in community question answering. IAES International Journal of Artificial Intelligence. 2020

Thi-Thanh Ha, Thanh-Chinh Nguyen, Kiem-Hieu Nguyen, Van-Chung Vu and Kim-Anh Nguyen. Unsupervised Sentence Embeddings for Answer Summarization in Non-factoid CQA. Computación y Sistemas. 2018

Thi-Thanh Ha, Van-Chung Vu and Kiem-Hieu Nguyen. Towards Event Timeline Generation from Vietnamese News. CICLing 2018, Lecture Notes in Computer Science

Kiem-Hieu Nguyen and Cheol-Young Ock. Word Sense Disambiguation as a Traveling Salesman Problem. Artificial Intelligence Review. 2013 pdf res

Kiem-Hieu Nguyen and Cheol-Young Ock. Using Wiktionary to Improve Lexical Disambiguation in Multiple Languages. Gelbukh A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2012. Lecture Notes in Computer Science

Kiem-Hieu Nguyen and Cheol-Young Ock. Diacritics Restoration in Vietnamese: Letter Based vs. Syllable Based Model. Zhang BT., Orgun M.A. (eds) PRICAI 2010: Trends in Artificial Intelligence. PRICAI 2010. Lecture Notes in Computer Science

PhD thesis

Kiem-Hieu Nguyen. Knowledge-based Word Sense Disambiguation using Information Retrieval and Topic Model. PhD Thesis. University of Ulsan pdf slide

BKTreebank 1.0 contains 6,900 Vietnamese sentences annotated with POS tags and dependency relations. For more details on this version of the treebank, please refer to the paper:

Kiem-Hieu Nguyen. "BKTreebank: Building a Vietnamese Dependency Treebank". LREC 2018


We also provide a Java library (requires JRE 8 or higher) with the vanilla POS tagger and dependency parser described in our paper. The download link is


The treebank is released for research purposes. To access the data, please fill in and submit the Google form below using an email address from an academic institution (undergraduate and graduate students, please ask your supervisors).