Projects
Here are some of the projects I’ve worked on:
Research: Gamified Crowdsourcing for Idiom Corpora Construction in the Russian Language
Abstract: One of the most common shortcomings of modern natural language processing (NLP) applications (such as Google Translate, Apple’s Siri, and Samsung’s Bixby) is their poor ability to process multi-word expressions (MWEs). MWEs exist with both compositional and non-compositional (idiomatic) meanings. The aim of this study is to understand the morphological analysis of the Russian language and to present a crowdsourcing and crowd-rating approach, applied and tested here for the first time in the literature on idiom corpora construction. This paper also investigates UDAR, one of the most efficient finite-state converters for the Russian language. For easy integration, a toolkit called Stanza MWEs was used in this project. Stanza MWEs helped to lemmatize phrases effectively during the 29 days the chatbot was running. The data collected during this period is included in this report to demonstrate the effectiveness of the chatbot.
Click here for the full research article.
Project Idea: CU StartHub
I am planning to develop a platform where Columbia students (initially; I plan to expand to other students soon) can find other ambitious people with whom to develop and co-found their next big startup. Groups that have already formed an idea can post their startup project on the website to recruit others.
This project is still in the planning phase. If you are interested in contributing to it, please shoot me an email.