Statistical Linguistic Analysis for User Chat Message Logs
📅 Feb. 2021 - Jul. 2021 • 📰 Slides • GitHub
In this project, I contributed to the development of an interactive dashboard that analyzes user chat logs and describes each user's linguistic behavior. It is meant to help smaller companies and organizations easily understand the linguistic patterns of their user base. Many applications integrate social chat systems, including video games, dating apps, and social media. With this dashboard, even verbose chatters with thousands of logged messages can be summarized, evaluated, and compared with other users at a glance. This is enabled by sentiment analysis, clustering, style transfer, and generative modeling. Downstream use cases include flagging, suspending, or banning toxic users, recommending advertisements or posts to users, and even using a user's “virtual” chatbot counterpart to predict how they would respond to new inputs. The dashboard is powered by Jupyter Notebooks and Voilà, tested with pytest, documented with Sphinx, and deployed on AWS (EC2 and S3).
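To make the "summarize, evaluate, and compare" idea concrete, here is a minimal sketch of per-user aggregation and flagging. It is not the project's actual pipeline: the word lists, the `message_sentiment`, `summarize_users`, and `flag_users` helpers, and the `-0.1` threshold are all hypothetical, and the toy lexicon stands in for a trained sentiment model.

```python
from collections import defaultdict
from statistics import mean

# Toy sentiment lexicon (an assumption for illustration only; a real
# dashboard would use a trained sentiment model instead).
POSITIVE = {"thanks", "great", "love", "nice", "good"}
NEGATIVE = {"hate", "stupid", "trash", "awful", "bad"}


def message_sentiment(text):
    """Score one message in [-1, 1]: (positive - negative) / word count."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    words = [w for w in words if w]
    if not words:
        return 0.0
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return (pos - neg) / len(words)


def summarize_users(logs):
    """Aggregate (user, message) pairs into one summary row per user."""
    by_user = defaultdict(list)
    for user, text in logs:
        by_user[user].append(text)
    return {
        user: {
            "messages": len(texts),
            "mean_words": mean(len(t.split()) for t in texts),
            "mean_sentiment": mean(message_sentiment(t) for t in texts),
        }
        for user, texts in by_user.items()
    }


def flag_users(summary, threshold=-0.1):
    """Return users whose mean sentiment falls below the (hypothetical) threshold."""
    return sorted(u for u, s in summary.items() if s["mean_sentiment"] < threshold)


# Made-up example log, not data from the project.
logs = [
    ("alice", "Thanks, that was a great game!"),
    ("alice", "Nice play, love it"),
    ("bob", "You are trash"),
    ("bob", "I hate this stupid map"),
]
summary = summarize_users(logs)
flagged = flag_users(summary)
```

Reducing each user to a small, fixed set of numbers is what lets a dashboard compare a user with thousands of messages against one with a handful; the same summary rows could also feed a clustering step.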
Within my group of collaborators, I mainly worked on the UI, deployment, and generative modeling. The generative modeling uses the BlenderBot chatbot based on RoBERTa, along with unsupervised style transfer based on a Seq2Seq Transformer from Krishna et al. As a proof of concept, a Discord chat dataset was used.
Other collaborators on this project: Shivad Bhavsar, Rex Chen, Jiayi Luo, Rongxiang Zhang, Kevin Youssef.