※ Open Samizdat
Hi, my name is Matúš Pikuliak, and Open Samizdat is the place where I publish my writing. You can contact me at email@example.com. An RSS feed and a Twitter account are also available. You can subscribe with your email address if you wish to receive notifications about new posts.
Multilingual Reading Skills of Language Models
2023-10-01 The Belebele dataset, designed for multilingual evaluation of language models’ reading skills, was recently released, covering a respectable 115 languages. I noticed that the genealogical linguistic analysis in the paper is somewhat lacking: the authors provide almost no insight into how models behave across language families. This oversight makes it hard to understand how various models stack up against each other. To address this, I ran a simple analysis focused on individual languages and language families, which led to some interesting discoveries.
On Using Self-Report Studies to Analyze Language Models
2023-07-30 We are at a curious point in time where our ability to build language models (LMs) has outpaced our ability to analyze them. We do not really know how to reliably determine their capabilities, biases, dangers, knowledge, and so on. The benchmarks we have are often overly specific, do not generalize well, and are susceptible to data leakage. Recently, I have noticed a trend of using self-report studies, such as polls and questionnaires originally designed for humans, to analyze the properties of LMs. I think this approach can easily lead to false results, which is dangerous considering the current discussions on AI safety, governance, and regulation. To make my point, I will take a closer look at several papers that employ self-report methodologies and highlight some of their weaknesses.
ChatGPT Survey: Performance on NLP datasets
2023-03-27 The popularity of ChatGPT and its various impressive capabilities have led some people to believe that it represents a significant leap in linguistic capability over existing systems, that the field of NLP will soon be consumed by generative language models, or even that it foreshadows AGI. To test these claims, I conducted a survey of arXiv pre-prints that compare ChatGPT with other approaches, mainly smaller fine-tuned models. ChatGPT’s performance is not as impressive as I expected; it is often outperformed by significantly smaller models.