Collecting conversations: three approaches to obtaining user-to-user communications data from virtual environments

Mika Lehdonvirta, Vili Lehdonvirta, Akira Baba


Transcripts of conversations are a valuable research resource in the social sciences and can be used to make inferences about subjects’ behaviour and intentions. Large-scale communications records can be coded and analysed statistically for generalisable results. Virtual environments are a good place to gather communications records, because they exhibit a wide variety of subject behaviours. However, obtaining data from virtual environments can be more challenging than obtaining it from traditional channels such as forums and chat rooms. In this article, we describe three approaches to collecting user-to-user communications data from virtual environments: requesting back-end records from the operator of the environment, recruiting “data donors” among the users, and setting up researchers’ own “listening posts”. The data collection approaches are evaluated empirically in Uncharted Waters Online, a Japanese massively multiplayer game. Avatar gender ratio is used as a diagnostic variable to compare the representativeness of the resulting data sets. Both data donors and listening posts yielded data with a gender ratio that corresponds to the back-end records, but the back-end gender ratios differed significantly between two different servers. We conclude that all three approaches can be statistically viable, and that the choice of method depends more on the desired sampling scope and on practical factors such as resources and timetable. When defining a sampling frame, however, it cannot be assumed that one server is necessarily representative of the whole platform.
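
The comparison of avatar gender ratios between data sets can be illustrated with a short sketch. The abstract does not say which statistical test the authors applied, so the Pearson chi-square test on a 2x2 table used here is an assumption, and the function name `gender_ratio_test` and all counts are hypothetical values chosen purely for illustration, not results from the study.

```python
import math


def gender_ratio_test(female_a, male_a, female_b, male_b):
    """Pearson chi-square test (1 d.f., no continuity correction).

    Compares the female/male avatar counts of two data sets, e.g. a
    sample from data donors (a) against back-end records (b).
    Assumes every row and column total is non-zero.
    """
    table = [[female_a, male_a], [female_b, male_b]]
    row_totals = [sum(row) for row in table]
    col_totals = [female_a + female_b, male_a + male_b]
    total = sum(row_totals)

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count if both data sets shared one gender ratio.
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected

    # Survival function of the chi-square distribution with 1 d.f.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts: a small sample versus a large reference data set.
chi2, p = gender_ratio_test(40, 60, 400, 600)
print(f"chi2 = {chi2:.3f}, p = {p:.3f}")
```

A large p-value here would mean the sample's gender ratio is statistically indistinguishable from the reference data; a small one, as in the between-server comparison described above, would indicate that the two data sets differ.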


Keywords: chat log, content analysis, lurking, methodology, online game, research ethics


The full website for the Journal of Virtual Worlds Research can be found at