Disclaimer: This website is not the official project website but merely a development lab that I used and kept online because it seems useful for getting an overview and for quickly requesting some basic information. For citation, please use the corresponding paper or Zenodo resource as noted here. This website should also not be considered a reliable API, so please do not use the web requests in your analytical processes and instead work with the downloaded resources.
Video games are among the most influential entertainment media in the world. They affect large parts of society directly as a means of entertainment, education and recreation, and indirectly through social gamification processes. A multi-billion-dollar market has developed around this medium, which in some cases is even under legal scrutiny because of harmful effects on parts of the population.
Yet video games are not yet a prominent subject of scientific research. One reason is that it is hard to decide how an analysis should be carried out. Video games consist of bytecode that cannot be interpreted in a meaningful way, and they are mainly consumed via audiovisual means, which are hard enough to analyse on their own. Another reason is one of video games' defining properties: interactivity. How can we analyse something when its actual content is in large part created and controlled by the person who consumes it?
The purpose of this project is to provide a solution for both of these problems and, hopefully, a robust starting point for empirical work in the field of Game Studies. Video game walkthroughs provide a textual representation of the video game in question and contain exactly the information that is needed to complete the game. These descriptions ignore the (theoretically infinite) variance of outcomes that results from the interaction element. Additionally, they convert the content of a video game into text, an information medium that is routinely analysed in many ways in various research environments.
Goal: | The goal of this project is to publish a text corpus that compiles video game walkthroughs from various sources for textual analysis. |
Project Coordinator: | Dr. Jochen Tiepmar, Natural Language Processing Group, Leipzig University, https://orcid.org/0000-0002-3866-2830 |
Project Start: | 12.02.2020 |
Project End: | "When it's done" |
Contact: | jtiepmar(at)informatik.uni-leipzig.de |
Bitbucket: | https://bitbucket.org/jtiepmar/the-game-walkthrough-corpus/src |
DOI: | https://doi.org/10.5281/zenodo.4562336 |
Citation Data Set: | Tiepmar, J., and Burghardt, M., 2021. Game Walkthrough Corpus (GWTC) (Version 1.0) [Data set]. Zenodo. http://doi.org/10.5281/zenodo.4562336 |
Citation Paper: | Burghardt, M. and Tiepmar, J., 2021. The Game Walkthrough Corpus (GWTC) - A Resource for the Analysis of Textual Game Descriptions. Journal of Open Humanities Data, 7, p.14. DOI: http://doi.org/10.5334/johd.34 [OPEN ACCESS] |
Game walkthroughs are protected by individual copyright notices that are often very strict. For this reason, the data set does not include the documents themselves but instead provides various data formats that are useful for text mining and distant reading methods while not allowing the documents to be recreated. It is highly unlikely that even a single sentence can be reconstructed from the published data.
Since only text mining statistics about the documents are published, and not the documents themselves, not even in part, this project does not violate copyright. The data made available here is published under Creative Commons CC BY 3.0.
Links to the original documents are available in the data section.
You can create subsets by providing a filter in the URL, as is done in the table. The filters work based on the CTS URNs. The filtered statistics use the HTML em space entity (&emsp;) instead of [TAB] because HTML does not understand [TAB] and I don't know how to write .txt with JavaScript similar to the raw data files.
You can, for example, request information only for German documents using the filter ".deu.", as is done here and visualized here. If no filter is provided, the statistical information for the whole data set is returned.
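Since the website should not be used as an API, the same kind of subset can be built from a downloaded raw data file. The following is a minimal sketch, assuming that each line holds a CTS URN as key, followed by a tab and the value, and that the filter is a plain substring match on the URN. The function name `filter_by_urn` and the sample URNs are hypothetical and only serve as illustration; the real URNs in the data set may look different.

```python
from typing import Iterable, Iterator


def filter_by_urn(lines: Iterable[str], urn_filter: str = "") -> Iterator[tuple[str, str]]:
    """Yield (urn, value) pairs whose URN contains urn_filter.

    Assumption: every line is "<URN>\t<value>". An empty filter
    matches every line, mirroring the "no filter" case above.
    """
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        urn, _, value = line.partition("\t")
        if urn_filter in urn:
            yield urn, value


# Hypothetical sample lines; in practice, iterate over an opened raw data file.
sample = [
    "urn:cts:example:source.game.deu.1\t1200\n",
    "urn:cts:example:source.game.eng.1\t3400\n",
]

german = list(filter_by_urn(sample, ".deu."))
everything = list(filter_by_urn(sample))
```

With a real file, `filter_by_urn(open(path, encoding="utf-8"), ".deu.")` would stream the matching lines without loading the whole file into memory; the encoding is an assumption as well.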
If you are interested in specific statistics that are not covered, feel free to contact me.
To comply with copyright regulations, all data are randomized and provided in a way that makes it impossible to recreate the documents (or even a single sentence) while still being useful for analysis.
Document Statistics | Download | Format | Visualisation |
---|---|---|---|
(URLs to) Texts | Raw Data, Usage of Filter | DocID [TAB] URL [TAB] Release Date | |
Text Length per Document (Characters) | Raw Data, Usage of Filter | Tab-separated key-value pairs | Bar Chart, Bar Chart with Filter |
Type / Token Count | Types, Tokens, Type/Token | Tab-separated key-value pairs | Bar Chart (Types), Bar Chart (Tokens), Bar Chart (Type/Token) |
Bag of Words | Raw Data (211 MB), Usage of Filter | Tab-separated key-value pairs with Python dictionaries as values | |
nGrams | Zipped .txt (>1 GB) | Tab-separated key-value pairs with Python dictionaries as values | |
Sentence Collocations | Zipped .txt (>1.5 GB); sentence order and order of tokens per sentence are randomized | Tab-separated key-value pairs with lists of sentences represented by Python dictionaries | |
TF-IDF | English (>230 MB), German (>20 MB) | Tab-separated key-value pairs with Python dictionaries as values | |
Walkthrough Documents per Game | Raw Data, Document Count | Tab-separated key-value pairs with comma-separated values | Bar Chart, Bar Chart with Filter (Metal Gear Solid) |
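Several of the files above store Python dictionaries as values. A minimal sketch of how such a line could be read from a downloaded file is shown below. It assumes that the value is a Python dictionary literal mapping each word to its count; the function names and the sample line are hypothetical. `ast.literal_eval` is used instead of `eval` because it only accepts literals and does not execute code.

```python
import ast


def parse_bow_line(line: str) -> tuple[str, dict[str, int]]:
    """Split one "<key>\t<dict literal>" line into key and dictionary."""
    key, _, raw = line.rstrip("\n").partition("\t")
    bag = ast.literal_eval(raw)  # safe parsing of a Python literal
    if not isinstance(bag, dict):
        raise ValueError(f"expected a dictionary for key {key!r}")
    return key, bag


def type_token_ratio(bag: dict[str, int]) -> float:
    """Number of distinct words divided by the total word count."""
    tokens = sum(bag.values())
    return len(bag) / tokens if tokens else 0.0


# Hypothetical sample line; real keys and counts come from the raw data file.
sample_line = "doc-1\t{'the': 3, 'door': 1, 'key': 1}\n"

doc_id, bag = parse_bow_line(sample_line)
ratio = type_token_ratio(bag)  # 3 types / 5 tokens
```

Because the Bag of Words, nGrams and TF-IDF files are large, reading them line by line rather than all at once is probably the safer choice.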
The metadata is compiled from Steam and RAWG, which means there is a strong PC bias, although console games are also included.
Metadata | Download | Format | Visualisation |
---|---|---|---|
Full List of Game Titles | Raw Data, HTML Table | Tab-separated key-value pairs | |
Short Descriptions (RAWG) | Raw Data, HTML Table | Tab-separated key-value pairs | |
Gameplay Tags | Raw Data, HTML Table | Tab-separated key-value pairs with comma-separated values | Histogram, Combined Histogram, Time Series "Point&Click" |
Genres | Raw Data, HTML Table | Tab-separated key-value pairs with comma-separated values | Histogram, Combined Histogram, Time Series "Indie" |
Publishers | Raw Data, HTML Table | Tab-separated key-value pairs with comma-separated values | Histogram, Combined Histogram, Time Series "Square Enix" |
Developers | Raw Data, HTML Table | Tab-separated key-value pairs with comma-separated values | Histogram, Combined Histogram, Time Series "Ubisoft" |
Supported Game Languages | Raw Data, HTML Table | Tab-separated key-value pairs with comma-separated values | Histogram, Combined Histogram |
Supported Platforms (PC, Gameboy, iOS, ...) | Raw Data, HTML Table | Tab-separated key-value pairs with comma-separated values | Histogram, Combined Histogram |
Release Date | Raw Data, HTML Table | Tab-separated key-value pairs with comma-separated values (YYYY-MM-DD) | Time Series |
Combined Metadata | Raw Data, HTML Table | Tab-separated with column header (for nested formats see individual entries) | Coverage Overview |
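The metadata files with comma-separated values can be turned into a histogram, similar to the visualisations linked in the table, with a few lines of code. This is a minimal sketch, assuming that each line is a game title, a tab, and a comma-separated list of values; the function name and the sample lines are hypothetical. Note that a plain split on commas would break values that themselves contain a comma (for example some publisher names), so the result should be checked against the data.

```python
from collections import Counter


def parse_multi_value_line(line: str) -> tuple[str, list[str]]:
    """Split one "<key>\t<v1>, <v2>, ..." line into key and value list."""
    key, _, raw = line.rstrip("\n").partition("\t")
    values = [v.strip() for v in raw.split(",") if v.strip()]
    return key, values


# Hypothetical sample lines; real lines come from the downloaded raw data.
sample = [
    "Game A\tAdventure, Indie\n",
    "Game B\tIndie\n",
    "Game C\t\n",  # a game without any genre association
]

histogram = Counter()
for line in sample:
    _, genres = parse_multi_value_line(line)
    histogram.update(genres)

# Most frequent values first, as in a histogram view.
ranking = histogram.most_common()
```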
Documents: | 12295 |
Words: | more than 140 million (types) |
Combined Text Length: | more than 940 million characters |
Game Language Associations: | 4631 |
Walkthrough Languages: | deu, eng |
Walkthrough Sources: | portforward neoseeker spieletipps jayisgames gamesetter |
Number of Games: | 6013 |
Genre Associations: | 3806 |
Gameplay Tags: | 10246 |
Release Dates: | 2443 |
Developers: | 3152 |
Publishers: | 2782 |
Steam IDs: | 1086 |
Platform Associations: | 5293 (PC, Gameboy, iOS, Linux,...) |
Suggestions, hints and help are welcome