Japanese Visual Media Graph: Providing researchers with data from enthusiast communities

The project aims to create a graph-based, highly interconnected research database on Japanese visual media, focusing on, but not limited to, anime, manga, computer games and visual novels. It is intended for researchers in Japanese studies who work on modern media and its expressions, themes, topics, characters and reception.

We envision a structure similar to the Google Knowledge Graph, combined with a flexible search interface and analytic tools. We intend to use the data on Japanese visual media that is being created and curated by the many enthusiast communities on the web. An initial survey of several larger community websites showed a remarkable depth of information, a deep understanding of the source material and close attention to detail on the part of the volunteer contributors. Making contact with these communities and learning about their needs and motivations is therefore one of the main project elements. We intend to enter into a meaningful discussion with representatives and administrators of the community sites in order to establish a long-term cooperation that benefits both sides.

On the technological side, we plan to harvest the data using existing data APIs or download options. Where no established means of obtaining the data exists, we will work with the individual sites to establish one. Each data set is then mapped to a graph-based representation that closely follows the data model chosen by the donor site, so that all of the original information is preserved.
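As a rough illustration of the source-faithful mapping step, the following Python sketch turns one harvested record into subject-predicate-object triples whose predicates keep the donor site's own field names. The site name `siteA`, the record fields and the function `record_to_triples` are hypothetical and chosen only for this example; they do not describe any actual community site or the project's implementation.

```python
from typing import Any


def record_to_triples(source: str, record: dict[str, Any]) -> list[tuple[str, str, Any]]:
    """Turn one harvested record into (subject, predicate, object) triples.

    Predicates are prefixed with the donor site's name, so the original
    field names are kept unchanged and nothing is dropped.
    """
    subject = f"{source}:{record['id']}"
    triples = []
    for field, value in record.items():
        if field == "id":
            continue  # the id is already encoded in the subject
        # Multi-valued fields produce one triple per value.
        values = value if isinstance(value, list) else [value]
        for item in values:
            triples.append((subject, f"{source}:{field}", item))
    return triples


# Hypothetical record as it might arrive from a donor site's data export.
record = {"id": 42, "title": "Example Title", "tags": ["mecha", "school"]}
triples = record_to_triples("siteA", record)
```

Because the predicates are namespaced per source, records from different sites can sit in the same graph without their fields being merged prematurely.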

In the next step, entities, relations and their attributes are matched to a central data model that should incorporate all aspects of the research domain. This data modelling will be driven from two sides: the available data and the domain knowledge of the Japanese studies researchers.

The architecture will be completely open source and will most likely be based on the software stack that is being developed in the Wikidata project. Wikidata is one of the more advanced deployments of a very large graph-based database with integrated search and visualization features, and it already has an established development community. It also provides means to annotate and propose changes to the data in a well-documented and traceable way.

The entire development phase will be accompanied by researchers from Japanese studies, who will be responsible for data selection and data quality assurance. They will make sure that both the data model and the architecture of the prototype support their research by conducting example research and verifying the results. Once a first prototype has been developed to the satisfaction of the project partners, it will be made accessible to the larger Japanese media studies research community as well as to other researchers who are interested in the data harvesting and modelling aspects.
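The matching of source-specific data to a central data model, described earlier in this section, could look roughly like the Python sketch below. It covers only the simplest case, a lookup table from source predicates to central properties; the names `PREDICATE_MAP`, `to_central`, `siteA`, `siteB` and the `central:` properties are assumptions made for illustration, not part of the project's actual data model.

```python
# Hypothetical lookup table from source-specific predicates to
# properties of the central data model.
PREDICATE_MAP = {
    "siteA:title": "central:title",
    "siteB:name": "central:title",
    "siteA:tags": "central:theme",
}


def to_central(triples, predicate_map):
    """Split triples into those matched to the central model and the rest.

    Unmatched triples are returned separately rather than discarded, so
    domain researchers can review them and extend the mapping.
    """
    mapped, unmapped = [], []
    for subject, predicate, obj in triples:
        if predicate in predicate_map:
            mapped.append((subject, predicate_map[predicate], obj))
        else:
            unmapped.append((subject, predicate, obj))
    return mapped, unmapped


# Hypothetical source-faithful triples from two donor sites.
source_triples = [
    ("siteA:42", "siteA:title", "Example Title"),
    ("siteB:7", "siteB:name", "Example Title"),
    ("siteB:7", "siteB:voice_actor", "Example Person"),
]
mapped, unmapped = to_central(source_triples, PREDICATE_MAP)
```

Keeping the unmatched triples visible reflects the two-sided modelling process: the available data suggests candidate properties, and the researchers decide whether they belong in the central model.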