MUSIL Retreat 2010: Linked Data for Science
1 Background and Motivation
For three days in September 2010, the Münster Semantic Interoperability Lab (MUSIL) held its annual research retreat in the Swiss Alps on the topic of Linked Data for Science. The retreat group consisted of 12 MUSIL members and two external guests – Prof. Dr. Femke Reitsma from the University of Canterbury (New Zealand) and Prof. Dr. Martin Raubal from the University of California at Santa Barbara (USA). We discussed the potential of using Linked Data for Science, identifying emerging benefits, challenges, and research questions, and developed three scenarios for using Linked Data in a university context. At the workshop on Linked Spatio-Temporal Data (LSTD) at GIScience 2010 in Zurich, which was held immediately after the retreat, a larger community discussed practical and theoretical aspects of spatio-temporal information provided as Linked Data. This brief report summarizes the results of the retreat and the relevant insights from the LSTD workshop.

2 Scenarios for Linked Data in Science
The main result of the retreat is a set of three application scenarios that sketch the potential of Linked Data for Science. We point to the specific benefits and challenges of each scenario and sketch strategies for implementation and future research.
2.1 Linked Data for Personal Time Management
Scientists have to cope with a wide range of activities every day. The compartmentalization of research, teaching, and administrative tasks creates a need for efficient and intuitive tools to sort out dependencies and reveal conflicts between activities. Personal and shared online calendars are widely used today, and Linked Data for personal time management should not replace them, but empower them. Linking individual activities to remote resources (e.g. a conference program exposed as Linked Data) brings a range of potential benefits. Users from different organizational units who are linked to the same activities can minimize costs, for instance, by sharing means of transportation. Shared activities point to shared interests, which can form the basis for future collaboration. Finally, linking can significantly reduce the effort needed to keep schedules consistent and up to date: important information such as location or time is neither stored locally nor entered by the user, but fetched from a remote resource where it is maintained. Hence, changes in the official resources are directly reflected locally.
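The linking idea can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the remote resource is simulated by an in-memory dictionary instead of being dereferenced over HTTP, and the URIs, property names, and the names CalendarEntry, resolve, and shared_activities are hypothetical.

```python
from dataclasses import dataclass

# Stand-in for remote Linked Data resources (assumption: a real tool would
# dereference these URIs over HTTP). URIs and property names are hypothetical.
REMOTE = {
    "http://example.org/conf/session/12": {
        "label": "Session 12",
        "location": "Room A",
        "start": "2010-09-14T09:00",
    },
}


@dataclass(frozen=True)
class CalendarEntry:
    owner: str
    activity_uri: str  # the link is the only thing stored locally


def resolve(entry, remote=REMOTE):
    """Return a copy of the current details of the linked activity."""
    return dict(remote[entry.activity_uri])


def shared_activities(entries):
    """Map each activity URI to its owners, keeping only shared activities."""
    owners_by_uri = {}
    for entry in entries:
        owners_by_uri.setdefault(entry.activity_uri, set()).add(entry.owner)
    return {uri: owners for uri, owners in owners_by_uri.items() if len(owners) > 1}
```

Because an entry stores only the link, a room change made in the remote resource shows up the next time resolve is called, and shared_activities reveals who could, for example, travel together.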
The persistent storage of personal schedules and the inter-linking with globally shared vocabularies and data for events and tasks enable many new applications. The fact that this data is linked does not imply that it is necessarily open for all. Intuitive user interfaces, integrated with existing tools, have to be developed to give individual users control over who has access to which aspects of the Linked Data representation of their scientific activities. Giving users this right also provides them with the option to control the usage of the data and prevent fraud.
2.2 Linked Data for Documenting and Archiving Research Output
Researchers document their achievements primarily in the form of publication lists and curricula vitae. Invitations to project consortia and meetings as well as department evaluations and promotions heavily depend on such documents. Yet, generating and maintaining them is a problem and a time sink for many scientists. Linked Data, which can already be produced from an advanced university researcher database, could be used by semi-automatic tools to support researchers in generating and maintaining these documents. Besides publication lists and vitae, a researcher’s software and models, project participations, acquired funding, invited talks, and teaching activities can be easily compiled from Linked Data – even across different universities. Evidently, some of these data have privacy implications and are not public. It is therefore important that users can control access to their data and temporarily grant targeted access – for example, when applying for project funding or when undergoing a department evaluation. While authentication and authorization are established Web technologies, the challenge here is to make them usable for researchers in their daily work without adding technological overhead that causes more problems than it solves.
Such Linked Data produced for individual “vita maintenance” are also valuable for the university as a whole. The university’s research output is essential to attract students, project funding, and collaboration opportunities. Dissemination of researcher profiles is therefore a strong incentive for the university to enable Linked Data. Moreover, the approach also creates a distributed and interlinked archive of the university’s researchers and research output that facilitates meaningful queries with a minimum of centralized infrastructure.
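As a rough illustration of the semi-automatic compilation described in this section, the following Python sketch merges statements from several sources into one publication list. It rests on assumptions: triples are plain (subject, predicate, object) tuples, and the vocabulary terms ex:author, ex:title, and ex:year are hypothetical placeholders rather than terms from an existing ontology.

```python
# Hypothetical vocabulary terms, used only for this illustration.
AUTHOR = "ex:author"
TITLE = "ex:title"
YEAR = "ex:year"


def publication_list(researcher, *sources):
    """Compile (year, title) pairs for one researcher, newest first.

    Each source is an iterable of (subject, predicate, object) triples,
    e.g. one source per university.
    """
    triples = [triple for source in sources for triple in source]
    # All publications that name the researcher as an author.
    pubs = {s for (s, p, o) in triples if p == AUTHOR and o == researcher}
    entries = []
    for pub in pubs:
        props = {p: o for (s, p, o) in triples if s == pub and p != AUTHOR}
        entries.append((props.get(YEAR, 0), props.get(TITLE, "")))
    # Sort by descending year, then alphabetically by title.
    return sorted(entries, key=lambda entry: (-entry[0], entry[1]))
```

Since the function only follows links from the researcher's identifier, it does not matter which university published which statement; access control, which the scenario also calls for, is deliberately left out of this sketch.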
2.3 Linked Data for Social Networks and Scientific Landscapes
Understanding one’s social network and position in the scientific landscape is crucial for success in science. To evolve and innovate, it is even more important to establish contacts with researchers outside one’s current social space. For example, researchers looking for partners for a grant proposal want to establish contacts outside their network, but in a related research area. Yet, social relations also generate conflicts of interest (COI). For example, researchers building a program committee or assigning reviewers need to take social and other relations into account. An additional dimension of conflict is disclosed by somebody’s research topic, which may reveal a certain “school of thought”.
Linked Open Data (LOD) on researchers, social networks, and topic areas can be queried to reveal social spaces, scientific landscapes, and researcher affiliations. There are already social network data (e.g., LinkedIn, XING or Facebook) and standards (e.g., FOAF) from which the necessary content (i.e., the relations among researchers and topics) can be acquired. A comprehensive service for discovering opportunities and conflicts needs to take into account indirect relations derived from spatio-temporal reasoning. For example, if someone worked for some time in a research group concerned with a known topic, one may infer familiarity with the group’s topic, and personal acquaintance with the people who worked in the research group at the same time.
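The inference in the example above can be sketched in a few lines of Python. This is a simplified illustration, not a full reasoner: it assumes that group memberships are available as records with inclusive start and end dates, and the names Membership, infer_acquaintances, and infer_familiarity are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from itertools import combinations


@dataclass(frozen=True)
class Membership:
    person: str
    group: str
    start: date
    end: date  # inclusive


def overlaps(a, b):
    """True if the two membership periods share at least one day."""
    return a.start <= b.end and b.start <= a.end


def infer_acquaintances(memberships):
    """Pairs of people who were in the same group at the same time."""
    pairs = set()
    for a, b in combinations(memberships, 2):
        if a.group == b.group and a.person != b.person and overlaps(a, b):
            pairs.add(frozenset((a.person, b.person)))
    return pairs


def infer_familiarity(memberships, group_topics):
    """(person, topic) pairs derived from the topic of each person's group."""
    return {
        (m.person, group_topics[m.group])
        for m in memberships
        if m.group in group_topics
    }
```

A conflict-of-interest check could then, for instance, exclude reviewers who appear in an inferred pair with an author, while a partner search could look for people familiar with a topic who are not yet in such a pair.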
2.4 Summary
The three scenarios sketched above show how individual scientists, universities, and whole research communities can benefit from Linked Data. Applying the Linked Data principles has great potential to improve visibility, exchange, transdisciplinary collaboration, transparency, and efficiency. At the same time, a number of open research issues, as discussed at the retreat and at the LSTD workshop (summarized below), call for a concentrated research effort on Linked Data for Science.

3 Workshop on Linked Spatio-Temporal Data
At this GIScience 2010 workshop co-organized by current and former MUSIL members, participants discussed the state of the art as well as research challenges for spatio-temporal information provided as Linked Data. The talks and demos showcased working systems and prototypes building on Linked Data. While the usefulness of Linked Data was shown for areas as different as cultural heritage, risk management, and car navigation, a set of common questions for future research emerged. First and foremost, the core notion of location is still ambiguous and too simplistic for Linked Data. Today’s point locations in undefined coordinate systems are not sufficient, especially not for representing areal or linear features such as roads or rivers. More powerful and flexible models of location and time will be required, together with a combination of geometric and logic reasoning, to exploit spatio-temporal data for reasoning about identity (which may be the core task with Linked Data). Such models would turn space and time into general enablers naturally attached to all kinds of data on things in the real world, rather than being handled as a more or less separate domain of “geodata”.
As things in the real world tend to change, their evolution (and that of our knowledge about them) needs to be effectively handled in Linked Data as well. So far, it is impossible to know whether data recorded at some point in the past are still valid, and it is often not even noted when the data were created. Related to such quality and metadata issues are questions of uncertainty and trust, since the Linked Data cloud is a huge collection of information gathered by numerous individuals and organizations. Beyond these problems specific to Linked Data, long-standing problems that originate in terminologies also apply here, such as dealing with multiple conceptualizations and multiple natural languages. Finally, privacy issues always emerge once Linked Data on individuals (or even just institutions and buildings) are published. These need to be addressed by technical means such as authentication or authorization as far as possible to put the users in control of their data. They also call for concerted social science research addressing the underlying ethical, institutional, economic, and legal questions.
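The validity problem raised above, namely whether data recorded in the past still hold, can be made concrete with a small Python sketch. It assumes that every statement carries a recording date and a validity period; this is an illustrative data model with hypothetical names (Statement, valid_at, current_value), not an existing Linked Data standard.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Statement:
    subject: str
    predicate: str
    obj: str
    recorded: date                   # when the data were created
    valid_from: date                 # start of the validity period
    valid_to: Optional[date] = None  # None: no known end of validity


def valid_at(stmt, when):
    """True if the statement is asserted to hold on the given day."""
    if when < stmt.valid_from:
        return False
    return stmt.valid_to is None or when <= stmt.valid_to


def current_value(statements, subject, predicate, when):
    """Most recently recorded value valid at `when`, or None if there is none."""
    candidates = [
        s for s in statements
        if s.subject == subject and s.predicate == predicate and valid_at(s, when)
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda s: s.recorded).obj
```

With such annotations, a consumer can at least distinguish outdated statements from current ones; questions of uncertainty and trust would require further metadata beyond this sketch.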