EPFL.2

ScienceWISE (phase 1)

Long Title: Web-based Interactive Semantic Environment for e-Science - Phase 1
Leading Organization: EPF Lausanne
Domain: ELS
Status: finished
Start Date: 01.08.2010
End Date: 31.12.2011
Project Leader: K. Aberer
Deputy Project Leader: A. Boyarsky
Website: http://sciencewise.info

ScienceWISE provides a platform for creating virtual organizations of scientists who work together on the dynamic generation of professional, field-specific ontologies. The system greatly facilitates the reading of scientific papers by providing students with texts annotated by scientists and linked to a system of expert-written definition articles and a collection of web resources.

(see also Phase 2 and Phase 2 continuation (extension))

Results

Component: Description
Bookmarking functionality: Private bookmark collections and offline bookmarks (filtering by concept and/or article category).
Relation Finder: Finding relations between concepts, papers and authors, and paths in the ontology.
arXiv.org integration: Integration with the arXiv.org system (bookmarking and annotated PDFs). Example: http://arxiv.org/abs/0812.0010
Django-selenium integration library: The Selenium framework is used for web-interface testing. The framework integration code has been separated from the ScienceWISE codebase and released as an open-source library.
Application Programming Interface (API) for Utopia Documents: The Utopia system makes it possible to extend the annotation functionality to PDF-only papers (without LaTeX source).
"ScienceWISE: A Web-based Interactive Semantic Platform for Scientific Collaboration": Paper, International Semantic Web Conference (ISWC) 2011. [Demo paper; won the Best Demo Award]
"An Integrated Socio-Technical Crowdsourcing Platform for Accelerating Returns in eScience": Paper, International Semantic Web Conference (ISWC) 2011. [Outrageous Idea Track; won one of the three Outrageous Ideas Prizes awarded by the Computing Community Consortium]
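The Relation Finder is described above only by its purpose. One plausible way such a feature could work, sketched here under the assumption that concepts, papers and authors are stored together as one undirected graph, is a breadth-first search for the shortest chain linking two nodes. The node-naming scheme (concept:, paper: and author: prefixes) and the helper find_relation are hypothetical, not part of ScienceWISE.

```python
from collections import deque


def find_relation(graph, start, goal):
    """Return the shortest chain of nodes from start to goal, or None.

    graph maps each node to an iterable of neighbouring nodes; undirected
    edges are assumed to be listed in both directions (hypothetical format).
    """
    if start == goal:
        return [start]
    previous = {start: None}  # node -> node it was first reached from
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour in previous:
                continue  # already visited via an equally short or shorter path
            previous[neighbour] = node
            if neighbour == goal:
                # Walk the predecessor chain back to start, then reverse it.
                path = [goal]
                while previous[path[-1]] is not None:
                    path.append(previous[path[-1]])
                return path[::-1]
            queue.append(neighbour)
    return None  # goal is not reachable from start
```

For example, with graph = {"author:A": ["paper:1"], "paper:1": ["author:A", "concept:X"], "concept:X": ["paper:1"]}, find_relation(graph, "author:A", "concept:X") returns ["author:A", "paper:1", "concept:X"]. Breadth-first search is chosen because it returns a chain with the fewest hops, which is usually the most readable explanation of how two items are related.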

Two main functionalities have been implemented and improved in the ScienceWISE system:

  1. Annotations: To annotate a paper, its author is presented with a list of concepts and definitions automatically identified in the paper and ordered by relevance. The user can select concepts, and the system produces a hyperlinked version of the manuscript, inserting links to the relevant definitions and resources and thus expanding the paper with additional details, comments or pedagogical material. Competing scientific viewpoints are represented as alternative resources and definitions of the same concept.
  2. Bookmarks: Users can bookmark any arXiv.org or CDS paper using the ScienceWISE ontology (conceptual indexing). The system automatically selects the concepts most relevant for characterizing a paper, which the user can then fine-tune. A concept navigation panel lets users classify bookmarked papers, create collections and reach any bookmarked paper in a few clicks.
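The annotation workflow described above (detect concepts, rank them, link the ones the author selects) can be sketched as follows. This is an illustration only: it assumes the ontology is a plain mapping from concept labels to definition URLs (the example.org URLs in the test are placeholders), approximates "relevance" by raw occurrence count, and emits LaTeX \href links. The actual ScienceWISE ranking and markup are not described in this document.

```python
import re
from collections import Counter


def rank_concepts(text, ontology):
    """Return ontology labels found in text, most frequent first.

    Relevance is approximated by whole-word, case-insensitive occurrence
    counts (an assumption made for this sketch). A short label such as
    "matter" is also counted inside "dark matter".
    """
    counts = Counter()
    for label in ontology:
        pattern = r"\b" + re.escape(label) + r"\b"
        hits = len(re.findall(pattern, text, flags=re.IGNORECASE))
        if hits:
            counts[label] = hits
    return [label for label, _ in counts.most_common()]


def annotate(text, selected, ontology):
    """Link the first occurrence of each selected concept to its definition."""
    if not selected:
        return text
    # Longest labels first, so "dark matter" is preferred over "matter".
    labels = sorted(selected, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(label) for label in labels) + r")\b",
        re.IGNORECASE,
    )
    linked = set()

    def link(match):
        term = match.group(0)
        label = next(c for c in labels if c.lower() == term.lower())
        if label in linked:
            return term  # only the first occurrence gets a link
        linked.add(label)
        return r"\href{%s}{%s}" % (ontology[label], term)

    return pattern.sub(link, text)
```

Matching all selected labels in a single pass, longest first, keeps a short label from being linked inside a longer one that was already linked. Plural forms ("neutrinos") are not matched; a real system would need stemming or a lemma dictionary.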

For both annotation and bookmarking, users can add the concepts, definitions, resources and relations they deem necessary. This happens exclusively through the ScienceWISE ontology, which provides a mechanism both to expand the ontology manually and to validate the results of its automated expansion. Because users can expand the ontology, this "restriction of natural-language flexibility" is quite mild; in return, part of the scientist's work is performed automatically, and the researcher only has to check and tune the suggested set of concepts for indexing, resources for annotation, etc. In addition, users are given ontology-based methods that let them perform semantic search and recommendation, or navigate using the ontological neighborhood of a paper.
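The semantic search and recommendation mentioned above is not specified further. One simple possibility, sketched here as an assumption rather than the system's documented method, is to treat each paper's conceptual index (the concepts chosen when bookmarking) as a set and recommend the papers whose concept sets overlap most, measured by Jaccard similarity. The index layout and the function names jaccard and recommend are hypothetical.

```python
def jaccard(a, b):
    """Fraction of concepts shared by two concept sets (0.0 to 1.0)."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def recommend(paper_id, index, top_n=3):
    """Return up to top_n other papers, most concept overlap first.

    index maps paper identifiers to sets of concept labels; papers with no
    shared concept are left out. Ties are broken by identifier.
    """
    target = index[paper_id]
    scored = []
    for other, concepts in index.items():
        if other == paper_id:
            continue
        score = jaccard(target, concepts)
        if score > 0:
            scored.append((score, other))
    # Highest similarity first; alphabetical identifier as tie-breaker.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [other for _, other in scored[:top_n]]
```

Jaccard similarity is used here because it normalizes for the number of concepts per paper, so a heavily indexed paper does not dominate every recommendation; weighting concepts by their position in the ontology would be a natural refinement.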

In a second phase, the system will be further improved (functionality, support) and extended to other fields of science.


Initial situation

Modern scientific research requires interdisciplinary efforts across adjacent subjects or even between different sciences.
The amount of scientific information grows constantly. Organizing it systematically and providing tools for effective access to it have become crucial for successful research. Building intra- and interdisciplinary knowledge bases is complicated by the volume of information and the speed at which it is updated.
This calls for dynamically generated, community-run systems such as Wikipedia, a virtual organization that replaces classical publishers of encyclopedias. However, the quality of science-related articles in Wikipedia is very uneven and unsystematic, which makes it problematic to use for research beyond a "first impression".
The complexity of modern research makes it impossible to disseminate scientific knowledge (via papers, reports, presentations, etc.) in a self-contained manner while keeping its size within reasonable bounds. Therefore a number of leading research journals (e.g. Nature, Science) provide "Supporting Material": online information linked to a published article. Such an organization of scientific publications makes it possible to combine the presentation of results with in-depth commentary.

A prototype of the ScienceWISE system is already running, and the first invited experts have started to fill in the encyclopedia definitions and ontological content.

Goals

ScienceWISE will:

  1. provide a platform for scientists working together on the dynamic generation of professional, field-specific ontologies (with physics as the first example). The ontology can hold several single-authored definitions of the same concept. ScienceWISE provides a mechanism for community ranking of definitions (by linking to the best ones) and a peer-run quality assurance system.
  2. allow authors to annotate their papers uploaded to arXiv.org by cross-linking important concepts to entries in the ScienceWISE ontology, thus expanding the content of their papers with supporting material in the form of encyclopedia-like articles. This can increase the visibility of articles, of the authors' own views, and of subfields or young research areas.
  3. drastically improve the communication of information in interdisciplinary research, making scientific knowledge more accessible to colleagues from other fields.
  4. allow users to create papers, project notes, etc. online and to share them through ScienceDocs, an online scientific editor and document-sharing system to be developed during this project.
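The community-ranking idea in item 1 (the best definitions are the ones scientists link to) can be illustrated by counting links. The sketch below assumes each annotation link is recorded as a (paper, concept, definition) triple and ranks each concept's alternative definitions by the number of distinct papers that link to them. The record format and rank_definitions are hypothetical, and the real system may weight links differently.

```python
from collections import Counter, defaultdict


def rank_definitions(links):
    """Order each concept's alternative definitions by link popularity.

    links is an iterable of (paper_id, concept, definition_id) triples, one
    per annotation link. Each paper counts at most once per definition, and
    ties are broken by definition identifier.
    """
    seen = set()
    votes = defaultdict(Counter)
    for link in links:
        if link in seen:
            continue  # ignore repeated links from the same paper
        seen.add(link)
        _, concept, definition_id = link
        votes[concept][definition_id] += 1
    return {
        concept: sorted(counter, key=lambda d: (-counter[d], d))
        for concept, counter in votes.items()
    }
```

Counting each paper once per definition keeps a single heavily annotated paper from outvoting the rest of the community, which matches the peer-run spirit described in item 1.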

Benefits

ScienceWISE aims to bridge a well-known gap, encountered by every student, between textbook topics and the subjects of scientific papers. This problem becomes important for students in the natural sciences from their fourth year onward and during their Master's and Ph.D. studies. The ScienceWISE system greatly facilitates the reading of scientific papers by providing students with texts annotated by scientists and linked to a system of expert-written definition articles and a collection of web resources. The ontological information (relations between different scientific concepts) within ScienceWISE helps students orient themselves in previously unfamiliar fields, enabling them to adapt efficiently to scientific research.
We expect students to make up a significant fraction of the users, so the main functionality of ScienceWISE also serves as an important e-learning resource.

Development

The following functionalities will be developed and integrated:

  • automated unit and integration testing of all system components, including the user interface,
  • AAI (Authentication and Authorization Infrastructure),
  • authentication via arXiv.org,
  • automatically populate the ScienceWISE ontology by importing physics-related concepts and the category tree from Wikipedia and other physics encyclopedias,
  • automatically ask the authors of papers to define the most important concepts in them,
  • allow users to request a definition of a particular concept and show these requests to authors interested in the topic,
  • make the concept map of the ontology more interactive using SVG and Ajax technologies,
  • implement semantic full-text search for arXiv.org papers, search for authors by scientific interest, and related functionality,
  • let users create and store private and shared TeX documents and related data on the server (based on the TeX4Web online editor),
  • implement advanced features in our TeX-handling library to improve the annotation of PDF documents,
  • show a live preview of only the current part of a document, to make it faster and easier to use,
  • implement version control and related features such as "view history" and "compare revisions",
  • allow users to selectively share their ScienceDocs documents and files with other users or with everyone; implement collaborative editing of documents,
  • rank concepts at the per-subject and per-author level,
  • automatic paper categorization; semi-automatic methods for creating semantic relations (proposing variants during concept editing); guessing abbreviations for concepts,
  • design and implement tools that let users precisely specify the logical structure of a document,
  • implement real-time natural language processing in the TeX4Web-based editor,
  • allow certain sections of documents to be written in controlled natural language; process these sections to extract logical information; design and implement an interface for querying the extracted information,
  • develop algorithms to improve semi-automatic concept recognition in papers,
  • explore the use of machine learning,
  • visualize all concepts used by an author (or group of authors) and the relations between them,
  • automatically detect journal references and insert links in the bibliography,
  • reuse the large number of scientific books available in electronic form to extend and improve the ontology; provide an index-generation service for books written using ScienceDocs.
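Among the planned items is guessing abbreviations for concepts. One common heuristic, shown here as a hypothetical sketch rather than the planned implementation, looks for an upper-case abbreviation in parentheses and checks whether the initials of the preceding words spell it out, skipping a small list of function words. The stopword list, the window sizes and the function names initials and guess_abbreviations are all assumptions.

```python
import re

# Function words ignored when forming initials (an assumed, minimal list).
STOPWORDS = {"a", "an", "and", "for", "in", "of", "on", "the"}


def initials(phrase):
    """Upper-case initials of the significant words in a phrase.

    Hyphenated words contribute one letter per part, e.g. "X-ray" -> "XR".
    """
    parts = re.split(r"[\s\-]+", phrase)
    return "".join(p[0].upper() for p in parts if p and p.lower() not in STOPWORDS)


def guess_abbreviations(text, extra_words=2, window_chars=200):
    """Return {abbreviation: long form} for 'long form (ABBR)' patterns.

    A plural "s" after the abbreviation, as in "(WIMPs)", is tolerated.
    """
    found = {}
    for match in re.finditer(r"\(([A-Z]{2,})s?\)", text):
        abbr = match.group(1)
        # Only the text shortly before the parenthesis is considered.
        before = text[max(0, match.start() - window_chars):match.start()]
        words = re.findall(r"[A-Za-z]+(?:-[A-Za-z]+)*", before)
        # Try the shortest run of preceding words first.
        for size in range(1, len(abbr) + extra_words + 1):
            if size > len(words):
                break
            candidate = " ".join(words[-size:])
            if initials(candidate) == abbr:
                found[abbr] = candidate
                break
    return found
```

The heuristic errs on the side of missing pairs: abbreviations built from letters inside words (such as QCD for quantum chromodynamics) are generally not found, so any guesses would still be offered to users for confirmation rather than added to the ontology automatically.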

Testing will be performed with a group of volunteer scientists.
After this first stage, the ScienceWISE system will be advertised widely in the scientific community, and we expect a significant number of new users to register and contribute to the system.
In a third stage, users will start doing research on the content of the system itself: semantic, statistical, complex-systems and computer-science research.

Once successful, this primarily physics-oriented scheme can readily be expanded to other branches of the natural sciences and linked with other large-scale repositories of preprints and scientific papers.
