Interim Report of Working Group



This draft of the Report of the Working Group on an Online Dictionary of Crystallography is provided for information to the IUCr Finance Committee at its 2006 Spring meeting. It is planned to present a full report to the Executive Committee in August, following the recommendations of the Finance Committee.



Remit

The Dictionary Working Group of the Commission on Crystallographic Nomenclature (CCN) was formed during the 20th IUCr Congress in Florence to provide guidance on the establishment and conduct of a project undertaken under the aegis of the Commission, with the approval of the IUCr Executive Committee and the involvement of other Commissions and appropriate bodies of the IUCr, to provide online definitions of terms used in the practice of crystallography. The remit of the Working Group covered the following topics, each of which is addressed separately in the main body of the Report:

1. Is the project to be and to remain an online project (web URLs)?

Or should a book form also be envisaged in the future (question by Henk Schenk, Chairman of the IUCr/OUP book Committee)?

2. The scientific scope of the project

Broadly speaking, the project should be confined to the subject of crystallography, the area of science over which the IUCr has authority. However, crystallography is used by, and merges with, very many other areas of chemistry, physics, mathematics, materials science, biology, computational data processing, etc. What criteria should be applied for deciding which terms to include within the project, and which to exclude? (Consider, for example, the detailed descriptors for protein secondary structure included in the mmCIF dictionary. Should the online "crystallography dictionary" include definitions of protein folds, beta sheets etc.?)

  • Should the names of compounds (minerals, materials, chemical or biological compounds) be included?
  • Should physical concepts be included (entropy, energy, etc.)?
  • Should mathematical terms be included? (group properties, tensor properties, etc.)
  • Should there be translations of each entry in other languages (French, German, Spanish, Russian, Chinese, Japanese)? See old “red” International Tables as an example.
  • Should names of people be included (Bravais, Bragg, Ewald, Laue, etc.)?
  • Should reference to computer programs be included?
  • How should double-word items be included: “neutron interferometry”, “X-ray interferometry”, or “interferometry (neutron)”, “interferometry (X-ray)”?
  • Should specialized expressions such as “normalized structure factors” be itemized as such or appear within the definition of “structure factor”? (there are many such examples).
  • Should equations be included?

3. The granularity of definitions

What is the appropriate amount of text for each entry in the compilation? (This may determine, among other things, the name of the project - glossary, dictionary, index, thesaurus, encyclopaedia?) From Longman's Dictionary of the English Language:

  • glossary: a list of terms (e.g. those used in a particular text or in a specialized field), usually with their meanings
  • dictionary: a reference book containing words, usually alphabetically arranged, together with information about them, especially their forms, pronunciations, parts of speech, meanings, origins, grammatical requirements, and idiomatic uses
  • index: a guide or list to aid reference: e.g. an alphabetical list of items (e.g. topics or names) treated in a printed work that gives for each item the page number where it appears, or a list of items of a specified type
  • thesaurus: 1. a book of words or of information about a particular field or set of concepts; especially a book of words grouped according to their meaning. 2. a list of subject headings or index terms, usually with a cross-reference system for use in the organization of a collection of documents for reference and retrieval
  • encyclopaedia: a reference work that contains information on all branches of knowledge or treats comprehensively a specified branch of knowledge, usually in articles arranged in alphabetical order of subjects either in a single list or within each of several large subsections
  • compendium: a full list or inventory (Webster), a book containing a list of useful hints (Collins); as an example see the IUPAC gold book (http://gold.zvon.org): a list of names, each one with hyperlinks.

What quantity of illustrations should be given (drawings, diagrams, spectra, photographs, etc.)?

Should the work be organized in Categories and Subcategories?

4. The level of definitions

Should this be a reference work for authors and referees of IUCr Journals, research professionals, undergraduate students, high-school students, the general public? Can it be designed as a multi-level resource?

5. Delegation of authority and labour

What is an appropriate editorial structure to commission, review and implement definitions? This needs to take into account the involvement of Commissions, the possibility of different educational levels for the completed work, and perhaps some technical aspects of presentation and online editing.

6. Presentation

Some consideration should be given to broad aspects of how the project will be presented: as a single web site, as multiple sites (perhaps appropriate if different educational levels are supported), as a free resource or a potential source of revenue, as a companion to International Tables?

7. Financial implications

The project as currently envisaged in its project definition phase will rely heavily on volunteer scientist labour and existing hardware resources, but there may be a need for editorial honoraria, or other costs that the Working Group can specify, such as hardware, technical editing, secretarial help, etc. The Working Group will report its findings to the Finance Committee.


Medium

The project should be executed initially as solely an online project because of the flexibility of the online medium, the absence of any limit on the number of entries, and the possibility of hyperlinks to IUCr and other web resources. A physical book with a CD containing all the hyperlinks could be considered at a later stage.


Scientific scope

Terms selected for inclusion should have a clear crystallographic application. Terms from connected disciplines (mathematics, physics, chemistry, mineralogy, biology) should be included insofar as they relate to crystallography, e.g. “crystallographic group”. Names of chemical or biological substances or minerals should not be included at the present stage, but terms such as “albite twin law” should.

The Working Group agrees that translations of terms into languages other than English should be given, but not translations of their definitions.

Reference to computer programs per se should not be included, but there might be instances when it becomes essential, e.g. SHELX.

Names of people should only be included if they relate to crystallographic concepts, e.g. “Bragg’s law”, “Ewald sphere”.

Double-word items such as “X-ray interferometry” should be entered as such. A search on “interferometry” will automatically retrieve them.
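The retrieval behaviour described above can be illustrated with a minimal sketch. It assumes a simple case-insensitive substring match over headwords; the wiki's own search engine may work differently, and the list and function name here are hypothetical.

```python
# Illustrative headword list; the two interferometry terms, "Bragg's law"
# and "Ewald sphere" are terms mentioned in this report.
HEADWORDS = [
    "Bragg's law",
    "Ewald sphere",
    "neutron interferometry",
    "X-ray interferometry",
]


def search_headwords(query, headwords=HEADWORDS):
    """Return every headword that contains the query, ignoring case."""
    needle = query.strip().lower()
    if not needle:
        return []  # an empty query matches nothing
    return [term for term in headwords if needle in term.lower()]


print(search_headwords("interferometry"))
```

Because the match is on a substring, entering the terms as “neutron interferometry” and “X-ray interferometry” does not hide them from a reader who searches only on the second word.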

Equations should be included.

The number of entries is not predetermined. There is no technical limitation requiring this and the project can grow with time. The Working Group has contributed a small number of terms (~60) in order to get the pilot project operational. The project has recently been opened to the whole Commission on Crystallographic Nomenclature, and a target of around 500 terms should be set for when the web pages are first opened to the public. If the idea is successful, it will probably grow to some thousands of distinct terms over the years.


Granularity of definitions

The Working Group recommends a reference product that is a blend between “dictionary” and “encyclopaedia”, what the French call a dictionnaire encyclopédique: a list of terms with short definitions and cross-links to other entries in the work, with at times longer developments. For instance, “reciprocal lattice” will have a short definition and a hyperlink to the corresponding pamphlet on the IUCr web site (open access) and to the appropriate chapter of IT Volume B; the entry “Bragg’s law” should give the law with a drawing and its derivation could be obtained via an appropriate hyperlink.

Links to CIF dictionaries will be provided where appropriate.

The work may be structured in several ways to assist navigation. The terms will be entered alphabetically and can be retrieved alphabetically, but the WiKi software allows an ordering in categories (and eventually in subcategories). Each entry can be attached to one or more categories (and subcategories). These categories could correspond for instance to titles of IT Volumes, but with additional subjects (“mathematical crystallography” etc.); subcategories could correspond to Chapters. At the time of writing, categories are being assigned to entries on an ad hoc basis in an attempt to determine suitable structuring mechanisms. A click on a category provides links to all the entries related to that category.

As an example, to get the entry “Grüneisen relations”, one may either click on that term, or click on the Category “Physical Properties”; that will provide links to subcategories, one of them being “Thermal expansion”; a click on that subcategory will provide a list of links to all the entries related to that topic, one of them being “Grüneisen relations”. The entry “Grüneisen relations” will give a definition and hyperlinks to Chapters 1.4 and 2.1 of International Tables Volume D.
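The category navigation in this example can be sketched as a small data structure. This is only an illustration of the idea, not a description of how the WiKi software stores categories; the function names and the second entry's category assignment are hypothetical.

```python
# Each entry may be attached to one or more (category, subcategory) pairs.
# The first assignment is taken from the example in this report; the second
# is a hypothetical assignment added only to make the index non-trivial.
ENTRY_CATEGORIES = {
    "Grüneisen relations": [("Physical Properties", "Thermal expansion")],
    "Bragg's law": [("Diffraction", None)],  # hypothetical, no subcategory
}


def build_index(entry_categories):
    """Invert the entry -> categories mapping into category -> subcategory -> entries."""
    index = {}
    for entry, assignments in entry_categories.items():
        for category, subcategory in assignments:
            index.setdefault(category, {}).setdefault(subcategory, []).append(entry)
    return index


def subcategories(index, category):
    """List the subcategories reached by clicking on a category."""
    return sorted(s for s in index.get(category, {}) if s is not None)


def entries(index, category, subcategory=None):
    """List the entries reached by clicking on a category or subcategory."""
    return sorted(index.get(category, {}).get(subcategory, []))


index = build_index(ENTRY_CATEGORIES)
print(subcategories(index, "Physical Properties"))
print(entries(index, "Physical Properties", "Thermal expansion"))
```

Storing the assignment on the entry and deriving the index from it mirrors the statement that each entry can be attached to one or more categories.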

There are several advantages to having categories and subcategories. One is to allow searches on areas of interest, for instance if you are looking for a particular type of twinning, but don’t remember its exact name. Another one is to make the work of preparing the dictionary easier by assigning editors and subeditors to categories and subcategories. Their duty would be to oversee the definitions and to check that there are no obvious omissions.

Note that the Wiki software allows searches on headwords, but also full-text searching of the entire corpus, so that the user has available a large number of query-based information retrieval strategies.


Audience

The primary goal is to be a reference for authors and referees of IUCr Journals and for research professionals in general: it will give the “official” IUCr-accepted usage of terms. As such it will also be useful to students and to the general public.

The work forms part of a multi-level resource in the sense that, besides the short definition, hyperlinks will be provided either to a longer definition or to appropriate existing IUCr resources. It will complement International Tables.

The General Secretary discussed with the Research & Development Officer the possibility that the Dictionary could include links to educational web sites. There is no objection to the idea of enriching the value of the dictionary in this way (although it is difficult to see how such compendia of links would fit into an alphabetic structure). However, it may be worth considering whether such a project is better considered as a companion to the Dictionary (sponsored, e.g. by the Commission on Crystallographic Teaching), and managed separately - although with close collaboration between the sponsoring bodies - so as not to dilute attention to the Dictionary project itself.


Organization of contributors

The Editorial Board should consist of the members of the CCN, with representatives from the other Commissions as consultants for the various fields of crystallography. It is clear that, as Editors of the various IUCr publications, the members of the CCN are the people whose duty is to say how crystallographic terms should be used. Efficiency requires that the work should be done under the supervision of a Main Editor or Editor-in-Chief and Editors (and subeditors) for the various categories (and subcategories), chosen among the CCN members and consultants.

The initial experience of the Working Group has been, however, that even the greatest enthusiasts for the project are so busy that they find it difficult to spend the time necessary to make substantial contributions. The authoring privilege has been extended only recently to the rest of the CCN, so it is too early to assess how responsive the Commission as a whole will be to the challenges of authorship. Early indications are that, again, the rate of accretion of new definitions is slower than we would like to see. However, we are optimistic that the project will build up momentum as new contributors begin to experience the satisfaction of seeing their definitions on the web, and as the usefulness of the dictionary increases with its growing number of definitions.


Presentation

It is expected that the resource would appear as a single web site. However, it should also act as a companion to International Tables and to the Journals, as well as to educational resources such as the Teaching Pamphlets and any new educational initiatives arising from the Teaching Commission. As the Online Dictionary of Crystallography would be an important and useful service to researchers, students and authors, it is desirable that it should be open access, bearing in mind that most definitions will have links to IT Volumes, which are not open access. This last point may encourage people to subscribe to International Tables Online.


Financial implications

The project as initially envisaged will rely heavily on volunteer labour and existing hardware resources. The current pilot implementation shares the same hardware as the main IUCr web site (although it is managed as a separate virtual server, so can easily be moved to its own server machine if required). Some additional software development will be required (e.g. implementation of a reliable backup strategy, modifications to the style to conform with other IUCr web components); but so long as these are not time-critical, they can be absorbed within the existing workload of the R&D department. Significant software developments (such as creation of a hard-copy edition) would need to be assessed and costed separately. Note that hardware costs in the event of a migration to a separate server would be modest (e.g. of the order of GBP 1000 would suffice for a powerful dedicated machine).

Technical editing costs are ruled out at this stage (it is assumed that the invited contributors will have a high degree of literacy, and that there will be a measure of self-regulation as contributors edit each other's entries to correct minor spelling and typographic errors). Since each entry will be presented as a separate web page, minor inconsistencies of style and presentation will not be so important as they would be in a hard-copy publication. Conversely, however, the decision to produce a hard-copy publication would be likely to involve more rigorous technical editing, with subsequent added costs.

The Finance Committee should monitor the possible need for payment of editorial honoraria. It is expected that the project will require an Editor-in-Chief responsible for its overall shape and direction (at present this role is filled by the project initiator, Professor Authier). The roles of such an Editor-in-Chief will also cover the possible appointment of subsidiary editors to supervise the collection of definitions in topic areas where they have particular expertise, and the commissioning of definitions or sets of definitions to address topics not currently covered. The number and roles of secondary editors will depend in part on the readiness of the volunteer pool of contributors to identify deficiencies and provide needed definitions without prompting. The experience of the Wikipedia project suggests that this is possible in principle, but the early experience of the pilot suggests that significant effort will be needed, at least in the early stages, to build an initial critical mass of content that will inspire more active involvement by volunteer contributors. It is intended to return to this point when the project produces its next report. In the meantime, it is not unreasonable to provide conservatively for the appointment of a small number - six to a dozen - of specialist Editors responsible for commissioning content within their fields of expertise, and possibly paid a modest honorarium in recognition of their successes (by analogy in some way with journal editors' handling of manuscripts).


Timescale

A Pilot Project with an initial set of trial entries has been implemented. Some guidelines for further development and an indication of financial implications are given in this draft of the report supplied to the Finance Committee meeting at the end of March 2006. A further assessment of progress will be reported to the Executive Committee in August 2006.


APPENDIX 1: Membership of the Working Group

The initial membership of the Working Group established in Florence consisted of:

  • Andre Authier (Chair)
  • John Helliwell
  • Bill Clegg
  • Paola Spadon
  • I. David Brown
  • Brian McMahon

Giovanni Ferraris subsequently joined the group as an additional representative of the Teaching Commission, and Peter Strickland, Managing Editor of IUCr publications, as observer. Howard Flack also provided sample entries and useful feedback.



APPENDIX 2: Technical considerations

A major goal of the initial pilot project was to identify a software platform capable of supporting collaborative work on an online dictionary by the distributed authorship that the project seems to require. The pilot is not specifically directed towards identifying a dissemination mechanism, but clearly it is helpful to consider tools that create a version of the dictionary already suitable for public access.

Content management systems

The conventional software platform for managing the collection, editing, revision and publication of a large number of separate items is known as a content management system. Commercial implementations of such systems have been available for decades, and have been distinguished by their high price and complexity of use. The IUCr editorial office has considered such systems in the past (e.g. Texcel Information Manager), but concluded that the high costs and steep learning curves associated with such systems, coupled with their lack of flexibility for the innovative procedures we have developed, have made them less attractive than home-grown systems. Traditional packages were also poorly suited to web use.

More recently, open-source web-based packages such as Bricolage have begun to appear. Bricolage in particular is a system that is under consideration as a basis for collaborative input into the next generation of IUCr public web services. However, it shares many of the drawbacks of older content management systems. It requires heavy investment of time to configure it for a particular organization; there is a significant learning curve for content contributors (authors) to master; it offers rather little in the way of flexibility if one wishes to integrate the managed content with existing material, or with contributions from other sources; and it is rather poorly suited for technical content (especially mathematics). While it remains under consideration for possible future editorial use, it seems too heavyweight for the dictionary project as currently envisaged.

WiKi software

An alternative approach that was put forward at the Florence Congress and enthusiastically received by the members of the Nomenclature Commission present at that meeting was the use of so-called 'WiKis'. A WiKi (from the Hawaiian for 'quick' or 'fast') is a web-centric content management system designed to be lightweight and encourage rapid development of web sites by a more or less informal collaboration of authors and editors. The public Wikipedia project demonstrates that it is possible to compile a very large compendium of content (at the moment almost a million encyclopaedic entries in the English-language edition, written and edited by tens of thousands of users). Although Wikipedia encourages the process of authorship, it was felt that (at least for its initial implementation) the Online Dictionary of Crystallography should be seen as the work of expert authors, and that therefore controls should exist on the users able to contribute or edit content. (Indeed, Wikipedia also has administrative privileges that control access to articles, though by design these are not used routinely.) A requirement therefore was for a lightweight WiKi implementation that had appropriate access control/user management functionality.

MoinMoin

The first package investigated was MoinMoin, which is already used in the IUCr editorial office for maintaining internal documentation. A pilot MoinMoin implementation was set up in mid-November 2005, and used extensively by Andre Authier (with a small amount of input from John Helliwell). Its advantages were:

  • ease of installation
  • ease of maintenance (individual entries are stored as files on the hard disk, and can readily be backed up, restored, deleted or moved)
  • simple markup and ways of creating internal hyperlinks and links to resources on the Web
  • simple access control mechanisms
  • relatively easy modification to style sheets (so that only authorised users see the tabs/buttons allowing a page to be edited)
  • ability to track all recent changes (essential for the chief editor(s) and system administrator)
  • support for categorizing entries and for managing and indexing categories
  • page templating

Its disadvantages were considered to be:

  • multilingual support (perversely, since a requirement of the Online Dictionary is that it provides access to terms in multiple languages); the main problem was that the software is too well suited for multilingual operation: it recognised that Andre was using a French browser, and displayed the standard system pages and facilities in French. This behaviour would entail translation of all relevant help pages, Introduction and internal labels to French, German and a host of other 'supported' languages.
  • lack of support for mathematics
  • lack of support for images and graphical illustrations
  • limited control over layout of complex pages

The multilingual support was an unexpected problem, and more of a nuisance than a real obstacle. Nevertheless, it would probably involve a significant amount of time to make the resource truly multilingual (note that this does not refer to the articles themselves, which are intended to be in English, but to the descriptive and navigational terms needed for effective use of the site).

We explored the ability to mark up mathematical content. The native markup allowed the creation of italic, bold, subscript and superscript rendering, and the use of Unicode allowed access to many mathematical symbols, but complex maths (e.g. built-up fractions) could not be rendered. The largest single obstacle to progress was the inability to render overbar characters (p\bar1, for example), which significantly impedes progress in descriptions of crystallography!
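For illustration, the two kinds of construct mentioned above (an overbar character and a built-up fraction) might be written in TeX markup as follows. The space-group symbol and the rearranged form of Bragg's law are chosen here only as familiar examples, with the usual symbols (n the order of reflection, λ the wavelength, d the lattice-plane spacing, θ the Bragg angle).

```latex
% Overbar character: space-group symbol "P one-bar"
$P\bar{1}$

% Built-up fraction: Bragg's law, n\lambda = 2d\sin\theta, solved for \sin\theta
\[
  \sin\theta = \frac{n\lambda}{2d}
\]
```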

It was considered possible to write extensions to allow users to upload images for incorporation in the pages created on the MoinMoin site, and it is also possible that add-on processing could be written to extract markup in TeX and pipe it through an external process to render complex maths, but both would require a considerable investment of research and development time, and it was therefore decided to investigate another software platform.


mediawiki

mediawiki is the software that is used by Wikipedia itself, and therefore has a proven track record for the management of large sites with graphical and mathematical content. A first mediawiki implementation was set up in December 2005, and a reimplementation with updated software and appropriate access control mechanisms in late January 2006. All the initial content in the MoinMoin WiKi was transferred with little difficulty to the mediawiki version, and additional entries have been added by Andre Authier and Howard Flack.

The advantages of the new implementation are:

  • native support for uploading of images and other non-text files
  • native support for TeX-based processing of suitably marked-up mathematics content
  • support for a substantial amount of raw HTML markup, allowing for the construction of complex tables and relatively complex page layout
  • support for simple markup (similar to that used by MoinMoin), which is easy for a new author to learn and is suitable for simple text-only entries
  • layered and extensible access rights, allowing the establishment of different classes of user: we envisage 'reader', 'author', 'editor' and 'systems administrator'
  • support for categories (as with MoinMoin)
  • automated section numbering
  • numerous admin functions (collection of statistics, autoindexing of categories and of the entire site, identification of broken internal links etc.)
  • support for automated rights metadata (the current pilot is advertising Creative Commons rights to copy, distribute, display, and perform the work, and to make derivative works - although the proper form of licensing has yet to be discussed by the Working Group)
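
The layered user classes named in the list above ('reader', 'author', 'editor' and 'systems administrator') can be sketched as an ordered set of roles in which each class inherits the rights of those below it. This is a conceptual sketch only: it does not reproduce the configuration of the WiKi software, and the action names are hypothetical.

```python
from enum import IntEnum


class Role(IntEnum):
    """User classes envisaged for the dictionary, ordered by privilege."""
    READER = 0
    AUTHOR = 1
    EDITOR = 2
    SYSTEMS_ADMINISTRATOR = 3


# Minimum role assumed for each action; the action names are illustrative.
REQUIRED_ROLE = {
    "read_entry": Role.READER,
    "edit_entry": Role.AUTHOR,
    "freeze_entry": Role.EDITOR,
    "manage_users": Role.SYSTEMS_ADMINISTRATOR,
}


def can(role, action):
    """Return True if the role is at least the minimum required for the action."""
    return role >= REQUIRED_ROLE[action]


print(can(Role.AUTHOR, "edit_entry"))
print(can(Role.READER, "edit_entry"))
```

Ordering the roles makes the rights layered: anything a reader may do, an author may also do, and so on up to the systems administrator.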

The disadvantages are:

  • much greater difficulty in setup
  • greater administrative complexity (entries are stored in a database, requiring systematic dumping for purposes of backup, and with greater risk of corruption)
  • less sophisticated handling of styles (it was necessary to write a new stylesheet to prevent unprivileged readers from seeing the "edit" tabs that they are unable to use anyway)
  • limited ability (compared with MoinMoin) to track changes to the site overall (though it does provide an RSS feed to the autogenerated page that tracks recent changes, which is helpful)
  • poor support for page templates in the style of MoinMoin (although templated data fields and transclusion may be useful features in the longer term)
  • poor local documentation

Platform of choice

Both MoinMoin and mediawiki offer many features that are suitable for the Online Dictionary project - ability to create and edit entries, store version histories, exercise editorial control to freeze definitions if necessary, internal hyperlinking, indexing and search engines, the ability to annotate and discuss articles. Both also seem suitable as dissemination platforms (as well as the authoring environment that is the main concern of this stage of the project). mediawiki has greater complexity from the systems administration viewpoint; but much of that has to do with the initial setup, which has now been achieved for both platforms. mediawiki offers much better support off the shelf for maths and images, both of which were identified by Andre as essential for an effective crystallography dictionary.

The next phase of development of the Online Dictionary of Crystallography will therefore be based on the mediawiki implementation that can currently be found at http://reference.iucr.org/dictionary