| |
Name of Collaboratory :
|
|
Project Gutenberg's Distributed Proofreaders (PGDP)
|
|
| |
|
Logo :
|
|
|
|
|
|
| |
URL :
|
|
http://www.pgdp.net/c/default.php |
|
| |
Collaboratory Status
:
|
|
| Operational
|
|
Start Date :
2000
|
End Date :
|
|
|
| |
Primary Collaboratory Function :
|
|
Open Community Contribution System |
|
| |
Secondary Collaboratory Functions :
|
|
Virtual Community of Practice |
|
| |
Domain(s)
:
|
|
|
|
| |
Brief Description
of the Collaboratory :
|
|
Distributed Proofreaders (DP) was founded in 2000 by Charles Franks to support the digitization of public domain books and to supply material more efficiently to Project Gutenberg (PG). DP recruits individuals to find and correct scanning and formatting errors in pages of archival material that have been scanned and converted to text with Optical Character Recognition (OCR) software. Because proofreading an entire work is an enormous task, the DP structure lets volunteers proofread scanned texts one page at a time.
Though Project Gutenberg and Distributed Proofreaders are related, they are two separate organizations. DP came under the auspices of PG only recently: in 2002, Distributed Proofreaders became an official Project Gutenberg site, and as such it is supported by Project Gutenberg, which provides a small amount of funding. At the moment all of DP's output goes to PG. However, DP is looking for other outlets for its output. All the proofreaders, managers, developers and other participants are volunteers.
Goal:
DP's goal is to make it easy for new volunteers to get involved while maintaining a scalable and efficient infrastructure. This keeps the incremental cost of digitization small, essentially limiting it to the cost of purchasing and shipping the books.
PGDP Site Concept:
This site provides a web-based method of easing the proofreading work associated with the digitization of books by breaking the work into individual pages and distributing it. This allows many proofreaders to be working on the same book at the same time. This significantly speeds up the proofreading/e-book creation process.
When a proofer elects to proofread a page of a particular book, the text and image file are displayed on a customized web page and proofreading tool (see image of collaboratory below). This allows the page text to be easily reviewed and compared with the page image. The edited text is then submitted back to the site via the same web page on which it was edited. A second proofreader is then presented with the work of the first proofreader and the page image. Once the second proofreader has verified the first proofreader's work and corrected any additional errors, the page text is again submitted back to the site.
Once all pages for a particular book have been processed, a post-processor joins the pieces, properly formats them into a Project Gutenberg e-book and submits it to the Project Gutenberg archive.
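The page-by-page workflow described above can be sketched in a few lines of Python. This is only an illustrative model, not DP's actual code (the site itself is a web application): the names Page, proofread, post_process and ROUNDS_REQUIRED are hypothetical, and the two-round requirement simply mirrors the first and second proofreader described in the text.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Assumption: two passes per page, matching the first and second proofreader above.
ROUNDS_REQUIRED = 2


@dataclass
class Page:
    """One scanned page of a book (hypothetical model)."""
    number: int
    ocr_text: str
    rounds: List[str] = field(default_factory=list)  # text saved after each pass

    @property
    def current_text(self) -> str:
        # The latest proofread text, or the raw OCR output if untouched.
        return self.rounds[-1] if self.rounds else self.ocr_text

    @property
    def done(self) -> bool:
        return len(self.rounds) >= ROUNDS_REQUIRED


def proofread(page: Page, correct: Callable[[str], str]) -> None:
    """Apply one proofreader's corrections and submit the page back."""
    if page.done:
        raise ValueError(f"page {page.number} has already completed all rounds")
    page.rounds.append(correct(page.current_text))


def post_process(pages: List[Page]) -> str:
    """Join fully proofread pages, in page order, into one text."""
    if not all(p.done for p in pages):
        raise ValueError("some pages have not completed every round")
    return "\n".join(p.current_text for p in sorted(pages, key=lambda p: p.number))


# Two stand-in "proofreaders", each fixing a different kind of OCR error.
first_pass = lambda text: text.replace("1", "l")
second_pass = lambda text: text.replace("0", "o")

book = [Page(2, "w0rld"), Page(1, "hel1o")]
for page in book:
    proofread(page, first_pass)
    proofread(page, second_pass)

ebook_text = post_process(book)
```

Because each page carries its own history, different volunteers can work on different pages of the same book at the same time, which is the point of splitting the work.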
Organization:
For the day-to-day production of eBooks, the volunteers communicate and organize themselves through the website. This is facilitated by phpBB, which provides forums and private messaging. Volunteers sign up and proofread one page at a time through a Proofreading Tool interface, submitting each page online when it is completed. The pages go to a project facilitator, who aggregates the proofread, digitized pages. The website provides tools for proofreading, for aggregation into e-texts, and for communication between project volunteers.
The central administration team comprises 3 systems administrators, a number of developers and 5 project facilitators, all of whom are distributed across several regions, including Europe, Australia and New Zealand. In addition, there are moderators who oversee the forums and a person who oversees the training of new volunteers and proofreading mentors. The group functions entirely by communicating online through email and instant messaging.
The central administration group has had only one face-to-face meeting since the collaboratory started. It took place in May 2004, and not everyone in the central administration was able to attend. The administrators therefore used streaming audio and instant messaging to let the offsite administrators participate in this first DP meeting. It would be safe to say that DP is a true 'virtual' community, one that has leveraged customized technology development and the goodwill of its volunteers to achieve its goals.
One of the major factors that has contributed to DP's success is the set of incentives and motivations for individuals to volunteer their time. The central administration team supports this by showing users counts of the pages they have proofread, as well as who the top-ten proofreaders are. Another feature is the proofreader ranking: as volunteers finish more pages, they move from novice to ace. By tradition, DP doesn't publish what the next rank is, keeping volunteers in suspense as to how many pages they must proofread before moving up. The higher a proofer's rank, the more pages are needed to reach the next one. This appears to work well, as people seem to enjoy the sense of achievement and recognition that the ranking system provides. However, there is a tension between the quantity of pages proofread and the quality of the work produced, and this needs to be reconciled with the aim of motivating volunteers.
Another incentive for volunteers is the ability to form virtual social networks through proofreading teams. Members can band together in teams that compete with other teams on the number of pages proofread. By default, all new volunteers are automatically signed on to the Distributed Proofreaders team. However, volunteers can also form their own teams or join other teams. The teams are largely centered around outside affiliations, such as country of origin. One of the larger teams on DP is called Team Canada, and its members have proofread an impressive 228,750 pages.
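As a rough illustration of a rank ladder in which each step requires more pages than the one before, consider the sketch below. DP does not publish its thresholds, so every number here is invented for the example, and the intermediate rank names ("Rank 2", "Rank 3") are placeholders; only "Novice" and "Ace" come from the description above. The function name rank_for is likewise hypothetical.

```python
from bisect import bisect_right

# Hypothetical ladder: (minimum pages proofread, rank name).
# The gaps between thresholds grow (50, 200, 750), so higher ranks take longer to reach.
RANKS = [
    (0, "Novice"),
    (50, "Rank 2"),    # placeholder name, not a real DP rank
    (250, "Rank 3"),   # placeholder name, not a real DP rank
    (1000, "Ace"),
]
THRESHOLDS = [minimum for minimum, _ in RANKS]


def rank_for(pages_proofread: int) -> str:
    """Return the rank for a given page count (thresholds are assumptions)."""
    if pages_proofread < 0:
        raise ValueError("page count cannot be negative")
    # bisect_right counts how many thresholds are <= the page count.
    index = bisect_right(THRESHOLDS, pages_proofread) - 1
    return RANKS[index][1]
```

Note that the function reports only the current rank and never the next threshold, in keeping with the tradition of leaving volunteers in suspense.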
Underlying the incentives to participate, the administrators at DP believe that much of the success of the project relies upon the strong sense of community that exists between the volunteers. This sense of community is attributed to a number of factors:
- the strong belief in the value of the DP endeavour, which is likened to the work of monks transcribing ancient works for posterity. A lot of volunteers see themselves that way, and it seems to appeal to a huge range of people from different backgrounds. The tagline of the project, "Preserving History one page at a time" captures the spirit of the community well.
- the DP community is also consciously tolerant of differences and everyone is welcome to participate. The barriers to participation are kept low enough to enable lay people to participate in the project.
- another thing that makes DP work is that it is not just a community of interest, but people who are actually working together. There is something about working and contributing together that builds more of an attachment/closeness. People have to do something at DP in order to be a part of the community.
|
|
| |
Access to
Instruments :
|
|
Volunteers at DP use custom-built tools to proofread texts one page at a time, submit their work, and manage volunteers. (See images of the tools below.) |
|
| |
Access to
Information Resources :
|
|
Books are scanned destructively with high-speed scanners, run through OCR, and uploaded to the database. DP uses ABBYY FineReader version 6 as its OCR software. |
|
| |
Access to
People as Resources :
|
|
Volunteers are managed through a system in which they sign up for an account. Once they have an account, volunteers choose a text and proofread it one page at a time. They can get advice and help from mentors and trainers. In addition, each volunteer's work is checked by another proofreader, and the entire work is aggregated by a project facilitator. |
|
| |
Funding Agency
or Sponsor : |
|
|
|
| |
| |
|
Notes on Funding Agencies/Sponsors:
The project is run mainly by volunteers who contribute their own time and resources. DP has received a small amount of funding, summarized below:
- PG funded the purchase of 2 high-speed scanners and a new server, and is also paying the hosting fees for the server, for a total of $6000.
- In September 2002, a server was provided to DP by the Internet Archive.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
| TOTAL
PARTICIPANTS: |
|
|
|
Notes on Participants/Organizations:
Project Gutenberg: http://www.gutenberg.net/
Project Gutenberg of Australia
Project Gutenberg of Europe
Projekt Gutenberg-DE
Project Runeberg
Distributed Proofreaders Europe: http://dp.rastko.net/default.php
|
|
|
|
| |
|
|
|
|
| |
Communications Technology
Used :
|
|
Email
Instant Messaging - Jabber
PHP Bulletin Board
Telephone conferencing |
|
| |
Technical
Capabilities :
|
|
Management of technical resources
Access control/login facilities
Support for transition between synch and asynch
Notification of events or a need by someone, Awareness of people's activities
Asynchronous object sharing
Audit trail of events, "Handoff" authoring
Asynchronous conversation
Threaded discussion, Email
|
|
| |
Key Articles : |
|
Newby, Gregory B. & Franks, Charles (2003). Distributed Proofreading. International Conference on Digital Libraries: Third ACM/IEEE-CS Joint Conference on Digital Libraries (pp. 3). Washington D.C., USA: IEEE Computer Society.
|
|
| |
Project-reported performance data
:
|
|
According to an estimate given by DP's Chief Administrator:
- 400 to 500 distinct accounts are accessed over a 24-hour period.
- 1000 distinct accounts are accessed over 7 days.
- 2000 distinct accounts are accessed over a period of 2 weeks to a month. |
|
| |
Images
of the Collaboratory: |
|
 |
Individual member page where statistics about proofreading performance and output are recorded and tracked to motivate volunteers.
|
 |
Teams of proofreaders can be formed and their statistics tracked. Statistics as to the number of pages proofread are monitored and used as a motivational device for proofreaders to improve their output.
|
 |
The standard interface for volunteers to proofread the text.
|
 |
Proofreading interface, which allows the volunteer to view the actual image of the page and edit the text side by side.
|
|
|