A Story Generation and Evaluation Platform

Human-Readable Summary

This summary IS NOT a substitute for the Agreement.
You are free to use this data for research purposes.
You are explicitly prohibited from providing this data to other individuals not directly working with you on research.
You must provide attribution for any public disclosure of research related to this data.
This Agreement has been adapted from The Federal Demonstration Partnership (FDP) Data Transfer and Use Agreement (DTUA) Template.

Data Transfer and Use Agreement ("Agreement")

Terms and Conditions

Section 1 - Definitions

Data means the collection of stories, characters, comments, and any related JSON metadata collected from the Storium platform as provided by Protagonist Labs, Inc. to the Recipient.
Provider means the individual(s) or entity(ies) granting rights under this Agreement.
Recipient means the individual or entity exercising the rights under this Agreement.
Recipient Personnel means: faculty, employees, fellows, students, and agents of the Recipient (i) who have a need to use or provide a service in respect of the Data in connection with the Recipient’s research, and (ii) have been made aware of the terms of this Agreement and agreed to comply, and to cause its personnel to comply, with such terms.
Collaborator Personnel means: faculty, employees, fellows, students, and agents of an institution, which institution (i) has agreed to collaborate on the Recipient’s research (ii) who have a need to use or provide a service in respect of the Data in connection with the Recipient’s research, and (iii) have been made aware of the terms of this Agreement and agreed to comply, and to cause its personnel to comply, with such terms.

Section 2 - Conditions

Provider shall provide the data set described to Recipient for research purposes only. Provider shall retain ownership of any rights it may have in the Data, and Recipient does not obtain any rights in the Data other than as set forth herein.
Recipient shall not use the Data except as authorized under this Agreement. The Data will be used solely to conduct research and solely by Recipient, Recipient Personnel, and Collaborator Personnel that have a need to use, or provide a service in respect of, the Data in connection with the research and whose obligations of use are consistent with the terms of this Agreement (collectively, “Authorized Persons”).
Except as authorized under this Agreement or otherwise required by law, Recipient agrees to retain control over the Data and shall not disclose, release, sell, rent, lease, loan, or otherwise grant access to the Data to any third party, except Authorized Persons, without the prior written consent of Provider. Recipient agrees to establish appropriate administrative, technical, and physical safeguards to prevent unauthorized use of or access to the Data.
Recipient agrees to use the Data in compliance with all applicable laws, rules, and regulations, as well as all professional standards applicable to such research.
Recipient agrees to recognize the contribution of the Provider as the source of the Data in all written, visual, or oral public disclosures concerning Recipient’s research using the Data, as appropriate in accordance with scholarly standards.
Provider reserves the right to terminate this Agreement with thirty (30) days written notice to the Recipient. Upon termination of this Agreement, Recipient shall delete all copies of the Data, provided, however, that Recipient may retain one (1) copy of the Data to the extent necessary to comply with the records retention requirements under any law, and for the purposes of research integrity and verification.
Except as provided below or prohibited by law, any Data delivered pursuant to this Agreement is understood to be provided “AS IS.” PROVIDER MAKES NO REPRESENTATIONS AND EXTENDS NO WARRANTIES OF ANY KIND, EITHER EXPRESSED OR IMPLIED. THERE ARE NO EXPRESS OR IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, OR THAT THE USE OF THE DATA WILL NOT INFRINGE ANY PATENT, COPYRIGHT, TRADEMARK, OR OTHER PROPRIETARY RIGHTS. Notwithstanding, Provider, to the best of its knowledge and belief, has the right and authority to provide the Data to Recipient for use in the research.
Except to the extent prohibited by law, the Recipient assumes all liability for damages which may arise from its use, storage, disclosure, or disposal of the Data. The Provider will not be liable to the Recipient for any loss, claim, or demand made by the Recipient, or made against the Recipient by any other party, due to or arising from the use of the Data by the Recipient, except to the extent permitted by law when caused by the gross negligence or willful misconduct of the Provider. No indemnification for any loss, claim, damage, or liability is intended or provided by either party under this Agreement.
Neither party shall use the other party’s name, trademarks, or other logos in any publicity, advertising, or news release without the prior written approval of an authorized representative of that party. The parties agree that each party may disclose factual information regarding the existence and purpose of the relationship that is the subject of this Agreement for other purposes without written permission from the other party provided that any such statement shall accurately and appropriately describe the relationship of the parties and shall not in any manner imply endorsement by the other party whose name is being used.
No modification or waiver of this Agreement, except as described in Section 2(6), shall be valid unless in writing and executed by duly authorized representatives of both parties.

About

Systems for story generation are asked to produce plausible and enjoyable stories given an input context. This task is underspecified, as a vast number of diverse stories can originate from a single input. The large output space makes it difficult to build and evaluate story generation models, as (1) existing datasets lack rich enough contexts to meaningfully guide models, and (2) existing evaluations (both crowdsourced and automatic) are unreliable for assessing long-form creative text. To address these issues, we introduce a dataset and evaluation platform built with STORIUM, an online collaborative storytelling community. Our author-generated dataset contains 6K lengthy stories (125M tokens) with fine-grained natural language annotations, in the form of cards, interspersed throughout each narrative, forming a robust source for guiding models. Our evaluation platform is integrated directly with STORIUM, where real authors can query a model for suggested story continuations and then edit them. We provide a leaderboard with automatic metrics computed over these edits, which correlate well with both user ratings of generated stories and qualitative feedback from semi-structured user interviews. We release both the dataset and evaluation platform to spur more principled research into story generation.
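To make "metrics computed over these edits" concrete, here is a minimal sketch of one way to measure how much of a model suggestion survives an author's edit. It is an illustrative assumption, not the metric used on the leaderboard: the function name `retained_fraction`, the whitespace tokenization, and the use of Python's `difflib` are all hypothetical choices made for this example.

```python
from difflib import SequenceMatcher


def retained_fraction(generated: str, edited: str) -> float:
    """Fraction of generated tokens that also appear, in order, in the edited text.

    Hypothetical helper for illustration only; not the platform's metric.
    """
    gen_tokens = generated.split()
    edit_tokens = edited.split()
    if not gen_tokens:
        return 0.0
    # Compare token sequences; autojunk is disabled so frequent tokens are not ignored.
    matcher = SequenceMatcher(a=gen_tokens, b=edit_tokens, autojunk=False)
    # Sum the sizes of all matching blocks (the final sentinel block has size 0).
    kept = sum(block.size for block in matcher.get_matching_blocks())
    return kept / len(gen_tokens)


if __name__ == "__main__":
    print(retained_fraction("the sniper fired twice", "the sniper fired once"))
```

In this toy case three of the four generated tokens are kept, so the function returns 0.75. A score near 1.0 would suggest the author published the suggestion almost unchanged, while a score near 0.0 would suggest it was largely rewritten.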

A high-level outline of our dataset and platform. In this example from a real STORIUM game, the character ADIRA MAKAROVA uses the strength card DEADLY AIM to DISRUPT THE GERMANS, a challenge card. Our model conditions on the natural language annotations in the scene intro, challenge card, strength card, and character, along with the text of the previous scene entry (not shown) to generate a suggested story continuation. Players may then edit the model output, by adding or deleting text, before publishing the entry. We collect these edits, using the matched text as the basis of our USER metric. New models can be added to the platform by simply implementing four methods: startup, shutdown, preprocess, and generate.
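The four methods named above could be organized roughly as follows. This is a sketch under stated assumptions: only the method names `startup`, `shutdown`, `preprocess`, and `generate` come from the text. The class names, argument types, dictionary keys, and return values are hypothetical and may differ from the platform's actual interface.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict


class StoryModel(ABC):
    """Hypothetical interface: method names from the text, signatures assumed."""

    @abstractmethod
    def startup(self) -> None:
        """Load weights or other resources."""

    @abstractmethod
    def shutdown(self) -> None:
        """Release resources."""

    @abstractmethod
    def preprocess(self, story: Dict[str, Any]) -> Dict[str, Any]:
        """Turn raw story data into the context the model conditions on."""

    @abstractmethod
    def generate(self, context: Dict[str, Any]) -> str:
        """Return a suggested story continuation."""


class EchoModel(StoryModel):
    """Toy model that restates the cards it was conditioned on."""

    # Assumed field names, mirroring the annotations listed in the caption.
    KEYS = ("scene_intro", "challenge_card", "strength_card",
            "character", "previous_entry")

    def __init__(self) -> None:
        self.ready = False

    def startup(self) -> None:
        self.ready = True

    def shutdown(self) -> None:
        self.ready = False

    def preprocess(self, story: Dict[str, Any]) -> Dict[str, Any]:
        # Keep only the known annotation fields, defaulting to empty strings.
        return {key: story.get(key, "") for key in self.KEYS}

    def generate(self, context: Dict[str, Any]) -> str:
        if not self.ready:
            raise RuntimeError("call startup() before generate()")
        return (f"{context['character']} uses {context['strength_card']} "
                f"to {context['challenge_card']}.")


if __name__ == "__main__":
    model = EchoModel()
    model.startup()
    context = model.preprocess({
        "character": "Adira Makarova",
        "strength_card": "Deadly Aim",
        "challenge_card": "Disrupt the Germans",
    })
    print(model.generate(context))
    model.shutdown()
```

Separating `preprocess` from `generate` lets the context be built once and reused, and keeping `startup` and `shutdown` explicit lets a hosting service control when expensive resources are loaded and freed.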

Paper

If you use our dataset or evaluation platform, please cite:

@inproceedings{storium2020,
  Author = {Nader Akoury and Shufan Wang and Josh Whiting and Stephen Hood and Nanyun Peng and Mohit Iyyer},
  Booktitle = {Empirical Methods in Natural Language Processing},
  Year = "2020",
  Title = {{STORIUM}: {A} {D}ataset and {E}valuation {P}latform for {S}tory {G}eneration}
}

Read the paper

Contact

If you have any questions or comments about this work, please visit my website, which has my contact information, CV, and an up-to-date listing of my publications.