What is ePiC?

ePiC is a crowdsourced dataset of narratives for employing proverbs in context as a benchmark for abstract language understanding. The dataset provides fine-grained annotation of aligned spans between proverbs and narratives, and contains minimal lexical overlap between narratives and proverbs, so models must go beyond surface-level reasoning to succeed. The dataset is accompanied by three tasks: (1) proverb recommendation and alignment prediction, (2) narrative generation for a given proverb and topic, and (3) identifying narratives with similar motifs.
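To make the notion of lexical overlap concrete, here is a minimal sketch that scores a proverb and narrative pair by the Jaccard similarity of their word sets. This is only an illustration: the tokenizer, the function names, and the example pair are assumptions made for this sketch, and it is not necessarily how overlap was measured for ePiC.

```python
import re


def tokens(text: str) -> set[str]:
    """Lowercase the text and return its set of word tokens."""
    return set(re.findall(r"[a-z']+", text.lower()))


def jaccard_overlap(proverb: str, narrative: str) -> float:
    """Jaccard similarity between the word sets of a proverb and a narrative."""
    p, n = tokens(proverb), tokens(narrative)
    if not p and not n:
        return 0.0  # define the overlap of two empty texts as zero
    return len(p & n) / len(p | n)


# Hypothetical pair written for this sketch; it is NOT taken from ePiC.
proverb = "A stitch in time saves nine"
narrative = (
    "Maya fixed the small leak right away, "
    "so the roof never needed a costly repair."
)

score = jaccard_overlap(proverb, narrative)
print(f"Jaccard overlap: {score:.3f}")
```

A low score for a pair like this one reflects the property described above: the narrative illustrates the proverb while sharing almost none of its words.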

For more details about ePiC, please refer to our paper:


The dataset can be downloaded using the link below.


The code for all models in the paper is hosted in the GitHub repository linked below.


If you use ePiC in your research, please cite our paper with the following BibTeX entry:

  title = "e{P}i{C}: Employing Proverbs in Context as a Benchmark for Abstract Language Understanding",
  author = "Ghosh, Sayan  and Srivastava, Shashank",
  booktitle = "Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
  year = "2022"
