1 | Recovering Lexically and Semantically Reused Texts ...
Abstract:
Writers often repurpose material from existing texts when composing new documents. Because most documents have more than one source, we cannot trace these connections using only models of document-level similarity. Instead, this paper considers methods for local text reuse detection (LTRD), detecting localized regions of lexically or semantically similar text embedded in otherwise unrelated material. In extensive experiments, we study the relative performance of four classes of neural and bag-of-words models on three LTRD tasks: detecting plagiarism, modeling journalists' use of press releases, and identifying scientists' citation of earlier papers. We conduct evaluations on three existing datasets and a new, publicly available citation localization dataset. Our findings shed light on a number of previously unexplored questions in the study of LTRD, including the importance of incorporating document-level context for predictions, the applicability of off-the-shelf neural models pretrained on “general” ...
URL: https://underline.io/lecture/29779-recovering-lexically-and-semantically-reused-texts https://dx.doi.org/10.48448/xs99-h225
BASE

3 | Minority protection and kin-state engagement: Karta Polaka in comparative perspective
BASE

4 | “All the touts we need”: HUMINT experience in Northern Ireland
BASE

5 | Drivers of English Syntactic Change in the Canadian Parliament
In: Proceedings of the Society for Computation in Linguistics (2021)
BASE

6 | Emerging English Transitives over the Last Two Centuries
In: Proceedings of the Society for Computation in Linguistics (2021)
BASE

8 | Variability in the analysis of a single neuroimaging dataset by many teams
In: ISSN: 0028-0836 ; EISSN: 1476-4679 ; Nature ; https://www.hal.inserm.fr/inserm-02914443 ; Nature, Nature Publishing Group, 2020, 582 (7810), pp.84-88. ⟨10.1038/s41586-020-2314-9⟩ (2020)
BASE

9 | Detecting de minimis Code-Switching in Historical German Books ...
BASE

10 | Multimodal Mapping of the Face Connectome
In: Nat Hum Behav (2020)
BASE

12 | A Corpus-linguistic Approach to the Analysis of Latin Morphology ...
BASE

13 | National cultural autonomy and linguistic rights in Central and Eastern Europe
BASE

15 | Who goes where? The importance of peer groups on attainment and the student use of the lecture theatre teaching space
BASE

16 | Modeling the Decline in English Passivization
In: Proceedings of the Society for Computation in Linguistics (2018)
BASE

17 | Megaxyela fulvago Stephan M. Blank & Katja Kramp & David R. Smith & Yuri N. Sundukov & Meicai Wei & Akihiko Shinohara 2017, sp. nov. ...
BASE

18 | Megaxyela euchroma Stephan M. Blank & Katja Kramp & David R. Smith & Yuri N. Sundukov & Meicai Wei & Akihiko Shinohara 2017, sp. nov. ...
BASE

19 | Megaxyela fulvago Stephan M. Blank & Katja Kramp & David R. Smith & Yuri N. Sundukov & Meicai Wei & Akihiko Shinohara 2017, sp. nov. ...
BASE

20 | Megaxyela euchroma Stephan M. Blank & Katja Kramp & David R. Smith & Yuri N. Sundukov & Meicai Wei & Akihiko Shinohara 2017, sp. nov. ...
BASE