Search in the Catalogues and Directories

Hits 1 – 20 of 24

1
Measuring the quality of unstructured text in routinely collected electronic health data: a review and application
Nesca, Marcello. - 2022
Abstract: Introduction: Routinely collected electronic health data (RCEHD) can comprise structured, semi-structured, or unstructured information. Electronic medical records (EMRs), one type of RCEHD, often contain unstructured text data (UTD), which are typically prepared for analysis (i.e., preprocessed) and analyzed using natural language processing (NLP) techniques. At present, there are few studies about the specific types of NLP methods used to preprocess UTD to address data quality issues prior to analysis or modelling. Purpose & Objectives: The purpose was to examine preprocessing methods for UTD and evaluate the quality of UTD in EMRs. The objectives were to: 1) systematically document current research and practices for preprocessing UTD to describe or improve its quality, and 2) apply data quality indicators identified from current research and practices to UTD in EMRs from the Manitoba Primary Care Research Network and describe the quality of these data. Methods: Objective 1 involved a scoping review. Scopus, Web of Science, ProQuest, and EBSCOhost were searched for literature on current research and practices to prepare UTD for analysis, up to and including 2021. For objective 2, a case study was undertaken in which data quality indicators and preprocessing methods identified in the scoping review were applied to UTD from EMRs. Results: 41 articles were included in the scoping review for objective 1; over 50% were published between 2016 and 2021 and over 90% were empirical research articles. Data quality indicator topics for UTD in EMRs included misspelled words, security, word variability, sources of noise, quality of annotations, ambiguous abbreviations, and manual annotations. For objective 2, we selected 193,206 clinical encounter notes from EMRs between 1985 and 2020. Overall, the clinical encounter notes contained an average (standard deviation [SD]) of 27.3 (27.0) stop words, 25.7 (27.8) punctuation symbols, 12.1 (11.1) spelling errors, and 2.9 (2.6) special characters. The average (SD) length of a clinical encounter note was 555.8 (551.1) characters and 71.5 (59.7) words. Lexical diversity had a mean (SD) of 86.2 (11.9). Conclusion: This study identified multiple data quality indicators that have been used to preprocess UTD in published literature and demonstrated their application to real-world data. ; February 2022
Keyword: Data quality; Electronic Medical Records; Health research; Natural language processing; pre-processing unstructured text data
URL: http://hdl.handle.net/1993/36163
BASE
2
LEXICON BASED RULE EXTRACTION FOR SENTIMENT ANALYSIS UNDER BIG DATA ENVIRONMENT ...
B. Sevugamoorthy. - : Zenodo, 2019
BASE
3
LEXICON BASED RULE EXTRACTION FOR SENTIMENT ANALYSIS UNDER BIG DATA ENVIRONMENT ...
B. Sevugamoorthy. - : Zenodo, 2019
BASE
4
NgramPOS: A Bigram-based Linguistic and Statistical Feature Process Model for Unstructured Text Classification
BASE
5
Big Data Text Summarization: Using Deep Learning to Summarize Theses and Dissertations
Kahu, Sampanna; Ingram, William A.; Jude, Palakh. - : Virginia Tech, 2018
BASE
6
Face value of companies: deep learning for nonverbal communication ...
Burgard, Sophie. - : Humboldt-Universität zu Berlin, 2017
BASE
7
Face value of companies: deep learning for nonverbal communication
Burgard, Sophie. - : Humboldt-Universität zu Berlin, 2017
BASE
8
Supervised Process of Un-structured Data Analysis for Knowledge Chaining
In: Procedia CIRP ; CIRP design conference ; https://hal.archives-ouvertes.fr/hal-01347030 ; CIRP design conference, KTH, Jun 2016, Stockholm, Sweden. pp.436-441, ⟨10.1016/j.procir.2016.04.123⟩ ; http://cirpdesign2016.org/ (2016)
BASE
9
Leveraging Lexical Link Analysis (LLA) To Discover New Knowledge
In: Military Cyber Affairs (2016)
BASE
10
A Corpus Driven Computational Intelligence Framework for Deception Detection in Financial Text
Minhas, Saliha Z. - : University of Stirling, 2016
BASE
11
Situation Tracking in Large Data Streams
In: DTIC (2015)
BASE
12
Sentiment Big Data Flow Analysis by Means of Dynamic Linguistic Patterns
BASE
13
Lexical Link Analysis Application: Improving Web Service to Acquisition Visibility Portal
In: DTIC (2013)
BASE
14
Automated Extraction and Characterisation of Social Network Data from Unstructured Sources -- An Ontology-Based Approach
In: DTIC (2013)
BASE
15
Applications of Lexical Link Analysis Web Service for Large-Scale Automation, Validation, Discovery, Visualization, and Real-Time Program Awareness
In: DTIC (2012)
BASE
16
System Self-Awareness and Related Methods for Improving the Use and Understanding of Data within DoD
BASE
17
Collective knowledge systems: Where the social web meets the semantic web
In: http://www.websemanticsjournal.org/papers/2007119/CollectiveKnowledgeSystemsGruberV6I1.pdf (2008)
BASE
18
A conceptual-modeling approach to extracting data from the web
In: http://www.deg.byu.edu/papers/er98.pdf (1998)
BASE
19
A Conceptual-Modeling Approach to Extracting Data from the Web
In: http://osm7.cs.byu.edu/deg/papers/er98.ps (1998)
BASE
20
A Conceptual-Modeling Approach to Extracting Data from the Web
In: http://lantern.cs.byu.edu/papers/er98.ps (1998)
BASE

Hits by source type: Catalogues 0; Bibliographies 0; Linked Open Data catalogues 0; Online resources 0; Open access documents 24