Provenance of discharge summaries: Pythonic access#
This library integrates MIMIC-III with discharge summary data provenance annotations and provides Pythonic classes to access them. The package is meant for other researchers and is based on the paper Hospital Discharge Summarization Data Provenance.
Documentation#
See the full documentation. The API reference is also available.
Installation and Configuration#
The library can be installed with pip from the PyPI repository:
pip3 install zensols.dsprov
The MIMIC-III database must be installed and configured as documented in the
mimic package’s configuration, which is necessary to render the EHR data used
by the annotations. However, instead of ~/.mimicrc, the file must be called
~/.dsprovrc unless a different file is given with the --config option. The
mimic package’s configuration section also provides instructions on how to
install a MIMIC-III database via PostgreSQL, SQLite or in a Docker container.
For example, to store cached files in ~/.dsprov/cache using the SQLite
MIMIC-III database file ~/.dsprov/cache/mimic3.sqlite3, your ~/.dsprovrc
would be:
[default]
# the directory where cached data is stored
data_dir = ~/.dsprov/cache
[mimic_sqlite_conn_manager]
# location of the MIMIC-III SQLite database file
db_file = path: ~/.dsprov/cache/mimic3.sqlite3
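Before running the tools, it can help to check that the two paths in the file agree with each other. The following is a minimal sketch that reads the example above with Python's standard `configparser`; it is not how the library itself loads its configuration. The helper name `read_dsprov_paths` is hypothetical, and treating the `path:` prefix as a plain marker that can be stripped is an assumption.

```python
import configparser
import os
from pathlib import Path

# the example configuration from above, inlined so the sketch is self-contained
EXAMPLE = """\
[default]
# the directory where cached data is stored
data_dir = ~/.dsprov/cache

[mimic_sqlite_conn_manager]
# location of the MIMIC-III SQLite database file
db_file = path: ~/.dsprov/cache/mimic3.sqlite3
"""


def read_dsprov_paths(text: str) -> dict:
    """Return the cache directory and SQLite file named in the config text.

    Hypothetical helper: the library has its own configuration loader.
    """
    parser = configparser.ConfigParser()
    parser.read_string(text)
    data_dir = parser["default"]["data_dir"]
    db_file = parser["mimic_sqlite_conn_manager"]["db_file"]
    # assumption: the ``path:`` prefix only marks the value as a file path
    if db_file.startswith("path:"):
        db_file = db_file[len("path:"):].strip()
    return {
        "data_dir": Path(os.path.expanduser(data_dir)),
        "db_file": Path(os.path.expanduser(db_file)),
    }


paths = read_dsprov_paths(EXAMPLE)
print(paths["data_dir"])
print(paths["db_file"])
```

To check a real file, pass `Path("~/.dsprovrc").expanduser().read_text()` instead of `EXAMPLE`.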
Usage#
The package includes a command line interface, which is probably most useful for dumping selected admission annotations.
Command line#
# help
$ dsprov --help
# get two admission IDs (hadm_id)
$ dsprov ids -l 2
# print out two admissions
$ dsprov show -l 2
# print out admission 139676
$ dsprov show -d 139676
# output the JSON of two admissions with indent 4
$ dsprov show -i 4 -f json -d $(dsprov ids -l 2 | awk '{print $1}' | paste -s -d, -)
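The last command can also be driven from Python when the shell pipeline is inconvenient. The sketch below mirrors it step by step: take the first column of the `dsprov ids` output, join the IDs with commas, and pass them to `dsprov show`. Only the flags shown above are used. The function names are hypothetical, and the code assumes that `dsprov show -f json` writes a single JSON document to standard output, which is not confirmed here.

```python
import json
import subprocess
from typing import Any, List


def parse_hadm_ids(ids_output: str) -> List[str]:
    """Keep the first whitespace-separated field of each non-empty line.

    This is the Python counterpart of ``awk '{print $1}'``.
    """
    return [line.split()[0] for line in ids_output.splitlines() if line.strip()]


def build_show_command(hadm_ids: List[str], indent: int = 4) -> List[str]:
    """Build the ``dsprov show`` argument list, joining the IDs with commas."""
    return ["dsprov", "show", "-i", str(indent), "-f", "json",
            "-d", ",".join(hadm_ids)]


def show_as_json(limit: int = 2) -> Any:
    """Run ``dsprov ids`` then ``dsprov show`` and parse the result.

    Assumption: the ``show`` output is one JSON document on stdout.
    """
    ids_out = subprocess.run(
        ["dsprov", "ids", "-l", str(limit)],
        check=True, capture_output=True, text=True).stdout
    show_out = subprocess.run(
        build_show_command(parse_hadm_ids(ids_out)),
        check=True, capture_output=True, text=True).stdout
    return json.loads(show_out)
```

Calling `show_as_json(2)` requires the `dsprov` program to be installed and configured; the two helper functions have no such requirement.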
API#
The package can be used directly in your research to provide object-oriented Python access to the annotations:
>>> from zensols.nlp import FeatureDocument
>>> from zensols.dsprov import ApplicationFactory, AdmissionMatch
>>> stash = ApplicationFactory.get_stash()
>>> am: AdmissionMatch = next(iter(stash.values()))
>>> doc: FeatureDocument = am.note_matches[0].discharge_summary.note.doc
>>> print(f'hadm: {am.hadm_id}')
>>> print(f'sentences: {len(doc.sents)}')
>>> print(f'tokens: {doc.token_len}')
>>> print(f'entities: {doc.entities}')
hadm: 120334
sentences: 46
tokens: 1039
entities: (<Admission>, <Date>, <Discharge>, <Date>, <Date of Birth>, <Sex>, ...)
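The same attributes can be collected for several admissions at once. The following is a sketch that relies only on the attributes used in the session above (`hadm_id`, `note_matches`, `discharge_summary.note.doc`, `sents` and `token_len`). The function names are hypothetical, and the code assumes that every admission has at least one note match.

```python
from itertools import islice
from typing import Any, Dict, Iterable, List


def summarize_admission(am: Any) -> Dict[str, Any]:
    """Collect the admission ID and the size of its first discharge summary.

    Assumption: ``am.note_matches`` is not empty.
    """
    doc = am.note_matches[0].discharge_summary.note.doc
    return {
        "hadm_id": am.hadm_id,
        "sentences": len(doc.sents),
        "tokens": doc.token_len,
    }


def summarize_admissions(admissions: Iterable[Any],
                         limit: int = 5) -> List[Dict[str, Any]]:
    """Summarize at most ``limit`` admissions from an iterable."""
    return [summarize_admission(am) for am in islice(admissions, limit)]
```

With the library installed, `summarize_admissions(ApplicationFactory.get_stash().values())` would return one dictionary per admission for the first five admissions in the stash.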
Docker#
A Docker image is available as well.
To use the Docker image, do the following:
Create (or obtain) the Postgres Docker image.
Clone this repository:
git clone --recurse-submodules https://github.com/plandes/dsprov
Set the working directory to the repo:
cd dsprov
Copy the configuration from the installed mimicdb image configuration:
make -C docker/mimicdb SRC_DIR=<cloned mimicdb directory> cpconfig
Start the container:
make -C docker/app up
Test sectioning a document:
make -C docker/app testdumpsec
Log in to the container:
make -C docker/app devlogin
Citation#
If you use this project in your research, please use the following BibTeX entry:
@inproceedings{landesHospitalDischargeSummarization2023,
title = {Hospital {{Discharge Summarization Data Provenance}}},
booktitle = {The 22nd {{Workshop}} on {{Biomedical Natural Language Processing}} and {{BioNLP Shared Tasks}}},
author = {Landes, Paul and Chaise, Aaron and Patel, Kunal and Huang, Sean and Di Eugenio, Barbara},
date = {2023-07},
pages = {439--448},
publisher = {{Association for Computational Linguistics}},
location = {{Toronto, Canada}},
url = {https://aclanthology.org/2023.bionlp-1.41},
urldate = {2023-07-10},
eventtitle = {{{BioNLP}} 2023}
}
Please also cite the Zensols Framework:
@article{Landes_DiEugenio_Caragea_2021,
title={DeepZensols: Deep Natural Language Processing Framework},
url={http://arxiv.org/abs/2109.03383},
note={arXiv: 2109.03383},
journal={arXiv:2109.03383 [cs]},
author={Landes, Paul and Di Eugenio, Barbara and Caragea, Cornelia},
year={2021},
month={Sep}
}
Changelog#
An extensive changelog is available here.
License#
Copyright (c) 2023 Paul Landes