Provenance of discharge summaries Pythonic access#
This library integrates MIMIC-III with discharge summary data provenance annotations and provides Pythonic classes for accessing them. The package is meant for other researchers and is based on the paper Hospital Discharge Summarization Data Provenance.
Documentation#
See the full documentation. The API reference is also available.
Installation and Configuration#
The library can be installed with pip from the PyPI repository:
pip3 install zensols.dsprov
The MIMIC-III database must be installed and configured as documented in the
mimic package’s configuration, which is necessary to render the EHR data used
by the annotations.  However, instead of using ~/.mimicrc, the file must be
called ~/.dsprovrc unless a different file is given with the --config option.  The
mimic package’s configuration section also provides instructions on how to
install a MIMIC-III database via PostgreSQL, SQLite or in a Docker container.
For example, to store cached files in ~/.dsprov/cache using the SQLite
MIMIC-III database file ~/.dsprov/cache/mimic3.sqlite3, your ~/.dsprovrc
would be:
[default]
# the directory where cached data is stored
data_dir = ~/.dsprov/cache
[mimic_sqlite_conn_manager]
# location of the MIMIC-III SQLite database file
db_file = path: ~/.dsprov/cache/mimic3.sqlite3
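As a quick sanity check before running the tools, the example file above can be read with Python's standard configparser. This is only a sketch and not how the library itself loads its configuration. The helper name read_paths is hypothetical, and treating the leading `path:` as a type prefix that can be stripped is an assumption.

```python
import configparser
from pathlib import Path

# Same content as the example ~/.dsprovrc above, inlined so the sketch runs
# without touching the file system.
EXAMPLE_RC = """\
[default]
# the directory where cached data is stored
data_dir = ~/.dsprov/cache
[mimic_sqlite_conn_manager]
# location of the MIMIC-III SQLite database file
db_file = path: ~/.dsprov/cache/mimic3.sqlite3
"""


def read_paths(text: str) -> dict:
    """Return the cache directory and database file named in the config.

    Hypothetical helper: the library has its own configuration loader.
    """
    parser = configparser.ConfigParser()
    parser.read_string(text)
    data_dir = parser['default']['data_dir']
    db_file = parser['mimic_sqlite_conn_manager']['db_file']
    # Assumption: 'path:' is a type prefix, so drop it to get the bare path.
    if db_file.startswith('path:'):
        db_file = db_file[len('path:'):].strip()
    # Paths are left unexpanded; call .expanduser() to resolve '~'.
    return {'data_dir': Path(data_dir), 'db_file': Path(db_file)}


if __name__ == '__main__':
    paths = read_paths(EXAMPLE_RC)
    print(paths['data_dir'])
    print(paths['db_file'])
```

To check a real file, pass `Path('~/.dsprovrc').expanduser().read_text()` instead of the inlined string and test `paths['db_file'].expanduser().exists()`.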
Usage#
The package includes a command line interface, which is probably most useful for dumping selected admission annotations.
Command line#
# help
$ dsprov --help
# get two admission IDs (hadm_id)
$ dsprov ids -l 2
# print out two admissions
$ dsprov show -l 2
# print out admission 139676
$ dsprov show -d 139676
# output the JSON of two admissions with indent 4
$ dsprov show -i 4 -f json -d $(dsprov ids -l 2 | awk '{print $1}' | paste -s -d, -)
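The last command builds a comma-separated ID list with awk and paste. When driving the tool from a script, the same step can be sketched in Python. The function names are hypothetical, and the sample text passed in is made up for illustration: the real output format of `dsprov ids` is assumed only to carry the admission ID in the first whitespace-separated column, as the awk expression implies.

```python
import shlex
from typing import List


def first_column_csv(ids_output: str) -> str:
    """Join the first column of each non-blank line with commas.

    Roughly what `awk '{print $1}' | paste -s -d, -` does, except that blank
    lines are skipped here.
    """
    ids = [line.split()[0] for line in ids_output.splitlines() if line.strip()]
    return ','.join(ids)


def build_show_command(ids_output: str) -> List[str]:
    """Build the argument list for the JSON `show` call from the example above."""
    return ['dsprov', 'show', '-i', '4', '-f', 'json',
            '-d', first_column_csv(ids_output)]


if __name__ == '__main__':
    # Hypothetical sample of `dsprov ids -l 2` output; the second column is
    # invented for this sketch.
    sample = '139676 example\n120334 example\n'
    print(shlex.join(build_show_command(sample)))
```

The returned list can be handed to `subprocess.run(..., capture_output=True, text=True)` and the captured standard output parsed with `json.loads`, assuming dsprov is installed and configured.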
API#
The package can be used directly in your research to provide object-oriented Python access to the annotations:
>>> from zensols.nlp import FeatureDocument
>>> from zensols.dsprov import ApplicationFactory, AdmissionMatch
>>> stash = ApplicationFactory.get_stash()
>>> am: AdmissionMatch = next(iter(stash.values()))
>>> doc: FeatureDocument = am.note_matches[0].discharge_summary.note.doc
>>> print(f'hadm: {am.hadm_id}')
>>> print(f'sentences: {len(doc.sents)}')
>>> print(f'tokens: {doc.token_len}')
>>> print(f'entities: {doc.entities}')
hadm: 120334
sentences: 46
tokens: 1039
entities: (<Admission>, <Date>, <Discharge>, <Date>, <Date of Birth>, <Sex>, ...)
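To collect the same figures for several admissions, the attribute chain from the session above can be wrapped in a small function. This is a sketch under stated assumptions: summarize_admission is a hypothetical helper, it touches only the attributes shown above (hadm_id, note_matches, discharge_summary.note.doc, sents, token_len, entities), and the stub objects stand in for real AdmissionMatch instances so the block runs without a MIMIC-III database.

```python
from types import SimpleNamespace


def summarize_admission(am) -> dict:
    """Return basic counts for the first note match of an admission.

    Uses only the attributes shown in the session above; like that session,
    it assumes the admission has at least one note match.
    """
    doc = am.note_matches[0].discharge_summary.note.doc
    return {
        'hadm_id': am.hadm_id,
        'sentences': len(doc.sents),
        'tokens': doc.token_len,
        'entities': len(doc.entities),
    }


# Stub objects for illustration only; they mimic the attribute layout and
# carry made-up values.
stub_doc = SimpleNamespace(
    sents=('first sentence', 'second sentence'),
    token_len=7,
    entities=('<Admission>', '<Date>', '<Sex>'))
stub_note_match = SimpleNamespace(
    discharge_summary=SimpleNamespace(note=SimpleNamespace(doc=stub_doc)))
stub_am = SimpleNamespace(hadm_id=120334, note_matches=[stub_note_match])

if __name__ == '__main__':
    print(summarize_admission(stub_am))
```

With the library installed, the same function could be applied to each item of `stash.values()` from the session above.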
Docker#
A Docker image is also available. To use it:
- Create (or obtain) the Postgres Docker image.
- Clone this repository: git clone --recurse-submodules https://github.com/plandes/dsprov
- Set the working directory to the repo: cd dsprov
- Copy the configuration from the installed mimicdb image configuration: make -C docker/mimicdb SRC_DIR=<cloned mimicdb directory> cpconfig
- Start the container: make -C docker/app up
- Test sectioning a document: make -C docker/app testdumpsec
- Log in to the container: make -C docker/app devlogin
Citation#
If you use this project in your research, please use the following BibTeX entry:
@inproceedings{landesHospitalDischargeSummarization2023,
  title = {Hospital {{Discharge Summarization Data Provenance}}},
  booktitle = {The 22nd {{Workshop}} on {{Biomedical Natural Language Processing}} and {{BioNLP Shared Tasks}}},
  author = {Landes, Paul and Chaise, Aaron and Patel, Kunal and Huang, Sean and Di Eugenio, Barbara},
  date = {2023-07},
  pages = {439--448},
  publisher = {{Association for Computational Linguistics}},
  location = {{Toronto, Canada}},
  url = {https://aclanthology.org/2023.bionlp-1.41},
  urldate = {2023-07-10},
  eventtitle = {{{BioNLP}} 2023}
}
Please also cite the Zensols Framework:
@article{Landes_DiEugenio_Caragea_2021,
  title={DeepZensols: Deep Natural Language Processing Framework},
  url={http://arxiv.org/abs/2109.03383},
  note={arXiv: 2109.03383},
  journal={arXiv:2109.03383 [cs]},
  author={Landes, Paul and Di Eugenio, Barbara and Caragea, Cornelia},
  year={2021},
  month={Sep}
}
Changelog#
An extensive changelog is available here.
License#
Copyright (c) 2023 Paul Landes