Provenience of discharge summaries Pythonic access#

PyPI Python 3.9 Python 3.10

This library integrates MIMIC-III with discharge summary data provenance annotations and exposes them through Pythonic classes. The package is intended for other researchers and is based on the paper Hospital Discharge Summarization Data Provenance.

Documentation#

See the full documentation. The API reference is also available.

Installation and Configuration#

The library can be installed with pip from the PyPI repository:

pip3 install zensols.dsprov

The MIMIC-III database must be installed and configured as documented in the mimic package’s configuration; this is necessary to render the EHR data used by the annotations. However, instead of ~/.mimicrc, the configuration file must be named ~/.dsprovrc unless a different file is given explicitly with the --config option. The mimic package’s configuration section also provides instructions on how to install a MIMIC-III database with PostgreSQL, SQLite or in a Docker container.

For example, to store cached files in ~/.dsprov/cache and use the SQLite MIMIC-III database file ~/.dsprov/cache/mimic3.sqlite3, your ~/.dsprovrc would be:

[default]
# the directory where cached data is stored
data_dir = ~/.dsprov/cache

[mimic_sqlite_conn_manager]
# location of the MIMIC-III SQLite database file
db_file = path: ~/.dsprov/cache/mimic3.sqlite3
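
As a quick sanity check before running the tool, the two paths in the file above can be read back with Python's standard `configparser`. This is only a sketch: the library loads its configuration with its own machinery, and both the helper name `read_dsprov_paths` and the way the `path:` prefix is handled are assumptions made for illustration.

```python
import configparser
import os
from pathlib import Path

# The example ~/.dsprovrc content from above, inlined so the sketch is
# self-contained; in practice read the text from the real file instead.
EXAMPLE_RC = """\
[default]
# the directory where cached data is stored
data_dir = ~/.dsprov/cache

[mimic_sqlite_conn_manager]
# location of the MIMIC-III SQLite database file
db_file = path: ~/.dsprov/cache/mimic3.sqlite3
"""


def read_dsprov_paths(text: str) -> dict:
    """Return the cache directory and SQLite file named in the config text.

    Hypothetical helper, not part of zensols.dsprov.
    """
    parser = configparser.ConfigParser()
    parser.read_string(text)
    data_dir = parser["default"]["data_dir"]
    db_file = parser["mimic_sqlite_conn_manager"]["db_file"]
    # Assumption: the 'path:' prefix only marks the value as a file path,
    # so it is stripped before the value is used here.
    if db_file.startswith("path:"):
        db_file = db_file[len("path:"):].strip()
    return {
        "data_dir": Path(os.path.expanduser(data_dir)),
        "db_file": Path(os.path.expanduser(db_file)),
    }


paths = read_dsprov_paths(EXAMPLE_RC)
# Report where the cache lives and whether the database file is in place.
print(paths["data_dir"], paths["db_file"].exists())
```

If the second printed value is `False`, the SQLite file is not yet where the configuration says it should be.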

Usage#

The package includes a command line interface, which is probably most useful for dumping selected admission annotations.

Command line#

# help
$ dsprov --help

# get two admission IDs (hadm_id)
$ dsprov ids -l 2

# print out two admissions
$ dsprov show -l 2

# print out admission 139676
$ dsprov show -d 139676

# output the JSON of two admissions with indent 4
$ dsprov show -i 4 -f json -d $(dsprov ids -l 2 | awk '{print $1}' | paste -s -d, -)
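
The last shell pipeline can also be driven from Python, which may be more convenient when the JSON is consumed by a script. The sketch below mirrors the `awk` and `paste` steps and uses only the flags shown above. The function names are hypothetical, and it assumes, as the `awk '{print $1}'` step implies, that the admission ID is the first whitespace-separated field of each line printed by `dsprov ids`.

```python
import subprocess
from typing import List


def first_column(ids_output: str) -> List[str]:
    """Return the first whitespace-separated field of each non-empty line,
    which is what awk '{print $1}' does in the shell example."""
    return [line.split()[0] for line in ids_output.splitlines() if line.strip()]


def build_show_command(hadm_ids: List[str], indent: int = 4) -> List[str]:
    """Build the 'dsprov show' argument list, joining the IDs with commas
    the way 'paste -s -d, -' does."""
    return ["dsprov", "show", "-i", str(indent), "-f", "json",
            "-d", ",".join(hadm_ids)]


def show_json(limit: int = 2) -> str:
    """Run both commands and return the JSON text.

    Requires dsprov to be installed and configured; nothing is executed
    until this function is called.
    """
    ids = subprocess.run(["dsprov", "ids", "-l", str(limit)],
                         capture_output=True, text=True, check=True).stdout
    cmd = build_show_command(first_column(ids))
    return subprocess.run(cmd, capture_output=True, text=True,
                          check=True).stdout
```

Calling `json.loads(show_json())` would then give the admissions as plain Python data, assuming the command prints a single JSON document.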

API#

The package can be used directly in your research to provide object-oriented Python access to the annotations:

>>> from zensols.nlp import FeatureDocument
>>> from zensols.dsprov import ApplicationFactory, AdmissionMatch
>>> stash = ApplicationFactory.get_stash()
>>> am: AdmissionMatch = next(iter(stash.values()))
>>> doc: FeatureDocument = am.note_matches[0].discharge_summary.note.doc
>>> print(f'hadm: {am.hadm_id}')
hadm: 120334
>>> print(f'sentences: {len(doc.sents)}')
sentences: 46
>>> print(f'tokens: {doc.token_len}')
tokens: 1039
>>> print(f'entities: {doc.entities}')
entities: (<Admission>, <Date>, <Discharge>, <Date>, <Date of Birth>, <Sex>, ...)
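
The same attributes can be collected for several admissions at once. The following is a minimal sketch that relies only on the attributes used in the session above (`hadm_id`, `note_matches`, `discharge_summary.note.doc`, `sents`, `token_len` and `entities`); the helper names `summarize` and `summarize_first` are hypothetical and not part of the package.

```python
from itertools import islice
from typing import Any, Dict, Iterable, List


def summarize(am: Any) -> Dict[str, Any]:
    """Collect basic counts for the first discharge summary of an admission.

    'am' is expected to behave like an AdmissionMatch as used above.
    """
    doc = am.note_matches[0].discharge_summary.note.doc
    return {
        "hadm_id": am.hadm_id,
        "sentences": len(doc.sents),
        "tokens": doc.token_len,
        "entities": len(doc.entities),
    }


def summarize_first(admissions: Iterable[Any], limit: int = 5) -> List[Dict[str, Any]]:
    """Summarize at most 'limit' admissions without consuming the rest."""
    return [summarize(am) for am in islice(admissions, limit)]


# With the library installed and configured (not executed here):
#   from zensols.dsprov import ApplicationFactory
#   stash = ApplicationFactory.get_stash()
#   for row in summarize_first(stash.values(), limit=2):
#       print(row)
```

Using `islice` keeps the loop lazy, so only the requested number of admissions is taken from the stash.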

Docker#

A Docker image is available as well.

To use the Docker image, do the following:

  1. Create (or obtain) the Postgres docker image

  2. Clone this repository: git clone --recurse-submodules https://github.com/plandes/dsprov

  3. Set the working directory to the repo: cd dsprov

  4. Copy the configuration from the installed mimicdb image configuration: make -C docker/mimicdb SRC_DIR=<cloned mimicdb directory> cpconfig

  5. Start the container: make -C docker/app up

  6. Test sectioning a document: make -C docker/app testdumpsec

  7. Log in to the container: make -C docker/app devlogin
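
For repeated setups, steps 2 through 6 above can be scripted. The sketch below only wraps the commands listed in those steps; the function names and the `dry_run` switch are hypothetical, and `mimicdb_dir` stands for the cloned mimicdb directory, which you must supply. Step 1 and the interactive login in step 7 are left as manual steps.

```python
import subprocess
from typing import List


def docker_steps(mimicdb_dir: str) -> List[List[str]]:
    """Return the commands for steps 2 through 6 as argument lists."""
    return [
        ["git", "clone", "--recurse-submodules",
         "https://github.com/plandes/dsprov"],
        ["make", "-C", "docker/mimicdb", f"SRC_DIR={mimicdb_dir}", "cpconfig"],
        ["make", "-C", "docker/app", "up"],
        ["make", "-C", "docker/app", "testdumpsec"],
    ]


def run_steps(mimicdb_dir: str, dry_run: bool = True) -> None:
    """Print each command and, unless dry_run is set, execute it."""
    for i, cmd in enumerate(docker_steps(mimicdb_dir)):
        print(" ".join(cmd))
        if not dry_run:
            # The clone runs in the current directory; every later command
            # runs inside the cloned repository (step 3, 'cd dsprov').
            subprocess.run(cmd, check=True, cwd=None if i == 0 else "dsprov")
```

By default `run_steps` only prints the commands, so the sequence can be reviewed before anything is executed.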

Citation#

If you use this project in your research, please cite it with the following BibTeX entry:

@inproceedings{landesHospitalDischargeSummarization2023,
  title = {Hospital {{Discharge Summarization Data Provenance}}},
  booktitle = {The 22nd {{Workshop}} on {{Biomedical Natural Language Processing}} and {{BioNLP Shared Tasks}}},
  author = {Landes, Paul and Chaise, Aaron and Patel, Kunal and Huang, Sean and Di Eugenio, Barbara},
  date = {2023-07},
  pages = {439--448},
  publisher = {{Association for Computational Linguistics}},
  location = {{Toronto, Canada}},
  url = {https://aclanthology.org/2023.bionlp-1.41},
  urldate = {2023-07-10},
  eventtitle = {{{BioNLP}} 2023}
}

Also please cite the Zensols Framework:

@article{Landes_DiEugenio_Caragea_2021,
  title={DeepZensols: Deep Natural Language Processing Framework},
  url={http://arxiv.org/abs/2109.03383},
  note={arXiv: 2109.03383},
  journal={arXiv:2109.03383 [cs]},
  author={Landes, Paul and Di Eugenio, Barbara and Caragea, Cornelia},
  year={2021},
  month={Sep}
}

Changelog#

An extensive changelog is available here.

License#

MIT License

Copyright (c) 2023 Paul Landes