zensols.mimic package
Submodules
zensols.mimic.adm module
Hospital admission/stay details.
- class zensols.mimic.adm.HospitalAdmission(admission, patient, diagnoses, procedures)
Bases: PersistableContainer, Dictable
Represents data collected about a patient over the course of their hospital admission. Note: this object keys notes by their row_id IDs used in the MIMIC dataset as integers, and not as strings like some note stashes do.
- __init__(admission, patient, diagnoses, procedures)
- property feature_dataframe: DataFrame
The feature dataframe for the hospital admission, composed of the feature dataframes of its constituent notes.
- get_duplicate_notes(text_start=None)
Notes that share the same note text, with each group of duplicates in its own set.
- get_non_duplicate_notes(dup_sets, filter_fn=None)
Return non-duplicated notes.
- Parameters:
dup_sets (Tuple[Set[str]]) – the duplicate sets generated from get_duplicate_notes()
filter_fn – if provided, it is used to filter duplicates; if everything is filtered, a note from the respective duplicate set is chosen at random
- Returns:
a tuple of (<note>, <is duplicate>) pairs
- See:
get_duplicate_notes()
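The two duplicate-handling methods can be sketched in plain Python. This is a minimal sketch under stated assumptions and not the package's implementation. Notes are given as a hypothetical dict that maps an integer row_id to the note text (the documented dup_sets type is Tuple[Set[str]]), and find_duplicate_sets and pick_non_duplicates are illustrative names only. The sketch also assumes that, absent a filter, the lowest row_id of each set is kept.

```python
import random
from collections import defaultdict
from typing import Callable, Dict, List, Optional, Set, Tuple


def find_duplicate_sets(notes: Dict[int, str]) -> Tuple[Set[int], ...]:
    """Group row IDs whose note text is identical (hypothetical helper)."""
    by_text: Dict[str, Set[int]] = defaultdict(set)
    for row_id, text in notes.items():
        by_text[text].add(row_id)
    # only groups with more than one note are duplicates
    return tuple(ids for ids in by_text.values() if len(ids) > 1)


def pick_non_duplicates(
        notes: Dict[int, str],
        dup_sets: Tuple[Set[int], ...],
        filter_fn: Optional[Callable[[int], bool]] = None,
        rng: Optional[random.Random] = None) -> List[Tuple[int, bool]]:
    """Return (row_id, is_duplicate) pairs with one note per duplicate set."""
    rng = rng or random.Random()
    dup_ids: Set[int] = set().union(*dup_sets)
    # notes outside every duplicate set are kept and flagged as unique
    pairs = [(rid, False) for rid in notes if rid not in dup_ids]
    for ids in dup_sets:
        members = sorted(ids)
        candidates = [r for r in members if filter_fn is None or filter_fn(r)]
        # keep the first surviving candidate; if the filter removed every
        # member, fall back to a randomly chosen member of the set
        chosen = candidates[0] if candidates else rng.choice(members)
        pairs.append((chosen, True))
    return sorted(pairs)
```

For example, with notes {1: "a", 2: "b", 3: "a", 4: "a"}, the notes 1, 3 and 4 form one duplicate set, so only one of them survives next to note 2.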
- property notes_by_category: Dict[str, Tuple[Note, ...]]
All notes, with Note.category as keys and a tuple of the respective notes as values.
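Grouping notes by category, as this property does, is a one-pass dictionary build. The following sketch uses NoteStub, a hypothetical stand-in for Note that carries only the two fields the example needs. It is not the package's code.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class NoteStub:
    """Hypothetical stand-in for Note with only the fields used here."""
    row_id: int
    category: str


def notes_by_category(notes: Iterable[NoteStub]) -> Dict[str, Tuple[NoteStub, ...]]:
    """Group notes by category, preserving their original order."""
    groups: Dict[str, List[NoteStub]] = defaultdict(list)
    for note in notes:
        groups[note.category].append(note)
    # freeze each group as a tuple to match the documented value type
    return {cat: tuple(ns) for cat, ns in groups.items()}
```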
- write(depth=0, writer=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>, include_admission=False, include_patient=False, include_diagnoses=False, include_procedures=False, **note_kwargs)
Write the admission and the notes of the admission.
- Parameters:
note_kwargs – the keyword arguments given to Note.write_full()
- write_full(depth=0, writer=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>, **kwargs)
Write a verbose output of the admission.
- Parameters:
kwargs – the keyword arguments given to write()
- write_notes(depth=0, writer=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>, note_limit=9223372036854775807, categories=None, include_note_id=False, **note_kwargs)
Write the notes of the admission.
- Parameters:
note_limit (int) – the number of notes to write
include_note_id (bool) – whether to include the note identification info
note_kwargs – the keyword arguments given to Note.write_full()
- class zensols.mimic.adm.HospitalAdmissionDbFactoryStash(delegate, factory, enable_preemptive=True, dump_factory_nones=True, force_clear_factory=False, doc_stash=None, mimic_note_context=None)
Bases: FactoryStash, Primeable
A factory stash that configures NoteEvent instances so they can parse the MIMIC-III English text as FeatureDocument instances.
- __init__(delegate, factory, enable_preemptive=True, dump_factory_nones=True, force_clear_factory=False, doc_stash=None, mimic_note_context=None)
- clear()
Delete all data from the stash.
Important: exercise caution with this method, of course.
- load(hadm_id)
Load a data value from the pickled data with key name (here, hadm_id). Semantically, this method loads the data using the stash’s implementation. For example, DirectoryStash loads the data from a file if it exists, but factory type stashes will always re-generate the data.
- See:
get()
- class zensols.mimic.adm.HospitalAdmissionDbStash(config_factory, mimic_note_factory, admission_persister, diagnosis_persister, patient_persister, procedure_persister, note_event_persister, note_stash, hospital_adm_name)
Bases: ReadOnlyStash, Primeable
A stash that creates HospitalAdmission instances. This instance is used by caching stashes per the default resource library configuration for this package.
- __init__(config_factory, mimic_note_factory, admission_persister, diagnosis_persister, patient_persister, procedure_persister, note_event_persister, note_stash, hospital_adm_name)
- admission_persister: AdmissionPersister
The persister for the admissions table.
- config_factory: ConfigFactory
The factory used to create domain objects (i.e. hospital admissions).
- diagnosis_persister: DiagnosisPersister
The persister for the diagnosis table.
- exists(hadm_id)
Return True if data with key name (here, hadm_id) exists.
Implementation note: this Stash.exists() method is very inefficient and should be overridden.
- Return type: bool
- hospital_adm_name: str
The configuration section name of the HospitalAdmission used to load instances.
- load(hadm_id)
Create a complete picture of a hospital stay with admission, patient and notes data.
- Parameters:
hadm_id (str) – the ID that specifies the hospital admission to create
- Return type: HospitalAdmission
- mimic_note_factory: NoteFactory
The factory that creates Note instances for hospital admissions.
- note_event_persister: NoteEventPersister
The persister for the noteevents table.
- patient_persister: PatientPersister
The persister for the patients table.
- procedure_persister: ProcedurePersister
The persister for the procedure table.
- class zensols.mimic.adm.NoteDocumentPreemptiveStash(delegate, config, name, chunk_size=0, workers=1, processor_class=<class 'zensols.multi.stash.PoolMultiProcessor'>, note_event_persister=None, adm_factory_stash=None)
Bases: MultiProcessDefaultStash
Contains the stash that preemptively creates Admission, Note and FeatureDocument cache files. This class is not useful for returning any data (see HospitalAdmissionDbFactoryStash).
- __init__(delegate, config, name, chunk_size=0, workers=1, processor_class=<class 'zensols.multi.stash.PoolMultiProcessor'>, note_event_persister=None, adm_factory_stash=None)
- adm_factory_stash: HospitalAdmissionDbFactoryStash = None
The factory to create the admission instances.
- note_event_persister: NoteEventPersister = None
The persister for the noteevents table.
- prime()
If the delegate stash data does not exist, use this implementation to generate the data and process it in child processes.
zensols.mimic.app module
A utility library for parsing the MIMIC-III corpus.
- class zensols.mimic.app.Application(config_factory, doc_parser, corpus, preempt_stash)
Bases: object
A utility library for parsing the MIMIC-III corpus.
- __init__(config_factory, doc_parser, corpus, preempt_stash)
- config_factory: ConfigFactory
Used to get temporary resources.
- doc_parser: FeatureDocumentParser
Used to parse command line documents.
- preempt_notes(input_file, workers=None)
Preemptively parse notes as documents across multiple workers.
- preempt_stash: NoteDocumentPreemptiveStash
A multi-processing stash used to preemptively parse notes.
- show(sent)
Parse a sentence and print all features for each token.
- Parameters:
sent (str) – the sentence to parse and generate features
- uniform_sample_hadm_ids(limit=1)
Print a uniform random sample of admission IDs (hadm_id).
- Parameters:
limit (int) – the number of IDs to fetch
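Uniform sampling without replacement is what random.sample provides. This sketch assumes the candidate admission IDs are already in memory. The real method may instead sample in SQL, which is not stated here, and uniform_sample is a hypothetical helper name.

```python
import random
from typing import List, Optional, Sequence


def uniform_sample(hadm_ids: Sequence[int], limit: int = 1,
                   seed: Optional[int] = None) -> List[int]:
    """Draw up to ``limit`` distinct admission IDs, each equally likely."""
    rng = random.Random(seed)
    # random.sample selects without replacement, uniformly at random;
    # cap k so asking for more IDs than exist does not raise
    return rng.sample(list(hadm_ids), k=min(limit, len(hadm_ids)))
```

Passing a seed makes the sample reproducible, which helps when the same admissions need to be written again later.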
- write_admission(hadm_id, out_dir=PosixPath('.'), output_format=NoteFormat.text)
Write all the notes of an admission.
- Parameters:
hadm_id (str) – the hospital admission ID, or - for a random ID
out_dir (Path) – the output directory
output_format (NoteFormat) – the output format of the note
- write_admission_summary(hadm_id)
Write an admission’s note categories and section names.
- Parameters:
hadm_id (str) – the hospital admission ID, or - for a random ID
- write_discharge_reports(limit=1, out_dir=PosixPath('.'))
Write discharge reports (as opposed to addenda).
- write_features(sent, out_file=None)
Parse a sentence as MIMIC data and write features to CSV.
- write_hadm_id_for_note(row_id)
Get the hospital admission ID (hadm_id) that has note row_id.
- write_note(row_id, out_file=None, output_format=NoteFormat.text)
Write a note.
- Parameters:
row_id (int) – the unique note identifier in the NOTEEVENTS table
output_format (NoteFormat) – the output format of the note
out_file (Path) – the file to write
zensols.mimic.cli module
Command line entry point to the application.
- class zensols.mimic.cli.ApplicationFactory(*args, **kwargs)
Bases: ApplicationFactory
zensols.mimic.corpus module
Discharge summary research and MIMIC-III data exploration.
- class zensols.mimic.corpus.Corpus(config_factory, patient_persister, admission_persister, diagnosis_persister, note_event_persister, hospital_adm_stash, temporary_results_dir)
Bases: Dictable
A container class providing access to the MIMIC-III dataset using a relational database (by default PostgreSQL, per the resource library configuration). It also has methods to dump corpus statistics.
- __init__(config_factory, patient_persister, admission_persister, diagnosis_persister, note_event_persister, hospital_adm_stash, temporary_results_dir)
- admission_persister: AdmissionPersister
The persister for the admissions table.
- clear(include_notes=True)
Clear all cached admission and note parses.
- Parameters:
include_notes (bool) – whether to also clear the parsed notes cache
- config_factory: ConfigFactory
Used to clear the note event cache.
- diagnosis_persister: DiagnosisPersister
The persister for the diagnosis table.
- get_hospital_adm_by_id(hadm_id)
Return a hospital admission by its unique identifier.
- Return type: HospitalAdmission
- get_hospital_adm_for_note(row_id)
Return the admission that has note row_id.
- Raises:
RecordNotFoundError if row_id is not found in the database
- Return type: HospitalAdmission
- get_note_by_id(row_id)
Return the note (via the hospital admission) for row_id.
- Raises:
RecordNotFoundError if row_id is not found in the database
- Return type: Note
- hospital_adm_stash: HospitalAdmissionDbStash
Creates hospital admission instances. Note that this might be a caching stash instance, but method calls are delegated through to the instance of HospitalAdmissionDbStash.
- note_event_persister: NoteEventPersister
The persister for the noteevents table.
- patient_persister: PatientPersister
The persister for the patients table.
- temporary_results_dir: Path
The path in which to create the output results. This is not used, but needs to stay until zensols.mimicsid is next retrained.
- write(depth=0, writer=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>)
Write this instance as either a Writable or as a Dictable. If the class attribute _DICTABLE_WRITABLE_DESCENDANTS is set to True, then use the write() method on children instead of writing the generated dictionary. Otherwise, write this instance by first creating a dict recursively using asdict(), then formatting the output.
If the attribute _DICTABLE_WRITE_EXCLUDES is set, those attributes are removed from what is written in the write() method.
Note that this attribute will need to be set in all descendants in the instance hierarchy since writing the object instance graph is done recursively.
- Parameters:
depth (int) – the starting indentation depth
writer (TextIOBase) – the writer to dump the content of this writable
- write_hospital_admission(hadm_id, depth=0, writer=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>, note_line_limit=9223372036854775807)
Write the hospital admission identified by hadm_id.
- write_hosptial_count_admission(depth=0, writer=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>, limit=9223372036854775807)
Write the counts for each hospital admission.
- Parameters:
limit (int) – the limit on the returned admission counts
- See:
AdmissionPersister.get_admission_counts()
zensols.mimic.domain module
Domain classes for the corpus notes.
- class zensols.mimic.domain.Admission(row_id, subject_id, hadm_id, admittime, dischtime, deathtime, admission_type, admission_location, discharge_location, insurance, language, religion, marital_status, ethnicity, edregtime, edouttime, diagnosis, hospital_expire_flag, has_chartevents_data)
Bases: MimicContainer
The ADMISSIONS table gives information regarding a patient’s admission to the hospital. Since each unique hospital visit for a patient is assigned a unique HADM_ID, the ADMISSIONS table can be considered as a definition table for HADM_ID. Information available includes timing information for admission and discharge, demographic information, the source of the admission, and so on.
Table source: Hospital database.
Table purpose: Define a patient’s hospital admission, HADM_ID.
Number of rows: 58,976
- Links to:
PATIENTS on SUBJECT_ID
- __init__(row_id, subject_id, hadm_id, admittime, dischtime, deathtime, admission_type, admission_location, discharge_location, insurance, language, religion, marital_status, ethnicity, edregtime, edouttime, diagnosis, hospital_expire_flag, has_chartevents_data)
- diagnosis: str
The DIAGNOSIS column provides a preliminary, free text diagnosis for the patient on hospital admission. The diagnosis is usually assigned by the admitting clinician and does not use a systematic ontology. As of MIMIC-III v1.0 there were 15,693 distinct diagnoses for 58,976 admissions. The diagnoses can be very informative (e.g. chronic kidney failure) or quite vague (e.g. weakness). Final diagnoses for a patient’s hospital stay are coded on discharge and can be found in the DIAGNOSES_ICD table. While this field can provide information about the status of a patient on hospital admission, it is not recommended to use it to stratify patients.
- edregtime: datetime
Time that the patient was registered in the emergency department. The companion edouttime field records when the patient was discharged from the emergency department.
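The two emergency department timestamps give the time a patient spent in the ED. This is a minimal sketch that assumes missing values arrive as None, as they would for admissions that did not pass through the ED (NULL in the database). ed_length_of_stay is a hypothetical helper and not part of the package.

```python
from datetime import datetime, timedelta
from typing import Optional


def ed_length_of_stay(edregtime: Optional[datetime],
                      edouttime: Optional[datetime]) -> Optional[timedelta]:
    """Time spent in the emergency department, or None when unknown."""
    # admissions without an ED visit have no ED registration/out times
    if edregtime is None or edouttime is None:
        return None
    return edouttime - edregtime
```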
- has_chartevents_data: int
Hospital admission has at least one observation in the CHARTEVENTS table.
- hospital_expire_flag: int
This indicates whether the patient died within the given hospitalization. 1 indicates death in the hospital, and 0 indicates survival to hospital discharge.
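Because hospital_expire_flag is coded as 1 for in-hospital death and 0 for survival, the in-hospital mortality rate of a set of admissions is the fraction of flags equal to 1. The sketch below takes the flags as a plain iterable of integers. How they are fetched (for example through AdmissionPersister) is left out, and in_hospital_mortality is a hypothetical name.

```python
from typing import Iterable


def in_hospital_mortality(flags: Iterable[int]) -> float:
    """Fraction of admissions whose hospital_expire_flag is 1."""
    flags = list(flags)
    if not flags:
        raise ValueError('no admissions given')
    # 1 = died during the stay, 0 = survived to hospital discharge
    return sum(1 for f in flags if f == 1) / len(flags)
```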
- insurance: str
The INSURANCE, LANGUAGE, RELIGION, MARITAL_STATUS, ETHNICITY columns describe patient demographics. These columns occur in the ADMISSIONS table as they are originally sourced from the admission, discharge, and transfers (ADT) data from the hospital database. The values occasionally change between hospital admissions (HADM_ID) for a single patient (SUBJECT_ID). This is reasonable for some fields (e.g. MARITAL_STATUS, RELIGION), but less reasonable for others (e.g. ETHNICITY).
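Demographic values can change between admissions of the same patient, so it can be worth flagging the patients whose value differs. This sketch takes hypothetical (subject_id, hadm_id, value) rows, for instance the ethnicity column. The row shape and the inconsistent_subjects name are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def inconsistent_subjects(rows: Iterable[Tuple[int, int, str]]) -> Dict[int, Set[str]]:
    """Map subject_id to its distinct values when they differ across admissions.

    Each row is a hypothetical (subject_id, hadm_id, value) tuple.
    """
    values: Dict[int, Set[str]] = defaultdict(set)
    for subject_id, _hadm_id, value in rows:
        values[subject_id].add(value)
    # keep only patients with more than one distinct value
    return {sid: vals for sid, vals in values.items() if len(vals) > 1}
```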
- class zensols.mimic.domain.Diagnosis(row_id, icd9_code, short_title, long_title)
Bases: ICD9Container
Table source: Hospital database.
Table purpose: Contains ICD diagnoses for patients, most notably ICD-9 diagnoses.
Number of rows: 651,047
- Links to:
PATIENTS on SUBJECT_ID
ADMISSIONS on HADM_ID
D_ICD_DIAGNOSES on ICD9_CODE
- __init__(row_id, icd9_code, short_title, long_title)
- class zensols.mimic.domain.HospitalAdmissionContainer(row_id, hadm_id)
Bases: MimicContainer
Any data container that has a unique identifier and a non-null (inpatient) hospital admission identifier.
- __init__(row_id, hadm_id)
- class zensols.mimic.domain.ICD9Container(row_id, icd9_code, short_title, long_title)
Bases: MimicContainer
A data container that has ICD-9 codes.
- __init__(row_id, icd9_code, short_title, long_title)
- class zensols.mimic.domain.MimicContainer(row_id)
Bases: PersistableContainer, Dictable
Abstract base class for data containers, which are plain old Python objects that are CRUD’d from DAO persisters.
- __init__(row_id)
- write(depth=0, writer=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>, dct=None)
Write this instance as either a Writable or as a Dictable. If the class attribute _DICTABLE_WRITABLE_DESCENDANTS is set to True, then use the write() method on children instead of writing the generated dictionary. Otherwise, write this instance by first creating a dict recursively using asdict(), then formatting the output.
If the attribute _DICTABLE_WRITE_EXCLUDES is set, those attributes are removed from what is written in the write() method.
Note that this attribute will need to be set in all descendants in the instance hierarchy since writing the object instance graph is done recursively.
- Parameters:
depth (int) – the starting indentation depth
writer (TextIOBase) – the writer to dump the content of this writable
- exception zensols.mimic.domain.MimicError
Bases: APIError
Raised for any application level error.
- __module__ = 'zensols.mimic.domain'
- exception zensols.mimic.domain.MimicParseError(text)
Bases: MimicError
Raised for MIMIC note parsing errors.
- __annotations__ = {}
- __module__ = 'zensols.mimic.domain'
- class zensols.mimic.domain.NoteEvent(row_id, subject_id, hadm_id, chartdate, charttime, storetime, category, description, cgid, iserror, text, context)
Bases: MimicContainer
Table source: Hospital database.
Table purpose: Contains all notes for patients.
Number of rows: 2,083,180
- Links to:
PATIENTS on SUBJECT_ID
ADMISSIONS on HADM_ID
CAREGIVERS on CGID
- __init__(row_id, subject_id, hadm_id, chartdate, charttime, storetime, category, description, cgid, iserror, text, context)
- category: str
Category of the note, e.g. Discharge summary.
CATEGORY and DESCRIPTION define the type of note recorded. For example, a CATEGORY of ‘Discharge summary’ indicates that the note is a discharge summary, and the DESCRIPTION of ‘Report’ indicates a full report while a DESCRIPTION of ‘Addendum’ indicates an addendum (additional text to be added to the previous report).
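CATEGORY and DESCRIPTION together separate full discharge summary reports from their addenda. The sketch below keeps only the reports, using NoteRow, a hypothetical subset of a NOTEEVENTS row. It shows the filtering idea only and is not necessarily how write_discharge_reports selects notes.

```python
from typing import Iterable, List, NamedTuple


class NoteRow(NamedTuple):
    """Hypothetical subset of a NOTEEVENTS row."""
    row_id: int
    category: str
    description: str


def discharge_reports(notes: Iterable[NoteRow]) -> List[NoteRow]:
    """Keep full discharge summary reports and drop their addenda."""
    return [n for n in notes
            if n.category == 'Discharge summary' and n.description == 'Report']
```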
- chartdate: datetime
Date when the note was charted.
CHARTDATE records the date at which the note was charted. CHARTDATE will always have a time value of 00:00:00.
CHARTTIME records the date and time at which the note was charted. If both CHARTDATE and CHARTTIME exist, then the date portions will be identical. All records have a CHARTDATE. A subset are missing CHARTTIME. More specifically, notes with a CATEGORY value of ‘Discharge Summary’, ‘ECG’, and ‘Echo’ never have a CHARTTIME, only CHARTDATE. Other categories almost always have both CHARTTIME and CHARTDATE, but there is a small amount of missing data for CHARTTIME (usually less than 0.5% of the total number of notes for that category).
STORETIME records the date and time at which a note was saved into the system. Notes with a CATEGORY value of ‘Discharge Summary’, ‘ECG’, ‘Radiology’, and ‘Echo’ never have a STORETIME. All other notes have a STORETIME.
- charttime: datetime
Date and time when the note was charted. Note that some notes (e.g. discharge summaries) do not have a time associated with them: these notes have NULL in this column.
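CHARTDATE is always present (at midnight), but CHARTTIME is NULL for some categories. A common way to order notes is therefore to use the chart time when it exists and fall back to the chart date otherwise. This is a sketch that assumes NULL arrives as None. best_chart_time and chronological are hypothetical helpers, and the tuple shape is an illustration only.

```python
from datetime import datetime
from typing import List, Optional, Tuple


def best_chart_time(chartdate: datetime, charttime: Optional[datetime]) -> datetime:
    """Most precise timestamp available for a note."""
    # CHARTTIME is NULL for e.g. discharge summaries, ECG and Echo notes
    return charttime if charttime is not None else chartdate


def chronological(notes: List[Tuple[int, datetime, Optional[datetime]]]) -> List[int]:
    """Order hypothetical (row_id, chartdate, charttime) tuples by time."""
    ordered = sorted(notes, key=lambda n: best_chart_time(n[1], n[2]))
    return [row_id for row_id, _, _ in ordered]
```

One limitation of this sketch is that a note with only a CHARTDATE sorts at midnight, ahead of notes charted later on the same day.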
- context: InitVar
Contains resources needed by new and re-hydrated notes, such as the document stash.
- property doc: FeatureDocument
The parsed feature document of the note’s text.
- get_normal_name(include_desc=True)
A normalized name of the note, useful as a file name (sans extension).
- subject_id: int
Foreign key. Identifies the patient.
Identifiers which specify the patient: SUBJECT_ID is unique to a patient and HADM_ID is unique to a patient hospital stay.
- See:
hadm_id
- class zensols.mimic.domain.Patient(row_id, subject_id, gender, dob, dod, dod_hosp, dod_ssn, expire_flag)
Bases: MimicContainer
Table source: CareVue and Metavision ICU databases.
Table purpose: Defines each SUBJECT_ID in the database, i.e. defines a single patient.
Number of rows: 46,520
- Links to:
ADMISSIONS on SUBJECT_ID
ICUSTAYS on SUBJECT_ID
- __init__(row_id, subject_id, gender, dob, dod, dod_hosp, dod_ssn, expire_flag)
- class zensols.mimic.domain.Procedure(row_id, icd9_code, short_title, long_title)
Bases: ICD9Container
Table source: Hospital database.
Table purpose: Contains ICD procedures for patients, most notably ICD-9 procedures.
Number of rows: 240,095
- Links to:
PATIENTS on SUBJECT_ID
ADMISSIONS on HADM_ID
D_ICD_PROCEDURES on ICD9_CODE
- __init__(row_id, icd9_code, short_title, long_title)
zensols.mimic.note module
EHR related text documents.
- class zensols.mimic.note.DefaultNoteFactory(config_factory, category_to_note, mimic_default_note_section)
Bases: NoteFactory
A note factory that creates only default notes.
- __init__(config_factory, category_to_note, mimic_default_note_section)
- class zensols.mimic.note.GapSectionContainer(delegate, filter_empty)
Bases: SectionContainer
A container that fills in missing sections of text from a note with additional sections.
- __init__(delegate, filter_empty)
- class zensols.mimic.note.Note(row_id, subject_id, hadm_id, chartdate, charttime, storetime, category, description, cgid, iserror, text, context)
Bases: NoteEvent, SectionContainer
A container of Section instances, one for each section of the note event’s text, given by the property sections.
- __init__(row_id, subject_id, hadm_id, chartdate, charttime, storetime, category, description, cgid, iserror, text, context)
- property section_annotator_type: SectionAnnotatorType
Indicates who or what annotated the note’s sections.
- write(depth=0, writer=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>)
Write the note event.
- Parameters:
line_limit – the number of lines to write from the note text
write_divider – whether to write a divider before the note text
indent_fields – whether to indent the fields of the note
note_indent – how many levels to indent the note fields
- write_fields(depth=0, writer=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>)
Write note header fields such as the row_id and category.
- write_full(depth=0, writer=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>, note_line_limit=9223372036854775807, section_line_limit=9223372036854775807, section_sent_limit=9223372036854775807, include_section_header=True, sections=None, include_fields=True, include_note_divider=True, include_section_divider=True)
Write the custom parts of the note.
- Parameters:
note_line_limit (int) – the number of lines to write from the note text
section_line_limit (int) – the number of lines of the section’s body and number of sentences to output
par_limit – the number of paragraphs to output
include_section_header (bool) – whether to include the header
include_fields (bool) – whether to write the note fields
include_note_divider (bool) – whether to write dividers between notes
include_section_divider (bool) – whether to write dividers between sections
- class zensols.mimic.note.NoteFactory(config_factory, category_to_note, mimic_default_note_section)
Bases: Primeable
Creates an instance of Note from NoteEvent.
- __init__(config_factory, category_to_note, mimic_default_note_section)
- category_to_note: Dict[str, str]
A mapping from a note’s category to the section name of the Note configuration.
- config_factory: ConfigFactory
The factory used to create notes.
- mimic_default_note_section: str
The section name holding the configuration of the class to create when there is no mapping in category_to_note.
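The lookup between category_to_note and mimic_default_note_section amounts to a dictionary lookup with a default. In this sketch, the keys are assumed to be raw category strings, and the section names 'discharge_note' and 'default_note' are hypothetical. The real configuration section names are not given here.

```python
from typing import Dict


def note_section_for(category: str, category_to_note: Dict[str, str],
                     default_section: str) -> str:
    """Pick the configuration section used to create a note of ``category``."""
    # fall back to the default section when the category is not mapped
    return category_to_note.get(category, default_section)
```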
- class zensols.mimic.note.NoteFormat(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)
Bases: Enum
Used by SectionContainer.write_by_format(), a parameterized method, to select how a note is written.
- json = 5
- markdown = 7
- raw = 2
- summary = 4
- text = 1
- verbose = 3
- yaml = 6
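An enum like this is typically consumed by dispatching on the member. The sketch mirrors the documented members and values in a local NoteFormatSketch class (it is not an import of the real NoteFormat). The renderers are hypothetical stand-ins for the package's writers, and only a few formats are covered.

```python
import json
from enum import Enum
from typing import Callable, Dict


class NoteFormatSketch(Enum):
    """Local mirror of the documented NoteFormat members and values."""
    text = 1
    raw = 2
    verbose = 3
    summary = 4
    json = 5
    yaml = 6
    markdown = 7


def render(text: str, fmt: NoteFormatSketch) -> str:
    """Dispatch to a renderer chosen by ``fmt`` (hypothetical renderers)."""
    renderers: Dict[NoteFormatSketch, Callable[[str], str]] = {
        NoteFormatSketch.text: lambda t: t,
        NoteFormatSketch.raw: repr,
        NoteFormatSketch.json: lambda t: json.dumps({'text': t}),
        NoteFormatSketch.markdown: lambda t: '# Note\n\n' + t,
    }
    if fmt not in renderers:
        raise ValueError(f'format not supported in this sketch: {fmt.name}')
    return renderers[fmt](text)
```

Keeping the mapping in one dictionary means adding a format only requires a new entry, and lookups by value (NoteFormatSketch(5)) or by name (NoteFormatSketch['json']) work as with any Enum.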
- class zensols.mimic.note.ParagraphFactory
Bases: object
Splits a document into its constituent paragraphs.
- __init__()
- class zensols.mimic.note.Section(id, name, container, header_spans, body_span)
Bases: PersistableContainer, Dictable
A section segment with an identifier that represents one section of a Note; a note has one for each of its sections. An example of a section is the history of present illness in a discharge note.
- __init__(id, name, container, header_spans, body_span)
- property body_doc: FeatureDocument
A feature document of this section’s body text.
- body_span: LexicalSpan
Like header_spans, but for the section body. The body and name do not intersect.
- property body_tokens: Iterable[FeatureToken]
- container: SectionContainer
The container that has this section.
- property doc: FeatureDocument
A feature document of the section’s body text.
- header_spans: Tuple[LexicalSpan, ...]
The character offsets of the section headers. The first is usually the name of the section. If there are no headers, this is a 0-length tuple.
- property header_tokens: Iterable[FeatureToken]
- property lexspan: LexicalSpan
The widest lexical extent of the section, including its headers.
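The widest extent of a section can be computed as the smallest span that covers every header span and the body span. Span below is a hypothetical (begin, end) stand-in for LexicalSpan, and this sketch does not assume whether the end offset is inclusive or exclusive.

```python
from typing import NamedTuple, Tuple


class Span(NamedTuple):
    """Hypothetical stand-in for LexicalSpan: (begin, end) character offsets."""
    begin: int
    end: int


def widest_extent(header_spans: Tuple[Span, ...], body_span: Span) -> Span:
    """Smallest span covering every header span and the body span."""
    spans = (*header_spans, body_span)
    return Span(min(s.begin for s in spans), max(s.end for s in spans))
```

A section without headers has an empty header_spans tuple, so its extent reduces to the body span.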
- name: Optional[str]
The name of the section (e.g. hospital-course). This field is what’s called the type in the paper, which is not used since type is a Python built-in.
- static name_to_header(s)
Convert a section name to section header text. Note that this uses a heuristic method that might generate a string that does not match the original header text.
- Return type: str
- property paragraphs: Tuple[FeatureDocument, ...]
The list of paragraphs, each as a feature document, of this section’s body text.
- write(depth=0, writer=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>, body_line_limit=9223372036854775807, norm_line_limit=9223372036854775807, par_limit=0, sent_limit=0, include_header=True, include_id_name=True, include_header_spans=False, include_body_span=False)
Write a note section’s name, original body, normalized body and sentences with respective sentence entities.
- Parameters:
body_line_limit (int) – the number of lines of the section’s body to output
norm_line_limit (int) – the number of lines of the section’s normalized (parsed) body to output
par_limit (int) – the number of paragraphs to output
sent_limit (int) – the number of sentences to output
include_header (bool) – whether to include the header
include_id_name (bool) – whether to write the section ID and name
- class zensols.mimic.note.SectionAnnotatorType(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)
Bases: Enum
The type of Section annotator for Note instances. The MedSecId project adds the human and model types.
- NONE = 1
Default for those without section identifiers.
- REGULAR_EXPRESSION = 2
Sections are automatically assigned by regular expressions.
- class zensols.mimic.note.SectionContainer
Bases: Dictable
A note-like container base class that has sections. Note-based classes extend this base class. Iterating over an instance produces its sections in the order of their position in the document.
- DEFAULT_SECTION_NAME: ClassVar[str] = 'default'
The name of the singleton section when the note is not sectioned.
- __init__()
- static category_to_id(s)
Convert a category string (e.g. Discharge summary) to a category ID (e.g. discharge-summary).
- Return type: str
- static id_to_category(s)
Convert a category ID (e.g. discharge-summary) to a category string (e.g. Discharge summary).
- Return type: str
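A conversion consistent with the two documented examples lowercases the category and joins words with hyphens, then reverses the process. This is a guess that fits those examples and is not the package's actual code. The sketch also shows why such a mapping can be lossy: a fully upper-case category such as ECG does not survive the round trip.

```python
def category_to_id(s: str) -> str:
    """'Discharge summary' -> 'discharge-summary' (sketch, not the real code)."""
    # strip stray whitespace, lowercase, and hyphenate spaces
    return s.strip().lower().replace(' ', '-')


def id_to_category(s: str) -> str:
    """'discharge-summary' -> 'Discharge summary' (sketch, not the real code)."""
    # restore spaces and capitalize only the first character
    return s.replace('-', ' ').capitalize()
```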
- property section_dataframe: DataFrame
A Pandas dataframe containing each section’s name and its header and body offset spans.
- property sections_by_name: Dict[str, Tuple[Section, ...]]
A map from the name of a section (e.g. history of present illness in discharge notes) to the note sections with that name.
- property sections_ordered: Tuple[Section, ...]
Sections returned in the order they appear in the note.
- write(depth=0, writer=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>)
Write this instance as either a Writable or as a Dictable. If the class attribute _DICTABLE_WRITABLE_DESCENDANTS is set to True, then use the write() method on children instead of writing the generated dictionary. Otherwise, write this instance by first creating a dict recursively using asdict(), then formatting the output.
If the attribute _DICTABLE_WRITE_EXCLUDES is set, those attributes are removed from what is written in the write() method.
Note that this attribute will need to be set in all descendants in the instance hierarchy since writing the object instance graph is done recursively.
- Parameters:
depth (int) – the starting indentation depth
writer (TextIOBase) – the writer to dump the content of this writable
- write_by_format(depth=0, writer=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>, note_format=<enum 'NoteFormat'>)
Write the note in the specified format.
- Parameters:
depth (int) – the starting indentation depth
writer (TextIOBase) – the writer to dump the content of this writable
note_format (NoteFormat) – the format to use for the output
- write_fields(depth=0, writer=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>)
Write note header fields such as the row_id and category.
- write_full(depth=0, writer=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>, note_line_limit=9223372036854775807, section_line_limit=9223372036854775807, section_sent_limit=9223372036854775807, include_section_header=True, sections=None, include_fields=True, include_note_divider=True, include_section_divider=True)
Write the custom parts of the note.
- Parameters:
note_line_limit (int) – the number of lines to write from the note text
section_line_limit (int) – the number of lines of the section’s body and number of sentences to output
par_limit – the number of paragraphs to output
include_section_header (bool) – whether to include the header
include_fields (bool) – whether to write the note fields
include_note_divider (bool) – whether to write dividers between notes
include_section_divider (bool) – whether to write dividers between sections
- write_human(depth=0, writer=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>, normalize=False)
Generates a human-readable version of the annotation. This calls the following methods in order: write_fields() and write_sections().
- Parameters:
depth (int) – the starting indentation depth
writer (TextIOBase) – the writer to dump the content of this writable
normalize (bool) – whether to use the paragraphs’ normalized text (zensols.nlp.TokenContainer.norm) or the original text
- write_markdown(depth=0, writer=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>, normalize=False)
Generates a markdown version of the annotation.
- Parameters:
depth (int) – the starting indentation depth
writer (TextIOBase) – the writer to dump the content of this writable
normalize (bool) – whether to use the paragraphs’ normalized text (zensols.nlp.TokenContainer.norm) or the original text
- write_sections(depth=0, writer=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>, normalize=False)
Writes the sections of the container.
- Parameters:
depth (int) – the starting indentation depth
writer (TextIOBase) – the writer to dump the content of this writable
normalize (bool) – whether to use the paragraphs’ normalized text (zensols.nlp.TokenContainer.norm) or the original text
zensols.mimic.parafac module
Paragraph factories.
- class zensols.mimic.parafac.ChunkingParagraphFactory(min_sent_len, min_list_norm_matches, max_sent_list_len, include_section_headers, filter_sent_text)
Bases: ParagraphFactory
A paragraph factory that uses zensols.nlp.chunker chunking to split paragraphs and MIMIC lists.
- MIMIC_SPAN_PATTERN: ClassVar[Pattern] = re.compile('(.+?)(?:(?=[\\n.]{2})|\\Z)', re.MULTILINE|re.DOTALL)
The MIMIC regular expression adds the period, which is used in notes to separate paragraphs.
- __init__(min_sent_len, min_list_norm_matches, max_sent_list_len, include_section_headers, filter_sent_text)
- max_sent_list_len: int
The maximum length a sentence can be to keep it chunked as a list. Otherwise, very long sentences form from what appears to be front list syntax.
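The documented MIMIC_SPAN_PATTERN ends a span wherever the next two characters are both drawn from newline and period (for example "..", ".\n" or "\n\n"), or at the end of the text. The sketch below applies the same pattern with re.finditer. The strip-and-drop-empty post-processing in mimic_spans is an assumption made for readability, not necessarily what ChunkingParagraphFactory does with the matches.

```python
import re
from typing import List

# same pattern as documented for ChunkingParagraphFactory.MIMIC_SPAN_PATTERN
MIMIC_SPAN_PATTERN = re.compile(r'(.+?)(?:(?=[\n.]{2})|\Z)',
                                re.MULTILINE | re.DOTALL)


def mimic_spans(text: str) -> List[str]:
    """Split text where two separator characters (newline or period) meet."""
    spans = []
    for match in MIMIC_SPAN_PATTERN.finditer(text):
        # matches keep separator characters at their edges; trim them
        span = match.group(1).strip('\n. ')
        # a match made only of separators becomes empty and is dropped
        if span:
            spans.append(span)
    return spans
```

On "Pt stable..Follow up in 2 weeks.\n\nNo known allergies." this yields three spans, because the double period is treated as a separator just like the blank line.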
- class zensols.mimic.parafac.WhitespaceParagraphFactory
Bases: ParagraphFactory
A simple paragraph factory that splits on whitespace.
zensols.mimic.persist module
Persisters for the MIMIC-III database.
- class zensols.mimic.persist.AdmissionPersister(conn_manager, sql_file=None, row_factory='tuple', select_name=None, select_by_id_name=None, select_exists_name=None, insert_name=None, update_name=None, delete_name=None, keys_name=None, count_name=None, bean_class=None)[source]¶
Bases: DataClassDbPersister
Manages instances of Admission.
- __init__(conn_manager, sql_file=None, row_factory='tuple', select_name=None, select_by_id_name=None, select_exists_name=None, insert_name=None, update_name=None, delete_name=None, keys_name=None, count_name=None, bean_class=None)¶
- get_admission_counts(limit=9223372036854775807)[source]¶
Return the counts of subjects for each hospital admission.
- class zensols.mimic.persist.DiagnosisPersister(conn_manager, sql_file=None, row_factory='tuple', select_name=None, select_by_id_name=None, select_exists_name=None, insert_name=None, update_name=None, delete_name=None, keys_name=None, count_name=None, bean_class=None)[source]¶
Bases: DataClassDbPersister
Manages instances of Diagnosis.
- __init__(conn_manager, sql_file=None, row_factory='tuple', select_name=None, select_by_id_name=None, select_exists_name=None, insert_name=None, update_name=None, delete_name=None, keys_name=None, count_name=None, bean_class=None)¶
- class zensols.mimic.persist.NoteDocumentStash(doc_parser=None, note_db_persister=None)[source]¶
Bases: ReadOnlyStash
Reads noteevents from the database and returns parsed documents.
- __init__(doc_parser=None, note_db_persister=None)¶
- doc_parser: FeatureDocumentParser = None¶
NER+L medical domain natural language parser.
- exists(name)[source]¶
Return True if data with key name exists.
Implementation note: this Stash.exists() method is very inefficient and should be overridden.
- Return type:
- load(row_id)[source]¶
Load a data value from the pickled data with key name. Semantically, this method loads the data using the stash’s implementation. For example, DirectoryStash loads the data from a file if it exists, but factory type stashes will always re-generate the data.
- See:
get()
- Return type:
- note_db_persister: DbPersister = None¶
Fetches the note text by key from the DB.
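The exists() entry above notes that the generic implementation is inefficient and should be overridden. One common reason, sketched here under assumptions, is that a generic fallback must enumerate keys, while a subclass backed by a database can look up a single key directly. KeyScanStash and IndexedStash are hypothetical classes for illustration only, not the zensols.persist API; the dict stands in for the noteevents table and str.split() stands in for document parsing.

```python
from typing import Dict, Iterator, List, Optional


class KeyScanStash:
    """Hypothetical read-only stash with a generic, key-scanning exists()."""

    def __init__(self, rows: Dict[int, str]):
        self._rows = rows    # stands in for the noteevents table
        self.key_reads = 0   # how many keys have been enumerated so far

    def keys(self) -> Iterator[int]:
        for row_id in self._rows:
            self.key_reads += 1
            yield row_id

    def load(self, row_id: int) -> Optional[List[str]]:
        # factory-style: the value is rebuilt on every call, never cached
        text = self._rows.get(row_id)
        return None if text is None else text.split()

    def exists(self, row_id: int) -> bool:
        # generic fallback: walk the keys until a match is found
        return any(k == row_id for k in self.keys())


class IndexedStash(KeyScanStash):
    """Hypothetical subclass that overrides exists() with a direct lookup."""

    def exists(self, row_id: int) -> bool:
        # analogous to a single indexed query on row_id
        return row_id in self._rows


rows = {i: f'note text {i}' for i in range(1000)}
slow, fast = KeyScanStash(rows), IndexedStash(rows)
print(slow.exists(999), slow.key_reads)  # True 1000
print(fast.exists(999), fast.key_reads)  # True 0
```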
- class zensols.mimic.persist.NoteEventPersister(conn_manager, sql_file=None, row_factory='tuple', select_name=None, select_by_id_name=None, select_exists_name=None, insert_name=None, update_name=None, delete_name=None, keys_name=None, count_name=None, bean_class=None, mimic_note_context=None, hadm_row_chunk_size=None)[source]¶
Bases: DataClassDbPersister
Manages instances of NoteEvent.
- __init__(conn_manager, sql_file=None, row_factory='tuple', select_name=None, select_by_id_name=None, select_exists_name=None, insert_name=None, update_name=None, delete_name=None, keys_name=None, count_name=None, bean_class=None, mimic_note_context=None, hadm_row_chunk_size=None)¶
- get_discharge_reports(limit=9223372036854775807)[source]¶
Return discharge reports (as opposed to addendums).
- get_notes_by_category(category, limit=9223372036854775807)[source]¶
Return notes by the category to which they belong.
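A sketch of the kind of queries these two methods imply, run against a toy in-memory SQLite table. The persister loads its real SQL from sql_file, which is not shown in this reference, so the table layout, the query text, and the use of the description column ('Report' versus 'Addendum', as in the MIMIC-III NOTEEVENTS table) to separate reports from addendums are assumptions. The default limit of 9223372036854775807 equals sys.maxsize on 64-bit Python builds, so by default no limit is effectively applied.

```python
import sqlite3
import sys
from typing import List

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE noteevents (row_id INTEGER, category TEXT, description TEXT)')
conn.executemany('INSERT INTO noteevents VALUES (?, ?, ?)', [
    (1, 'Discharge summary', 'Report'),
    (2, 'Discharge summary', 'Addendum'),
    (3, 'Radiology', 'CHEST (PA & LAT)'),
    (4, 'Discharge summary', 'Report'),
])


def notes_by_category(category: str, limit: int = sys.maxsize) -> List[int]:
    """Hypothetical analogue of get_notes_by_category(): row IDs in a category."""
    sql = 'SELECT row_id FROM noteevents WHERE category = ? ORDER BY row_id LIMIT ?'
    return [r[0] for r in conn.execute(sql, (category, limit))]


def discharge_reports(limit: int = sys.maxsize) -> List[int]:
    """Hypothetical analogue of get_discharge_reports(): addendums are skipped."""
    sql = ("SELECT row_id FROM noteevents WHERE category = 'Discharge summary' "
           "AND description = 'Report' ORDER BY row_id LIMIT ?")
    return [r[0] for r in conn.execute(sql, (limit,))]


print(notes_by_category('Discharge summary'))           # [1, 2, 4]
print(notes_by_category('Discharge summary', limit=2))  # [1, 2]
print(discharge_reports())                               # [1, 4]
```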
- hadm_row_chunk_size: int = None¶
The number of note IDs for each round trip to the DB in get_hadm_ids().
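The chunk size bounds how many note IDs go into a single query. The sketch below shows that general batching technique against a toy SQLite table. fetch_hadm_ids, the table contents, and the SQL are hypothetical; the real get_hadm_ids() signature and query are not given in this reference.

```python
import sqlite3
from typing import Iterator, List, Sequence, Tuple


def chunks(ids: Sequence[int], size: int) -> Iterator[Sequence[int]]:
    """Yield consecutive slices of at most ``size`` IDs."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def fetch_hadm_ids(conn: sqlite3.Connection, row_ids: Sequence[int],
                   chunk_size: int) -> Tuple[List[int], int]:
    """Hypothetical: map note row IDs to admission IDs, one query per chunk.

    Returns the sorted admission IDs and the number of round trips made.
    """
    hadm_ids = set()
    trips = 0
    for chunk in chunks(row_ids, chunk_size):
        marks = ','.join('?' * len(chunk))  # one placeholder per ID
        sql = f'SELECT hadm_id FROM noteevents WHERE row_id IN ({marks})'
        hadm_ids.update(r[0] for r in conn.execute(sql, tuple(chunk)))
        trips += 1
    return sorted(hadm_ids), trips


conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE noteevents (row_id INTEGER, hadm_id INTEGER)')
conn.executemany('INSERT INTO noteevents VALUES (?, ?)',
                 [(row_id, 100 + row_id // 4) for row_id in range(1, 11)])
print(fetch_hadm_ids(conn, list(range(1, 11)), chunk_size=4))
# ([100, 101, 102], 3)
```

A smaller chunk size means more round trips but shorter IN lists per query.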
- class zensols.mimic.persist.PatientPersister(conn_manager, sql_file=None, row_factory='tuple', select_name=None, select_by_id_name=None, select_exists_name=None, insert_name=None, update_name=None, delete_name=None, keys_name=None, count_name=None, bean_class=None)[source]¶
Bases: DataClassDbPersister
Manages instances of Patient.
- __init__(conn_manager, sql_file=None, row_factory='tuple', select_name=None, select_by_id_name=None, select_exists_name=None, insert_name=None, update_name=None, delete_name=None, keys_name=None, count_name=None, bean_class=None)¶
- class zensols.mimic.persist.ProcedurePersister(conn_manager, sql_file=None, row_factory='tuple', select_name=None, select_by_id_name=None, select_exists_name=None, insert_name=None, update_name=None, delete_name=None, keys_name=None, count_name=None, bean_class=None)[source]¶
Bases: DataClassDbPersister
Manages instances of Procedure.
- __init__(conn_manager, sql_file=None, row_factory='tuple', select_name=None, select_by_id_name=None, select_exists_name=None, insert_name=None, update_name=None, delete_name=None, keys_name=None, count_name=None, bean_class=None)¶
zensols.mimic.regexnote module¶
Regular expression note parsing.
- class zensols.mimic.regexnote.ConsultNote(row_id, subject_id, hadm_id, chartdate, charttime, storetime, category, description, cgid, iserror, text, context)[source]¶
Bases: RegexNote
Contains sections for the discharge summary. There should be only one of these per hospital admission.
- __init__(row_id, subject_id, hadm_id, chartdate, charttime, storetime, category, description, cgid, iserror, text, context)¶
- class zensols.mimic.regexnote.DischargeSummaryNote(row_id, subject_id, hadm_id, chartdate, charttime, storetime, category, description, cgid, iserror, text, context)[source]¶
Bases: RegexNote
Contains sections for the discharge summary. There should be only one of these per hospital admission.
- __init__(row_id, subject_id, hadm_id, chartdate, charttime, storetime, category, description, cgid, iserror, text, context)¶
- class zensols.mimic.regexnote.EchoNote(row_id, subject_id, hadm_id, chartdate, charttime, storetime, category, description, cgid, iserror, text, context)[source]¶
Bases: RegexNote
- __init__(row_id, subject_id, hadm_id, chartdate, charttime, storetime, category, description, cgid, iserror, text, context)¶
- class zensols.mimic.regexnote.NursingOtherNote(row_id, subject_id, hadm_id, chartdate, charttime, storetime, category, description, cgid, iserror, text, context)[source]¶
Bases: RegexNote
- __init__(row_id, subject_id, hadm_id, chartdate, charttime, storetime, category, description, cgid, iserror, text, context)¶
- class zensols.mimic.regexnote.PhysicianNote(row_id, subject_id, hadm_id, chartdate, charttime, storetime, category, description, cgid, iserror, text, context)[source]¶
Bases: RegexNote
- __init__(row_id, subject_id, hadm_id, chartdate, charttime, storetime, category, description, cgid, iserror, text, context)¶
- class zensols.mimic.regexnote.RadiologyNote(row_id, subject_id, hadm_id, chartdate, charttime, storetime, category, description, cgid, iserror, text, context)[source]¶
Bases: RegexNote
- __init__(row_id, subject_id, hadm_id, chartdate, charttime, storetime, category, description, cgid, iserror, text, context)¶
- class zensols.mimic.regexnote.RegexNote(row_id, subject_id, hadm_id, chartdate, charttime, storetime, category, description, cgid, iserror, text, context)[source]¶
Bases: Note
Base class used to collect subclass regular expression captures and create sections from them.
- __init__(row_id, subject_id, hadm_id, chartdate, charttime, storetime, category, description, cgid, iserror, text, context)¶
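To illustrate the general idea of turning regular expression captures into sections, here is a minimal plain-Python sketch. The header patterns, the Section type, and the RegexSectionNote and RadiologyStyleNote classes are hypothetical stand-ins, not the regular expressions or section objects zensols.mimic actually uses; the sketch only shows a base class doing the splitting while subclasses supply the pattern.

```python
import re
from dataclasses import dataclass
from typing import ClassVar, List, Pattern


@dataclass
class Section:
    name: str
    body: str


class RegexSectionNote:
    """Hypothetical base class: subclasses override SECTION_REGEX."""
    # assumed header form: a capitalized phrase at line start, then a colon
    SECTION_REGEX: ClassVar[Pattern] = re.compile(r'^([A-Z][A-Za-z /]+):', re.MULTILINE)

    def __init__(self, text: str):
        self.text = text

    def sections(self) -> List[Section]:
        matches = list(self.SECTION_REGEX.finditer(self.text))
        secs = []
        for i, m in enumerate(matches):
            # a section body runs from the end of its header to the next header
            end = matches[i + 1].start() if i + 1 < len(matches) else len(self.text)
            secs.append(Section(m.group(1).lower(), self.text[m.end():end].strip()))
        return secs


class RadiologyStyleNote(RegexSectionNote):
    # hypothetical subclass that only accepts all-caps headers
    SECTION_REGEX = re.compile(r'^([A-Z][A-Z ]+):', re.MULTILINE)


note = RadiologyStyleNote('FINDINGS: No effusion.\nIMPRESSION: Normal study.')
print([(s.name, s.body) for s in note.sections()])
# [('findings', 'No effusion.'), ('impression', 'Normal study.')]
```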
zensols.mimic.tokenizer module¶
Modify the spaCy parser configuration to deal with the MIMIC-III dataset.
- class zensols.mimic.tokenizer.MimicTokenDecorator(token_entities=((re.compile('^First Name'), 'FIRSTNAME', 'PERSON'), (re.compile('^Last Name'), 'LASTNAME', 'PERSON'), (re.compile('^21\\\\d{2}-\\\\d{1,2}-\\\\d{1,2}$'), 'DATE', 'DATE')), token_replacements=())[source]¶
Bases: FeatureTokenDecorator
Contains the MIMIC-III regular expressions and other patterns to annotate and normalize feature tokens. The class finds mask tokens and separators (such as a long string of dashes or asterisks).
Attribute onto_mapping is a mapping from the MIMIC symbol in token_entities (2nd value in each tuple) to OntoNotes 5, which is used as the NER symbol in spaCy.
- MASK_TOKEN_FEATURE: ClassVar[str] = 'mask'¶
The value given from entity TOKEN_FEATURE_ID for mask tokens (e.g. [**First Name**]).
- ONTO_FEATURE_ID: ClassVar[str] = 'onto_'¶
The feature ID to use for the OntoNotes 5 entity (onto_mapping).
- SEPARATOR_TOKEN_FEATURE: ClassVar[str] = 'separator'¶
The value name of separators defined by SEP_REGEX.
- SEP_REGEX: ClassVar[Pattern] = re.compile('(_{5,}|[*]{5,}|[-]{5,})')¶
Matches text-based separators: runs of five or more underscores, asterisks, or dashes.
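A quick check of which tokens the separator pattern accepts. Splitting the sample text on whitespace and requiring the whole token to match (fullmatch) are assumptions for this illustration; how the decorator applies SEP_REGEX to spaCy tokens is not shown in this reference.

```python
import re

# Copied from MimicTokenDecorator.SEP_REGEX
SEP_REGEX = re.compile('(_{5,}|[*]{5,}|[-]{5,})')

text = 'Labs: ---------- WBC 7.2 ***** see below ___ -----***** end'
# runs shorter than five characters, or runs mixing characters, do not match
separators = [tok for tok in text.split() if SEP_REGEX.fullmatch(tok)]
print(separators)  # ['----------', '*****']
```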
- UNKNOWN_ENTITY: ClassVar[str] = '<UNKNOWN>'¶
The mask normalized token form for unknown MIMIC entity text (e.g. First Name).
- __init__(token_entities=((re.compile('^First Name'), 'FIRSTNAME', 'PERSON'), (re.compile('^Last Name'), 'LASTNAME', 'PERSON'), (re.compile('^21\\\\d{2}-\\\\d{1,2}-\\\\d{1,2}$'), 'DATE', 'DATE')), token_replacements=())¶
- token_entities: Tuple[Tuple[Union[Pattern,str]],str,Optional[str]] = ((re.compile('^First Name'), 'FIRSTNAME', 'PERSON'), (re.compile('^Last Name'), 'LASTNAME', 'PERSON'), (re.compile('^21\\d{2}-\\d{1,2}-\\d{1,2}$'), 'DATE', 'DATE'))¶
A list of pseudo token patterns and a string to replace with the respective match.
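A minimal sketch of how the default token_entities triples could be applied to the text inside MIMIC-III mask tokens such as [**Last Name (STitle) 1234**]. The MASK_REGEX used to pull out the masked text, the classify_mask helper, and the fallback to UNKNOWN_ENTITY when no pattern matches are assumptions for illustration. ONTO_MAPPING shows how an onto_mapping-style dictionary from the MIMIC symbol (second tuple value) to the OntoNotes 5 label (third value) can be derived from the triples.

```python
import re
from typing import Optional, Tuple

# Default token_entities values as listed above
TOKEN_ENTITIES = (
    (re.compile('^First Name'), 'FIRSTNAME', 'PERSON'),
    (re.compile('^Last Name'), 'LASTNAME', 'PERSON'),
    (re.compile(r'^21\d{2}-\d{1,2}-\d{1,2}$'), 'DATE', 'DATE'),
)
UNKNOWN_ENTITY = '<UNKNOWN>'
# Hypothetical: captures the text between [** and **]
MASK_REGEX = re.compile(r'\[\*\*(.*?)\*\*\]')

# onto_mapping-style dict: MIMIC symbol -> OntoNotes 5 label
ONTO_MAPPING = {mimic: onto for _, mimic, onto in TOKEN_ENTITIES}


def classify_mask(inner: str) -> Tuple[str, Optional[str]]:
    """Hypothetical: return (MIMIC symbol, OntoNotes label) for masked text."""
    for pattern, mimic, onto in TOKEN_ENTITIES:
        if pattern.match(inner):
            return mimic, onto
    # assumed fallback when no pattern recognizes the masked text
    return UNKNOWN_ENTITY, None


text = 'Seen by Dr. [**Last Name (STitle) 1234**] on [**2186-7-9**] at [**Hospital1 18**].'
print([classify_mask(m) for m in MASK_REGEX.findall(text)])
# [('LASTNAME', 'PERSON'), ('DATE', 'DATE'), ('<UNKNOWN>', None)]
```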
- class zensols.mimic.tokenizer.MimicTokenizerComponent(name, pipe_name=None, pipe_config=None, pipe_add_kwargs=<factory>, modules=(), initializers=())[source]¶
Bases: Component
Modifies the spaCy tokenizer to split on colons (:) to capture more MIMIC-III mask tokens.
- __init__(name, pipe_name=None, pipe_config=None, pipe_add_kwargs=<factory>, modules=(), initializers=())¶
- init(model, parser)[source]¶
Initialize the component and add it to the NLP pipeline. This base class implementation loads the module, then calls Language.add_pipe().
- Parameters:
model (Language) – the spaCy model (nlp in their parlance) to which the component is added
parser (FeatureDocumentParser) – the owning parser of this component instance