Frequency Bias in MLM-trained BERT Embeddings for Medical Codes
Keywords:
Masked Language Modeling, Embeddings, BERT, Medical AI, Electronic Health Records
Abstract
Transformers are deep networks that operate on loosely structured data such as natural language and electronic medical records. Transformers learn embedding vectors that represent discrete inputs (e.g., words or medical codes). Ideally, a transformer should learn similar embedding vectors for two codes with similar medical meanings, as this will help the network make similar inferences given either of these codes. Previous work has suggested that transformers do learn such embeddings, but this has not been analyzed in detail, and work with transformers in other domains suggests that unwanted biases can occur. We trained a Bidirectional Encoder Representations from Transformers (BERT) network on clinical diagnostic codes and analyzed the learned embeddings. The analysis shows that the transformer can learn an undesirable frequency-related bias in embedding similarities, so that the similarities fail to reflect true similarity relationships between medical codes. This is especially true for codes that are infrequently used. It will be important to mitigate this issue in future applications of deep networks to electronic health records.
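To make the notion of "embedding similarity" concrete, the following is a minimal sketch of how a frequency-related bias could show up in a nearest-neighbor check. It is illustrative only: cosine similarity is assumed as the similarity measure, and the code names, the three-dimensional vectors, and the common/rare labels are hypothetical toy values, not data or methods from this work.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def nearest_neighbor(code, embeddings):
    """Return the other code whose embedding is most similar to `code`."""
    others = (c for c in embeddings if c != code)
    return max(
        others,
        key=lambda c: cosine_similarity(embeddings[code], embeddings[c]),
    )

# Hypothetical toy embeddings (assumption, for illustration only).
# The third dimension is large for both rare codes, mimicking a
# direction in embedding space that tracks frequency, not meaning.
embeddings = {
    "diabetes_common": [0.9, 0.1, 0.0],
    "diabetes_rare":   [0.1, 0.0, 0.9],
    "fracture_common": [0.1, 0.9, 0.0],
    "fracture_rare":   [0.0, 0.2, 0.8],
}

# A clinically meaningful embedding would pair the two diabetes codes.
# In this toy setup the rare diabetes code is instead closest to the
# unrelated rare fracture code.
neighbor = nearest_neighbor("diabetes_rare", embeddings)
print(neighbor)
```

In this toy setup the nearest neighbor of the rare diabetes code is the rare fracture code rather than the common diabetes code, which is the kind of pattern a frequency-related bias would produce.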