Distinguishing 19th Century British Novels by Women Authors Using Natural Language Processing
Keywords:
Authorship, Jane Austen, Mary Shelley, Mary Brunton, Natural Language Processing, BERT model, Binary Logistic Regression
Abstract
This paper used a BERT model and binary logistic regression to distinguish books written by 19th-century British women, specifically exploring AI’s ability to identify differences between authors and the keywords that characterize each book. Two books each by Jane Austen, Mary Shelley, and Mary Brunton were divided into sections of uniform size, which were used to train and test the BERT model. The model was trained on an author-labeled training set and then assigned author labels to a separate test set, achieving 84.44% accuracy. A z-test yielded a z-score of 35.63 and a p-value approaching 0. Binary logistic regression was then used to pinpoint the most distinctive words in each book, helping to explain the differences between the books.
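Three of the steps described above can be illustrated without the BERT classifier itself: cutting each book into equally sized sections, testing whether an observed accuracy beats chance with a one-sample z-test, and ranking words by their binary logistic regression coefficients. The following standard-library Python code is a minimal sketch under stated assumptions, not the paper's implementation. The function names are hypothetical; dropping leftover words at the end of a book, using a chance rate of 1/3 (three authors) as the z-test's null hypothesis with a one-sided alternative, and using unregularized word presence/absence features for the regression are all assumptions the abstract does not confirm.

```python
from __future__ import annotations

import math


def split_into_sections(text: str, words_per_section: int) -> list[str]:
    # Hypothetical helper: cut a book into consecutive, equally sized word windows.
    # Leftover words that cannot fill a whole section are dropped (an assumption).
    words = text.split()
    n_sections = len(words) // words_per_section
    return [
        " ".join(words[i * words_per_section:(i + 1) * words_per_section])
        for i in range(n_sections)
    ]


def accuracy_z_test(correct: int, total: int, chance: float) -> tuple[float, float]:
    # One-sample z-test of an observed accuracy against an assumed chance rate.
    # Returns the z-score and a one-sided p-value (H1: accuracy > chance).
    p_hat = correct / total
    std_err = math.sqrt(chance * (1 - chance) / total)
    z = (p_hat - chance) / std_err
    p_value = 0.5 * math.erfc(z / math.sqrt(2))  # upper tail of the standard normal
    return z, p_value


def _sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def distinctive_words(sections_a: list[str], sections_b: list[str],
                      k: int = 5, epochs: int = 300, lr: float = 0.5) -> list[str]:
    # Fit a binary logistic regression (book A = 1, book B = 0) on word
    # presence/absence features with plain batch gradient descent (no
    # regularization), then return the k words whose coefficients push
    # hardest toward book A.
    docs = sections_a + sections_b
    labels = [1.0] * len(sections_a) + [0.0] * len(sections_b)
    vocab = sorted({w for d in docs for w in d.lower().split()})
    index = {w: j for j, w in enumerate(vocab)}
    # Each section becomes the set of vocabulary indices it contains.
    features = [{index[w] for w in d.lower().split()} for d in docs]

    weights = [0.0] * len(vocab)
    bias = 0.0
    n = len(docs)
    for _ in range(epochs):
        grad_w = [0.0] * len(vocab)
        grad_b = 0.0
        for feats, y in zip(features, labels):
            z = bias + sum(weights[j] for j in feats)
            err = _sigmoid(z) - y  # derivative of the log-loss w.r.t. z
            for j in feats:
                grad_w[j] += err
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n

    ranked = sorted(vocab, key=lambda w: weights[index[w]], reverse=True)
    return ranked[:k]
```

For example, `accuracy_z_test(76, 90, 1/3)` tests a hypothetical 76 correct labels out of 90 test sections (84.44%) against three-way chance; these counts are illustrative and are not the paper's data. Calling `distinctive_words(book_a_sections, book_b_sections)` and then swapping the arguments gives the words most associated with each book in the pair.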
License
Copyright (c) 2024 Phoebe M. Xu (Author)
This work is licensed under a Creative Commons Attribution 4.0 International License.