{ "nbformat": 4, "nbformat_minor": 0, "metadata": { "colab": { "provenance": [] }, "kernelspec": { "name": "python3", "display_name": "Python 3" }, "language_info": { "name": "python" } }, "cells": [ { "cell_type": "code", "source": [ "!wget https://tufts.box.com/shared/static/325sgkodnq30ez61ugazvctif6r24hsu.csv -O daf.csv" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "V22N0lxEwCAe", "outputId": "e46bb7ca-930f-4e08-981b-7513a306c9ec" }, "execution_count": null, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "--2024-05-23 19:14:49-- https://tufts.box.com/shared/static/325sgkodnq30ez61ugazvctif6r24hsu.csv\n", "Resolving tufts.box.com (tufts.box.com)... 74.112.186.144\n", "Connecting to tufts.box.com (tufts.box.com)|74.112.186.144|:443... connected.\n", "HTTP request sent, awaiting response... 301 Moved Permanently\n", "Location: /public/static/325sgkodnq30ez61ugazvctif6r24hsu.csv [following]\n", "--2024-05-23 19:14:49-- https://tufts.box.com/public/static/325sgkodnq30ez61ugazvctif6r24hsu.csv\n", "Reusing existing connection to tufts.box.com:443.\n", "HTTP request sent, awaiting response... 301 Moved Permanently\n", "Location: https://tufts.app.box.com/public/static/325sgkodnq30ez61ugazvctif6r24hsu.csv [following]\n", "--2024-05-23 19:14:49-- https://tufts.app.box.com/public/static/325sgkodnq30ez61ugazvctif6r24hsu.csv\n", "Resolving tufts.app.box.com (tufts.app.box.com)... 74.112.186.144\n", "Connecting to tufts.app.box.com (tufts.app.box.com)|74.112.186.144|:443... connected.\n", "HTTP request sent, awaiting response... 
302 Found\n", "Location: https://public.boxcloud.com/d/1/b1!__djTgdqeAMSWVZAg0PpbJ5coT6qH9qp1oRU3oQs4OvTQfLc6hiYwi_2znhaL08Uui5ohPuZyjCEg9OQ8P7mzE4ytSv6sAFYcKJy5hdyG85aijzViv8AV59rlbtk54UhWIXbSXp0Te1app5gSHQef6IZaf6GVPDzhNnxrYGwfBVbT51tlW0QitsSUHjywLoRPDIzZsw1H4rV_hTxbeJP-i_wo4vmd9fPQj9Bc0DS4Hueq7mtHnyrd0vuwzuKheDpFWUIJuQvnQpEQGRYlkfjBDVh80Hh7WFhnh_6n5H_oVj22BZC3vU0ZCBKwfuilD9myuoPH051oRCGCME93i6gzM1Qfn0k__XelmpUUpJJ4A8q0RCyy9l4oNs_LzcUZdhe2cSI_5lO8yagBaUrmrHsWCev8qwd4i-of3o-MDFFvaJ-DXQ3pOxKBJCml-t-TbqGbWoS5ZUzC4M8YDN4wT8aZ04kdjaDtvBZW9vu4z-H5P7sC5x_6o0FDR702a8w2Ph3BCRXKQ5e0Ojtui_JeGnz61F-wZ14okyvSZk6rhDhtuzxFY-SZLBkxZrkzpB8VpcOG1-57IcFsREpOfP9VG0vBQ2yTcl0tg7v7K2qkowx2LuSkhzCft3lv45s4QDu6e_ckR0m68lxXQlwwM5DZuh4ML0ylFOSSxyxjJp2XqrDg3GmL5-XrI9_IQaY_6_F61z_duOsQCGbMB1VNlEa6OKGt3iQHm9qw5D032hKJ9wueULYgA_1XGrFM58-9BkbxVQq8LpBLOf2jWtFTKHSTORiKvXd6FYD97_QtA1elPdeJDUcLMo3okWGR0p82MBLH2CSTd7u7eirtczTtX7qRIS7eT3m6vj8AuPZpkg-CuBE-PNoMHXU-CJxvLMGw0cw0fua1Kv4TQw5JRRWMZMzNFPJUZKrO7z-oDKpZ8yBAQCF8zssXiBEQiW86dlQKEp4Gq8l2F9dFyR-mN3_cqm1RZSDkQ4BtPHvV9VMZVu4oqT4uomlA4Sxyj6A3Z3FrVkHhGzCada1Su6VFDMursSQSiBdH0lY1-TxJi0JhoWYCE0E6ar46CZjLk7W3oanYrf5BjaEa1PMaNFRqORf82jXvVngWb95Djnvdm6K37lJhsseAXuql6gIQ3hGB2nPKoDrjLcF4y7Q86k-VF0n5ogE5f88KiVMAfBiHyuLg5PCCzJUSjIBQPiH3QH9_Qy9oQu1LCJ0mYpn-7OPrlNBDFuY21hJ29XpbTSZsgYicYVMe5uc2VUfttSwpkNE8zkkRcpxEvvyOYWaBjY7OA9Lu1GQWg8Dy8EAaX2MW5rYu5LZiukxSC1HL6PN3PbJ4VuxH9f-HzJxTgc43mG5ux3bwwlcK9Btrgk4axk_TV3wQaWivk8./download [following]\n", "--2024-05-23 19:14:50-- 
https://public.boxcloud.com/d/1/b1!__djTgdqeAMSWVZAg0PpbJ5coT6qH9qp1oRU3oQs4OvTQfLc6hiYwi_2znhaL08Uui5ohPuZyjCEg9OQ8P7mzE4ytSv6sAFYcKJy5hdyG85aijzViv8AV59rlbtk54UhWIXbSXp0Te1app5gSHQef6IZaf6GVPDzhNnxrYGwfBVbT51tlW0QitsSUHjywLoRPDIzZsw1H4rV_hTxbeJP-i_wo4vmd9fPQj9Bc0DS4Hueq7mtHnyrd0vuwzuKheDpFWUIJuQvnQpEQGRYlkfjBDVh80Hh7WFhnh_6n5H_oVj22BZC3vU0ZCBKwfuilD9myuoPH051oRCGCME93i6gzM1Qfn0k__XelmpUUpJJ4A8q0RCyy9l4oNs_LzcUZdhe2cSI_5lO8yagBaUrmrHsWCev8qwd4i-of3o-MDFFvaJ-DXQ3pOxKBJCml-t-TbqGbWoS5ZUzC4M8YDN4wT8aZ04kdjaDtvBZW9vu4z-H5P7sC5x_6o0FDR702a8w2Ph3BCRXKQ5e0Ojtui_JeGnz61F-wZ14okyvSZk6rhDhtuzxFY-SZLBkxZrkzpB8VpcOG1-57IcFsREpOfP9VG0vBQ2yTcl0tg7v7K2qkowx2LuSkhzCft3lv45s4QDu6e_ckR0m68lxXQlwwM5DZuh4ML0ylFOSSxyxjJp2XqrDg3GmL5-XrI9_IQaY_6_F61z_duOsQCGbMB1VNlEa6OKGt3iQHm9qw5D032hKJ9wueULYgA_1XGrFM58-9BkbxVQq8LpBLOf2jWtFTKHSTORiKvXd6FYD97_QtA1elPdeJDUcLMo3okWGR0p82MBLH2CSTd7u7eirtczTtX7qRIS7eT3m6vj8AuPZpkg-CuBE-PNoMHXU-CJxvLMGw0cw0fua1Kv4TQw5JRRWMZMzNFPJUZKrO7z-oDKpZ8yBAQCF8zssXiBEQiW86dlQKEp4Gq8l2F9dFyR-mN3_cqm1RZSDkQ4BtPHvV9VMZVu4oqT4uomlA4Sxyj6A3Z3FrVkHhGzCada1Su6VFDMursSQSiBdH0lY1-TxJi0JhoWYCE0E6ar46CZjLk7W3oanYrf5BjaEa1PMaNFRqORf82jXvVngWb95Djnvdm6K37lJhsseAXuql6gIQ3hGB2nPKoDrjLcF4y7Q86k-VF0n5ogE5f88KiVMAfBiHyuLg5PCCzJUSjIBQPiH3QH9_Qy9oQu1LCJ0mYpn-7OPrlNBDFuY21hJ29XpbTSZsgYicYVMe5uc2VUfttSwpkNE8zkkRcpxEvvyOYWaBjY7OA9Lu1GQWg8Dy8EAaX2MW5rYu5LZiukxSC1HL6PN3PbJ4VuxH9f-HzJxTgc43mG5ux3bwwlcK9Btrgk4axk_TV3wQaWivk8./download\n", "Resolving public.boxcloud.com (public.boxcloud.com)... 74.112.186.130\n", "Connecting to public.boxcloud.com (public.boxcloud.com)|74.112.186.130|:443... connected.\n", "HTTP request sent, awaiting response... 
200 OK\n", "Length: 4643634 (4.4M) [text/csv]\n", "Saving to: ‘daf.csv’\n", "\n", "daf.csv 100%[===================>] 4.43M --.-KB/s in 0.09s \n", "\n", "2024-05-23 19:14:50 (49.2 MB/s) - ‘daf.csv’ saved [4643634/4643634]\n", "\n" ] } ] }, { "cell_type": "markdown", "source": [ "# Topic Modeling using Traditional Machine Learning\n", "\n", "In this workshop, we'll be learning how to conduct topic modelling on text using `scikit-learn`.\n", "\n", "This workshop builds on what we learned about TF-IDF in the `Textual Feature Extraction using Traditional Machine Learning`. In that notebook, we saw how we could take sections of text from a larger work and turn them into numerical representations of that text. We also saw how we might begin to manipulate this numerical data, for example using linear regression. In this workshop, we'll see a more complex transformation of the numerical data. That said, the underlying concept remains the same: we can split up our corpus into several texts and then use TF-IDF to transform this text into a matrix of numbers. Instead of using the dot product to determine similarity between chapters, though, we'll see how we can find similar word usage across different chapters.\n", "\n", "**Topic modelling seeks to group together words which have a similar usage. Each such group constitutes a topic**. As a result, topic modelling can be particularly useful if you don't know what is in a text, but you know that it has distinct parts. We'll look at two different approaches to topic modelling, Non-negative Matrix Factorization (NMF) and Latent Dirichlet Allocation (LDA), and, as we will see, the results will be very similar." ], "metadata": { "id": "veUtEzJSBtsW" } }, { "cell_type": "markdown", "source": [ "## Data\n", "\n", "For this example, as in `Textual Feature Extraction using Traditional Machine Learning`, we'll be using Edward Gibbon's *Decline and Fall of the Roman Empire* as our example text.
I like using this source for topic modelling because it is incredibly long and multifaceted, covering a history of Western Europe from the 200s to the 1450s AD. These are the sorts of texts for which topic modelling can be most useful, though feel free to use any other text instead." ], "metadata": { "id": "B3w71Ny8FkwP" } }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "EKCrvbj9ugkc" }, "outputs": [], "source": [ "import pandas as pd\n", "import pprint\n", "from sklearn.feature_extraction.text import TfidfVectorizer\n", "from sklearn.decomposition import NMF, LatentDirichletAllocation\n", "import matplotlib.pyplot as plt\n", "\n", "random_state = 1337 # seed reused later for reproducible results" ] }, { "cell_type": "code", "source": [ "daf = pd.read_csv('daf.csv')[['title','text']]\n", "daf" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 424 }, "id": "4VfO4dZgxFMz", "outputId": "42675afd-5204-4943-dbca-d431b9e49f40" }, "execution_count": null, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ " title \\\n", "0 The Extent Of The Empire In The Age Of The Ant... \n", "1 The Extent Of The Empire In The Age Of The Ant... \n", "2 The Extent Of The Empire In The Age Of The Ant... \n", "3 The Internal Prosperity In The Age Of The Anto... \n", "4 The Internal Prosperity In The Age Of The Anto... \n", ".. ... \n", "291 Final Settlement Of The Ecclesiastical State.—... \n", "292 Final Settlement Of The Ecclesiastical State.—... \n", "293 Final Settlement Of The Ecclesiastical State.—... \n", "294 Prospect Of The Ruins Of Rome In The Fifteenth... \n", "295 Prospect Of The Ruins Of Rome In The Fifteenth... \n", "\n", " text \n", "0 Introduction. The Extent And Military Fo... \n", "1 It was an ancient tradition, that when the Cap... \n", "2 The camp of a Roman legion presented the appea... \n", "3 Of The Union And Internal Prosperity Of The Ro... \n", "4 Till the privileges of Romans had been progres... \n", ".. ... 
\n", "291 Never perhaps has the energy and effect of a s... \n", "292 Without drawing his sword, count Pepin restore... \n", "293 The royal prerogative of coining money, which ... \n", "294 Prospect Of The Ruins Of Rome In The Fifteenth... \n", "295 These general observations may be separately a... \n", "\n", "[296 rows x 2 columns]" ], "text/html": [ "\n", "
\n", " | title | \n", "text | \n", "
---|---|---|
0 | \n", "The Extent Of The Empire In The Age Of The Ant... | \n", "Introduction. The Extent And Military Fo... | \n", "
1 | \n", "The Extent Of The Empire In The Age Of The Ant... | \n", "It was an ancient tradition, that when the Cap... | \n", "
2 | \n", "The Extent Of The Empire In The Age Of The Ant... | \n", "The camp of a Roman legion presented the appea... | \n", "
3 | \n", "The Internal Prosperity In The Age Of The Anto... | \n", "Of The Union And Internal Prosperity Of The Ro... | \n", "
4 | \n", "The Internal Prosperity In The Age Of The Anto... | \n", "Till the privileges of Romans had been progres... | \n", "
... | \n", "... | \n", "... | \n", "
291 | \n", "Final Settlement Of The Ecclesiastical State.—... | \n", "Never perhaps has the energy and effect of a s... | \n", "
292 | \n", "Final Settlement Of The Ecclesiastical State.—... | \n", "Without drawing his sword, count Pepin restore... | \n", "
293 | \n", "Final Settlement Of The Ecclesiastical State.—... | \n", "The royal prerogative of coining money, which ... | \n", "
294 | \n", "Prospect Of The Ruins Of Rome In The Fifteenth... | \n", "Prospect Of The Ruins Of Rome In The Fifteenth... | \n", "
295 | \n", "Prospect Of The Ruins Of Rome In The Fifteenth... | \n", "These general observations may be separately a... | \n", "
296 rows × 2 columns
\n", "NMF(n_components=12, random_state=1337)
LatentDirichletAllocation(n_components=12, random_state=1337)
\n", " | chapter_number | \n", "title | \n", "text | \n", "
---|---|---|---|
0 | \n", "1 | \n", "The Extent Of The Empire In The Age Of The Ant... | \n", "Introduction. The Extent And Military Fo... | \n", "
1 | \n", "2 | \n", "The Extent Of The Empire In The Age Of The Ant... | \n", "It was an ancient tradition, that when the Cap... | \n", "
2 | \n", "3 | \n", "The Extent Of The Empire In The Age Of The Ant... | \n", "The camp of a Roman legion presented the appea... | \n", "
3 | \n", "4 | \n", "The Internal Prosperity In The Age Of The Anto... | \n", "Of The Union And Internal Prosperity Of The Ro... | \n", "
4 | \n", "5 | \n", "The Internal Prosperity In The Age Of The Anto... | \n", "Till the privileges of Romans had been progres... | \n", "
... | \n", "... | \n", "... | \n", "... | \n", "
291 | \n", "292 | \n", "Final Settlement Of The Ecclesiastical State.—... | \n", "Never perhaps has the energy and effect of a s... | \n", "
292 | \n", "293 | \n", "Final Settlement Of The Ecclesiastical State.—... | \n", "Without drawing his sword, count Pepin restore... | \n", "
293 | \n", "294 | \n", "Final Settlement Of The Ecclesiastical State.—... | \n", "The royal prerogative of coining money, which ... | \n", "
294 | \n", "295 | \n", "Prospect Of The Ruins Of Rome In The Fifteenth... | \n", "Prospect Of The Ruins Of Rome In The Fifteenth... | \n", "
295 | \n", "296 | \n", "Prospect Of The Ruins Of Rome In The Fifteenth... | \n", "These general observations may be separately a... | \n", "
296 rows × 3 columns
\n", "\n", " | chapter_number | \n", "title | \n", "text | \n", "Topic 1 | \n", "Topic 2 | \n", "Topic 3 | \n", "Topic 4 | \n", "Topic 5 | \n", "Topic 6 | \n", "Topic 7 | \n", "Topic 8 | \n", "Topic 9 | \n", "Topic 10 | \n", "Topic 11 | \n", "Topic 12 | \n", "
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | \n", "1 | \n", "The Extent Of The Empire In The Age Of The Ant... | \n", "Introduction. The Extent And Military Fo... | \n", "0.140357 | \n", "0.000000 | \n", "0.000000 | \n", "0.000000 | \n", "0.000000 | \n", "0.000000 | \n", "0.138453 | \n", "0.000000 | \n", "0.008540 | \n", "0.000000 | \n", "0.0 | \n", "0.000000 | \n", "
1 | \n", "2 | \n", "The Extent Of The Empire In The Age Of The Ant... | \n", "It was an ancient tradition, that when the Cap... | \n", "0.109082 | \n", "0.013866 | \n", "0.000000 | \n", "0.004476 | \n", "0.000000 | \n", "0.000000 | \n", "0.116163 | \n", "0.000000 | \n", "0.007720 | \n", "0.000000 | \n", "0.0 | \n", "0.006961 | \n", "
2 | \n", "3 | \n", "The Extent Of The Empire In The Age Of The Ant... | \n", "The camp of a Roman legion presented the appea... | \n", "0.016352 | \n", "0.000000 | \n", "0.000000 | \n", "0.000000 | \n", "0.009661 | \n", "0.000000 | \n", "0.327493 | \n", "0.000000 | \n", "0.016435 | \n", "0.000000 | \n", "0.0 | \n", "0.027077 | \n", "
3 | \n", "4 | \n", "The Internal Prosperity In The Age Of The Anto... | \n", "Of The Union And Internal Prosperity Of The Ro... | \n", "0.059518 | \n", "0.143894 | \n", "0.000000 | \n", "0.004308 | \n", "0.000000 | \n", "0.000000 | \n", "0.038118 | \n", "0.000000 | \n", "0.000000 | \n", "0.011437 | \n", "0.0 | \n", "0.000000 | \n", "
4 | \n", "5 | \n", "The Internal Prosperity In The Age Of The Anto... | \n", "Till the privileges of Romans had been progres... | \n", "0.063011 | \n", "0.020749 | \n", "0.000000 | \n", "0.000000 | \n", "0.000000 | \n", "0.000000 | \n", "0.096147 | \n", "0.025652 | \n", "0.000000 | \n", "0.078776 | \n", "0.0 | \n", "0.007686 | \n", "
... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "
291 | \n", "292 | \n", "Final Settlement Of The Ecclesiastical State.—... | \n", "Never perhaps has the energy and effect of a s... | \n", "0.000000 | \n", "0.000000 | \n", "0.000000 | \n", "0.005997 | \n", "0.002712 | \n", "0.040459 | \n", "0.000000 | \n", "0.299737 | \n", "0.005080 | \n", "0.000000 | \n", "0.0 | \n", "0.008144 | \n", "
292 | \n", "293 | \n", "Final Settlement Of The Ecclesiastical State.—... | \n", "Without drawing his sword, count Pepin restore... | \n", "0.000000 | \n", "0.000000 | \n", "0.002737 | \n", "0.002752 | \n", "0.000000 | \n", "0.001784 | \n", "0.000000 | \n", "0.432059 | \n", "0.000000 | \n", "0.000000 | \n", "0.0 | \n", "0.002417 | \n", "
293 | \n", "294 | \n", "Final Settlement Of The Ecclesiastical State.—... | \n", "The royal prerogative of coining money, which ... | \n", "0.015235 | \n", "0.000000 | \n", "0.000000 | \n", "0.013029 | \n", "0.007963 | \n", "0.033410 | \n", "0.000000 | \n", "0.296186 | \n", "0.000000 | \n", "0.000000 | \n", "0.0 | \n", "0.004897 | \n", "
294 | \n", "295 | \n", "Prospect Of The Ruins Of Rome In The Fifteenth... | \n", "Prospect Of The Ruins Of Rome In The Fifteenth... | \n", "0.000000 | \n", "0.047675 | \n", "0.000000 | \n", "0.000000 | \n", "0.000000 | \n", "0.044810 | \n", "0.085804 | \n", "0.106584 | \n", "0.000000 | \n", "0.055540 | \n", "0.0 | \n", "0.000000 | \n", "
295 | \n", "296 | \n", "Prospect Of The Ruins Of Rome In The Fifteenth... | \n", "These general observations may be separately a... | \n", "0.000000 | \n", "0.001118 | \n", "0.006289 | \n", "0.000000 | \n", "0.014871 | \n", "0.000000 | \n", "0.090771 | \n", "0.171290 | \n", "0.000000 | \n", "0.036981 | \n", "0.0 | \n", "0.000000 | \n", "
296 rows × 15 columns
\n", "\n", " | chapter_number | \n", "Topic 1 | \n", "Topic 2 | \n", "Topic 3 | \n", "Topic 4 | \n", "Topic 5 | \n", "Topic 6 | \n", "Topic 7 | \n", "Topic 8 | \n", "Topic 9 | \n", "Topic 10 | \n", "Topic 11 | \n", "Topic 12 | \n", "
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | \n", "1 | \n", "0.140357 | \n", "0.000000 | \n", "0.000000 | \n", "0.000000 | \n", "0.000000 | \n", "0.000000 | \n", "0.138453 | \n", "0.000000 | \n", "0.008540 | \n", "0.000000 | \n", "0.0 | \n", "0.000000 | \n", "
1 | \n", "2 | \n", "0.109082 | \n", "0.013866 | \n", "0.000000 | \n", "0.004476 | \n", "0.000000 | \n", "0.000000 | \n", "0.116163 | \n", "0.000000 | \n", "0.007720 | \n", "0.000000 | \n", "0.0 | \n", "0.006961 | \n", "
2 | \n", "3 | \n", "0.016352 | \n", "0.000000 | \n", "0.000000 | \n", "0.000000 | \n", "0.009661 | \n", "0.000000 | \n", "0.327493 | \n", "0.000000 | \n", "0.016435 | \n", "0.000000 | \n", "0.0 | \n", "0.027077 | \n", "
3 | \n", "4 | \n", "0.059518 | \n", "0.143894 | \n", "0.000000 | \n", "0.004308 | \n", "0.000000 | \n", "0.000000 | \n", "0.038118 | \n", "0.000000 | \n", "0.000000 | \n", "0.011437 | \n", "0.0 | \n", "0.000000 | \n", "
4 | \n", "5 | \n", "0.063011 | \n", "0.020749 | \n", "0.000000 | \n", "0.000000 | \n", "0.000000 | \n", "0.000000 | \n", "0.096147 | \n", "0.025652 | \n", "0.000000 | \n", "0.078776 | \n", "0.0 | \n", "0.007686 | \n", "
... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "
291 | \n", "292 | \n", "0.000000 | \n", "0.000000 | \n", "0.000000 | \n", "0.005997 | \n", "0.002712 | \n", "0.040459 | \n", "0.000000 | \n", "0.299737 | \n", "0.005080 | \n", "0.000000 | \n", "0.0 | \n", "0.008144 | \n", "
292 | \n", "293 | \n", "0.000000 | \n", "0.000000 | \n", "0.002737 | \n", "0.002752 | \n", "0.000000 | \n", "0.001784 | \n", "0.000000 | \n", "0.432059 | \n", "0.000000 | \n", "0.000000 | \n", "0.0 | \n", "0.002417 | \n", "
293 | \n", "294 | \n", "0.015235 | \n", "0.000000 | \n", "0.000000 | \n", "0.013029 | \n", "0.007963 | \n", "0.033410 | \n", "0.000000 | \n", "0.296186 | \n", "0.000000 | \n", "0.000000 | \n", "0.0 | \n", "0.004897 | \n", "
294 | \n", "295 | \n", "0.000000 | \n", "0.047675 | \n", "0.000000 | \n", "0.000000 | \n", "0.000000 | \n", "0.044810 | \n", "0.085804 | \n", "0.106584 | \n", "0.000000 | \n", "0.055540 | \n", "0.0 | \n", "0.000000 | \n", "
295 | \n", "296 | \n", "0.000000 | \n", "0.001118 | \n", "0.006289 | \n", "0.000000 | \n", "0.014871 | \n", "0.000000 | \n", "0.090771 | \n", "0.171290 | \n", "0.000000 | \n", "0.036981 | \n", "0.0 | \n", "0.000000 | \n", "
296 rows × 13 columns
\n", "\n", " | chapter_number | \n", "Topic 1 | \n", "Topic 2 | \n", "Topic 3 | \n", "Topic 4 | \n", "Topic 5 | \n", "Topic 6 | \n", "Topic 7 | \n", "Topic 8 | \n", "Topic 9 | \n", "... | \n", "Topic 3 Normalized | \n", "Topic 4 Normalized | \n", "Topic 5 Normalized | \n", "Topic 6 Normalized | \n", "Topic 7 Normalized | \n", "Topic 8 Normalized | \n", "Topic 9 Normalized | \n", "Topic 10 Normalized | \n", "Topic 11 Normalized | \n", "Topic 12 Normalized | \n", "
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | \n", "1 | \n", "0.140357 | \n", "0.000000 | \n", "0.000000 | \n", "0.000000 | \n", "0.000000 | \n", "0.000000 | \n", "0.138453 | \n", "0.000000 | \n", "0.008540 | \n", "... | \n", "0.000000 | \n", "0.001344 | \n", "0.000564 | \n", "0.000000 | \n", "0.142437 | \n", "0.000003 | \n", "8.718205e-03 | \n", "0.000063 | \n", "0.0 | \n", "0.003643 | \n", "
1 | \n", "2 | \n", "0.109082 | \n", "0.013866 | \n", "0.000000 | \n", "0.004476 | \n", "0.000000 | \n", "0.000000 | \n", "0.116163 | \n", "0.000000 | \n", "0.007720 | \n", "... | \n", "0.000000 | \n", "0.002038 | \n", "0.002339 | \n", "0.000000 | \n", "0.169615 | \n", "0.000118 | \n", "9.620701e-03 | \n", "0.000977 | \n", "0.0 | \n", "0.009399 | \n", "
2 | \n", "3 | \n", "0.016352 | \n", "0.000000 | \n", "0.000000 | \n", "0.000000 | \n", "0.009661 | \n", "0.000000 | \n", "0.327493 | \n", "0.000000 | \n", "0.016435 | \n", "... | \n", "0.000000 | \n", "0.002129 | \n", "0.003854 | \n", "0.000000 | \n", "0.181594 | \n", "0.001526 | \n", "8.937603e-03 | \n", "0.007353 | \n", "0.0 | \n", "0.012958 | \n", "
3 | \n", "4 | \n", "0.059518 | \n", "0.143894 | \n", "0.000000 | \n", "0.004308 | \n", "0.000000 | \n", "0.000000 | \n", "0.038118 | \n", "0.000000 | \n", "0.000000 | \n", "... | \n", "0.000000 | \n", "0.001997 | \n", "0.002338 | \n", "0.000000 | \n", "0.128979 | \n", "0.007939 | \n", "4.593752e-03 | \n", "0.027757 | \n", "0.0 | \n", "0.009463 | \n", "
4 | \n", "5 | \n", "0.063011 | \n", "0.020749 | \n", "0.000000 | \n", "0.000000 | \n", "0.000000 | \n", "0.000000 | \n", "0.096147 | \n", "0.025652 | \n", "0.000000 | \n", "... | \n", "0.000000 | \n", "0.001226 | \n", "0.000522 | \n", "0.000000 | \n", "0.091597 | \n", "0.018277 | \n", "1.645441e-03 | \n", "0.053861 | \n", "0.0 | \n", "0.007585 | \n", "
... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "
291 | \n", "292 | \n", "0.000000 | \n", "0.000000 | \n", "0.000000 | \n", "0.005997 | \n", "0.002712 | \n", "0.040459 | \n", "0.000000 | \n", "0.299737 | \n", "0.005080 | \n", "... | \n", "0.000663 | \n", "0.003762 | \n", "0.001514 | \n", "0.020996 | \n", "0.001240 | \n", "0.340196 | \n", "2.026700e-03 | \n", "0.000701 | \n", "0.0 | \n", "0.004099 | \n", "
292 | \n", "293 | \n", "0.000000 | \n", "0.000000 | \n", "0.002737 | \n", "0.002752 | \n", "0.000000 | \n", "0.001784 | \n", "0.000000 | \n", "0.432059 | \n", "0.000000 | \n", "... | \n", "0.001120 | \n", "0.005702 | \n", "0.002651 | \n", "0.021545 | \n", "0.005116 | \n", "0.343234 | \n", "1.229256e-03 | \n", "0.003204 | \n", "0.0 | \n", "0.004120 | \n", "
293 | \n", "294 | \n", "0.015235 | \n", "0.000000 | \n", "0.000000 | \n", "0.013029 | \n", "0.007963 | \n", "0.033410 | \n", "0.000000 | \n", "0.296186 | \n", "0.000000 | \n", "... | \n", "0.001030 | \n", "0.006188 | \n", "0.004192 | \n", "0.026838 | \n", "0.026079 | \n", "0.276297 | \n", "2.742841e-04 | \n", "0.015608 | \n", "0.0 | \n", "0.002978 | \n", "
294 | \n", "295 | \n", "0.000000 | \n", "0.047675 | \n", "0.000000 | \n", "0.000000 | \n", "0.000000 | \n", "0.044810 | \n", "0.085804 | \n", "0.106584 | \n", "0.000000 | \n", "... | \n", "0.002009 | \n", "0.003330 | \n", "0.006341 | \n", "0.026441 | \n", "0.061476 | \n", "0.190100 | \n", "2.251461e-05 | \n", "0.033348 | \n", "0.0 | \n", "0.001352 | \n", "
295 | \n", "296 | \n", "0.000000 | \n", "0.001118 | \n", "0.006289 | \n", "0.000000 | \n", "0.014871 | \n", "0.000000 | \n", "0.090771 | \n", "0.171290 | \n", "0.000000 | \n", "... | \n", "0.004043 | \n", "0.000775 | \n", "0.009997 | \n", "0.015227 | \n", "0.083571 | \n", "0.160644 | \n", "6.798822e-07 | \n", "0.040140 | \n", "0.0 | \n", "0.000298 | \n", "
296 rows × 25 columns
\n", "