Economic texts have long been a valuable source for understanding the theories and ideas that have shaped our economic systems. In this project, I analyze four influential economic texts written by three of the most prominent economists of all time: John Maynard Keynes, Adam Smith, and David Ricardo. Specifically, I explore word frequencies and correlations, use TF-IDF analysis, and generate n-grams to identify patterns and insights within these texts.
I expect the works of Keynes, Smith, and Ricardo to reveal significant differences in their economic theories and ideologies. In particular, I expect their writings to contain distinct vocabulary and themes, which I will identify through the text-mining techniques mentioned above. Additionally, I anticipate that these analyses will uncover correlations between specific words and ideas that are unique to each author’s work. Through this project, I hope to deepen our understanding of the contributions these economists made to the field of economics.
Load the libraries used for this text analysis
library(tidyverse)
library(tidytext)
library(gutenbergr)
library(ggplot2)
library(scales)
library(igraph)
library(ggraph)
library(widyr)
# Set random seed for reproducibility
set.seed(12938)
Load the books from Project Gutenberg using their respective IDs
keynes <- gutenberg_download(c(65278, 15776))
smith <- gutenberg_download(3300)
ricardo <- gutenberg_download(33310)
Preparation for analysis: tokenize the books by word, filter out stop words, and count word frequencies. Stop words are removed with the stop_words list from tidytext, which combines three lexicons (SMART, snowball, and onix).
Add some extra stop words that appear in the texts:
stopwords2 <- tibble(word = c("d", "s", "th", "a", "1","st","I","|","l"))
Repeat the same process for each author
tidy_keynes <- keynes %>%
unnest_tokens(word, text) %>%
anti_join(stop_words, by = "word") %>%
anti_join(stopwords2, by = "word") %>%
count(word, sort = TRUE)
tidy_keynes
## # A tibble: 8,840 × 2
## word n
## <chr> <int>
## 1 germany 450
## 2 war 329
## 3 money 295
## 4 gold 281
## 5 german 260
## 6 exchange 242
## 7 currency 203
## 8 rate 199
## 9 cent 177
## 10 economic 176
## # … with 8,830 more rows
For Keynes, the top 10 most frequent words are “Germany”, “war”, “money”, “gold”, “German”, “exchange”, “currency”, “rate”, “cent”, and “economic”. This suggests that Keynes’ texts discuss Germany and war in an economic context, particularly in relation to money, gold, exchange rates, and currencies.
Plot
tidy_keynes %>%
filter(n > 150) %>%
mutate(word = reorder(word, n)) %>%
ggplot(aes(n, word)) +
geom_col() +
labs(y = NULL)
tidy_smith <- smith %>%
unnest_tokens(word, text) %>%
anti_join(stop_words, by = "word") %>%
anti_join(stopwords2, by = "word") %>%
count(word, sort = TRUE)
tidy_smith
## # A tibble: 9,712 × 2
## word n
## <chr> <int>
## 1 price 1264
## 2 country 1240
## 3 labour 1011
## 4 trade 970
## 5 produce 944
## 6 quantity 797
## 7 people 777
## 8 money 770
## 9 land 720
## 10 revenue 691
## # … with 9,702 more rows
For Adam Smith, the top 10 most frequent words are “price”, “country”, “labour”, “trade”, “produce”, “quantity”, “people”, “money”, “land”, and “revenue”. This suggests that Smith’s writing focuses on the economics of trade, labor, and production, with a particular emphasis on prices, quantity, and revenue. His writing also seems to touch upon the relationship between people and the economy, as well as the role of money and land in economic systems.
Plot
tidy_smith %>%
filter(n > 600) %>%
mutate(word = reorder(word, n)) %>%
ggplot(aes(n, word)) +
geom_col() +
labs(y = NULL)
tidy_ricardo <- ricardo %>%
unnest_tokens(word, text) %>%
anti_join(stop_words, by = "word") %>%
anti_join(stopwords2, by = "word") %>%
count(word, sort = TRUE)
tidy_ricardo
## # A tibble: 4,753 × 2
## word n
## <chr> <int>
## 1 price 1032
## 2 labour 629
## 3 produce 595
## 4 capital 593
## 5 corn 565
## 6 rent 545
## 7 quantity 527
## 8 commodities 514
## 9 money 507
## 10 profits 502
## # … with 4,743 more rows
For David Ricardo, the top 10 most frequent words are “price”, “labour”, “produce”, “capital”, “corn”, “rent”, “quantity”, “commodities”, “money”, and “profits”. This suggests that Ricardo’s writing focuses on the economics of production, particularly in relation to labor, capital, and commodities such as corn. His writing also seems to touch upon the role of prices, quantity, rent, money, and profits in economic systems.
Plot
tidy_ricardo %>%
filter(n > 350) %>%
mutate(word = reorder(word, n)) %>%
ggplot(aes(n, word)) +
geom_col() +
labs(y = NULL)
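I will now use the Pearson correlation coefficient to measure how similar the authors’ word frequencies are, comparing the proportion of each word in Keynes’ texts with its proportion in Smith’s and in Ricardo’s texts.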
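The frequency data frame passed to cor.test below is not defined in the code shown here. The following is a minimal sketch of how such a table could be built, assuming it holds each word’s share of an author’s total words, with Keynes in his own column and the other two authors stacked in an author column (the shape the cor.test calls require). The toy_* tibbles and their counts are invented for illustration only; in the real analysis they would be replaced by tidy_keynes, tidy_smith and tidy_ricardo.

```r
library(dplyr)
library(tidyr)

# Toy word counts standing in for tidy_keynes, tidy_smith and tidy_ricardo
# (invented numbers, not taken from the books)
toy_keynes  <- tibble(word = c("money", "price", "war"),  n = c(6L, 2L, 2L))
toy_smith   <- tibble(word = c("money", "price", "land"), n = c(2L, 5L, 3L))
toy_ricardo <- tibble(word = c("money", "price", "corn"), n = c(1L, 6L, 3L))

frequency <- bind_rows(
  mutate(toy_keynes,  author = "John Maynard Keynes"),
  mutate(toy_smith,   author = "Adam Smith"),
  mutate(toy_ricardo, author = "David Ricardo")
) %>%
  group_by(author) %>%
  mutate(proportion = n / sum(n)) %>%   # share of each word within an author
  ungroup() %>%
  select(-n) %>%
  # one column per author, one row per word (NA where an author lacks the word)
  pivot_wider(names_from = author, values_from = proportion) %>%
  # keep Keynes as a column and stack the other two authors
  pivot_longer(c(`Adam Smith`, `David Ricardo`),
               names_to = "author", values_to = "proportion")

frequency
```

Words that an author never uses end up as NA, and cor.test drops those rows, so each test only compares words shared by both authors.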
John Maynard Keynes and Adam Smith
cor.test(data = frequency[frequency$author == "Adam Smith",],
~ proportion + `John Maynard Keynes`)
##
## Pearson's product-moment correlation
##
## data: proportion and John Maynard Keynes
## t = 7.6852, df = 3799, p-value = 1.932e-14
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.0922994 0.1549119
## sample estimates:
## cor
## 0.1237288
John Maynard Keynes and David Ricardo
cor.test(data = frequency[frequency$author == "David Ricardo",],
~ proportion + `John Maynard Keynes`)
##
## Pearson's product-moment correlation
##
## data: proportion and John Maynard Keynes
## t = 9.1589, df = 2446, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.1435114 0.2201221
## sample estimates:
## cor
## 0.1820931
These correlation coefficients suggest a weak positive correlation between Keynes and Ricardo (r ≈ 0.18), and an even weaker one between Keynes and Smith (r ≈ 0.12). This is not surprising: these authors had different perspectives on economics, and the particular texts chosen for the analysis also affect how similar their vocabularies appear.
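The t statistics reported by cor.test follow from the correlation and the degrees of freedom through t = r · sqrt(df) / sqrt(1 − r²). As a quick arithmetic check, the sketch below recomputes them from the values printed above; t_from_r is a hypothetical helper name of my own, not a function from any package.

```r
# Recompute the t statistic of a Pearson correlation test from r and df.
# t_from_r is a hypothetical helper, not part of any package.
t_from_r <- function(r, df) r * sqrt(df) / sqrt(1 - r^2)

t_smith   <- t_from_r(0.1237288, 3799)  # Keynes vs. Smith
t_ricardo <- t_from_r(0.1820931, 2446)  # Keynes vs. Ricardo

round(c(smith = t_smith, ricardo = t_ricardo), 3)
```

Both values agree with the t statistics in the output above. This also shows why such small correlations are still highly significant: with thousands of shared words, even a weak r produces a large t.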
In the next section I conduct a TF-IDF analysis to look at the words that make each author’s texts distinctive.
I joined the authors’ data frames into one global data frame with all the books, adding a new column to differentiate between authors.
tidy_ricardo = tidy_ricardo %>%
mutate(
author = "David Ricardo"
)
tidy_smith = tidy_smith %>%
mutate(
author = "Adam Smith"
)
tidy_keynes = tidy_keynes %>%
mutate(
author = "John Maynard Keynes"
)
economists = tidy_keynes %>%
rbind(tidy_ricardo, tidy_smith)
I created a column with the total number of words for each author, then added a term_frequency column equal to n/total, and finally computed the TF-IDF for each author with the bind_tf_idf function from the tidytext package.
total_economists <- economists %>%
group_by(author) %>%
summarize(total = sum(n))
economists_words <- economists %>%
left_join(total_economists, by = "author") %>%
mutate(term_frequency = n/total)
economists_tf_idf <- economists_words %>%
bind_tf_idf(word, author, n)%>%
select(-total) %>%
arrange(desc(tf_idf))
economists_tf_idf
## # A tibble: 23,305 × 7
## word n author term_frequency tf idf tf_idf
## <chr> <int> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 corn 565 David Ricardo 0.0130 0.0130 0.405 0.00528
## 2 economic 176 John Maynard Keynes 0.00349 0.00349 1.10 0.00384
## 3 1919 97 John Maynard Keynes 0.00193 0.00193 1.10 0.00212
## 4 german 260 John Maynard Keynes 0.00516 0.00516 0.405 0.00209
## 5 germany's 95 John Maynard Keynes 0.00189 0.00189 1.10 0.00207
## 6 4_l 73 David Ricardo 0.00168 0.00168 1.10 0.00185
## 7 100_l 69 David Ricardo 0.00159 0.00159 1.10 0.00175
## 8 1922 72 John Maynard Keynes 0.00143 0.00143 1.10 0.00157
## 9 inflation 66 John Maynard Keynes 0.00131 0.00131 1.10 0.00144
## 10 1923 64 John Maynard Keynes 0.00127 0.00127 1.10 0.00140
## # … with 23,295 more rows
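The idf column above takes only a few distinct values because there are just three “documents” (one per author). A small sketch of the arithmetic, assuming the natural-log definition that bind_tf_idf uses, idf = ln(number of documents / number of documents containing the term):

```r
# idf as computed by tidytext::bind_tf_idf: natural log of
# (number of documents / number of documents containing the term).
n_documents    <- 3                       # one "document" per author
docs_with_term <- c(one = 1, two = 2, all = 3)

idf <- log(n_documents / docs_with_term)
round(idf, 3)
```

An idf of about 1.10 therefore marks a word used by a single author, and about 0.405 a word shared by two authors (as with “corn” and “german”). Words used by all three authors get an idf of 0 and so a TF-IDF of 0, however frequent they are.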
economists_tf_idf %>%
group_by(author) %>%
arrange(desc(tf_idf)) %>%
slice_head(n = 10) %>%
ggplot(aes(reorder(word, tf_idf), tf_idf, fill = author)) +
geom_col(show.legend = FALSE) +
labs(x = NULL, y = "TF-IDF") +
coord_flip() +
facet_wrap(~author, ncol = 2, scales = "free")
Many inconvenient tokens (years, amounts such as “4_l”, and other numeric debris) make the analysis a bit clunky, so in the next part I filter them out and repeat the analysis without these words.
economists_stopwords <- tibble(word = c("d", "s", "th", "a", "1","st","I","8vo", "8_s","4_I" ,"1000_I","100_I","1923","1922","1921","1919","1913","1920","4_l","100_l","1000_l","10_s","qrs","720_l","3_l","50_l","10,000_l","6_d","2000_l","1914","1918","pre","_r_","_k_","_n_","_k","2","0","8","4","6", "5", "4", "vol", "180", "170", "ii", "10"))
total_economists <- economists %>%
group_by(author) %>%
summarize(total = sum(n))
economists_words <- economists %>%
left_join(total_economists, by = "author") %>%
mutate(term_frequency = n/total)
economists_tf_idf <- economists_words %>%
bind_tf_idf(word, author, n)%>%
select(-total) %>%
anti_join(economists_stopwords, by = "word") %>%
arrange(desc(tf_idf))
economists_tf_idf %>%
anti_join(economists_stopwords, by = "word") %>%
group_by(author) %>%
arrange(desc(tf_idf)) %>%
slice_head(n = 15) %>%
ggplot(aes(reorder(word, tf_idf), tf_idf, fill = author)) +
geom_col(show.legend = FALSE) +
labs(x = NULL, y = "TF-IDF") +
coord_flip() +
facet_wrap(~author, ncol = 2, scales = "free")
Based on the TF-IDF analysis we can make some general observations about the topics and themes present in the texts by Smith, Keynes, and Ricardo.
For Adam Smith, the top keywords are related to land, agriculture, and government policies related to production and trade. This is in line with Smith’s focus on the importance of free markets and competition in driving economic growth.
For Keynes, the top keywords are related to the economic and political issues surrounding World War I and its aftermath, particularly in relation to Germany, inflation, and international relations. This reflects Keynes’ view that government intervention in the economy can be necessary to address economic crises.
For Ricardo, the top keywords are related to land, agriculture, and the relationships between landlords, laborers, and consumers. This is consistent with Ricardo’s focus on the role of rent, wages, and profits in economic systems.
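In this next section I carry out an n-gram analysis. First, two functions are defined for working with bigrams (pairs of consecutive words).
The count_bigrams function:
Splits the text into bigrams, separates each bigram into its two words, drops every bigram in which either word is a stop word, and counts the remaining pairs.
The visualize_bigrams function:
Takes the output from the count_bigrams function as input and creates a graph to visualize the relationships between the bigrams.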
count_bigrams <- function(dataset) {
dataset %>%
unnest_tokens(bigram, text, token = "ngrams", n = 2) %>%
separate(bigram, c("word1", "word2"), sep = " ") %>%
filter(!word1 %in% stop_words$word,
!word2 %in% stop_words$word) %>%
count(word1, word2, sort = TRUE)
}
visualize_bigrams <- function(bigrams) {
set.seed(2016)
a <- grid::arrow(type = "closed", length = unit(.15, "inches"))
bigrams %>%
graph_from_data_frame() %>%
ggraph(layout = "fr") +
geom_edge_link(aes(edge_alpha = n), show.legend = FALSE, arrow = a) +
geom_node_point(color = "lightblue", size = 5) +
geom_node_text(aes(label = name), vjust = 1, hjust = 1) +
theme_void()
}
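To see what the counting step produces, here is a small sketch that runs the same pipeline as count_bigrams on two made-up lines of text. The toy_text tibble is invented for illustration and is not part of the analysis.

```r
library(dplyr)
library(tidyr)
library(tidytext)

# Two made-up lines of text standing in for a downloaded book
toy_text <- tibble(text = c("the purchasing power of money",
                            "the purchasing power parity and the price level"))

toy_bigrams <- toy_text %>%
  # split each line into overlapping pairs of consecutive words
  unnest_tokens(bigram, text, token = "ngrams", n = 2) %>%
  separate(bigram, c("word1", "word2"), sep = " ") %>%
  # keep only the pairs in which neither word is a stop word
  filter(!word1 %in% stop_words$word,
         !word2 %in% stop_words$word) %>%
  count(word1, word2, sort = TRUE)

toy_bigrams
```

On this toy input only three bigrams survive the stop-word filter (“purchasing power”, “power parity” and “price level”), and “purchasing power” comes first because it occurs in both lines.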
Keynes bigrams
keynes_bigrams <- keynes %>%
count_bigrams() %>%
drop_na()
keynes_bigrams
## # A tibble: 10,970 × 3
## word1 word2 n
## <chr> <chr> <int>
## 1 purchasing power 91
## 2 pre war 82
## 3 reparation commission 46
## 4 price level 44
## 5 power parity 41
## 6 bank rate 33
## 7 note issue 29
## 8 gold standard 28
## 9 german government 27
## 10 united kingdom 25
## # … with 10,960 more rows
Based on these results, we can see that Keynes’ writings contain a number of bigrams related to economic concepts and policy, including “purchasing power,” “reparation commission,” “price level,” and “power parity.” The presence of bigrams tied to the pre-war and post-war periods, such as “pre war” and “reparation commission,” suggests that Keynes was writing in the context of a changing economic and political landscape. Additionally, the presence of “gold standard” and “bank rate” suggests that Keynes was concerned with monetary policy and exchange rates.
Smith bigrams
smith_bigrams <- smith %>%
count_bigrams() %>%
drop_na()
smith_bigrams
## # A tibble: 14,144 × 3
## word1 word2 n
## <chr> <chr> <int>
## 1 annual produce 149
## 2 foreign trade 102
## 3 money price 89
## 4 home market 85
## 5 rude produce 75
## 6 0 0 65
## 7 productive labour 60
## 8 surplus produce 60
## 9 thousand pounds 60
## 10 east indies 56
## # … with 14,134 more rows
These results indicate that Adam Smith’s writings frequently reference economic concepts related to trade, production, and labor. The most frequent bigram in Smith’s writings is “annual produce,” which may refer to the total output of goods and services in an economy. Other frequent bigrams, such as “foreign trade” and “home market,” suggest that Smith was concerned with international trade and the domestic market. Bigrams related to money and prices, such as “money price” and “thousand pounds,” suggest that Smith was also interested in monetary policy and exchange rates. The mention of the “East Indies” may also indicate a focus on trade with Asia.
Ricardo Bigrams
ricardo_bigrams <- ricardo %>%
count_bigrams() %>%
drop_na()
ricardo_bigrams
## # A tibble: 5,584 × 3
## word1 word2 n
## <chr> <chr> <int>
## 1 raw produce 152
## 2 natural price 98
## 3 adam smith 97
## 4 market price 52
## 5 money price 37
## 6 precious metals 36
## 7 foreign trade 33
## 8 capital employed 31
## 9 fixed capital 27
## 10 dr smith 26
## # … with 5,574 more rows
These results suggest that David Ricardo’s writings frequently reference economic concepts related to production, prices, and trade. The most frequent bigram in Ricardo’s writings is “raw produce,” which may refer to the output of natural resources and agricultural goods. The presence of bigrams related to prices, such as “natural price” and “market price,” suggests that Ricardo was interested in theories of value and pricing. The mention of Adam Smith, one of Ricardo’s predecessors and influences, also suggests that Ricardo’s writings are in dialogue with the economic ideas of his time. Finally, the mention of “precious metals” and “foreign trade” suggests that Ricardo was interested in international trade and monetary policy. The presence of bigrams related to capital, such as “capital employed” and “fixed capital,” suggests that Ricardo also considered issues related to investment and capital accumulation.
This plot comes from the visualize_bigrams function defined before. It shows the words as nodes in the graph, with arrows (edges) running from the first word of each bigram to the second. The opacity of each edge indicates the frequency of the bigram: darker edges correspond to more frequent pairs.
Keynes
keynes_bigrams %>%
filter(n > 10,
!str_detect(word1, "\\d"),
!str_detect(word2, "\\d")) %>%
visualize_bigrams()
Here the economic concepts and institutions, and how they relate to each other, can be seen more clearly, for example “federal reserve board” or “purchasing power parity”. Political institutions of Germany and the USA appear clearly, mixed with economic terms and theories.
Smith
smith_bigrams %>%
filter(n > 30,
!str_detect(word1, "\\d"),
!str_detect(word2, "\\d")) %>%
visualize_bigrams()
Here the ideas cluster into groups: for example, the term “trade” appears many times with “carrying”, “colony” and “foreign”. “Money” and “produce” can also be read as central nodes, which is in line with what I have noted about Adam Smith’s ideas on international trade, money and production.
Ricardo
ricardo_bigrams %>%
filter(n > 10,
!str_detect(word1, "\\d"),
!str_detect(word2, "\\d")) %>%
visualize_bigrams()
For Ricardo, “price” is the most connected word, which is in line with what I have noted about his theories of value and pricing. The mentions of Adam Smith in his text are also visible, and the idea of value can be read in the words “capital” and “produce”.
Next, I created a data frame with a new column that records the 10-line section of the text in which each word appears. This serves two purposes: first, to count how many times a word appears together with another word, and then to compute their correlation (taking into account when they do and do not appear together).
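The sections come from integer division of the row number by 10. A short base R sketch of what that does to the first 25 lines of a text (the line numbers here are just an illustration):

```r
# Integer division groups consecutive line numbers into blocks of ten.
line_number <- 1:25
section <- line_number %/% 10

table(section)
```

Lines 1 to 9 fall into section 0, lines 10 to 19 into section 1, and so on. The filter(section > 0) step in the code below therefore discards the first nine lines of each download.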
Keynes
keynes_section_words <- keynes %>%
mutate(section = row_number() %/% 10) %>%
filter(section > 0) %>%
unnest_tokens(word, text) %>%
filter(!word %in% stop_words$word) %>%
filter(!word %in% stopwords2$word) %>%
anti_join(economists_stopwords, by = "word")
keynes_section_words
## # A tibble: 49,190 × 3
## gutenberg_id section word
## <int> <dbl> <chr>
## 1 15776 1 preface
## 2 15776 2 writer
## 3 15776 2 book
## 4 15776 2 temporarily
## 5 15776 2 attached
## 6 15776 2 british
## 7 15776 2 treasury
## 8 15776 2 war
## 9 15776 2 official
## 10 15776 2 representative
## # … with 49,180 more rows
The pairwise_count function from widyr counts the number of sections in which a word and another word appear together.
Keynes Count
keynes_word_pairs <- keynes_section_words %>%
pairwise_count(word, section, sort = TRUE)
keynes_word_pairs
## # A tibble: 1,184,544 × 3
## item1 item2 n
## <chr> <chr> <dbl>
## 1 germany war 89
## 2 war germany 89
## 3 germany german 82
## 4 german germany 82
## 5 allies germany 75
## 6 germany allies 75
## 7 purchasing power 74
## 8 power purchasing 74
## 9 germany treaty 66
## 10 treaty germany 66
## # … with 1,184,534 more rows
The most frequent pairs in Keynes’ text centre on the aftermath of World War I and the Treaty of Versailles, with words like “Germany,” “war,” “allies,” “reparation,” “treaty,” and “France” appearing frequently. There are also mentions of economic concepts like “purchasing power” and “currency exchange.”
Smith
smith_section_words <- smith %>%
mutate(section = row_number() %/% 10) %>%
filter(section > 0) %>%
unnest_tokens(word, text) %>%
filter(!word %in% stop_words$word) %>%
filter(!word %in% stopwords2$word) %>%
anti_join(economists_stopwords, by = "word")
smith_section_words
## # A tibble: 129,159 × 3
## gutenberg_id section word
## <int> <dbl> <chr>
## 1 3300 1 contents
## 2 3300 1 introduction
## 3 3300 1 plan
## 4 3300 1 book
## 5 3300 1 improvement
## 6 3300 1 productive
## 7 3300 1 powers
## 8 3300 1 labour
## 9 3300 1 produce
## 10 3300 1 naturally
## # … with 129,149 more rows
Smith Count
smith_word_pairs <- smith_section_words %>%
pairwise_count(word, section, sort = TRUE)
smith_word_pairs
## # A tibble: 1,982,406 × 3
## item1 item2 n
## <chr> <chr> <dbl>
## 1 silver gold 245
## 2 gold silver 245
## 3 land produce 230
## 4 produce land 230
## 5 country produce 218
## 6 produce country 218
## 7 produce labour 216
## 8 labour produce 216
## 9 price market 200
## 10 market price 200
## # … with 1,982,396 more rows
For Smith, we see a focus on the importance of land and the production of goods in a country, as well as the relationship between market price and the quantity of labor and trade. This is consistent with Smith’s emphasis on the role of markets in driving economic growth and the importance of factors of production like land and labor.
Ricardo
ricardo_section_words <- ricardo %>%
mutate(section = row_number() %/% 10) %>%
filter(section > 0) %>%
unnest_tokens(word, text) %>%
filter(!word %in% stop_words$word) %>%
filter(!word %in% stopwords2$word) %>%
anti_join(economists_stopwords, by = "word")
ricardo_section_words
## # A tibble: 42,502 × 3
## gutenberg_id section word
## <int> <dbl> <chr>
## 1 33310 1 principles
## 2 33310 1 political
## 3 33310 1 economy
## 4 33310 2 taxation
## 5 33310 2 david
## 6 33310 2 ricardo
## 7 33310 2 esq
## 8 33310 2 london
## 9 33310 2 john
## 10 33310 2 murray
## # … with 42,492 more rows
Ricardo Count
ricardo_word_pairs <- ricardo_section_words %>%
pairwise_count(word, section, sort = TRUE)
ricardo_word_pairs
## # A tibble: 590,612 × 3
## item1 item2 n
## <chr> <chr> <dbl>
## 1 price corn 204
## 2 corn price 204
## 3 price produce 186
## 4 produce price 186
## 5 rise price 184
## 6 price rise 184
## 7 price commodities 181
## 8 commodities price 181
## 9 price labour 173
## 10 labour price 173
## # … with 590,602 more rows
For Ricardo, every one of the top pairs involves “price”, which co-occurs most often with “corn”, “produce”, “rise”, “commodities”, and “labour”. This is consistent with Ricardo’s emphasis on value and on how the prices of corn, commodities and labour rise and fall together.
In general, the bigram and co-occurrence analyses allow us to identify specific words and phrases that are frequently used by each economist, giving us a better understanding of their areas of focus and their key ideas.
Raw counts favour words that are simply common. A better measure is correlation, which indicates how often two words appear together relative to how often they appear separately.
To evaluate this correlation, I used the phi coefficient, which is equivalent to the Pearson correlation applied to binary data. The phi coefficient measures how much more likely it is that either both words appear in a section or neither does than that one appears without the other.
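As a worked illustration, the phi coefficient can be computed by hand from the four cells of a 2×2 presence/absence table, φ = (n11·n00 − n10·n01) / sqrt(n1.·n0.·n.1·n.0). The sketch below uses two hypothetical words tracked over ten sections (the vectors are invented) and checks the result against the ordinary Pearson correlation of the same 0/1 vectors.

```r
# Presence (1) or absence (0) of two hypothetical words in ten sections
word_x <- c(1, 1, 1, 1, 0, 0, 0, 0, 0, 0)
word_y <- c(1, 1, 1, 0, 1, 0, 0, 0, 0, 0)

n11 <- sum(word_x == 1 & word_y == 1)  # sections containing both words
n10 <- sum(word_x == 1 & word_y == 0)  # only word_x
n01 <- sum(word_x == 0 & word_y == 1)  # only word_y
n00 <- sum(word_x == 0 & word_y == 0)  # neither word

# Numerator: agreement minus disagreement; denominator: product of the margins
phi <- (n11 * n00 - n10 * n01) /
  sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))

c(phi = phi, pearson = cor(word_x, word_y))
```

Here n11 plays the role of the co-occurrence count from the previous step, while phi also takes into account the sections in which only one word or neither word appears. Both numbers come out as 14/24 ≈ 0.583.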
Keynes
keynes_word_cors <- keynes_section_words %>%
group_by(word) %>%
filter(n() >= 20) %>%
pairwise_cor(word, section, sort = TRUE)
keynes_word_cors
## # A tibble: 242,556 × 3
## item1 item2 correlation
## <chr> <chr> <dbl>
## 1 silesia upper 0.953
## 2 upper silesia 0.953
## 3 board federal 0.898
## 4 federal board 0.898
## 5 lorraine alsace 0.884
## 6 alsace lorraine 0.884
## 7 minister prime 0.796
## 8 prime minister 0.796
## 9 nineteenth century 0.771
## 10 century nineteenth 0.771
## # … with 242,546 more rows
The pair “upper silesia” has the highest correlation coefficient (0.9527372), meaning that these two terms almost always appear in the same sections. Other pairs with high correlation coefficients include “federal board” (0.8984791), “alsace lorraine” (0.8840521), and “prime minister” (0.7959321). These pairs of words might represent specific concepts or events that were frequently discussed in the text. Some economic terms like “purchasing power” and geopolitical issues like “austria hungary” can also be observed.
Smith
smith_word_cors <- smith_section_words %>%
group_by(word) %>%
filter(n() >= 20) %>%
pairwise_cor(word, section, sort = TRUE)
smith_word_cors
## # A tibble: 1,626,900 × 3
## item1 item2 correlation
## <chr> <chr> <dbl>
## 1 butcher’s meat 0.920
## 2 meat butcher’s 0.920
## 3 forts garrisons 0.832
## 4 garrisons forts 0.832
## 5 answering demands 0.722
## 6 demands answering 0.722
## 7 silver gold 0.700
## 8 gold silver 0.700
## 9 barrel herrings 0.686
## 10 herrings barrel 0.686
## # … with 1,626,890 more rows
For Smith, the pairs with the highest correlation are mostly tied to specific goods and fixed expressions, such as “butcher’s meat”, “forts garrisons”, and “barrel herrings”. Other pairs include “creditor debtor” and “receipts receipt”, which are related to financial concepts.
Ricardo
ricardo_word_cors <- ricardo_section_words %>%
group_by(word) %>%
filter(n() >= 20) %>%
pairwise_cor(word, section, sort = TRUE)
ricardo_word_cors
## # A tibble: 142,506 × 3
## item1 item2 correlation
## <chr> <chr> <dbl>
## 1 adam smith 0.820
## 2 smith adam 0.820
## 3 precious metals 0.768
## 4 metals precious 0.768
## 5 fish game 0.689
## 6 game fish 0.689
## 7 net gross 0.567
## 8 gross net 0.567
## 9 fixed circulating 0.548
## 10 circulating fixed 0.548
## # … with 142,496 more rows
The correlations found in the analysis of Ricardo’s texts suggest that he frequently discussed topics related to economics and commerce, as well as the work of Adam Smith. Additionally, Ricardo seems to have shown interest in precious metals, fish and game, as well as concepts such as net and gross, fixed and circulating capital, and maintenance funds. Finally, it appears that Ricardo also had a particular focus on Portugal, discussing topics such as wine and cloth in relation to the country.
To illustrate the analysis, I filtered for a few relevant words and plotted their strongest correlations.
Keynes
keynes_word_cors %>%
filter(item1 %in% c("federal", "power", "nations", "materials")) %>%
group_by(item1) %>%
slice_max(correlation, n = 6) %>%
ungroup() %>%
mutate(item2 = reorder(item2, correlation)) %>%
ggplot(aes(item2, correlation, fill=item1)) +
geom_bar(stat = "identity") +
facet_wrap(~ item1, scales = "free") +
coord_flip()
This graph shows the words most correlated with each selected word. The words correlated with “materials” probably come from examples that Keynes gives in his book. “Nations” and “federal” both relate to political topics, while “power” relates to more general economic terms.
Smith
smith_word_cors %>%
filter(item1 %in% c("wheat", "creditor", "coinage", "hand")) %>%
group_by(item1) %>%
slice_max(correlation, n = 6) %>%
ungroup() %>%
mutate(item2 = reorder(item2, correlation)) %>%
ggplot(aes(item2, correlation, fill=item1)) +
geom_bar(stat = "identity") +
facet_wrap(~ item1, scales = "free") +
coord_flip()
In this graph, two of the words relate to economic terms, “coinage” and “creditor”, while “hand” relates more to characteristics of the labor force and human work. Finally, “wheat” probably appears often as an example of a product, and it is related to money, prices and quantities.
Ricardo
ricardo_word_cors %>%
filter(item1 %in% c("adam", "land", "foreign", "bank")) %>%
group_by(item1) %>%
slice_max(correlation, n = 6) %>%
ungroup() %>%
mutate(item2 = reorder(item2, correlation)) %>%
ggplot(aes(item2, correlation, fill=item1)) +
geom_bar(stat = "identity") +
facet_wrap(~ item1, scales = "free") +
coord_flip()
In this final graph I included “adam” so that the words associated with Adam Smith can be observed. “Land” is also an important word in Ricardo’s writing, and the graph shows how it is correlated with agricultural production. “Foreign” is correlated with a mix of economic and political terms. Finally, “bank” is correlated with types of currencies and values.
For this last part, in the same way that I employed ggraph to represent bigrams visually, I used it to portray the connections and groups of words.
Keynes
keynes_word_cors %>%
filter(correlation > 0.40) %>%
graph_from_data_frame() %>%
ggraph(layout = "fr") +
geom_edge_link(aes(edge_alpha = correlation), show.legend = FALSE) +
geom_node_point(color = "lightblue", size = 5) +
geom_node_text(aes(label = name), repel = TRUE) +
theme_void()
This graph shows the highly correlated words, which form groups of months, political terms, economic terms, currencies and values, and international issues.
Smith
smith_word_cors %>%
filter(correlation > 0.45) %>%
graph_from_data_frame() %>%
ggraph(layout = "fr") +
geom_edge_link(aes(edge_alpha = correlation), show.legend = FALSE) +
geom_node_point(color = "lightblue", size = 5) +
geom_node_text(aes(label = name), repel = TRUE) +
theme_void()
For Smith, the graph shows a group of words related to fish, along with financial terms, raw materials like precious metals, territories like India and Asia, and other economic terms like demands, seignorage, selling, buying, stationary, etc.
Ricardo
ricardo_word_cors %>%
filter(correlation > 0.30) %>%
graph_from_data_frame() %>%
ggraph(layout = "fr") +
geom_edge_link(aes(edge_alpha = correlation), show.legend = FALSE) +
geom_node_point(color = "lightblue", size = 5) +
geom_node_text(aes(label = name), repel = TRUE) +
theme_void()
For Ricardo, a lower correlation threshold (0.30) was used, which may be due to the greater variety of words and concepts in his text. A large group of words forms around “coin”, connected to words related to currencies and precious metals. A labor economics cluster is also visible, with words like wages, laborers and rise. Places and raw materials can also be seen, and finally there is an Adam Smith group.
For all three economists, the most frequently occurring words are related to economic concepts such as “price”, “produce”, “trade”, “labour”, “currency”, “capital”, and “market”. In terms of pairs of words, the most frequent pairs for Keynes include “Germany war”, “purchasing power”, and “currency exchange”, while for Smith the most frequent pairs are “silver gold”, “land produce”, and “country produce”. For Ricardo, the most frequent pairs are “price corn”, “price produce”, and “rise price”.
Looking at correlations between pairs of words, Keynes has high correlations for pairs such as “silesia upper”, “purchasing power”, and “league nations”, while for Smith the highest correlations are for pairs such as “butcher’s meat”, “forts garrisons”, and “silver gold”. For Ricardo, high correlations were found for pairs such as “Adam Smith”, “precious metals”, and “fish game”. Although all three economists were writing about economics, their frequently occurring words, pairs of words, and correlations differ. This suggests that they had different areas of focus and interest within the field of economics.