Introduction

Economic texts have long been a valuable source for understanding the theories and ideas that have shaped our economic systems. In this project, I analyze four influential economic texts written by three of the most prominent economists of all time: John Maynard Keynes, Adam Smith, and David Ricardo. Specifically, I explore word frequencies and correlations, apply TF-IDF analysis, and generate n-grams to identify patterns and insights within these texts.

Hypothesis

I expect that the works of Keynes, Smith, and Ricardo will reveal significant differences in their economic theories and ideologies. Their writings should contain distinct vocabulary and themes, which I will identify through the text mining techniques mentioned above. Additionally, I anticipate that these analyses will uncover correlations between specific words and ideas that are unique to each author’s work. Through this project, I hope to deepen our understanding of the contributions these economists made to the field of economics.

Books and Gutenberg Identifier

  • A Tract on Monetary Reform by John Maynard Keynes (ID 65278)
  • The Economic Consequences of the Peace by John Maynard Keynes (ID 15776)
  • An Inquiry into the Nature and Causes of the Wealth of Nations by Adam Smith (ID 3300)
  • On The Principles of Political Economy, and Taxation by David Ricardo (ID 33310)

Loading Libraries and Initial Steps

Load libraries that were used for this text analysis

library(tidyverse)
library(tidytext)
library(gutenbergr)
library(ggplot2)
library(scales)
library(igraph)
library(ggraph)
library(widyr)

# Set random seed for reproducibility
set.seed(12938)

Load the books from Project Gutenberg using their respective IDs.

keynes <- gutenberg_download(c(65278, 15776))
smith <- gutenberg_download(3300)
ricardo <- gutenberg_download(33310)

Word Frequencies

Preparation for analysis: tokenize the books by word, filter out stop words, and count word frequencies. The stop words come from the stop_words dataset in tidytext, which combines three lexicons (SMART, snowball, and onix).

Add some stopwords that appear in the texts:

stopwords2 <- tibble(word = c("d", "s", "th", "a", "1","st","I","|","l"))

Do the same process for each author

Keynes

tidy_keynes <- keynes %>%
  unnest_tokens(word, text) %>%
  anti_join(stop_words) %>%
  anti_join(stopwords2)%>%
  count(word, sort = TRUE) 

tidy_keynes
## # A tibble: 8,840 × 2
##    word         n
##    <chr>    <int>
##  1 germany    450
##  2 war        329
##  3 money      295
##  4 gold       281
##  5 german     260
##  6 exchange   242
##  7 currency   203
##  8 rate       199
##  9 cent       177
## 10 economic   176
## # … with 8,830 more rows

For Keynes, the top 10 most frequent words are “Germany”, “war”, “money”, “gold”, “German”, “exchange”, “currency”, “rate”, “cent”, and “economic”. This suggests that Keynes’s texts discuss Germany and the war in an economic context, particularly in relation to money, gold, exchange rates, and currencies.

Plot

tidy_keynes %>%
  filter(n > 150) %>%
  mutate(word = reorder(word, n)) %>%
  ggplot(aes(n, word)) +
  geom_col() +
  labs(y = NULL)

Smith

tidy_smith <- smith %>%
  unnest_tokens(word, text) %>%
  anti_join(stop_words) %>%
  anti_join(stopwords2)  %>%
  count(word, sort = TRUE)

tidy_smith
## # A tibble: 9,712 × 2
##    word         n
##    <chr>    <int>
##  1 price     1264
##  2 country   1240
##  3 labour    1011
##  4 trade      970
##  5 produce    944
##  6 quantity   797
##  7 people     777
##  8 money      770
##  9 land       720
## 10 revenue    691
## # … with 9,702 more rows

For Adam Smith, the top 10 most frequent words are “price”, “country”, “labour”, “trade”, “produce”, “quantity”, “people”, “money”, “land”, and “revenue”. This suggests that Smith’s writing focuses on the economics of trade, labor, and production, with a particular emphasis on prices, quantity, and revenue. Additionally, his writing also seems to touch upon the relationship between people and the economy, as well as the role of money and land in economic systems.

Plot

tidy_smith %>%
  filter(n > 600) %>%
  mutate(word = reorder(word, n)) %>%
  ggplot(aes(n, word)) +
  geom_col() +
  labs(y = NULL)

Ricardo

tidy_ricardo <- ricardo %>%
  unnest_tokens(word, text) %>%
  anti_join(stop_words) %>%
  anti_join(stopwords2)  %>%
  count(word, sort = TRUE)

tidy_ricardo
## # A tibble: 4,753 × 2
##    word            n
##    <chr>       <int>
##  1 price        1032
##  2 labour        629
##  3 produce       595
##  4 capital       593
##  5 corn          565
##  6 rent          545
##  7 quantity      527
##  8 commodities   514
##  9 money         507
## 10 profits       502
## # … with 4,743 more rows

For David Ricardo, the top 10 most frequent words are “price”, “labour”, “produce”, “capital”, “corn”, “rent”, “quantity”, “commodities”, “money”, and “profits”. This suggests that Ricardo’s writing focuses on the economics of production, particularly in relation to labor, capital, and commodities such as corn. Additionally, his writing also seems to touch upon the role of prices, quantity, rent, money, and profits in economic systems.

Plot

tidy_ricardo %>%
  filter(n > 350) %>%
  mutate(word = reorder(word, n)) %>%
  ggplot(aes(n, word)) +
  geom_col() +
  labs(y = NULL)

Crossing the authors

I will now compare the books by Keynes, Smith, and Ricardo by crossing their word frequencies and putting them in a single data frame called frequency. John Maynard Keynes is used as the reference author.

frequency <- bind_rows(mutate(tidy_ricardo, author = "David Ricardo"),
                       mutate(tidy_smith, author = "Adam Smith"), 
                       mutate(tidy_keynes, author = "John Maynard Keynes")) %>% 
  mutate(word = str_extract(word, "[a-z]+")) %>%
  count(author, word) %>%
  group_by(author) %>%
  mutate(proportion = n / sum(n)) %>% 
  select(-n) %>% 
  pivot_wider(names_from = author, values_from = proportion) %>%
  pivot_longer(`David Ricardo`:`Adam Smith`,
               names_to = "author", values_to = "proportion") %>%
  anti_join(stopwords2) %>%
  anti_join(stop_words) %>%
  drop_na()

frequency
## # A tibble: 6,249 × 4
##    word       `John Maynard Keynes` author        proportion
##    <chr>                      <dbl> <chr>              <dbl>
##  1 abandon                 0.000113 David Ricardo   0.000210
##  2 abandon                 0.000113 Adam Smith      0.000103
##  3 abandoned               0.000113 David Ricardo   0.000210
##  4 abandoned               0.000113 Adam Smith      0.000103
##  5 abandoning              0.000113 Adam Smith      0.000103
##  6 abated                  0.000113 Adam Smith      0.000103
##  7 abatement               0.000113 David Ricardo   0.000210
##  8 abatement               0.000113 Adam Smith      0.000103
##  9 ability                 0.000113 David Ricardo   0.000210
## 10 ability                 0.000113 Adam Smith      0.000103
## # … with 6,239 more rows

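The frequency table holds, for each word, the proportion attributed to Keynes next to the proportion for Ricardo or Smith, so the authors can be compared directly. Note that the tidy_* inputs are already word counts, so count(author, word) tallies rows rather than summing n: each proportion is roughly one over the number of distinct words for that author (about 1/8,840 ≈ 0.000113 for Keynes), which is why the values above barely vary. Passing wt = n to count() would give true relative frequencies.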
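The difference between counting rows and summing existing counts can be seen on a toy table. This is a minimal sketch: the authors "A" and "B", the words, and the counts are made-up values, not data from the books.

```r
library(dplyr)

# Hypothetical pre-counted table, shaped like tidy_keynes or tidy_smith
counts <- tibble(
  author = c("A", "A", "B", "B"),
  word   = c("gold", "war", "gold", "corn"),
  n      = c(30L, 10L, 5L, 15L)
)

# Unweighted: every (author, word) row counts once,
# so all proportions within an author are identical
unweighted <- counts %>%
  count(author, word) %>%
  group_by(author) %>%
  mutate(proportion = n / sum(n)) %>%
  ungroup()

# Weighted: wt = n sums the existing counts,
# giving the share of each word in the author's text
weighted <- counts %>%
  count(author, word, wt = n) %>%
  group_by(author) %>%
  mutate(proportion = n / sum(n)) %>%
  ungroup()

unweighted
weighted
```

In the unweighted version every word of author "A" gets the same proportion, while the weighted version separates the frequent word from the rare one.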

Plot to visualize it better.

ggplot(frequency, aes(x = proportion, y = `John Maynard Keynes`, 
                      color = abs(`John Maynard Keynes` - proportion))) +
  geom_abline(color = "gray40", lty = 2) +
  geom_jitter(alpha = 0.1, size = 0.5, width = 0.3, height = 0.3) +
  geom_text(aes(label = word), check_overlap = TRUE, vjust = 0.5) +
  scale_x_log10(labels = percent_format()) +
  scale_y_log10(labels = percent_format()) +
  scale_color_gradient(limits = c(0, 0.001), 
                       low = "darkslategray4", high = "gray75") +
  facet_wrap(~author, ncol = 2) +
  theme(legend.position="none") +
  labs(y = "John Maynard Keynes", x = NULL)

In the first graph (Smith with Keynes), words like gold, money, Germany, actual, and day appear with high frequency in both texts, but more often in Keynes’s writing than in Smith’s. Words like month, abandon, and body appear with similar frequency in both.

In the second graph (Ricardo with Keynes), high-frequency words like country, gold, absolute, and capital have a similar frequency in both texts. Again, words like Germany, government, and actual tend to appear more in Keynes’s writing, whereas words like labour and abundant appear more in Ricardo’s.

Correlations

I will now use the Pearson Correlation Coefficient to measure correlations.

John Maynard Keynes and Adam Smith

cor.test(data = frequency[frequency$author == "Adam Smith",],
         ~ proportion + `John Maynard Keynes`)
## 
##  Pearson's product-moment correlation
## 
## data:  proportion and John Maynard Keynes
## t = 7.6852, df = 3799, p-value = 1.932e-14
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.0922994 0.1549119
## sample estimates:
##       cor 
## 0.1237288

John Maynard Keynes and David Ricardo

cor.test(data = frequency[frequency$author == "David Ricardo",],
         ~ proportion + `John Maynard Keynes`)
## 
##  Pearson's product-moment correlation
## 
## data:  proportion and John Maynard Keynes
## t = 9.1589, df = 2446, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.1435114 0.2201221
## sample estimates:
##       cor 
## 0.1820931

These coefficients indicate a weak positive correlation between Keynes and Ricardo and an even weaker one between Keynes and Smith. This is plausible given that the authors held different perspectives on economics, and the particular texts chosen also affect how similar their vocabularies appear. The coefficients also inherit the unweighted proportions stored in frequency, so they should be read with caution.
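As a consistency check, the t statistics printed by cor.test can be recovered from the reported correlation and degrees of freedom with the standard relation t = r * sqrt(df / (1 - r^2)). The sketch below uses only the numbers shown in the output above; the helper name t_from_r is introduced here for illustration.

```r
# Hypothetical helper: t statistic implied by a Pearson r and its degrees of freedom
t_from_r <- function(r, df) {
  r * sqrt(df / (1 - r^2))
}

# Values copied from the two cor.test outputs above
t_smith   <- t_from_r(0.1237288, 3799)  # Keynes vs. Smith
t_ricardo <- t_from_r(0.1820931, 2446)  # Keynes vs. Ricardo

round(c(smith = t_smith, ricardo = t_ricardo), 3)
```

With several thousand words in each comparison, even a small r yields a large t, which is why both p-values are tiny although the correlations themselves are weak.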

TF-IDF

In this section I conduct a TF-IDF analysis to find the words that make each author’s texts distinctive.

Joining Dataframes

I joined the authors’ data frames into one global data frame containing all the books, adding a new column to differentiate between authors.

tidy_ricardo <- tidy_ricardo %>%
  mutate(author = "David Ricardo")

tidy_smith <- tidy_smith %>%
  mutate(author = "Adam Smith")

tidy_keynes <- tidy_keynes %>%
  mutate(author = "John Maynard Keynes")

economists <- bind_rows(tidy_keynes, tidy_ricardo, tidy_smith)

TF-IDF Computation

I created a column with the total number of words for each author, then added a term_frequency column (word count divided by total), and finally used the bind_tf_idf function from the tidytext package to compute the TF-IDF for each author.

total_economists <- economists %>% 
  group_by(author) %>% 
  summarize(total = sum(n))

economists_words <- economists %>%
  left_join(total_economists) %>%
    mutate(term_frequency = n/total)

economists_tf_idf <- economists_words %>%
  bind_tf_idf(word, author, n)%>%
  select(-total) %>%
  arrange(desc(tf_idf))

economists_tf_idf
## # A tibble: 23,305 × 7
##    word          n author              term_frequency      tf   idf  tf_idf
##    <chr>     <int> <chr>                        <dbl>   <dbl> <dbl>   <dbl>
##  1 corn        565 David Ricardo              0.0130  0.0130  0.405 0.00528
##  2 economic    176 John Maynard Keynes        0.00349 0.00349 1.10  0.00384
##  3 1919         97 John Maynard Keynes        0.00193 0.00193 1.10  0.00212
##  4 german      260 John Maynard Keynes        0.00516 0.00516 0.405 0.00209
##  5 germany's    95 John Maynard Keynes        0.00189 0.00189 1.10  0.00207
##  6 4_l          73 David Ricardo              0.00168 0.00168 1.10  0.00185
##  7 100_l        69 David Ricardo              0.00159 0.00159 1.10  0.00175
##  8 1922         72 John Maynard Keynes        0.00143 0.00143 1.10  0.00157
##  9 inflation    66 John Maynard Keynes        0.00131 0.00131 1.10  0.00144
## 10 1923         64 John Maynard Keynes        0.00127 0.00127 1.10  0.00140
## # … with 23,295 more rows
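The two idf values in the table can be reproduced by hand. The sketch below assumes three documents (one per author) and a natural logarithm, which is consistent with the printed values; the variable names are illustrative only.

```r
# Assumption: each author is one "document", so there are three documents
n_docs <- 3

# idf = ln(number of documents / number of documents containing the word)
idf_one_author  <- log(n_docs / 1)  # word used by a single author
idf_two_authors <- log(n_docs / 2)  # word shared by two authors
idf_all_authors <- log(n_docs / 3)  # word used by all three authors

# Term frequency of "corn" for Ricardo, as printed (rounded) in the table
tf_corn <- 0.0130
tf_idf_corn <- tf_corn * idf_two_authors

round(c(idf_one_author, idf_two_authors, idf_all_authors), 3)
tf_idf_corn
```

A word used by all three authors has an idf of zero, so shared vocabulary such as “price” or “money” can never rank highly regardless of how often it occurs.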

Plot TF-IDF

economists_tf_idf %>%
  group_by(author) %>%
  arrange(desc(tf_idf)) %>%
  slice_head(n = 10) %>%
  ggplot(aes(reorder(word, tf_idf), tf_idf, fill = author)) +
  geom_col(show.legend = FALSE) +
  labs(x = NULL, y = "TF-IDF") +
  coord_flip() +
  facet_wrap(~author, ncol = 2, scales = "free")

Many of the top terms are uninformative tokens (years, currency amounts, stray abbreviations) that make the analysis clunky, so in the next part I filter them out and repeat the analysis without them.

Filtering Uninformative Words

economists_stopwords <- tibble(word = c("d", "s", "th", "a", "1","st","I","8vo", "8_s","4_I" ,"1000_I","100_I","1923","1922","1921","1919","1913","1920","4_l","100_l","1000_l","10_s","qrs","720_l","3_l","50_l","10,000_l","6_d","2000_l","1914","1918","pre","_r_","_k_","_n_","_k","2","0","8","4","6", "5", "4", "vol", "180", "170", "ii", "10"))


total_economists <- economists %>% 
  group_by(author) %>% 
  summarize(total = sum(n))

economists_words <- economists %>%
  left_join(total_economists) %>%
    mutate(term_frequency = n/total)

economists_tf_idf <- economists_words %>%
  bind_tf_idf(word, author, n)%>%
  select(-total) %>%
  anti_join(economists_stopwords, by = "word") %>%
  arrange(desc(tf_idf))

Plot Without Stopwords

economists_tf_idf %>%
  anti_join(economists_stopwords, by = "word") %>%
  group_by(author) %>%
  arrange(desc(tf_idf)) %>%
  slice_head(n = 15) %>%
  ggplot(aes(reorder(word, tf_idf), tf_idf, fill = author)) +
  geom_col(show.legend = FALSE) +
  labs(x = NULL, y = "TF-IDF") +
  coord_flip() +
  facet_wrap(~author, ncol = 2, scales = "free")

Based on the TF-IDF analysis we can make some general observations about the topics and themes present in the texts by Smith, Keynes, and Ricardo.

For Adam Smith, the top keywords are related to land, agriculture, and government policies related to production and trade. This is in line with Smith’s focus on the importance of free markets and competition in driving economic growth.

For Keynes, the top keywords are related to the economic and political issues surrounding World War I and its aftermath, particularly in relation to Germany, inflation, and international relations. This reflects Keynes’ view that government intervention in the economy can be necessary to address economic crises.

For Ricardo, the top keywords are related to land, agriculture, and the relationships between landlords, laborers, and consumers. This is consistent with Ricardo’s focus on the role of rent, wages, and profits in economic systems.

N-grams

This section presents an n-gram analysis. Two helper functions are defined first to support a bigram (two-word) analysis.

The count_bigrams function:

  1. Splits the text into bigrams (pairs of consecutive words).
  2. Separates each bigram into two columns, one for each word.
  3. Filters out any bigram that contains a stop word.
  4. Counts the frequency of each unique bigram and sorts the results in descending order.
  5. Returns a data frame with one column for each word in the bigram and a frequency count.

The visualize_bigrams function:

Takes the output from the count_bigrams function as input and creates a graph to visualize the relationships between the bigrams.

count_bigrams <- function(dataset) {
  dataset %>%
    unnest_tokens(bigram, text, token = "ngrams", n = 2) %>%
    separate(bigram, c("word1", "word2"), sep = " ") %>%
    filter(!word1 %in% stop_words$word,
           !word2 %in% stop_words$word) %>%
    count(word1, word2, sort = TRUE)
}

visualize_bigrams <- function(bigrams) {
  set.seed(2016)
  a <- grid::arrow(type = "closed", length = unit(.15, "inches"))
  
  bigrams %>%
    graph_from_data_frame() %>%
    ggraph(layout = "fr") +
    geom_edge_link(aes(edge_alpha = n), show.legend = FALSE, arrow = a) +
    geom_node_point(color = "lightblue", size = 5) +
    geom_node_text(aes(label = name), vjust = 1, hjust = 1) +
    theme_void()
}
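To make the steps of count_bigrams concrete, here is a base R sketch of the same idea on a single made-up sentence. It is not the tidytext implementation; the sentence and the three-word stop list are assumptions for illustration.

```r
# Hypothetical input sentence and a tiny stop word list
text <- "the purchasing power of money and the purchasing power parity"
stop_list <- c("the", "of", "and")

# Step 1: split into words and pair each word with its successor
words <- strsplit(tolower(text), "\\s+")[[1]]
bigrams <- data.frame(
  word1 = head(words, -1),
  word2 = tail(words, -1)
)

# Step 3: drop bigrams in which either word is a stop word
kept <- bigrams[!bigrams$word1 %in% stop_list &
                !bigrams$word2 %in% stop_list, ]

# Step 4: count each unique bigram and sort in descending order
bigram_counts <- as.data.frame(
  table(paste(kept$word1, kept$word2)),
  stringsAsFactors = FALSE
)
bigram_counts <- bigram_counts[order(-bigram_counts$Freq), ]
bigram_counts
```

Only bigrams made of two content words survive the filter, which is why the tables that follow are dominated by compound terms.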

Keynes bigrams

keynes_bigrams <- keynes %>%
  count_bigrams() %>%
    drop_na() 

keynes_bigrams
## # A tibble: 10,970 × 3
##    word1      word2          n
##    <chr>      <chr>      <int>
##  1 purchasing power         91
##  2 pre        war           82
##  3 reparation commission    46
##  4 price      level         44
##  5 power      parity        41
##  6 bank       rate          33
##  7 note       issue         29
##  8 gold       standard      28
##  9 german     government    27
## 10 united     kingdom       25
## # … with 10,960 more rows

Based on these results, Keynes’s writings contain a number of bigrams related to economic concepts and policy, including “purchasing power,” “reparation commission,” “price level,” and “power parity” (the tail of “purchasing power parity”). The presence of bigrams tied to the war and its aftermath, such as “pre war” and “reparation commission,” suggests that Keynes was writing in the context of a changing economic and political landscape. Additionally, bigrams such as “gold standard” and “bank rate” suggest that Keynes was concerned with monetary policy and exchange rates.

Smith bigrams

smith_bigrams <- smith %>%
  count_bigrams() %>%
    drop_na() 

smith_bigrams
## # A tibble: 14,144 × 3
##    word1      word2       n
##    <chr>      <chr>   <int>
##  1 annual     produce   149
##  2 foreign    trade     102
##  3 money      price      89
##  4 home       market     85
##  5 rude       produce    75
##  6 0          0          65
##  7 productive labour     60
##  8 surplus    produce    60
##  9 thousand   pounds     60
## 10 east       indies     56
## # … with 14,134 more rows

These results indicate that Adam Smith’s writings frequently reference economic concepts related to trade, production, and labour. The most frequent bigram is “annual produce,” which may refer to the total output of goods and services in an economy. Other frequent bigrams, such as “foreign trade” and “home market,” suggest that Smith was concerned with international trade and the domestic market. Bigrams such as “money price” and “thousand pounds” show that Smith frequently discusses prices and sums of money. The mention of the “East Indies” may also indicate a focus on trade with Asia.

Ricardo Bigrams

ricardo_bigrams <- ricardo %>%
  count_bigrams() %>%
    drop_na() 

ricardo_bigrams
## # A tibble: 5,584 × 3
##    word1    word2        n
##    <chr>    <chr>    <int>
##  1 raw      produce    152
##  2 natural  price       98
##  3 adam     smith       97
##  4 market   price       52
##  5 money    price       37
##  6 precious metals      36
##  7 foreign  trade       33
##  8 capital  employed    31
##  9 fixed    capital     27
## 10 dr       smith       26
## # … with 5,574 more rows

These results suggest that David Ricardo’s writings frequently reference economic concepts related to production, prices, and trade. The most frequent bigram is “raw produce,” which may refer to the output of natural resources and agricultural goods. The presence of bigrams related to prices, such as “natural price” and “market price,” suggests that Ricardo was interested in theories of value and pricing. The mention of Adam Smith, one of Ricardo’s predecessors and influences, suggests that Ricardo’s writings are in dialogue with the economic ideas of his time. The bigrams “precious metals” and “foreign trade” suggest an interest in international trade and monetary policy. Finally, bigrams related to capital, such as “capital employed” and “fixed capital,” suggest that Ricardo also considered issues related to investment and capital accumulation.

Visualize Bigrams

This plot comes from the visualize_bigrams function defined above. Each word is a node, and an arrow (edge) runs from the first word of a bigram to the second. The opacity of each edge reflects how often that bigram occurs.

Keynes

keynes_bigrams %>%
  filter(n > 10,
         !str_detect(word1, "\\d"),
         !str_detect(word2, "\\d")) %>%
  visualize_bigrams()

The graph shows more clearly how the economic concepts and institutions are related, for example “federal reserve board” or “purchasing power parity”. Political institutions in Germany and the USA appear alongside economic terms and theories.

Smith

smith_bigrams %>%
  filter(n > 30,
         !str_detect(word1, "\\d"),
         !str_detect(word2, "\\d")) %>%
  visualize_bigrams()

Here the ideas form clearer groups. For example, the term trade appears many times with carrying, colony, and foreign. Money and produce can also be read as central nodes, which is in line with what the earlier results suggested about Adam Smith’s ideas on international trade, money, and production.

Ricardo

ricardo_bigrams %>%
  filter(n > 10,
         !str_detect(word1, "\\d"),
         !str_detect(word2, "\\d")) %>%
  visualize_bigrams()

For Ricardo, price is the most connected word, which is in line with the earlier observation about his theories of value and pricing. The mentions of Adam Smith within his text are also visible, and the idea of value is reflected in the words connected to capital and produce.

Words Contained in Lines

Next, I created a data frame with a new column, section, that records which block of ten consecutive lines each word appears in. This serves two purposes: first, to count how many times a word appears in the same section as another word, and second, to compute their correlation (taking into account when they appear together and when they do not).
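The grouping relies on integer division of the row number by 10. As a small sketch, assuming an arbitrary text of 25 lines, the section index behaves as follows.

```r
# Hypothetical text with 25 lines
line_number <- seq_len(25)

# Integer division groups lines into blocks: 1-9, 10-19, 20-25
section <- line_number %/% 10

# Number of lines that fall into each section
lines_per_section <- table(section)
lines_per_section
```

Section 0 holds only the first nine lines, and the filter(section > 0) step in the code below drops it.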

Separate in Lines and Pairwise Count

Keynes

keynes_section_words <- keynes %>%
  mutate(section = row_number() %/% 10) %>%
  filter(section > 0) %>%
  unnest_tokens(word, text) %>%
  filter(!word %in% stop_words$word) %>%
  filter(!word %in% stopwords2$word) %>%
  anti_join(economists_stopwords)

keynes_section_words
## # A tibble: 49,190 × 3
##    gutenberg_id section word          
##           <int>   <dbl> <chr>         
##  1        15776       1 preface       
##  2        15776       2 writer        
##  3        15776       2 book          
##  4        15776       2 temporarily   
##  5        15776       2 attached      
##  6        15776       2 british       
##  7        15776       2 treasury      
##  8        15776       2 war           
##  9        15776       2 official      
## 10        15776       2 representative
## # … with 49,180 more rows

The pairwise_count function from widyr counts the number of sections in which two words appear together.
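The counting idea can be sketched in base R with a section-by-word incidence matrix. This is an illustration of the concept under made-up data, not the widyr implementation; the sections and words below are hypothetical.

```r
# Hypothetical tidy table: one row per (section, word)
section_words <- data.frame(
  section = c(1, 1, 1, 2, 2, 3, 3, 3),
  word    = c("germany", "war", "treaty",
              "germany", "war",
              "germany", "treaty", "gold")
)

# Incidence matrix: TRUE where a word occurs in a section
incidence <- unclass(table(section_words$section, section_words$word)) > 0

# t(X) %*% X counts, for every word pair, the sections containing both;
# the diagonal holds the number of sections containing each word
co_occurrence <- crossprod(incidence + 0)
co_occurrence
```

Each pair appears twice in the pairwise_count output (once in each order) because this matrix is symmetric.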

Keynes Count

keynes_word_pairs <- keynes_section_words %>%
  pairwise_count(word, section, sort = TRUE)

keynes_word_pairs
## # A tibble: 1,184,544 × 3
##    item1      item2          n
##    <chr>      <chr>      <dbl>
##  1 germany    war           89
##  2 war        germany       89
##  3 germany    german        82
##  4 german     germany       82
##  5 allies     germany       75
##  6 germany    allies        75
##  7 purchasing power         74
##  8 power      purchasing    74
##  9 germany    treaty        66
## 10 treaty     germany       66
## # … with 1,184,534 more rows

The pair counts for Keynes center on the aftermath of World War I and the Treaty of Versailles, with words like “Germany,” “war,” “allies,” “reparation,” “treaty,” and “France” appearing frequently. There are also mentions of economic concepts like “purchasing power” and “currency exchange.”

Smith

smith_section_words <- smith %>%
  mutate(section = row_number() %/% 10) %>%
  filter(section > 0) %>%
  unnest_tokens(word, text) %>%
  filter(!word %in% stop_words$word) %>%
  filter(!word %in% stopwords2$word) %>%
  anti_join(economists_stopwords)

smith_section_words
## # A tibble: 129,159 × 3
##    gutenberg_id section word        
##           <int>   <dbl> <chr>       
##  1         3300       1 contents    
##  2         3300       1 introduction
##  3         3300       1 plan        
##  4         3300       1 book        
##  5         3300       1 improvement 
##  6         3300       1 productive  
##  7         3300       1 powers      
##  8         3300       1 labour      
##  9         3300       1 produce     
## 10         3300       1 naturally   
## # … with 129,149 more rows

Smith Count

smith_word_pairs <- smith_section_words %>%
  pairwise_count(word, section, sort = TRUE)

smith_word_pairs
## # A tibble: 1,982,406 × 3
##    item1   item2       n
##    <chr>   <chr>   <dbl>
##  1 silver  gold      245
##  2 gold    silver    245
##  3 land    produce   230
##  4 produce land      230
##  5 country produce   218
##  6 produce country   218
##  7 produce labour    216
##  8 labour  produce   216
##  9 price   market    200
## 10 market  price     200
## # … with 1,982,396 more rows

For Smith, we see a focus on the importance of land and the production of goods in a country, as well as the relationship between market price and the quantity of labor and trade. This is consistent with Smith’s emphasis on the role of markets in driving economic growth and the importance of factors of production like land and labor.

Ricardo

ricardo_section_words <- ricardo %>%
  mutate(section = row_number() %/% 10) %>%
  filter(section > 0) %>%
  unnest_tokens(word, text) %>%
  filter(!word %in% stop_words$word) %>%
  filter(!word %in% stopwords2$word) %>%
  anti_join(economists_stopwords)

ricardo_section_words
## # A tibble: 42,502 × 3
##    gutenberg_id section word      
##           <int>   <dbl> <chr>     
##  1        33310       1 principles
##  2        33310       1 political 
##  3        33310       1 economy   
##  4        33310       2 taxation  
##  5        33310       2 david     
##  6        33310       2 ricardo   
##  7        33310       2 esq       
##  8        33310       2 london    
##  9        33310       2 john      
## 10        33310       2 murray    
## # … with 42,492 more rows

Ricardo Count

ricardo_word_pairs <- ricardo_section_words %>%
  pairwise_count(word, section, sort = TRUE)

ricardo_word_pairs
## # A tibble: 590,612 × 3
##    item1       item2           n
##    <chr>       <chr>       <dbl>
##  1 price       corn          204
##  2 corn        price         204
##  3 price       produce       186
##  4 produce     price         186
##  5 rise        price         184
##  6 price       rise          184
##  7 price       commodities   181
##  8 commodities price         181
##  9 price       labour        173
## 10 labour      price         173
## # … with 590,602 more rows

For Ricardo, the most frequent pairs all involve price, which co-occurs with corn, produce, rise, commodities, and labour. This is consistent with Ricardo’s emphasis on theories of value and on what drives the prices of commodities.

In general, the n-gram analysis allows us to identify specific words and phrases that are frequently used by each economist, giving us a better understanding of their areas of focus and their key ideas.

Correlating pairs of words

Raw pair counts favour words that are simply common. A better measure is correlation, which compares how often two words appear together with how often each appears without the other.

To evaluate this correlation, I used the phi coefficient, which is equivalent to the Pearson correlation applied to binary (present or absent) data. It measures how much more likely it is that either both words appear in a section, or neither does, than that one appears without the other.
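The phi coefficient can be computed by hand from a 2×2 table of presence and absence. The sketch below uses made-up indicators for eight hypothetical sections and checks that the result equals the Pearson correlation of the two binary vectors.

```r
# Hypothetical presence (1) / absence (0) of two words across 8 sections
upper   <- c(1, 1, 1, 0, 0, 0, 0, 1)
silesia <- c(1, 1, 1, 0, 0, 0, 0, 0)

# Cells of the 2x2 contingency table
n11 <- sum(upper == 1 & silesia == 1)  # both words present
n10 <- sum(upper == 1 & silesia == 0)  # only the first word present
n01 <- sum(upper == 0 & silesia == 1)  # only the second word present
n00 <- sum(upper == 0 & silesia == 0)  # neither word present

# Phi coefficient from the table margins
phi <- (n11 * n00 - n10 * n01) /
  sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))

# Pearson correlation of the same binary indicators
pearson <- cor(upper, silesia)

c(phi = phi, pearson = pearson)
```

Unlike the raw pair count, phi is penalised when one word frequently appears without the other, so it highlights fixed expressions rather than merely common words.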

Keynes

keynes_word_cors <- keynes_section_words %>%
  group_by(word) %>%
  filter(n() >= 20) %>%
  pairwise_cor(word, section, sort = TRUE)

keynes_word_cors
## # A tibble: 242,556 × 3
##    item1      item2      correlation
##    <chr>      <chr>            <dbl>
##  1 silesia    upper            0.953
##  2 upper      silesia          0.953
##  3 board      federal          0.898
##  4 federal    board            0.898
##  5 lorraine   alsace           0.884
##  6 alsace     lorraine         0.884
##  7 minister   prime            0.796
##  8 prime      minister         0.796
##  9 nineteenth century          0.771
## 10 century    nineteenth       0.771
## # … with 242,546 more rows

The pair “upper silesia” has the highest correlation coefficient, 0.9527372, meaning that these two terms almost always appear together in the analyzed text. Other pairs with high correlation coefficients include “federal board” (0.8984791), “alsace lorraine” (0.8840521), and “prime minister” (0.7959321). These pairs represent specific places, institutions, or offices that were frequently discussed in the text. Further down the list there are also economic terms like “power purchasing” and geopolitical pairs like “austria hungary”.

Smith

smith_word_cors <- smith_section_words %>%
  group_by(word) %>%
  filter(n() >= 20) %>%
  pairwise_cor(word, section, sort = TRUE)

smith_word_cors
## # A tibble: 1,626,900 × 3
##    item1     item2     correlation
##    <chr>     <chr>           <dbl>
##  1 butcher’s meat            0.920
##  2 meat      butcher’s       0.920
##  3 forts     garrisons       0.832
##  4 garrisons forts           0.832
##  5 answering demands         0.722
##  6 demands   answering       0.722
##  7 silver    gold            0.700
##  8 gold      silver          0.700
##  9 barrel    herrings        0.686
## 10 herrings  barrel          0.686
## # … with 1,626,890 more rows

For Smith, the pairs with the highest correlation are mostly specific goods and fixed phrases, such as “butcher’s meat”, “forts garrisons”, and “barrel herrings”. Other pairs include “creditor debtor” and “receipts receipt”, which are related to financial concepts.

Ricardo

ricardo_word_cors <- ricardo_section_words %>%
  group_by(word) %>%
  filter(n() >= 20) %>%
  pairwise_cor(word, section, sort = TRUE)

ricardo_word_cors
## # A tibble: 142,506 × 3
##    item1       item2       correlation
##    <chr>       <chr>             <dbl>
##  1 adam        smith             0.820
##  2 smith       adam              0.820
##  3 precious    metals            0.768
##  4 metals      precious          0.768
##  5 fish        game              0.689
##  6 game        fish              0.689
##  7 net         gross             0.567
##  8 gross       net               0.567
##  9 fixed       circulating       0.548
## 10 circulating fixed             0.548
## # … with 142,496 more rows

The correlations found in Ricardo’s texts suggest that he frequently discussed topics related to economics and commerce, as well as the work of Adam Smith. Additionally, Ricardo seems to have shown interest in precious metals, fish and game, and concepts such as net and gross, fixed and circulating capital, and maintenance funds. Finally, it appears that Ricardo also had a particular focus on Portugal, discussing wine and cloth in relation to the country.

Plots of Correlations

Plotting by word

To show the analysis more clearly, I filtered by a few relevant words and plotted their strongest correlations.

Keynes

keynes_word_cors %>%
  filter(item1 %in% c("federal", "power", "nations", "materials")) %>%
  group_by(item1) %>%
  slice_max(correlation, n = 6) %>%
  ungroup() %>%
  mutate(item2 = reorder(item2, correlation)) %>%
  ggplot(aes(item2, correlation, fill=item1)) +
  geom_bar(stat = "identity") +
  facet_wrap(~ item1, scales = "free") +
  coord_flip()

This graph shows the words most correlated with each selected word. The words correlated with materials probably come from examples Keynes gives in his books that involve those materials. Nations and federal both relate to political topics, while power relates to more general economic terms.

Smith

smith_word_cors %>%
  filter(item1 %in% c("wheat", "creditor", "coinage", "hand")) %>%
  group_by(item1) %>%
  slice_max(correlation, n = 6) %>%
  ungroup() %>%
  mutate(item2 = reorder(item2, correlation)) %>%
  ggplot(aes(item2, correlation, fill=item1)) +
  geom_bar(stat = "identity") +
  facet_wrap(~ item1, scales = "free") +
  coord_flip()

In this graph, two of the selected words, coinage and creditor, are economic terms, while hand relates more to characteristics of the labor force and human work. Finally, wheat probably appears often as an example of a product, and it is correlated with money, prices and quantities.

Ricardo

ricardo_word_cors %>%
  filter(item1 %in% c("adam", "land", "foreign", "bank")) %>%
  group_by(item1) %>%
  slice_max(correlation, n = 6) %>%
  ungroup() %>%
  mutate(item2 = reorder(item2, correlation)) %>%
  ggplot(aes(item2, correlation, fill = item1)) +
  geom_col() +
  facet_wrap(~ item1, scales = "free") +
  coord_flip()

In this final graph I included adam so that the words correlated with Adam Smith can be observed. Land is also an important word in Ricardo’s writing, and the graph shows how it is correlated with agricultural production. Foreign is associated with a mix of economic and political terms. Finally, bank is associated with types of currencies and values.
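The three plots in this section repeat the same pipeline with different inputs. As a sketch only, the shared steps could be wrapped in a helper. The function name `plot_top_cors` and the toy data are hypothetical, and the sketch swaps `reorder()` for `reorder_within()` and `scale_x_reordered()` from tidytext so that bars are sorted inside each facet. It also hides the fill legend, which the original plots keep.

```r
library(dplyr)
library(ggplot2)
library(tidytext)

# Hypothetical helper: plot the n most correlated words for each focus word.
plot_top_cors <- function(word_cors, focus_words, n = 6) {
  word_cors %>%
    filter(item1 %in% focus_words) %>%
    group_by(item1) %>%
    slice_max(correlation, n = n) %>%
    ungroup() %>%
    # reorder_within() sorts the bars separately inside each facet
    mutate(item2 = reorder_within(item2, correlation, item1)) %>%
    ggplot(aes(item2, correlation, fill = item1)) +
    geom_col(show.legend = FALSE) +
    scale_x_reordered() +
    facet_wrap(~ item1, scales = "free") +
    coord_flip()
}

# Toy data standing in for a pairwise_cor() result; the values are made up.
toy_cors <- tibble(
  item1 = c("land", "land", "land", "bank", "bank"),
  item2 = c("rent", "corn", "tax", "notes", "gold"),
  correlation = c(0.30, 0.25, 0.10, 0.40, 0.20)
)

p <- plot_top_cors(toy_cors, c("land", "bank"), n = 2)
```

With the real data, a call such as `plot_top_cors(ricardo_word_cors, c("adam", "land", "foreign", "bank"))` would be expected to give a plot comparable to the Ricardo one above.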

Plotting correlations in a graph

For this last part, I used ggraph, as I did for the bigrams, to visualize the connections and clusters of correlated words.

Keynes

keynes_word_cors %>%
  filter(correlation > 0.40) %>%
  graph_from_data_frame() %>%
  ggraph(layout = "fr") +
  geom_edge_link(aes(edge_alpha = correlation), show.legend = FALSE) +
  geom_node_point(color = "lightblue", size = 5) +
  geom_node_text(aes(label = name), repel = TRUE) +
  theme_void()

This graph shows the highly correlated words, which form clusters of months, political terms, economic terms, currencies and values, and international issues.

Smith

smith_word_cors %>%
  filter(correlation > 0.45) %>%
  graph_from_data_frame() %>%
  ggraph(layout = "fr") +
  geom_edge_link(aes(edge_alpha = correlation), show.legend = FALSE) +
  geom_node_point(color = "lightblue", size = 5) +
  geom_node_text(aes(label = name), repel = TRUE) +
  theme_void()

For Smith, the graph shows a group of words related to fish, along with financial terms, raw materials like precious metals, territories like India and Asia, and other economic terms like demands, seignorage, selling, buying, stationary, etc.

Ricardo

ricardo_word_cors %>%
  filter(correlation > 0.30) %>%
  graph_from_data_frame() %>%
  ggraph(layout = "fr") +
  geom_edge_link(aes(edge_alpha = correlation), show.legend = FALSE) +
  geom_node_point(color = "lightblue", size = 5) +
  geom_node_text(aes(label = name), repel = TRUE) +
  theme_void()

For Ricardo, a lower correlation threshold (0.30) was used, possibly because of the greater variety of words and concepts in his text. A large cluster forms around coin, connected to words related to currencies and precious metals. A labor economics cluster also appears, with words like wages, laborers and rise. Places and raw materials can also be seen, and finally there is an Adam Smith group.
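The thresholds used for the three graphs (0.40, 0.45 and 0.30) control how many links are drawn. The following sketch shows one way to compare candidate thresholds before plotting. The data are made-up stand-ins for a `pairwise_cor()` result, and the `item1 < item2` filter is an added step, not part of the original analysis, that keeps one row per mirrored pair so each link is counted once.

```r
library(dplyr)
library(tibble)

# Toy stand-in for a pairwise_cor() result with mirrored rows (made-up values).
toy_cors <- tribble(
  ~item1,     ~item2,     ~correlation,
  "adam",     "smith",    0.82,
  "smith",    "adam",     0.82,
  "precious", "metals",   0.77,
  "metals",   "precious", 0.77,
  "net",      "gross",    0.57,
  "gross",    "net",      0.57,
  "wages",    "rise",     0.33,
  "rise",     "wages",    0.33
)

# Keep one row per unordered pair so each link appears once.
unique_pairs <- toy_cors %>% filter(item1 < item2)

# Count how many links survive each candidate threshold.
thresholds <- c(0.30, 0.40, 0.45)
edge_counts <- tibble(
  threshold = thresholds,
  n_edges = sapply(thresholds, function(t) sum(unique_pairs$correlation > t))
)
edge_counts
```

A threshold that leaves only a handful of links gives a sparse graph, while one that leaves thousands becomes unreadable, so a table like this can help justify a per-author choice.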

Conclusions

For all three economists, their most frequently occurring words are related to economic concepts such as “price”, “produce”, “trade”, “labour”, “currency”, “capital”, and “market”. In terms of pairs of words, the most frequent pairs for Keynes include “Germany war”, “purchasing power”, and “currency exchange”, while for Smith the most frequent pairs are “silver gold”, “land produce”, and “country produce”. For Ricardo, the most frequent pairs include “Adam Smith”, “precious metals”, and “fish game”.

Looking at correlations between pairs of words, Keynes has high correlations for pairs such as “silesia upper”, “purchasing power”, and “league nations”, while for Smith the highest correlations are for pairs such as “butcher’s meat”, “forts garrisons”, and “silver gold”. For Ricardo, high correlations were found for pairs such as “Adam Smith”, “precious metals”, and “fish game”. Although all three economists were writing about economics, their most frequent words, word pairs, and correlations differ, which suggests that they had different areas of focus and interest within the field.