A possible solution. The issue is that the way edgar
output company names is different from others for some reason.
My example data with SEC Filings in 2022
edgar::getMasterIndex(2022, "myemail-2022@gmail.com")
load("Master Indexes/2022master.Rda")
df <- year.master %>%
as_tibble() %>%
filter(form.type == "10-K") %>%
mutate(company.name = company.name %>%
str_squish())
# A tibble: 6,906 x 6
cik company.name form.type date.filed edgar.link quarter
<chr> <chr> <chr> <chr> <chr> <int>
1 1000209 MEDALLION FINANCIAL~ 10-K 2022-03-14 edgar/dat~ 1
2 1000228 HENRY SCHEIN INC 10-K 2022-02-15 edgar/dat~ 1
3 1000229 CORE LABORATORIES N~ 10-K 2022-02-10 edgar/dat~ 1
4 1000298 IMPAC MORTGAGE HOLD~ 10-K 2022-03-11 edgar/dat~ 1
5 1000623 SCHWEITZER MAUDUIT ~ 10-K 2022-03-01 edgar/dat~ 1
6 1000683 BLONDER TONGUE LABO~ 10-K 2022-03-31 edgar/dat~ 1
7 1000694 NOVAVAX INC 10-K 2022-03-01 edgar/dat~ 1
8 1000697 WATERS CORP DE 10-K 2022-02-24 edgar/dat~ 1
9 1000753 INSPERITY INC 10-K 2022-02-11 edgar/dat~ 1
10 1001082 DISH Network CORP 10-K 2022-02-24 edgar/dat~ 1
# ... with 6,896 more rows
Now, download the .Rdata
from my Drive will give you a data frame of stocks, which has company names and their respective tickers
# A tibble: 6,255 x 4
symbol company_name industry market_cap
<chr> <chr> <chr> <chr>
1 A Agilent Technologies Life Sciences Tools &~ 34.97B
2 AA Alcoa Metals & Mining 9.08B
3 AAC Ares Acquisition Blank Check / SPAC 1.23B
4 AACG ATA Creativity Global Diversified Consumer ~ 35.23M
5 AACI Armada Acquisition I Blank Check / SPAC 204.51M
6 AADI Aadi Bioscience Biotechnology 249.63M
7 AAIC Arlington Asset Investment Mortgage Real Estate ~ 96.11M
8 AAL American Airlines Group Airlines 8.77B
9 AAMC Altisource Asset Management Real Estate Managemen~ 22.16M
10 AAME Atlantic American Insurance 55.09M
# ... with 6,245 more rows
Merge stocks
with df
stocks %>%
merge(., df, by.x = "company_name", by.y = "company.name") %>%
as_tibble()
# A tibble: 101 x 9
company_name symbol industry market_cap cik form.type date.filed
<chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 American Home~ AMH Equity ~ 14.18B 1562~ 10-K 2022-02-25
2 Ameris Bancorp ABCB Banks 2.86B 3515~ 10-K 2022-02-28
3 Aon plc AON Insuran~ 56.42B 3152~ 10-K 2022-02-18
4 Apollo Strate~ APGB Shell C~ 848.70M 1838~ 10-K 2022-03-17
5 Bank of Marin~ BMRC Banks 515.27M 1403~ 10-K 2022-03-15
6 Beam Global BEEM Electri~ 149.82M 1398~ 10-K 2022-03-31
7 Blue Foundry ~ BLFY Banks-R~ 347.40M 1846~ 10-K 2022-03-14
8 California Ba~ CALB Banks 158.12M 1752~ 10-K 2022-03-23
9 CC Neuberger ~ PRPB Shell C~ 1.08B 1812~ 10-K 2022-03-01
10 CC Neuberger ~ PRPC Blank C~ 544.28M 1821~ 10-K 2022-03-31
# ... with 91 more rows, and 2 more variables: edgar.link <chr>,
# quarter <int>
library(tidyquant)
final_df <- merged_df %>%
mutate(date.filed = date.filed %>% as.Date(),
price_before_filing = map2(.x = symbol,
.y = date.filed,
~ tq_get(.x, from = as.Date(.y) - 1) %>%
slice(1) %>%
pull(adjusted)) %>%
as.numeric()) %>%
select(symbol, company_name, price_before_filing, everything())
# A tibble: 101 x 10
symbol company_name price_before_fi~ industry market_cap cik
<chr> <chr> <dbl> <chr> <chr> <chr>
1 AMH American Homes 4 ~ 37.4 Equity ~ 14.18B 1562~
2 ABCB Ameris Bancorp 49.3 Banks 2.86B 3515~
3 AON Aon plc 281. Insuran~ 56.42B 3152~
4 APGB Apollo Strategic ~ 9.75 Shell C~ 848.70M 1838~
5 BMRC Bank of Marin Ban~ 35.1 Banks 515.27M 1403~
6 BEEM Beam Global 18.4 Electri~ 149.82M 1398~
7 BLFY Blue Foundry Banc~ 13.6 Banks-R~ 347.40M 1846~
8 CALB California BanCorp 22.4 Banks 158.12M 1752~
9 PRPB CC Neuberger Prin~ 9.89 Shell C~ 1.08B 1812~
10 PRPC CC Neuberger Prin~ 9.85 Blank C~ 544.28M 1821~
# ... with 91 more rows, and 4 more variables: form.type <chr>,
# date.filed <date>, edgar.link <chr>, quarter <int>