institutions using cik and tags
import os
import requests
import time
import polars as pl
from rich import print as rprint
We will use the List of Institutions endpoint:
https://api.unusualwhales.com/docs#/operations/PublicApi.InstitutionController.list
to collect the basic institutional information for each reporting entity.
uw_token = os.environ['UW_TOKEN']
headers = {'Accept': 'application/json, text/plain', 'Authorization': uw_token}
url = 'https://api.unusualwhales.com/api/institutions'
responses = []
for i in range(0, 20):
    params = {'limit': 500, 'page': i}
    rsp = requests.get(url, headers=headers, params=params)
    data = rsp.json()['data']  # parse the response body once per request
    if len(data) > 0:
        responses.append(data)
        time.sleep(1)  # rate limit
    else:
        break  # an empty page means there are no more institutions
flat_list_of_institutions = [item for sublist in responses for item in sublist]
rprint(f'Example institution data:')
rprint(flat_list_of_institutions[0])
rprint(f'Total number of institutions found: {len(flat_list_of_institutions)}')
Example institution data:
{
    'buy_value': '76262185029.07',
    'call_holdings': '0',
    'call_value': '0',
    'cik': '0000102909',
    'date': '2024-09-30',
    'debt_holdings': '0',
    'debt_value': '0',
    'description': None,
    'filing_date': '2024-11-13',
    'founder_img_url': None,
    'fund_holdings': '423603256',
    'fund_value': '53478349756',
    'is_hedge_fund': False,
    'logo_url': None,
    'name': 'VANGUARD GROUP INC',
    'name_changes': [],
    'people': [],
    'pfd_holdings': '54956224',
    'pfd_value': '237680679',
    'put_holdings': '0',
    'put_value': '0',
    'sell_value': '-26715899906.65',
    'share_holdings': '58456162207',
    'share_value': '5530591118453',
    'short_name': 'Vanguard',
    'tags': [],
    'total_value': '5584478889705',
    'warrant_holdings': '524820',
    'warrant_value': '715335',
    'website': None
}
Total number of institutions found: 7296
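The paging loop above can also be factored into a small helper, which makes the stop condition (an empty data list) easy to check without calling the API. This is only a minimal sketch: collect_pages and fake_fetch are hypothetical names introduced here, and the fake pages merely stand in for the JSON bodies returned by the endpoint.

```python
from typing import Callable


def collect_pages(fetch_page: Callable[[int], dict], max_pages: int = 20) -> list:
    """Call fetch_page(0), fetch_page(1), ... and flatten the 'data' lists.

    Stops at the first page whose 'data' list is empty, or after max_pages.
    """
    records = []
    for page in range(max_pages):
        data = fetch_page(page).get('data', [])
        if not data:
            break
        records.extend(data)
    return records


# Hypothetical stand-in for the API: two pages of records, then an empty page.
fake_pages = [
    {'data': [{'cik': '0000102909'}, {'cik': '0002012383'}]},
    {'data': [{'cik': '0000093751'}]},
    {'data': []},
]


def fake_fetch(page: int) -> dict:
    return fake_pages[page]


institutions = collect_pages(fake_fetch)
print(len(institutions))
```

In the real notebook, fetch_page would presumably wrap the same requests.get(url, headers=headers, params=...) call followed by time.sleep(1); that wiring is left out here so the sketch runs offline.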
Since the field cik has been made available for records returned from this endpoint, we can now use this field as a distinct identifier for institutions.
Note: we are aware that there are three (3) duplicated institutions in the data set as of Wednesday 2025-01-08:
cik | name |
---|---|
0001328551 | SIGULER GUFF ADVISERS, LLC |
0001697791 | UNITED BANK |
0001845943 | THRIVE CAPITAL MANAGEMENT, LLC |
While engineering is taking care of this, I will drop the duplicate rows from my working data set, keeping one row per cik, by using the built-in Polars unique() method:
institutes = []
for inst in flat_list_of_institutions:
    institutes.append(
        {
            'cik': inst.get('cik', None),
            'name': inst.get('name', None),
            'people': inst.get('people', None),
            'is_hedge_fund': inst.get('is_hedge_fund', None),
            'tags': inst.get('tags', None),
            'total_value': inst.get('total_value', None),
            'filing_date': inst.get('filing_date', None),
            'logo_url': inst.get('logo_url', None),
            'founder_img_url': inst.get('founder_img_url', None),
        }
    )
raw_df = pl.DataFrame(institutes)
distinct_institutions_df = (
    raw_df
    .unique(subset=['cik'], keep='first')
    .with_columns(
        pl.col('total_value').cast(pl.Float64).cast(pl.Int64),
        pl.col('filing_date').str.strptime(pl.Date, '%Y-%m-%d'),
    )
    .sort('total_value', descending=True)
)
distinct_institutions_df
cik | name | people | is_hedge_fund | tags | total_value | filing_date | logo_url | founder_img_url |
---|---|---|---|---|---|---|---|---|
str | str | list[str] | bool | list[str] | i64 | date | str | str |
"0000102909" | "VANGUARD GROUP… | [] | false | [] | 5584478889705 | 2024-11-13 | null | null |
"0002012383" | "BLACKROCK, INC… | [] | false | [] | 4763739178160 | 2024-11-13 | null | null |
"0001677560" | "MEMBERS TRUST … | [] | false | [] | 3084048110000 | 2024-10-15 | null | null |
"0000093751" | "STATE STREET C… | [] | false | [] | 2457609922723 | 2024-11-14 | null | null |
"0000315066" | "FMR LLC" | [] | false | [] | 1643409181326 | 2024-11-13 | null | null |
… | … | … | … | … | … | … | … | … |
"0001279329" | "SHEN NEIL NANP… | [] | false | [] | 124458 | 2024-11-13 | null | null |
"0001594167" | "JABODON PT CO" | [] | false | [] | 81231 | 2024-10-28 | null | null |
"0001279887" | "CINCINNATI LIF… | [] | false | [] | 67500 | 2024-11-07 | null | null |
"0001629996" | "ALKEN ASSET MA… | [] | false | [] | 50017 | 2024-10-11 | null | null |
"0001639101" | "BFAM PARTNERS … | [] | true | ["hedge_fund"] | 44000 | 2024-11-13 | null | null |
Nice, this dataframe contains distinct institutions as indicated by the cik field.
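One quick way to double-check this is to count how often each cik occurs before and after de-duplication. The sketch below uses only the standard library and a tiny illustrative sample; duplicated_ciks and keep_first_per_cik are hypothetical helpers, and keeping the first record per cik is meant to mirror the keep='first' choice above rather than reproduce the Polars implementation.

```python
from collections import Counter


def duplicated_ciks(records):
    """Return the sorted cik values that occur more than once."""
    counts = Counter(rec['cik'] for rec in records)
    return sorted(cik for cik, n in counts.items() if n > 1)


def keep_first_per_cik(records):
    """Keep only the first record seen for each cik, preserving order."""
    seen = {}
    for rec in records:
        seen.setdefault(rec['cik'], rec)
    return list(seen.values())


# Illustrative sample only: one institution appears twice.
sample = [
    {'cik': '0001697791', 'name': 'UNITED BANK'},
    {'cik': '0000102909', 'name': 'VANGUARD GROUP INC'},
    {'cik': '0001697791', 'name': 'UNITED BANK'},
]

print(duplicated_ciks(sample))
deduped = keep_first_per_cik(sample)
print(duplicated_ciks(deduped))
```

An empty list from the second call means every remaining record has a distinct cik.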
Just for fun, let's take a look at the tagged institutions for much more interesting results:
target_tags = [
    '13d_activist', 'activist', 'value_investor', 'small_cap', 'biotech',
    'credit', 'technology', 'tiger_club', 'energy', 'esg', 'event'
]
high_priority_institutions_df = (
    distinct_institutions_df
    .filter(
        pl.col('tags').list.eval(pl.element().is_in(target_tags)).list.any()
    )
)
high_priority_institutions_df
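The filter above keeps an institution when at least one of its tags appears in target_tags. For readers less familiar with Polars list expressions, the same any-overlap test can be sketched in plain Python. The has_target_tag helper and the three sample rows (including their names) are hypothetical and only illustrate the logic.

```python
target_tags = {
    '13d_activist', 'activist', 'value_investor', 'small_cap', 'biotech',
    'credit', 'technology', 'tiger_club', 'energy', 'esg', 'event',
}


def has_target_tag(tags, targets=target_tags):
    """True if at least one tag in the list is one of the target tags."""
    return any(tag in targets for tag in tags)


# Hypothetical rows: no tags, a matching tag, and only a non-target tag.
rows = [
    {'name': 'UNTAGGED EXAMPLE', 'tags': []},
    {'name': 'VALUE EXAMPLE', 'tags': ['value_investor', 'hedge_fund']},
    {'name': 'HEDGE ONLY EXAMPLE', 'tags': ['hedge_fund']},
]

matches = [row['name'] for row in rows if has_target_tag(row['tags'])]
print(matches)
```

Note that hedge_fund is not in target_tags, so an institution tagged only as a hedge fund is filtered out, as are institutions with an empty tag list. The result of the Polars filter on the full data set follows.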
cik | name | people | is_hedge_fund | tags | total_value | filing_date | logo_url | founder_img_url |
---|---|---|---|---|---|---|---|---|
str | str | list[str] | bool | list[str] | i64 | date | str | str |
"0001067983" | "BERKSHIRE HATH… | ["Warren Buffett", "Charlie Munger"] | false | ["activist", "value_investor"] | 266378900503 | 2024-11-14 | "https://storag… | "https://storag… |
"0000200217" | "DODGE & COX" | ["Dana Emery", "Roger Kuo", "David Hoeft"] | true | ["value_investor", "hedge_fund"] | 176833282150 | 2024-11-13 | "" | "" |
"0001348883" | "CLEARBRIDGE IN… | ["Terrence Murphy"] | true | ["activist", "hedge_fund"] | 127163204011 | 2024-11-12 | "https://storag… | "" |
"0001009207" | "D. E. SHAW & C… | ["David Shaw"] | false | ["13d_activist"] | 116492369102 | 2024-11-14 | "https://storag… | "" |
"0001466153" | "ARTISAN PARTNE… | ["Andy Ziegler", "Carlene Ziegler"] | true | ["activist", "hedge_fund"] | 67292375110 | 2024-11-12 | "https://storag… | "" |
… | … | … | … | … | … | … | … | … |
"0001649339" | "SCION ASSET MA… | ["Michael Burry"] | true | ["value_investor", "hedge_fund"] | 129753375 | 2024-11-14 | "" | "" |
"0001513193" | "ARBITER PARTNE… | ["Paul Isaac"] | true | ["value_investor", "hedge_fund"] | 113440845 | 2024-11-13 | "" | "" |
"0001636974" | "THUNDERBIRD PA… | ["David Fear"] | false | ["activist", "tiger_club"] | 58511848 | 2024-11-14 | "" | "" |
"0001835549" | "ENGINE NO. 1 L… | ["Christopher James"] | false | ["activist", "esg"] | 50515737 | 2024-11-14 | "https://storag… | "" |
"0001817187" | "INCLUSIVE CAPI… | ["Jeff Ubben"] | true | ["esg", "hedge_fund"] | 26049270 | 2024-11-14 | "" | "" |