This presentation is for informational purposes only and does not constitute an offer to sell, a solicitation to buy, or a recommendation for any security; nor does it constitute an offer to provide investment advisory or other services by Quantopian, Inc. ("Quantopian"). Nothing contained herein constitutes investment advice or offers any opinion with respect to the suitability of any security, and any views expressed herein should not be taken as advice to buy, sell, or hold any security or as an endorsement of any security or company. In preparing the information contained herein, Quantopian, Inc. has not taken into account the investment needs, objectives, and financial circumstances of any particular investor. Any views expressed and data illustrated herein were prepared based upon information, believed to be reliable, available to Quantopian, Inc. at the time of publication. Quantopian makes no guarantees as to their accuracy or completeness. All information is subject to change and may quickly become unreliable for various reasons, including changes in market conditions or economic circumstances.
Key source of "alpha"
Computer science and linguistics tools to interpret human language and text
Allows us to quantify unstructured text
my_phrases = ['Show me the alpha', 'Carthago delenda est']
# Naive whitespace tokenization: one token list per phrase
[phrase.split(' ') for phrase in my_phrases]
[['Show', 'me', 'the', 'alpha'], ['Carthago', 'delenda', 'est']]
# Bigrams: pair each word with its successor within each phrase
[bigram for phrase in my_phrases
 for bigram in zip(phrase.split(' ')[:-1], phrase.split(' ')[1:])]
[('Show', 'me'), ('me', 'the'), ('the', 'alpha'), ('Carthago', 'delenda'), ('delenda', 'est')]
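The `w2v` model queried below is not constructed in these slides. A minimal sketch of how such a model might be trained with gensim, assuming a hypothetical `tokenized_tweets` (a list of stemmed token lists matching the vocabulary shown, e.g. 'depress', 'angri'); note the dimension argument is `size` in gensim 3.x and `vector_size` in 4.x:

```python
from gensim.models import Word2Vec

# Hypothetical training step; `tokenized_tweets` is assumed to be a
# list of token lists, stemmed to match the vocabulary queried below.
w2v = Word2Vec(tokenized_tweets,
               vector_size=100,  # matches the 100-dimensional vectors shown
               window=5,
               min_count=5)
```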
print(w2v.wv['sad'])
[ -1.89863455e+00 -1.54665136e+00 -2.23204970e+00 -1.30877733e+00 -2.30061579e+00 -1.70134628e+00 1.83337653e+00 -2.08741140e+00 3.23724604e+00 1.26184821e+00 -9.99662220e-01 -4.37552959e-01 4.40503418e-01 1.19143569e+00 -2.29179478e+00 -1.86814177e+00 3.11535645e+00 1.62474096e+00 2.66866231e+00 2.83645630e+00 -1.28488052e+00 1.35040748e+00 1.01865172e+00 -4.80658680e-01 4.23388511e-01 -1.65452003e+00 9.91536558e-01 2.40602851e+00 -7.76076317e-01 1.94842303e+00 7.72831738e-01 7.38338292e-01 -2.83442521e+00 6.60653114e-01 1.49878132e+00 6.51400387e-01 -2.40639806e+00 -1.07167780e+00 -1.02165806e+00 9.60173905e-01 2.21353650e+00 -1.42129743e+00 -9.27708030e-01 -3.88164580e-01 1.46669912e+00 8.52385104e-01 5.12198329e-01 -4.31529015e-01 2.94047624e-01 1.53495061e+00 2.58021164e+00 -5.14630191e-02 -1.65423024e+00 1.98876739e+00 -3.05596733e+00 4.68582273e-01 -7.30318785e-01 -2.12546796e-01 2.40485692e+00 2.02279878e+00 1.17719293e+00 2.86357617e+00 7.22466826e-01 2.82972664e-01 -7.58317888e-01 3.47945952e+00 3.73739266e+00 1.42169583e+00 -7.90117681e-01 1.84042037e+00 1.98835433e+00 1.78161597e+00 2.36412417e-03 -4.24685836e-01 -1.56673503e+00 1.27409828e+00 -7.06087649e-01 5.39561808e-01 -2.57677864e-02 -2.28654718e+00 -1.61694837e+00 1.12416410e+00 1.56511533e+00 9.97117162e-01 -1.62680793e+00 -5.11504531e-01 2.35140419e+00 2.85187773e-02 9.96020317e-01 2.01621875e-01 -2.27278137e+00 -8.56919646e-01 -1.24364936e+00 1.93653297e+00 -1.74209273e+00 -2.66480112e+00 -1.75437510e+00 -2.04358786e-01 -3.64142150e-01 2.14454412e+00]
w2v.wv.most_similar('sad')
[('upset', 0.815515398979187), ('depress', 0.7984983921051025), ('bum', 0.7161489725112915), ('devast', 0.7080429792404175), ('disappoint', 0.6943762302398682), ('angri', 0.658515453338623), ('sadd', 0.6404643058776855), ('embarrass', 0.6364743709564209), ('gut', 0.6311415433883667), ('unhappi', 0.6310410499572754)]
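The plotting call below relies on `wv_df`, `clustered_wv_df`, and `dim_red`, none of which are defined in these slides, and `plot_embedded_clusters` is a plotting helper that is not shown. One plausible construction, sketched with scikit-learn (the cluster count and t-SNE projection are illustrative assumptions; attribute names follow gensim 4.x):

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.manifold import TSNE

# Word vectors as a DataFrame: one row per vocabulary word, 100 columns.
wv_df = pd.DataFrame(w2v.wv.vectors, index=w2v.wv.index_to_key)

# Despite the name, `clustered_wv_df` exposes `.labels_` below, so a
# fitted clustering estimator (here KMeans) is assumed.
clustered_wv_df = KMeans(n_clusters=40, random_state=0).fit(wv_df)

# 2-D projection of the embeddings for plotting.
dim_red = TSNE(n_components=2, random_state=0).fit_transform(wv_df)
```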
plot_embedded_clusters(dim_red, clustered_wv_df.labels_, legend=[1, 8, 12, 32])
wv_df[clustered_wv_df.labels_ == 1].head()
| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | ... | 90 | 91 | 92 | 93 | 94 | 95 | 96 | 97 | 98 | 99 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| disappoint | -0.235032 | -1.696048 | -0.039209 | -1.006235 | -0.968943 | 0.117551 | 2.012763 | -1.500448 | 0.371111 | -0.103024 | ... | -1.867966 | 0.366410 | -1.687534 | 1.776099 | -0.001948 | -0.942753 | -0.323758 | 0.418799 | 0.703646 | 0.675984 |
| annoy | -0.509595 | -0.234477 | -0.105023 | 0.782759 | -0.494718 | 1.529665 | 0.613385 | -2.956682 | 1.914651 | 1.837232 | ... | 0.357051 | -0.184894 | -0.765359 | 0.194613 | -0.089562 | -0.790999 | -0.715266 | 1.710391 | -0.664831 | 0.446600 |
| sad | -1.898635 | -1.546651 | -2.232050 | -1.308777 | -2.300616 | -1.701346 | 1.833377 | -2.087411 | 3.237246 | 1.261848 | ... | -2.272781 | -0.856920 | -1.243649 | 1.936533 | -1.742093 | -2.664801 | -1.754375 | -0.204359 | -0.364142 | 2.144544 |
| upset | -2.267228 | -1.130764 | -0.789428 | -0.307305 | -1.913559 | -1.307800 | 1.678527 | -2.597203 | 1.749868 | -0.047062 | ... | -1.734596 | 0.064607 | -2.011217 | 0.364249 | -1.385140 | -1.366344 | -0.929015 | 0.583783 | -0.106717 | 0.374008 |
| depress | -1.098669 | -1.022491 | -1.353772 | -2.082869 | -1.199755 | -0.110646 | 1.156238 | -0.887470 | 2.172702 | 0.714289 | ... | -0.327113 | -0.025306 | -0.187666 | 0.595404 | 0.282716 | -2.097348 | -0.750469 | 0.707328 | -0.702710 | 1.245831 |
5 rows × 100 columns
Twitter sentiment dataset
Logistic Regression with Bag of Words
Neural Network with Word Embeddings
\begin{eqnarray} g(X) &=& \alpha + \beta_0 X_0 + \cdots + \beta_n X_n \\ F(X) &=& \frac{1}{1 + e^{-g(X)}} \end{eqnarray}
Friendly, familiar linear model
Easy to interpret and understand
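As a baseline along these lines, a bag-of-words logistic regression can be sketched with scikit-learn; the dataset variables `tweets` (a list of strings) and `labels` (0/1 sentiment) are assumptions, not from the original:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Bag-of-words counts feeding a logistic regression classifier.
clf = make_pipeline(CountVectorizer(max_features=10000),
                    LogisticRegression(max_iter=1000))
clf.fit(tweets, labels)  # hypothetical training data
```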
Recurrent neural networks are well suited to text processing
View sentences as sequences of words (similar to time series structure)
In Keras this is simple to implement
from keras.layers import Input, Embedding, LSTM, Dense

# MAX_WORDS, max_features, embedding_dim, and DROPOUT are assumed to be
# defined earlier in the notebook: sequence length, vocabulary size,
# embedding width, and dropout rate, respectively.
input_layer = Input(shape=(MAX_WORDS,))
embedding_layer = Embedding(max_features + 1,
                            embedding_dim,
                            input_length=MAX_WORDS)(input_layer)
lstm_layer = LSTM(64,
                  dropout=DROPOUT,
                  activation='tanh',
                  return_sequences=True)(embedding_layer)  # emit the full sequence for the next LSTM
lstm_layer = LSTM(128,
                  dropout=DROPOUT,
                  activation='tanh')(lstm_layer)
output = Dense(1,
               activation='sigmoid',
               name='sentiment')(lstm_layer)  # probability of positive sentiment
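The slides stop at the layer definitions. A hedged sketch of the wiring and compile step that would typically follow (the optimizer and loss choices here are assumptions, not from the original):

```python
from keras.models import Model

# Hypothetical: assemble and compile the network defined above.
model = Model(inputs=input_layer, outputs=output)
model.compile(optimizer='adam',
              loss='binary_crossentropy',  # binary sentiment target
              metrics=['accuracy'])
```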
Hypothesis: POTUS's tweets affect the market
CAPM
$$ r_p = \alpha + \beta_{m} r_{m} $$
Fama-French Factors
$$ r_p = \alpha + \beta_{m} r_{m} + \beta_{hml} r_{hml} + \beta_{smb} r_{smb} $$
Alternative Factors
$$ r_p = \alpha + \beta_0 r_0 + \cdots + \beta_n r_n $$
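Any of these factor regressions can be estimated with ordinary least squares; a minimal sketch using statsmodels, where the `returns` DataFrame and its column names are hypothetical:

```python
import statsmodels.api as sm

# Hypothetical data: one column of portfolio returns plus factor returns.
X = sm.add_constant(returns[['mkt', 'hml', 'smb']])  # the constant estimates alpha
ff_model = sm.OLS(returns['portfolio'], X).fit()
print(ff_model.params)  # alpha (const) and the factor betas
```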