Part 3: Plotting Results
So we come to the final stage - visualising the results of all this work. I plan to output three graphs:
- A histogram and distribution of the unique words
- A histogram and model distribution of rhyme density
- A scatter plot of both variables
In order to plot a continuous distribution for the first two graphs I'm going to use a Kernel Density Estimator (KDE). This is a way of modelling a complicated distribution from limited data. It works by placing a small distribution (here a Gaussian) on each data point and summing them to get one continuous curve.
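To make the "summing many small distributions" idea concrete, here is a minimal hand-rolled sketch of a Gaussian KDE in plain Python. It is only an illustration of the principle, not the code used for the plots in this post; the function name `gaussian_kde_by_hand` and the toy values are made up for the example.

```python
import math

def gaussian_kde_by_hand(data, bandwidth, xs):
    """Evaluate a Gaussian kernel density estimate at each point in xs."""
    # Each data point contributes one Gaussian bump of width `bandwidth`;
    # dividing by len(data) keeps the total area under the curve at 1.
    norm = 1.0 / (len(data) * bandwidth * math.sqrt(2.0 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((x - d) / bandwidth) ** 2) for d in data)
        for x in xs
    ]

toy_data = [2.0, 2.5, 4.0, 7.0]  # made-up values, for illustration only
step = 0.01
xs = [-5.0 + i * step for i in range(2001)]  # grid from -5 to 15
density = gaussian_kde_by_hand(toy_data, bandwidth=0.8, xs=xs)
area = sum(density) * step  # Riemann sum of the curve, should be close to 1
```

A larger `bandwidth` widens each bump and gives a smoother curve, while a smaller one follows the individual data points more closely.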
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.neighbors import KernelDensity  # the sklearn.neighbors.kde module path no longer exists in recent scikit-learn
import seaborn as sns
def getKDE(data, bandwidth, xmin, xmax, num_intervals=10000):
    # scikit-learn expects a 2D array of shape (n_samples, n_features)
    kde = KernelDensity(kernel='gaussian', bandwidth=bandwidth).fit(data.reshape(len(data), 1))
    x_space = np.linspace(xmin, xmax, num_intervals)
    # score_samples returns the log of the density at each point
    logdens = kde.score_samples(x_space.reshape(num_intervals, 1))
    distribution = np.exp(logdens)  # fitted density
    return distribution, x_space
The above function allows us to obtain this distribution as an array over our x_space. My next function takes this distribution and plots it over a histogram of the data.
def plotFit(distribution, x_space, data, title='', xlabel='', bins=40, color='red'):
    plt.figure()
    # density=True scales the bars to unit area so they are comparable with the KDE curve
    # (the older normed=True argument is not accepted by recent matplotlib releases)
    plt.hist(data, color=color, bins=bins, alpha=0.5, density=True)
    plt.plot(x_space, distribution, color='black')
    data_l = min(data)
    data_r = max(data)
    data_w = data_r - data_l
    axes = plt.gca()
    # pad the x-axis by 10% of the data range on each side
    axes.set_xlim(data_l - 0.1 * data_w, data_r + 0.1 * data_w)
    axes.get_yaxis().set_visible(False)
    plt.title(title)
    plt.xlabel(xlabel)
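For the curve and the bars to share one y-axis, the histogram has to be scaled so that its bars cover a total area of 1, just like the density curve. The sketch below shows that scaling by hand in plain Python. It is an illustration under my own assumptions (equal-width bins, at least two distinct values), and `density_histogram` is a hypothetical helper, not a matplotlib function.

```python
def density_histogram(data, bins):
    """Return (edges, heights) for equal-width bins whose bar areas sum to 1."""
    lo, hi = min(data), max(data)
    width = (hi - lo) / bins  # assumes the data contains at least two distinct values
    counts = [0] * bins
    for value in data:
        # The maximum value belongs to the last bin, not to a new one past it.
        index = min(int((value - lo) / width), bins - 1)
        counts[index] += 1
    # Dividing each count by (n * width) turns counts into density heights.
    heights = [c / (len(data) * width) for c in counts]
    edges = [lo + i * width for i in range(bins + 1)]
    return edges, heights

toy_values = [3.1, 3.4, 3.5, 4.0, 4.2, 5.9, 6.0, 7.3]  # made-up numbers
edges, heights = density_histogram(toy_values, bins=4)
bar_width = edges[1] - edges[0]
total_area = sum(h * bar_width for h in heights)  # should be 1 up to rounding
```

Without this scaling the bars would show raw counts, and a density curve drawn on the same axes would look almost flat next to them.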
Finally, I want a labelled scatter plot.
def scatter_labels(dataframe, labels):
    plt.figure()
    data = dataframe[['Unique Words', 'Rhyme Score']].values
    all_labels = dataframe.index.values
    plt.scatter(data[:, 0], data[:, 1])
    # annotate only the artists named in `labels`
    for label, x, y in zip(all_labels, data[:, 0], data[:, 1]):
        if label in labels:
            plt.annotate(
                label,
                xy=(x, y), xytext=(-15, 15),
                textcoords='offset points', ha='right', va='bottom',
                bbox=dict(boxstyle='round,pad=0.5', fc='yellow', alpha=0.7),
                arrowprops=dict(arrowstyle='-', connectionstyle='arc3,rad=0'))
    plt.xlabel('Unique Words')
    plt.ylabel('Rhyme Score')
Putting this all together, we have:
adf = pd.read_csv('/home/ed/Documents/Code/Blog/Scripts/Genuis/artist_ranks.csv')
adf.index = adf['Unnamed: 0']
del adf['Unnamed: 0']
adf.index.name = 'Artist'
# names must match the index of the dataframe exactly (it lists 'Lil Wayne' without an apostrophe)
labels = ['Aesop Rock', 'Jam Baxter', 'Busdriver', 'Eminem', '2Pac', 'The Notorious B.I.G.', 'Future',
          'Lil Wayne', 'Drake', 'Billy Woods', 'MF Doom', 'Kool Keith', 'Fetty Wap', '2 Chainz', 'Danny Brown', 'J. Cole']
scatter_labels(adf, labels)
word_data = adf['Unique Words'].values
word_dist, word_x_space = getKDE(word_data, 200, 1000, 8000)
plotFit(word_dist, word_x_space, word_data, xlabel='Unique Words', color='red')
rhyme_data = adf['Rhyme Score'].values
rhyme_dist, rhyme_x_space = getKDE(rhyme_data, 100, 500, 3000)
plotFit(rhyme_dist, rhyme_x_space, rhyme_data, xlabel='Rhyme Score', color='blue')
print(adf.to_string())
plt.show()
Unique Words Rhyme Score Word Rank Rhyme Rank
Artist
Aesop Rock 6045 2541 1 1
Billy Woods 5601 1248 2 125
Busdriver 5594 1880 3 13
Milo 5360 1937 4 11
Jam Baxter 5225 2424 5 2
Roc Marciano 5073 1717 6 22
Canibus 5054 1234 7 133
GZA 5034 1700 8 26
Dan Bull 4999 2221 9 3
El-P 4828 1257 10 121
Chino XL 4806 1482 11 63
U-God 4649 1046 12 167
MF Doom 4565 1662 13 29
Kool A.D. 4535 1510 14 53
R.A. the Rugged Man 4441 1241 15 127
Lloyd Banks 4428 1140 16 151
Kool Keith 4419 1233 17 136
Greydon Square 4402 1468 18 67
Immortal Technique 4401 1335 19 102
RZA 4358 1385 20 83
Pharoahe Monch 4348 1206 21 139
Sean Price 4298 1459 22 71
Lupe Fiasco 4278 1358 23 92
Open Mike Eagle 4261 1656 24 31
Jean Grae 4214 1349 25 100
Sage Francis 4212 1569 26 43
Raekwon 4180 1233 27 136
Action Bronson 4160 1810 28 16
Watsky 4132 1713 29 23
Talib Kweli 4125 1237 30 129
Royce da 5'9_ 4090 1374 31 88
Asher Roth 4052 1545 32 45
Louis Logic 4042 1830 33 15
Joey Badass 4001 944 34 179
Jay Z 3965 1137 35 153
Frank Ocean 3960 1545 36 46
Eminem 3956 1510 37 53
Ghostface Killah 3917 1122 38 160
E-40 3905 1992 39 7
A$AP Rocky 3903 1491 40 57
Method Man 3879 1128 41 156
Denzel Curry 3875 1768 42 17
Ab-Soul 3875 1323 42 105
Schoolboy Q 3874 975 44 178
Obie Trice 3869 1378 45 86
Cage 3846 1467 46 69
Mos Def 3840 984 47 177
Crooked I 3823 1551 48 44
M.I.A. 3812 1492 49 56
Redman 3788 1354 50 95
B.o.B 3775 1479 51 64
Jay Electronica 3774 1590 52 38
Tinie Tempah 3772 1596 53 36
P.O.S. 3762 1374 54 88
Tech N9ne 3749 1422 55 77
Masta Ace 3743 1540 56 47
Jay Rock 3743 1137 56 154
Busta Rhymes 3740 1159 58 147
Common 3724 1163 59 144
Funkmaster Flex 3719 1276 60 115
Drake 3707 812 61 186
LL Cool J 3706 1017 62 170
Brother Ali 3701 1280 63 113
Mac Lethal 3696 1941 64 10
Rakim 3691 1305 65 109
Dr. Dre 3690 1005 66 172
Rick Ross 3682 1608 67 35
Nas 3669 1407 68 80
Kendrick Lamar 3667 1173 69 142
Lowkey 3663 1268 70 117
Astronautalis 3645 1397 71 81
Xzibit 3619 1440 72 75
Big Pun 3608 1159 73 146
Wrekonize 3604 1524 74 51
Chris Webby 3592 1352 75 97
MF Grimm 3591 1257 76 121
Prozak 3577 1005 77 172
Akala 3559 1393 78 82
Danny Brown 3555 1732 79 18
Giggs 3551 1323 80 105
Sho Baraka 3547 1452 81 73
The Notorious B.I.G. 3545 913 82 182
Devlin 3543 1539 83 48
Joell Ortiz 3540 1459 84 71
Killer Mike 3537 1646 85 33
Dizzee Rascal 3536 1711 86 24
Slick Rick 3535 1354 87 95
Mick Jenkins 3530 1264 88 119
Lil' Kim 3517 1529 89 49
Macklemore 3511 1155 90 148
Vince Staples 3504 1285 91 112
The Game 3482 1234 92 133
Ice-T 3481 1122 93 160
Ol' Dirty Bastard 3459 1234 94 133
Kanye West 3429 861 95 185
Jarren Benton 3418 1719 96 21
Wale 3416 1485 97 60
Big Daddy Kane 3414 1126 98 157
Murs 3402 1488 99 58
Sir Mix-a-Lot 3397 1576 100 42
2 Chainz 3388 2053 101 5
A$AP Ferg 3375 1494 102 55
Foxy Brown 3375 892 102 183
Lil Wayne 3365 1680 104 27
Classified 3354 1371 105 90
Eyedea 3350 1412 106 79
Oddisee 3334 1619 107 34
Del the Funky Homosapien 3327 1220 108 138
Tyler, The Creator 3325 1200 109 140
Tyga 3323 1264 110 119
KRS-One 3322 773 111 187
Bubba Sparxxx 3321 1311 112 108
Machine Gun Kelly 3311 1529 113 49
Mac Miller 3305 1477 114 65
J. Cole 3291 1163 115 144
Future 3288 1417 116 78
Travis Scott 3286 1000 117 174
Kevin Gates 3283 1089 118 165
Beastie Boys 3276 1468 119 67
Yelawolf 3273 1974 120 8
Big L 3270 1583 121 41
Nicki Minaj 3268 1255 122 123
Ice Cube 3266 1668 123 28
Childish Gambino 3249 1280 124 113
French Montana 3223 1264 125 119
Will Smith 3207 1140 126 151
Twista 3202 1705 127 25
Missy Elliott 3194 766 128 188
Big K.R.I.T. 3182 1126 129 157
Gucci Mane 3180 1953 131 9
Ace Hood 3180 1488 131 58
Jadakiss 3180 1300 131 111
Chamillionaire 3159 1519 133 52
Waka Flocka Flame 3141 1466 134 70
Hopsin 3141 1116 134 162
Birdman 3131 1358 136 92
Bizzy Bone 3115 934 137 180
Young Jeezy 3110 1662 138 29
Brotha Lynch Hung 3110 1592 138 37
Joe Budden 3100 1168 140 143
Ja Rule 3100 1014 140 171
Skepta 3096 1440 142 75
Kid Cudi 3080 1146 143 149
Krizz Kaliko 3079 1475 144 66
Dizzy Wright 3079 1183 144 141
Ludacris 3077 1351 146 98
Young Thug 3072 1482 147 62
Jme 3071 1317 148 107
Eazy-E 3064 1902 149 12
Benzino 3056 1116 150 162
Scarface 3022 1725 151 20
Coolio 3008 1349 152 100
Big Sean 2993 998 153 176
Wiley 2992 1880 154 13
Kid Ink 2982 1452 155 73
Freddie Gibbs 2972 1651 156 32
Flo Rida 2966 1245 157 126
Nelly 2951 1020 158 169
Pimp C 2945 1381 159 84
CeeLo Green 2907 1123 160 159
Master P 2879 1356 161 94
Twisted Insane 2854 2031 162 6
Meek Mill 2847 1381 163 84
Chief Keef 2845 1237 164 129
Eve 2845 1077 164 166
Bun B 2826 1371 166 90
T.I. 2826 1333 166 103
Juicy J 2781 1585 168 40
2Pac 2773 1586 169 39
Angel Haze 2755 1026 170 168
Gangsta Boo 2747 1237 171 129
50 Cent 2745 1305 172 109
Boosie Badazz 2740 1272 173 116
Snow Tha Product 2733 1254 174 124
Grieves 2722 2104 175 4
Kodak Black 2711 1349 176 100
Lil Uzi Vert 2705 1233 177 136
Logic 2689 934 178 180
Krayzie Bone 2686 883 179 184
T-Pain 2678 1137 180 154
Chris Brown 2605 1237 181 129
Too Short 2602 1485 182 60
Sean Paul 2523 1732 183 18
Wiz Khalifa 2523 1323 183 105
Rich Homie Quan 2496 1374 185 88
DMX 2464 1143 186 150
R. Kelly 2459 999 187 175
Fetty Wap 2359 1108 188 164