Python - Gini coefficient calculation using Numpy -

i'm newbie, first of all, started learning python , i'm trying write code calculate gini index fake country. i've came following:

gdp = (653200000000) = (0.49 * gdp) / 100 # poorest 10% b = (0.59 * gdp) / 100 c = (0.69 * gdp) / 100 d = (0.79 * gdp) / 100 e = (1.89 * gdp) / 100 f = (2.55 * gdp) / 100 g = (5.0 * gdp) / 100 h = (10.0 * gdp) / 100 = (18.0 * gdp) / 100 j = (60.0 * gdp) / 100 # richest 10%  # divide quintiles , total income within each quintile q1 = float(a + b) # lowest quintile q2 = float(c + d) # second quintile q3 = float(e + f) # third quintile q4 = float(g + h) # fourth quintile q5 = float(i + j) # fifth quintile  # calculate percent of total income in each quintile t1 = float((100 * q1) / gdp) / 100 t2 = float((100 * q2) / gdp) / 100 t3 = float((100 * q3) / gdp) / 100 t4 = float((100 * q4) / gdp) / 100 t5 = float((100 * q5) / gdp) / 100  tr = float(t1 + t2 + t3 + t4 + t5)  # calculate cumulative percentage of household income h1 = float(t1) h2 = float(t1+t2) h3 = float(t1+t2+t3) h4 = float(t1+t2+t3+t4) h5 = float(t1+t2+t3+t4+t5)  # magic! using numpy calculate area under lorenz curve. # problem might here? import numpy np  numpy import trapz  # y values. cumulative percentage of incomes y = np.array([q1,q2,q3,q4,q5])  # compute area using composite trapezoidal rule. area_lorenz = trapz(y, dx=5)  # calculate area below perfect equality line. area_perfect = (q5 * h5) / 2  # seems work fine until here.  # manually calculated gini using values given areas above  # turns out @ .58 seems reasonable?  gini = area_perfect - area_lorenz  # prints utter nonsense. print gini

the result of gini = area_perfect - area_lorenz makes no sense. i've took out values given area variables , did math hand , came out ok, when try program it, gives me ??? value (-1.7198...). missing? can point me in right direction?

thanks!

stardust.

your problem isn't numpy.trapz; 1) definition of perfect equality distribution, , 2) normalization of gini coefficient.

first, had defined perfect equality distribution q5*h5/2, half product of fifth quintile's income , cumulative percentage (1.0). i'm not sure number meant represent.

second, have normalize area under perfect equality distribution; i.e.:

gini = (area under perfect equality - area under lorenz)/(area under perfect equality)

you don't have worry if define perfect equality curve have area of 1, it's safeguard in case there's error in definition of perfect equality curve.

to address both of these issues, defined perfect equality curve numpy.linspace. first advantage of can use real distribution's properties define same way. in other words, whether use quartiles or quintiles or deciles, perfect equality cdf (y_pe, below) have right shape. second advantage computing area done numpy.trapz well, bit of parallelism makes code easier read , guards against erroneous calculations.

import numpy np matplotlib import pyplot plt numpy import trapz  gdp = (653200000000) = (0.49 * gdp) / 100 # poorest 10% b = (0.59 * gdp) / 100 c = (0.69 * gdp) / 100 d = (0.79 * gdp) / 100 e = (1.89 * gdp) / 100 f = (2.55 * gdp) / 100 g = (5.0 * gdp) / 100 h = (10.0 * gdp) / 100 = (18.0 * gdp) / 100 j = (60.0 * gdp) / 100 # richest 10%  # divide quintiles , total income within each quintile q1 = float(a + b) # lowest quintile q2 = float(c + d) # second quintile q3 = float(e + f) # third quintile q4 = float(g + h) # fourth quintile q5 = float(i + j) # fifth quintile  # calculate percent of total income in each quintile t1 = float((100 * q1) / gdp) / 100 t2 = float((100 * q2) / gdp) / 100 t3 = float((100 * q3) / gdp) / 100 t4 = float((100 * q4) / gdp) / 100 t5 = float((100 * q5) / gdp) / 100  tr = float(t1 + t2 + t3 + t4 + t5)  # calculate cumulative percentage of household income h1 = float(t1) h2 = float(t1+t2) h3 = float(t1+t2+t3) h4 = float(t1+t2+t3+t4) h5 = float(t1+t2+t3+t4+t5)  # y values. cumulative percentage of incomes y = np.array([h1,h2,h3,h4,h5])  # perfect equality y values. cumulative percentage of incomes. y_pe = np.linspace(0.0,1.0,len(y))  # compute area using composite trapezoidal rule. area_lorenz = np.trapz(y, dx=5)  # calculate area below perfect equality line. area_perfect = np.trapz(y_pe, dx=5)  # seems work fine until here.  # manually calculated gini using values given areas above  # turns out @ .58 seems reasonable?  gini = (area_perfect - area_lorenz)/area_perfect  print gini  plt.plot(y,label='lorenz') plt.plot(y_pe,label='perfect_equality') plt.legend() plt.show()

WIKI

Search This Blog

Python - Gini coefficient calculation using Numpy -

Comments

Post a Comment