
python – Choosing bandwidth and linspace for kernel density estimation (why doesn't my bandwidth work?)

Posted by: admin February 24, 2020

Questions:

I have followed this link to apply kernel density estimation. My aim is to split an array into two or more groups/clusters. The code below works for every array in my data except this one:

X = np.array([[77788], [77793],[77798], [77803], [92886], [92891], [92896], [92901]])

So I expect to see two separate clusters:

first_group = ([[77788], [77793],[77798], [77803]])

second_group = ([[92886], [92891], [92896], [92901]])

My list is dynamic, so I cannot hard-code the range for linspace: the array may span 0 to 10, or 100000 to 2000000. That's why I pass the array's min and max to linspace.
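Because the range differs from array to array, one option is to derive both the ends and the number of grid points from the data and the bandwidth, instead of relying on linspace's default of 50 points. A minimal sketch follows; `kde_grid` and its `pad`, `points_per_bandwidth` and `max_points` parameters are hypothetical names with illustrative defaults, not something from the original post.

```python
import numpy as np

def kde_grid(x, bandwidth, pad=3.0, points_per_bandwidth=4, max_points=100_000):
    """Hypothetical helper: an evaluation grid that follows the data's own range.

    The grid extends `pad` bandwidths beyond the smallest and largest values
    and aims for about `points_per_bandwidth` points per bandwidth. The point
    count is capped at `max_points`, so for a tiny bandwidth on a huge range
    the spacing can end up coarser than requested.
    """
    x = np.asarray(x, dtype=float).ravel()
    lo = x.min() - pad * bandwidth
    hi = x.max() + pad * bandwidth
    n = int(np.ceil((hi - lo) / bandwidth * points_per_bandwidth)) + 1
    return np.linspace(lo, hi, min(n, max_points))

X = np.array([[77788], [77793], [77798], [77803], [92886], [92891], [92896], [92901]])
grid = kde_grid(X, bandwidth=8)                      # large values
small = kde_grid([0.0, 2.5, 10.0], bandwidth=0.5)    # small values, same code
print(grid.size, grid[0], grid[-1])
print(small.size, small[0], small[-1])
```

For this array, bandwidth=8 already calls for several thousand grid points (7582 with these defaults), a hint that the bandwidth is tiny compared with the data's range.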

However, I could not obtain separate clusters, even though I tried various bandwidths. My code is below:

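import numpy as np
import matplotlib.pyplot as plt
from scipy.signal import argrelextrema
from sklearn.neighbors import KernelDensity

a = X.reshape(-1, 1)
kde = KernelDensity(kernel='gaussian', bandwidth=8).fit(a)
s = np.linspace(a.min(), a.max())  # 50 evenly spaced points by default
e = kde.score_samples(s.reshape(-1, 1))  # log density at each grid point
plt.plot(s, e)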

[Plot: log density e against s for bandwidth=8]

mi, ma = argrelextrema(e, np.less)[0], argrelextrema(e, np.greater)[0]
print("Minima:", s[mi])  # output: []
print("Maxima:", s[ma])  # output: []

s[mi] and s[ma] are empty, which means no two separate clusters were found for this array. Yet the plot shows at least one minimum. Why doesn't that value appear in the s[mi] output?

I applied the same code with the other bandwidths shown below, but there are still no minimum or maximum values for this array. Any idea what I am doing wrong?

bandwidth=0.008

[Plot: log density e against s for bandwidth=0.008]

bandwidth=0.00002

[Plot: log density e against s for bandwidth=0.00002]
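A quick check of the evaluation grid in the code above, as a minimal sketch using only NumPy: when `num` is not given, `np.linspace` returns 50 points, so on this array neighbouring grid points are about 308 apart, more than 38 bandwidths for bandwidth=8. Apart from the two endpoints, every grid point lies at least about 293 units from the nearest data point, far out in the tails of the narrow Gaussian bumps.

```python
import numpy as np

a = np.array([77788, 77793, 77798, 77803,
              92886, 92891, 92896, 92901]).reshape(-1, 1)
bandwidth = 8

s = np.linspace(a.min(), a.max())  # num defaults to 50
step = s[1] - s[0]

# Distance from each interior grid point to its nearest data point.
interior = s[1:-1]
nearest = np.abs(interior[:, None] - a.ravel()[None, :]).min(axis=1)

print("grid points:", s.size)  # 50
print("spacing:", step, "=", step / bandwidth, "bandwidths")
print("closest interior point:", nearest.min(), "units away")
```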

Answers:

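Try a much larger bandwidth, one on the scale of your data rather than 8 (though not so large that the two groups merge into a single peak), or rely on a heuristic for choosing the bandwidth.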
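One common heuristic is Silverman's rule of thumb, h = 0.9 · min(σ, IQR/1.34) · n^(-1/5). It scales with the data, so it also copes with arrays spanning 0 to 10 or 100000 to 2000000. The sketch below computes it and evaluates a Gaussian KDE with plain NumPy instead of `KernelDensity`; that is a swapped-in implementation intended to compute the same quantity as `KernelDensity(kernel='gaussian').score_samples`, using log-sum-exp so distant grid points stay finite. `silverman_bandwidth` and `gaussian_log_density` are hypothetical helper names. As a sanity check on the bandwidth: for two tight, equal-sized groups whose centres are d apart, the mixture only dips between them when d > 2h. Here d ≈ 15100 and the rule gives h ≈ 4800, so one minimum should appear between the groups.

```python
import numpy as np

def silverman_bandwidth(x):
    """Silverman's rule of thumb for a 1-D Gaussian KDE:
    h = 0.9 * min(sample std, IQR / 1.34) * n ** (-1/5)."""
    x = np.asarray(x, dtype=float).ravel()
    std = x.std(ddof=1)
    q75, q25 = np.percentile(x, [75, 25])
    return 0.9 * min(std, (q75 - q25) / 1.34) * x.size ** (-0.2)

def gaussian_log_density(data, grid, bandwidth):
    """Log of a normalized Gaussian KDE at each grid point. Log-sum-exp keeps
    grid points far from every data point finite instead of -inf."""
    data = np.asarray(data, dtype=float).ravel()
    grid = np.asarray(grid, dtype=float).ravel()
    log_k = -0.5 * ((grid[:, None] - data[None, :]) / bandwidth) ** 2
    peak = log_k.max(axis=1)
    log_sum = peak + np.log(np.exp(log_k - peak[:, None]).sum(axis=1))
    return log_sum - np.log(data.size * bandwidth * np.sqrt(2 * np.pi))

X = np.array([[77788], [77793], [77798], [77803],
              [92886], [92891], [92896], [92901]])
x = X.ravel()
h = silverman_bandwidth(x)

# Grid padded by a few bandwidths; an odd point count puts a grid point
# at the centre of the range.
s = np.linspace(x.min() - 3 * h, x.max() + 3 * h, 1001)
e = gaussian_log_density(x, s, h)

# Strict interior minima: the same test argrelextrema(e, np.less) applies.
mi = np.flatnonzero((e[1:-1] < e[:-2]) & (e[1:-1] < e[2:])) + 1
print("bandwidth:", h)
print("split point(s):", s[mi])
```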

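To make your code more robust, also split clusters at runs of consecutive minima. Your problem is that there is no unique minimum here but a flat interval, and argrelextrema(e, np.less) only reports points that are strictly lower than both neighbours, so a flat bottom yields nothing.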
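Splitting at a run of minima can be sketched as follows. `plateau_minima` and `split_at` are hypothetical helpers, not part of SciPy or scikit-learn. The first collapses runs of equal values (runs of `-inf` compare equal to each other, so they form one run) and reports the middle of every run that is lower than the runs on both sides. The second groups the data by those cut points. The toy log-density `e` is made up for illustration: it has a flat `-inf` stretch between two bumps, where a strict neighbour comparison finds nothing.

```python
import numpy as np

def plateau_minima(e):
    """Hypothetical helper: middle index of every interior run of equal values
    that is lower than the runs on both sides (runs of -inf count as one run)."""
    e = np.asarray(e, dtype=float)
    change = np.flatnonzero(e[1:] != e[:-1]) + 1  # indices where a new run starts
    starts = np.concatenate(([0], change))
    ends = np.concatenate((change, [e.size]))      # exclusive end of each run
    vals = e[starts]
    mins = [(starts[k] + ends[k] - 1) // 2
            for k in range(1, vals.size - 1)
            if vals[k] < vals[k - 1] and vals[k] < vals[k + 1]]
    return np.array(mins, dtype=int)

def split_at(x, cuts):
    """Hypothetical helper: group the values of x by the interval between cuts."""
    x = np.asarray(x).ravel()
    labels = np.searchsorted(np.sort(cuts), x)
    return [x[labels == k] for k in range(len(cuts) + 1)]

# Made-up log-density with a flat -inf stretch between two bumps.
e = np.array([-1.0, -2.0, -np.inf, -np.inf, -np.inf, -2.0, -1.0])
s = np.linspace(77788, 92901, e.size)

# Strict interior minima (the test argrelextrema(e, np.less) applies): none here.
strict = np.flatnonzero((e[1:-1] < e[:-2]) & (e[1:-1] < e[2:])) + 1
mi = plateau_minima(e)  # middle of the flat -inf run
groups = split_at([77788, 77793, 77798, 77803, 92886, 92891, 92896, 92901], s[mi])
print("strict:", strict, "plateau:", mi, "cut at:", s[mi])
print(groups)
```

Taking the middle of the flat run is just one choice; since the density inside the run is negligible, cutting anywhere inside it should separate the same groups.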