
php – Filter an array based on density

Posted by: admin July 12, 2020

Questions:

I have a sample graph like the one below, which I plotted from a set of (x, y) values stored in an array X.

http://bubblebird.com/images/t.png

As you can see, the graph has a dense cluster of peaks between 4000 and 5100.

My exact question is: can I programmatically find the range where the graph is most dense?

That is, given array X, how can I find the range within which the graph is dense?
For this array it would be 4000 to 5100.

For simplicity, assume that the array has only one dense region.

I would be thankful if you could suggest pseudocode or code.

Answers:

You can compute the variance of the signal over a moving window: the variance is high where the signal oscillates strongly and low where it is flat.
Here is an example (see the attached graph, where the test signal is red, the windowed variance is green and the filtered signal is blue):

Simple example:

Test signal generation:

import numpy as np
X = np.arange(200) - 100.  
Y = (np.exp(-(X/10)**2) + np.exp(-((np.abs(X)-50.)/2)**2)/3.) * np.cos(X * 10.)

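Compute the moving-window variance:

window_length = 30  # number of points in the window
half = window_length // 2  # slice bounds must be integers
# Clamp the start at 0 so windows near the left edge are truncated, not empty
variance = np.array([np.var(Y[max(0, i - half): i + half]) for i in range(len(Y))])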

Get the indices where the variance is high (here the criterion is a variance greater than half of the maximum variance; you can adapt the threshold to your case):

idx = np.where(variance > 0.5 * np.max(variance))

X_min = np.min(X[idx])
# -14.0
X_max = np.max(X[idx])
# 15.0

Or filter the signal by setting the points with low variance to zero:

Y_modified = np.where(variance > 0.5 * np.max(variance), Y, 0)
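
For data stored as (x, y) pairs, as in the question, the same steps could be wrapped into one function. This is only a minimal sketch without NumPy: the name dense_range and its parameters window and fraction are made up for illustration, and statistics.pvariance is used because it computes the population variance, as np.var does by default.

```python
from statistics import pvariance


def dense_range(points, window=30, fraction=0.5):
    """Return (x_min, x_max) of the samples whose windowed variance of y
    exceeds `fraction` of the largest windowed variance, or None."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    half = window // 2

    variances = []
    for i in range(len(ys)):
        # Window of up to `window` samples around i, truncated at the edges.
        chunk = ys[max(0, i - half): i + half]
        variances.append(pvariance(chunk))

    threshold = fraction * max(variances)
    dense_xs = [x for x, v in zip(xs, variances) if v > threshold]
    if not dense_xs:
        # Happens when the signal is constant (all variances are zero).
        return None
    return min(dense_xs), max(dense_xs)


# Hypothetical test data: flat signal with an oscillating burst for 40 <= x < 60.
points = [(x, (-1) ** x if 40 <= x < 60 else 0) for x in range(100)]
print(dense_range(points, window=10))
```

The window length and the 0.5 fraction are tuning knobs: a longer window gives a smoother variance curve but blurs the edges of the detected range.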

Answer:

You can calculate the absolute difference between adjacent values, optionally smooth it with a sliding window, and then find the regions where the smoothed absolute difference exceeds 50% of its maximum value.

In Python (you have python in the tags) this would look like this:

a = ( 10, 11, 9, 10, 18, 5, 20, 6, 15, 10, 9, 11 )

diffs = [abs(i[0]-i[1]) for i in zip(a,a[1:])]
# [1, 2, 1, 8, 13, 15, 14, 9, 5, 1, 2]
maximum = max(diffs)
# 15
result = [i>maximum/2 for i in diffs]
# [False, False, False, True, True, True, True, True, False, False, False]
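
The snippet above thresholds the raw differences and leaves out the smoothing step. One way that step might look is sketched below, assuming a centered moving average; the function name smoothed_dense_range and the parameters window and fraction are illustrative choices, not part of the original answer.

```python
def smoothed_dense_range(values, window=3, fraction=0.5):
    """Return (first, last) indices into `values` that bound the region
    where the smoothed absolute difference exceeds `fraction` of its
    maximum, or None if no such region exists."""
    # Absolute difference between each pair of adjacent samples.
    diffs = [abs(a - b) for a, b in zip(values, values[1:])]

    # Centered moving average; the window is truncated at both ends.
    half = window // 2
    smoothed = []
    for i in range(len(diffs)):
        chunk = diffs[max(0, i - half): i + half + 1]
        smoothed.append(sum(chunk) / len(chunk))

    threshold = fraction * max(smoothed)
    hits = [i for i, s in enumerate(smoothed) if s > threshold]
    if not hits:
        return None
    # diffs[i] sits between values[i] and values[i + 1].
    return hits[0], hits[-1] + 1


a = (10, 11, 9, 10, 18, 5, 20, 6, 15, 10, 9, 11)
print(smoothed_dense_range(a))
# (3, 8)
```

Smoothing mainly helps on noisy data, where a single quiet step inside the dense region would otherwise split it in two.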

Answer:

You could use a clustering algorithm (for example k-means) to split the data into clusters and then pick the cluster that carries the most weight.
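
The answer does not say what to cluster, so the following is only one possible reading: run a one-dimensional k-means with k = 2 on the absolute differences between adjacent values, so that the samples fall into a "quiet" and an "active" cluster, and report the extent of the active one. The helper names two_means_split and dense_cluster_range are hypothetical.

```python
def two_means_split(values, iterations=20):
    """1-D k-means with k=2. Returns the midpoint between the two
    centroids, which is the boundary between the two clusters."""
    low, high = min(values), max(values)  # initial centroids
    for _ in range(iterations):
        mid = (low + high) / 2
        lower = [v for v in values if v <= mid]
        upper = [v for v in values if v > mid]
        if not lower or not upper:
            break  # all values identical: nothing to split
        new_low = sum(lower) / len(lower)
        new_high = sum(upper) / len(upper)
        if (new_low, new_high) == (low, high):
            break  # converged
        low, high = new_low, new_high
    return (low + high) / 2


def dense_cluster_range(values):
    """Return (first, last) indices into `values` spanned by the
    high-activity cluster, or None if the signal is uniform."""
    diffs = [abs(a - b) for a, b in zip(values, values[1:])]
    threshold = two_means_split(diffs)
    hits = [i for i, d in enumerate(diffs) if d > threshold]
    if not hits:
        return None
    return hits[0], hits[-1] + 1


a = (10, 11, 9, 10, 18, 5, 20, 6, 15, 10, 9, 11)
print(dense_cluster_range(a))
# (3, 8)
```

Compared with a fixed 50% cutoff, the boundary here is derived from the data itself, which may be more robust when a single outlier inflates the maximum.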