Find Closest Float In Array For All Floats In Another Array
Solution 1:
A k-d tree is really overkill here. All you need to do is sort the array and use binary search to find the closest value in the sorted array. I wrote an answer a while back about how to use searchsorted to find the closest value to a target in an array. You can use the same idea here:
import numpy as np

def find_closest(A, target):
    # A must be sorted
    idx = A.searchsorted(target)
    idx = np.clip(idx, 1, len(A)-1)
    left = A[idx-1]
    right = A[idx]
    idx -= target - left < right - target
    return idx

def random_data(shape):
    # Generate some random data.
    return np.random.uniform(0., 10., shape)

def main(data, target):
    order = data[2, :].argsort()
    key = data[2, order]
    target = target[(target >= key[0]) & (target <= key[-1])]
    closest = find_closest(key, target)
    return data[:, order[closest]]

N1 = 1500
array1 = random_data((3, N1))
array2 = random_data(1000)
array2[[10, 20]] = [-1., 100]
array4 = main(array1, array2)
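As a sanity check on the searchsorted approach, the result can be compared with a brute-force search over every pair. The sketch below repeats find_closest from above so that it runs on its own; brute_force_closest, the seed, and the array sizes are hypothetical and only there for illustration.

```python
import numpy as np

def find_closest(A, target):
    # A must be sorted in ascending order.
    idx = A.searchsorted(target)
    idx = np.clip(idx, 1, len(A) - 1)
    left = A[idx - 1]
    right = A[idx]
    # Step back by one wherever the left neighbour is strictly closer.
    idx -= target - left < right - target
    return idx

def brute_force_closest(A, target):
    # Hypothetical reference helper: builds a len(target) x len(A) distance
    # table, so it is only practical for small inputs.
    return np.abs(A[np.newaxis, :] - target[:, np.newaxis]).argmin(axis=1)

rng = np.random.default_rng(0)
A = np.sort(rng.uniform(0.0, 10.0, 200))
target = rng.uniform(0.0, 10.0, 50)

fast = find_closest(A, target)
slow = brute_force_closest(A, target)

# Compare distances rather than indices, so an exact tie between two
# neighbours cannot be reported as a mismatch.
same_distance = np.allclose(np.abs(A[fast] - target), np.abs(A[slow] - target))
```

The brute-force version needs time and memory proportional to len(A) * len(target), while the sorted version needs one sort plus one binary search per target value.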
Solution 2:
If you have SciPy, a scipy.spatial.cKDTree can do the job:
import numpy
import scipy.spatial
array1 = numpy.array(list1)
array2 = numpy.array(list2)
# A tree optimized for nearest-neighbor lookup
tree = scipy.spatial.cKDTree(array1[2, ..., numpy.newaxis])
# The distances from the elements of array2 to their nearest neighbors in
# array1, and the indices of those neighbors.
distances, indices = tree.query(array2[..., numpy.newaxis])
array4 = array1[:, indices]
k-d trees are designed for multidimensional data, so this might not be the fastest solution, but it should be pretty darn fast compared to what you have. The k-d tree expects input in the form of a 2D array of points, where data[i] is a 1D array representing the i-th point, so the slicing expressions with newaxis are used to put the data into that format. If you need it to be even faster, you could probably do something with numpy.sort and numpy.searchsorted.
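To make the newaxis reshaping concrete, here is a minimal sketch with a few made-up values standing in for array1[2]. It only shows the shape change and does not build the tree.

```python
import numpy

# Hypothetical stand-in for array1[2]: three scalar values.
values = numpy.array([0.5, 2.0, 3.5])

# Appending an axis turns shape (3,) into shape (3, 1):
# three points, each with a single coordinate.
points = values[..., numpy.newaxis]

# The same layout can be produced with reshape.
same_layout = numpy.array_equal(points, values.reshape(-1, 1))
```

Here points[1] is the one-element array holding 2.0, which is the "1D array representing the i-th point" layout described above.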
If you need to reject data from list2 that falls outside the range of values given by list1[2], that can be accomplished by a preprocessing step:
lowbound = array1[2].min()
highbound = array1[2].max()
querypoints = array2[(array2 >= lowbound) & (array2 <= highbound)]
distances, indices = tree.query(querypoints[..., numpy.newaxis])
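Note that after this filtering the results line up with querypoints rather than with array2. If the original positions matter, one option is to keep the boolean mask. The sketch below uses made-up values, and the names row, inrange, and kept_positions are hypothetical.

```python
import numpy

# Hypothetical stand-ins for array1[2] and array2.
row = numpy.array([1.0, 4.0, 2.5])
array2 = numpy.array([0.5, 1.5, 3.0, 9.0])

lowbound = row.min()
highbound = row.max()

# Keep the mask so each surviving query can be traced back to array2.
inrange = (array2 >= lowbound) & (array2 <= highbound)
querypoints = array2[inrange]

# kept_positions[j] is the index in array2 of querypoints[j].
kept_positions = numpy.flatnonzero(inrange)
```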