
Find Closest Float In Array For All Floats In Another Array

I have a performance issue while 'filtering' an array according to the closest float found in another array. This is an MWE of the problem: import numpy as np def random_data(N):

Solution 1:

A k-d tree is really overkill here. All you need to do is sort the array and use binary search to find the closest value in the sorted array. I wrote an answer a while back about how to use searchsorted to find the closest value to a target in an array. You can use the same idea here:

import numpy as np

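def find_closest(A, target):
    # A must be sorted in ascending order.
    # Insertion point of each target value in A.
    idx = A.searchsorted(target)
    # Keep idx in [1, len(A)-1] so that both idx-1 and idx are valid indices.
    idx = np.clip(idx, 1, len(A)-1)
    left = A[idx-1]
    right = A[idx]
    # Step back by one wherever the left neighbor is strictly closer.
    idx -= target - left < right - target
    return idx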

def random_data(shape):
    # Generate some random data.
    return np.random.uniform(0., 10., shape)

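def main(data, target):
    # Sort the columns of data by the values in row 2.
    order = data[2, :].argsort()
    key = data[2, order]
    # Drop targets that fall outside the range covered by the key.
    target = target[(target >= key[0]) & (target <= key[-1])]
    closest = find_closest(key, target)
    # Map positions in the sorted key back to columns of the original data.
    return data[:, order[closest]]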

N1 = 1500
array1 = random_data((3, N1))
array2 = random_data(1000)
# Plant two out-of-range values to exercise the range filter in main().
array2[[10, 20]] = [-1., 100]

array4 = main(array1, array2)
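
As a sanity check on the searchsorted approach, the sketch below compares find_closest against a brute-force reference that builds the full distance matrix. The helper find_closest_brute, the seed, and the array sizes are illustrative assumptions and are not part of the original answer. Distances are compared instead of indices so that an exact tie between two neighbors cannot produce a false mismatch.

```python
import numpy as np

def find_closest(A, target):
    # Same searchsorted approach as above; A must be sorted ascending.
    idx = A.searchsorted(target)
    idx = np.clip(idx, 1, len(A) - 1)
    left = A[idx - 1]
    right = A[idx]
    idx -= target - left < right - target
    return idx

def find_closest_brute(A, target):
    # Hypothetical reference implementation: O(len(A) * len(target)) time
    # and memory. Builds every pairwise distance, then takes the argmin
    # along each row.
    return np.abs(target[:, np.newaxis] - A[np.newaxis, :]).argmin(axis=1)

rng = np.random.default_rng(0)
key = np.sort(rng.uniform(0.0, 10.0, 1500))
# Some targets deliberately fall outside the range covered by key.
target = rng.uniform(-1.0, 11.0, 1000)

fast = find_closest(key, target)
slow = find_closest_brute(key, target)

# Both methods should reach the same minimal distance for every target.
matches = np.array_equal(np.abs(key[fast] - target),
                         np.abs(key[slow] - target))
print(matches)
```

The brute-force version is only practical for small inputs, but it is simple enough to trust as a reference when testing the faster function.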

Solution 2:

If you have SciPy, a scipy.spatial.cKDTree can do the job:

import numpy
import scipy.spatial

# list1 and list2 are not defined in this snippet; they are the two input data sets.
array1 = numpy.array(list1)
array2 = numpy.array(list2)

# A tree optimized for nearest-neighbor lookup
tree = scipy.spatial.cKDTree(array1[2, ..., numpy.newaxis])

# The distances from the elements of array2 to their nearest neighbors in
# array1, and the indices of those neighbors.
distances, indices = tree.query(array2[..., numpy.newaxis])

array4 = array1[:, indices]

k-d trees are designed for multidimensional data, so this might not be the fastest solution, but it should be pretty darn fast compared to what you have. The k-d tree expects input in the form of a 2D array of points, where data[i] is a 1D array representing the ith point, so the slicing expressions with newaxis are used to put the data into that format. If you need it to be even faster, you could probably do something with numpy.sort and numpy.searchsorted.
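
To make the 2D input format concrete, here is a minimal sketch on made-up numbers. The arrays values and queries are illustrative assumptions, not data from the question. Each 1D array of N floats is reshaped to N points of dimension 1 before it is handed to the tree.

```python
import numpy as np
from scipy.spatial import cKDTree

# Made-up 1D data, only for illustration.
values = np.array([0.5, 2.0, 7.0])
queries = np.array([0.0, 1.9, 5.0, 10.0])

# Shape (3,) becomes shape (3, 1): three points, one coordinate each.
points = values[:, np.newaxis]
tree = cKDTree(points)

# With the default k=1, query returns one distance and one index per query point.
distances, indices = tree.query(queries[:, np.newaxis])

print(points.shape)
print(indices)
print(values[indices])
```

Indexing values with indices gives the nearest stored value for each query, which is the same pattern as array1[:, indices] above.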

If you need to reject data from list2 that falls outside the range of values given by list1[2], that can be accomplished by a preprocessing step:

lowbound = array1[2].min()
highbound = array1[2].max()

querypoints = array2[(array2 >= lowbound) & (array2 <= highbound)]
distances, indices = tree.query(querypoints[..., numpy.newaxis])
