backbone: Quickly determining using Python whether an image is (fuzzily) in a collection

Saturday, February 21, 2015

Imagine that some new image X arrives, and I want to know whether X is new or has been encountered before. I have code, below, that shrinks the image and then converts it to a hash code. I can then see via a single hash look-up whether I've already encountered an image with the same hash code, so it's very fast.

My question is: is there an efficient way for me to see whether a similar image, but one with a different hash code, has already been seen? I was going to title this question something like "Data structure for efficiently determining whether a similar, non-identical item is already contained" but decided that would be an instance of the XY problem.

Here's my current code:


from PIL import Image

seen_images = {}  # This would really be a shelf or something

# From http://ift.tt/1B1jNmm
def image_pixel_hash_code(image):
    # Expects a greyscale ("L" mode) image, so each pixel is a single int
    pixels = list(image.getdata())
    avg = sum(pixels) / len(pixels)
    # One bit per pixel: '1' if darker than average, else '0'
    bits = "".join('1' if pixel < avg else '0' for pixel in pixels)  # '00010100...'
    hexadecimal = format(int(bits, 2), '016x').upper()
    return hexadecimal

def process_image(filepath):
    thumb = Image.open(filepath).resize((128, 128)).convert("L")
    code = image_pixel_hash_code(thumb)
    previous_image = seen_images.get(code)
    if previous_image is not None:
        print("'{}' already seen as '{}'".format(filepath, previous_image))
    else:
        seen_images[code] = filepath

You can put a path to a bunch of image files into a variable called IMAGE_ROOT and then try my code out with:


import os
for root, dirs, files in os.walk(IMAGE_ROOT):
    for filename in files:
        filepath = os.path.join(root, filename)
        try:
            process_image(filepath)
        except IOError:
            pass
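
To make "similar" concrete, one possible reading is that two hash codes differ in only a few bits (a small Hamming distance). The sketch below assumes that reading. The names `hamming_distance` and `find_similar` and the `max_distance` threshold are hypothetical and not part of the code above. It scans every stored hash, so it only illustrates the comparison; it is not the efficient look-up the question asks for.

```python
def hamming_distance(hex_a, hex_b):
    """Count the bits that differ between two hexadecimal hash codes."""
    return bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")


def find_similar(code, seen, max_distance=5):
    """Return (stored_code, filepath) pairs within max_distance bits of code.

    `seen` is a dict mapping hash code -> filepath, like seen_images.
    max_distance=5 is an arbitrary illustrative threshold.
    """
    return [
        (other, path)
        for other, path in seen.items()
        if hamming_distance(code, other) <= max_distance
    ]


if __name__ == "__main__":
    # Toy 16-bit hashes standing in for real image hashes
    toy_seen = {"00FF": "a.png", "F0F0": "b.png"}
    print(find_similar("00FE", toy_seen, max_distance=1))
```

With the toy data, "00FE" differs from "00FF" in one bit and from "F0F0" in seven, so only a.png is reported. The cost grows linearly with the size of the collection, which is the part I would like to avoid.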
