What’re you eating? Part 3

10 July

So from my last post it was pretty clear that the recipe Optical Character Recognition (OCR) hadn’t gone particularly well. It was slow (doing one image at a time is a lot of work and I’m a lazy guy) and the accuracy left a lot to be desired, so I decided to look for an alternative. From my research, Google’s Cloud Vision API would likely be more accurate and would be able to chew through all 50+ recipe photos with one click, leaving me free to indulge in my favourite pastime: set-and-forget.

Now it wasn’t as simple as downloading a Python library (didn’t stop me crossing my fingers regardless), but the documentation is thorough, which let me get there in the end. I had to set up an account, set up a Google Cloud Project, enable billing (they give you some free uses - don’t worry), enable the API, install the CLI (which let me initialise something or other) and then finally I was able to write my script. I’ll admit that while the documentation was thorough, it was long enough that my attention span betrayed me. I stopped paying attention to the point of each step, so the steps started to blend together. Obviously I got it working, but it means the above ‘guide’ isn’t as detailed as I’d have liked - if I have time I might write a proper explanation!

Onto the Python! Now they were nice enough to include an example of a working Python script, so I plugged in my first image and looked through the output to get a sense of what the data might look like. Let me tell you - I wasn’t prepared for the level of data it captured. At a glance, the JSON output had:

A large block of text with formatting codes in-line
Each string captured (space-separated)
Each character captured

Along with each string/character, it gave the coordinates of its 4 corners to show the space the software defines as content. It was just too much for what I wanted (I had visions of sifting through hundreds of thousands of lines of output if I pointed it at all my recipes), so I decided I’d focus on just the text block and work out the formatting myself. With a bit of trial-and-error to work out the boundaries of this block, I came to this solution:

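from google.cloud import vision
import os

client = vision.ImageAnnotatorClient()
directory = 'E:\\Documents\\recipes'
recipe_list = []

for filename in os.listdir(directory):
    if filename.endswith('.jpg'):
        # Read the raw image bytes and send them to the Vision API
        with open(os.path.join(directory, filename), "rb") as image_file:
            content = image_file.read()
        image = vision.Image(content=content)
        response = client.text_detection(image=image)
        # Check for an API error before trying to parse the response
        if response.error.message:
            raise Exception(
                "{}\nFor more info on error messages, check: "
                "https://cloud.google.com/apis/design/errors".format(response.error.message))
        # The full text block sits between 'locale: "en"' and the first 'bounding_poly'
        description = str(response).split('locale: "en"')[1].split('bounding_poly')[0]
        # Keep the filename alongside the text for error-checking
        recipe_list.append({"name": filename, "description": description})

# Write every text block to one file. Encoding to UTF-8 and wrapping in str()
# writes problem characters out as escaped bytes rather than as the characters
with open(os.path.join(directory, "recipes"), "w") as recipe_file:
    for element in recipe_list:
        recipe_file.write(str(element["description"].encode('utf8')))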

Put simply, it goes through each .jpg file in my recipes directory and performs the OCR, then splits the output twice to isolate the text block (I found that each text block started with ‘locale: “en”’ and ended with ‘bounding_poly’, so I used those as the boundaries). It saves that block in a dictionary along with the filename (for error-checking). I then go through that list of dictionaries and save the outputs into a file for me to process. A few characters were failing and breaking things, so I specified UTF-8 encoding to overcome the errors and decided I’d fix the broken characters later. Essentially, certain characters the computer can understand weren’t translating properly into characters that a text-editing application can understand (in this case it was the superscript fractional measurements that’re quite common in recipes). By specifying the encoding, you can think of it as me telling the script not to translate the problem characters, leaving them as raw Unicode byte values for me to decipher later.
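As an aside, the string splitting could probably be avoided. In the google-cloud-vision Python client, the response has a text_annotations list, and its first entry carries the whole detected block in its description field (the remaining entries are the individual strings). Below is a minimal sketch of that idea, not what I actually ran. The extract_full_text helper and the fake_response stand-in are made up for illustration so the example works without calling the API.

```python
from types import SimpleNamespace


def extract_full_text(response):
    """Return the whole detected text block, or '' if nothing was detected."""
    annotations = response.text_annotations
    # The first annotation covers the whole image; the rest are single strings.
    return annotations[0].description if annotations else ""


# Hypothetical stand-in for a real Vision API response, so this runs offline.
fake_response = SimpleNamespace(
    text_annotations=[
        SimpleNamespace(locale="en", description="EASY FLATBREAD\nMAKES: 6"),
        SimpleNamespace(locale="", description="EASY"),
        SimpleNamespace(locale="", description="FLATBREAD"),
    ]
)

full_text = extract_full_text(fake_response)
print(full_text)
```

With a real response from client.text_detection, the same helper should hand back the text block directly, and an image with no detected text gives an empty string instead of an IndexError from the split.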

The result for the flatbread was much more pleasing:

EASY FLATBREAD\\nMAKES: 6 I PREP: 15 MINUTES I COOK: 15 MINUTES\\nYou never knew you could make such incredible flatbreads so easily. And no yeast either! Use for all your flatbread needs,\\nincluding Greek Chicken Gyros (page 24), Chicken Shawarma (page 46) and lunch wraps. I also use these flatbreads as naan\\nfor dunking into curries when I don’t have time to make naan the proper way with yeast.\\n50 g unsalted butter\\n0xC20xBE cup (185 ml) milk\\n2 cups (300 g) plain flour, plus\\n2-3 tbsp extra for dusting\\n0xC20xBD tsp cooking salt*\\nPut the butter and milk in a heatproof jug and microwave for 1 minute or until the butter is\\nmelted. (Or do this on the stove over medium heat.)\\nPut the flour and salt in a bowl and pour in the milk mixture. Mix with a wooden spoon until\\nit mostly comes together into a shaggy dough.\\nSprinkle a work surface with half the extra flour, then turn the dough out. Knead for 3 minutes\\nuntil it becomes a smooth dough. Add extra flour if it's too sticky (but try to keep the flour to\\na minimum, otherwise the flatbread will be dry). Shape into a ball, put back in the bowl, cover\\nwith plastic wrap and leave on the counter for 30 minutes.\\nSprinkle another work area with a bit of extra flour. Cut the dough into six pieces and roll them\\ninto balls with your hands. Roll each ball out into 20 cm wide circles, about 2-3 mm thick.\\nHeat a medium non-stick frying pan over high heat. Cook one flatbread at a time for\\n1 1/2 minutes on the first side until it puffs up dramatically and the underside has lots of\\ngolden splotches. Flip and cook the other side for 45 seconds to 1 minute until the\\nunderside has golden spots and it puffs up again.\\nTransfer to a clean tea towel and loosely wrap the flatbread to keep it warm. This also makes the flatbread soft (rather than crispy), which is what we want. Repeat with the\\nremaining flatbreads
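The 0xC20xBE and 0xC20xBD markers in that output are the UTF-8 bytes of the fraction characters. The sketch below decodes them and, as an assumption about what went on in my script, shows why they ended up as byte values at all: calling str() on encoded bytes in Python 3 produces the escaped b'...' representation rather than the characters themselves. It also shows one possible alternative, opening the output file with encoding="utf-8". The sample text and the temporary file path are made up for the example.

```python
import os
import tempfile

# Decode the two byte pairs that appear in the OCR output.
three_quarters = bytes([0xC2, 0xBE]).decode("utf-8")
one_half = bytes([0xC2, 0xBD]).decode("utf-8")
print(three_quarters, one_half)

# Illustrative sample text, not real API output.
text = f"{three_quarters} cup (185 ml) milk\n{one_half} tsp cooking salt\n"

# str() on bytes gives the escaped repr: newlines and non-ASCII
# characters come out as backslash escapes instead of real characters.
escaped = str(text.encode("utf8"))
print(escaped)

# Alternative: let the file object do the encoding, so the text
# is stored as-is and reads back unchanged.
path = os.path.join(tempfile.mkdtemp(), "recipes.txt")
with open(path, "w", encoding="utf-8") as recipe_file:
    recipe_file.write(text)
with open(path, encoding="utf-8") as recipe_file:
    round_tripped = recipe_file.read()
```

Written this way, the file should keep real line breaks and fraction characters, which would leave less to fix by hand afterwards.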

So I’ve got a great way to do the bulk of the work, but there’s still work to be done - and unfortunately the only way I saw the next part happening was manually.

Thanks for taking the time to follow along! In my next blog I’ll go through my approach to formatting all this data!

Patrick Wagner
