What’re you eating? Part 4

13 July

Picking up where we left off, I was facing a bunch of text blocks that needed formatting and looked like this:

EASY FLATBREAD\\nMAKES: 6 I PREP: 15 MINUTES I COOK: 15 MINUTES\\nYou never knew you could make such incredible flatbreads so easily. And no yeast either! Use for all your flatbread needs,\\nincluding Greek Chicken Gyros (page 24), Chicken Shawarma (page 46) and lunch wraps. I also use these flatbreads as naan\\nfor dunking into curries when I don’t have time to make naan the proper way with yeast.\\n50 g unsalted butter\\n0xC20xBE cup (185 ml) milk\\n2 cups (300 g) plain flour, plus\\n2-3 tbsp extra for dusting\\n0xC20xBD tsp cooking salt*\\nPut the butter and milk in a heatproof jug and microwave for 1 minute or until the butter is\\nmelted. (Or do this on the stove over medium heat.)\\nPut the flour and salt in a bowl and pour in the milk mixture. Mix with a wooden spoon until\\nit mostly comes together into a shaggy dough.\\nSprinkle a work surface with half the extra flour, then turn the dough out. Knead for 3 minutes\\nuntil it becomes a smooth dough. Add extra flour if it's too sticky (but try to keep the flour to\\na minimum, otherwise the flatbread will be dry). Shape into a ball, put back in the bowl, cover\\nwith plastic wrap and leave on the counter for 30 minutes.\\nSprinkle another work area with a bit of extra flour. Cut the dough into six pieces and roll them\\ninto balls with your hands. Roll each ball out into 20 cm wide circles, about 2-3 mm thick.\\nHeat a medium non-stick frying pan over high heat. Cook one flatbread at a time for\\n1 1/2 minutes on the first side until it puffs up dramatically and the underside has lots of\\ngolden splotches. Flip and cook the other side for 45 seconds to 1 minute until the\\nunderside has golden spots and it puffs up again.\\nTransfer to a clean tea towel and loosely wrap the flatbread to keep it warm. This also makes the flatbread soft (rather than crispy), which is what we want. Repeat with the\\nremaining flatbreads

The main problem came down to standardisation. The layout varied so much between authors, and even within books, that there was no way I could programmatically separate out these sections, so it had to be done manually. I could see the following elements that I wanted to separate out:

Title
Blurb (I don’t want/need this)
Ingredients
Instructions
Notes (not every recipe has these)

I had all of the recipes in a single document, which made the process a bit easier, so I threw myself into it. My first step was to separate out the different elements, so I opened the document in my preferred editor, Visual Studio Code (VSCode), and got to work. After deleting the blurbs and making some little changes as I noticed them, I ended up with this:

EASY FLATBREAD
MAKES: 6 I PREP: 15 MINUTES I COOK: 15 MINUTES
50 g unsalted butter\\n0xC20xBE cup (185 ml) milk\\n2 cups (300 g) plain flour, plus\\n2-3 tbsp extra for dusting\\n0xC20xBD tsp cooking salt*
Put the butter and milk in a heatproof jug and microwave for 1 minute or until the butter is\\nmelted. (Or do this on the stove over medium heat.)\\nPut the flour and salt in a bowl and pour in the milk mixture. Mix with a wooden spoon until\\nit mostly comes together into a shaggy dough.\\nSprinkle a work surface with half the extra flour, then turn the dough out. Knead for 3 minutes\\nuntil it becomes a smooth dough. Add extra flour if it's too sticky (but try to keep the flour to\\na minimum, otherwise the flatbread will be dry). Shape into a ball, put back in the bowl, cover\\nwith plastic wrap and leave on the counter for 30 minutes.\\nSprinkle another work area with a bit of extra flour. Cut the dough into six pieces and roll them\\ninto balls with your hands. Roll each ball out into 20 cm wide circles, about 2-3 mm thick.\\nHeat a medium non-stick frying pan over high heat. Cook one flatbread at a time for\\n1 1/2 minutes on the first side until it puffs up dramatically and the underside has lots of\\ngolden splotches. Flip and cook the other side for 45 seconds to 1 minute until the\\nunderside has golden spots and it puffs up again.\\nTransfer to a clean tea towel and loosely wrap the flatbread to keep it warm. This also makes the flatbread soft (rather than crispy), which is what we want. Repeat with the\\nremaining flatbreads

Just a quick note on some of the little formatting quirks: from what you can see here alone, two things stand out. The first is that I have no idea how much milk or cooking salt I need. If you remember, I chose the UTF-8 encoding to get around this very error, and here each fraction character has come through as two bytes, each written as two hexadecimal digits. One hex digit covers half a byte, and half a byte is called a nibble (haha, get it? Wow, these computer people sure do have a sense of humour). There are a million ways to convert these codes into something more human-friendly (plain ASCII text), but I like to use CyberChef, an awesome data manipulation tool released to the public by GCHQ.

CyberChef working its magic. Here I can see that 0xC20xBE actually means 3/4 (¾) and that 0xC20xBD is 1/2 (½).
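If you would rather stay in Python than paste the codes into CyberChef, the same decoding only takes a couple of standard-library calls. This is a minimal sketch. It assumes the OCR codes always look like "0x" followed by two hex digits per byte. decode_escaped is a hypothetical helper name.

```python
import unicodedata


def decode_escaped(code):
    """Turn an OCR code like '0xC20xBE' into the character its UTF-8 bytes encode.

    Assumes every byte is written as '0x' plus two hex digits (hypothetical format).
    """
    hex_digits = code.replace("0x", "")  # '0xC20xBE' -> 'C2BE'
    return bytes.fromhex(hex_digits).decode("utf-8")


for code in ("0xC20xBE", "0xC20xBD"):
    char = decode_escaped(code)
    # The Unicode name and numeric value confirm which fraction the bytes encode
    print(code, char, unicodedata.name(char), unicodedata.numeric(char))
# 0xC20xBE ¾ VULGAR FRACTION THREE QUARTERS 0.75
# 0xC20xBD ½ VULGAR FRACTION ONE HALF 0.5
```

unicodedata.numeric is handy because it turns the fraction character straight into a number. That could be useful later if the quantities ever need doing maths on.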

Now that I know this, I can use ‘Find & Replace’ (F&R) to quickly swap out any fractions I come across (e.g. 0xC20xBE becomes 3/4).
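Here is the same Find & Replace written as a small script, which is handy once the list of codes grows. This is a sketch. The table mirrors what CyberChef showed above. The 1/4 entry is an assumption: it hasn't turned up yet, but ¼ (U+00BC) encodes as C2 BC in UTF-8. FRACTION_CODES and replace_fractions are hypothetical names.

```python
# Find & Replace pairs for the OCR's escaped UTF-8 fractions.
# The 0xC20xBC (1/4) entry is an assumption, not seen in the recipes yet.
FRACTION_CODES = {
    "0xC20xBE": "3/4",
    "0xC20xBD": "1/2",
    "0xC20xBC": "1/4",
}


def replace_fractions(text):
    """Apply every find-and-replace pair in FRACTION_CODES to the text."""
    for code, plain in FRACTION_CODES.items():
        text = text.replace(code, plain)
    return text


print(replace_fractions("0xC20xBE cup (185 ml) milk"))  # 3/4 cup (185 ml) milk
print(replace_fractions("0xC20xBD tsp cooking salt*"))  # 1/2 tsp cooking salt*
```

To run this over the whole recipe document, read it in with pathlib.Path.read_text(encoding="utf-8") and write the result back with write_text.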

The second quirk I noticed was in this line:

MAKES: 6 I PREP: 15 MINUTES I COOK: 15 MINUTES

The OCR software had interpreted each pipe character as a capital ‘i’ (| vs I). It's not a big deal, but it could cause me problems in the future, so a quick F&R on this too and I’m all set to start properly formatting.
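A blanket replacement of " I " would also catch the genuine word "I" in running text. A slightly safer sketch only swaps an " I " that sits directly before a stats label. Once the pipes are back, the line also splits neatly into fields. The label list (MAKES, SERVES, PREP, COOK) is an assumption based on the headings seen so far. fix_stats_line and parse_stats are hypothetical helpers.

```python
import re

# Only an " I " immediately followed by a stats label counts as a misread pipe.
# The set of labels is an assumption; extend it as new headings appear.
MISREAD_PIPE = re.compile(r" I (?=(?:MAKES|SERVES|PREP|COOK):)")


def fix_stats_line(line):
    """Swap misread pipes back in without touching a genuine 'I'."""
    return MISREAD_PIPE.sub(" | ", line)


def parse_stats(line):
    """Split a repaired stats line into a dict of label -> value."""
    fields = {}
    for part in fix_stats_line(line).split("|"):
        label, _, value = part.partition(":")
        fields[label.strip()] = value.strip()
    return fields


stats = "MAKES: 6 I PREP: 15 MINUTES I COOK: 15 MINUTES"
print(fix_stats_line(stats))  # MAKES: 6 | PREP: 15 MINUTES | COOK: 15 MINUTES
print(parse_stats(stats))     # {'MAKES': '6', 'PREP': '15 MINUTES', 'COOK': '15 MINUTES'}
print(fix_stats_line("I also use these flatbreads as naan"))  # unchanged
```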

Now that I had slightly more standardised elements to deal with, I could start scripting the formatting. I’ve already got an HTML format that I’m happy with, so I just need to get the data into that template. I did have a decision to make, though. Did I want a new HTML page for each recipe, or a single HTML page that uses PHP to query a database holding all the recipe information and fills in the page as it loads? There are pros and cons to each. Method 1 takes up slightly more space, but it is easier to build, and I liked the idea of being able to share individual recipe pages. Method 2 is a bit of a shortcut once it’s set up: I would store the text unformatted and leave instructions for the server to format it for me. Ultimately I’ve decided to go with method 1, but that may change in the future as my plans and requirements change.

So I set up my template:

<!DOCTYPE html>
<html>
  <head>
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <style>
    .collapsible {
        background-color: #777;
        color: white;
        cursor: pointer;
        padding: 18px;
        width: 100%;
        border: none;
        text-align: left;
        outline: none;
        font-size: 15px;
    }
    .content {
        padding: 0 18px;
        display: none;
        overflow: hidden;
        background-color:#f1f1f1;
    }
    </style>
  </head>
  <body>
    split_here
    <button type="button" class="collapsible">INGREDIENTS</button>
    <div class="content">
      split_here
    </div>
    <p></p>
    <button type="button" class="collapsible">NOTES</button>
    <div class="content">
      split_here
    </div>
    split_here
    <script>
        var coll = document.getElementsByClassName("collapsible");
        var i;
        for (i = 0; i < coll.length; i++) {
          coll[i].addEventListener("click", function() {
            this.classList.toggle("active");
            var content = this.nextElementSibling;
            if (content.style.display === "block") {
              content.style.display = "none";
            } else {
              content.style.display = "block";
            }
          });
        }
    </script>
  </body>
</html>

Here I’ve defined the classes ‘collapsible’ and ‘content’ and the formatting I want for each. I’ve also defined the buttons for the ingredients and notes. At the end is the script that makes the expandable/collapsible sections work as intended: clicking a button shows or hides the element directly after it. Between these elements I’ve placed “split_here” markers so that my Python script can find the correct points to insert information. Then I can write a Python script that tackles my big recipe document and sorts the values the way I want.
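Under the hood, the “split_here” approach is just str.split followed by concatenation. Four markers give five chunks: before the title, before the ingredients, before the notes, before the steps, and the end of the page. The sketch below checks the marker count and uses a made-up miniature template to stand in for the real one. split_template and fill_template are hypothetical names.

```python
MARKER = "split_here"


def split_template(template):
    """Split a template into the five chunks around its four insertion markers."""
    chunks = template.split(MARKER)
    if len(chunks) != 5:
        raise ValueError(f"expected 4 {MARKER!r} markers, found {len(chunks) - 1}")
    return chunks


def fill_template(template, title, ingredients, notes, steps):
    """Sew the recipe-specific HTML into the gaps, in the same order as the markers."""
    pre_title, pre_ingredients, pre_notes, pre_steps, pre_end = split_template(template)
    return (pre_title + title + pre_ingredients + ingredients
            + pre_notes + notes + pre_steps + steps + pre_end)


mini = "<body>split_here<div>split_here</div><div>split_here</div>split_here</body>"
print(fill_template(mini, "<h2>T</h2>", "<p>I</p>", "<p>N</p>", "<p>S</p>"))
# <body><h2>T</h2><div><p>I</p></div><div><p>N</p></div><p>S</p></body>
```

Checking the count up front catches a missing or extra marker before any pages are written. Python's string.Template with $name placeholders would also work, because this template's CSS and JavaScript contain no $ signs.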

import html    #escapes characters like & and < so they display correctly in the page

file_in = "E:\\Documents\\recipes\\recipes"
file_out = 'E:\\Documents\\recipes\\recipe_dicts'
file_template = 'E:\\Documents\\recipes\\recipe_template.html'
element_list = []
recipe_list = []


def add_recipe(elements):
    #turns one recipe's elements into a dictionary and appends it to the main list
    if len(elements) < 4:    #skips anything that isn't a complete recipe (title, stats, ingredients, steps)
        return
    recipe_list.append({'title': elements[0], 'stats': elements[1],
                        'ingredients': elements[2], 'steps': elements[3],
                        'notes': elements[4] if len(elements) > 4 else 'none'})    #recipes without notes are marked 'none'


with open(file_in, 'r', encoding='utf-8') as infile:    #opens the recipe document
    for line in infile:
        if line.strip() == '':    #an empty line marks the end of a recipe
            add_recipe(element_list)
            element_list = []    #resetting the recipe element list
        else:
            element_list.append(line)    #loads the recipe elements into the list
    add_recipe(element_list)    #catches the last recipe if the file doesn't end with an empty line

with open(file_out, 'w', encoding='utf-8') as outfile:    #saves the list of dictionaries to disk as a cache
    for recipe in recipe_list:
        outfile.write(str(recipe))
        outfile.write("\n")

with open(file_template, 'r', encoding='utf-8') as template_in:    #splits the default page up once - ready for insertion
    pre_title, pre_ingredients, pre_notes, pre_steps, pre_end = template_in.read().split('split_here')


def write_paragraphs(outfile, lines):
    #writes each line as a paragraph, bolding all-caps lines because they're probably headings
    for text in lines:
        escaped = html.escape(text.strip())
        if text.isupper():    #tries to identify headings (checked before escaping adds lowercase like &amp;)
            escaped = '<b>' + escaped + '</b>'
        outfile.write('<p>' + escaped + '</p>\n')


for recipe in recipe_list:
    #defining recipe-specific HTML code to be inserted into our webpage
    title = '<h2>' + html.escape(recipe['title'].strip()) + '</h2>'
    stats = '<p>' + html.escape(recipe['stats'].strip()) + '</p>'
    ingredients = recipe['ingredients'].strip().split('\\\\n')
    #swaps the OCR line breaks for spaces (so words don't run together), splits into sentences and drops empty pieces
    steps = [step.strip() for step in recipe['steps'].strip().replace('\\\\n', ' ').split('.') if step.strip()]
    notes = [] if recipe['notes'] == 'none' else recipe['notes'].strip().split('\\\\n')
    #defines the HTML page name
    file_recipe = 'E:\\Documents\\recipes\\webpages\\' + recipe['title'].strip().lower().replace(' ', '_') + ".html"

    with open(file_recipe, 'w', encoding='utf-8') as recipe_file_out:    #the following is sewing the webpage back together
        recipe_file_out.write(pre_title + title + "\n" + stats)
        recipe_file_out.write(pre_ingredients)
        write_paragraphs(recipe_file_out, ingredients)
        if notes:    #only adds the notes section if the recipe has notes
            recipe_file_out.write(pre_notes)
            write_paragraphs(recipe_file_out, notes)
        recipe_file_out.write(pre_steps)
        for number, step in enumerate(steps, start=1):    #adds numbers to the recipe steps
            recipe_file_out.write('<p><b>' + str(number) + '. </b>' + html.escape(step) + '</p>\n')
        recipe_file_out.write(pre_end)

So what we have here (excuse the formatting) is a script that goes through each line of the recipe document and groups each recipe's elements together in a format the script can interpret more easily. Remember that each line is a different element of the recipe, and an empty line marks the end of a recipe. After labelling everything and caching the result, the script defines what the new HTML should look like for each recipe. It then splits the template apart and sews the new code into the gaps, with some formatting fixes along the way.
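The blank-line grouping step can also be written more compactly with itertools.groupby, which bundles up consecutive lines that share a key. This is a swapped-in technique rather than the loop used in my script. It treats any run of blank lines as one separator, and it keeps the last recipe even if the file doesn't end with a blank line. group_recipes is a hypothetical name.

```python
from itertools import groupby


def group_recipes(lines):
    """Split an iterable of lines into one list of non-blank lines per recipe.

    Consecutive blank lines count as a single separator.
    """
    recipes = []
    for is_blank, chunk in groupby(lines, key=lambda line: line.strip() == ""):
        if not is_blank:
            recipes.append([line.rstrip("\n") for line in chunk])
    return recipes


sample = ["EASY FLATBREAD\n", "MAKES: 6 | PREP: 15 MINUTES\n", "\n", "\n",
          "NEXT RECIPE\n", "MAKES: 4\n"]
print(group_recipes(sample))
# [['EASY FLATBREAD', 'MAKES: 6 | PREP: 15 MINUTES'], ['NEXT RECIPE', 'MAKES: 4']]
```

Passing an open file object works the same way, since iterating over it yields lines.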

This leaves us with a directory full of HTML pages, one per recipe. Each one has expandable sections for the ingredients and notes, plus numbered steps. These still need to be gone through with a fine-tooth comb to correct any remaining formatting mistakes, but that doesn’t need a blog post!
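A quick scan can narrow down where that fine-tooth comb is needed, by flagging pages where one of the quirks above seems to have survived. This is a sketch, and the three patterns are assumptions drawn from the quirks described in this post: undecoded byte codes, literal \n line breaks, and an " I " sitting before a stats label. find_quirks and check_pages are hypothetical names. Anything it reports still needs a human look.

```python
import re
from pathlib import Path

# Patterns that suggest an OCR or conversion quirk made it into a page (assumed list).
SUSPICIOUS = {
    "undecoded bytes": re.compile(r"\b0x[0-9A-Fa-f]{2}"),   # \b avoids flagging e.g. "10x20 cm"
    "literal line break": re.compile(r"\\+n"),             # one or more backslashes then n
    "misread pipe": re.compile(r" I (?:MAKES|SERVES|PREP|COOK):"),
}


def find_quirks(text):
    """Return (line number, problem name) pairs for each suspicious line."""
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        for name, pattern in SUSPICIOUS.items():
            if pattern.search(line):
                problems.append((number, name))
    return problems


def check_pages(folder):
    """Print a report for every generated .html page in the folder."""
    for page in sorted(Path(folder).glob("*.html")):
        for number, name in find_quirks(page.read_text(encoding="utf-8")):
            print(f"{page.name}:{number}: {name}")
```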

Even though I’ve now got a collection of recipes (and a good method for getting more), I’m not putting the project aside completely. Here’s what else I think I want to do with them:

create a database of ingredients/amounts needed, as well as how many serves and timings
add the ability to update notes
make the recipes adaptable to different quantities
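For the last idea, Python's fractions.Fraction keeps amounts like 3/4 exact while scaling them. This is a rough sketch, assuming the ingredient lines have already been cleaned up to plain-text amounts such as "2", "3/4" or "1 1/2". Only the leading amount is scaled, so bracketed metric amounts like "(185 ml)" and ranges like "2-3 tbsp" are not handled. scale_ingredient and its helpers are hypothetical.

```python
import re
from fractions import Fraction

# Leading amount: a mixed number ("1 1/2"), a fraction ("3/4") or a whole number ("50").
AMOUNT = re.compile(r"^(\d+ \d+/\d+|\d+/\d+|\d+)")


def parse_amount(text):
    """Convert '1 1/2', '3/4' or '2' into an exact Fraction."""
    return sum((Fraction(part) for part in text.split()), Fraction(0))


def format_amount(value):
    """Write a Fraction back out as '2', '3/4' or '1 1/2'."""
    whole = value.numerator // value.denominator
    rest = value - whole
    if rest == 0:
        return str(whole)
    if whole == 0:
        return f"{rest.numerator}/{rest.denominator}"
    return f"{whole} {rest.numerator}/{rest.denominator}"


def scale_ingredient(line, factor):
    """Multiply the amount at the start of an ingredient line by factor."""
    match = AMOUNT.match(line)
    if not match:
        return line  # no leading amount, e.g. "salt to taste"
    scaled = parse_amount(match.group(1)) * factor
    return format_amount(scaled) + line[match.end():]


print(scale_ingredient("3/4 cup (185 ml) milk", Fraction(2)))     # 1 1/2 cup (185 ml) milk
print(scale_ingredient("50 g unsalted butter", Fraction(1, 2)))  # 25 g unsalted butter
```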

But for now, I want to put that aside and work on my pantry database. I want to be able to keep track of all the food I have and when it expires!

Patrick Wagner
