What’re you eating? Part 12
I’ve spent the last however long trawling through all 50+ of my OCR’d recipes to set them up properly and I’ve finally finished!
As I outlined in the last entry, I identified 4 sections of my recipes: metadata, ingredients, notes, and steps. I’d decided to use tables to organise the recipes rather than paragraphs and had already used a bit of Python to programmatically make most of the changes involved in that swap. The problem I now had, though, was that the information I had didn’t fit neatly into those tables and there was no guarantee that the OCR’d text itself was accurate. This meant, unfortunately, that I had to go through each recipe, line by line, to make sure it all made sense and looked good.
I already had a basic structure for the body laid out:
<h2>[Heading]</h2>
<p>[recipe details]</p>
<div id="ingredients">
  <table>
    <tr>
      <td class="ingredientNo"></td>
      <td class="ingredientDetail"></td>
    </tr>
  </table>
</div>
<div id="notes">
  <table>
    <tr>
      <td class="noteNo"></td>
      <td class="noteDetail"></td>
    </tr>
  </table>
</div>
<div id="steps">
  <table>
    <tr>
      <td class="stepNo"></td>
      <td class="stepDetail"></td>
    </tr>
  </table>
</div>
By separating the number from the detail, I was allowing myself to potentially format them differently to make them stand out more, but I was also helping my future self to scan the recipes with a script to pull out information. This was going to be less relevant for the notes and steps, but for the ingredients, it was really important that I be able to compare the amounts of each to what I have in the pantry.
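To illustrate the kind of script that structure makes possible, here is a minimal sketch using Python's standard html.parser module. It is not a script from this project: the IngredientParser class and extract_ingredients function are hypothetical names, and it assumes the ingredientNo cell holds the amount and that there are no nested divs inside the ingredients section.

```python
from html.parser import HTMLParser


class IngredientParser(HTMLParser):
    """Collect (number, detail) pairs from the table inside div#ingredients."""

    def __init__(self):
        super().__init__()
        self.in_ingredients = False  # True while inside <div id="ingredients">
        self.current_class = None    # class of the <td> being read, if any
        self.row = {}                # cell text for the current <tr>, keyed by class
        self.rows = []               # finished (number, detail) pairs

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div" and attrs.get("id") == "ingredients":
            self.in_ingredients = True
        elif self.in_ingredients and tag == "tr":
            self.row = {}
        elif self.in_ingredients and tag == "td":
            self.current_class = attrs.get("class")

    def handle_data(self, data):
        # Only keep text that sits inside a classed <td> of the ingredients table.
        if self.in_ingredients and self.current_class:
            self.row[self.current_class] = self.row.get(self.current_class, "") + data

    def handle_endtag(self, tag):
        if not self.in_ingredients:
            return
        if tag == "td":
            self.current_class = None
        elif tag == "tr":
            self.rows.append((
                self.row.get("ingredientNo", "").strip(),
                self.row.get("ingredientDetail", "").strip(),
            ))
        elif tag == "div":
            # Assumption: the ingredients section contains no nested <div>.
            self.in_ingredients = False


def extract_ingredients(html):
    """Return a list of (number, detail) tuples for one recipe page."""
    parser = IngredientParser()
    parser.feed(html)
    return parser.rows
```

Because the notes and steps tables use different class names, the same approach would work for them by swapping the id and the two class names.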
With the 4 sections laid out, I was confronted with the data formatting issue. Steps were easy: I was able to just split on sentences and that was pretty much dead on (like, 90%) so that didn’t require much work. The same went for the notes but with a bit less accuracy. What made it really difficult was the ingredients and the huge variation within them.
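For reference, a sentence split like the one described for the steps can be sketched in a few lines of Python. This is an illustration rather than the script actually used: split_sentences and steps_to_rows are hypothetical names, and the rule (break after ".", "!" or "?" when followed by whitespace) is an assumption about what "split on sentences" meant.

```python
import re
from html import escape


def split_sentences(text):
    """Naive split: break after '.', '!' or '?' when followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


def steps_to_rows(text):
    """Render each sentence as a numbered <tr> using the stepNo/stepDetail classes."""
    rows = []
    for number, sentence in enumerate(split_sentences(text), start=1):
        rows.append(
            f'<tr> <td class="stepNo">{number}</td> '
            f'<td class="stepDetail">{escape(sentence)}</td> </tr>'
        )
    return rows
```

A rule this simple trips over abbreviations: "Cook for approx. 5 min." comes out as two steps, which is the sort of case that still needs fixing by hand.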
I’d initially tried to separate out ingredients by line, but there were some that spanned several lines. On top of this, I saw the following sorts of things:
finely chopped ginger
carrot, thinly sliced
Stock (or cooking wine if you have it)
1/2 large garlic clove
The list goes on but you get the idea - a bit of a mess and I had to tackle it all manually.
I eventually decided I needed to split the ingredients out again, now reflecting:
Quantity
Ingredient
Prep detail
This will allow any future script to generate a shopping list using just the first 2 columns.
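As a rough sketch of what such a future script might look like, the Python below sums quantities per ingredient from rows of (quantity, ingredient, prep detail) and ignores the third column. The function names are hypothetical, and it assumes quantities are plain numbers or fractions such as "2", "1/2" or "1 1/2". Anything else, including amounts with units, is simply listed without a total.

```python
from collections import defaultdict
from fractions import Fraction


def parse_quantity(text):
    """Turn '2', '1/2' or '1 1/2' into a Fraction; return None if not numeric."""
    parts = text.split()
    if not parts:
        return None
    try:
        return sum(Fraction(part) for part in parts)
    except (ValueError, ZeroDivisionError):
        return None


def shopping_list(rows):
    """Sum quantities per ingredient, ignoring the prep-detail column."""
    totals = defaultdict(Fraction)  # ingredient name -> running total
    unquantified = set()            # ingredients with no usable amount
    for quantity, ingredient, _prep in rows:
        name = ingredient.strip().lower()
        amount = parse_quantity(quantity)
        if amount is None:
            unquantified.add(name)
        else:
            totals[name] += amount
    return dict(totals), sorted(unquantified)
```

Using Fraction rather than float keeps amounts like 1/2 exact when the same ingredient appears in several recipes.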
After that, I got to work - over a dozen hours spent fixing all of those entries up to a standard I was happy with, but then there was another problem - the table cells looked ridiculous.
The problem was that each column was expanding to fit its longest cell, meaning that it only took one long title or ingredient to push everything to the right. To combat this, I added a class to the headings (previously each was just a row with a single cell) and added the following to the CSS:
white-space: nowrap; - to the table
font-weight: bold; - to the heading
width: 1%; - to the 1st & 2nd cells