What’re you eating? Part 13

I’ve been thinking a lot recently about how I’m actually going to enter products into the system and how likely I’ll be to use it. My local supermarkets are only a 10-minute walk away, but after shopping and carrying the weekly shop back to the apartment, the last thing I feel like doing is putting it away, let alone typing in all the fields that I’ve outlined in the form I made earlier. I talked to a few people about it and the consensus was that it’d be a barrier to them using it.

Would I use it because I’m stubborn and I’ve sunk too much time into this project not to? Probably.

Would others? Probably not.

Since a major part of this project is making something that improves my life, I don’t want to be forcing myself to use it - so it needs a redesign.

I thought long and hard about different processes and realised that I could frontload a bunch of the data entry. Why would I leave it up to a tired Pat to enter the data? What if I end up with the same product listed multiple times because I’d done any of the following:

  • Misspelt the name

  • Included/left out an apostrophe

  • Included/left out the brand

  • Got the barcode wrong

You get the idea - it wasn’t the best way to go about things, but I realised that I could enter a product’s details once and then just pick it off a list and update the date. So with that, I had to look carefully at the data I was storing. Currently the table has this sort of information:

id | name        | size | size_type | expiry_date | expiry_type | barcode
1  | Plain Flour | 2    | kg        | 05-12-2023  | bb          | 9300676510454

So I went through and identified which fields were going to stay the same for a given product and saw that, barring shrinkflation, only the date would change - even the expiry_type field stays put. Then I split the tables out into the following:

products
barcode       | name        | size | size_type | expiry_type | brand
9300676510454 | Plain Flour | 2    | kg        | bb          | White Wings

current_stock
id | barcode       | expiry_date
1  | 9300676510454 | 05-12-2023

By splitting the tables out, I’m able to include more information, such as the brand, which could potentially help down the line in some way. I can join the tables together based on the barcode when I want all the information displayed to me, but for data entry I don’t actually need that, so this split suits me best. I actually decided to make a third table while I was on a defining spree:

usage
barcode       | usage
9300676510454 | 1.6 kg

I’m not 100% sure how I’ll use and update this table in practice, but my thinking right now is that after I mark a recipe as made, I can update the table to increase the usage of each item involved, which would help with alerts for restocking.
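To make the three-table split concrete, here is a minimal sketch using Python’s built-in sqlite3 module. It is an illustration under assumptions rather than the project’s actual schema: the choice of SQLite, the column types, the helper names (create_schema, stock_with_details, record_usage) and the numeric `used` column (standing in for the "1.6 kg" text shown above, measured in the product’s own size_type) are all hypothetical.

```python
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    """Create the three tables described above (column types are assumed)."""
    conn.executescript(
        """
        CREATE TABLE products (
            barcode     TEXT PRIMARY KEY,
            name        TEXT NOT NULL,
            size        REAL,
            size_type   TEXT,
            expiry_type TEXT,
            brand       TEXT
        );
        CREATE TABLE current_stock (
            id          INTEGER PRIMARY KEY,
            barcode     TEXT NOT NULL REFERENCES products(barcode),
            expiry_date TEXT
        );
        CREATE TABLE usage (
            barcode TEXT PRIMARY KEY REFERENCES products(barcode),
            used    REAL NOT NULL DEFAULT 0  -- assumed numeric, in size_type units
        );
        """
    )


def stock_with_details(conn: sqlite3.Connection) -> list:
    """Join current_stock to products on barcode for a full display row."""
    return conn.execute(
        """
        SELECT s.id, p.name, p.brand, p.size, p.size_type,
               s.expiry_date, p.expiry_type
        FROM current_stock AS s
        JOIN products AS p ON p.barcode = s.barcode
        ORDER BY s.id
        """
    ).fetchall()


def record_usage(conn: sqlite3.Connection, barcode: str, amount: float) -> None:
    """Add `amount` to an item's running usage, creating the row if needed."""
    cur = conn.execute(
        "UPDATE usage SET used = used + ? WHERE barcode = ?", (amount, barcode)
    )
    if cur.rowcount == 0:
        conn.execute(
            "INSERT INTO usage (barcode, used) VALUES (?, ?)", (barcode, amount)
        )


conn = sqlite3.connect(":memory:")
create_schema(conn)
# The product is entered once...
conn.execute(
    "INSERT INTO products VALUES (?, ?, ?, ?, ?, ?)",
    ("9300676510454", "Plain Flour", 2, "kg", "bb", "White Wings"),
)
# ...and each shopping trip only adds a barcode and a date.
conn.execute(
    "INSERT INTO current_stock (barcode, expiry_date) VALUES (?, ?)",
    ("9300676510454", "05-12-2023"),
)
```

With this shape, adding stock after a shop touches only current_stock, while the join is reserved for screens that need the full product details.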

Now I have to fill the new products table - and it’ll be another slog, I can already tell. I actually looked online to see if I could get a dump of barcodes and their corresponding product information. You’ll be shocked to know 2 things:

  • No one who’s taken the time to collate barcodes well is offering it freely

  • The scope for barcode collisions is incredibly high

I tried thieving my way to victory - I identified 2 websites that held barcodes (or EANs) and I tried to scrape them. They had various methods to thwart this including:

  • Limiting the number of requests in a certain time

  • Structuring the website so that pages weren’t easily iterated

  • Requiring JavaScript to be run to display any information

I had mild success against the first 2 mitigations with regular scraping - I had the script wait a random time (around 10 seconds) between requests, and one of the sites had a simpler structure that I managed to partially work through - but not so much against the 3rd. When the results didn’t quite scratch my itch, I turned to Selenium.
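The random wait between requests can be sketched roughly as below. This is not the original script: the function names, the 3-second jitter and the injected `fetch` callable are assumptions made so the idea can be shown without naming a site or an HTTP library.

```python
import random
import time
from typing import Callable, Dict, Iterable


def jittered_delay(base: float = 10.0, jitter: float = 3.0) -> float:
    """Return a wait in seconds, spread uniformly around `base`."""
    return random.uniform(base - jitter, base + jitter)


def fetch_politely(
    urls: Iterable[str],
    fetch: Callable[[str], str],
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, str]:
    """Fetch each URL in turn, pausing a random time between requests.

    `fetch` is whatever function downloads a page (hypothetical here);
    `sleep` is injectable so the pacing can be checked without waiting.
    """
    pages: Dict[str, str] = {}
    for i, url in enumerate(urls):
        if i > 0:  # no need to wait before the very first request
            sleep(jittered_delay())
        pages[url] = fetch(url)
    return pages
```

Passing the download function in keeps the pacing logic separate from the scraping itself, so the same loop works whether the pages come from a plain HTTP request or from a browser driven by something else.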

Put simply - Selenium automates web browsers. You can make it open a browser, go to a page, click around - everything those pesky captchas don’t like me doing. Using this I was able to programmatically access these sites and strip the data to process later, which worked really well until I looked at the data. Remember what I said before about collating the barcodes well? It turns out these sites crowdsource their data, which leads to wild inconsistencies in how it’s entered: sizes were missing, brands were wrong, there were spelling mistakes, etc. So after all that - I had to go through my pantry and do it all myself.

I remembered that just out of high school, I’d tried to help my mum make her shop more efficient. They had a cash register and that was it - which made stocktakes and accounting a horrific process. I bought this handheld scanner from Datalogic and tried to improve things, but it didn’t take and the scanner went into storage.

My first thought was “oh my god, that was a decade ago” and my second was more about the odds of me finding this random piece of tech after any length of time. It was surprisingly easy to find, and even more surprisingly it worked perfectly - handily, it inserts a newline character after each scan, which makes data entry so easy.

The process was pretty straightforward - scan the item, type in the details, move on to the next item. It was laborious, but I managed to scan in about 120 different grocery items around my place. Now I won’t have to enter them again and if I open this system up to family/friends, then I’ll be able to have a central database and everyone’s contributions will make everyone’s lives easier!
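Because the scanner ends every scan with a newline, each barcode arrives as its own line of input. Here is a minimal sketch of reading those lines and sanity-checking them, assuming the codes are 13-digit EANs (shorter formats such as EAN-8 or UPC-A would need their own handling). The function names are hypothetical; the check-digit rule is the standard EAN-13 one.

```python
from typing import Iterable, Iterator


def is_valid_ean13(code: str) -> bool:
    """Check the EAN-13 check digit (weights 1, 3, 1, 3, ... from the left)."""
    if len(code) != 13 or not (code.isascii() and code.isdigit()):
        return False
    digits = [int(c) for c in code]
    total = sum(d * (3 if i % 2 else 1) for i, d in enumerate(digits[:12]))
    return (10 - total % 10) % 10 == digits[12]


def read_scans(lines: Iterable[str]) -> Iterator[str]:
    """Yield one barcode per non-empty line, as the scanner types them."""
    for line in lines:
        code = line.strip()
        if code:
            yield code
```

Feeding sys.stdin to read_scans would let the scanner drive the loop directly, and rejecting codes that fail is_valid_ean13 guards against a misread or mistyped barcode before it reaches the products table.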

Stay tuned for the next entry, where I work out the next part of this data entry revolution - a live search!
