Phone Scrubber
Frontend VS Backend
Most people would jump straight to a Next.js or SvelteKit app. I chose HTMX + Django, and the results have been pretty good. The only reactivity I need is a table that updates as each file moves through the processing queue, plus a download link for the newly created file once it finishes. Why pull in hundreds of node modules for that?
I know I'm biased toward old-school tech, but hear me out. It has a proven track record of being maintainable, stable, and scalable, without some of the stranger issues you get with Node and the larger JS frameworks in general. Onward to what was built...
Fan-In Files, pipe to throttled API queue, create download file
There you go, the app. But seriously, dictionaries in Python (CPython at least) can be used safely across threads as long as each thread only touches its own unique key in that dict. So I made a top-level dictionary keyed by user ID, and each user's entry holds that user's document upload queue.
What really happens is that the file gets processed when it's uploaded, certain info gets persisted to the DB, and then the ID of the uploaded file gets enqueued so as not to cause a ruckus.
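Here is a minimal sketch of that per-user layout. The names (`upload_queues`, `get_user_queue`, `enqueue_upload`) are made up for illustration, and the small lock around queue creation is my own cautious assumption rather than something the app necessarily has.

```python
import queue
import threading

# Top-level registry: one upload queue per user ID (hypothetical names).
upload_queues: dict[int, queue.Queue] = {}

# Assumption: guard only the "create the queue if missing" step, so two
# requests from the same user cannot each create a separate queue.
_registry_lock = threading.Lock()


def get_user_queue(user_id: int) -> queue.Queue:
    """Return the upload queue for a user, creating it on first use."""
    with _registry_lock:
        user_queue = upload_queues.get(user_id)
        if user_queue is None:
            user_queue = upload_queues[user_id] = queue.Queue()
        return user_queue


def enqueue_upload(user_id: int, file_id: int) -> None:
    """Queue the ID of an uploaded file (not the file itself)."""
    get_user_queue(user_id).put(file_id)
```

Only the ID goes on the queue, so the queue stays tiny and the DB row remains the source of truth for the file.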
The API queue is also top-level and thread-safe, but there is only one of it, because the API we call is restricted to 10 requests/sec. So a single process keeps checking the queue and runs whenever an ID has been passed to it. It always works in order, because even if we all upload a file at once, we can only parse one phone at a time.
Obviously you need to track duplicates. If you've already made an API call for a number, just reuse the data and save the call for a different phone. Otherwise, let it rip.
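The throttled consumer could look roughly like this. It is a sketch under assumptions: `run_api_worker`, `call_api`, and `stop_event` are hypothetical names, and spacing calls at least 0.1 s apart is just one simple way to stay under 10 requests/sec.

```python
import logging
import queue
import threading
import time

MAX_REQUESTS_PER_SECOND = 10
MIN_INTERVAL = 1.0 / MAX_REQUESTS_PER_SECOND  # 0.1 s between calls


def run_api_worker(api_queue, call_api, stop_event, min_interval=MIN_INTERVAL):
    """Pull IDs off the single API queue in FIFO order, one call at a time."""
    next_allowed = time.monotonic()
    while not stop_event.is_set():
        try:
            # Wake up periodically so stop_event is noticed.
            phone_id = api_queue.get(timeout=0.5)
        except queue.Empty:
            continue
        # Wait out whatever is left of the minimum gap since the last call.
        delay = next_allowed - time.monotonic()
        if delay > 0:
            time.sleep(delay)
        try:
            call_api(phone_id)
        except Exception:
            # Assumption: log and move on so one bad call does not kill the worker.
            logging.exception("API call failed for phone %s", phone_id)
        finally:
            next_allowed = time.monotonic() + min_interval
            api_queue.task_done()
```

Because there is exactly one consumer, ordering and rate limiting both fall out of the loop itself, with no extra coordination between users.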
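The duplicate check boils down to "look before you call". In this sketch a plain dict stands in for the database lookup, and `normalize`, `lookup_phone`, and `call_api` are hypothetical names. Stripping non-digits is only an assumed normalization.

```python
import re


def normalize(number: str) -> str:
    """Assumed normalization: keep digits only, so formats compare equal."""
    return re.sub(r"\D", "", number)


def lookup_phone(number: str, known_results: dict, call_api):
    """Reuse a stored result when one exists; otherwise spend an API call."""
    key = normalize(number)
    if key in known_results:
        return known_results[key]  # duplicate: no API call needed
    result = call_api(key)
    known_results[key] = result
    return result
```

With a 10 requests/sec cap, every duplicate skipped is a slot freed up for a number that has not been seen yet.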
Background tasks
I tend to prefer the standard library for things. A single daemonized background thread does the trick for handling the API queue. The file upload queues are also good to go, since they trigger any time an upload happens.
There can still be zombie phones in the database, though. To get around that, a cron job on the server runs every so often and re-enqueues phones that have been flagged for a queue but have not yet been sent to the API, which in turn allows future duplicates to get parsed. Phones can only get zombied if something crazy happens, like the API server dying or our server going down. I'm sure there are other causes, but I can't think of any right now.
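Starting that thread with nothing but the standard library might look like the sketch below. `start_api_worker` and `handle` are hypothetical names, and the "start it only once" guard is an assumption about how you would want it to behave inside a web process.

```python
import queue
import threading

api_queue: queue.Queue = queue.Queue()
_worker = None
_worker_lock = threading.Lock()


def _drain_forever(handle):
    """Block on the queue and hand each ID to `handle`, in order."""
    while True:
        phone_id = api_queue.get()
        try:
            handle(phone_id)
        except Exception:
            pass  # assumption: a failed item must not kill the worker
        finally:
            api_queue.task_done()


def start_api_worker(handle):
    """Start the background thread if it is not already running."""
    global _worker
    with _worker_lock:
        if _worker is None or not _worker.is_alive():
            _worker = threading.Thread(
                target=_drain_forever,
                args=(handle,),
                name="api-queue-worker",
                daemon=True,  # do not block interpreter shutdown
            )
            _worker.start()
        return _worker
```

The trade-off with `daemon=True` is that the thread is dropped abruptly when the process exits, so anything still sitting in the queue at that moment is lost unless something else picks it up later.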
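The cron side can be sketched without any Django at all. The `PhoneRecord` fields, the 15-minute cutoff, and the function names are all assumptions for illustration. In the real app this would be a query against the database rather than a list.

```python
import queue
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class PhoneRecord:
    id: int
    queued_at: Optional[datetime]      # set when flagged for the queue
    api_called_at: Optional[datetime]  # set once the API call has happened


def find_zombies(records, now, max_age=timedelta(minutes=15)):
    """Flagged for the queue, never called, and older than the cutoff."""
    return [
        r for r in records
        if r.queued_at is not None
        and r.api_called_at is None
        and now - r.queued_at > max_age
    ]


def requeue_zombies(records, api_queue, now=None, max_age=timedelta(minutes=15)):
    """Put zombie phone IDs back on the API queue; return how many."""
    now = now or datetime.now(timezone.utc)
    zombies = find_zombies(records, now, max_age)
    for record in zombies:
        api_queue.put(record.id)
    return len(zombies)
```

The age cutoff is there so the job does not re-enqueue phones that are simply still waiting their turn in a busy queue.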