Basic string replacement issue
I'm trying to make a simple chart that shows my finances, and when I look at the CSV I have from the bank I noticed that a lot of the descriptions are different for effectively the same thing. As an example, a payment made to a Tesco store won't just be labelled as Tesco, it'll be labelled as 'TESCO STORES 2288', where the number changes depending on the store. I'd like to just group those together into one item, and I have managed to do this by using a contains node with a switch node (set up into a subnetwork) so that if it contains 'Tesco' it simply replaces the description with just 'Tesco'. That works fine, but I'm now finding others the same, so Asda, Aldi etc (disclaimer - many other supermarkets are available for shopping!) which I'd like to do the same with. I can set up manually each set of contains/switch subnetwork and that works but it's cumbersome and it occurred to me that I should be able to simply make a text list of the keywords using the 'make_strings' node and then loop through them to replace the description with the relevant word that is found using the contains node. But I'm not getting very far. As soon as I start passing in a list to compare against I'm ending up with duplicates etc, although it does appear to be doing some replacement too. I've tried all sorts of subnetwork settings of value/list on the inputs but I will admit I haven't quite got my head around quite how that works yet (it's feels sort of like dealing with MySQL, but with a bit less control). Is there a way to loop over one list against each single value from another, resulting in a one to one relationship with the original list count? I've attached my saved file containing a working manual setup (the left nodes) against my rather obscure setup where I tried to get the subnetworks to work how I want unsuccessfully (I've probably completely overcomplicated this!).
Keyboard shortcuts
Generic
? | Show this help |
---|---|
ESC | Blurs the current field |
Comment Form
r | Focus the comment reply box |
---|---|
^ + ↩ | Submit the comment |
You can use Command ⌘
instead of Control ^
on Mac
Support Staff 1 Posted by john on 16 Nov, 2023 07:44 PM
Darren,
Interesting problem. I'll take a look at it ASAP.
But first, looks like you forgot to attach your zipped folder with the CSV files. If you could send those (or some equivalent with no sensitive information) it would save me the trouble of creating test files.
John
Support Staff 2 Posted by john on 16 Nov, 2023 09:11 PM
Darren,
I went ahead and made my own simple test table and name list.
The key insight here is that you only want to make one replacement at a time. That is, you may have a store name containing "Tesco" and store name containing "Asda", but not a store name containing both at the same time. (And if you ever did, what would you want to do?)
So now the solution is to break the problem into two steps:
Find which (if any) matches exist
Replace that match
The attached demo shows a simplify node which does this (see screenshot)
It takes three inputs:
Note that the simplify node takes the data one row at a time (value), but takes the name list all at once (list). This means it will fire once for each row in the main table. It produces a full row with each fire, so the output is the complete main table with all values in the key field simplified.
You can peer inside the simplify node to see how I did it. I used my change_col node from the Cartan Node Library to replace the entire row each time.
If the store name is short you might get false matches. Names like "Aldi" might appear inside longer words. It depends on your data. If this is a problem, you could add more logic to the contains test (e.g. only find separate words).
Please try this simplify node on your data and let me know if it works!
John
3 Posted by Darren Crabb on 20 Nov, 2023 04:46 PM
Thanks John, it looks like you've totally cracked what I was trying to do (I did spend quite a bit longer after my post trying to attempt it, to no avail!) . I shall spend some time looking over your nodes to see exactly what you've done there. Apologies for not attaching my files, but I don't think they're of any relevance now. It did occur to me that it would probably be easier to manipulate the CSV in the first place from Excel, but that's cheating isn't it! Of course I could also do something in javascript that would do the same, but I just wanted to see what could be done from within Nodebox.
Support Staff 4 Posted by john on 20 Nov, 2023 08:54 PM
Darren,
You are most welcome.
I actually prefer Nodebox to Excel for data munging tasks like this. Once you get used to it, you can set up networks to do things rather quickly, and the networks are easy to modify and reuse for other projects. Debugging in Nodebox is often much easier than debugging in Excel or VBA or Javascript, especially with multi-step processes.