Counting similar csv cells

o.bubendorff's Avatar

o.bubendorff

12 Feb, 2016 11:19 AM

Hi,

Is there a way to count similar strings from a csv file?
In other words, I would to automatically track the number of repetitions of each same words from a list (see attached file).

Anyone have a solution?

Thanks!

OB.

  1. Support Staff 1 Posted by john on 12 Feb, 2016 12:52 PM

    john's Avatar

    Dear Olivier,

    This is fairly easy. First use a Distinct node and a Lookup to get a list of the distinct items. Then, for each item, filter the original file for that item and do a count.

    The only tricky bit is the "for each item" part. To pull that off you need to make a very simple subnetwork. First wrap the filter and count nodes into a 2-node subnetwork. Then, in order to make sure you read one value at a time against the whole list for each count, select the subnetwork, open the MetaData dialog, set the data port to "list" and the value port to "value".

    I have attached a simple network, Fruit Counter, to demonstrate this. I made a simple csv file with two columns, "Fruit" and "Owner". There are 13 fruits (7 apples, 4 oranges, and 2 bananas). Bob owns a total of 4 fruits (3 apples and an orange), Mary 5 fruits, and Sue 4 fruits.

    You choose which field you want to count by setting the string node on top to either "Fruit" or "Owner". The Distinct and Lookup nodes apply this key to produce a list, either "Apple, Orange, Banana" or "Bob, Mary, Sue". We then feed that list of values (along with the original cvs file and the key) into the "item_counts" subnetwork.

    For "Fruit" the result (as shown in the screenshot) is 7,4,2. For "Owner" the result is 4,5,4. If need be, you could tie the output of the Lookup Node and the output of the item_counts node into a Concatenate node to produce "Apple = 7, Orange = 4, Banana = 2"

    Hope that helps!

    John

  2. Support Staff 2 Posted by john on 22 Feb, 2016 01:01 AM

    john's Avatar

    Update...

    The above item_counts subnetwork is very simple and works fine for small (<1000 row) csv files. But for larger files it may prove inefficient.

    Attached is a more advanced version of item_counts. It uses a completely different algorithm that is more complex but runs up to 100 times faster.

    NOTE: its output is different than the original version. Instead of outputting the raw counts, this node creates a mini-table: column 1 is the name of the field, column 2, "Count", contains the counts for each unique item in that field. ALSO, this table is automatically sorted from most to least frequent. This format makes it easier to compute Top 10 lists.

    Attached is the network and a screenshot showing the output against the same simple csv file from the previous example.

    If you need to get item counts from large tables, feel free to use this version instead.

    Enjoy!

    John

Reply to this discussion

Internal reply

Formatting help / Preview (switch to plain text) No formatting (switch to Markdown)

Attaching KB article:

»

Already uploaded files

  • counting_items.png 5.82 KB

Attached Files

You can attach files up to 10MB

If you don't have an account yet, we need to confirm you're human and not a machine trying to post spam.

Keyboard shortcuts

Generic

? Show this help
ESC Blurs the current field

Comment Form

r Focus the comment reply box
^ + ↩ Submit the comment

You can use Command ⌘ instead of Control ^ on Mac

Recent Discussions

15 Jul, 2024 07:37 PM
11 Jul, 2024 03:48 PM
28 Jun, 2024 01:51 PM
27 Jun, 2024 02:56 PM
27 Jun, 2024 01:57 PM

 

26 Jun, 2024 08:17 PM
26 Jun, 2024 11:35 AM
24 Jun, 2024 02:42 AM
24 Jun, 2024 12:46 AM
23 Jun, 2024 08:07 AM
08 Jun, 2024 09:05 AM