Co-occurrence network?
Is it possible to create a co-occurrence network with Nodebox 3 from a csv file? I am working with text and not numerical data. I have been using Cytoscape and Gephi, but Nodebox looks very interesting!
Support Staff 1 Posted by john on 25 Jun, 2016 05:40 AM
Hi Rob,
Yes, you definitely could create a co-occurrence network with NodeBox. NodeBox would have some advantages and disadvantages for such a project.
First, it would be best if your csv file already has individual terms isolated and organized by neighborhood (or whatever) so that you can easily filter them. In theory you could build up organized tables from raw text directly in NodeBox, but doing so would be painful and inefficient.
In my experience, NodeBox can easily handle csv files with thousands of terms, but would choke or slow to a crawl at 100,000 terms; it's not a big data tool. But for making individual charts I assume this would not be an issue.
The main caveat with using NodeBox for this purpose is that you would have to start from scratch. There is no pre-existing co-occurrence network library (that I know of) to plug and play. This will require a little more initial work but would also give you much more flexibility and control, allowing you to experiment with new and better ways of visualizing a particular dataset.
It is very easy in NodeBox to suck in a few hundred words from a csv file, render each word in the font and size of your choice, draw each word inside its own circle, set the size of each circle, color the circles any way you want, position the circles on the page, and connect them with lines that could be solid or dotted or colored in a variety of ways.
I think the biggest challenge you would face would be automatically positioning those circles in a pleasing and meaningful way. NodeBox does not have a D3-style force layout algorithm built in. And it would be difficult to define your own force layout algorithm because NodeBox doesn't do recursion or L-system generation. You could easily scatter the circles at random, but minimizing (or even detecting) overlap would be tricky.
So you need to think about how to automatically calculate an X,Y position for each term in a more deterministic way. Once you devise the algorithm, implementing it should be straightforward. It would be even easier if the positions were already coded in (or could be easily derived from) your csv file.
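For illustration only, here is one very simple deterministic rule sketched in plain Python rather than as NodeBox nodes: sort the terms and space them evenly around a single circle. The function name `circle_layout`, the default radius, and the sample terms are hypothetical and not taken from any file in this thread. The resulting coordinates could be written back into the csv as extra columns.

```python
import math

def circle_layout(terms, radius=400.0, center=(0.0, 0.0)):
    """Return {term: (x, y)} with terms spaced evenly around one circle.

    The same list of terms always yields the same positions, because the
    terms are sorted first and no random numbers are involved.
    """
    cx, cy = center
    ordered = sorted(terms)
    n = len(ordered)
    positions = {}
    for i, term in enumerate(ordered):
        angle = 2 * math.pi * i / n  # evenly spaced angles, starting at 0
        positions[term] = (cx + radius * math.cos(angle),
                           cy + radius * math.sin(angle))
    return positions

if __name__ == "__main__":
    # Hypothetical terms, for demonstration only.
    for term, (x, y) in circle_layout(["river", "bank", "money", "loan"]).items():
        print(f"{term}: x={x:.1f}, y={y:.1f}")
```

A single circle gets crowded beyond a few dozen terms; the same idea extends to concentric circles or a grid.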
One strong advantage of NodeBox is that you can produce high-quality PDFs or animations with elements that can vary in size across seven orders of magnitude. I regularly create 12,000 by 9,000 scale PDFs which produce nice printouts and allow deep zooming to see fine details. It is particularly nice to pan across and zoom into such diagrams on an iPad.
The thing I love about NodeBox is the way it lets you fluidly experiment with ideas, shaping them like clay and instantly seeing the effects of any changes you make. And all of this without writing a single line of code in the normal sense. There is a bit of a learning curve but it's a lot of fun - and the results can be quite impressive.
Good luck! If you decide to give NodeBox a try please don't hesitate to ask for help or share your cool creations.
John
2 Posted by Rob Boss on 25 Jun, 2016 01:43 PM
Hello John,
I would be very happy if I could duplicate the zip map example on https://www.nodebox.net/node/documentation/using/data-visualization.html#a-zipmap-example.
I followed the instructions and duplicated the node configuration in the screenshots, but to no avail. My csv is pretty simple: a Source column and a Target column.
Attached are example files if you want to take a look. (The actual csv file will be about 1100 rows with two columns.) When I try to render the combine node, nothing happens.
Rob
3 Posted by Rob Boss on 25 Jun, 2016 02:42 PM
Thank you, John. I will keep experimenting with Nodebox.
Support Staff 4 Posted by john on 18 Aug, 2016 11:10 AM
Hi Rob,
Your post 2 just appeared in my inbox, timestamped 2:15 AM August 18 2016. But when I come to this thread, the same comment is timestamped June 25. Did you post this comment last June or just now?
I took a quick look at your example file and found a number of mistakes that explain why you were not getting any output. This is not surprising; that zipmap example is very hard to follow. I'm not on the staff, but if I were I would clean it up.
Are you trying to create a diagram where each source number is randomly positioned and surrounded by its target strings? Do you still need help or did you figure it out on your own?
John
5 Posted by Rob Boss on 18 Aug, 2016 11:50 AM
Hi John,
I posted the comment on June 25.
Yes, if each source number could be randomly positioned and surrounded by its target strings, that would be great. I did not figure it out.
Thanks,
Rob
Support Staff 6 Posted by john on 19 Aug, 2016 09:24 PM
Hi Rob,
Sorry I didn't see your comment in June.
I am attaching a (zipped) network which can create co-occurrence rings from a csv file with two columns, Source and Target. I simplified the confusing example you were trying to follow; you don't actually need a zipmap for this case.
As you can see from the attached screenshot, the network is fairly simple with just 12 nodes. One of these, make_ring, is a subnetwork of 8 additional nodes; I have outlined it in red. Make_ring takes the csv file and, for each Source value, produces a ring of associated words.
The rest of the main network just calculates a random position for each distinct source value and then draws both the source number and the associated word ring at those positions.
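For readers who want to see the underlying logic outside NodeBox, the following is a rough Python sketch of the same idea, not a transcription of the attached network: group the Target words by Source, then compute a point on a ring for each word. The function names `group_targets` and `ring_points` and the sample rows are hypothetical. The real make_ring subnetwork wraps text along a path, which this sketch simplifies to one anchor point per word.

```python
import csv
import io
import math
from collections import defaultdict

def group_targets(csv_text):
    """Map each Source value to the list of its Target words, in file order."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        groups[row["Source"]].append(row["Target"])
    return dict(groups)

def ring_points(words, center, radius):
    """Return (word, x, y) tuples spaced evenly on a circle around center."""
    cx, cy = center
    n = len(words)
    return [
        (word,
         cx + radius * math.cos(2 * math.pi * i / n),
         cy + radius * math.sin(2 * math.pi * i / n))
        for i, word in enumerate(words)
    ]

if __name__ == "__main__":
    # Hypothetical sample data in the two-column Source/Target layout.
    sample = "Source,Target\n1,apple\n1,pear\n1,plum\n2,fig\n"
    for source, words in group_targets(sample).items():
        print(source, ring_points(words, center=(0.0, 0.0), radius=50.0))
```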
When you replace the csv file with your 1100-row file, you will have to make some adjustments so that all the words don't land on top of each other.
- You can increase the overall area of your chart by increasing the height and width of the rect node; if you want to save your chart as a PNG or PDF, be sure to also increase the document size (Document Properties under the File menu).
- The scatter node uses a seed to randomize the positions. You can step through different seed values until you find one that reduces overlaps.
- If it's too hard to find a scattering pattern that doesn't overlap rings, you might consider using a grid node instead to place the rings in evenly-spaced rows and columns. Add a slice node set to the total number of source values if the grid produces more positions than you need.
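The seed-stepping and grid suggestions above can also be tried out in a few lines of Python before committing to a layout. This is only a sketch under stated assumptions: it treats each ring as a circle of fixed radius, and the helpers `scatter`, `count_overlaps`, `best_seed`, and `grid` are hypothetical stand-ins that imitate the NodeBox scatter, grid, and slice nodes rather than reproducing them. It needs Python 3.8 or later for `math.dist`.

```python
import math
import random

def scatter(n, width, height, seed):
    """Return n random points inside a width x height rectangle for a given seed."""
    rng = random.Random(seed)
    return [(rng.uniform(0, width), rng.uniform(0, height)) for _ in range(n)]

def count_overlaps(points, ring_radius):
    """Count pairs of rings whose centers are closer than two radii."""
    min_dist = 2 * ring_radius
    overlaps = 0
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) < min_dist:
                overlaps += 1
    return overlaps

def best_seed(n, width, height, ring_radius, seeds=range(100)):
    """Step through candidate seeds and return the one with the fewest overlaps."""
    return min(seeds,
               key=lambda s: count_overlaps(scatter(n, width, height, s), ring_radius))

def grid(n, width, height, columns):
    """Return n cell centers in evenly spaced rows and columns.

    The final slice drops unused cells, like a slice node after a grid node.
    """
    if n == 0:
        return []
    rows = math.ceil(n / columns)
    cell_w, cell_h = width / columns, height / rows
    points = [((c + 0.5) * cell_w, (r + 0.5) * cell_h)
              for r in range(rows) for c in range(columns)]
    return points[:n]

if __name__ == "__main__":
    # Illustrative numbers only: 20 rings of radius 60 on a 1200 x 900 canvas.
    seed = best_seed(20, 1200, 900, 60)
    print("seed:", seed, "overlaps:", count_overlaps(scatter(20, 1200, 900, seed), 60))
    print("grid fallback:", grid(20, 1200, 900, columns=5)[:3], "...")
```

If no seed brings the overlap count to zero, the options are a larger rectangle or the grid fallback.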
You can adjust the size and density of the rings by making several other adjustments inside the make_ring subnetwork (Control-click or right-click on the node and choose "Edit Children"):
- In the textpath node you can adjust the font size of the words and increase the diameter of each ring by increasing the first (x) position value.
- Adjust the value in the multiply node to increase or decrease how tightly the words wrap around the center of the ring.
I hope this helps. Good luck!
John
7 Posted by Rob Boss on 19 Aug, 2016 09:29 PM
Thank you for your help, John. This is great!
-Rob