Trying to create a Sankey diagram
Hi all,
I am trying to create a Sankey diagram showing the flow of items between categories. I got as far as the attached image.
The left ("from") and right ("to") side items are sorted differently. Trying to link them by their ID (in red on either end) is giving me some rather unexpected results :)
Would anyone know how to link bars with the same ID (in red text in the image)? I have attached the ndbx and csv files.
Thanks in advance,
Rob.
- Link-order-error.png 12.5 KB
- Link-order-error.zip 13.2 KB
Support Staff 1 Posted by john on 04 Nov, 2017 03:22 AM
Hi Rob,
Is this what you are trying to do? Screenshot and network attached.
Sankey diagrams are tricky because you have to establish all the anchor points before you link them. But by then it's hard to figure out which anchor goes to which.
The top part of my network draws the overall diagram. You can use the green "horizontal" and "vertical" nodes to adjust the spacing. My anchor subnetwork groups one or more anchors for each city and the join subnetwork joins labels and anchors so each label is properly centered against its anchor(s).
Once the whole thing is done and centered on the page, I ungroup it all back into its constituent parts. Since the anchors are rectangles I can pull them out of this soup by culling every shape that does not have exactly four corners. I now have a list of all the anchors. There will be 2*N where N is the number of links.
The first N shapes will be the leftmost anchors, in the order I drew them from top to bottom. To find the order of the next N anchors I make a little table with two columns, id and slice. ID is the link ID (Nr in this case). Slice is an offset based on the order the anchors were drawn on the right. By taking IDs from the left side and looking up the right slice offsets in this table I can pull out the rightmost anchors in the correct order.
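As a rough sketch of that lookup in plain Python rather than NodeBox nodes (the anchor names and link IDs below are made up for illustration and are not taken from the attached network):

```python
# Hypothetical link IDs, in the order the anchors were drawn on each side.
left_ids = [3, 1, 2]    # top to bottom on the left
right_ids = [2, 3, 1]   # top to bottom on the right
n = len(left_ids)       # N links, so 2*N anchors in total

# Anchor list after ungrouping and culling: first N are left, next N are right.
anchors = ["L3", "L1", "L2", "R2", "R3", "R1"]

# The little two-column table: link id -> slice offset into the anchor list.
slice_by_id = {link_id: n + i for i, link_id in enumerate(right_ids)}

left_anchors = anchors[:n]
# Walk the IDs in left-hand order and pull out the matching right anchor.
right_anchors = [anchors[slice_by_id[link_id]] for link_id in left_ids]

# Each pair is what would be fed into the link node.
pairs = list(zip(left_anchors, right_anchors))
print(pairs)
```

The point is that the right-hand anchors are never sorted in place; they are picked out one at a time in the order dictated by the left-hand IDs.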
From there it's just a matter of feeding the left and right slices of the anchor list into the link node. I colorize the left anchors and then extract that color and use it to color each link.
Please play with the network and let me know if this solves your problem. This should work with any CSV you feed it, even if there are a different number of van and naar cities.
John
2 Posted by Rob van Wees on 04 Nov, 2017 08:49 AM
Hi John,
That's exactly it!
When trying to work this out, I figured that the lookup would automatically "sort" the Zipmap, but it doesn't, apparently.
I'll have a go at your solution with the full data set (and see if I actually understand it :).
Thanks!
Rob.
3 Posted by Rob van Wees on 07 Nov, 2017 12:29 AM
Hi John,
Based on your solution, I found another. I couldn't find an easy way to add labels, and I wanted the black rectangles left and right to be visible.
Then I thought of recombining the data to fit my needs, rather than trying to figure it out in positioning the graphic elements. The resulting node network is larger, but (for me, at least) easier to understand and to fiddle with. Have a look :)
Cheers,
Rob.
Support Staff 4 Posted by john on 07 Nov, 2017 02:45 AM
Rob,
Very nice! I was pleased to see that you not only made ample use of my make_table subnetwork, but used merge_tables as well.
This general approach makes a lot of sense. Build up one master table that has everything you need and then drive placement of elements directly from that.
I would love to see your final product, with more data, if you are able to share it.
John
5 Posted by Rob van Wees on 07 Nov, 2017 10:43 PM
I did :)
After that last version, I noticed the reindexing didn't do what I expected: since the table values are all Strings, link index "110" got sorted before link index "12". I fiddled with the make_table subnet some more and I now have a version with numeric index values by converting the List1 value to a float before the combine. Works like a charm.
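The effect is easy to reproduce outside NodeBox. A minimal Python illustration (the index values are just examples) of why string indices sort differently from numeric ones:

```python
link_ids = ["12", "110", "2"]

# Sorted as strings: compared character by character, so "110" precedes "12".
as_strings = sorted(link_ids)

# Sorted by numeric value: convert to float for the comparison only.
as_numbers = sorted(link_ids, key=float)

print(as_strings)
print(as_numbers)
```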
Next up: getting the placement of the labels on left- and righthand side to match the groups of links. (Best ignore the righthand side, it is completely off at this stage ;)
I started this chart for a colleague who is doing some analysis on the Dutch criminal court system. The case is about how the execution of sanctions from a criminal court case gets transferred between regions (i.e. the case is prosecuted in Amsterdam, sentence execution is in Rotterdam). At the moment, the data is fake, he is trying to get actual figures. I'll ask him if I can share the eventual graph.
When discussing the solution with my colleague, we also discussed doing the complete data recombination step outside NodeBox (i.e. in Excel or SPSS), and feed in a ready-made dataset. That would cut away the node network from after the CSV import up to and including the merge-table node.
Cheers,
Rob.
6 Posted by Rob van Wees on 09 Nov, 2017 12:57 AM
Further update:
With some heavy data massaging outside of NodeBox, I managed a version with subtotals and properly aligned labels.
Next up: moving all of the subtotalling to NodeBox, with the hint I got from John. I'll put up the nodebox file when I get it working :)
Cheers,
Rob.
Support Staff 7 Posted by john on 09 Nov, 2017 01:28 AM
Coming along nicely, Rob!
You can get rid of all those ".0"s in the subtotals by running them through an integer node. Makes for a cleaner look.
Another thing you could do is sort the left anchors within the links leaving from each "from" city in descending order of size. That way the biggest color band from each city is always on top and the smaller ones are arranged beneath in order. This creates a more obvious separation between each city. Looks cleaner and is much easier to understand.
Depending on how your overall network is designed you might be able to do this by adding another sort node (on amount) to the table which drives everything else. That is, sort links first by city name and then by amount. To do this you feed the table first into a sort by amount, then feed that into a sort by name.
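The two-pass trick relies on the second sort being stable, that is, keeping rows with equal names in the order the first sort left them. A small Python sketch of the idea (Python's `sorted` is stable; the rows are invented, with column names `van` and `aantal` borrowed from this thread):

```python
# Hypothetical link table: "van" is the from-city, "aantal" the number of cases.
links = [
    {"van": "Rotterdam", "aantal": 75},
    {"van": "Amsterdam", "aantal": 75},
    {"van": "Amsterdam", "aantal": 500},
    {"van": "Rotterdam", "aantal": 200},
]

# Pass 1: biggest links first.
by_amount = sorted(links, key=lambda row: row["aantal"], reverse=True)

# Pass 2: group by city. Because the sort is stable, links within each
# city stay in descending order of size.
by_city = sorted(by_amount, key=lambda row: row["van"])

for row in by_city:
    print(row["van"], row["aantal"])
```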
I look forward to seeing the final result!
John
8 Posted by Rob van Wees on 09 Nov, 2017 09:34 PM
Thanks, John :)
In this version, I have all of the data manipulation back in NodeBox, one clean data file required. I also got the no-zeroes and sorting-by-size things in, they do help indeed.
Along the way, I converted (or, perhaps, subverted ;) the subtotals subnetwork to handle last/min/max values. Check the network in the zip file.
This is about as far as I'm going to take this version. John, thanks for your help and advice, much appreciated!
@the Esteemed Colleague who gave me this challenge (and who is also checking this thread): get the latest version below! :)
Said Esteemed Colleague also came up with an interesting next-level challenge: while a linear layout like this one works well for small data sets (i.e. a small number of cities, not too many case transfer connections), with larger data sets a circular layout might be more appropriate.
So: onwards to a circular layout it is. I'll try and post updates as I go along.
Cheers,
Rob.
Support Staff 9 Posted by john on 09 Nov, 2017 10:05 PM
Hi Rob,
I've enjoyed helping you with this. Thanks for sharing the evolution of your project. I think it has been educational for everyone.
One thing: you said you got the link sorting by size in place, but in your screenshot the links still show as unsorted. Amsterdam's 3rd largest departing link, size 75, shows at the bottom of that group. A 75-size link departing Den Haag appears before the main 500-size link. And so forth.
Is that really your final version or is the screenshot not up to date? Getting this right really does matter for the readability of the chart. With all the thin link lines of similar colors it's hard to tell where Limburg ends and Midden-Nederland begins. Also, having them in order means you can easily lump all the small fry together at the bottom of each group.
Good luck with your circular Sankey. NodeBox is particularly good at those and you've already solved most of the trickiest problems, so that project should go much faster.
(Circular Sankeys are all the rage these days, but I don't care for them much myself. They are pretty, but I find them very hard to read.)
I look forward to seeing more of your work.
John
10 Posted by Rob van Wees on 09 Nov, 2017 10:52 PM
Hi John,
You are right, well spotted. The sort is actually in the network, but obviously not in the right place :).
As I manually assign positions to the links, rather than use stacking, the sort (actually: sort by number of cases, reverse, sort by city name) needs to happen before calculating positions. However, if I move the sort up to the top of the tree (where the first 'Van' sort happens), I get some rather funky behavior: the links properly come out in descending 'aantal' order, but the left hand labels and black rectangles suddenly grow vertically, sticking out below the links, check the attached image.
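To show why the order matters when positions are assigned by hand, here is a minimal Python sketch (not the actual network; the function name and sizes are made up). Each link's top edge is the running total of the sizes before it, so the list has to be in its final order before the totals are computed:

```python
def stack_positions(sizes, gap=0.0):
    """Return the top y of each link when stacked in the given order."""
    tops = []
    y = 0.0
    for size in sizes:
        tops.append(y)      # this link starts where the previous one ended
        y += size + gap
    return tops

sizes = [75, 500, 25]

# Sort first (descending), then compute positions from the sorted list.
ordered = sorted(sizes, reverse=True)
tops = stack_positions(ordered)
print(list(zip(ordered, tops)))
```

Sorting after the positions have been calculated would reorder the rows but leave each link with an offset that belongs to its old place in the stack.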
I'll dig a bit deeper, I'm expecting a logic/reasoning error somewhere. I'll see if I can fix it.
Cheers,
Rob.
EDIT - sorting the right-hand side by 'aantal' | reverse | 'city name' as well makes the right hand side labels grow below the links as well. Very weird...
11 Posted by Rob van Wees on 10 Nov, 2017 01:48 PM
Yup: logic errors...
It took a few fixes throughout the network, but I managed to get the sorts and the side bars working as expected. Both sides are now properly sorted.
The resulting image has some left-over weirdness from (I think) float-int conversions if you look closely at the alignment between the black side bars and the colored links. I'm going to leave those in. If anyone wants to have a go at that: be my guest. (Please do let me know how you fixed it :)
Proper-latest version of the network attached.
Cheers,
Rob.
12 Posted by Rob van Wees on 10 Nov, 2017 05:33 PM
Perhaps I could (or should) have left it at that, but...
...then I thought: "how would I go about walking through the subsets in the graph..."
Attached is a walkthrough-by-frame. Frames 0-19 walk you through the left- and righthand sides of the graph; after that, it shows the full graph.
Now I really think I'm done :)
Cheers,
Rob.
13 Posted by Rob van Wees on 10 Nov, 2017 05:38 PM
From all of this, I also have one for the feature-request list: can we please have "decorations" in the node tree? Node trees grow quite large quite easily, so having decorations would help.
With "decorations", I mean:
Empty frames pretty much work as labels, so I'd consider "labels" a lower-priority nice-to-have. Perhaps coloring is already possible, I saw a green node in one of John's network snippets, but I couldn't figure out how he did that.
As an example of the sort of frames I am thinking of: check out Blender and its built-in node editor (www.blender.org). It works very similar to NodeBox, but generally looks nicer :)
Cheers,
Rob.
14 Posted by Rob van Wees on 11 Nov, 2017 12:43 AM
First try on the circular one.
Not sure yet if this is actually going to add to the storytelling, though.
While I was busy, I made it so you can flip through the data set by frame.
Rob.
Support Staff 15 Posted by john on 11 Nov, 2017 05:42 AM
Rob,
That was fast. A few thoughts...
Hope you find that helpful!
John
16 Posted by Esteemed Collea... on 11 Nov, 2017 11:26 AM
Thanks to Rob, and John as well! Nice collaboration. Though the data is made up, I think it is roughly what the outcome will look like when we get to the real data.
This experiment was for me to see how NodeBox works and what it needs as input. As discussed with Rob, I think the data preparation should be done outside NodeBox, be it in Excel, SPSS, R etc. In this case quite a lot is done in NodeBox.
Rob, the 10-test looks like the beginning of the more dynamic/interactive version we discussed! Nice to see.
Regarding the circular Sankey: the idea was that a) there is a fairly small number of from-to options, and b) only a small portion leaves the region. So what if we plot the remain (the biggest chunk) on the outside of the ring, and use the inside of the ring for the network? It might give a clearer picture.
Great little project, thanks for your help!
Support Staff 17 Posted by john on 11 Nov, 2017 01:34 PM
I couldn't help myself...
Attached is my first draft of a circular Sankey following the suggestions I made above. I show only net flows, vary the link width, and fix the upside-down labels. I scrambled and brightened the colors to make it easier to distinguish the links.
For the city labels I tried following the circle instead of extending the labels like rays. Still needs a little work. This works OK for ten cities, but you would have to switch back to rays if you have hundreds of cities.
One thing I should have mentioned, Rob, is the link node doesn't work well with circular Sankeys. The remedy is to use quad curves instead. To make that work you have to do a little extra work to make sure all the link lines bend inward.
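One way to guarantee an inward bend, sketched here in plain Python rather than with NodeBox's own nodes, is to use the circle's center as the control point of a quadratic Bezier curve. This is an assumption for illustration; the attached network may place its control points differently, and all function names below are hypothetical.

```python
import math

def point_on_circle(angle_deg, radius=100.0, center=(0.0, 0.0)):
    """Point on the ring at the given angle (degrees)."""
    a = math.radians(angle_deg)
    return (center[0] + radius * math.cos(a), center[1] + radius * math.sin(a))

def quad_point(p0, ctrl, p1, t):
    """Point at parameter t (0..1) on the quadratic Bezier p0 -> ctrl -> p1."""
    u = 1.0 - t
    x = u * u * p0[0] + 2 * u * t * ctrl[0] + t * t * p1[0]
    y = u * u * p0[1] + 2 * u * t * ctrl[1] + t * t * p1[1]
    return (x, y)

def inward_link(angle_a, angle_b, radius=100.0, center=(0.0, 0.0), steps=20):
    """Sample a link between two ring positions that bends toward the center."""
    p0 = point_on_circle(angle_a, radius, center)
    p1 = point_on_circle(angle_b, radius, center)
    # Control point at the center: the curve is pulled inside the chord.
    return [quad_point(p0, center, p1, i / steps) for i in range(steps + 1)]

curve = inward_link(0, 90)
print(curve[len(curve) // 2])  # midpoint of the link
```

With the control point at the center, the curve's midpoint lies halfway between the center and the midpoint of the straight chord, so the link can never bulge outside the ring.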
One other minor adjustment: the net flow from Rotterdam to Zeeland was so big relative to the other flows that I had to place an upper limit on link size. You might want to play with that.
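A small Python sketch of the two ideas, under stated assumptions: "net flow" is taken here to mean the flow from A to B minus the flow from B to A, and the link width is a scaled amount clamped to a maximum. The numbers, the scale factor, and the function names are invented for illustration and are not read from the attached network.

```python
def net_flows(flows):
    """Collapse directed flows {(src, dst): amount} into one net flow per pair."""
    net = {}
    for (src, dst), amount in flows.items():
        if src == dst:
            continue  # cases a city keeps to itself are not links
        key = (src, dst) if src < dst else (dst, src)
        sign = 1 if src < dst else -1
        net[key] = net.get(key, 0) + sign * amount
    result = {}
    for (a, b), amount in net.items():
        if amount > 0:
            result[(a, b)] = amount
        elif amount < 0:
            result[(b, a)] = -amount  # flip direction so amounts stay positive
    return result

def link_width(amount, scale=0.1, max_width=20.0):
    """Width proportional to the amount, with an upper limit."""
    return min(amount * scale, max_width)

# Hypothetical directed flows between cities.
flows = {
    ("Rotterdam", "Zeeland"): 900,
    ("Zeeland", "Rotterdam"): 100,
    ("Amsterdam", "Rotterdam"): 50,
    ("Amsterdam", "Amsterdam"): 400,
}
net = net_flows(flows)
widths = {pair: link_width(amount) for pair, amount in net.items()}
print(net)
print(widths)
```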
I leave it to you to follow the suggestion of your esteemed colleague and use the outside of the ring to show the "remain". By remain I assume you mean the number of cases each city keeps to itself. You could show that by varying the dot size for each city or by extending proportional bars outward from each city dot.
My draft still needs a little polishing, but is far enough along to convey what I meant by only showing the net flows. Screenshot and network attached.
John
18 Posted by Rob van Wees on 11 Nov, 2017 09:08 PM
Hi John,
Wow, that looks a lot more like what I had in mind than my own version :)
As I said, the image I posted was a first draft which took me about 30 minutes to set up. I made it to see if/how the concept works. It uses quad curves; links (indeed) just look weird in the circular layout.
I was just about to go experiment some more when I read your post. I had also noticed the orientation problem with the city names, thanks for the solution to that!
Thanks for your example, I'm going to have a go at it.
Cheers,
Rob.
19 Posted by Rob van Wees on 12 Nov, 2017 10:50 PM
Variation on a theme :)
I took the 'cases not transferred' out of the linked part in the middle and put them to the left of each of the cities. This shortens the graph vertically and accommodates a larger data set.
It has a slight problem with the black bars on right-hand side, probably still has a logic error somewhere. Also, sizing of the new bars is inconsistent with the link size, having the bars at the same relative height messes up the graph as each city keeps more cases than it transfers.
This is the 'static' version, not culled by frame; I can't yet get the new bars to show only the city viewed in a frame (like in the 10a version).
Will post if I fix these items.
Cheers,
Rob.
20 Posted by Rob van Wees on 12 Nov, 2017 11:08 PM
Oops. I had forgotten the right-hand labels...
21 Posted by Rob van Wees on 22 Jun, 2018 08:10 PM
Hi again! :)
Been a while since my last endeavor with NodeBox. I've moved on to a new project and found a nice 'follow-on' challenge: make the 2-column sankey I did for my Esteemed Colleague into a multicolumn version.
I managed to get it to 3 columns with little effort, adding more should be pretty much trivial. I'm still working on label placement, re-sorting the columns (see ndbx file) throws off the vertical alignment.
See you all later (when I figure it out :),
Rob.
22 Posted by bbarrera on 05 Sep, 2023 11:34 PM
Hello all,
Sorry to bring this thread back from its rest, but my other post seems to have been lost. Since this is the base of the code I'm using, I thought it might be appropriate to post here, but feel free to delete this if it is not.
I'm having trouble with the colors: I want a color from the CSV file to carry through the whole diagram,
for example,
Abigail to elephant, elephant to angry and angry to the elephant is angry.
Any suggestions on how I can do this?
Thank you!
Support Staff 23 Posted by john on 06 Sep, 2023 05:51 AM
Hi Bertha!
Good to hear from you again.
If I understood you correctly, the fix was easy to make; you were 99% of the way there.
The trick is to assign the colors INSIDE your anchor subnetworks instead of just below them. If you look inside your anchors, anchors3, and anchors5 subnetworks you will see I added a color lookup after the filter and then used that to colorize the anchor rects.
This is the advantage of using tables to drive things like this. The same filter that allowed you to isolate a particular row to form the anchors also allows you to do another lookup on that same row to color them.
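In plain Python terms, the idea looks roughly like the sketch below. The table, its column names, and the color values are hypothetical stand-ins for the CSV; the actual fix lives in the NodeBox subnetworks.

```python
# Hypothetical master table: one row per path through the diagram.
table = [
    {"id": 1, "name": "Abigail", "animal": "elephant", "mood": "angry", "color": "#cc3333"},
    {"id": 2, "name": "Olivia", "animal": "giraffe", "mood": "angry", "color": "#3366cc"},
]

def anchor(table, row_id, column):
    """Build one anchor: filter to a row, then do two lookups on that same row."""
    row = next(r for r in table if r["id"] == row_id)      # the filter
    return {"label": row[column], "fill": row["color"]}    # label and color

# The same row drives every column, so its color carries through the diagram.
abigail = [anchor(table, 1, col) for col in ("name", "animal", "mood")]
for a in abigail:
    print(a["label"], a["fill"])
```

Because the color comes from the row rather than from the anchor's position, two rows that share a value (both "angry" here) still keep their own colors.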
PNG and zipped NDBX file (with data) attached. I cleaned up your network slightly, increased the canvas size, and added a white background. Of course you can change that as you continue working on this.
Now that you have consistent colors, you may want to doublecheck all the links and labels to make sure everything is hooked up correctly. It looked to me like some of the comments on the far right weren't matching the links, and Olivia's angry giraffe doesn't link to its final comment.
Nice Sankey by the way!
John
24 Posted by bbarrera on 06 Sep, 2023 07:20 AM
Thanks John! You understood correctly!
And I do really enjoy playing with NodeBox 3 for my PhD, so I keep going back to it; then I just change little things in Illustrator once they are 99% completed. I have been spreading the word, although some people get a little intimidated when they hear "coding".
As things get published I will share here.
As for this diagram, it's dummy data, but I will check against my own data as I place it in this network, which I am using as a template.