Trying to create a Sankey diagram

Rob van Wees

03 Nov, 2017 10:39 PM

Hi all,

I am trying to create a Sankey diagram showing the flow of items between categories. I got as far as the attached image.

The left ("from") and right ("to") side items are sorted differently. Trying to link them by their ID (in red on either end) is giving me some rather unexpected results :)

Would anyone know how to link bars with the same ID (in red text in the image)? I have attached the ndbx and csv files.

Thanks in advance,

Rob.

  1. Posted by john (Support Staff) on 04 Nov, 2017 03:22 AM

    Hi Rob,

    Is this what you are trying to do? Screenshot and network attached.

    Sankey diagrams are tricky because you have to establish all the anchor points before you link them, and by then it's hard to figure out which anchor goes with which.

    The top part of my network draws the overall diagram. You can use the green "horizontal" and "vertical" nodes to adjust the spacing. My anchor subnetwork groups one or more anchors for each city, and the join subnetwork joins labels and anchors so that each label is properly centered against its anchor(s).

    Once the whole thing is done and centered on the page, I ungroup it all back into its constituent parts. Since the anchors are rectangles, I can pull them out of this soup by culling every shape that does not have exactly four corners. I now have a list of all the anchors. There will be 2*N of them, where N is the number of links.

    The first N shapes will be the left-hand anchors, in the order I drew them from top to bottom. To find the order of the next N anchors, I make a little table with two columns, id and slice. ID is the link ID (Nr in this case). Slice is an offset based on the order in which the anchors were drawn on the right. By taking IDs from the left side and looking up their slice offsets in this table, I can pull out the right-hand anchors in the correct order.

    From there it's just a matter of feeding the left and right slices of the anchor list into the link node. I colorize the left anchors, then extract that color and use it to color each link.
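
The anchor bookkeeping described above can be sketched outside NodeBox in plain Python. This is a minimal illustration under stated assumptions, not the attached network: shapes are modelled as lists of corner points, anchors as opaque values, and `cull_anchors`, `pair_anchors`, `left_ids` and `right_ids` are hypothetical names. The dictionary plays the role of the two-column id/slice table.

```python
# Minimal sketch (not John's NodeBox network): pair the left and right anchors
# that belong to the same link. All names here are hypothetical.


def cull_anchors(shapes):
    """Keep only shapes with exactly four corners (the rectangular anchors)."""
    return [shape for shape in shapes if len(shape) == 4]


def pair_anchors(anchors, left_ids, right_ids):
    """anchors holds 2*N shapes: the first N were drawn on the left in
    left_ids order, the next N on the right in right_ids order.
    Returns one (left_anchor, right_anchor) pair per link, in left order."""
    n = len(left_ids)
    if len(anchors) != 2 * n or len(right_ids) != n:
        raise ValueError("expected 2*N anchors and N link IDs per side")
    left, right = anchors[:n], anchors[n:]
    # The "id -> slice" table: where each link ID sits among the right anchors.
    slice_of = {link_id: offset for offset, link_id in enumerate(right_ids)}
    # Walk the left-hand IDs and look up the matching right-hand slice.
    return [(left[i], right[slice_of[link_id]]) for i, link_id in enumerate(left_ids)]


# Usage: three links; the right side was drawn in a different order.
anchors = ["L1", "L2", "L3", "R-link3", "R-link1", "R-link2"]
pairs = pair_anchors(anchors, left_ids=[1, 2, 3], right_ids=[3, 1, 2])
print(pairs)  # [('L1', 'R-link1'), ('L2', 'R-link2'), ('L3', 'R-link3')]
```

Each pair can then be handed to whatever draws the link, which is the step the link node performs in the network.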

    Please play with the network and let me know if this solves your problem. This should work with any CSV you feed it, even if there is a different number of van ("from") and naar ("to") cities.

    John

  2. Posted by Rob van Wees on 04 Nov, 2017 08:49 AM

    Hi John,

    That's exactly it!

    When trying to work this out, I figured that the lookup would automatically "sort" the Zipmap, but it doesn't, apparently.

    I'll have a go at your solution with the full data set (and see if I actually understand it :).

    Thanks!

    Rob.

  3. Posted by Rob van Wees on 07 Nov, 2017 12:29 AM

    Hi John,

    Based on your solution, I found another. I couldn't find an easy way to add labels, and I wanted the black rectangles left and right to be visible.

    Then I thought of recombining the data to fit my needs, rather than trying to figure it out in positioning the graphic elements. The resulting node network is larger, but (for me, at least) easier to understand and to fiddle with. Have a look :)

    Cheers,

    Rob.

  4. Posted by john (Support Staff) on 07 Nov, 2017 02:45 AM

    Rob,

    Very nice! I was pleased to see that you not only made ample use of my make_table subnetwork, but used merge_tables as well.

    This general approach makes a lot of sense. Build up one master table that has everything you need and then drive placement of elements directly from that.

    I would love to see your final product, with more data, if you are able to share it.

    John

  5. Posted by Rob van Wees on 07 Nov, 2017 10:43 PM

    I did :)

    After that last version, I noticed the reindexing didn't do what I expected: since the table values are all strings, link index "110" got sorted before link index "12". I fiddled with the make_table subnet some more, and I now have a version with numeric index values, made by converting the List1 value to a float before the combine. Works like a charm.
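
The string-versus-number ordering problem is easy to reproduce in plain Python. The index values below are illustrative, not taken from the attached data:

```python
# Hypothetical link indexes read from a CSV: every value arrives as a string.
ids = ["12", "110", "3"]

# Strings compare character by character, so "110" < "12" < "3".
string_order = sorted(ids)

# Converting to a number first (as the float conversion does) gives numeric order.
numeric_order = sorted(ids, key=float)

print(string_order)   # ['110', '12', '3']
print(numeric_order)  # ['3', '12', '110']
```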

    Next up: getting the placement of the labels on the left- and right-hand sides to match the groups of links. (Best ignore the right-hand side; it is completely off at this stage ;)

    I started this chart for a colleague who is doing some analysis on the Dutch criminal court system. The case is about how the execution of sanctions from a criminal court case gets transferred between regions (e.g. the case is prosecuted in Amsterdam, but the sentence is executed in Rotterdam). At the moment the data is fake; he is trying to get actual figures. I'll ask him if I can share the eventual graph.

    When discussing the solution with my colleague, we also talked about doing the complete data recombination step outside NodeBox (e.g. in Excel or SPSS) and feeding in a ready-made dataset. That would cut away the part of the node network from after the CSV import up to and including the merge-table node.

    Cheers,

    Rob.

  6. Posted by Rob van Wees on 09 Nov, 2017 12:57 AM

    Further update:

    With some heavy data massaging outside of NodeBox, I managed a version with subtotals and properly aligned labels.

    Next up: moving all of the subtotalling to NodeBox, with the hint I got from John. I'll put up the nodebox file when I get it working :)

    Cheers,

    Rob.

  7. Posted by john (Support Staff) on 09 Nov, 2017 01:28 AM

    Coming along nicely, Rob!

    You can get rid of all those ".0"s in the subtotals by running them through an integer node. Makes for a cleaner look.

    Another thing you could do is sort the left anchors within the links leaving from each "from" city in descending order of size. That way the biggest color band from each city is always on top and the smaller ones are arranged beneath in order. This creates a more obvious separation between each city. Looks cleaner and is much easier to understand.

    Depending on how your overall network is designed you might be able to do this by adding another sort node (on amount) to the table which drives everything else. That is, sort links first by city name and then by amount. To do this you feed the table first into a sort by amount, then feed that into a sort by name.
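
This two-step recipe works because the second sort keeps the relative order of rows it considers equal (a stable sort); whether NodeBox's sort node guarantees this is an assumption here, implied by the advice rather than confirmed. A plain-Python sketch with hypothetical rows, using column names borrowed from the thread ("van" = from, "aantal" = number of cases):

```python
# Hypothetical link table; "van" and "aantal" are assumed column names.
links = [
    {"van": "Den Haag", "aantal": 75},
    {"van": "Amsterdam", "aantal": 75},
    {"van": "Den Haag", "aantal": 500},
    {"van": "Amsterdam", "aantal": 300},
    {"van": "Amsterdam", "aantal": 120},
]

# Step 1: sort on the secondary key (amount, largest first).
by_amount = sorted(links, key=lambda row: row["aantal"], reverse=True)

# Step 2: sort on the primary key. sorted() is stable, so links from the same
# city keep the descending-amount order established in step 1.
ordered = sorted(by_amount, key=lambda row: row["van"])

# The same order in a single pass, using a compound key.
assert ordered == sorted(links, key=lambda row: (row["van"], -row["aantal"]))

for row in ordered:
    print(row["van"], row["aantal"])
```

Sorting once on a compound key, as the `assert` shows, gives the same result without relying on stability.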

    I look forward to seeing the final result!

    John

  8. Posted by Rob van Wees on 09 Nov, 2017 09:34 PM

    Thanks, John :)

    In this version all of the data manipulation is back in NodeBox; only one clean data file is required. I also got the no-zeroes and sorting-by-size changes in, and they do help indeed.

    Along the way, I converted (or, perhaps, subverted ;) the subtotals subnetwork to handle last/min/max values. Check the network in the zip file.
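
A generic subtotal step with a selectable aggregate might look like the sketch below. It is plain Python, not the subnetwork from the zip file; the function and column names are hypothetical.

```python
# Sketch of a subtotal step that reports sum, last, min or max per group.

AGGREGATES = {
    "sum": sum,
    "min": min,
    "max": max,
    "last": lambda values: values[-1],
}


def subtotals(rows, group_key, value_key, how="sum"):
    """Group rows by group_key and aggregate value_key with the chosen function.
    Groups come back in the order in which they first appear in rows."""
    groups = {}
    for row in rows:
        groups.setdefault(row[group_key], []).append(row[value_key])
    aggregate = AGGREGATES[how]
    return {group: aggregate(values) for group, values in groups.items()}


rows = [
    {"van": "Amsterdam", "aantal": 300},
    {"van": "Den Haag", "aantal": 500},
    {"van": "Amsterdam", "aantal": 120},
]
print(subtotals(rows, "van", "aantal"))             # {'Amsterdam': 420, 'Den Haag': 500}
print(subtotals(rows, "van", "aantal", how="min"))  # {'Amsterdam': 120, 'Den Haag': 500}
```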

    This is about as far as I'm going to take this version. John, thanks for your help and advice, much appreciated!

    @the Esteemed Colleague who gave me this challenge (and who is also checking this thread): get the latest version below! :)

    Said Esteemed Colleague also came up with an interesting next-level challenge: while a linear layout like this one works well for small data sets (i.e. a small number of cities and not too many case-transfer connections), a circular layout might be more appropriate for larger datasets.

    So: onwards to a circular layout it is. I'll try to post updates as I go along.

    Cheers,

    Rob.

  9. Posted by john (Support Staff) on 09 Nov, 2017 10:05 PM

    Hi Rob,

    I've enjoyed helping you with this. Thanks for sharing the evolution of your project. I think it has been educational for everyone.

    One thing: you said you got the link sorting by size in place, but in your screenshot the links still show as unsorted. Amsterdam's 3rd largest departing link, size 75, shows at the bottom of that group. A 75-size link departing Den Haag appears before the main 500-size link. And so forth.

    Is that really your final version, or is the screenshot not up to date? Getting this right really does matter for the readability of the chart. With all the thin link lines of similar colors, it's hard to tell where Limburg ends and Midden-Nederland begins. Also, having them in order means you can easily lump all the small fry together at the bottom of each group.

    Good luck with your circular Sankey. NodeBox is particularly good at those and you've already solved most of the trickiest problems, so that project should go much faster.

    (Circular Sankeys are all the rage these days, but I don't care for them much myself. They are pretty, but I find them very hard to read.)

    I look forward to seeing more of your work.

    John

  10. Posted by Rob van Wees on 09 Nov, 2017 10:52 PM

    Hi John,

    You are right, well spotted. The sort is actually in the network, but obviously not in the right place :).

    As I manually assign positions to the links rather than use stacking, the sort (actually: sort by number of cases, reverse, sort by city name) needs to happen before the positions are calculated. However, if I move the sort up to the top of the tree (where the first 'Van' sort happens), I get some rather funky behavior: the links properly come out in descending 'aantal' (number of cases) order, but the left-hand labels and black rectangles suddenly grow vertically and stick out below the links; see the attached image.
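
Why the sort has to come first can be seen with a small stacking sketch (plain Python, hypothetical function name, not the attached network): each link's y position is the sum of the heights above it, so positions computed before a re-sort no longer match the new order.

```python
def stack_positions(amounts, top=0.0, scale=1.0, gap=0.0):
    """Return a (y, height) pair per link, stacked top to bottom in list order.
    Every y depends on all the links above it, so the list must already be in
    its final (sorted) order when this runs."""
    positions = []
    y = top
    for amount in amounts:
        height = amount * scale
        positions.append((y, height))
        y += height + gap
    return positions


# The same three links, stacked before and after sorting by size (largest first).
unsorted = [75, 500, 120]
print(stack_positions(unsorted))                        # [(0.0, 75.0), (75.0, 500.0), (575.0, 120.0)]
print(stack_positions(sorted(unsorted, reverse=True)))  # [(0.0, 500.0), (500.0, 120.0), (620.0, 75.0)]
```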

    I'll dig a bit deeper, I'm expecting a logic/reasoning error somewhere. I'll see if I can fix it.

    Cheers,

    Rob.

    EDIT - sorting the right-hand side by 'aantal' | reverse | 'city name' as well makes the right-hand labels grow below the links too. Very weird...

  11. Posted by Rob van Wees on 10 Nov, 2017 01:48 PM

    Yup: logic errors...

    It took a few fixes throughout the network, but I managed to get the sorts and the side bars working as expected. Both sides are now properly sorted.

    The resulting image has some left-over weirdness from (I think) float-int conversions if you look closely at the alignment between the black side bars and the colored links. I'm going to leave those in. If anyone wants to have a go at that: be my guest. (Please do let me know how you fixed it :)
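
The cause is not confirmed in the thread, but one common source of this kind of drift is rounding each bar or link height separately, so the rounding errors accumulate down the stack. The sketch below (plain Python, hypothetical names) shows the problem and a usual fix: round the cumulative edge positions instead of the individual heights.

```python
def round_each(heights):
    """Round every height on its own: the rounding errors add up along the stack."""
    return [round(h) for h in heights]


def round_edges(heights):
    """Round the running (cumulative) edge positions instead, then take the
    differences, so the total stays within half a unit of the exact total."""
    edges = [0]
    running = 0.0
    for h in heights:
        running += h
        edges.append(round(running))
    return [bottom - top for top, bottom in zip(edges, edges[1:])]


# Five links of 0.6 units each: the exact total is 3.0.
heights = [0.6] * 5
print(sum(round_each(heights)))   # 5, so the stack ends up 2 units too long
print(sum(round_edges(heights)))  # 3
```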

    Proper-latest version of the network attached.

    Cheers,

    Rob.

  12. Posted by Rob van Wees on 10 Nov, 2017 05:33 PM

    Perhaps I could (or should) have left it at that, but...

    ...then I thought: "how would I go about walking through the subsets in the graph..."

    Attached is a walkthrough-by-frame. Frames 0-19 walk you through the left- and right-hand sides of the graph; after that, it shows the full graph.
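
A frame-driven walkthrough boils down to mapping the frame number to a filter. The sketch below assumes one frame per city on each side (frames 0-19 would then correspond to ten cities per side), reuses the thread's "van"/"naar" column names, and uses hypothetical function names throughout:

```python
# Hypothetical mapping: with N cities, frames 0..N-1 show each "van" (from)
# city, frames N..2N-1 each "naar" (to) city, and later frames the full graph.


def frame_filter(frame, cities):
    """Return the (column, city) pair to filter on, or None for the full graph."""
    n = len(cities)
    if frame < n:
        return ("van", cities[frame])
    if frame < 2 * n:
        return ("naar", cities[frame - n])
    return None


def visible_links(links, frame, cities):
    """Links to draw in the given frame."""
    selection = frame_filter(frame, cities)
    if selection is None:
        return list(links)
    column, city = selection
    return [link for link in links if link[column] == city]


cities = ["Amsterdam", "Rotterdam"]
links = [
    {"van": "Amsterdam", "naar": "Rotterdam", "aantal": 40},
    {"van": "Rotterdam", "naar": "Amsterdam", "aantal": 25},
]
print(len(visible_links(links, 0, cities)))  # 1: links leaving Amsterdam
print(len(visible_links(links, 2, cities)))  # 1: links arriving in Amsterdam
print(len(visible_links(links, 4, cities)))  # 2: the full graph
```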

    Now I really think I'm done :)

    Cheers,

    Rob.

  13. Posted by Rob van Wees on 10 Nov, 2017 05:38 PM

    From all of this, I also have one for the feature-request list: can we please have "decorations" in the node tree? Node trees grow large quite easily, and decorations would help keep them readable.

    With "decorations", I mean:

    • Labels to document what is going on at some location in the tree
    • Frames (with titles) to show groups of nodes that belong together (nodes in a frame should move when you move the frame)
    • Color-coded nodes

    Empty frames pretty much work as labels, so I'd consider "labels" a lower-priority nice-to-have. Perhaps coloring is already possible: I saw a green node in one of John's network snippets, but I couldn't figure out how he did that.

    As an example of the sort of frames I am thinking of, check out Blender and its built-in node editor (www.blender.org). It works very similarly to NodeBox, but generally looks nicer :)

    Cheers,

    Rob.

  14. Posted by Rob van Wees on 11 Nov, 2017 12:43 AM

    First try on the circular one.

    Not sure yet if this is actually going to add to the storytelling, though.

    While I was busy, I made it so you can flip through the data set by frame.

    Rob.

  15. Posted by john (Support Staff) on 11 Nov, 2017 05:42 AM

    Rob,

    That was fast. A few thoughts...

    • It looks like your anchors are all about the same size, so the links are all about the same size. This obscures much of the data. You either need to create separately sized anchors for each link or, if you're already doing this, adjust your multipliers so 200-case links look ten times thicker than 20-case links. It's OK if the anchors overlap at the city points. If need be, you can make the anchors invisible and/or cover them up with a dot.
    • Instead of showing every link, it might be more revealing to show only the NET flow between cities. So if Rotterdam sends 250 cases to Zeeland and Zeeland only sends 10 cases back to Rotterdam don't draw both links; just draw a link of 240 cases from Rotterdam to Zeeland (with the link colored for Rotterdam). This will cut the total number of links in half and make a much more interesting and revealing pattern.
    • The city names on the eastern half of your circle are upside down. This is a familiar problem. I created a fix for it here: http://support.nodebox.net/discussions/show-your-work/213-direction...
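
The net-flow suggestion amounts to subtracting each pair's two directions and keeping only the positive remainder. A plain-Python sketch, using the Rotterdam/Zeeland figures from the example above plus one hypothetical balanced pair (the function name is illustrative):

```python
def net_flows(flows):
    """flows maps (origin, destination) -> number of cases.
    Returns one link per pair of regions, pointing in the direction of the
    larger flow and sized by the difference. Cases a region keeps
    (origin == destination) and perfectly balanced pairs produce no link."""
    net = {}
    for (origin, destination), cases in flows.items():
        if origin == destination:
            continue
        back = flows.get((destination, origin), 0)
        if cases > back:
            net[(origin, destination)] = cases - back
    return net


flows = {
    ("Rotterdam", "Zeeland"): 250,
    ("Zeeland", "Rotterdam"): 10,
    ("Amsterdam", "Den Haag"): 30,  # hypothetical balanced pair
    ("Den Haag", "Amsterdam"): 30,
}
print(net_flows(flows))  # {('Rotterdam', 'Zeeland'): 240}
```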

    Hope you find that helpful!

    John

  16. Posted by Esteemed Colleague on 11 Nov, 2017 11:26 AM

    Thanks to Rob, and to John as well! Nice collaboration. Though the data is made up, it is roughly what I expect the outcome to be once we get to the real data.

    This experiment was for me to see how NodeBox works and what it needs as input. As discussed with Rob, I think the data preparation should be done outside NodeBox, be it in Excel, SPSS, R, etc. In this case quite a lot is done in NodeBox.

    Rob, the 10-test looks like the beginning of the more dynamic/interactive version we discussed! Nice to see.

    Regarding the circular Sankey: the idea was that a) there is a fairly small number of from-to options, and b) only a small portion of cases leaves the region. So what if we plot the remain (the biggest chunk) on the outside of the ring and use the inside of the ring for the network? It might give a clearer picture.

    Great little project, thanks for your help!

  17. Posted by john (Support Staff) on 11 Nov, 2017 01:34 PM

    I couldn't help myself...

    Attached is my first draft of a circular Sankey following the suggestions I made above. I show only net flows, vary the link width, and fix the upside-down labels. I scrambled and brightened the colors to make it easier to distinguish the links.

    For the city labels I tried following the circle instead of extending the labels like rays. Still needs a little work. This works OK for ten cities, but you would have to switch back to rays if you have hundreds of cities.
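
For ray-style labels, the usual fix for upside-down text is to turn the labels on one half of the circle by 180 degrees and flip their alignment. The sketch below is not the solution linked earlier in the thread; it assumes y points up and angles run counter-clockwise from the positive x axis. In a coordinate system where y grows downward, the half that needs flipping changes accordingly. The function name is hypothetical.

```python
import math


def ray_label(cx, cy, radius, angle_deg):
    """Place a label on a ray from the circle centre (cx, cy).
    Returns (x, y, rotation_deg, align). Assumes y up, angles counter-clockwise."""
    a = angle_deg % 360
    x = cx + radius * math.cos(math.radians(a))
    y = cy + radius * math.sin(math.radians(a))
    if 90 < a < 270:
        # On this half the text would read upside down: turn it half a turn
        # and anchor its right end at the circle so it still extends outward.
        return x, y, a - 180, "right"
    return x, y, a, "left"


print(ray_label(0, 0, 100, 45)[2:])   # (45, 'left')
print(ray_label(0, 0, 100, 200)[2:])  # (20, 'right')
```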

    One thing I should have mentioned, Rob, is that the link node doesn't work well with circular Sankeys. The remedy is to use quad curves instead. To make that work you have to do a little extra work to make sure all the link lines bend inward.
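
One way to make quadratic curves bend inward, sketched in plain Python rather than with NodeBox's own curve node: place the control point between the chord's midpoint and the circle's centre. The control point then lies inside the circle, and since a quadratic Bézier stays within the triangle formed by its three points, every link stays inside the ring. `pull` and the function names are hypothetical.

```python
import math


def on_circle(center, radius, angle_deg):
    """Point on the circle at the given angle (degrees, counter-clockwise)."""
    t = math.radians(angle_deg)
    return (center[0] + radius * math.cos(t), center[1] + radius * math.sin(t))


def inward_control(p0, p1, center, pull=0.7):
    """Control point for a quadratic curve from p0 to p1 that bows toward the
    centre. pull=0 keeps it at the chord midpoint (a straight line); pull=1
    puts it on the centre itself."""
    mx, my = (p0[0] + p1[0]) / 2, (p0[1] + p1[1]) / 2
    return (mx + pull * (center[0] - mx), my + pull * (center[1] - my))


def quad_point(p0, c, p1, t):
    """Quadratic Bezier: B(t) = (1-t)^2 p0 + 2(1-t)t c + t^2 p1."""
    u = 1 - t
    return (u * u * p0[0] + 2 * u * t * c[0] + t * t * p1[0],
            u * u * p0[1] + 2 * u * t * c[1] + t * t * p1[1])


center = (0.0, 0.0)
a, b = on_circle(center, 100, 0), on_circle(center, 100, 90)
ctrl = inward_control(a, b, center)
mid = quad_point(a, ctrl, b, 0.5)
chord_mid = ((a[0] + b[0]) / 2, (a[1] + b[1]) / 2)
# The curve's midpoint sits closer to the centre than the straight chord's midpoint.
print(math.dist(mid, center) < math.dist(chord_mid, center))  # True
```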

    One other minor adjustment: the net flow from Rotterdam to Zeeland was so big relative to the other flows that I had to place an upper limit on link size. You might want to play with that.
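
Proportional link widths with an upper limit (and, optionally, a lower one so tiny flows stay visible) can be expressed as a simple clamp. The scale and bounds below are illustrative values, not the ones in the attached network:

```python
def link_width(cases, scale=0.1, min_width=0.5, max_width=30.0):
    """Line width proportional to the number of cases, clamped to a range.
    The cap keeps one dominant flow from swamping the chart; the floor keeps
    tiny flows visible. All three parameters are illustrative."""
    return max(min_width, min(cases * scale, max_width))


print(link_width(20), link_width(200))  # 2.0 20.0 (ten times thicker)
print(link_width(1000))                 # 30.0 (capped)
print(link_width(1))                    # 0.5 (floor)
```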

    I leave it to you to follow the suggestion of your esteemed colleague and use the outside of the ring to show the "remain". By remain I assume you mean the number of cases each city keeps to itself. You could show that by varying the dot size for each city or by extending proportional bars outward from each city dot.

    My draft still needs a little polishing, but is far enough along to convey what I meant by only showing the net flows. Screenshot and network attached.

    John

  18. Posted by Rob van Wees on 11 Nov, 2017 09:08 PM

    Hi John,

    Wow, that looks a lot more like what I had in mind than my own version :)

    As I said, the image I posted was a first draft, which took me about 30 minutes to set up. I made it to see if/how the concept works. It uses quad curves; links (indeed) just look weird in the circular layout.

    I was just about to go experiment some more when I read your post. I had also noticed the orientation problem with the city names, thanks for the solution to that!

    Thanks for your example, I'm going to have a go at it.

    Cheers,

    Rob.

  19. Posted by Rob van Wees on 12 Nov, 2017 10:50 PM

    Variation on a theme :)

    I took the 'cases not transferred' out of the linked part in the middle and put them to the left of each of the cities. This shortens the graph vertically and accommodates a larger data set.

    It has a slight problem with the black bars on the right-hand side; there is probably still a logic error somewhere. Also, the sizing of the new bars is inconsistent with the link size: having the bars at the same relative height messes up the graph, as each city keeps more cases than it transfers.

    This is the 'static' version, not culled by frame; I can't yet get the new bars to show only the city viewed in a frame (like in the 10a version).

    Will post if I fix these items.

    Cheers,

    Rob.

  20. Posted by Rob van Wees on 12 Nov, 2017 11:08 PM

    Oops. I had forgotten the right-hand labels...

  21. Posted by Rob van Wees on 22 Jun, 2018 08:10 PM

    Hi again! :)

    It's been a while since my last endeavor with NodeBox. I've moved on to a new project and found a nice 'follow-on' challenge: turn the 2-column Sankey I did for my Esteemed Colleague into a multicolumn version.

    I managed to get it to 3 columns with little effort; adding more should be pretty much trivial. I'm still working on label placement: re-sorting the columns (see the ndbx file) throws off the vertical alignment.

    See you all later (when I figure it out :),

    Rob.
