Amount of Vaccines assigned to Department by Lab - Circular Sankey Diagram
Hi everyone!
I want to visualize the amount of vaccines assigned to each Department by Lab. My idea is to show this information in a "double circular Sankey" (see image). I have followed this very useful post http://support.nodebox.net/discussions/nodebox-2-3/5977-trying-to-create-a-sankey-diagram shared by Rob and john, and I have run into a couple of challenges.
1) In my data, some departments ("States") have received a massive amount of doses from a given lab at one time (560,000) and others have received a minuscule amount of doses from the same lab (170). I haven't been able to find a way to keep a reasonable proportion between them. I have tried increasing the number in the divide node, but the small numbers just disappear if I divide them too much.
2) I want to have two circles: an inner circle representing the labs and the amount of vaccines per lab (this might be a donut chart), and an outer circle representing each of the 27 departments. How can I make the connection between the inner circle and the outer circle without duplicating the departments?
Cheers,
Juan
- IMG_2643.jpg 1.36 MB
- Vax_by_dept_and_Lab.zip 10.5 KB
Support Staff 1 Posted by john on 09 Sep, 2021 02:25 AM
Juan,
Before tackling this problem you might want to take a step back and think hard about what you are trying to communicate here.
When one Sankey line is 3000 times thicker than another, there is no way to rigidly follow this proportion and make both understandable at the same time. Some possible alternatives:
You could group the smaller dose lines into an "Other" bucket instead of trying to show each dose line separately at the same time. If need be, you can add a separate donut showing how that one "Other" bucket breaks down into a dozen small departments/labs. In other words, just show the major trends in your Sankey and break the edge cases into a separate chart.
Instead of making the thicknesses directly proportional to the totals, you could bucket them into 3 or 4 categories: big, medium, and small. A key would explain that thick lines represent doses of more than 100K, small (but still visible) lines represent doses of less than 1000, and medium lines represent everything in between. This would prevent viewers from seeing the true relative sizes between different departments, but would allow them to clearly see a simpler, more fundamental pattern (e.g. that there are two major departments, a few medium-size players, and all the rest negligible). That may be all you really need to communicate; it depends on your audience and what questions they are trying to answer with your visualization.
A compromise between direct linear scaling and bucket categories is to use a compressive, nonlinear scale. Instead of dividing totals by a constant to get thickness, take the square root (or another fractional power, or a logarithm) first, and choose settings so that the smallest value is still visible. This way you are still showing relative sizes, but with a suitable exponent the biggest lines are only 10 or 20 times thicker than the smallest, not 3000 times.
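As a rough sketch of how these scaling options compare, here is a small Python example (plain Python, not NodeBox nodes). The function names, the divisor of 1000, and the bucket widths are made up for illustration; only the two dose values and the 100K / 1000 thresholds come from this thread.

```python
def linear_thickness(doses, divisor=1000.0):
    """Direct proportional scaling: thickness = doses / divisor."""
    return doses / divisor

def power_thickness(doses, exponent=0.5):
    """Compressive scaling: raise to a fractional power (0.5 = square root)."""
    return doses ** exponent

def bucket_thickness(doses):
    """Three fixed widths, using the thresholds from the key described above."""
    if doses > 100_000:
        return 12.0  # thick (the width itself is an arbitrary choice)
    if doses < 1_000:
        return 1.0   # small but still visible
    return 5.0       # medium

big, small = 560_000, 170  # the two extremes from the original question

for label, fn in [
    ("linear", linear_thickness),
    ("square root", power_thickness),
    ("cube root", lambda d: power_thickness(d, 1 / 3)),
    ("buckets", bucket_thickness),
]:
    ratio = fn(big) / fn(small)
    print(f"{label:12s} thickest/thinnest = {ratio:.1f}")
```

For these two values the thickest-to-thinnest ratio comes out at roughly 3,300 for linear scaling, about 57 for the square root, and about 15 for the cube root, so the exponent you pick decides how much the extremes are compressed.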
I am skeptical of your idea to wrap sankey lines from each donut wedge to all the many departments. No matter how you do this, there will be too many lines, most of them very thin, and all of them wrapping around and getting on top of each other. You will probably just make a big hairball that is impossible to understand.
Again: what questions will this visualization answer? What patterns are you trying to convey? You should play with the shape of your data and then answer these questions deeply BEFORE deciding on the best visualization technique.
Instead of a winding donut sankey hairball, you might consider a trellis instead. That is, make a chart of small multiples, with maybe a separate lab donut for each department. When you arrange these donuts in a grid with consistent coloring it makes it possible to investigate each department on its own terms while at the same time seeing larger patterns between departments.
Another form of trellis would be to make a separate Sankey for each lab and simply arrange those sankeys from top to bottom from bigger labs to smaller. If you use the same thickness scale across all sankeys (maybe using a logarithmic scale) then the overall proportions will be visible at a glance but details of the smaller labs will still be discernible.
Think hard about these possibilities before you commit to one design. Then see how far you get, and if you're still stuck post your draft and I'll see if I can help.
Good luck!
John
2 Posted by juan.rozo23 on 20 Sep, 2021 04:37 PM
Dear John,
Following your advice, I have been working on a variation of this data visualization. I made two pie charts, one inside the other: the small one represents the proportion of vaccines received per laboratory of origin, and the big one represents the proportion of vaccines received by department. On top of that, there are bars that represent the amount of vaccines assigned by department.
Now comes the problem: how can I group the bars that sit on top of the outer pie chart within their respective slices? How can I match the size of the bars to the size of the slice they are placed on? The other issue I have run into is that the Text on path3 node does not adjust the text's margin properly; some of the departments have long names, and the names run into each other.
Cheers,
Juan
Support Staff 3 Posted by john on 20 Sep, 2021 08:44 PM
Hi Juan,
You sent a PNG of a draft double donut chart. But the NodeBox file you attached was not the file that generated that chart. Instead it was a sankey diagram.
Did you attach the wrong NDBX file?
The NodeBox network making the sankey diagram was broken in various places, so it took me a while to even understand what it was. Once I understood I fixed it up. See screenshot and revised sankey network attached. (The sankey is better now but still needs some work. For one thing, I would increase the gaps between link groups on both sides to make the boundaries easier to see.)
I would be happy to show you how to draw bars on the outside of the larger donut, but it would be easier if you could reply and attach the NodeBox file that generated the PNG you sent.
I'll wait to hear back from you with your double donut ndbx file.
Thanks,
John
4 Posted by juan.rozo23 on 21 Sep, 2021 01:57 AM
Dear John,
Yes, my bad! I attached the previous .NDBX project by accident. The one that is attached now is the correct one.
Thank you very much for your time and interest in the project. I'm doing these visualizations because I want to improve my data visualization skills with NodeBox, and without your help I would be nowhere near the understanding that I have now of NodeBox as a data visualization tool.
Cheers,
Juan
Support Staff 5 Posted by john on 21 Sep, 2021 06:32 AM
Hi Juan,
Attached is my modified version of your circular viz (see screenshot).
Study it to see how I drew the bars proportionally outside each arc. To resolve the overlapping departments I drew the labels as spokes rather than along the arcs; you have plenty of room for such spokes and this allows us to label even very small arcs with arbitrarily long labels.
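For readers without the attached file, the angle arithmetic behind this kind of layout can be sketched in Python. This is only an illustration under assumptions, not the contents of the attached NodeBox network: the function names, the degree-based angles, and the 10% padding are all hypothetical.

```python
def department_arcs(totals, full_circle=360.0):
    """Split the circle into one arc per department, proportional to its total.

    Returns a list of (start_angle, sweep) pairs in degrees.
    """
    grand_total = sum(totals)
    arcs, start = [], 0.0
    for total in totals:
        sweep = full_circle * total / grand_total
        arcs.append((start, sweep))
        start += sweep
    return arcs

def bar_angles(arc_start, arc_sweep, n_bars, pad_fraction=0.1):
    """Place n_bars equal-width bars inside one arc, leaving padding at each end.

    Because each bar's sweep is derived from the arc's sweep, the bars
    automatically shrink or grow with the slice they sit on.
    """
    pad = arc_sweep * pad_fraction
    bar_sweep = (arc_sweep - 2 * pad) / n_bars
    return [(arc_start + pad + i * bar_sweep, bar_sweep) for i in range(n_bars)]

def spoke_label_angle(arc_start, arc_sweep):
    """A spoke label points outward along the middle of its arc."""
    return arc_start + arc_sweep / 2

arcs = department_arcs([1, 1, 2])  # three hypothetical departments
for start, sweep in arcs:
    print(start, sweep, bar_angles(start, sweep, 4), spoke_label_angle(start, sweep))
```

Deriving every bar angle from its parent arc is what keeps the bars grouped inside their slice, and a label rotated to the arc's mid-angle needs only radial room, which is why long names stop colliding.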
I also reordered the coloring scheme on your inner donut. You had the inner donut sorted by decreasing proportion (usually a good idea) while the outer bars are sorted alphabetically for each arc. But since the inner donut effectively serves as a color key for the outer bars, it was discordant having those colors in two different orders. Better to have a consistent coloring scheme throughout.
I made one other minor change: I recolored the gray scale department arcs to alternating dark and light instead of continuous shades of gray. I thought this might make it easier to see the borders between each arc, but I'm still not entirely happy with it. Feel free to change it back if you don't think it works.
I think this works fairly well as a visualization. The outer bars concern me somewhat, though. In a standard bar chart the bars all have the same thickness and the height is directly proportional to the value. By reducing the thickness to fit along smaller arcs we are reducing the surface area in a way that has nothing to do with the value. Users will probably understand this, but it could be misleading for some.
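To put a number on that concern, here is a minimal Python sketch that treats each outer bar as an annular sector. The radius, height, and sweep values are invented for illustration and are not taken from the attached file.

```python
import math

def bar_area(inner_radius, height, sweep_degrees):
    """Area of a bar drawn as an annular sector on the outside of a donut."""
    theta = math.radians(sweep_degrees)
    outer_radius = inner_radius + height
    return 0.5 * theta * (outer_radius ** 2 - inner_radius ** 2)

# Two bars encoding the SAME value (same height), on arcs of different width.
wide = bar_area(inner_radius=200.0, height=50.0, sweep_degrees=10.0)
narrow = bar_area(inner_radius=200.0, height=50.0, sweep_degrees=2.0)

print(f"area ratio for equal values: {wide / narrow:.1f}")
```

With equal heights the area ratio is simply the ratio of the angular widths, so a bar on a wide arc carries several times the ink of a bar on a narrow arc even though both encode the same value.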
I noticed you also made the bar height proportional not to the actual value, but to its square root - a compressive scale that works much like a logarithmic one. This makes the overall chart more even while still showing relative proportions, so it's probably a reasonable choice. But again, it could be misleading for some. Using the actual values makes a wilder chart which calls more attention to the biggest values - but this is not necessarily a bad thing. You might try comparing both versions (with and without the sqrt node) and see which you like better. It all depends on what you want to emphasize for your viewers.
This is a fun project - and educational for all of us. Thanks again for sharing!
John