Do G-Cloud categories need a tweak?
Why take a deeper look at G-Cloud categories?
The last post, “The key to unlocking services on G-Cloud”, touched briefly on how G-Cloud categories overlap. As categories were newly introduced in the current iteration of G-Cloud (G9), it is worth taking a deeper look at their impact before the next iteration.
So, in this blog, I want to explore the extent and effects of category overlap to see what insights may be drawn. For example, are some categories of less value than others? Could some suppliers gain an advantage by aligning each service to myriad categories so buyers find them irrespective of their carefully crafted search criteria?
Are all categories useful?
The last post used a Venn diagram to show how four categories overlap. Venn diagrams are great for visualising a small number of sets, but to analyse the 22 categories in Cloud Hosting, for example, we need a different approach.
The plot below, if displayed in full, would probably need the side of several double-decker buses to do it justice. However, as I’m only concerned, at least in the first instance, with those services which appear in one, and only one, category, we can display just that portion of the plot.
So, what does it tell us? Let’s look at Platform as a Service (PaaS), which is the bottom row. There are over 1,200 PaaS services, as shown by the set size to the left. Of these (reading up from the orange dot), close to 500 align only to the PaaS category. Put another way, if the category were not available, close to 500 services would have no category to align to. This suggests there is real value in this category.
Contrast PaaS though with Intrusion Detection: shown as “id” in the table (eighth from the top). There are more than 250 services (set size), yet the absence of an orange dot tells us that not one of those services aligns to ONLY the Intrusion Detection category. In other words, if we removed this category, there would be no stranded services.
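The “one, and only one, category” reading above can be sketched in a few lines of base R. This is a minimal illustration on made-up data, not the code behind the plot: the service IDs and the `categories` list are hypothetical, and only the labels `paas` and `id` echo the text.

```r
# Hypothetical toy data: each category maps to the IDs of the services aligned to it.
categories <- list(
  paas    = c("s1", "s2", "s3", "s4"),
  id      = c("s3", "s5"),
  archive = c("s3", "s5", "s6")
)

# Build a service-by-category membership matrix (TRUE where a service is aligned).
services <- sort(unique(unlist(categories)))
membership <- sapply(categories, function(ids) services %in% ids)
rownames(membership) <- services

# Services aligned to exactly one category.
exclusive <- membership[rowSums(membership) == 1, , drop = FALSE]

# Set size per category, and how many services would be stranded without it.
set_size <- colSums(membership)
stranded <- colSums(exclusive)
print(data.frame(set_size, stranded))
```

In this toy data `id` has a set size of 2 but no stranded services, which mirrors the Intrusion Detection pattern described above. With UpSetR installed, passing the same named list through `fromList()` to `upset()` should draw the corresponding plot.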
Could suppliers exploit the categories to gain an advantage?
Now let’s focus just on Intrusion Detection to see how extensively it overlaps with other categories. This time “id”, as the object of our focus, is the bottom row in the table. Again, visualising every possible intersection might require the services of London Transport, so I’ve focused on the more than 100 Intrusion Detection services which each align to TEN categories.
Taking the 20 (intersection size) in the first column above, and inspecting the first of those directly on the Digital Marketplace, we find the service title is: “[Supplier Name] Cloud Backup and Recovery Service”. This would seem to best align to the Archiving Backup and Disaster Recovery category, yet is aligned to nine additional categories including Intrusion Detection.
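The drill-down just described (pick a focus category, keep the services aligned to a given number of categories, then inspect the largest intersection) might look something like the sketch below. The data, the service IDs and the names `alignments`, `focus` and `n_align` are all hypothetical, and the toy example uses three categories per service rather than ten to stay readable.

```r
# Hypothetical toy data: each service ID maps to the categories it is aligned to.
alignments <- list(
  s1 = c("archive", "id", "paas"),
  s2 = c("archive", "id", "paas"),
  s3 = c("id", "logging", "paas"),
  s4 = c("archive"),
  s5 = c("id")
)

focus   <- "id"  # category under the spotlight
n_align <- 3     # categories per service to look for (ten in the text)

# Keep services that include the focus category and align to exactly n_align categories.
in_focus <- Filter(
  function(cats) focus %in% cats && length(cats) == n_align,
  alignments
)

# Label each remaining service by its sorted combination of categories,
# then count the services per combination (the "intersection size").
combos <- vapply(
  in_focus,
  function(cats) paste(sort(cats), collapse = " & "),
  character(1)
)
intersection_size <- sort(table(combos), decreasing = TRUE)
print(intersection_size)

# IDs of the services in the largest intersection, ready for manual inspection.
largest <- names(combos)[combos == names(intersection_size)[1]]
print(largest)
```

The IDs in `largest` are the ones you would then look up by hand, as was done on the Digital Marketplace above.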
What can we conclude?
This analysis doesn’t necessarily mean that the Intrusion Detection category has no value. Buyers may still find it a useful lens on a particular set of services.
Nor does it necessarily mean there aren’t services which genuinely do straddle several categories without neatly aligning to only one.
So what does it mean? The risk is that an overly flexible categorisation mechanism could drive an unwelcome behaviour. The unintended consequence could be to wash out the very benefit the categories were designed to introduce. Suppliers may seek to maximise their chances of being short-listed by aligning each service to a large number of categories. Then the focusing benefit to buyers would be progressively diluted.
What to do? Consider tweaking the list of available categories. Perhaps more tightly define their scope. And maybe limit the number of categories to which any one service may align, for example, adopting an approach of “which up to x categories best describe your service” (setting “x” to an appropriately small number).
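To get a feel for what an “up to x categories” rule would mean in practice, one could count how many services currently exceed a given cap. The sketch below is purely illustrative: the counts in `n_categories` are invented, and the cap `x <- 3` is an assumption rather than a recommendation.

```r
# Hypothetical toy data: number of categories each service is aligned to.
n_categories <- c(s1 = 1, s2 = 10, s3 = 3, s4 = 12, s5 = 2)

x <- 3  # assumed cap, chosen here purely for illustration

# Services that would need to drop categories under the cap.
over_cap <- names(n_categories)[n_categories > x]
print(over_cap)

# Share of services affected by the cap.
share_affected <- mean(n_categories > x)
print(share_affected)
```

Running the same count for several values of `x` would show how sharply each candidate cap bites.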
R tools used
Package | Functions
---|---
purrr | map_df; reduce
rvest | read_html; html_nodes; html_text
dplyr | full_join; filter
stringr | str_c; str_replace; str_trim
tibble | tibble
UpSetR | upset; fromList
Citations
R Development Core Team (2008). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, URL http://www.R-project.org.
Contains public sector information licensed under the Open Government Licence v3.0.
Alexander Lex, Nils Gehlenborg, Hendrik Strobelt, Romain Vuillemot, Hanspeter Pfister, UpSet: Visualization of Intersecting Sets, IEEE Transactions on Visualization and Computer Graphics (InfoVis ’14), vol. 20, no. 12, pp. 1983–1992, 2014. doi:10.1109/TVCG.2014.2346248
The post Do G-Cloud categories need a tweak? appeared first on thinkr.