The network plot of Mutations
[This article was first published on My Data Science Journey, and kindly contributed to R-bloggers].
In a pet project, I created a network plot in R to represent mutations and show how combining them improved or worsened a mutation. This post documents how I approached the problem.
Input
First let’s look at the input data.
An Excel sheet with a column of mutations and a column of half-life improvement factors (HIF) is enough as input.
Mutation | HIF |
---|---|
A1B | 5 |
A1B B2C | 6 |
A1B B2C C3D | 3 |
C3D Z25A | 7 |
A1C | 4 |
Since my inputs were in xlsx format, I used the XLConnect package to read from and write to them.
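As a rough illustration of that step, the sketch below writes the example table to a workbook and reads it back with XLConnect's `writeWorksheetToFile` and `readWorksheetFromFile`. The file name `mutations.xlsx` and the sheet name `Input` are assumptions made for the example, not names from the original project.

```r
library(XLConnect)

# Example input, matching the table above
input <- data.frame(
  Mutation = c("A1B", "A1B B2C", "A1B B2C C3D", "C3D Z25A", "A1C"),
  HIF = c(5, 6, 3, 7, 4),
  stringsAsFactors = FALSE
)

# Hypothetical file and sheet names, used only for this sketch
xlsx_file <- file.path(tempdir(), "mutations.xlsx")

# Write the data to a sheet, then read it back as a data frame
writeWorksheetToFile(xlsx_file, data = input, sheet = "Input")
mutations <- readWorksheetFromFile(xlsx_file, sheet = "Input")
```

XLConnect depends on Java through rJava, so a working Java installation is needed for this to run.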
I had to write some code to clean up the data. For example, the mutations were sometimes separated by ‘+’ instead of a single space. An input file might also contain a lot of irrelevant information, or duplicates.
Many small functions were needed to clean the data, depending on the input file.
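A minimal sketch of what one such cleaning function might look like, assuming the columns are named `Mutation` and `HIF` as in the table above. The function name `clean_mutations` is hypothetical, and the real project needed different rules for different files.

```r
# Hypothetical helper: normalise separators, drop extra columns and duplicates
clean_mutations <- function(df) {
  # Replace '+' separators (with any surrounding spaces) by a single space
  df$Mutation <- gsub("\\s*\\+\\s*", " ", df$Mutation)
  # Collapse runs of whitespace and trim both ends
  df$Mutation <- trimws(gsub("\\s+", " ", df$Mutation))
  # Keep only the two relevant columns and rows without missing values
  keep <- !is.na(df$Mutation) & df$Mutation != "" & !is.na(df$HIF)
  df <- df[keep, c("Mutation", "HIF")]
  # Remove exact duplicates
  unique(df)
}

raw <- data.frame(
  Mutation = c("A1B", "A1B+B2C", "A1B  B2C", " A1C "),
  HIF = c(5, 6, 6, 4),
  stringsAsFactors = FALSE
)
cleaned <- clean_mutations(raw)
```

Here the second and third rows become identical after normalisation, so only one of them is kept.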
Creating Nodes
Then I had to create “nodes” from the list of mutations. The code takes the unique “mutations” records, and I also added a bit of code to count the number of substitutions in each mutation. The table now looks something like:
Mutation | HIF | NSubs |
---|---|---|
A1B | 5 | 1 |
A1B B2C | 6 | 2 |
A1B B2C C3D | 3 | 3 |
C3D Z25A | 7 | 2 |
A1C | 4 | 1 |
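The node table above can be derived in a few lines of base R. This is a sketch assuming the substitutions within a mutation are separated by single spaces, as they are after cleaning.

```r
mutations <- data.frame(
  Mutation = c("A1B", "A1B B2C", "A1B B2C C3D", "C3D Z25A", "A1C", "A1C"),
  HIF = c(5, 6, 3, 7, 4, 4),
  stringsAsFactors = FALSE
)

# Unique records become the nodes (the repeated A1C row is dropped)
nodes <- unique(mutations)

# Count substitutions by splitting each mutation on spaces
nodes$NSubs <- lengths(strsplit(nodes$Mutation, " ", fixed = TRUE))
```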
Creating Edges
Now that we have the nodes, we need to make their “edges” or “links”.
I sorted the data by number of substitutions, then looped over the substitution counts and, within each, over the mutations, making a connection wherever the mutations matched.
I also decided to use the “networkD3” package in R, so I needed to convert each mutation to a number and define the edges as “source” and “target”, also as numbers.
networkD3 is based on D3.js, and because that is JavaScript, the numbering must start from 0.
Our nodes would now look like:
ID | Mutation | HIF | NSubs |
---|---|---|---|
0 | A1B | 5 | 1 |
1 | A1B B2C | 6 | 2 |
2 | A1B B2C C3D | 3 | 3 |
3 | C3D Z25A | 7 | 2 |
4 | A1C | 4 | 1 |
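A short sketch of the zero-based numbering: R indexes from 1, so the IDs are the row numbers minus one. The helper `to_index` is hypothetical and simply looks up the ID for a mutation name.

```r
nodes <- data.frame(
  Mutation = c("A1B", "A1B B2C", "A1B B2C C3D", "C3D Z25A", "A1C"),
  stringsAsFactors = FALSE
)

# Zero-based IDs, as expected on the JavaScript side
nodes$ID <- seq_len(nrow(nodes)) - 1L

# Hypothetical helper: translate a mutation name into its zero-based ID
to_index <- function(mutation) match(mutation, nodes$Mutation) - 1L

to_index("C3D Z25A")
```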
And the edges would look like:
Source | ID | From | Mutation | HIF | NSubs |
---|---|---|---|---|---|
0 | 0 | A1B | A1B | 5 | 1 |
0 | 1 | A1B | A1B B2C | 6 | 2 |
0 | 2 | A1B | A1B B2C C3D | 3 | 3 |
3 | 3 | C3D Z25A | C3D Z25A | 7 | 2 |
4 | 4 | A1C | A1C | 4 | 1 |
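The post does not show the edge-building code, so the following is only one way to reproduce the edge table above from the node table. It rests on an assumed rule: each mutation is linked to the first node, going from fewest substitutions upward, whose substitutions are all contained in it, and a mutation with no such ancestor is linked to itself. The helper `find_source` is hypothetical.

```r
nodes <- data.frame(
  ID = 0:4,
  Mutation = c("A1B", "A1B B2C", "A1B B2C C3D", "C3D Z25A", "A1C"),
  HIF = c(5, 6, 3, 7, 4),
  NSubs = c(1, 2, 3, 2, 1),
  stringsAsFactors = FALSE
)

# Each mutation as a vector of its substitutions
subs <- strsplit(nodes$Mutation, " ", fixed = TRUE)

# Candidate sources, ordered by number of substitutions (ties broken by ID)
candidates <- order(nodes$NSubs, nodes$ID)

# Assumed rule: pick the first candidate whose substitutions all occur in node i.
# Node i always matches itself, so the loop is guaranteed to return.
find_source <- function(i) {
  for (j in candidates) {
    if (all(subs[[j]] %in% subs[[i]])) return(j)
  }
  i
}

src <- vapply(seq_len(nrow(nodes)), find_source, integer(1))

edges <- data.frame(
  Source = nodes$ID[src],
  ID = nodes$ID,
  From = nodes$Mutation[src],
  Mutation = nodes$Mutation,
  HIF = nodes$HIF,
  NSubs = nodes$NSubs,
  stringsAsFactors = FALSE
)
```

With the example data this gives sources 0, 0, 0, 3 and 4, matching the table. A different matching rule, such as linking to the closest parent instead of the smallest ancestor, would produce different edges.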
You may want to save these in an Excel workbook, with sheets named Node and Edges respectively.
Plotting the graph
As mentioned earlier, I used the networkD3 package, specifically its forceNetwork function. It has many more options for visual effects, which is why I used it in my project. networkD3 offers other types of visualization as well, all based on D3.js.
    library(networkD3)
    # links and nodes are the edge and node tables built above;
    # nodes is assumed to also carry a "group" column
    fn <- forceNetwork(
      Links = links, Nodes = nodes,
      Source = "Source", Target = "ID", Value = "NSubs",
      NodeID = "Mutation", Nodesize = "HIF", Group = "group",
      zoom = TRUE, bounded = FALSE, legend = TRUE,
      opacity = 0.8, fontSize = 16,
      width = 1600, height = 1200
    )
The output was then saved as an HTML file for sharing with end users.
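One way to do the saving step is networkD3's `saveNetwork` function. The sketch below uses a tiny two-node network so that it runs on its own; the output file name and the constant `group` column are assumptions for the example.

```r
library(networkD3)

# Minimal stand-in data; the real project uses the full node and edge tables
nodes <- data.frame(
  Mutation = c("A1B", "A1B B2C"),
  HIF = c(5, 6),
  group = 1,
  stringsAsFactors = FALSE
)
links <- data.frame(Source = 0, ID = 1, NSubs = 2)

fn <- forceNetwork(
  Links = links, Nodes = nodes,
  Source = "Source", Target = "ID", Value = "NSubs",
  NodeID = "Mutation", Nodesize = "HIF", Group = "group"
)

# Hypothetical output location
out_file <- file.path(tempdir(), "mutation_network.html")
saveNetwork(fn, file = out_file, selfcontained = FALSE)
```

With `selfcontained = FALSE` the JavaScript dependencies are written to a folder next to the HTML file. Setting `selfcontained = TRUE` bundles everything into a single file, which is more convenient for sharing, though on some setups that may require pandoc to be installed.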
Customizing the results
Then I started needing customized features in the visualization. I found a page with ideas for a few of them and, using it as inspiration, added a search box among other things.
One can use htmlwidgets::onRender to add the JavaScript code, but what I did instead was to find the package files directly at /usr/local/lib/R/site-library/networkD3/htmlwidgets/ and edit them with sudo. To repackage, I used the command:
sudo R CMD INSTALL /usr/local/lib/R/site-library/networkD3
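For readers who would rather not edit installed package files, the `onRender` route could look roughly like the sketch below. It attaches an extra click handler that logs the clicked mutation to the browser console. The `.node` selector and the `d.name` field reflect how forceNetwork renders its nodes as far as I know, so treat them as assumptions; the namespaced event `click.info` is used so that the widget's own click handler is left in place.

```r
library(networkD3)
library(htmlwidgets)

# Minimal stand-in data for the sketch
nodes <- data.frame(
  Mutation = c("A1B", "A1B B2C"),
  HIF = c(5, 6),
  group = 1,
  stringsAsFactors = FALSE
)
links <- data.frame(Source = 0, ID = 1, NSubs = 2)

fn <- forceNetwork(
  Links = links, Nodes = nodes,
  Source = "Source", Target = "ID", Value = "NSubs",
  NodeID = "Mutation", Nodesize = "HIF", Group = "group"
)

# JavaScript that runs after the widget has been rendered in the browser
fn_custom <- onRender(fn, '
  function(el, x) {
    d3.select(el).selectAll(".node").on("click.info", function(d) {
      console.log("Clicked: " + d.name);
    });
  }
')
```

This keeps the customization in the R script, so it survives package upgrades, at the cost of having less access to the widget internals than a direct edit.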
The HTML code for the search box was added in the R code itself, using htmltools::browsable. I got help for this part from a question I asked on Stack Overflow.
The code for adding the search:
    library(networkD3)
    library(htmltools)
    fn <- forceNetwork(
      Links = links, Nodes = nodes,
      Source = "Source", Target = "ID", Value = "NSubs",
      NodeID = "Mutation", Nodesize = "HIF", Group = "group",
      zoom = TRUE, bounded = FALSE, legend = TRUE,
      opacity = 0.8, fontSize = 16,
      width = 1600, height = 1200
    )
    # Wrap the widget together with the extra HTML, CSS and JavaScript
    browsable(
      tagList(
        tags$head(
          tags$link(
            href = "http://code.jquery.com/ui/1.11.0/themes/smoothness/jquery-ui.css",
            rel = "stylesheet"
          )
        ),
        HTML(
          '<script src="http://code.jquery.com/jquery-1.11.0.min.js"></script>
          <script src="http://code.jquery.com/ui/1.10.3/jquery-ui.js"></script>
          <style type="text/css">
            #modal {
              position: fixed; left: 150px; top: 20px; z-index: 1;
              background: white; border: 1px black solid;
              box-shadow: 10px 10px 5px #888888; display: none;
            }
            #content { max-height: 400px; overflow: auto; }
            #modalClose { position: absolute; top: -0px; right: -0px; z-index: 1; }
          </style>
          <script type="text/javascript">
            function closeButton() {
              d3.select("#modal").style("display", "none");
            }
          </script>
          <div class="ui-widget">
            <input id="search">
            <button type="button">Search</button>
            HIF
            <select id="hif-comp">
              <option value="lt">&lt;</option>
              <option value="gt">&gt;</option>
            </select>
            <input id="hif">
            <button type="button" id="smartSearch">SmartSearch</button>
          </div>
          <div id="modal">
            <div id="content"></div>
            <button id="modalClose" onclick="closeButton();">X</button>
          </div>'
        ),
        fn
      )
    )
Also included is HTML code for an information box that opens when a node is clicked.
Single-clicking a node opens a box with information; double-clicking or searching for a node highlights it and its immediate neighbors.
Typical users would be protein designers, who can then see how the substitutions have been working and in what direction to make further substitutions to get the molecule they desire.
There is a lot more that can be done to improve this, but for now, this helps.