The New and Improved R Shodan Package
For those not involved with all things “cyber”, let me start with a description of what Shodan is (though visiting the site is probably the best introduction to what secrets it holds).
Shodan is, at its core, a search engine. Unlike Google, Shodan indexes what I’ll call “cyber” metadata and content about everything accessible via a public IP address. This means things like
- routers, switches and cable/DSL/FiOS modems (which are the underpinnings of our internet access)
- internet web, ftp, mail, etc servers
- public (protected or otherwise) CCTV & home surveillance & web cameras
- desktops, printers and other things that may end up in public IP space
- gas station pumps and industrial control systems
- VoIP phones & more
Shodan contacts the IP addresses associated with all the devices, sees what ports and protocols might be in use and then tries to retrieve content from those ports and protocols (which could be anything from webcam snapshots to web server HTML to actual header responses from internet servers to banners from routers and switches). It indexes all that metadata and content and makes it available in a search engine and API for security researchers (I was so tempted to put that word in quotes).
To give you an idea what it can do, take a look at this query for webcams and/or read this full explanation of what you can do with that data.
While you can have fun with Shodan, it does have real value to security folk, and R needed a real API interface to it (I did a half-hearted one a couple years ago). Hence the rebirth of the shodan package.
The package is brand-new, but it has basic, full coverage of the Shodan API except for the streaming functions. But, a line of code is worth a thousand blatherings, so let’s find all the IIS servers in Maine.
# devtools::install_github("hrbrmstr/shodan")
library(shodan)

# perform the query for IIS servers in Maine
maine_iis <- shodan_search("iis state:me")

# get the total number of IIS servers in Maine that Shodan found
print(maine_iis$total)
## [1] 2948

# how many did it return in this page of the query?
print(nrow(maine_iis$matches))
## [1] 100

# what else does it know about these servers?
print(colnames(maine_iis$matches))
##  [1] "product"   "hostnames" "version"   "title"     "ip"        "org"
##  [7] "isp"       "cpe"       "data"      "asn"       "port"      "transport"
## [13] "timestamp" "domains"   "ip_str"    "os"        "_shodan"   "location"
## [19] "ssl"       "link"
Now, the data frame in maine_iis$matches is somewhat ugly for the moment. Some columns have lists and data frames since the Shodan REST API returns (like many APIs do) nested JSON. I’m actually looking for collaboration on what would be the most useful format for the returned data structures so hit me up if you have ideas that would benefit your use of it.
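As one possible starting point for that discussion (a sketch, not what the package does), jsonlite::flatten() can promote nested data frame columns to ordinary top-level columns. The toy data frame below is made up to mimic the shape of maine_iis$matches; its values are illustrative, not real Shodan output.

```r
library(jsonlite)

# Hypothetical stand-in for maine_iis$matches: a data frame whose
# `location` column is itself a data frame, as nested JSON often parses.
matches <- data.frame(
  ip_str = c("192.0.2.10", "192.0.2.20"),
  port = c(80L, 443L),
  stringsAsFactors = FALSE
)
matches$location <- data.frame(
  city = c("Portland", "Bangor"),
  longitude = c(-70.26, -68.78),
  latitude = c(43.66, 44.80),
  stringsAsFactors = FALSE
)

# flatten() lifts nested data frame columns to the top level,
# naming them "<parent>.<child>" (e.g. "location.city").
flat <- jsonlite::flatten(matches)
print(colnames(flat))
```

Note that flatten() only handles nested data frames; list columns (such as a column holding several hostnames per row) would still need separate treatment.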
I’ll violate my own rule about mapping IP addresses just to show you Shodan also does geolocation for you (and, hey, y’all seem to like maps). We’ll make it a bit more useful and add some metadata about what it found to the location popups:
library(leaflet)
library(htmltools)

# combine the nested location data frame with a few descriptive columns
for_map <- cbind.data.frame(maine_iis$matches$location,
                            ip=maine_iis$matches$ip,
                            isp=maine_iis$matches$isp,
                            title=maine_iis$matches$title,
                            org=maine_iis$matches$org,
                            data=maine_iis$matches$data,
                            stringsAsFactors=FALSE)

# plot each server, escaping the Shodan-supplied text before it goes into the popup HTML
leaflet(for_map, width="450", height="600") %>%
  addTiles() %>%
  setView(-69.233328, 45.250556, 7) %>%
  addCircles(data=for_map, lng=~longitude, lat=~latitude,
             popup=~sprintf("<b>%s</b><br/>%s, Maine<br/>ISP: %s<br/><hr noshade size='1'/><pre>%s\n\n%s</pre>",
                            htmlEscape(org), htmlEscape(city), htmlEscape(isp),
                            htmlEscape(title), htmlEscape(data)))
Remember that’s only 100 of ~3,000 servers, but it should give you an idea of the types of data Shodan can return.
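To get past the first 100 results you would need to walk through the remaining pages. The Shodan REST search endpoint accepts a page parameter; how shodan_search() exposes it is not shown here, so the sketch below keeps the API call abstract. Both pages_needed() and fetch_all_pages() are hypothetical helpers, and fake_fetch() is a stand-in so the example runs offline.

```r
# Number of result pages needed, given the total and the page size.
pages_needed <- function(total, per_page = 100) {
  ceiling(total / per_page)
}

# Fetch every page with a caller-supplied function and row-bind the results.
# `fetch_page` is hypothetical: any function(page) returning a data frame.
fetch_all_pages <- function(fetch_page, total, per_page = 100) {
  pages <- seq_len(pages_needed(total, per_page))
  do.call(rbind, lapply(pages, fetch_page))
}

# Fake fetcher standing in for a real API call (one row per "page").
fake_fetch <- function(page) {
  data.frame(page = page,
             ip_str = sprintf("192.0.2.%d", page),
             stringsAsFactors = FALSE)
}

all_rows <- fetch_all_pages(fake_fetch, total = 250)
print(nrow(all_rows))
```

For the Maine query above, pages_needed(2948) works out to 30 pages. With real results, the nested columns may need flattening before rbind() will combine them cleanly.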
The package is up on github for now, and here’s a list of functions it makes available:
- account_profile: Account Profile
- api_info: API Plan Information
- host_count: Search Shodan without Results
- host_info: Host Information
- my_ip: My IP Address
- query_tags: List the most popular tags
- resolve: DNS Lookup
- reverse: Reverse DNS Lookup
- shodan_api_key: Get or set SHODAN_API_KEY value
- shodan_exploit_search: Search for Exploits
- shodan_exploit_search_count: Search for Exploits without Results
- shodan_ports: List all ports that Shodan is crawling on the Internet.
- shodan_protocols: List all protocols that can be used when performing on-demand Internet scans via Shodan.
- shodan_query_list: List the saved search queries
- shodan_query_search: Search the directory of saved search queries.
- shodan_scan: Request Shodan to crawl an IP/netblock
- shodan_scan_internet: Crawl the Internet for a specific port and protocol using Shodan
- shodan_search: Search Shodan
- shodan_search_tokens: Break the search query into tokens
- shodan_services: List all services that Shodan crawls
Each of those maps to the API endpoints described on the official Shodan site.
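To make that mapping a little more concrete, here is a rough sketch of the kind of URL a search call ultimately targets on the Shodan REST API. shodan_search_url() is a hypothetical helper written for illustration, not a function from the package, and the package's real internals may differ; "DEMOKEY" is a placeholder.

```r
# Build the URL for Shodan's REST host-search endpoint.
# `shodan_search_url` is a hypothetical helper, not part of the package.
# By default the key is read from the SHODAN_API_KEY environment variable.
shodan_search_url <- function(query, key = Sys.getenv("SHODAN_API_KEY")) {
  sprintf(
    "https://api.shodan.io/shodan/host/search?key=%s&query=%s",
    utils::URLencode(key, reserved = TRUE),   # percent-encode the key
    utils::URLencode(query, reserved = TRUE)  # encode spaces, colons, etc.
  )
}

url <- shodan_search_url("iis state:me", key = "DEMOKEY")
print(url)
```

Encoding with reserved = TRUE matters because Shodan filters such as state:me contain characters (spaces, colons) that are not safe to drop into a query string as-is.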
You are invited to tag along on this package as much or as little as you like. Drop a note in the comments if you find it useful or have suggestions! Please file all feature requests or problems on github. Have fun exploring the API in R!