Modern Honey Network Machinations with R, Python, phantomjs, HTML & JavaScript

[This article was first published on Data Driven Security, and kindly contributed to R-bloggers.]

This was (initially) going to be a blog post announcing the new mhn R package (more on what that is in a bit) but somewhere along the way we ended up taking a left turn at Albuquerque (as we often do here at ddsec hq) and had an adventure in a twisty maze of Modern Honey Network passages that we thought we’d relate to everyone.

Episode 0 : The Quest!

We find our intrepid data scientists finally getting around to playing with the Modern Honey Network (MHN) software, something they promised Jason Trost they’d do ages ago. MHN makes it easy to [freely] centrally set up, control, monitor and collect data from one or more honeypots. Once you have this data you can generate threat indicator feeds from it and also analyze it (which is what we’re interested in eventually doing and what ThreatStream does with their global network of MHN contributors).

Jason has a Vagrant quickstart version of MHN which lets you kick the tyres locally, safely and securely before venturing out into the enterprise (or internet). You stand up the server (mostly Python-y things), then tell it what type of honeypot you want to deploy. You get a handy cut-and-paste-able string which you paste-and-execute on a system that will become an actual honeypot (which can be a “real” box, a VM or even a RaspberryPi!). When the honeypot is finished installing the necessary components it registers with your MHN server and you’re ready to start catching cyber bad guys.


(cyber bad guy)

Episode 1 : Live! R! Package!

We decided to deploy a test MHN server and series of honeypots on Digital Ocean since they work OK on the smallest droplet size (not recommended for a production MHN setup).

While it’s great to peruse the incoming attacks, we wanted programmatic access to the data, so we took a look at all the routes in their API and threw together an R package to let us work with it.

library(mhn)

attacks <- sessions(hours_ago=24)$data
tail(attacks)

##                           _id destination_ip destination_port honeypot
## 3325 55d93cb8b5b9843e9bb34c75 111.222.33.111               22      p0f
## 3326 55d93cb8b5b9843e9bb34c74 111.222.33.111               22      p0f
## 3327 55d93d30b5b9843e9bb34c77 111.222.33.111               22      p0f
## 3328 55d93da9b5b9843e9bb34c79           <NA>             6379  dionaea
## 3329 55d93f1db5b9843e9bb34c7b           <NA>             9200  dionaea
## 3330 55d94062b5b9843e9bb34c7d           <NA>               23  dionaea
##                                identifier protocol       source_ip source_port
## 3325 bf7a3c5e-48e7-11e5-9fcf-040166a73101     pcap    45.114.11.23       58621
## 3326 bf7a3c5e-48e7-11e5-9fcf-040166a73101     pcap    45.114.11.23       58621
## 3327 bf7a3c5e-48e7-11e5-9fcf-040166a73101     pcap    93.174.95.81       44784
## 3328 83e2f4e0-4876-11e5-9fcf-040166a73101     pcap 184.105.139.108       43000
## 3329 83e2f4e0-4876-11e5-9fcf-040166a73101     pcap  222.186.34.160        6000
## 3330 83e2f4e0-4876-11e5-9fcf-040166a73101     pcap   113.89.184.24       44028
##                       timestamp
## 3325 2015-08-23T03:23:34.671000
## 3326 2015-08-23T03:23:34.681000
## 3327 2015-08-23T03:25:33.975000
## 3328 2015-08-23T03:27:36.810000
## 3329 2015-08-23T03:33:48.665000
## 3330 2015-08-23T03:39:13.899000

NOTE: that’s not the real destination_ip so don’t go poking since it’s probably someone else’s real system (if it’s even up).

You can also get details about the attackers (this is just one example):

attacker_stats("45.114.11.23")$data

## $count
## [1] 1861
## 
## $first_seen
## [1] "2015-08-22T16:43:59.654000"
## 
## $honeypots
## [1] "p0f"
## 
## $last_seen
## [1] "2015-08-23T03:23:34.681000"
## 
## $num_sensors
## [1] 1
## 
## $ports
## [1] 22

The package makes it really easy (OK, we’re probably a bit biased) to grab giant chunks of time series and associated metadata for further analysis.
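As a rough illustration of the kind of follow-up analysis this enables, here is a minimal Python sketch (Python rather than R, purely so it runs without an MHN server) that tallies the sample session rows shown above. The `sample_sessions` list is hand-copied from that output, and the helper names `tally` and `span_minutes` are hypothetical; none of this is part of the mhn package.

```python
from collections import Counter
from datetime import datetime

# Hand-copied from the tail(attacks) output above; not a live API response.
sample_sessions = [
    {"honeypot": "p0f", "destination_port": 22,
     "source_ip": "45.114.11.23", "timestamp": "2015-08-23T03:23:34.671000"},
    {"honeypot": "p0f", "destination_port": 22,
     "source_ip": "45.114.11.23", "timestamp": "2015-08-23T03:23:34.681000"},
    {"honeypot": "p0f", "destination_port": 22,
     "source_ip": "93.174.95.81", "timestamp": "2015-08-23T03:25:33.975000"},
    {"honeypot": "dionaea", "destination_port": 6379,
     "source_ip": "184.105.139.108", "timestamp": "2015-08-23T03:27:36.810000"},
    {"honeypot": "dionaea", "destination_port": 9200,
     "source_ip": "222.186.34.160", "timestamp": "2015-08-23T03:33:48.665000"},
    {"honeypot": "dionaea", "destination_port": 23,
     "source_ip": "113.89.184.24", "timestamp": "2015-08-23T03:39:13.899000"},
]


def tally(sessions, field):
    """Count how often each value of `field` appears in the sessions."""
    return Counter(s[field] for s in sessions)


def span_minutes(sessions):
    """Minutes between the earliest and latest session timestamps."""
    times = [datetime.strptime(s["timestamp"], "%Y-%m-%dT%H:%M:%S.%f")
             for s in sessions]
    return (max(times) - min(times)).total_seconds() / 60


print(tally(sample_sessions, "honeypot"))
print(tally(sample_sessions, "destination_port"))
print(round(span_minutes(sample_sessions), 2))
```

The same grouping is a one-liner with `table()` or dplyr in R once `attacks` is in hand; the point is only that the records are flat and easy to aggregate.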

While cranking out the API package we noticed that there were no endpoints for the MHN HoneyMap. Yes, they do the “attacks on a map” thing but don’t think too badly of them since most of you seem to want them.

After poking around the MHN source a bit more (and navigating the view-source of the map page) we discovered that they use a Go-based websocket server to push the honeypot hits out to the map. (You can probably see where this is going, but the story takes one more turn first.)

Episode 2 : Hacking the Anti-Hackers

The other thing we noticed is that—unlike the MHN-server proper—the websocket component does not require authentication. Now, to be fair, it’s also not really spitting out seekrit data, just (pretty useless) geocoded attack source/dest and type of honeypot involved.

Still, this got us wondering if we could find other MHN servers out there in the cold, dark internet. So, we fired up RStudio again and took a look using the shodan package:

library(shodan)

# the most obvious way to look for MHN servers is to 
# scour port 3000 looking for content that is HTML
# then look for "HoneyMap" in the <title>

# See how many (if any) there are
host_count('port:3000 title:HoneyMap')$total
## [1] 141

# Grab the first 100
hm_1 <- shodan_search('port:3000 title:HoneyMap')

# Grab the last 41
hm_2 <- shodan_search('port:3000 title:HoneyMap', page=2)

head(hm_1$matches)

##                                           hostnames    title
## 1                                                   HoneyMap
## 2                                  hb.c2hosting.com HoneyMap
## 3                                                   HoneyMap
## 4                                          fxxx.you HoneyMap
## 5            ip-192-169-234-171.ip.secureserver.net HoneyMap
## 6 ec2-54-148-80-241.us-west-2.compute.amazonaws.com HoneyMap
##                    timestamp                isp transport
## 1 2015-08-22T17:14:25.173291               <NA>       tcp
## 2 2015-08-22T17:00:12.872171 Hosting Consulting       tcp
## 3 2015-08-22T16:49:40.392523      Digital Ocean       tcp
## 4 2015-08-22T15:27:29.661104      KW Datacenter       tcp
## 5 2015-08-22T14:01:21.014893   GoDaddy.com, LLC       tcp
## 6 2015-08-22T12:01:52.207879             Amazon       tcp
##                                                                                                                                                                                                       data
## 1 HTTP/1.1 200 OK\r\nAccept-Ranges: bytes\r\nContent-Length: 2278\r\nContent-Type: text/html; charset=utf-8\r\nLast-Modified: Sun, 02 Nov 2014 21:16:17 GMT\r\nDate: Sat, 22 Aug 2015 17:14:22 GMT\r\n\r\n
## 2 HTTP/1.1 200 OK\r\nAccept-Ranges: bytes\r\nContent-Length: 2278\r\nContent-Type: text/html; charset=utf-8\r\nLast-Modified: Wed, 12 Nov 2014 18:52:21 GMT\r\nDate: Sat, 22 Aug 2015 17:01:25 GMT\r\n\r\n
## 3 HTTP/1.1 200 OK\r\nAccept-Ranges: bytes\r\nContent-Length: 2278\r\nContent-Type: text/html; charset=utf-8\r\nLast-Modified: Mon, 04 Aug 2014 18:07:00 GMT\r\nDate: Sat, 22 Aug 2015 16:49:38 GMT\r\n\r\n
## 4 HTTP/1.1 200 OK\r\nAccept-Ranges: bytes\r\nContent-Length: 2278\r\nContent-Type: text/html; charset=utf-8\r\nDate: Sat, 22 Aug 2015 15:22:23 GMT\r\nLast-Modified: Sun, 27 Jul 2014 01:04:41 GMT\r\n\r\n
## 5 HTTP/1.1 200 OK\r\nAccept-Ranges: bytes\r\nContent-Length: 2278\r\nContent-Type: text/html; charset=utf-8\r\nLast-Modified: Wed, 29 Oct 2014 17:12:22 GMT\r\nDate: Sat, 22 Aug 2015 14:01:20 GMT\r\n\r\n
## 6 HTTP/1.1 200 OK\r\nAccept-Ranges: bytes\r\nContent-Length: 1572\r\nContent-Type: text/html; charset=utf-8\r\nDate: Sat, 22 Aug 2015 12:06:15 GMT\r\nLast-Modified: Mon, 08 Dec 2014 21:25:26 GMT\r\n\r\n
##   port location.city location.region_code location.area_code location.longitude
## 1 3000          <NA>                 <NA>                 NA                 NA
## 2 3000   Miami Beach                   FL                305           -80.1300
## 3 3000 San Francisco                   CA                415          -122.3826
## 4 3000     Kitchener                   ON                 NA           -80.4800
## 5 3000    Scottsdale                   AZ                480          -111.8906
## 6 3000      Boardman                   OR                541          -119.5290
##   location.country_code3 location.latitude location.postal_code location.dma_code
## 1                   <NA>                NA                 <NA>                NA
## 2                    USA           25.7906                33109               528
## 3                    USA           37.7312                94124               807
## 4                    CAN           43.4236                  N2E                NA
## 5                    USA           33.6119                85260               753
## 6                    USA           45.7788                97818               810
##   location.country_code location.country_name                           ipv6
## 1                  <NA>                  <NA> 2600:3c02::f03c:91ff:fe73:4d8b
## 2                    US         United States                           <NA>
## 3                    US         United States                           <NA>
## 4                    CA                Canada                           <NA>
## 5                    US         United States                           <NA>
## 6                    US         United States                           <NA>
##            domains                org   os module                         ip_str
## 1                                <NA> <NA>   http 2600:3c02::f03c:91ff:fe73:4d8b
## 2    c2hosting.com Hosting Consulting <NA>   http                  199.88.60.245
## 3                       Digital Ocean <NA>   http                104.131.142.171
## 4         fxxx.you      KW Datacenter <NA>   http                  162.244.29.65
## 5 secureserver.net   GoDaddy.com, LLC <NA>   http                192.169.234.171
## 6    amazonaws.com             Amazon <NA>   http                  54.148.80.241
##           ip     asn link uptime
## 1         NA    <NA> <NA>     NA
## 2 3344448757 AS40539 <NA>     NA
## 3 1753452203    <NA> <NA>     NA
## 4 2733907265    <NA> <NA>     NA
## 5 3232361131 AS26496 <NA>     NA
## 6  915689713    <NA> <NA>     NA

Yikes! 141 servers just on the default port (3000) alone! While these systems may be shown as existing in Shodan, we really needed to confirm that they were, indeed, live MHN HoneyMap [websocket] servers.

Episode 3 : Picture [Im]Perfect

Rather than just test for existence of the websocket/data feed we decided to take a screenshot of every server, which is pretty easy to do with a crude-but-effective mashup of R and phantomjs. For this, we generated a script that calls, once for each of the server URLs, the “built-in” phantomjs rasterize.js script, which we’ve slightly modified to wait 30 seconds from page open to snapshot creation. We did that in the hopes that we’d see live attacks in the captures.

cat(sprintf("phantomjs rasterize.js http://%s:%s %s.png 800px*600px\n",
            hm_1$matches$ip_str,
            hm_1$matches$port,
            hm_1$matches$ip_str), file="capture.sh")

That makes capture.sh look something like:

phantomjs rasterize.js http://199.88.60.245:3000 199.88.60.245.png 800px*600px
phantomjs rasterize.js http://104.131.142.171:3000 104.131.142.171.png 800px*600px
phantomjs rasterize.js http://162.244.29.65:3000 162.244.29.65.png 800px*600px
phantomjs rasterize.js http://192.169.234.171:3000 192.169.234.171.png 800px*600px
phantomjs rasterize.js http://54.148.80.241:3000 54.148.80.241.png 800px*600px
phantomjs rasterize.js http://95.97.211.86:3000 95.97.211.86.png 800px*600px

Yes, there are far more elegant ways to do this, but the number of URLs was small and we had no time constraints. We could have used a pure phantomjs solution (list of URLs in phantomjs JavaScript) or used GNU parallel to speed up the image captures as well.
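For the curious, a parallel variant could be sketched in Python along these lines. It assumes `phantomjs` and the modified `rasterize.js` are available in the working directory. The function names, the worker count and the underscore-for-colon file naming are all hypothetical. It also brackets IPv6 literals so that an address like the first Shodan match yields a valid URL, which is our addition and not something the R one-liner above does.

```python
import ipaddress
import subprocess
from concurrent.futures import ThreadPoolExecutor


def capture_command(ip_str, port, size="800px*600px"):
    """Build the argument list for one phantomjs screenshot."""
    host = ip_str
    # IPv6 literals must be wrapped in brackets inside a URL.
    if ipaddress.ip_address(ip_str).version == 6:
        host = "[%s]" % ip_str
    url = "http://%s:%s" % (host, port)
    # Assumption: swap ':' for '_' so IPv6 addresses make tidy file names.
    outfile = ip_str.replace(":", "_") + ".png"
    return ["phantomjs", "rasterize.js", url, outfile, size]


def capture_all(targets, workers=4):
    """Run the captures concurrently and return the phantomjs exit codes."""
    commands = [capture_command(ip, port) for ip, port in targets]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Passing a list (no shell) keeps the '*' in the size from globbing.
        return list(pool.map(subprocess.call, commands))


if __name__ == "__main__":
    targets = [("199.88.60.245", 3000),
               ("2600:3c02::f03c:91ff:fe73:4d8b", 3000)]
    for ip, port in targets:
        # Print instead of running, so this is safe to try without phantomjs.
        print(" ".join(capture_command(ip, port)))
```

Threads are enough here because each worker just waits on an external process; `capture_all(targets)` would perform the actual captures.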

Sifting through ~140 images manually to see if any had “hits” would not have been too bad, but a glance at the directory listing showed that many had the exact same size, meaning those were probably showing a default/blank map. We uniq’d them by MD5 hash and made an image gallery of them:

It was interesting to see Mexico CERT and OpenDNS in the mix.
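The de-duplication step is simple enough to sketch in a few lines of Python. This is one way to do it, not necessarily how we did it at the time; the function names and the `*.png` pattern are assumptions for illustration.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def group_by_md5(directory, pattern="*.png"):
    """Map each MD5 digest to the sorted list of file names that share it."""
    groups = defaultdict(list)
    for path in sorted(Path(directory).glob(pattern)):
        digest = hashlib.md5(path.read_bytes()).hexdigest()
        groups[digest].append(path.name)
    return dict(groups)


def unique_captures(directory, pattern="*.png"):
    """Keep one representative file name per distinct image."""
    groups = group_by_md5(directory, pattern)
    return sorted(names[0] for names in groups.values())


if __name__ == "__main__":
    # Show how many captures collapse into each distinct image.
    for digest, names in group_by_md5(".").items():
        print(digest, len(names), names[0])
```

Byte-identical screenshots (the blank default maps) fall into one large group, leaving a much shorter list to eyeball. MD5 is fine for this purpose since we only care about accidental duplicates, not adversarial collisions.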

Most of the 141 were active/live MHN HoneyMap sites. We can only imagine what a full Shodan search for HoneyMaps on other ports would come back with (mostly since we only have the basic API access and don’t want to burn the credits).

Episode 4 : With “Meh” Data Comes Great Irresponsibility

Those who have not been with DDSec for its entirety may not be aware that we have our own attack map (github).

We thought it would be interesting to see if we could mash up MHN HoneyMap data with our creation. We first had to see what the websocket returned. Here’s a bit of Python to do that (the R websockets package was abandoned by its creator, but keep an eye out for another @hrbrmstr resurrection):

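import websocket  # provided by the websocket-client package


def on_message(ws, message):
    # each message is one JSON record describing a single honeypot hit
    print(message)


def on_error(ws, error):
    print(error)


def on_close(ws, *args):
    # newer websocket-client releases also pass a status code and message
    print("### closed ###")


# dump the handshake headers so we can see what the server expects
websocket.enableTrace(True)
ws = websocket.WebSocketApp("ws://128.199.121.95:3000/data/websocket",
                            on_message=on_message,
                            on_error=on_error,
                            on_close=on_close)
ws.run_forever()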

That particular server is very active, which is why we chose to use it.

The output should look something like:

$ python ws.py
--- request header ---
GET /data/websocket HTTP/1.1
Upgrade: websocket
Connection: Upgrade
Host: 128.199.121.95:3000
Origin: http://128.199.121.95:3000
Sec-WebSocket-Key: 07EFbUtTS4ubl2mmHS1ntQ==
Sec-WebSocket-Version: 13


-----------------------
--- response header ---
HTTP/1.1 101 Switching Protocols
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Accept: nvTKSyCh+k1Rl5HzxkVNAZjZZUA=
-----------------------
{"city":"Clarks Summit","city2":"San Francisco","countrycode":"US","countrycode2":"US","latitude":41.44860076904297,"latitude2":37.774898529052734,"longitude":-75.72799682617188,"longitude2":-122.41940307617188,"type":"p0f.events"}
{"city":"Clarks Summit","city2":"San Francisco","countrycode":"US","countrycode2":"US","latitude":41.44860076904297,"latitude2":37.774898529052734,"longitude":-75.72799682617188,"longitude2":-122.41940307617188,"type":"p0f.events"}
{"city":null,"city2":"Singapore","countrycode":"US","countrycode2":"SG","latitude":32.78310012817383,"latitude2":1.2930999994277954,"longitude":-96.80670166015625,"longitude2":103.85579681396484,"type":"p0f.events"}

Those are near-perfect JSON records for our map, so we figured out a way to tell iPew/PewPew (whatever folks are calling it these days) to take any accessible MHN HoneyMap as a live data source. For example, to plug this highly active HoneyMap into iPew all you need to do is this:

http://ocularwarfare.com/ipew/?mhnsource=http://128.199.121.95:3000/data/

Once we make the websockets component of the iPew map a bit more resilient we’ll post it to GitHub (you can just view the source to try it on your own now).
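To make the record layout concrete, here is a small Python sketch that splits one of the messages shown above into two endpoints. Treating the unsuffixed fields as the attack source and the `2`-suffixed fields as the honeypot side is our reading of the sample output rather than documented behaviour, and the `Endpoint` and `parse_event` names are hypothetical.

```python
import json
from typing import NamedTuple, Optional


class Endpoint(NamedTuple):
    city: Optional[str]
    countrycode: Optional[str]
    latitude: float
    longitude: float


def parse_event(message):
    """Split one HoneyMap JSON record into (type, source, destination)."""
    rec = json.loads(message)
    # Assumption: plain fields describe the source, '2' fields the target.
    source = Endpoint(rec.get("city"), rec.get("countrycode"),
                      rec["latitude"], rec["longitude"])
    dest = Endpoint(rec.get("city2"), rec.get("countrycode2"),
                    rec["latitude2"], rec["longitude2"])
    return rec.get("type"), source, dest


# Third record from the sample websocket output above.
sample = ('{"city":null,"city2":"Singapore","countrycode":"US",'
          '"countrycode2":"SG","latitude":32.78310012817383,'
          '"latitude2":1.2930999994277954,"longitude":-96.80670166015625,'
          '"longitude2":103.85579681396484,"type":"p0f.events"}')

kind, src, dst = parse_event(sample)
print(kind, src.countrycode, "->", dst.city)
```

Note that `city` can be null, so anything consuming these records should tolerate missing place names and fall back to the coordinates.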

Fin

As we stated up front, the main goal of this post is to introduce the mhn package. But, our diversion has us curious. Are the open instances of HoneyMap deliberate or accidental? If any of them are “real” honeypot research or actual production environments, does such an open presence of the MHN controller reduce the utility of the honeypot nodes? Is Greenland paying ThreatStream to use that map projection instead of a better one?

If you use the new package, found this post helpful (or, at least, amusing) or know the answers to any of those questions, drop a note in the comments.
