A data geek, an AI guy, and a fintech dude go into a bar…

[This article was first published on r-bloggers – STATWORX, and kindly contributed to R-bloggers].

… some water under the bridge later, we are having a co-meetup in Frankfurt – kudos to the organizers. Those guys are just awesome. For the past few years they have been working to build a data science community in Frankfurt – you should check out their Twitter feed. Whenever there is a Meetup – which you should totally check out, by the way – we typically sponsor some beers and snacks to support the effort those guys are making.

On the 24th of January, though, the guys planned something special: a Meetup for data scientists, AI developers, and FinTech guys. What can I say, it was lit. The organizers really outdid themselves. We all met at the Frankfurt School of Finance campus. Besides the massive conference room for the talks, we and other companies had little pop-up displays to show interested people what we do all day long.

Not only the location but also the speakers were remarkable. Apart from us – I will get to this in a second – Jonathan Masci, one of the founders of NNAISENSE, talked about their research on deep learning (@Jonathan: I was really hyped by your talk, by the way). Yassin Hankir bombarded us with memes while giving a talk about his company savedroid, which you should really check out if you are broke all the time because you occasionally forget to put money aside.

As I said before, we also gave a talk. Led by our fierce CEO Sebastian, Fabian and I presented one of our newest creations. With only a couple of weeks of preparation, the entire team got together – sometimes with a couple of beers – and we built a really cool and handy tool. Let me tell you about it:

If you have ever had the joy of working on a machine learning forecasting project, you pretty much know the workflow. At first you are all excited to finally get the data so you can start hacking some models together. Well, it usually does not go like that. You pretty much spend your first days cleaning the data and forcing it into a somewhat machine-interpretable shape. After some exhausting, frustrating, and caffeine-intensive hours you are finally done and your dataset is recognizable as such. Next, you start thinking about features (variables) that could potentially influence your target. Our world is quite complex – statistically speaking – so there is a nearly infinite number of variables that could explain some part of the variation in your target. So you go on and select some of them. You might use some statistical selection method, rely on a logical explanation for why a feature matters, or just pick at random. Once you have found the formula you were looking for, you will likely start testing various algorithms to see which one predicts your data best. You will probably use some elaborate train-test split with an elaborate cross-validation scheme for model tuning. In the end, you evaluate your models and select the one that fits your data best. Now you are pretty much good to go and you can start forecasting.
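To make the last steps of that workflow concrete, here is a minimal sketch of evaluating a few candidate forecasters with a rolling-origin (walk-forward) scheme and picking the one with the lowest error. Everything in it is an assumption made for illustration: the three toy forecasters, the function names, the mean absolute error metric, and the toy series are not taken from any real project.

```python
from statistics import mean


def naive_forecast(history):
    """Predict the next value as the last observed value."""
    return history[-1]


def mean_forecast(history, window=3):
    """Predict the next value as the mean of the last `window` values."""
    return mean(history[-window:])


def drift_forecast(history):
    """Extend the average historical step by one period."""
    if len(history) < 2:
        return history[-1]
    slope = (history[-1] - history[0]) / (len(history) - 1)
    return history[-1] + slope


def rolling_origin_mae(series, forecaster, min_train=4):
    """Mean absolute error of one-step-ahead forecasts.

    The training window grows by one observation per step, so every
    forecast only sees data from before the point it predicts.
    """
    errors = []
    for t in range(min_train, len(series)):
        prediction = forecaster(series[:t])
        errors.append(abs(series[t] - prediction))
    return mean(errors)


def select_model(series, candidates, min_train=4):
    """Score every candidate and return the best name plus all scores."""
    scores = {
        name: rolling_origin_mae(series, forecaster, min_train)
        for name, forecaster in candidates.items()
    }
    best = min(scores, key=scores.get)
    return best, scores


# Hypothetical candidate set; a real project would plug in real models.
CANDIDATES = {
    "naive": naive_forecast,
    "moving_mean": mean_forecast,
    "drift": drift_forecast,
}

toy_series = [10, 12, 14, 16, 18, 20, 22, 24]  # made-up, perfectly linear
best_name, scores = select_model(toy_series, CANDIDATES)
print(best_name, scores)
```

On this perfectly linear toy series the drift forecaster wins, since it extrapolates the constant step exactly. The important design choice is that the split respects time order: a random train-test split would leak future values into training.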

This, of course, is tedious work. Since we go through pretty much this entire workflow in every single project we work on, we came up with a handy automation for it. For now we have agreed – not unanimously, though – on the name TSBOX. TSBOX takes a time series and automatically produces a forecast. Here is how it works.

We pretty much follow the logic I described before. However, with a bunch of code we can delegate those tedious tasks to our machines. Like every data scientist would, our program preprocesses the data first. The exhaustive search for potential outliers, NAs, or other weird things going on is taken care of by our artificial data scientist, so to speak. Then comes the task that is usually even more coding-intensive – feature generation. That is taken care of just as well. Selection of features – no problem. Choosing the algorithm that best fits your data – once again, our program is stealing at least my job.
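The post does not show how TSBOX does any of this, so the following is only a rough sketch of what automated preprocessing and feature generation could look like. The forward fill for missing values, the median-based outlier rule, the lag features, and all function names are assumptions made for illustration, not the TSBOX implementation.

```python
from statistics import median


def fill_missing(series):
    """Forward-fill None values; leading gaps take the first observed value."""
    observed = [x for x in series if x is not None]
    if not observed:
        raise ValueError("series contains no observed values")
    filled, last = [], observed[0]
    for x in series:
        if x is not None:
            last = x
        filled.append(last)
    return filled


def clip_outliers(series, threshold=5.0):
    """Replace points far from the median with the median.

    Distance is measured in units of the median absolute deviation (MAD).
    The threshold of 5 is an arbitrary illustrative choice.
    """
    center = median(series)
    mad = median(abs(x - center) for x in series)
    if mad == 0:
        return list(series)  # no spread to measure against; leave as-is
    return [center if abs(x - center) / mad > threshold else x for x in series]


def make_lag_features(series, n_lags=2):
    """Build (features, target) rows from the previous `n_lags` values."""
    rows = []
    for t in range(n_lags, len(series)):
        rows.append((series[t - n_lags:t], series[t]))
    return rows


# Made-up series with one missing value and one obvious outlier.
raw = [10, 11, None, 12, 500, 13, 14]
clean = clip_outliers(fill_missing(raw))
rows = make_lag_features(clean, n_lags=2)
print(clean)
print(rows[0])
```

Here the missing value is filled with the preceding 11 and the 500 is replaced by the series median of 12. Each resulting row pairs the two previous observations with the value to predict, which is the shape most regression-style learners expect.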

We are still developing our prototype. What you saw at the Data Science Meetup was our first demo, which we hacked together in merely two weeks. With a little more time, we are pretty confident we can come up with something really cool. So let's hope it works. If it does, it's going to be lit.

About the author

Lukas Strömsdörfer

Lukas is a member of the Data Science team and is currently pursuing an external doctorate at the University of Göttingen. In his free time, he is a passionate cyclist and enjoys watching TV series.

The post A data geek, an AI guy, and a fintech dude go into a bar… first appeared on STATWORX.
