Trade Classification in R with PINstimation

[This article was first published on Stories by PINstimation on Medium, and kindly contributed to R-bloggers].

High-frequency data is now widely used in financial research because it captures minute price fluctuations and gives a detailed view of how financial markets behave. Analyzing it is challenging, however: current markets are largely dominated by high-frequency trading and generate a very large number of transactions. Typically, trades are first labeled as buyer- or seller-initiated with a trade classification algorithm and then aggregated into discrete intraday periods or daily data. In this article, we will explore how to classify and aggregate high-frequency data using the PINstimation package.

Trade classification algorithms

The PINstimation package implements four algorithms for trade classification¹: Tick, Quote, LR (Lee and Ready, 1991), and EMO (Ellis, Michaely, and O’Hara, 2000).
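
For readers unfamiliar with these rules, the sketch below shows how they are commonly stated in the trade classification literature (Lee and Ready 1991; Ellis et al. 2000). It is a minimal illustration under stated assumptions, not the package's implementation: `classify_sketch()` is a hypothetical helper written for this example, and unlike the package it compares each trade with its contemporaneous quotes and ignores any time lag.

```r
# Hypothetical helper (not part of PINstimation): returns TRUE for a buy,
# FALSE for a sell, and NA when the rule cannot classify the trade.
classify_sketch <- function(price, bid, ask,
                            rule = c("Tick", "Quote", "LR", "EMO")) {
  rule <- match.arg(rule)
  n <- length(price)

  # Tick rule: an uptick is a buy, a downtick is a sell; a zero tick
  # inherits the direction of the most recent price change.
  tick <- rep(NA, n)
  for (i in seq_len(n)[-1]) {
    change <- price[i] - price[i - 1]
    tick[i] <- if (change > 0) TRUE else if (change < 0) FALSE else tick[i - 1]
  }

  # Quote rule: above the bid-ask midpoint is a buy, below is a sell;
  # trades exactly at the midpoint stay unclassified.
  mid <- (bid + ask) / 2
  quote <- ifelse(price > mid, TRUE, ifelse(price < mid, FALSE, NA))

  switch(rule,
    Tick  = tick,
    Quote = quote,
    # LR: quote rule first, tick rule for trades at the midpoint
    LR    = ifelse(is.na(quote), tick, quote),
    # EMO: trades at the ask are buys, at the bid are sells, tick rule otherwise
    EMO   = ifelse(price == ask, TRUE, ifelse(price == bid, FALSE, tick))
  )
}

# Made-up prices and quotes, for illustration only
price <- c(100, 101, 101, 100.5, 102)
bid   <- c( 99, 100, 100, 100.5, 101)
ask   <- c(101, 101, 102, 101.5, 103)

classify_sketch(price, bid, ask, "Quote")
classify_sketch(price, bid, ask, "LR")
```

In this toy data the Quote rule leaves three trades unclassified because they occur exactly at the midpoint, while LR resolves two of them with the tick rule; the first trade has no earlier price to compare against, so it stays unclassified.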

Package functions

The package offers two functions that are specifically designed for the classification and aggregation of intraday trades:

★ classify_trades() classifies high-frequency trades using one of the aforementioned algorithms. In the examples below it is called with the data, the algorithm, a timelag, and verbose.

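★ aggregate_trades() classifies high-frequency trades using one of the aforementioned algorithms and aggregates them into buys and sells per period. It has two additional arguments, frequency and unit, which together set the length of the aggregation interval.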
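
Conceptually, the aggregation step amounts to counting buys and sells within each time bucket. The base-R sketch below illustrates the idea on a handful of made-up, already classified trades; the data frame, its values, and the bucketing approach are assumptions for illustration and do not reproduce the package's internals.

```r
# Toy classified trades (made-up values, for illustration only)
trades <- data.frame(
  timestamp = as.POSIXct(c("2018-10-18 09:01:00", "2018-10-18 09:07:30",
                           "2018-10-18 09:14:59", "2018-10-18 09:15:00",
                           "2018-10-18 09:29:10"), tz = "UTC"),
  isbuy = c(TRUE, FALSE, TRUE, TRUE, FALSE)
)

# Assign each trade to a 15-minute bucket by flooring its epoch time
bucket_seconds <- 15 * 60
bucket <- floor(as.numeric(trades$timestamp) / bucket_seconds)

# Count buys (b) and sells (s) in each bucket
agg <- data.frame(
  b = as.vector(tapply(trades$isbuy, bucket, sum)),
  s = as.vector(tapply(!trades$isbuy, bucket, sum))
)
agg
```

The first three trades fall in the 09:00 bucket (two buys, one sell) and the last two in the 09:15 bucket (one buy, one sell), which gives a two-row table in the same b/s layout that aggregate_trades() returns.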

Usage examples

We use hfdata, a dataset included in the package, as the raw data to classify and aggregate. It is a simulated dataset containing the timestamp, price, volume, bid, and ask for 100,000 high-frequency transactions.

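library(PINstimation)

# Load the sample data; classification only needs timestamp, price, bid, and ask
xdata <- hfdata
xdata$volume <- NULL

# Classify each trade with the EMO algorithm and a time lag of 500 milliseconds
ctrades <- classify_trades(xdata, algorithm = "EMO", timelag = 500,
                           verbose = TRUE)
head(ctrades, 2)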
##             timestamp   price     bid     ask isbuy
## 38 2018-10-18 00:13:10 15.4754 15.4568 15.4754  TRUE
## 49 2018-10-18 00:17:52 15.5143 15.5143 15.5236  TRUE
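# LR algorithm with a 1000-millisecond time lag, aggregated into 15-minute intervals
lrtrades <- aggregate_trades(xdata, algorithm = "LR", timelag = 1000,
                              frequency = "min", unit = 15, verbose = TRUE)

# Quote algorithm with the same time lag, aggregated into daily buys (b) and sells (s)
qtrades <- aggregate_trades(xdata, algorithm = "Quote", timelag = 1000,
                              frequency = "day", unit = 1, verbose = TRUE)
head(qtrades, 2)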
##    b   s
## 1 873 746
## 2 823 793
# Estimate the PIN model from the daily buys and sells
model <- pin_ea(qtrades)
show(model)
## ----------------------------------
## PIN estimation completed successfully
## ----------------------------------
## [...]
## ==========  ===========
## Variables   Estimates  
## ==========  ===========
## alpha       0.739135   
## delta       0          
## mu          247.46     
## eps.b       548.72     
## eps.s       717.26     
## ----                   
## Likelihood  (1232.001) 
## PIN         0.126237   
## ==========  ===========
## 
## -------
## Running time: 0.789 seconds
# The same estimates can be read from the slots of the fitted object
model@parameters   # alpha, delta, mu, eps.b, eps.s
model@pin          # probability of informed trading
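
As a quick sanity check, the PIN value can be recomputed by hand from the parameter estimates. The sketch below assumes the standard formula of the Easley et al. (1996) model, PIN = αμ / (αμ + ε_b + ε_s), and plugs in the rounded estimates printed by show(model) above; the variable names are chosen for this illustration.

```r
# Rounded estimates copied from the show(model) summary
alpha <- 0.739135   # probability of an information event
mu    <- 247.46     # arrival rate of informed trades
eps_b <- 548.72     # arrival rate of uninformed buys
eps_s <- 717.26     # arrival rate of uninformed sells

# Share of expected order flow that comes from informed traders
pin_by_hand <- (alpha * mu) / (alpha * mu + eps_b + eps_s)
round(pin_by_hand, 4)
```

The result agrees with the reported PIN of 0.126237 to about four decimal places; the small remaining difference comes from using rounded inputs.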

Conclusion

The PINstimation package makes it straightforward to classify and aggregate high-frequency data. With just a few lines of code, you can classify trades using any of four algorithms and a time lag of your choice, and aggregate them at the frequency you need, from intraday intervals to days. Moreover, the classification process is fast, which makes the package a good option for researchers working with large datasets. More information about the package and its functions can be found in the package documentation and on the dedicated website.

References

  1. Aktas O, Kryzanowski L (2014) Trade classification accuracy for the BIST. Journal of International Financial Markets, Institutions and Money 33:259–282
  2. Lee CM, Ready MJ (1991) Inferring trade direction from intraday data. The Journal of Finance 46(2):733–746
  3. Ellis K, Michaely R, O’Hara M (2000) The accuracy of trade classification rules: Evidence from NASDAQ. The Journal of Financial and Quantitative Analysis 35(4):529
  4. Easley D, Kiefer NM, O’Hara M, Paperman JB (1996) Liquidity, information, and infrequently traded stocks. The Journal of Finance 51(4):1405
  5. Ersan O, Alıcı A (2016) An unbiased computation methodology for estimating the probability of informed trading (PIN). Journal of International Financial Markets, Institutions and Money 43:74–94