Results for "q-fin.ST" - JURNALIN

arXiv Open Access 2024

HybridRAG: Integrating Knowledge Graphs and Vector Retrieval Augmented Generation for Efficient Information Extraction

Bhaskarjit Sarmah, Benika Hall, Rohan Rao et al.

Extracting and interpreting intricate information from unstructured text data arising in financial applications, such as earnings call transcripts, presents substantial challenges to large language models (LLMs), even when using current best practices for Retrieval Augmented Generation (RAG) (referred to as VectorRAG techniques, which utilize vector databases for information retrieval), due to challenges such as domain-specific terminology and complex document formats. We introduce a novel approach, called HybridRAG, that combines Knowledge Graph (KG) based RAG techniques (called GraphRAG) with VectorRAG techniques to enhance question-answer (Q&A) systems for information extraction from financial documents, and that is shown to be capable of generating accurate and contextually relevant answers. Using experiments on a set of financial earnings call transcripts, which come in Q&A format and hence provide a natural set of ground-truth Q&A pairs, we show that HybridRAG, which retrieves context from both the vector database and the KG, outperforms both traditional VectorRAG and GraphRAG individually when evaluated at both the retrieval and generation stages in terms of retrieval accuracy and answer generation. The proposed technique has applications beyond the financial domain.

en cs.CL, cs.LG


arXiv Open Access 2023

Towards reducing hallucination in extracting information from financial reports using Large Language Models

Bhaskarjit Sarmah, Tianjie Zhu, Dhagash Mehta et al.

For a financial analyst, the question and answer (Q&A) segment of a company's financial report is a crucial piece of information for various analyses and investment decisions. However, extracting valuable insights from the Q&A section has posed considerable challenges: conventional methods such as detailed reading and note-taking lack scalability and are susceptible to human error, while Optical Character Recognition (OCR) and similar techniques encounter difficulties in accurately processing unstructured transcript text, often missing subtle linguistic nuances that drive investor decisions. Here, we demonstrate the use of Large Language Models (LLMs) to efficiently and rapidly extract information from earnings report transcripts while ensuring high accuracy, transforming the extraction process and reducing hallucination by combining the retrieval-augmented generation technique with metadata. We evaluate the outcomes of various LLMs with and without our proposed approach using various objective metrics for evaluating Q&A systems, and empirically demonstrate the superiority of our method.

en cs.CL, q-fin.PM


arXiv Open Access 2020

Dynamic Portfolio Optimization with Real Datasets Using Quantum Processors and Quantum-Inspired Tensor Networks

Samuel Mugel, Carlos Kuchkovsky, Escolastico Sanchez et al.

In this paper we tackle the problem of dynamic portfolio optimization, i.e., determining the optimal trading trajectory for an investment portfolio of assets over a period of time, taking into account transaction costs and other possible constraints. This problem is central to quantitative finance. After a detailed introduction to the problem, we implement a number of quantum and quantum-inspired algorithms on different hardware platforms to solve its discrete formulation using real data from daily prices over 8 years of 52 assets, and do a detailed comparison of the obtained Sharpe ratios, profits and computing times. In particular, we implement classical solvers (Gekko, exhaustive), D-Wave Hybrid quantum annealing, two different approaches based on Variational Quantum Eigensolvers on IBM-Q (one of them brand-new and tailored to the problem), and for the first time in this context also a quantum-inspired optimizer based on Tensor Networks. In order to fit the data into each specific hardware platform, we also consider doing a preprocessing based on clustering of assets. From our comparison, we conclude that D-Wave Hybrid and Tensor Networks are able to handle the largest systems, where we do calculations up to 1272 fully-connected qubits for demonstrative purposes. Finally, we also discuss how to mathematically implement other possible real-life constraints, as well as several ideas to further improve the performance of the studied methods.

en quant-ph, cs.CE


arXiv Open Access 2019

Deep Reinforcement Learning for Foreign Exchange Trading

Yun-Cheng Tsai, Chun-Chieh Wang

Reinforcement learning can interact with the environment and is suitable for applications in decision control systems. Therefore, we used the reinforcement learning method to establish a foreign exchange transaction, avoiding the long-standing problem of unstable trends in deep learning predictions. In the system design, we optimized the Sure-Fire statistical arbitrage policy, set three different actions, encoded the continuous price over a period of time into a heat-map view of the Gramian Angular Field (GAF) and compared the Deep Q Learning (DQN) and Proximal Policy Optimization (PPO) algorithms. To test feasibility, we analyzed three currency pairs, namely EUR/USD, GBP/USD, and AUD/USD. We trained the data in units of four hours from 1 August 2018 to 30 November 2018 and tested model performance using data between 1 December 2018 and 31 December 2018. The test results of the various models indicated that favorable investment performance was achieved as long as the model was able to handle complex and random processes and the state was able to describe the environment, validating the feasibility of reinforcement learning in the development of trading strategies.

en cs.LG, q-fin.ST


arXiv Open Access 2019

Bitcoin Price Prediction: An ARIMA Approach

Amin Azari

Bitcoin is considered the most valuable currency in the world. Besides being highly valuable, its value has also experienced a steep increase, from around 1 dollar in 2010 to around 18000 in 2017. Consequently, in recent years, it has attracted considerable attention in a diverse set of fields, including economics and computer science. The former mainly focuses on studying how it affects the market, determining the reasons behind its price fluctuations, and predicting its future prices. The latter mainly focuses on its vulnerabilities, scalability, and other techno-crypto-economic issues. Here, we aim at revealing the usefulness of the traditional autoregressive integrated moving average (ARIMA) model in predicting the future value of bitcoin by analyzing the price time series over a 3-year-long time period. On the one hand, our empirical studies reveal that this simple scheme is efficient in sub-periods in which the behavior of the time series is almost unchanged, especially when it is used for short-term prediction, e.g. 1-day. On the other hand, when we try to train the ARIMA model on a 3-year-long period, during which the bitcoin price has experienced different behaviors, or when we try to use it for a long-term prediction, we observe that it introduces large prediction errors. In particular, the ARIMA model is unable to capture the sharp fluctuations in the price, e.g. the volatility at the end of 2017. This calls for more features to be extracted and used along with the price for a more accurate prediction of the price. We have further investigated the bitcoin price prediction using an ARIMA model, trained over a large dataset, and a limited test window of the bitcoin price, with length $w$, as inputs. Our study sheds light on the interaction of the prediction accuracy, choice of ($p,q,d$), and window size $w$.

en cs.SI, q-fin.ST


arXiv Open Access 2016

Inter-occurrence times and universal laws in finance, earthquakes and genomes

Constantino Tsallis

A plethora of natural, artificial and social systems exist which do not belong to the Boltzmann-Gibbs (BG) statistical-mechanical world, based on the standard additive entropy $S_{BG}$ and its associated exponential BG factor. Frequent behaviors in such complex systems have been shown to be closely related to $q$-statistics instead, based on the nonadditive entropy $S_q$ (with $S_1=S_{BG}$), and its associated $q$-exponential factor which generalizes the usual BG one. In fact, a wide range of phenomena of quite different nature exist which can be described and, in the simplest cases, understood through analytic (and explicit) functions and probability distributions which exhibit some universal features. Universality classes are concomitantly observed which can be characterized through indices such as $q$. We will exhibit here some such cases, namely concerning the distribution of inter-occurrence (or inter-event) times in the areas of finance, earthquakes and genomes.

en cond-mat.stat-mech, q-bio.GN


arXiv Open Access 2015

Detrended fluctuation analysis made flexible to detect range of cross-correlated fluctuations

Jaroslaw Kwapien, Pawel Oswiecimka, Stanislaw Drozdz

The detrended cross-correlation coefficient $ρ_{\rm DCCA}$ has recently been proposed to quantify the strength of cross-correlations on different temporal scales in bivariate, non-stationary time series. It is based on the detrended cross-correlation and detrended fluctuation analyses (DCCA and DFA, respectively) and can be viewed as an analogue of the Pearson coefficient in the case of fluctuation analysis. The coefficient $ρ_{\rm DCCA}$ works well in many practical situations, but by construction its applicability is limited to detecting whether two signals are generally cross-correlated, without the possibility of obtaining information on the amplitude of the fluctuations that are responsible for those cross-correlations. In order to introduce some related flexibility, here we propose an extension of $ρ_{\rm DCCA}$ that exploits the multifractal versions of DFA and DCCA: MFDFA and MFCCA, respectively. The resulting new coefficient $ρ_q$ is not only able to quantify the strength of correlations, but also allows one to identify the range of detrended fluctuation amplitudes that are correlated in the two signals under study. We show how the coefficient $ρ_q$ works in practical situations by applying it to stochastic time series representing processes with long memory: autoregressive and multiplicative ones. Such processes are often used to model signals recorded from complex systems and complex physical phenomena like turbulence, so we are convinced that this new measure can successfully be applied in time series analysis. In particular, we present an example of such an application to highly complex empirical data from financial markets. The present formulation can straightforwardly be extended to multivariate data in terms of the $q$-dependent counterpart of the correlation matrices and then to the network representation.

en physics.data-an, q-fin.ST


arXiv Open Access 2009

Quantitative features of multifractal subtleties in time series

Stanislaw Drozdz, Jaroslaw Kwapien, Pawel Oswiecimka et al.

Based on the Multifractal Detrended Fluctuation Analysis (MFDFA) and the Wavelet Transform Modulus Maxima (WTMM) methods, we investigate the origin of multifractality in time series. Series fluctuating according to a q-Gaussian distribution, both uncorrelated and correlated in time, are used. For the uncorrelated series at the border (q=5/3) between the Gaussian and the Levy basins of attraction, asymptotically we find a phase-like transition between monofractal and bifractal characteristics. This indicates that these may solely be the specific nonlinear temporal correlations that organize the series into a genuine multifractal hierarchy. For analyzing various features of multifractality due to such correlations, we use the model series generated from the binomial cascade as well as empirical series. Then, within the temporal ranges of well developed power-law correlations, we find a fast convergence in all multifractal measures. Besides its practical significance, this fact may reflect another manifestation of a conjectured q-generalized Central Limit Theorem.

en physics.data-an, nlin.CD


arXiv Open Access 2009

Modified detrended fluctuation analysis based on empirical mode decomposition

Xi-Yuan Qian, Wei-Xing Zhou, Gao-Feng Gu

Detrended fluctuation analysis (DFA) is a simple but very efficient method for investigating the power-law long-term correlations of non-stationary time series, in which a detrending step is necessary to obtain the local fluctuations at different timescales. We propose to determine the local trends through empirical mode decomposition (EMD) and perform the detrending operation by removing the EMD-based local trends, which gives an EMD-based DFA method. Similarly, we also propose a modified multifractal DFA algorithm, called an EMD-based MFDFA. The performance of the EMD-based DFA and MFDFA methods is assessed with extensive numerical experiments based on fractional Brownian motion and multiplicative cascading process. We find that the EMD-based DFA method performs better than the classic DFA method in the determination of the Hurst index when the time series is strongly anticorrelated and the EMD-based MFDFA method outperforms the traditional MFDFA method when the moment order $q$ of the detrended fluctuations is positive. We apply the EMD-based MFDFA to the one-minute data of Shanghai Stock Exchange Composite index, and the presence of multifractality is confirmed.

en cond-mat.stat-mech, q-fin.ST
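
The detrending step described in the abstract above can be illustrated with a minimal sketch of classic DFA. This is not the EMD-based method the authors propose: it assumes ordinary linear (order-1) detrending in non-overlapping windows, and the function names `dfa_fluctuation` and `dfa_exponent`, the window sizes, and the white-noise test series are all illustrative choices.

```python
import math
import random


def dfa_fluctuation(series, scale):
    """RMS fluctuation F(s) of the linearly detrended profile in windows of length `scale`."""
    mean = sum(series) / len(series)
    # Profile: cumulative sum of the mean-subtracted series.
    profile, total = [], 0.0
    for x in series:
        total += x - mean
        profile.append(total)

    n_windows = len(profile) // scale
    t = list(range(scale))
    t_mean = sum(t) / scale
    t_var = sum((ti - t_mean) ** 2 for ti in t)

    sq_sum = 0.0
    for w in range(n_windows):
        seg = profile[w * scale:(w + 1) * scale]
        y_mean = sum(seg) / scale
        # Least-squares slope of the local linear trend in this window.
        slope = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, seg)) / t_var
        # Accumulate squared residuals after removing the local trend.
        sq_sum += sum((yi - y_mean - slope * (ti - t_mean)) ** 2
                      for ti, yi in zip(t, seg))
    return math.sqrt(sq_sum / (n_windows * scale))


def dfa_exponent(series, scales):
    """Slope of log F(s) versus log s, estimated by ordinary least squares."""
    xs = [math.log(s) for s in scales]
    ys = [math.log(dfa_fluctuation(series, s)) for s in scales]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    num = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    den = sum((x - x_mean) ** 2 for x in xs)
    return num / den


# Illustrative input: uncorrelated Gaussian noise, for which the scaling
# exponent is expected to be close to 0.5.
random.seed(0)
noise = [random.gauss(0.0, 1.0) for _ in range(4096)]
alpha = dfa_exponent(noise, [16, 32, 64, 128, 256])
print(f"estimated DFA exponent: {alpha:.3f}")
```

Swapping the per-window linear fit for a different local trend estimate is the point where a modified detrending scheme, such as the EMD-based one in the abstract, would plug in.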


arXiv Open Access 2007

On a generalised model for time-dependent variance with long-term memory

Silvio M. Duarte Queiros

The ARCH process (R. F. Engle, 1982) constitutes a paradigmatic generator of stochastic time series with time-dependent variance, as it appears in a wide range of systems besides economics, in which ARCH was born. Although the ARCH process captures the so-called "volatility clustering" and the asymptotic power-law probability density distribution of the random variable, it is not capable of reproducing further statistical properties of many of these time series, such as the strong persistence of the instantaneous variance, characterised by large values of the Hurst exponent (H > 0.8), and the asymptotic power-law decay of the self-correlation function of the absolute values. By considering an effective return obtained from a correlation of past returns that has a q-exponential form, we are able to fix the limitations of the original model. Moreover, this improvement can be obtained through the correct choice of a sole additional parameter, $q_{m}$. The assessment of its validity and usefulness is made by mimicking daily fluctuations of the SP500 financial index.

en physics.data-an, q-fin.ST


arXiv Open Access 2007

A case study of speculative financial bubbles in the South African stock market 2003-2006

Wei-Xing Zhou, Didier Sornette

We tested 45 indices and common stocks traded in the South African stock market for the possible existence of a bubble over the period from Jan. 2003 to May 2006. A bubble is defined by a faster-than-exponential acceleration with significant log-periodic oscillations. The faster-than-exponential acceleration characteristics are tested with several different metrics, including nonlinearity on the logarithm of the price and power law fits. The log-periodic properties are investigated in detail using the first-order log-periodic power-law (LPPL) formula, the parametric detrending method, the $(H,q)$-analysis, and the second-order Weierstrass-type model, resulting in a consistent and robust estimation of the fundamental angular log-frequency $ω_1 =7\pm 2$, in reasonable agreement with previous estimations on many other bubbles in developed and developing markets. Sensitivity tests of the estimated critical times and of the angular log-frequency are performed by varying the first date and the last date of the stock price time series. These tests show that the estimated parameters are robust. With the insight of 6 additional months of data since the analysis was performed, we observe that many of the stocks on the South African market experienced an abrupt drop in mid-June 2006, which is compatible with the predicted $t_c$ for several of the stocks, but not all. This suggests that the mini-crash that occurred around mid-June of 2006 was only a partial correction, which has resumed into a renewed bubbly acceleration bound to end some time in 2007, similarly to what happened on the S&P500 US market from Oct. 1997 to Aug. 1998.

en physics.soc-ph, q-fin.ST


arXiv Open Access 2006

Liquidity and the multiscaling properties of the volume traded on the stock market

Zoltan Eisler, Janos Kertesz

We investigate the correlation properties of transaction data from the New York Stock Exchange. The trading activity f(t) of each stock displays a crossover from weaker to stronger correlations at time scales 60-390 minutes. In both regimes, the Hurst exponent H depends logarithmically on the liquidity of the stock, measured by the mean traded value per minute. All multiscaling exponents tau(q) display a similar liquidity dependence, which clearly indicates the lack of a universal form assumed by other studies. The origin of this behavior is both the long memory in the frequency and the size of consecutive transactions.

en physics.soc-ph, q-fin.ST


arXiv Open Access 2003

Multifractal Features in the Foreign Exchange and Stock Markets

Kyungsik Kim, Seong-Min Yoon

The multifractal behavior of tick data of prices is investigated in the Korean financial market. Using rescaled range analysis (R/S analysis), we show the multifractal nature of returns for the won-dollar exchange rate and the KOSPI. We also estimate the Hurst exponent and the generalized $q$th-order Hurst exponent in the universal multifractal framework. In particular, this financial market is a persistent process with long-run memory effects, and the estimated values of the Hurst exponents exhibit crossovers at characteristic time scales. It is found that the probability distribution of returns is consistent with a Lorentz distribution, significantly different from fat-tailed properties.

en cond-mat.stat-mech, q-fin.ST


arXiv Open Access 2006

Nonextensive statistical features of the Polish stock market fluctuations

R. Rak, S. Drozdz, J. Kwapien

The statistics of return distributions on various time scales constitute one of the most informative characteristics of financial dynamics. Here we present a systematic study of such characteristics for the Polish stock market index WIG20 over the period 04.01.1999 - 31.10.2005 for time lags ranging from one minute up to one hour. This market is commonly classified as emerging. Still, on the shortest time scales studied we find that the tails of the return distributions are consistent with the inverse cubic power-law, as identified previously for the majority of mature markets. Within the time scales studied, however, a quick and considerable departure from this law towards a Gaussian can be traced. Interestingly, all the forms of the distributions observed can be described by single $q$-Gaussians, which provide a satisfactory and at the same time compact representation of the distribution of return fluctuations over all magnitudes of their variation. The corresponding nonextensivity parameter $q$ is found to decrease systematically with increasing time scale.

en physics.data-an, astro-ph


arXiv Open Access 2005

Scaling and memory of intraday volatility return intervals in stock market

Fengzhong Wang, Kazuko Yamasaki, Shlomo Havlin et al.

We study the return interval $τ$ between price volatilities that are above a certain threshold $q$ for 31 intraday datasets, including the Standard & Poor's 500 index and the 30 stocks that form the Dow Jones Industrial index. For different thresholds $q$, the probability density function $P_q(τ)$ scales with the mean interval $\barτ$ as $P_q(τ)={\barτ}^{-1}f(τ/\barτ)$, similar to that found in daily volatilities. Since the intraday records have significantly more data points compared to the daily records, we could probe for much higher thresholds $q$ and still obtain good statistics. We find that the scaling function $f(x)$ is consistent for all 31 intraday datasets in various time resolutions, and the function is well approximated by the stretched exponential, $f(x)\sim e^{-a x^γ}$, with $γ=0.38\pm 0.05$ and $a=3.9\pm 0.5$, which indicates the existence of correlations. We analyze the conditional probability distribution $P_q(τ|τ_0)$ for $τ$ following a certain interval $τ_0$, and find that $P_q(τ|τ_0)$ depends on $τ_0$, which demonstrates memory in intraday return intervals. Also, we find that the mean conditional interval $\langle τ|τ_0\rangle$ increases with $τ_0$, consistent with the memory found for $P_q(τ|τ_0)$. Moreover, we find that return interval records have long-term correlations with correlation exponents similar to those of volatility records.

en physics.soc-ph, q-fin.ST
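
The return-interval construction in the abstract above can be made concrete with a short sketch. It assumes a plain list of volatility values sampled at equal time steps and a strict "above threshold" test; the helper names `return_intervals` and `scaled_intervals` and the toy data are hypothetical and are not taken from the paper.

```python
def return_intervals(volatility, q):
    """Intervals (in samples) between consecutive values strictly above threshold q."""
    hits = [i for i, v in enumerate(volatility) if v > q]
    return [b - a for a, b in zip(hits, hits[1:])]


def scaled_intervals(intervals):
    """Rescale intervals by their mean, giving the argument tau / mean(tau) of f(x)."""
    mean = sum(intervals) / len(intervals)
    return [t / mean for t in intervals], mean


# Toy volatility record (illustrative values only).
vol = [0.1, 0.9, 0.2, 0.3, 1.2, 0.1, 0.1, 0.1, 1.5, 2.0]
taus = return_intervals(vol, 0.8)      # threshold q = 0.8
scaled, tau_mean = scaled_intervals(taus)
print(taus, tau_mean)
```

A histogram of the rescaled values for several thresholds $q$ is what one would compare against a common scaling function $f(x)$; by construction the rescaled intervals have mean 1 for every threshold.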


arXiv Open Access 2005

Volatility, Persistence, and Survival in Financial Markets

M. Constantin, S. Das Sarma

We study the temporal fluctuations in time-dependent stock prices (both individual and composite) as a stochastic phenomenon using general techniques and methods of nonequilibrium statistical mechanics. In particular, we analyze stock price fluctuations as a non-Markovian stochastic process using the first-passage statistical concepts of persistence and survival. We report the results of empirical measurements of the normalized $q$-order correlation functions $f_q(t)$, survival probability $S(t)$, and persistence probability $P(t)$ for several stock market dynamical sets. We analyze both minute-to-minute and higher frequency stock market recordings (i.e., with the sampling time $δt$ of the order of days). We find that the fluctuating stock price is multifractal and the choice of $δt$ has no effect on the qualitative multifractal behavior displayed by the $1/q$-dependence of the generalized Hurst exponent $H_q$ associated with the power-law evolution of the correlation function $f_q(t)\sim t^{H_q}$. The probability $S(t)$ of the stock price remaining above the average up to time $t$ is very sensitive to the total measurement time $t_m$ and the sampling time. The probability $P(t)$ of the stock not returning to the initial value within an interval $t$ has a universal power-law behavior, $P(t)\sim t^{-θ}$, with a persistence exponent $θ$ close to 0.5 that agrees with the prediction $θ=1-H_2$. The empirical financial stocks also present an interesting feature found in turbulent fluids, the extended self-similarity.

en physics.soc-ph, cond-mat.stat-mech


arXiv Open Access 2002

Non-Parametric Analyses of Log-Periodic Precursors to Financial Crashes

Wei-Xing Zhou, Didier Sornette

We apply two non-parametric methods to test further the hypothesis that log-periodicity characterizes the detrended price trajectory of large financial indices prior to financial crashes or strong corrections. The analysis using the so-called (H,q)-derivative is applied to seven time series ending with the October 1987 crash, the October 1997 correction and the April 2000 crash of the Dow Jones Industrial Average (DJIA), the Standard & Poor 500 and Nasdaq indices. The Hilbert transform is applied to two detrended price time series in terms of the ln(t_c-t) variable, where t_c is the time of the crash. Taking all results together, we find strong evidence for a universal fundamental log-frequency $f = 1.02 \pm 0.05$ corresponding to the scaling ratio $λ= 2.67 \pm 0.12$. These values are in very good agreement with those obtained in past works with different parametric techniques.

en cond-mat.stat-mech, q-fin.ST


arXiv Open Access 2004

Multifractal Behavior of the Korean Stock-market Index KOSPI

Jae Woo Lee, Kyuoung Eun Lee, Per Arne Rikvold

We investigate multifractality in the Korean stock-market index KOSPI. The generalized $q$th order height-height correlation function shows multiscaling properties. There are two scaling regimes with a crossover time around $t_c =40$ min. We consider the original data sets and the modified data sets obtained by removing the daily jumps, which occur due to the difference between the closing index and the opening index. To clarify the origin of the multifractality, we also smooth the data through convolution with a Gaussian function. After convolution we observe that the multifractality disappears in the short-time scaling regime $t<t_c$, but remains in the long-time scaling regime $t>t_c$, regardless of whether or not the daily jumps are removed. We suggest that multifractality in the short-time scaling regime is caused by the local fluctuations of the stock index. But the multifractality in the long-time scaling regime appears to be due to the intrinsic trading properties, such as herding behavior, information outside the market, the long memory of the volatility, and the nonlinear dynamics of the stock market.

en nlin.CD, cond-mat.other


arXiv Open Access 2003

Nonextensive statistical mechanics and economics

Constantino Tsallis, Celia Anteneodo, Lisa Borland et al.

Ergodicity, that is to say, dynamics whose time averages coincide with ensemble averages, naturally leads to Boltzmann-Gibbs (BG) statistical mechanics, hence to standard thermodynamics. This formalism has been at the basis of an enormous success in describing, among others, the particular stationary state corresponding to thermal equilibrium. There are, however, vast classes of complex systems which accommodate quite badly, or even not at all, within the BG formalism. Such dynamical systems exhibit, in one way or another, nonergodic aspects. In order to be able to theoretically study at least some of these systems, a formalism was proposed 14 years ago, which is sometimes referred to as nonextensive statistical mechanics. We briefly introduce this formalism, its foundations and applications. Furthermore, we provide some bridging to important economic phenomena, such as option pricing, return and volume distributions observed in the financial markets, and the fascinating and ubiquitous concept of risk aversion. One may summarize the whole approach by saying that BG statistical mechanics is based on the entropy $S_{BG}=-k \sum_i p_i \ln p_i$, and typically provides exponential laws for describing stationary states and basic time-dependent phenomena, while nonextensive statistical mechanics is instead based on the entropic form $S_q=k(1-\sum_ip_i^q)/(q-1)$ (with $S_1=S_{BG}$), and typically provides, for the same type of description, (asymptotic) power laws.

en cond-mat.stat-mech, cs.CE
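
The two entropic forms quoted in the abstract above can be checked numerically with a small sketch. It assumes a discrete, normalized probability distribution and sets $k=1$ by default; the function name `tsallis_entropy` and the example distribution are illustrative choices only.

```python
import math


def tsallis_entropy(probs, q, k=1.0):
    """S_q = k * (1 - sum_i p_i**q) / (q - 1), with the BG entropy used at q = 1."""
    if any(p < 0 for p in probs) or abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probs must be a normalized probability distribution")
    if abs(q - 1.0) < 1e-12:
        # Boltzmann-Gibbs limit: S_BG = -k * sum_i p_i ln p_i
        return -k * sum(p * math.log(p) for p in probs if p > 0)
    return k * (1.0 - sum(p ** q for p in probs if p > 0)) / (q - 1.0)


probs = [0.5, 0.25, 0.25]               # illustrative distribution
s_bg = tsallis_entropy(probs, 1.0)      # Boltzmann-Gibbs entropy
s_near = tsallis_entropy(probs, 1.0 + 1e-6)
s_two = tsallis_entropy(probs, 2.0)
print(s_bg, s_near, s_two)
```

Evaluating $S_q$ just above $q=1$ and comparing it with $S_{BG}$ is a quick numerical check of the stated limit $S_1=S_{BG}$.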


arXiv Open Access 1999

"Nonlinear" covariance matrix and portfolio theory for non-Gaussian multivariate distributions

D. Sornette, P. Simonetti, J. V. Andersen

This paper offers a precise analytical characterization of the distribution of returns for a portfolio constituted of assets whose returns are described by an arbitrary joint multivariate distribution. To this end, we introduce a non-linear transformation that maps the returns onto Gaussian variables whose covariance matrix provides a new measure of dependence between the non-normal returns, generalizing the covariance matrix into a non-linear fractional covariance matrix. This nonlinear covariance matrix is chiseled to the specific fat tail structure of the underlying marginal distributions, thus ensuring stability and good conditioning. The portfolio distribution is obtained as the solution of a mapping to a so-called phi-q field theory in particle physics, of which we offer an extensive treatment using Feynman diagrammatic techniques and large deviation theory, which we illustrate in detail for multivariate Weibull distributions. The main result of our theory is that minimizing the portfolio variance (i.e. the relatively "small" risks) may often increase the large risks, as measured by higher normalized cumulants. Extensive empirical tests are presented on the foreign exchange market that validate the theory satisfactorily. For "fat tail" distributions, we show that an adequate prediction of the risks of a portfolio relies much more on the correct description of the tail structure than on their correlations.

en cond-mat.stat-mech, q-fin.ST
