Guoliang Li , Ge Yu , Jun Yang , Ju Fan
2022, 12(1):1-5. DOI: 10.21655/ijsi.1673-7288.00272
Abstract:Preface to Special Issue on New Technologies of Database Systems
Xiang Yu , Chengliang Chai , Xinning Zhang , Nan Tang , Ji Sun , Guoliang Li
2022, 12(1):7-29. DOI: 10.21655/ijsi.1673-7288.00275
Abstract:Recently, learned query optimizers typically driven by deep learning models have attracted wide attention as they can offer similar or even better performance than state-of-the-art commercial optimizers. A successful learning optimizer often relies on enough high-quality load queries as training data, and poor-quality training will lead to the query failure of learned query optimizers. In this paper, we propose a novel training framework AlphaQO for robust learned query optimizers based on Reinforcement Learning (RL), and the robustness of the optimizers can be improved by finding the bad queries in advance. AlphaQO is a loop system consisting of two main components, namely the query generator and the learned optimizer. A query generator aims at generating ``difficult'' queries (i.e., queries that the learned optimizer provides poor estimates). The learned optimizer will be trained using these generated queries, as well as providing feedback (in terms of numerical rewards) to the query generator for updates. If the generated queries are good, the query generator will get a high reward; otherwise, the query generator will get a low reward. The above process is performed iteratively, with the main goal that within a small budget, the learned optimizer can be trained and generalized well to a wide range of unseen queries. Extensive experiments show that AlphaQO can generate a relatively small number of queries and train a learned optimizer to outperform commercial optimizers. Moreover, learned optimizers require much fewer queries from AlphaQO than randomly generated queries for the quality training of the learned optimizer.
Xingda Wei , Fangming Lu , Rong Chen , Haibo Chen , Binyu Zang
2022, 12(1):31-53. DOI: 10.21655/ijsi.1673-7288.00274
Abstract:The emergency of Hardware Transactional Memory (HTM) has greatly boosted the transaction processing performance in in-memory databases. However, the group commit protocol, aiming at reducing the impact from slow storage devices, leads to high transaction commit latency. Non-Volatile Memory (NVM) opens opportunities for reducing transaction commit latency. However, HTM cannot cooperate with NVM together: flushing data to NVM will always cause HTM to abort. In this paper, we propose a technique called parity version to decouple the process of HTM execution and NVM write. Thus, the transactions can correctly and efficiently use NVM to reduce their commit latency with HTM. We have integrated this technique into DBX, a state-of-the-art HTM-based database, and propose DBXN: a low-latency and high-throughput in-memory transaction processing system. Evaluations using typical OLTP workloads including TPC-C show that it has 99% lower latency and 2.1 times higher throughput than DBX.
Hongyao Zhao , Zhanhao Zhao , Wanqing Yang , Wei Lu , Haixiang Li , Xiaoyong Du
2022, 12(1):55-88. DOI: 10.21655/ijsi.1673-7288.00276
Abstract:The concurrency control algorithm is a key approach for a database system to guarantee the correctness and efficiency of the transaction execution. Thus, substantial effort has been devoted to proposing new concurrency control algorithms in both the database industry and academia. In this paper, we take the lead in summarizing the fundamental ideas of concurrency control algorithms as ``ordering-and-verifying''. We then redescribe and sort out the existing concurrency control algorithms following the ordering-and-verifying paradigm. On the basis of extensive comparative experiments on an open-source main-memory distributed transaction testbed called 3TS, we systematically investigate the advantages and disadvantages of the mainstream concurrency control algorithms and finally summarize the preferable application scenario for each algorithm to provide valuable references for follow-up research on concurrency control algorithms used in main-memory databases.
Xu Zhou , Tongfeng Weng , Zhibang Yang , Boren Li , Ji Zhang , Kenli Li
2022, 12(1):89-105. DOI: 10.21655/ijsi.1673-7288.00277
Abstract:Tip decomposition has a pivotal role in mining cohesive subgraphs in bipartite graphs. It is a popular research topic with wide applications in document clustering, spam group detection, and analysis of affiliation networks. With the explosive growth of the bipartite graph data scale in these scenarios, it is necessary to use distributed methods to realize its effective storage. For this reason, this paper studies the problem of the tip decomposition on a bipartite graph in the distributed environment for the first time. Firstly, a new relay-based communication mode is proposed to realize effective message transmission when the given bipartite graph is decomposed in a distributed environment. Secondly, the Distributed Butterfly Counting (DBC) algorithm and the Distributed Tip Decomposition (DTD) algorithm are designed. In particular, a controllable parallel vertex activation strategy is proposed to solve the problem of memory overflow when DBC decomposes large-scale bipartite graphs. Finally, the message pruning strategy based on vertex priority and message validity pruning strategy are introduced to further improve the efficiency of the algorithm by reducing redundant communication and computing overhead. The experiment is deployed on the high-performance distributed computing platform of the National Supercomputing Center. The effectiveness and efficiency of the proposed algorithms are verified by experiments on several real datasets.
Shuyuan Li , Yudian Ji , Dingyuan Shi , Wangdong Liao , Lipeng Zhang , Yongxin Tong , Ke Xu
2022, 12(1):107-129. DOI: 10.21655/ijsi.1673-7288.00273
Abstract:In the era of big data, data is of great value as an essential factor in production. It is of great significance to implement its analysis, mining, and utilization of large-scale data via data sharing. However, due to the heterogeneous dispersion of data and increasingly rigorous privacy protection regulations, data owners cannot arbitrarily share data, and thus data owners are turned into data silos. Since data federation can achieve collaborative queries while preserving the privacy of data silos, we present in this paper a secure multi-party relational data federation system based on the idea of federated computation that ``data stays, computation moves.'' The system is compatible with a variety of relational databases and can shield users from the heterogeneity of the underlying data from multiple data owners. On the basis of secret sharing, the system implements the secure multi-party operator library supporting the secure multi-party basic operations, and the resulting reconstruction process of operators is optimized with higher execution efficiency. On this basis, the system supports query operations such as Summation (SUM), Averaging (AVG), Minimization/Maximization (MIN/MAX), equi-join, and $\theta $-join and makes full use of multi-party features to reduce data interactions among data owners and security overhead, thus effectively supporting efficient data sharing. Finally, experiments are conducted on the benchmark dataset TPC-H. The experimental results show that the system can support more data owners than the current data federation systems SMCQL and Conclave and has higher execution efficiency in a variety of query operations, exceeding the existing systems by as much as 3.75 times.
Mao Luo , Chumin Li , Xinyun Wu , Shuolin Li , Zhipeng Lv
2022, 12(1):131-151. DOI: 10.21655/ijsi.1673-7288.00278
Abstract:The two most effective branching strategies LRB and VSIDS perform differently on different types of instances. Generally, LRB is more effective on crafted instances, while VSIDS is more effective on application ones. However, distinguishing the types of instances is difficult. To overcome this drawback, we propose a branching strategy selection approach based on the vivification ratio. This approach uses the LRB branching strategy more to solve the instances with a very low vivification ratio. We tested the instances from the main track of SAT competitions in recent years. The results show that the proposed approach is robust and it significantly increases the number of solved instances. It is worth mentioning that, with the help of our approach, the solver Maple_CM can solve additional 16 instances for the benchmark from the 2020 SAT competition.