PosDB in 2023: State of Affairs
George Chernishev
Saint-Petersburg State University
PosDB is a columnar DBMS with SQL support designed for analytic workloads. Unlike many
classic columnar systems, PosDB features fluent data position handling, enabling the query
engine to explicitly use positions during query evaluation. This makes available a number of
optimizations and techniques that offer various benefits for query processing.
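To make the idea of operating on positions concrete, here is a minimal sketch of position-based late materialization. It is only an illustration under stated assumptions, not PosDB code: the in-memory table layout and the helper names (filter_positions, intersect_positions, materialize) are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical in-memory columnar table: column name -> list of values.
Table = Dict[str, List[int]]


def filter_positions(column: List[int], predicate: Callable[[int], bool]) -> List[int]:
    """Scan a single column and return the positions (row indices) that match."""
    return [pos for pos, value in enumerate(column) if predicate(value)]


def intersect_positions(left: List[int], right: List[int]) -> List[int]:
    """Intersect two ascending position lists with a merge-style pass."""
    result: List[int] = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] == right[j]:
            result.append(left[i])
            i += 1
            j += 1
        elif left[i] < right[j]:
            i += 1
        else:
            j += 1
    return result


def materialize(table: Table, columns: List[str], positions: List[int]) -> List[Tuple[int, ...]]:
    """Fetch actual values only for the surviving positions."""
    return [tuple(table[name][pos] for name in columns) for pos in positions]


# Toy data, made up for this example.
lineorder: Table = {
    "quantity": [10, 30, 5, 40, 25],
    "discount": [1, 3, 2, 3, 1],
    "revenue": [100, 900, 50, 1600, 250],
}

# Each predicate touches only its own column and yields positions, not tuples.
qty_pos = filter_positions(lineorder["quantity"], lambda q: q >= 25)
disc_pos = filter_positions(lineorder["discount"], lambda d: d == 3)
matching = intersect_positions(qty_pos, disc_pos)

# The "revenue" column is read only at the very end, for the matching rows.
rows = materialize(lineorder, ["revenue"], matching)
```

The point of the sketch is that the operators exchange positions, and values of the output column are read as late as possible.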
Five years ago we presented our system at this seminar and discussed its internals (see meeting 202).
Today we will describe what we have achieved over these five years and, more importantly,
how our vision has evolved.
Since then:
1) We have formulated three query evaluation strategies for our original query processing
model and benchmarked them against PostgreSQL and MariaDB ColumnStore.
Experiments with the Star Schema Benchmark demonstrated that one of our strategies,
ultra-late materialization, outperforms these systems by a factor of 2-4.
2) We have analyzed these strategies and their properties. The analysis demonstrated that
ultra-late materialization has its drawbacks, one of them being out-of-order disk probing. To
counter it, we devised and implemented a hybrid query processing model, which improved
on the ultra-late materialization results even further.
3) We have explored a number of query processing aspects, namely: window function
computation, intermediate result caching, data compression, external sort, and recursive query
processing. The positive results achieved in each of these cases stem from the query engine's
ability to operate on positions.
4) We have made several technical improvements: we implemented a disk subsystem with a
proper buffer manager, as well as distributed join and aggregation operators with data
repartitioning. We moved from a columnar catalogue to a tabular one and developed a
parser and a simple plan generator, which allowed us to create a demo web site, linked
below. All this brings PosDB closer to industrial usage.
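The out-of-order disk probing mentioned in item 2 can be illustrated with a small sketch. It shows only the problem, not the hybrid query processing model: the block size, the one-block cache, and the function name count_block_loads are assumptions made for this example.

```python
from typing import List, Optional


def count_block_loads(positions: List[int], block_size: int) -> int:
    """Count block loads when only the most recently read block stays cached."""
    loads = 0
    current_block: Optional[int] = None
    for pos in positions:
        block = pos // block_size
        if block != current_block:
            loads += 1
            current_block = block
    return loads


# Positions as they might leave a join: ordered by the other input, not by
# this table's own layout (made-up data).
join_output_positions = [9, 1, 10, 2, 8, 0]
block_size = 4

# Probing the column in join order jumps back and forth between blocks.
unordered_loads = count_block_loads(join_output_positions, block_size)

# Probing the same positions in ascending order touches each block once.
ordered_loads = count_block_loads(sorted(join_output_positions), block_size)
```

With this toy input, probing in join order loads a block for every position, while probing in sorted order loads each of the two touched blocks once.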
All these points will be covered during the talk. For each, we will give an overview of the paper in which the
result is discussed in detail and highlight its limitations and its place within the state of the art.
Finally, we will describe what is in the works right now and present our future plans.
You can try an interactive demo of PosDB here: https://pos-db.com/
Talk slides
Talk video.
References:
- Abadi, D., Boncz, P., Harizopoulos, S.: The design and implementation of modern column-oriented database systems. Now Publishers Inc., Hanover, MA, USA (2013)
- Harizopoulos, S., Abadi, D., Boncz, P.: Column-oriented database systems, VLDB 2009 tutorial slides. (2009), nms.csail.mit.edu/~stavros/pubs/tutorial2009-column_stores.pdf
- Stonebraker, M., Abadi, D.J., Batkin, A., Chen, X., Cherniack, M., Ferreira, M., Lau, E., Lin, A., Madden, S., O’Neil, E., O’Neil, P., Rasin, A., Tran, N., Zdonik, S.: C-Store: A column-oriented dbms. In: Proceedings of the 31st International Conference on Very Large Data Bases. pp. 553–564. VLDB ’05, VLDB Endowment (2005)
- Abadi, D.J., Myers, D.S., DeWitt, D.J., Madden, S.R.: Materialization strategies in a column-oriented dbms. In: 2007 IEEE 23rd International Conference on Data Engineering. pp. 466–475 (2007).
- Idreos, S., Groffen, F., Nes, N., Manegold, S., Mullender, K.S., Kersten, M.L.: MonetDB: Two decades of research in column-oriented database architectures. IEEE Data Eng. Bull. 35(1), 40–45 (2012), http://sites.computer.org/debull/A12mar/monetdb.pdf
- Shrinivas, L., Bodagala, S., Varadarajan, R., Cary, A., Bharathan, V., Bear, C.: Materialization strategies in the vertica analytic database: Lessons learned. In: 2013 IEEE 29th International Conference on Data Engineering (ICDE). pp. 1196–1207 (2013).
- Chernishev, G.A., Galaktionov, V., Grigorev, V.D., Klyuchikov, E., Smirnov, K.: PosDB: An architecture overview. Program. Comput. Softw. 44(1), 62–74 (2018).
- Mukhaleva, N., Grigorev, V.D., Chernishev, G.A.: Implementing window functions in a column-store with late materialization. In: Schewe, K., Singh, N.K. (eds.) Model and Data Engineering - 9th International Conference, MEDI 2019, Toulouse, France, October 28-31, 2019, Proceedings. Lecture Notes in Computer Science, vol. 11815, pp. 303–313. Springer (2019).
- Galaktionov, V., Klyuchikov, E., Chernishev, G.A.: Position caching in a column-store with late materialization: An initial study. In: Song, I., Hose, K., Romero, O. (eds.) Proceedings of the 22nd International Workshop on Design, Optimization, Languages and Analytical Processing of Big Data co-located with EDBT/ICDT 2020 Joint Conference, DOLAP@EDBT/ICDT 2020, Copenhagen, Denmark, March 30, 2020. CEUR Workshop Proceedings, vol. 2572, pp. 89–93. CEUR-WS.org (2020)
- Slesarev, A., Klyuchikov, E., Smirnov, K., Chernishev, G.A.: Revisiting data compression in column-stores. In: Attiogbe, J.C., Yahia, S.B. (eds.) Model and Data Engineering - 10th International Conference, MEDI 2021, Tallinn, Estonia, June 21-23, 2021, Proceedings. Lecture Notes in Computer Science, vol. 12732, pp. 279–292. Springer (2021).
- Polyntsov, M., Grigorev, V., Smirnov, K., Chernishev, G.: Implementing the comparison-based external sort. In: Chiusano, S., Cerquitelli, T., Wrembel, R., Norvag, K., Catania, B., Vargas-Solar, G., Zumpano, E. (eds.) New Trends in Database and Information Systems. pp. 500–511. Springer International Publishing, Cham (2022)
- Chernishev, G.A., Galaktionov, V., Grigorev, V.V., Klyuchikov, E., Smirnov, K.: A comprehensive study of late materialization strategies for a disk-based column-store. In: Stefanidis, K., Golab, L. (eds.) Proceedings of the 24th International Workshop on Design, Optimization, Languages and Analytical Processing of Big Data (DOLAP) co-located with EDBT/ICDT’22, Edinburgh, UK, March 29, 2022. CEUR Workshop Proceedings, vol. 3130, pp. 21–30. CEUR-WS.org (2022)