Quickstart
==========

This is a short introduction to FeatherStore's basic functionality. For a complete guide to
FeatherStore's classes, functions, and methods, please visit the API reference.

Installation
------------

To install FeatherStore, use pip:

.. code-block::

    pip install featherstore

Starting Up
-----------

.. code-block:: python

    import featherstore as fs

To create and connect to a new database, use:

.. code-block:: python

    fs.create_database('/path/to/database_folder')
    fs.connect('/path/to/database_folder')

You can later disconnect from the database by using ``fs.disconnect()``.

Working with Stores
-------------------

A database consists of one or more stores. A store is the basic unit of organization and
the place where your tables are kept.

.. code-block:: python

    fs.create_store('store_1')
    fs.create_store('store_2')
    fs.list_stores()

    >> ['store_1', 'store_2']

.. code-block:: python

    fs.drop_store('store_2')
    fs.rename_store('store_1', 'example_store')

    # Connect to store
    store = fs.Store('example_store')

Reading and Writing Tables
--------------------------

FeatherStore supports reading and writing of Pandas DataFrames and Series, Polars DataFrames
and PyArrow Tables. First, let's create a DataFrame to store.

.. code-block:: python

    import pandas as pd
    from numpy.random import randn

    dates = pd.date_range("2021-01-01", periods=5)
    df = pd.DataFrame(randn(5, 4), index=dates, columns=list("ABCD"))
    df

    >>                 A         B         C         D
    2021-01-01  0.402138 -0.016436 -0.565256  0.520086
    2021-01-02 -1.071026 -0.326358 -0.692681  1.188319
    2021-01-03  0.777777 -0.665146  1.017527 -0.064830
    2021-01-04 -0.835711 -0.575801 -0.650543 -0.411509
    2021-01-05 -0.649335 -0.830602  1.191749  0.396745

FeatherStore stores tables as partitioned Feather files. The size of each partition is set
with the ``partition_size`` parameter when writing a table.

.. code-block:: python

    PARTITION_SIZE = 128  # bytes
    store.write_table('example_table', df, partition_size=PARTITION_SIZE)
    store.list_tables()

    >> ['example_table']

The advantage of using partitioned Feather files is that you can perform many operations
without loading the full data.

.. code-block:: python

    # Creating a new DataFrame
    new_dates = pd.date_range("2021-01-06", periods=1)
    df1 = pd.DataFrame(randn(1, 4), index=new_dates, columns=list("ABCD"))

    # Appending to a FeatherStore table only loads in the last partition
    store.append_table('example_table', df1)

FeatherStore uses sorted indices to keep track of which partitions to open during a given
operation.

We can now read the stored data as a Pandas DataFrame, a Polars DataFrame or a PyArrow Table.

.. code-block:: python

    store.read_pandas('example_table')
    # store.read_arrow('example_table') for reading to Arrow Tables
    # store.read_polars('example_table') for reading to Polars DataFrames

    >>                 A         B         C         D
    2021-01-01  0.402138 -0.016436 -0.565256  0.520086
    2021-01-02 -1.071026 -0.326358 -0.692681  1.188319
    2021-01-03  0.777777 -0.665146  1.017527 -0.064830
    2021-01-04 -0.835711 -0.575801 -0.650543 -0.411509
    2021-01-05 -0.649335 -0.830602  1.191749  0.396745
    2021-01-06 -0.408125 -0.420920  0.632606  0.606950

We can also query parts of the data. FeatherStore uses predicate filtering to load only the
partitions and columns specified by the query. Because the indices are sorted, FeatherStore
supports range queries on rows with ``{'before': end}``, ``{'after': start}`` and
``{'between': [start, end]}``.

.. code-block:: python

    store.read_pandas('example_table', rows={'after': '2021-01-05'}, cols=['D', 'A'])
    # All range queries are inclusive

    >>                 D         A
    2021-01-05  0.396745 -0.649335
    2021-01-06  0.606950 -0.408125

Inserting, Updating and Deleting Data
-------------------------------------

First, let's create a new table to work with:

.. code-block:: python

    index = [1, 3, 5, 6]
    df = pd.DataFrame(randn(4, 2), index=index, columns=list("AB"))
    df

    >>        A         B
    1 -0.041727  0.957139
    3 -0.272294 -1.758717
    5 -0.353684  1.550073
    6  1.275938  1.054702

We can use ``Store.select_table()`` to select a ``Table`` object, which offers more features
for working with tables.

.. code-block:: python

    table = store.select_table('example_table2')
    table.exists  # False
    table.write(df)
    table.exists

    >> True

One of those features is ``Table.insert()``, which adds extra rows to the table.

.. note::
    You can use ``Table.add_columns()`` to add extra columns.

.. code-block:: python

    df2 = pd.DataFrame(randn(2, 2), index=[4, 2], columns=list("AB"))
    table.insert(df2)  # Must have the same index and col dtypes as the stored df
    table.read_pandas()
    # The data will be inserted into its sorted index position

    >>        A         B
    1 -0.041727  0.957139
    2  2.163615 -0.708871
    3 -0.272294 -1.758717
    4 -1.263981 -0.961670
    5 -0.353684  1.550073
    6  1.275938  1.054702

Other features include ``Table.update()`` and ``Table.drop()``, which update and delete data.

.. code-block:: python

    df3 = pd.DataFrame([[0, 1], [2, 3]], index=[1, 2], columns=list("AB"))
    #    A  B
    # 1  0  1
    # 2  2  3
    table.update(df3)
    table.drop(rows={'after': 5})
    # You can also drop columns using table.drop(cols=['col1', 'col2'])
    table.read_pandas()

    >>        A         B
    1  0.000000  1.000000
    2  2.000000  3.000000
    3 -0.272294 -1.758717
    4 -1.263981 -0.961670
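
To build some intuition for why sorted indices let range queries and appends skip most
partitions, here is a minimal, library-free sketch. It is not FeatherStore's actual
implementation. The helper names ``build_partitions`` and ``partitions_for_range`` are
hypothetical, and partitions are sized by row count rather than by bytes to keep the
example short.

.. code-block:: python

    from bisect import bisect_left, bisect_right


    def build_partitions(index, rows_per_partition):
        """Split an already sorted index into consecutive chunks (hypothetical helper)."""
        return [index[i:i + rows_per_partition]
                for i in range(0, len(index), rows_per_partition)]


    def partitions_for_range(partitions, start=None, end=None):
        """Return positions of the partitions that may hold keys in [start, end].

        Both bounds are inclusive; ``None`` means the range is open on that side.
        """
        first_keys = [p[0] for p in partitions]
        last_keys = [p[-1] for p in partitions]
        # Skip every partition whose last key is smaller than ``start``
        lo = 0 if start is None else bisect_left(last_keys, start)
        # Stop before the first partition whose first key is larger than ``end``
        hi = len(partitions) if end is None else bisect_right(first_keys, end)
        return list(range(lo, hi))


    partitions = build_partitions([1, 2, 3, 4, 5, 6], rows_per_partition=2)

    # Comparable to rows={'after': 5}: only the last partition is needed
    after_5 = partitions_for_range(partitions, start=5)

    # Comparable to rows={'between': [2, 3]}: the first two partitions are needed
    between = partitions_for_range(partitions, start=2, end=3)

Because only the first and last key of each partition are consulted, a lookup like this never
has to open the partition files themselves to decide which ones are relevant.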