Table

class featherstore.table.Table(table_name, store_name)

Bases: object

A class for saving and loading DataFrames as partitioned Feather files.

Tables support several operations that can be performed without loading the full data:

Partial reading of data
Append data
Insert data
Update data
Drop data
Read metadata (column names, index, table shape, etc.)
Change column types

Parameters

table_name (str) – The name of the table.
store_name (str) – The name of the store.

read_arrow(*, cols=None, rows=None, mmap=None)

Reads the data as a PyArrow Table

Parameters

cols (Collection, optional) – List of column names or filter-predicates in the form of {'like': pattern}. If not provided, all columns are read.
rows (Collection, optional) – List of index values or filter-predicates in the form of {keyword: value}, where keyword is one of before, after, or between. If not provided, all rows are read.
mmap (bool, optional) – Whether to use memory mapping when opening the table on disk. Defaults to False on Windows and True on other systems.

Return type

pyarrow.Table
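
The column and row filters described above can be combined in a single call. The following is a minimal sketch rather than reference usage: `read_window` is a hypothetical helper, `table` is assumed to be a `Table` instance, and both the `%` wildcard in the pattern and the two-element value passed to `between` are assumptions about the predicate syntax.

```python
def read_window(table, start, end, pattern="price%"):
    """Read rows whose index lies between start and end, for matching columns.

    Hypothetical helper: `table` is any object exposing the read_arrow()
    signature documented above.
    """
    return table.read_arrow(
        cols={"like": pattern},          # assumed: SQL-style wildcard pattern
        rows={"between": [start, end]},  # assumed: lower and upper bound as a pair
    )
```

The same keyword arguments apply to `read_pandas` and `read_polars`, since all three readers share one signature.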

read_pandas(*, cols=None, rows=None, mmap=None)

Reads the data as a Pandas DataFrame

Parameters

cols (Collection, optional) – List of column names or filter-predicates in the form of {'like': pattern}. If not provided, all columns are read.
rows (Collection, optional) – List of index values or filter-predicates in the form of {keyword: value}, where keyword is one of before, after, or between. If not provided, all rows are read.
mmap (bool, optional) – Whether to use memory mapping when opening the table on disk. Defaults to False on Windows and True on other systems.

Return type

pandas.DataFrame or pandas.Series

read_polars(*, cols=None, rows=None, mmap=None)

Reads the data as a Polars DataFrame

Parameters

cols (Collection, optional) – List of column names or filter-predicates in the form of {'like': pattern}. If not provided, all columns are read.
rows (Collection, optional) – List of index values or filter-predicates in the form of {keyword: value}, where keyword is one of before, after, or between. If not provided, all rows are read.
mmap (bool, optional) – Whether to use memory mapping when opening the table on disk. Defaults to False on Windows and True on other systems.

Return type

polars.DataFrame

write(df, /, index=None, *, partition_size=134217728, errors='raise', warnings='warn')

Writes a DataFrame to the current table.

The DataFrame index column, if provided, must be of type int, str, or datetime. FeatherStore sorts the DataFrame by the index before storing it.

Parameters

df (Pandas DataFrame or Series, PyArrow Table, or Polars DataFrame) – The DataFrame to be stored.
index (str, optional) – The name of the column to be used as index. If not provided, the current index is used for Pandas and a standard integer index for Arrow and Polars. By default None.
partition_size (int, optional) – The size of each partition in bytes. A value of -1 disables partitioning. By default 134217728 (128 MiB).
errors (str, optional) – Whether to raise an error if the table already exists. Can be either raise or ignore; ignore overwrites the existing table. By default raise.
warnings (str, optional) – Whether to warn if an unsorted index is about to be sorted. Can be either warn or ignore. By default warn.
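
Because partition_size is given in bytes, it can help to derive it from a more readable unit. The sketch below is illustrative only: `partition_bytes` and `write_partitioned` are hypothetical helpers, and `table` is assumed to be a `Table` instance.

```python
MIB = 1024 ** 2  # bytes per mebibyte


def partition_bytes(mebibytes):
    """Convert a size in MiB to the byte count expected by partition_size.

    None maps to -1, the documented value for disabling partitioning.
    """
    return -1 if mebibytes is None else int(mebibytes * MIB)


def write_partitioned(table, df, index=None, mebibytes=128, overwrite=False):
    """Hypothetical wrapper around Table.write() using the documented arguments."""
    table.write(
        df,
        index,
        partition_size=partition_bytes(mebibytes),
        # 'ignore' overwrites an existing table, 'raise' refuses to.
        errors="ignore" if overwrite else "raise",
    )
```

With the default of 128 MiB, `partition_bytes` yields the same 134217728 that appears in the signature of `write`.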

append(df, *, warnings='warn')

Appends data to the current table

Parameters

df (Pandas DataFrame or Series, PyArrow Table, or Polars DataFrame) – The data to be appended.
warnings (str, optional) – Whether to warn if an unsorted index is about to be sorted. Can be either warn or ignore. By default warn.

update(df)

Updates data in the current table.

Note: You can’t use this method to update index values. Updating index values can be accomplished by deleting the old records and inserting new ones with the updated index values.

Parameters: df (Pandas DataFrame or Pandas Series) – The updated data. The index of df identifies the rows to be updated, and the columns of df hold the new values.
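
The workaround for changing index values mentioned in the note can be sketched as a drop followed by an insert. `replace_index_values` is a hypothetical helper, `table` is assumed to be a `Table` instance, and `replacement_df` is assumed to already carry the new index values with the same column types as the stored data.

```python
def replace_index_values(table, old_index_values, replacement_df):
    """Swap rows out by index: delete the old records, then insert the new ones.

    Hypothetical helper. The two calls are issued separately, so this sketch
    makes no attempt to roll back the drop if the insert fails.
    """
    table.drop(rows=list(old_index_values))  # remove records under the old index
    table.insert(replacement_df)             # add records under the new index
```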

insert(df)

Insert one or more rows into the current table.

Parameters: df (Pandas DataFrame or Pandas Series) – The data to be inserted. df must have the same index and column types as the stored data.

add_columns(df, idx=-1)

Insert one or more columns into the current table.

Parameters

df (Pandas DataFrame or Pandas Series) – The data to be inserted. df must have the same index as the stored data.
idx (int) – The position to insert the new column(s). Default is to add columns to the end.

drop(*, cols=None, rows=None)

Drop specified labels from rows or columns.

Parameters

cols (Collection, optional) – List of column names or filter-predicates in the form of {'like': pattern}, by default None
rows (Collection, optional) – List of index values or filter-predicates in the form of {keyword: value}, where keyword is one of before, after, or between, by default None

Raises

AttributeError – Raised if neither rows nor cols is provided.
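
A caller that builds the filters dynamically may want to avoid the AttributeError case. The sketch below is one way to do that under stated assumptions: `drop_stale` is a hypothetical helper, `table` is assumed to be a `Table` instance, and the `%` wildcard in the column pattern is an assumption about the pattern syntax.

```python
def drop_stale(table, cutoff=None, column_pattern=None):
    """Drop rows before `cutoff` and/or columns matching `column_pattern`.

    Hypothetical helper. Returns True if at least one drop was issued and
    False if there was nothing to drop, so drop() is never called without
    rows or cols.
    """
    dropped = False
    if cutoff is not None:
        table.drop(rows={"before": cutoff})
        dropped = True
    if column_pattern is not None:
        table.drop(cols={"like": column_pattern})  # assumed: e.g. "tmp_%"
        dropped = True
    return dropped
```

Rows and columns are dropped in separate calls here so that the sketch does not rely on both arguments being accepted together.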

drop_rows(rows)

Drops specified rows from table

Same as Table.drop(rows=value)

Parameters: rows (Collection) – List of index values or filter-predicates in the form of {keyword: value}, where keyword is one of before, after, or between.

drop_columns(cols)

Drops specified columns from table

Same as Table.drop(cols=value)

Parameters: cols (Collection) – List of column names or filter-predicates in the form of {'like': pattern}.

rename_columns(cols, *, to=None)

Rename one or more columns.

rename_columns supports two different call-syntaxes:

rename_columns({'c1': 'new_c1', 'c2': 'new_c2'})
rename_columns(['c1', 'c2'], to=['new_c1', 'new_c2'])

Parameters

cols (Collection) – Either a list of columns to be renamed, or a dict mapping columns to be renamed to new column names
to (Collection[str], optional) – New column names, by default None
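
The two call syntaxes express the same old-to-new mapping. The sketch below shows that equivalence with a hypothetical helper, `as_rename_mapping`; it illustrates the relationship between the argument forms and is not a description of how the library handles them internally.

```python
def as_rename_mapping(cols, to=None):
    """Normalize either call syntax to a dict of {old_name: new_name}.

    Hypothetical helper for illustration only.
    """
    if isinstance(cols, dict):
        if to is not None:
            raise ValueError("'to' must be omitted when 'cols' is a dict")
        return dict(cols)
    if to is None or len(cols) != len(to):
        raise ValueError("'to' must list one new name per column in 'cols'")
    return dict(zip(cols, to))  # pair each old name with its new name
```

Both `as_rename_mapping({'c1': 'new_c1'})` and `as_rename_mapping(['c1'], to=['new_c1'])` produce the same dict, which can then be passed to `rename_columns` in its dict form.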

property columns

Fetches the names of the table columns

Returns: The table columns
Return type: list

reorder_columns(cols)

Reorder the current columns

Parameters: cols (Sequence[str]) – The new column ordering. The column names provided must be the same as the column names used in the table.

property index

Fetches the table index

Return type: pandas.Index

astype(cols, *, to=None)

Change data type of one or more columns.

astype supports two different call-syntaxes:

astype({'c1': pa.int64(), 'c2': pa.int16()})
astype(['c1', 'c2'], to=[pa.int64(), pa.int16()])

Parameters

cols (Sequence[str] or dict) – Either a sequence of columns whose data types are to be changed, or a dict mapping columns to new column data types.
to (Sequence[PyArrow DataType], optional) – New column data types, by default None

rename_table(*, to)

Renames the current table

Parameters: to (str) – The new name of the table.

drop_table()

Deletes the current table

create_snapshot(path)

Creates a compressed backup of the table.

The table can later be restored by using snapshot.restore_table().

Parameters: path (str) – The path to the snapshot archive.
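
Timestamped paths keep successive snapshots from overwriting each other. The sketch below is one possible convention, not something the library prescribes: `snapshot_path` and `backup` are hypothetical helpers, `table` is assumed to be a `Table` instance, the `name` property is assumed to return the table name as a string, and no file extension is added because the source does not say which one the archive uses.

```python
from datetime import datetime
from pathlib import Path


def snapshot_path(backup_dir, table_name, now=None):
    """Build a path such as <backup_dir>/<table_name>_20240102T030405."""
    now = now or datetime.now()
    return str(Path(backup_dir) / f"{table_name}_{now:%Y%m%dT%H%M%S}")


def backup(table, backup_dir):
    """Create a snapshot under a timestamped path and return that path."""
    path = snapshot_path(backup_dir, table.name)  # assumed: name is a str
    table.create_snapshot(path)
    return path
```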

repartition(new_partition_size)

Repartitions a table so that each partition is new_partition_size bytes in size.

Parameters: new_partition_size (int) – The size of each partition in bytes. A new_partition_size value of -1 disables partitioning
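
The new size can be derived from the current one through the partition_size property. The sketch below halves the partition size down to a floor. `halve_partitions` is a hypothetical helper, `table` is assumed to be a `Table` instance, and the handling of -1 assumes that the property reports -1 when partitioning is disabled.

```python
def halve_partitions(table, minimum=1024 ** 2):
    """Halve the partition size, never going below `minimum` bytes.

    Hypothetical helper. Returns the partition size in effect afterwards.
    """
    current = table.partition_size
    if current == -1:  # assumed: -1 means partitioning is disabled
        return current
    new_size = max(current // 2, minimum)
    if new_size != current:
        table.repartition(new_size)
    return new_size
```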

property shape

Fetches the shape of the stored table as (rows, columns).

Returns: The shape of the table
Return type: tuple(int, int)

property partition_size

Fetches the table partition size in bytes.

Returns: The partition size in bytes.
Return type: int

exists()

property name