Table#
- class featherstore.table.Table(table_name, store_name)[source]#
Bases:
object
A class for saving and loading DataFrames as partitioned Feather files.
Tables support several operations that can be performed without loading the full data into memory:
Partial reading of data
Appending data
Inserting data
Updating data
Dropping data
Reading metadata (column names, index, table shape, etc.)
Changing column types
- Parameters:
table_name (str) – The name of the table.
store_name (str) – The name of the store.
- read_arrow(*, cols=None, rows=None, mmap=None)[source]#
Reads the data as a PyArrow Table.
- Parameters:
cols (Collection, optional) – List of column names or filter predicates in the form of {‘like’: pattern}. If not provided, all columns are read.
rows (Collection, optional) – List of index values or filter predicates in the form of {keyword: value}, where keyword is one of before, after, or between. If not provided, all rows are read.
mmap (bool, optional) – Use memory mapping when opening the table on disk, by default False on Windows and True on other systems.
- Return type:
pyarrow.Table
- read_pandas(*, cols=None, rows=None, mmap=None)[source]#
Reads the data as a Pandas DataFrame or Series.
- Parameters:
cols (Collection, optional) – List of column names or filter predicates in the form of {‘like’: pattern}. If not provided, all columns are read.
rows (Collection, optional) – List of index values or filter predicates in the form of {keyword: value}, where keyword is one of before, after, or between. If not provided, all rows are read.
mmap (bool, optional) – Use memory mapping when opening the table on disk, by default False on Windows and True on other systems.
- Return type:
pandas.DataFrame or pandas.Series
- read_polars(*, cols=None, rows=None, mmap=None)[source]#
Reads the data as a Polars DataFrame or Series.
- Parameters:
cols (Collection, optional) – List of column names or filter predicates in the form of {‘like’: pattern}. If not provided, all columns are read.
rows (Collection, optional) – List of index values or filter predicates in the form of {keyword: value}, where keyword is one of before, after, or between. If not provided, all rows are read.
mmap (bool, optional) – Use memory mapping when opening the table on disk, by default False on Windows and True on other systems.
- Return type:
polars.DataFrame or polars.Series
- write(df, /, index=None, *, partition_size=134217728, errors='raise', warnings='warn')[source]#
Writes a DataFrame to the current table.
The DataFrame index column, if provided, must be of type int, str, or datetime. FeatherStore sorts the DataFrame by its index before storage.
- Parameters:
df (pandas DataFrame or Series, polars DataFrame or Series, or pyarrow Table) – The DataFrame to be stored.
index (str, optional) – The name of the column to use as the index. If not provided, uses the current index for Pandas objects and a standard integer index for Arrow and Polars objects, by default None
partition_size (int, optional) – The size of each partition in bytes. A partition_size of -1 disables partitioning, by default 134217728 (128 MB)
errors (str, optional) – Whether or not to raise an error if the table already exists. Can be either raise or ignore; ignore overwrites any existing table, by default raise
warnings (str, optional) – Whether or not to warn if an unsorted index is about to be sorted. Can be either warn or ignore, by default warn
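The index-sorting behaviour mentioned above can be previewed with plain pandas; this sketch involves no FeatherStore calls:

```python
import pandas as pd

# An unsorted integer index; write() would warn (with warnings='warn')
# and store the rows in sorted index order.
df = pd.DataFrame({"price": [3.0, 1.0, 2.0]}, index=[30, 10, 20])

# The row order a later read would return:
stored_order = df.sort_index()
print(list(stored_order.index))  # [10, 20, 30]
```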
- append(df, *, warnings='warn')[source]#
Appends data to the current table.
- Parameters:
df (pandas DataFrame or Series, polars DataFrame or Series, or pyarrow Table) – The data to be appended.
warnings (str, optional) – Whether or not to warn if an unsorted index is about to be sorted. Can be either warn or ignore, by default warn
- update(df)[source]#
Updates data in the current table.
Note: You can’t use this method to update index values. Updating index values can be accomplished by deleting the old records and inserting new ones with the updated index values.
- Parameters:
df (Pandas DataFrame or Pandas Series) – The updated data. The index of df is the rows to be updated, while the columns of df are the new values.
- insert(df)[source]#
Insert one or more rows into the current table.
- Parameters:
df (Pandas DataFrame or Pandas Series) – The data to be inserted. df must have the same index and column types as the stored data.
- add_columns(df, idx=-1)[source]#
Insert one or more columns into the current table.
- Parameters:
df (Pandas DataFrame or Pandas Series) – The data to be inserted. df must have the same index as the stored data.
idx (int, optional) – The position at which to insert the new column(s), by default -1 (append to the end).
- drop(*, cols=None, rows=None)[source]#
Drop specified labels from rows or columns.
- Parameters:
cols (Collection, optional) – List of column names or filter predicates in the form of {‘like’: pattern}, by default None
rows (Collection, optional) – List of index values or filter predicates in the form of {keyword: value}, where keyword is one of before, after, or between, by default None
- Raises:
AttributeError – Raised if neither rows nor cols is provided.
- drop_rows(rows)[source]#
Drops specified rows from the table.
Same as Table.drop(rows=value)
- Parameters:
rows (Collection) – List of index values or filter predicates in the form of {keyword: value}, where keyword is one of before, after, or between.
- drop_columns(cols)[source]#
Drops specified columns from the table.
Same as Table.drop(cols=value)
- Parameters:
cols (Collection) – List of column names or filter predicates in the form of {‘like’: pattern}.
- rename_columns(cols, *, to=None)[source]#
Rename one or more columns.
rename_columns supports two different call-syntaxes:
rename_columns({‘c1’: ‘new_c1’, ‘c2’: ‘new_c2’})
rename_columns([‘c1’, ‘c2’], to=[‘new_c1’, ‘new_c2’])
- Parameters:
cols (Collection) – Either a list of columns to be renamed, or a dict mapping columns to be renamed to new column names
to (Collection[str], optional) – New column names, by default None
- property columns[source]#
Fetches the names of the table columns
- Returns:
The table columns
- Return type:
list
- reorder_columns(cols)[source]#
Reorder the current columns
- Parameters:
cols (Sequence[str]) – The new column ordering. The column names provided must be the same as the column names used in the table.
- astype(cols, *, to=None)[source]#
Change data type of one or more columns.
astype supports two different call-syntaxes:
astype({‘c1’: pa.int64(), ‘c2’: pa.int16()})
astype([‘c1’, ‘c2’], to=[pa.int64(), pa.int16()])
- Parameters:
cols (Sequence[str] or dict) – Either a sequence of columns to have their data types changed, or a dict mapping columns to new column data types.
to (Sequence[pyarrow.DataType], optional) – New column data types, by default None
- rename_table(*, to)[source]#
Renames the current table
- Parameters:
to (str) – The new name of the table.
- create_snapshot(path)[source]#
Creates a compressed backup of the table.
The table can later be restored by using snapshot.restore_table().
- Parameters:
path (str) – The path to the snapshot archive.
- repartition(new_partition_size)[source]#
Repartitions the table so that each partition is new_partition_size bytes.
- Parameters:
new_partition_size (int) – The size of each partition in bytes. A new_partition_size value of -1 disables partitioning
- property shape[source]#
Fetches the shape of the stored table as (rows, columns).
- Returns:
The shape of the table
- Return type:
tuple(int, int)