Table#

class featherstore.table.Table(table_name, store_name)[source]#

Bases: object

A class for saving and loading DataFrames as partitioned Feather files.

Tables support several operations that can be performed without loading the full data into memory:

  • Partial reading of data

  • Append data

  • Insert data

  • Update data

  • Drop data

  • Read metadata (column names, index, table shape, etc.)

  • Changing column types

Parameters:
  • table_name (str) – The name of the table.

  • store_name (str) – The name of the store.
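
Example (an illustrative sketch; it assumes a FeatherStore database is already connected and that a store named 'example_store' has been created elsewhere, and uses the table name 'sales' purely as a placeholder):

>>> from featherstore.table import Table
>>> table = Table('sales', 'example_store')  # refers to the 'sales' table inside 'example_store'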

read_arrow(*, cols=None, rows=None, mmap=None)[source]#

Reads the data as a PyArrow Table

Parameters:
  • cols (Collection, optional) – List of column names or filter-predicates in the form of {'like': pattern}. If not provided, all columns are read.

  • rows (Collection, optional) – List of index values or filter-predicates in the form of {keyword: value}, where keyword can be before, after, or between. If not provided, all rows are read.

  • mmap (bool, optional) – Use memory mapping when opening the table on disk; by default False on Windows and True on other systems.

Return type:

pyarrow.Table
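
Example (illustrative, continuing the 'sales' table from the constructor example above; it assumes data with a datetime index has previously been written, the column names and index values are placeholders, and the two-element list for the between predicate follows the rows description above):

>>> # Read two columns, restricted to rows whose index falls between two values
>>> pa_table = table.read_arrow(
...     cols=['price', 'quantity'],
...     rows={'between': ['2021-01-01', '2021-06-30']},
... )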

read_pandas(*, cols=None, rows=None, mmap=None)[source]#

Reads the data as a Pandas DataFrame or Series

Parameters:
  • cols (Collection, optional) – List of column names or filter-predicates in the form of {'like': pattern}. If not provided, all columns are read.

  • rows (Collection, optional) – List of index values or filter-predicates in the form of {keyword: value}, where keyword can be before, after, or between. If not provided, all rows are read.

  • mmap (bool, optional) – Use memory mapping when opening the table on disk; by default False on Windows and True on other systems.

Return type:

pandas.DataFrame or pandas.Series
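
Example (illustrative; the 'like' wildcard shown is assumed to follow SQL-style patterns, and the column and index values are placeholders):

>>> df = table.read_pandas(cols={'like': 'price%'})       # columns whose names match a pattern
>>> df = table.read_pandas(rows={'after': '2021-01-01'})  # rows from a given index value onwards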

read_polars(*, cols=None, rows=None, mmap=None)[source]#

Reads the data as a Polars DataFrame or Series

Parameters:
  • cols (Collection, optional) – List of column names or filter-predicates in the form of {'like': pattern}. If not provided, all columns are read.

  • rows (Collection, optional) – List of index values or filter-predicates in the form of {keyword: value}, where keyword can be before, after, or between. If not provided, all rows are read.

  • mmap (bool, optional) – Use memory mapping when opening the table on disk; by default False on Windows and True on other systems.

Return type:

polars.DataFrame or polars.Series
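
Example (illustrative; the column name is a placeholder):

>>> pl_df = table.read_polars(cols=['price'], mmap=False)  # read a single column without memory mapping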

write(df, /, index=None, *, partition_size=134217728, errors='raise', warnings='warn')[source]#

Writes a DataFrame to the current table.

The DataFrame index column, if provided, must be of type int, str, or datetime. FeatherStore sorts the DataFrame by the index before storage.

Parameters:
  • df (pandas DataFrame or Series, polars DataFrame or Series, or pyarrow Table) – The DataFrame to be stored

  • index (str, optional) – The name of the column to be used as the index. If not provided, the current index is used for Pandas, and a standard integer index is used for Arrow and Polars; by default None

  • partition_size (int, optional) – The size of each partition in bytes. A partition_size value of -1 disables partitioning, by default 128 MB

  • errors (str, optional) – Whether or not to raise an error if the table already exists. Can be either raise or ignore; ignore overwrites the existing table, by default raise

  • warnings (str, optional) – Whether or not to warn if an unsorted index is about to be sorted. Can be either warn or ignore, by default warn
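
Example (an illustrative sketch; the store, table, and column names are placeholders):

>>> import pandas as pd
>>> from featherstore.table import Table
>>> df = pd.DataFrame(
...     {'price': [10.0, 12.5, 9.9], 'quantity': [3, 1, 8]},
...     index=pd.Index(['a', 'b', 'c'], name='id'),
... )
>>> table = Table('sales', 'example_store')
>>> table.write(df)                   # uses the existing Pandas index as the table index
>>> table.write(df, errors='ignore')  # overwrites the table if it already exists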

append(df, *, warnings='warn')[source]#

Appends data to the current table

Parameters:
  • df (pandas DataFrame or Series, polars DataFrame or Series, or pyarrow Table) – The data to be appended

  • warnings (str, optional) – Whether or not to warn if an unsorted index is about to be sorted. Can be either warn or ignore, by default warn
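
Example (illustrative, continuing the 'sales' table from the write() example; the appended index value is assumed to sort after the existing ones):

>>> new_rows = pd.DataFrame(
...     {'price': [11.0], 'quantity': [2]},
...     index=pd.Index(['d'], name='id'),
... )
>>> table.append(new_rows)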

update(df)[source]#

Updates data in the current table.

Note: You can’t use this method to update index values. To update index values, delete the old records and insert new ones with the updated index values.

Parameters:

df (Pandas DataFrame or Pandas Series) – The updated data. The index of df is the rows to be updated, while the columns of df are the new values.
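
Example (illustrative, continuing the 'sales' table from the write() example):

>>> changes = pd.DataFrame({'price': [13.0]}, index=pd.Index(['b'], name='id'))
>>> table.update(changes)  # sets 'price' to 13.0 for the row with index value 'b'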

insert(df)[source]#

Inserts one or more rows into the current table.

Parameters:

df (Pandas DataFrame or Pandas Series) – The data to be inserted. df must have the same index and column types as the stored data.
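
Example (illustrative, continuing the 'sales' table from the write() example):

>>> new_row = pd.DataFrame(
...     {'price': [8.5], 'quantity': [5]},
...     index=pd.Index(['ab'], name='id'),
... )
>>> table.insert(new_row)  # placed at the position given by its index value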

add_columns(df, idx=-1)[source]#

Inserts one or more columns into the current table.

Parameters:
  • df (Pandas DataFrame or Pandas Series) – The data to be inserted. df must have the same index as the stored data.

  • idx (int, optional) – The position at which to insert the new column(s); by default the columns are added to the end.
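
Example (illustrative, continuing the 'sales' table from the write() example; building the new column from Table.index guarantees a matching index):

>>> discount = pd.Series(0.0, index=table.index, name='discount')
>>> table.add_columns(discount, idx=1)  # insert 'discount' as the second column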

drop(*, cols=None, rows=None)[source]#

Drops specified labels from rows or columns.

Parameters:
  • cols (Collection, optional) – List of column names or filter-predicates in the form of {'like': pattern}, by default None

  • rows (Collection, optional) – List of index values or filter-predicates in the form of {keyword: value}, where keyword can be before, after, or between, by default None

Raises:

AttributeError – Raised if neither rows nor cols is provided.
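
Example (illustrative, continuing the 'sales' table from the examples above):

>>> table.drop(cols=['discount'])     # drop a single column
>>> table.drop(rows={'before': 'b'})  # drop the rows whose index comes before 'b'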

drop_rows(rows)[source]#

Drops the specified rows from the table

Same as Table.drop(rows=value)

Parameters:

rows (Collection) – List of index values or filter-predicates in the form of {keyword: value}, where keyword can be before, after, or between

drop_columns(cols)[source]#

Drops the specified columns from the table

Same as Table.drop(cols=value)

Parameters:

cols (Collection) – List of column names or filter-predicates in the form of {'like': pattern}

rename_columns(cols, *, to=None)[source]#

Renames one or more columns.

rename_columns supports two different call syntaxes:

  • rename_columns({'c1': 'new_c1', 'c2': 'new_c2'})

  • rename_columns(['c1', 'c2'], to=['new_c1', 'new_c2'])

Parameters:
  • cols (Collection) – Either a list of columns to be renamed, or a dict mapping columns to be renamed to new column names

  • to (Collection[str], optional) – New column names, by default None
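
Example (illustrative; assumes the table has columns named 'price' and 'quantity'):

>>> table.rename_columns({'price': 'unit_price', 'quantity': 'qty'})
>>> # Equivalent call using the list-plus-to syntax:
>>> # table.rename_columns(['price', 'quantity'], to=['unit_price', 'qty'])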

property columns[source]#

Fetches the names of the table columns

Returns:

The table columns

Return type:

list

reorder_columns(cols)[source]#

Reorders the current columns

Parameters:

cols (Sequence[str]) – The new column ordering. The column names provided must be the same as the column names used in the table.
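
Example (illustrative; assumes the table has exactly the columns 'price' and 'quantity', and the list must contain the same names as Table.columns):

>>> table.reorder_columns(['quantity', 'price'])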

property index[source]#

Fetches the table index

Return type:

pandas.Index

astype(cols, *, to=None)[source]#

Changes the data type of one or more columns.

astype supports two different call syntaxes:

  • astype({'c1': pa.int64(), 'c2': pa.int16()})

  • astype(['c1', 'c2'], to=[pa.int64(), pa.int16()])

Parameters:
  • cols (Sequence[str] or dict) – Either a sequence of columns to have their data types changed, or a dict mapping columns to new column data types.

  • to (Sequence[PyArrow DataType], optional) – New column data types, by default None
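
Example (illustrative; assumes the table has a numeric column named 'quantity'):

>>> import pyarrow as pa
>>> table.astype({'quantity': pa.int16()})
>>> # Equivalent call using the sequence-plus-to syntax:
>>> # table.astype(['quantity'], to=[pa.int16()])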

rename_table(*, to)[source]#

Renames the current table

Parameters:

to (str) – The new name of the table.

drop_table()[source]#

Deletes the current table

create_snapshot(path)[source]#

Creates a compressed backup of the table.

The table can later be restored by using snapshot.restore_table().

Parameters:

path (str) – The path to the snapshot archive.
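
Example (illustrative; the snapshot path is a placeholder):

>>> table.create_snapshot('/backups/sales_snapshot')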

repartition(new_partition_size)[source]#

Repartitions the table so that each partition is new_partition_size bytes in size.

Parameters:

new_partition_size (int) – The size of each partition in bytes. A new_partition_size value of -1 disables partitioning
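
Example (illustrative):

>>> table.repartition(64 * 1024**2)  # target partitions of roughly 64 MB
>>> table.repartition(-1)            # -1 disables partitioning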

property shape[source]#

Fetches the shape of the stored table as (rows, columns).

Returns:

The shape of the table

Return type:

tuple(int, int)

property partition_size[source]#

Fetches the table partition size in bytes.

Returns:

The partition size in bytes.

Return type:

int
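
Example (illustrative; these metadata accessors do not require reading the full data, per the operations list at the top of this page):

>>> n_rows, n_cols = table.shape          # dimensions of the stored table
>>> size_in_bytes = table.partition_size  # partition size in bytes
>>> column_names = table.columns          # list of column names
>>> index = table.index                   # pandas.Index of the stored index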

exists()[source]#

Checks whether or not the table exists

property name[source]#

Fetches the name of the table