Table

class featherstore.table.Table(table_name, store_name)

Bases: object

A class for saving and loading DataFrames as partitioned Feather files.

Table supports several operations that can be performed without loading the full table into memory:

  • Reading a subset of the data

  • Appending data

  • Inserting data

  • Updating data

  • Dropping data

  • Reading metadata (column names, index, table shape, etc.)

  • Changing column types

Parameters
  • table_name (str) – The name of the table.

  • store_name (str) – The name of the store.

read_arrow(*, cols=None, rows=None, mmap=None)

Reads the data as a PyArrow Table

Parameters
  • cols (Collection, optional) – List of column names, or a filter predicate in the form of {'like': pattern}. If not provided, all columns are read.

  • rows (Collection, optional) – List of index values, or a filter predicate in the form of {keyword: value}, where keyword is one of before, after, or between. If not provided, all rows are read.

  • mmap (bool, optional) – Whether to use memory mapping when opening the table on disk. By default False on Windows and True on other systems.

Return type

pyarrow.Table

read_pandas(*, cols=None, rows=None, mmap=None)

Reads the data as a Pandas DataFrame

Parameters
  • cols (Collection, optional) – List of column names, or a filter predicate in the form of {'like': pattern}. If not provided, all columns are read.

  • rows (Collection, optional) – List of index values, or a filter predicate in the form of {keyword: value}, where keyword is one of before, after, or between. If not provided, all rows are read.

  • mmap (bool, optional) – Whether to use memory mapping when opening the table on disk. By default False on Windows and True on other systems.

Return type

pandas.DataFrame or pandas.Series
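A sketch of how the cols and rows filters combine in a single call. The helper `read_window` is hypothetical and `table` is assumed to be an existing Table. The two-element [low, high] value for the between predicate is an assumption, since this page only gives the general {keyword: value} form, and the wildcard syntax of like patterns is not described here, so the pattern is passed through unchanged. read_arrow and read_polars accept the same keyword arguments.

```python
def read_window(table, start, end, *, cols=None, pattern=None):
    """Read the rows whose index lies between start and end.

    Columns are selected either by name (cols) or with a
    {'like': pattern} predicate; passing neither reads all columns.
    """
    if cols is not None and pattern is not None:
        raise ValueError("pass either cols or pattern, not both")
    # A like-predicate replaces the explicit column list when given.
    col_filter = {"like": pattern} if pattern is not None else cols
    # Assumption: 'between' takes a two-element [low, high] sequence.
    row_filter = {"between": [start, end]}
    return table.read_pandas(cols=col_filter, rows=row_filter)
```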

read_polars(*, cols=None, rows=None, mmap=None)

Reads the data as a Polars DataFrame

Parameters
  • cols (Collection, optional) – List of column names, or a filter predicate in the form of {'like': pattern}. If not provided, all columns are read.

  • rows (Collection, optional) – List of index values, or a filter predicate in the form of {keyword: value}, where keyword is one of before, after, or between. If not provided, all rows are read.

  • mmap (bool, optional) – Whether to use memory mapping when opening the table on disk. By default False on Windows and True on other systems.

Return type

polars.DataFrame

write(df, /, index=None, *, partition_size=134217728, errors='raise', warnings='warn')

Writes a DataFrame to the current table.

The DataFrame index column, if provided, must be of type int, str, or datetime. FeatherStore sorts the DataFrame by the index before storing it.

Parameters
  • df (Pandas DataFrame or Series, PyArrow Table, or Polars DataFrame) – The DataFrame to be stored.

  • index (str, optional) – The name of the column to use as the index. If not provided, the existing index is used for Pandas data and a standard integer index is generated for Arrow and Polars data. By default None.

  • partition_size (int, optional) – The size of each partition in bytes. A partition_size of -1 disables partitioning. By default 128 MB (134217728 bytes).

  • errors (str, optional) – Whether to raise an error if the table already exists. Either raise or ignore; ignore overwrites the existing table. By default raise.

  • warnings (str, optional) – Whether to warn when an unsorted index is about to be sorted. Either warn or ignore. By default warn.
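The default partition_size is 128 MB written out in bytes. The sketch below checks that arithmetic, gives a back-of-the-envelope partition count, and wraps an overwriting write() call. `estimate_partitions` and `overwrite` are hypothetical helpers, and the estimate assumes each partition is filled to exactly partition_size bytes, which the real row-based split need not do.

```python
import math

# Default partition_size of Table.write: 128 MB expressed in bytes.
DEFAULT_PARTITION_SIZE = 128 * 1024 ** 2  # == 134217728


def estimate_partitions(nbytes, partition_size=DEFAULT_PARTITION_SIZE):
    """Rough number of partitions needed for nbytes of data.

    Assumes partitions are filled up to partition_size bytes; the actual
    split also depends on row boundaries and the on-disk encoding.
    A partition_size of -1 disables partitioning (a single partition).
    """
    if partition_size == -1:
        return 1
    if partition_size <= 0:
        raise ValueError("partition_size must be positive or -1")
    return max(1, math.ceil(nbytes / partition_size))


def overwrite(table, df, index_col, partition_size=DEFAULT_PARTITION_SIZE):
    """Replace any existing table and silence the unsorted-index warning."""
    table.write(df, index=index_col, partition_size=partition_size,
                errors="ignore", warnings="ignore")
```

With these assumptions, 300 MB of data at the default size comes out as three partitions.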

append(df, *, warnings='warn')

Appends data to the current table

Parameters
  • df (Pandas DataFrame or Series, PyArrow Table, or Polars DataFrame) – The data to be appended.

  • warnings (str, optional) – Whether to warn when an unsorted index is about to be sorted. Either warn or ignore. By default warn.
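Because the table shape is metadata, it can be used to check how many rows an append added without reading the data back. The helper `append_and_count` is hypothetical; `table` is assumed to be an existing Table and `df` one of the supported DataFrame types.

```python
def append_and_count(table, df, *, warn=True):
    """Append df and return how many rows the table grew by."""
    rows_before, _ = table.shape  # metadata only, no data is loaded
    table.append(df, warnings="warn" if warn else "ignore")
    rows_after, _ = table.shape
    return rows_after - rows_before
```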

update(df)

Updates data in the current table.

Note: You can’t use this method to update index values. Updating index values can be accomplished by deleting the old records and inserting new ones with the updated index values.

Parameters

df (Pandas DataFrame or Pandas Series) – The updated data. The index of df identifies the rows to be updated, and the columns of df hold the new values.
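The note above describes changing index values by deleting the old records and inserting new ones. A minimal sketch of that pattern follows; `relabel_rows` is a hypothetical helper, and `replacement` is assumed to be a Pandas DataFrame indexed by the new labels with the same index and column types as the stored data, as insert() requires.

```python
def relabel_rows(table, old_labels, replacement):
    """Change index values by deleting and re-inserting rows.

    update() cannot modify index values, so the old records are
    dropped first and the re-labelled rows are inserted afterwards.
    """
    table.drop_rows(list(old_labels))
    table.insert(replacement)
```

The two calls are not atomic in this sketch: if insert() fails, the old rows are already gone, so taking a snapshot first with create_snapshot() may be worthwhile for important tables.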

insert(df)

Insert one or more rows into the current table.

Parameters

df (Pandas DataFrame or Pandas Series) – The data to be inserted. df must have the same index and column types as the stored data.

add_columns(df, idx=-1)

Insert one or more columns into the current table.

Parameters
  • df (Pandas DataFrame or Pandas Series) – The data to be inserted. df must have the same index as the stored data.

  • idx (int, optional) – The position at which to insert the new column(s). By default -1, which adds the columns at the end.

drop(*, cols=None, rows=None)

Drop specified labels from rows or columns.

Parameters
  • cols (Collection, optional) – List of column names, or a filter predicate in the form of {'like': pattern}. By default None.

  • rows (Collection, optional) – List of index values, or a filter predicate in the form of {keyword: value}, where keyword is one of before, after, or between. By default None.

Raises

AttributeError – Raised if neither rows nor cols is provided.
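Because drop() raises AttributeError when called without rows or cols, a wrapper that builds its keyword arguments conditionally can skip the call instead. `prune` is a hypothetical helper; whether the before predicate includes the boundary value is not stated on this page.

```python
def prune(table, *, before=None, pattern=None):
    """Drop old rows and/or matching columns in a single drop() call.

    Returns True if drop() was called, False if there was nothing to drop.
    """
    kwargs = {}
    if before is not None:
        kwargs["rows"] = {"before": before}
    if pattern is not None:
        kwargs["cols"] = {"like": pattern}
    if not kwargs:
        # drop() with neither rows nor cols would raise AttributeError.
        return False
    table.drop(**kwargs)
    return True
```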

drop_rows(rows)

Drops the specified rows from the table.

Equivalent to Table.drop(rows=rows).

Parameters

rows (Collection) – List of index values, or a filter predicate in the form of {keyword: value}, where keyword is one of before, after, or between.

drop_columns(cols)

Drops the specified columns from the table.

Equivalent to Table.drop(cols=cols).

Parameters

cols (Collection) – List of column names, or a filter predicate in the form of {'like': pattern}.

rename_columns(cols, *, to=None)

Rename one or more columns.

rename_columns supports two call syntaxes:

  • rename_columns({'c1': 'new_c1', 'c2': 'new_c2'})

  • rename_columns(['c1', 'c2'], to=['new_c1', 'new_c2'])

Parameters
  • cols (Collection) – Either a list of columns to be renamed, or a dict mapping current column names to new column names.

  • to (Collection[str], optional) – New column names, by default None
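The two call syntaxes describe the same renaming. The hypothetical helper below normalizes either form into one dict, which makes the equivalence explicit; it is an illustration only, not FeatherStore's own argument handling.

```python
from collections.abc import Mapping


def rename_mapping(cols, *, to=None):
    """Normalize both rename_columns call syntaxes into one dict."""
    if to is None:
        # Dict syntax: cols already maps old names to new names.
        if not isinstance(cols, Mapping):
            raise TypeError("without `to`, cols must map old names to new names")
        return dict(cols)
    # List syntax: pair each old name with the new name at the same position.
    cols, to = list(cols), list(to)
    if len(cols) != len(to):
        raise ValueError("cols and to must have the same length")
    return dict(zip(cols, to))
```

Both rename_mapping({'c1': 'new_c1'}) and rename_mapping(['c1'], to=['new_c1']) return {'c1': 'new_c1'}.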

property columns

Fetches the names of the table columns

Returns

The table columns

Return type

list

reorder_columns(cols)

Reorders the table columns.

Parameters

cols (Sequence[str]) – The new column ordering. It must contain exactly the same column names as the table.
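Since reorder_columns needs exactly the table's existing names, a convenient way to build a valid ordering is to start from the columns property. `move_to_front` is a hypothetical helper.

```python
def move_to_front(table, *first):
    """Move the given columns to the front, keeping the rest in order."""
    current = list(table.columns)
    missing = [c for c in first if c not in current]
    if missing:
        raise KeyError(f"unknown columns: {missing}")
    # Same set of names as the table, only in a different order.
    new_order = list(first) + [c for c in current if c not in first]
    table.reorder_columns(new_order)
    return new_order
```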

property index

Fetches the table index

Return type

pandas.Index

astype(cols, *, to=None)

Change data type of one or more columns.

astype supports two call syntaxes:

  • astype({'c1': pa.int64(), 'c2': pa.int16()})

  • astype(['c1', 'c2'], to=[pa.int64(), pa.int16()])

Parameters
  • cols (Sequence[str] or dict) – Either a sequence of columns whose data types should be changed, or a dict mapping columns to new data types.

  • to (Sequence[PyArrow DataType], optional) – New column data types, by default None
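A sketch of casting several columns to one type with the dict syntax. `cast_columns` is a hypothetical helper, and `dtype` is expected to be a PyArrow DataType such as pa.int32().

```python
def cast_columns(table, cols, dtype):
    """Cast every column in cols to the same PyArrow data type.

    Uses the dict call syntax; the list syntax
    table.astype(cols, to=[dtype] * len(cols)) is equivalent.
    """
    mapping = {col: dtype for col in cols}
    table.astype(mapping)
    return mapping
```

For example, after import pyarrow as pa, cast_columns(table, ['c1', 'c2'], pa.int32()) casts both columns to 32-bit integers.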

rename_table(*, to)

Renames the current table

Parameters

to (str) – The new name of the table.

drop_table()

Deletes the current table

create_snapshot(path)

Creates a compressed backup of the table.

The table can later be restored by using snapshot.restore_table().

Parameters

path (str) – The path to the snapshot archive.
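Taking a snapshot before a destructive operation keeps the data recoverable. The hypothetical helper below archives a table and then deletes it; the archive path is whatever the caller chooses, and restoring is left to snapshot.restore_table(), whose arguments are not documented on this page.

```python
def archive_and_drop(table, path):
    """Back up the table to a snapshot archive, then delete it."""
    # Snapshot first, so the table is only dropped once a backup exists.
    table.create_snapshot(path)
    table.drop_table()
```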

repartition(new_partition_size)

Repartitions the table so that each partition is new_partition_size bytes in size.

Parameters

new_partition_size (int) – The size of each partition in bytes. A new_partition_size value of -1 disables partitioning.
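Repartitioning rewrites the stored data, so it can be worth skipping when the size already matches. `ensure_partition_size` is a hypothetical helper, and it assumes the partition_size property reports the value last passed to write() or repartition() (including -1 when partitioning is disabled), which this page does not confirm.

```python
def ensure_partition_size(table, size):
    """Repartition only if the stored partition size differs from size.

    size is in bytes; -1 disables partitioning, as for repartition().
    Returns True if a repartition was performed.
    """
    if table.partition_size == size:
        return False
    table.repartition(size)
    return True
```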

property shape

Fetches the shape of the stored table as (rows, columns).

Returns

The shape of the table

Return type

tuple(int, int)

property partition_size

Fetches the table partition size in bytes.

Returns

The partition size in bytes.

Return type

int
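The metadata properties can be gathered into a quick summary without loading the table's data. `table_summary` is a hypothetical helper.

```python
def table_summary(table):
    """Collect the shape, columns and partition size of a Table."""
    n_rows, n_cols = table.shape
    return {
        "rows": n_rows,
        "n_columns": n_cols,
        "columns": list(table.columns),
        "partition_size_bytes": table.partition_size,
    }
```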

exists()

property name