Table#
- class featherstore.table.Table(table_name, store_name)[source]#
Bases:
object
A class for saving and loading DataFrames as partitioned Feather files.
Tables support several operations that can be performed without loading the full data into memory:
Partial reading of data
Appending data
Inserting data
Updating data
Dropping data
Reading metadata (column names, index, table shape, etc.)
Changing column types
- Parameters:
table_name (str) – The name of the table.
store_name (str) – The name of the store.
- read_arrow(*, cols=None, rows=None, mmap=None)[source]#
Reads the data as a PyArrow Table.
- Parameters:
cols (Collection, optional) – List of column names or filter predicates in the form of {‘like’: pattern}. If not provided, all columns are read.
rows (Collection, optional) – List of index values or filter predicates in the form of {keyword: value}, where keyword is one of before, after, or between. If not provided, all rows are read.
mmap (bool, optional) – Use memory mapping when opening the table on disk, by default False on Windows and True on other systems.
- Return type:
pyarrow.Table
- read_pandas(*, cols=None, rows=None, mmap=None)[source]#
Reads the data as a Pandas DataFrame or Series.
- Parameters:
cols (Collection, optional) – List of column names or filter predicates in the form of {‘like’: pattern}. If not provided, all columns are read.
rows (Collection, optional) – List of index values or filter predicates in the form of {keyword: value}, where keyword is one of before, after, or between. If not provided, all rows are read.
mmap (bool, optional) – Use memory mapping when opening the table on disk, by default False on Windows and True on other systems.
- Return type:
pandas.DataFrame or pandas.Series
- read_polars(*, cols=None, rows=None, mmap=None)[source]#
Reads the data as a Polars DataFrame or Series.
- Parameters:
cols (Collection, optional) – List of column names or filter predicates in the form of {‘like’: pattern}. If not provided, all columns are read.
rows (Collection, optional) – List of index values or filter predicates in the form of {keyword: value}, where keyword is one of before, after, or between. If not provided, all rows are read.
mmap (bool, optional) – Use memory mapping when opening the table on disk, by default False on Windows and True on other systems.
- Return type:
polars.DataFrame or polars.Series
- write(df, /, index=None, *, partition_size=134217728, errors='raise', warnings='warn')[source]#
Writes a DataFrame to the current table.
The DataFrame index column, if provided, must be of type int, str, or datetime. FeatherStore sorts the DataFrame by its index before storage.
- Parameters:
df (pandas DataFrame or Series, polars DataFrame or Series, or pyarrow Table) – The DataFrame to be stored.
index (str, optional) – The name of the column to use as the index. If not provided, uses the current index for Pandas objects and a standard integer index for Arrow and Polars objects, by default None
partition_size (int, optional) – The size of each partition in bytes. A partition_size of -1 disables partitioning, by default 134217728 (128 MB)
errors (str, optional) – Whether or not to raise an error if the table already exists. Can be either raise or ignore; ignore overwrites any existing table, by default raise
warnings (str, optional) – Whether or not to warn if an unsorted index is about to be sorted. Can be either warn or ignore, by default warn
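The index-sorting behaviour mentioned above can be previewed with plain pandas; this sketch involves no FeatherStore calls:

```python
import pandas as pd

# An unsorted integer index; write() would warn (with warnings='warn')
# and store the rows in sorted index order.
df = pd.DataFrame({"price": [3.0, 1.0, 2.0]}, index=[30, 10, 20])

# The row order a later read would return:
stored_order = df.sort_index()
print(list(stored_order.index))  # [10, 20, 30]
```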
- append(df, *, warnings='warn')[source]#
Appends data to the current table.
- Parameters:
df (pandas DataFrame or Series, polars DataFrame or Series, or pyarrow Table) – The data to be appended.
warnings (str, optional) – Whether or not to warn if an unsorted index is about to be sorted. Can be either warn or ignore, by default warn
- update(df)[source]#
Updates data in the current table.
Note: You can’t use this method to update index values. Updating index values can be accomplished by deleting the old records and inserting new ones with the updated index values.
- Parameters:
df (Pandas DataFrame or Pandas Series) – The updated data. The index of df is the rows to be updated, while the columns of df are the new values.
- insert(df)[source]#
Insert one or more rows into the current table.
- Parameters:
df (Pandas DataFrame or Pandas Series) – The data to be inserted. df must have the same index and column types as the stored data.
- add_columns(df, idx=-1)[source]#
Insert one or more columns into the current table.
- Parameters:
df (Pandas DataFrame or Pandas Series) – The data to be inserted. df must have the same index as the stored data.
idx (int, optional) – The position at which to insert the new column(s), by default -1 (append to the end).
- drop(*, cols=None, rows=None)[source]#
Drop specified labels from rows or columns.
- Parameters:
cols (Collection, optional) – List of column names or filter predicates in the form of {‘like’: pattern}, by default None
rows (Collection, optional) – List of index values or filter predicates in the form of {keyword: value}, where keyword is one of before, after, or between, by default None
- Raises:
AttributeError – Raised if neither rows nor cols is provided.
- drop_rows(rows)[source]#
Drops specified rows from the table.
Same as Table.drop(rows=value)
- Parameters:
rows (Collection) – List of index values or filter predicates in the form of {keyword: value}, where keyword is one of before, after, or between.
- drop_columns(cols)[source]#
Drops specified columns from the table.
Same as Table.drop(cols=value)
- Parameters:
cols (Collection) – List of column names or filter predicates in the form of {‘like’: pattern}.
- rename_columns(cols, *, to=None)[source]#
Rename one or more columns.
rename_columns supports two different call-syntaxes:
rename_columns({‘c1’: ‘new_c1’, ‘c2’: ‘new_c2’})
rename_columns([‘c1’, ‘c2’], to=[‘new_c1’, ‘new_c2’])
- Parameters:
cols (Collection) – Either a list of columns to be renamed, or a dict mapping columns to be renamed to new column names
to (Collection[str], optional) – New column names, by default None
- property columns[source]#
Fetches the names of the table columns
- Returns:
The table columns
- Return type:
list
- reorder_columns(cols)[source]#
Reorder the current columns
- Parameters:
cols (Sequence[str]) – The new column ordering. The column names provided must be the same as the column names used in the table.
- astype(cols, *, to=None)[source]#
Change data type of one or more columns.
astype supports two different call-syntaxes:
astype({‘c1’: pa.int64(), ‘c2’: pa.int16()})
astype([‘c1’, ‘c2’], to=[pa.int64(), pa.int16()])
- Parameters:
cols (Sequence[str] or dict) – Either a sequence of columns to have their data types changed, or a dict mapping columns to new column data types.
to (Sequence[pyarrow.DataType], optional) – New column data types, by default None
- rename_table(*, to)[source]#
Renames the current table
- Parameters:
to (str) – The new name of the table.
- create_snapshot(path)[source]#
Creates a compressed backup of the table.
The table can later be restored by using snapshot.restore_table().
- Parameters:
path (str) – The path to the snapshot archive.
- repartition(new_partition_size)[source]#
Repartitions the table so that each partition is new_partition_size bytes.
- Parameters:
new_partition_size (int) – The size of each partition in bytes. A new_partition_size value of -1 disables partitioning
- property shape[source]#
Fetches the shape of the stored table as (rows, columns).
- Returns:
The shape of the table
- Return type:
tuple(int, int)