Changelog

1.3.0

Polars DataFrame Support

add_table() and get_table() now accept Polars DataFrames in addition to pandas DataFrames.
Serialization now routes through PyArrow for both libraries, preserving exact column types such as nullable Int64 and struct fields that were previously degraded via pandas.to_parquet.
PandasHandler renamed to DataframeHandler; PandasHandler remains as a backward-compatible alias.

1.2.0

Generic Path Methods

get_data_path(name) refactored into a universal dispatcher: works for all item types (tables, models, arrays, JSON, timestamps, artifacts) rather than only referenced tables.
New type-specific path methods: get_table_path(), get_model_path(), get_numpy_path(), get_json_path(), get_timestamp_path() — each returns the full path to the stored file for that item type.
get_table_path() now works for both included and referenced tables. Previously get_data_path() raised an error for included tables; now it returns the parquet file path inside the bundle.

1.1.0

Item Curation

Archive / Unarchive

New archive() method: Mark one or more items as hidden without deleting them.
Accepts a single name, a list of names, or a glob pattern (e.g. folio.archive("intermediate/*"))
Archived items are excluded from list_contents(), describe(), and copy() by default
Data remains fully accessible via get_data() / get_table() / etc.
create_snapshot() still captures archived items (snapshots record complete state)
New unarchive() method: Restore archived items to active status.
Same flexible name/list/glob interface as archive()
include_archived=True parameter added to list_contents(), describe(), and copy() — pass this flag to reveal or include archived items in any of those views.

Lineage-Aware Copy

New follow_lineage=True parameter on copy(): When combined with include_items, automatically resolves all transitive upstream dependencies of the named items.
folio.copy("pub", include_items=["final_model"], follow_lineage=True) copies final_model plus every item it depends on, recursively.
Items referenced in lineage metadata that are not in this folio (external tables, etc.) are silently skipped — the lineage metadata is still preserved.
Works together with include_archived=True to control whether archived upstream items are included or excluded.

Major Features

Generic Data Interface

New add_data() method: Universal data addition method that automatically detects data type and routes to the appropriate handler
Supports DataFrames, numpy arrays, dicts, lists, scalars, and external references
Single, intuitive interface for all data types
New get_data() method: Universal data retrieval method that automatically returns data in its original format
No need to remember which getter to use for each data type

Numpy Array Support

New add_numpy() method: Store numpy arrays as .npy files with full metadata
Preserves shape, dtype, and array properties
Supports lineage tracking (inputs, code context)
New get_numpy() method: Retrieve numpy arrays with original shape and dtype

JSON Data Support

New add_json() method: Store JSON-serializable data (dicts, lists, scalars)
Supports nested structures
Type information stored in metadata
Supports lineage tracking
New get_json() method: Retrieve JSON data in original format

Timestamp Support

New add_timestamp() method: Store datetime objects with proper timezone handling
Accepts timezone-aware datetime.datetime objects or Unix timestamps (int/float)
Rejects naive datetimes to prevent timezone ambiguity
Automatically converts all timestamps to UTC for consistent storage
Stores as ISO 8601 strings in JSON format for human readability
Supports lineage tracking (inputs, code context)
New get_timestamp() method: Retrieve timestamps in multiple formats
Returns UTC-aware datetime by default
Optional as_unix=True parameter to return Unix timestamp (float)
Always reads fresh from disk (not cached)
Integration features:
Timestamps appear in list_contents() under "timestamps" key
Timestamps display in describe() output with human-readable formatting
Full support for folio.data.timestamp_name accessor pattern
Round-trip preservation of microsecond precision

Enhanced Features

Improved `describe()` Method

Compact output format: More readable, information-dense display
New parameters:
return_string=True: Returns description as string instead of printing
show_empty=True: Shows empty sections in output
max_metadata_fields=10: Limit number of metadata fields displayed (default: 10)
show_paths=True: Show the file path for every item — especially useful for cloud-hosted folios where paths can be copied and sent to collaborators directly
Unified data sections: Tables section now combines referenced and included tables
Better metadata display: Shows shape, dtype, init_args, and other relevant info inline
Improved lineage display: Clearer visualization of data dependencies
Smart metadata display: New metadata section with intelligent truncation
Automatically filters out internal fields (_datafolio, created_at, updated_at)
Truncates long strings with ellipsis (shows first 50 chars)
Shows type and item count for collections (lists, dicts)
Limits display to configurable number of fields with "... and N more fields" indicator

New get_item_path() method: Returns the full path to any item's data file by name, regardless of item type (table, model, artifact, array, JSON, or timestamp).
For items stored in the bundle returns the full local or cloud URI (e.g. s3://bucket/my-run/tables/results.parquet)
For referenced tables returns the external path recorded at reference time
Makes it easy to hand off individual files to colleagues who don't use datafolio

New `delete()` Method

Delete items from DataFolio: Remove items and their associated files
Flexible input: Accepts single string or list of strings
Transaction-like validation: Checks all items exist before deleting any
Dependency warnings: Warns (but doesn't block) when deleting items with dependents
Parameters:
name: Item name(s) to delete (string or list)
warn_dependents=True: Print warning if deleted items have dependents
Method chaining: Returns Self for fluent API
Complete cleanup: Removes both manifest entries and physical files

Autocomplete-Friendly Data Access (`folio.data`)

New data property: Access items with IDE autocomplete support
Dual access patterns:
Attribute-style: folio.data.my_table.content
Dictionary-style: folio.data['my_table'].content
ItemProxy properties: Each item provides rich metadata access
.content: Returns data in appropriate format (DataFrame, array, dict, model, or file path)
.description: Item description string
.type: Item type identifier
.path: File path (for referenced tables and artifacts)
.inputs: List of lineage inputs
.dependents: List of dependent items
.metadata: Full metadata dictionary
IPython/Jupyter support: Full autocomplete via __dir__ implementation
Type-appropriate returns: Automatically returns correct data type based on item type

Enhanced `list_contents()` Method

New keys in return dict:
numpy_arrays: List of numpy array items
json_data: List of JSON data items
timestamps: List of timestamp items

Internal Improvements

Refactored to handler-based architecture: Separated data type logic into modular handlers for improved maintainability and extensibility
Core folio.py reduced from 3,659 → 764 lines (79% smaller)
7 specialized handlers for different data types
Zero breaking changes - all existing APIs preserved
Improved test coverage: 69% → 80% coverage with 694 passing tests (up from 265)
Enhanced code quality: Complete type hints, no circular dependencies, clean linting

Documentation

Comprehensive documentation update with examples for all new features
Added Quick Start guide with generic interface examples
Added complete ML workflow example using the new generic interface
Updated directory structure documentation

0.1.0

Initial release

Changelog

1.3.0

Polars DataFrame Support

1.2.0

Generic Path Methods

1.1.0

Item Curation

Archive / Unarchive

Lineage-Aware Copy

Major Features

Generic Data Interface

Numpy Array Support

JSON Data Support

Timestamp Support

Enhanced Features

Improved describe() Method

Path Sharing for Collaborators

New delete() Method

Autocomplete-Friendly Data Access (folio.data)

Enhanced list_contents() Method

Internal Improvements

Documentation

0.1.0

Improved `describe()` Method

New `delete()` Method

Autocomplete-Friendly Data Access (`folio.data`)

Enhanced `list_contents()` Method