Transform Operations
TransformBase
- class tiny_blocks.transform.base.TransformBase(*, uuid: UUID = None, name: str, version: str = 'v1', description: str = None)
Transform Base Block
Each transformation Block implements the get_iter method. This method takes one or more iterators and returns an Iterator of chunked DataFrames.
- get_iter(source) → Iterator[DataFrame]
Return an iterator of chunked DataFrames.
The chunk size is defined in the kwargs of each transformation block.
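To make the get_iter contract concrete, here is a minimal sketch that depends only on pandas. The AddOne class and the chunks helper are hypothetical: AddOne does not subclass TransformBase and is not part of tiny_blocks. It only shows a block that consumes an iterator of chunks and lazily yields transformed chunks.

```python
from typing import Iterator

import pandas as pd


class AddOne:
    """Hypothetical transform block (not part of tiny_blocks)."""

    def __init__(self, column: str) -> None:
        self.column = column

    def get_iter(self, source: Iterator[pd.DataFrame]) -> Iterator[pd.DataFrame]:
        # Consume the upstream iterator one chunk at a time and yield a
        # transformed copy, so only one chunk is processed at once.
        for chunk in source:
            chunk = chunk.copy()
            chunk[self.column] = chunk[self.column] + 1
            yield chunk


def chunks(df: pd.DataFrame, chunksize: int) -> Iterator[pd.DataFrame]:
    # Stand-in for an extract block: split a DataFrame into chunks.
    for start in range(0, len(df), chunksize):
        yield df.iloc[start:start + chunksize]


df = pd.DataFrame({"a": [1, 2, 3, 4, 5]})
result = pd.concat(AddOne("a").get_iter(chunks(df, chunksize=2)))
print(result["a"].tolist())  # [2, 3, 4, 5, 6]
```

Because get_iter is a generator, nothing is computed until pd.concat (or another consumer) pulls chunks from it.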
Apply
- class tiny_blocks.transform.apply.Apply(*, uuid: UUID = None, name: Literal['apply'] = 'apply', version: str = 'v1', description: str = None, apply_to_column: str, set_to_column: str, func: Callable, kwargs: KwargsApply = KwargsApply())
Apply Block. Defines a block that applies a function.
The function is applied to a single column, apply_to_column, and the result is written to set_to_column. For different functionality, please rewrite the Block.
- Basic example:
>>> import pandas as pd
>>> from tiny_blocks.transform import Apply
>>> from tiny_blocks.extract import FromCSV
>>>
>>> from_csv = FromCSV(path='/path/to/file.csv')
>>> apply = Apply(
...     apply_to_column="column_A",
...     set_to_column="column_b",
...     func=lambda x: x + 1,
... )
>>>
>>> generator = from_csv.get_iter()
>>> generator = apply.get_iter(generator)
>>> df = pd.concat(generator)
For more Kwargs info: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.apply.html
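A rough sketch of the per-chunk work an Apply-style block performs, assuming it applies func element-wise with pandas.Series.apply to apply_to_column and writes the result to set_to_column. The apply_chunks helper is hypothetical and is not the library's implementation.

```python
from typing import Callable, Iterator

import pandas as pd


def apply_chunks(
    source: Iterator[pd.DataFrame],
    apply_to_column: str,
    set_to_column: str,
    func: Callable,
) -> Iterator[pd.DataFrame]:
    # Hypothetical helper: apply `func` element-wise to one column of each
    # chunk and store the result in another (possibly new) column.
    for chunk in source:
        chunk = chunk.copy()
        chunk[set_to_column] = chunk[apply_to_column].apply(func)
        yield chunk


source = iter([pd.DataFrame({"column_A": [1, 2]}), pd.DataFrame({"column_A": [3]})])
df = pd.concat(
    apply_chunks(source, "column_A", "column_B", lambda x: x + 1),
    ignore_index=True,
)
print(df["column_B"].tolist())  # [2, 3, 4]
```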
Astype
- class tiny_blocks.transform.astype.Astype(*, uuid: UUID = None, name: Literal['astype'] = 'astype', version: str = 'v1', description: str = None, dtype: Dict[str, str], kwargs: KwargsAstype = KwargsAstype(errors='ignore'))
Astype Block. Defines the type casting of DataFrame columns.
- Basic example:
>>> import pandas as pd
>>> from tiny_blocks.transform import Astype
>>> from tiny_blocks.extract import FromCSV
>>>
>>> from_csv = FromCSV(path="/path/to/file.csv")
>>> as_type = Astype(dtype={"e": "float32"})
>>>
>>> generator = from_csv.get_iter()
>>> generator = as_type.get_iter(generator)
>>> df = pd.concat(generator)
For more Kwargs info: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.astype.html
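A sketch of per-chunk casting with DataFrame.astype; the astype_chunks helper is hypothetical. Casting every chunk to the same dtypes means pd.concat does not have to find a common type for the result. Note that the block's default KwargsAstype sets errors='ignore', whereas this sketch uses the pandas default (errors='raise').

```python
from typing import Dict, Iterator

import pandas as pd


def astype_chunks(
    source: Iterator[pd.DataFrame], dtype: Dict[str, str]
) -> Iterator[pd.DataFrame]:
    # Hypothetical helper: cast the listed columns of every chunk so that
    # all chunks share the same dtypes.
    for chunk in source:
        yield chunk.astype(dtype)


source = iter([pd.DataFrame({"e": [1, 2]}), pd.DataFrame({"e": [3]})])
df = pd.concat(astype_chunks(source, {"e": "float32"}), ignore_index=True)
print(df["e"].dtype)  # float32
```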
DropDuplicates
- class tiny_blocks.transform.drop_duplicates.DropDuplicates(*, uuid: UUID = None, name: Literal['drop_duplicates'] = 'drop_duplicates', version: str = 'v1', description: str = None, kwargs: KwargsDropDuplicates = KwargsDropDuplicates(chunksize=1000), keep: Literal['first', 'last'] = 'first', subset: Set[str] = None)
Drop Duplicates Block. Defines the functionality to drop duplicate rows.
- Basic example:
>>> import pandas as pd
>>> from tiny_blocks.transform import DropDuplicates
>>> from tiny_blocks.extract import FromCSV
>>>
>>> extract_csv = FromCSV(path='/path/to/file.csv')
>>> drop_duplicates = DropDuplicates()
>>>
>>> generator = extract_csv.get_iter()
>>> generator = drop_duplicates.get_iter(generator)
>>> df = pd.concat(generator)
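Because the block works on chunks, duplicates can span chunk boundaries. One way to handle keep='first' in a single pass, sketched below with a hypothetical drop_duplicates_first helper, is to remember the key of every row already emitted. This is an illustrative technique, not necessarily what DropDuplicates does internally.

```python
from typing import Iterator, List, Optional, Set, Tuple

import pandas as pd


def drop_duplicates_first(
    source: Iterator[pd.DataFrame], subset: Optional[List[str]] = None
) -> Iterator[pd.DataFrame]:
    # Hypothetical streaming de-duplication with keep="first": remember the
    # key of every row already seen and drop rows whose key was seen before,
    # either in an earlier chunk or earlier in the same chunk.
    seen: Set[Tuple] = set()
    for chunk in source:
        cols = subset if subset is not None else list(chunk.columns)
        mask = []
        for key in chunk[cols].itertuples(index=False, name=None):
            mask.append(key not in seen)
            seen.add(key)
        yield chunk.loc[mask]


source = iter([pd.DataFrame({"a": [1, 1, 2]}), pd.DataFrame({"a": [2, 3]})])
df = pd.concat(drop_duplicates_first(source), ignore_index=True)
print(df["a"].tolist())  # [1, 2, 3]
```

The seen set grows with the number of distinct keys, and keep='last' cannot be decided in one forward pass because a later chunk may still contain a duplicate. The sketch also assumes the key columns contain no missing values.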
DropNa
- class tiny_blocks.transform.dropna.DropNa(*, uuid: UUID = None, name: Literal['drop_na'] = 'drop_na', version: str = 'v1', description: str = None, kwargs: KwargsDropNa = KwargsDropNa(subset=None, axis=None, how=None, thresh=None))
Drop NaN Block. Defines the functionality to drop missing (NaN/None) values.
- Basic example:
>>> import pandas as pd
>>> from tiny_blocks.transform import DropNa
>>> from tiny_blocks.extract import FromCSV
>>>
>>> extract_csv = FromCSV(path='/path/to/file.csv')
>>> drop_na = DropNa()
>>>
>>> generator = extract_csv.get_iter()
>>> generator = drop_na.get_iter(generator)
>>> df = pd.concat(generator)
For more Kwargs info: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.dropna.html
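Row-wise dropna (the pandas default, axis=0) looks at one row at a time, so applying it chunk by chunk keeps the same rows as applying it to the whole DataFrame. The dropna_chunks helper below is a hypothetical sketch, not the block's code. Column-wise dropping (axis=1) would not be chunk-safe in the same way, since each chunk could drop a different set of columns.

```python
from typing import Iterator

import numpy as np
import pandas as pd


def dropna_chunks(source: Iterator[pd.DataFrame], **kwargs) -> Iterator[pd.DataFrame]:
    # Hypothetical helper: drop rows containing missing values chunk by chunk.
    for chunk in source:
        yield chunk.dropna(**kwargs)


df = pd.DataFrame({"a": [1.0, np.nan, 3.0, 4.0], "b": ["x", "y", None, "z"]})
parts = iter([df.iloc[:2], df.iloc[2:]])
streamed = pd.concat(dropna_chunks(parts))
print(streamed.index.tolist())  # [0, 3]
```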
Merge
- class tiny_blocks.transform.merge.Merge(*, uuid: UUID = None, name: Literal['merge'] = 'merge', version: str = 'v1', description: str = None, how: Literal['left', 'right', 'outer', 'inner', 'cross'] = 'inner', left_on: str, right_on: str, kwargs: KwargsMerge = KwargsMerge(chunksize=1000))
Merge Block. Defines the merge functionality between the outputs of two blocks.
- Basic example:
>>> import pandas as pd
>>> from tiny_blocks.transform import Merge
>>> from tiny_blocks.extract import FromCSV
>>>
>>> from_csv_1 = FromCSV(path="/path/to/file_1.csv")
>>> from_csv_2 = FromCSV(path="/path/to/file_2.csv")
>>> merge = Merge(how="left", left_on="col_A", right_on="col_B")
>>>
>>> left_source = from_csv_1.get_iter()
>>> right_source = from_csv_2.get_iter()
>>> generator = merge.get_iter(source=[left_source, right_source])
>>> df = pd.concat(generator)
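Merging chunked inputs is harder than the single-source transforms, because a left row may match a right row from any chunk. The sketch below assumes a strategy where the right side fits in memory: it materializes the right iterator and streams the left one. The merge_left_chunks helper is hypothetical, and its restriction to how='left' or how='inner' is a property of this sketch, not of the Merge block.

```python
from typing import Iterator

import pandas as pd


def merge_left_chunks(
    left: Iterator[pd.DataFrame],
    right: Iterator[pd.DataFrame],
    how: str,
    left_on: str,
    right_on: str,
) -> Iterator[pd.DataFrame]:
    # Hypothetical strategy: load the right side fully, then stream the left
    # side chunk by chunk. Only valid for how="left" or "inner", where each
    # output row depends on a single left row.
    if how not in ("left", "inner"):
        raise ValueError("this sketch only supports how='left' or how='inner'")
    right_df = pd.concat(right, ignore_index=True)
    for chunk in left:
        yield chunk.merge(right_df, how=how, left_on=left_on, right_on=right_on)


left_df = pd.DataFrame({"col_A": [1, 2, 3, 4]})
right_df = pd.DataFrame({"col_B": [2, 4], "val": ["b", "d"]})
out = pd.concat(
    merge_left_chunks(
        iter([left_df.iloc[:2], left_df.iloc[2:]]), iter([right_df]),
        "left", "col_A", "col_B",
    ),
    ignore_index=True,
)
expected = left_df.merge(right_df, how="left", left_on="col_A", right_on="col_B")
```

For how='right' or 'outer', unmatched right rows can only be emitted after every left chunk has been seen, which is why the sketch rejects them.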
Rename
- class tiny_blocks.transform.rename.Rename(*, uuid: UUID = None, name: Literal['rename'] = 'rename', version: str = 'v1', description: str = None, kwargs: KwargsRename = KwargsRename(axis=None, level=None, errors=None), columns: Dict[str, str])
Rename Block. Defines the column renaming functionality.
- Basic example:
>>> import pandas as pd
>>> from tiny_blocks.transform import Rename
>>> from tiny_blocks.extract import FromCSV
>>>
>>> from_csv = FromCSV(path='/path/to/file.csv')
>>> rename = Rename(columns={"column_name": "new_column_name"})
>>>
>>> generator = from_csv.get_iter()
>>> generator = rename.get_iter(generator)
>>> df = pd.concat(generator)
For more Kwargs info: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.rename.html
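Renaming is applied independently to every chunk. A minimal sketch with a hypothetical rename_chunks helper:

```python
from typing import Dict, Iterator

import pandas as pd


def rename_chunks(
    source: Iterator[pd.DataFrame], columns: Dict[str, str]
) -> Iterator[pd.DataFrame]:
    # Hypothetical helper: apply the same column mapping to every chunk.
    for chunk in source:
        yield chunk.rename(columns=columns)


source = iter([pd.DataFrame({"column_name": [1]}), pd.DataFrame({"column_name": [2]})])
df = pd.concat(rename_chunks(source, {"column_name": "new_column_name"}), ignore_index=True)
print(df.columns.tolist())  # ['new_column_name']
```

By default DataFrame.rename ignores mapping keys that are not present (errors='ignore'); passing errors='raise' turns a misspelled column name into a KeyError.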
Sort
- class tiny_blocks.transform.sort.Sort(*, uuid: UUID = None, name: Literal['sort'] = 'sort', version: str = 'v1', description: str = None, by: List[str], ascending: bool = True, kwargs: KwargsSort = KwargsSort(chunksize=1000))
Sort Block. Defines the sorting operation.
- Basic example:
>>> import pandas as pd
>>> from tiny_blocks.transform import Sort
>>> from tiny_blocks.extract import FromCSV
>>>
>>> extract_csv = FromCSV(path='/path/to/file.csv')
>>> sort = Sort(by=["column_A"], ascending=False)
>>>
>>> generator = extract_csv.get_iter()
>>> generator = sort.get_iter(generator)
>>> df = pd.concat(generator)
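Sorting cannot treat chunks independently: each output row may come from any input chunk. A common approach, sketched below as an assumption rather than as the Sort block's implementation, is a k-way merge: sort each chunk into a run, merge the runs with heapq.merge, and re-chunk the output. The sort_chunks helper is hypothetical and keeps the runs in memory for brevity (a real external sort would spill them to disk); it also assumes the sort keys contain no missing values.

```python
import heapq
from typing import Iterator, List

import pandas as pd


def sort_chunks(
    source: Iterator[pd.DataFrame],
    by: List[str],
    ascending: bool = True,
    chunksize: int = 1000,
) -> Iterator[pd.DataFrame]:
    # Sort every chunk on its own to produce sorted "runs".
    runs = [chunk.sort_values(by=by, ascending=ascending) for chunk in source]
    if not runs:
        return
    columns = list(runs[0].columns)
    key_idx = [columns.index(col) for col in by]
    row_iters = [run.itertuples(index=False, name=None) for run in runs]
    # heapq.merge lazily interleaves the runs; reverse=True expects each run
    # to be sorted in descending order, which matches ascending=False.
    merged = heapq.merge(
        *row_iters,
        key=lambda row: tuple(row[i] for i in key_idx),
        reverse=not ascending,
    )
    # Re-chunk the merged rows into DataFrames of at most `chunksize` rows.
    buffer = []
    for row in merged:
        buffer.append(row)
        if len(buffer) == chunksize:
            yield pd.DataFrame(buffer, columns=columns)
            buffer = []
    if buffer:
        yield pd.DataFrame(buffer, columns=columns)


source = iter([pd.DataFrame({"column_A": [5, 1, 3]}), pd.DataFrame({"column_A": [4, 2]})])
df = pd.concat(sort_chunks(source, by=["column_A"], ascending=False, chunksize=2), ignore_index=True)
print(df["column_A"].tolist())  # [5, 4, 3, 2, 1]
```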
Validate
- class tiny_blocks.transform.validate.Validate(*, uuid: UUID = None, name: Literal['validate'] = 'validate', version: str = 'v1', description: str = None, schema_model: SchemaModel, lazy: bool = True)
Validate Block. Defines a block that validates data against a schema model.
- Basic example:
>>> import pandas as pd
>>> from tiny_blocks.transform import Validate
>>> from tiny_blocks.extract import FromCSV
>>>
>>> # my_schema_validation is a placeholder for a user-defined SchemaModel
>>> from_csv = FromCSV(path='/path/to/file.csv')
>>> validate = Validate(
...     schema_model=my_schema_validation, lazy=True
... )
>>>
>>> generator = from_csv.get_iter()
>>> generator = validate.get_iter(generator)
>>> df = pd.concat(generator)
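Assuming SchemaModel is pandera's, the lazy flag follows pandera's semantics: with lazy=True pandera collects every failure and raises them together as SchemaErrors, while lazy=False stops at the first SchemaError. The sketch below imitates that difference with plain pandas and a hypothetical dict of column checks; it does not use pandera and is not how Validate is implemented. ValidationError, validate_chunks and failures_for are all hypothetical names.

```python
from typing import Callable, Dict, Iterator, List

import pandas as pd


class ValidationError(Exception):
    """Hypothetical error carrying every collected failure message."""

    def __init__(self, failures: List[str]) -> None:
        super().__init__("; ".join(failures))
        self.failures = failures


def validate_chunks(
    source: Iterator[pd.DataFrame],
    checks: Dict[str, Callable[[pd.Series], pd.Series]],
    lazy: bool = True,
) -> Iterator[pd.DataFrame]:
    # Each check maps a column to a boolean Series (True = valid value).
    for chunk in source:
        failures = []
        for column, check in checks.items():
            bad = chunk.loc[~check(chunk[column])].index.tolist()
            if bad:
                failures.append(f"{column}: invalid rows {bad}")
                if not lazy:
                    # Eager mode: stop at the first failing check.
                    raise ValidationError(failures)
        if failures:
            # Lazy mode: report every failing check of the chunk at once.
            raise ValidationError(failures)
        yield chunk


checks = {"age": lambda s: s >= 0, "name": lambda s: s.notna()}
chunk = pd.DataFrame({"age": [30, -1], "name": ["a", None]})


def failures_for(lazy: bool) -> List[str]:
    try:
        list(validate_chunks(iter([chunk]), checks, lazy=lazy))
    except ValidationError as err:
        return err.failures
    return []


print(failures_for(True))   # ['age: invalid rows [1]', 'name: invalid rows [1]']
print(failures_for(False))  # ['age: invalid rows [1]']
```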