Transform Operations

TransformBase

class tiny_blocks.transform.base.TransformBase(*, uuid: UUID = None, name: str, version: str = 'v1', description: str = None)

Transform Base Block

Each transformation Block implements the get_iter method. This method takes one or more iterators and returns an Iterator of chunked DataFrames.

get_iter(source) → Iterator[DataFrame]

Return an iterator of chunked DataFrames.

Where a block needs a chunk size, it is set through that block's kwargs.
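
The iterator-in, iterator-out contract described above can be sketched with plain pandas. This is a minimal illustration, not the library's implementation: transform_chunks is a hypothetical stand-in for a block's get_iter, and the two small DataFrames stand in for the output of an extract block.

```python
from typing import Callable, Iterator

import pandas as pd


def transform_chunks(
    source: Iterator[pd.DataFrame],
    func: Callable[[pd.DataFrame], pd.DataFrame],
) -> Iterator[pd.DataFrame]:
    """Hypothetical stand-in for get_iter: lazily transform each chunk."""
    for chunk in source:
        yield func(chunk)


# Two small chunks standing in for the output of an extract block.
chunks = iter([
    pd.DataFrame({"a": [1, 2]}),
    pd.DataFrame({"a": [3, 4]}),
])

# Nothing is computed until the generator is consumed by pd.concat.
doubled = transform_chunks(chunks, lambda df: df * 2)
result = pd.concat(doubled, ignore_index=True)
```

Because each stage is a generator, stages can be chained without holding the whole dataset in memory until the final pd.concat.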

Apply

class tiny_blocks.transform.apply.Apply(*, uuid: UUID = None, name: Literal['apply'] = 'apply', version: str = 'v1', description: str = None, apply_to_column: str, set_to_column: str, func: Callable, kwargs: KwargsApply = KwargsApply())

Apply Block. Defines a block that applies a function to a column.

The function is applied to a single column. For different functionality, rewrite the Block.

Basic example:
>>> import pandas as pd
>>> from tiny_blocks.transform import Apply
>>> from tiny_blocks.extract import FromCSV
>>>
>>> from_csv = FromCSV(path='/path/to/file.csv')
>>> apply = Apply(
...   apply_to_column="column_A",
...   set_to_column="column_b",
...   func=lambda x: x + 1,
... )
>>>
>>> generator = from_csv.get_iter()
>>> generator = apply.get_iter(generator)
>>> df = pd.concat(generator)

For more Kwargs info: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.apply.html
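
As a rough picture of the effect on one chunk, the following sketch uses plain pandas. It assumes the block reads apply_to_column, applies func element-wise, and writes the result to set_to_column; the library's internals may differ.

```python
import pandas as pd

# One chunk, as it might arrive from an extract block.
chunk = pd.DataFrame({"column_A": [1, 2, 3]})

# Assumed per-chunk behaviour: apply func to one column and
# store the result in another, leaving the source column intact.
out = chunk.copy()
out["column_b"] = out["column_A"].apply(lambda x: x + 1)
```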

Astype

class tiny_blocks.transform.astype.Astype(*, uuid: UUID = None, name: Literal['astype'] = 'astype', version: str = 'v1', description: str = None, dtype: Dict[str, str], kwargs: KwargsAstype = KwargsAstype(errors='ignore'))

Astype Block. Defines type casting for DataFrame columns.

Basic example:
>>> import pandas as pd
>>> from tiny_blocks.transform import Astype
>>> from tiny_blocks.extract import FromCSV
>>>
>>> from_csv = FromCSV(path="/path/to/file.csv")
>>> as_type = Astype(dtype={"e": "float32"})
>>>
>>> generator = from_csv.get_iter()
>>> generator = as_type.get_iter(generator)
>>> df = pd.concat(generator)

For more Kwargs info: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.astype.html
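
The signature above defaults to errors='ignore'. In pandas, that setting means a cast that fails returns the original data instead of raising. A small sketch with plain pandas, assuming the block forwards dtype and errors to DataFrame.astype:

```python
import pandas as pd

chunk = pd.DataFrame({"e": [1, 2], "f": ["x", "y"]})

# A valid cast changes the column dtype.
cast = chunk.astype({"e": "float32"}, errors="ignore")

# An impossible cast is silently skipped with errors="ignore":
# the values in "f" come back unchanged.
unchanged = chunk.astype({"f": "float32"}, errors="ignore")
```

If silent failures are undesirable, passing errors='raise' through the kwargs should surface them, assuming the kwargs are forwarded as described.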

DropDuplicates

class tiny_blocks.transform.drop_duplicates.DropDuplicates(*, uuid: UUID = None, name: Literal['drop_duplicates'] = 'drop_duplicates', version: str = 'v1', description: str = None, kwargs: KwargsDropDuplicates = KwargsDropDuplicates(chunksize=1000), keep: Literal['first', 'last'] = 'first', subset: Set[str] = None)

Drop Duplicates Block. Defines the drop-duplicates functionality.

Basic example:
>>> import pandas as pd
>>> from tiny_blocks.transform import DropDuplicates
>>> from tiny_blocks.extract import FromCSV
>>>
>>> extract_csv = FromCSV(path='/path/to/file.csv')
>>> drop_duplicates = DropDuplicates()
>>>
>>> generator = extract_csv.get_iter()
>>> generator = drop_duplicates.get_iter(generator)
>>> df = pd.concat(generator)
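
Unlike row-wise operations, duplicate removal has to consider rows from every chunk. The sketch below uses plain pandas to show the difference between deduplicating each chunk on its own and deduplicating the full data. It does not describe how DropDuplicates is implemented.

```python
import pandas as pd

chunks = [
    pd.DataFrame({"id": [1, 1, 2]}),
    pd.DataFrame({"id": [2, 3]}),
]

# Per-chunk deduplication misses the duplicate that spans both chunks.
per_chunk = pd.concat(
    [c.drop_duplicates(keep="first") for c in chunks], ignore_index=True
)

# Deduplicating the full data removes it.
deduped = (
    pd.concat(chunks, ignore_index=True)
    .drop_duplicates(keep="first")
    .reset_index(drop=True)
)
```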

DropNa

class tiny_blocks.transform.dropna.DropNa(*, uuid: UUID = None, name: Literal['drop_na'] = 'drop_na', version: str = 'v1', description: str = None, kwargs: KwargsDropNa = KwargsDropNa(subset=None, axis=None, how=None, thresh=None))

Drop NaN Block. Defines the functionality for dropping rows with missing (None/NaN) values.

Basic example:
>>> import pandas as pd
>>> from tiny_blocks.transform import DropNa
>>> from tiny_blocks.extract import FromCSV
>>>
>>> extract_csv = FromCSV(path='/path/to/file.csv')
>>> drop_na = DropNa()
>>>
>>> generator = extract_csv.get_iter()
>>> generator = drop_na.get_iter(generator)
>>> df = pd.concat(generator)

For more Kwargs info: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.dropna.html
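
The subset entry in the kwargs restricts which columns are checked. A sketch with plain pandas, assuming the kwargs are forwarded to DataFrame.dropna:

```python
import pandas as pd

chunk = pd.DataFrame({
    "a": [1.0, None, 3.0],
    "b": ["x", "y", None],
})

# Default: drop every row that has a missing value in any column.
all_columns = chunk.dropna()

# With subset: only missing values in column "a" cause a row to be dropped.
only_a = chunk.dropna(subset=["a"])
```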

Merge

class tiny_blocks.transform.merge.Merge(*, uuid: UUID = None, name: Literal['merge'] = 'merge', version: str = 'v1', description: str = None, how: Literal['left', 'right', 'outer', 'inner', 'cross'] = 'inner', left_on: str, right_on: str, kwargs: KwargsMerge = KwargsMerge(chunksize=1000))

Merge Block. Defines merge functionality between two blocks.

Basic example:
>>> import pandas as pd
>>> from tiny_blocks.transform import Merge
>>> from tiny_blocks.extract import FromCSV
>>>
>>> from_csv_1 = FromCSV(path="/path/to/file_1.csv")
>>> from_csv_2 = FromCSV(path="/path/to/file_2.csv")
>>> merge = Merge(how="left", left_on="col_A", right_on="col_B")
>>>
>>> left_source = from_csv_1.get_iter()
>>> right_source = from_csv_2.get_iter()
>>> generator = merge.get_iter(source=[left_source, right_source])
>>> df = pd.concat(generator)
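
For reference, the same how, left_on and right_on arguments behave as follows in plain pandas. The column names x and y and the data are illustrative only, and the sketch says nothing about how Merge processes chunks internally.

```python
import pandas as pd

left = pd.DataFrame({"col_A": [1, 2, 3], "x": ["a", "b", "c"]})
right = pd.DataFrame({"col_B": [1, 3], "y": [10, 30]})

# A left join keeps every row of the left frame; rows without a match
# on the right get missing values in the right-hand columns.
merged = left.merge(right, how="left", left_on="col_A", right_on="col_B")
```

Because left_on and right_on name different columns, both key columns appear in the result.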

Rename

class tiny_blocks.transform.rename.Rename(*, uuid: UUID = None, name: Literal['rename'] = 'rename', version: str = 'v1', description: str = None, kwargs: KwargsRename = KwargsRename(axis=None, level=None, errors=None), columns: Dict[str, str])

Rename Block. Defines the rename-columns functionality.

Basic example:
>>> import pandas as pd
>>> from tiny_blocks.transform import Rename
>>> from tiny_blocks.extract import FromCSV
>>>
>>> from_csv = FromCSV(path='/path/to/file.csv')
>>> rename = Rename(columns={"column_name": "new_column_name"})
>>>
>>> generator = from_csv.get_iter()
>>> generator = rename.get_iter(generator)
>>> df = pd.concat(generator)

For more Kwargs info: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.rename.html
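
One pandas detail worth knowing: by default, DataFrame.rename silently ignores mapping keys that are not present in the DataFrame. A sketch with plain pandas, assuming the block forwards columns to DataFrame.rename:

```python
import pandas as pd

chunk = pd.DataFrame({"column_name": [1, 2]})

# "missing" is not a column of chunk; pandas skips it without raising
# unless errors="raise" is passed.
renamed = chunk.rename(
    columns={"column_name": "new_column_name", "missing": "ignored"}
)
```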

Sort

class tiny_blocks.transform.sort.Sort(*, uuid: UUID = None, name: Literal['sort'] = 'sort', version: str = 'v1', description: str = None, by: List[str], ascending: bool = True, kwargs: KwargsSort = KwargsSort(chunksize=1000))

Sort Block. Defines the sorting operation.

Basic example:
>>> import pandas as pd
>>> from tiny_blocks.transform import Sort
>>> from tiny_blocks.extract import FromCSV
>>>
>>> extract_csv = FromCSV(path='/path/to/file.csv')
>>> sort = Sort(by=["column_A"], ascending=False)
>>>
>>> generator = extract_csv.get_iter()
>>> generator = sort.get_iter(generator)
>>> df = pd.concat(generator)
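
Sorting, like duplicate removal, is a global operation: sorting each chunk separately does not give a globally sorted result. The plain-pandas sketch below illustrates this and is not a description of how Sort is implemented.

```python
import pandas as pd

chunks = [
    pd.DataFrame({"column_A": [3, 1]}),
    pd.DataFrame({"column_A": [4, 2]}),
]

# Sorting chunk by chunk leaves the overall order unsorted.
per_chunk = pd.concat(
    [c.sort_values(by=["column_A"], ascending=False) for c in chunks],
    ignore_index=True,
)

# Sorting the full data gives the expected descending order.
globally_sorted = (
    pd.concat(chunks, ignore_index=True)
    .sort_values(by=["column_A"], ascending=False)
    .reset_index(drop=True)
)
```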

Validate

class tiny_blocks.transform.validate.Validate(*, uuid: UUID = None, name: Literal['validate'] = 'validate', version: str = 'v1', description: str = None, schema_model: SchemaModel, lazy: bool = True)

Validate Block. Defines a block that validates data against a schema model.

Basic example:
>>> import pandas as pd
>>> from tiny_blocks.transform import Validate
>>> from tiny_blocks.extract import FromCSV
>>>
>>> from_csv = FromCSV(path='/path/to/file.csv')
>>> # my_schema_validation is a user-defined SchemaModel (not shown here)
>>> validate = Validate(
...   schema_model=my_schema_validation, lazy=True
... )
>>>
>>> generator = from_csv.get_iter()
>>> generator = validate.get_iter(generator)
>>> df = pd.concat(generator)