generative_ai.dataset_generation.orchestrate_generation module

Define functions that orchestrate dataset generation.

generate_json_dataset(raw_datasets: list[Dataset]) → JSONDataset

Convert raw documents into JSON format.

Parameters:

raw_datasets (list[Dataset]) -- all retrieval and tuning documents for the root package and its contents

Returns:

all details for querying a package's documentation, in JSON format

Return type:

JSONDataset
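The reference does not show what Dataset or JSONDataset contain, so the following is only a minimal sketch of how such a conversion could work. It assumes, hypothetically, that Dataset is a dataclass with the fields package, retrieval_chunks and tuning_documents, and that JSONDataset is a plain JSON-serialisable dict keyed by package name. None of these field names come from the module itself.

```python
import dataclasses
import json
from typing import Any


# Hypothetical stand-in for the module's Dataset type; real fields are undocumented here.
@dataclasses.dataclass
class Dataset:
    package: str
    retrieval_chunks: list[str]
    tuning_documents: list[dict[str, str]]


# Hypothetical stand-in for JSONDataset: a plain JSON-serialisable mapping.
JSONDataset = dict[str, Any]


def generate_json_dataset(raw_datasets: list[Dataset]) -> JSONDataset:
    """Sketch: key each package's retrieval and tuning documents by package name."""
    return {
        dataset.package: {
            # Copy the lists so the JSON view does not alias the raw datasets.
            "retrieval_chunks": list(dataset.retrieval_chunks),
            "tuning_documents": [dict(doc) for doc in dataset.tuning_documents],
        }
        for dataset in raw_datasets
    }
```

Keying by package name keeps lookups per module cheap; a list of records would preserve duplicates instead, if the real Dataset allows several entries per package.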

generate_raw_datasets(package_name: str) → list[Dataset]

Generate all retrieval and tuning documents for exploring the documentation of a package.

Parameters:

package_name (str) -- name under which the root package is imported

Returns:

all retrieval and tuning documents for the root package and its contents

Return type:

list[Dataset]
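One plausible way to explore a package's documentation is to import it, walk its submodules with pkgutil.walk_packages, and collect docstrings with inspect. The sketch below does that under stated assumptions: the Dataset dataclass and its fields are hypothetical, and the question/answer shape of the tuning documents is invented for illustration. It is not the module's actual implementation.

```python
import dataclasses
import importlib
import inspect
import pkgutil
import types


# Hypothetical stand-in for the module's Dataset type.
@dataclasses.dataclass
class Dataset:
    package: str
    retrieval_chunks: list[str]
    tuning_documents: list[dict[str, str]]


def _document_module(module: types.ModuleType) -> Dataset:
    """Collect docstrings of a module and of the public functions/classes it defines."""
    chunks: list[str] = []
    tuning: list[dict[str, str]] = []
    module_doc = inspect.getdoc(module)
    if module_doc:
        chunks.append(module_doc)
    for name, member in inspect.getmembers(module):
        if name.startswith("_"):
            continue
        if not (inspect.isfunction(member) or inspect.isclass(member)):
            continue
        # Skip names re-exported from other modules; they are documented there.
        if getattr(member, "__module__", None) != module.__name__:
            continue
        doc = inspect.getdoc(member)
        if not doc:
            continue
        chunks.append(f"{name}: {doc}")
        # Hypothetical question/answer pair for tuning.
        tuning.append({"question": f"What does {module.__name__}.{name} do?", "answer": doc})
    return Dataset(module.__name__, chunks, tuning)


def generate_raw_datasets(package_name: str) -> list[Dataset]:
    """Sketch: one Dataset for the root package plus one per importable submodule."""
    root = importlib.import_module(package_name)
    datasets = [_document_module(root)]
    # A plain module (not a package) has no __path__, so there is nothing to walk.
    search_path = getattr(root, "__path__", None)
    if search_path is None:
        return datasets
    for info in pkgutil.walk_packages(search_path, prefix=f"{package_name}."):
        # Skip private modules and __main__ entry points, which may run code on import.
        if info.name.rsplit(".", 1)[-1].startswith("_"):
            continue
        try:
            module = importlib.import_module(info.name)
        except ImportError:
            # Skip submodules whose optional dependencies are missing.
            continue
        datasets.append(_document_module(module))
    return datasets
```

For example, generate_raw_datasets("json") would yield one Dataset for json itself followed by its public submodules such as json.decoder.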

load_json_dataset(file_path: Path) → JSONDataset

Load a JSON dataset from a JSON file.

Parameters:

file_path (pathlib.Path) -- path to load the JSON dataset from

Returns:

all details for querying a package's documentation, in JSON format

Return type:

JSONDataset
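A minimal loading sketch using only the standard library, assuming (hypothetically) that JSONDataset is a plain dict whose top level is a JSON object. The type check is an illustrative design choice, not documented behaviour of the module.

```python
import json
from pathlib import Path
from typing import Any

# Hypothetical stand-in for JSONDataset.
JSONDataset = dict[str, Any]


def load_json_dataset(file_path: Path) -> JSONDataset:
    """Sketch: read a UTF-8 JSON file and require a top-level object."""
    with file_path.open(encoding="utf-8") as json_file:
        loaded = json.load(json_file)
    # Reject arrays or scalars early instead of failing later on key access.
    if not isinstance(loaded, dict):
        raise TypeError(f"expected a JSON object in {file_path}, got {type(loaded).__name__}")
    return loaded
```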

store_json_dataset(json_dataset: JSONDataset, file_path: Path) → None

Dump a JSON dataset into a JSON file.

Parameters:
  • json_dataset (JSONDataset) -- all details for querying a package's documentation, in JSON format

  • file_path (pathlib.Path) -- path to store the JSON dataset at
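A minimal storing sketch, again assuming JSONDataset is a plain JSON-serialisable dict (a hypothetical stand-in). Creating missing parent directories and pretty-printing with indent=2 are illustrative choices, not confirmed behaviour of the module.

```python
import json
from pathlib import Path
from typing import Any

# Hypothetical stand-in for JSONDataset.
JSONDataset = dict[str, Any]


def store_json_dataset(json_dataset: JSONDataset, file_path: Path) -> None:
    """Sketch: write the dataset as indented UTF-8 JSON."""
    # Create missing parent directories so a fresh output path works.
    file_path.parent.mkdir(parents=True, exist_ok=True)
    with file_path.open("w", encoding="utf-8") as json_file:
        # ensure_ascii=False keeps non-ASCII docstrings readable in the file.
        json.dump(json_dataset, json_file, ensure_ascii=False, indent=2)
        json_file.write("\n")
```

A file written this way can be read back with json.load, which is the round trip that load_json_dataset performs.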