generative_ai.dataset_generation package
Submodules
- generative_ai.dataset_generation.orchestrate_generation module
- generative_ai.dataset_generation.step_1_generation module
- generative_ai.dataset_generation.step_2_generation module
- generative_ai.dataset_generation.utils_generation module
  - AttributeDetails
  - ClassDetails: member_type, class_parameters, class_methods, class_attributes, class_summary, class_notes, model_computed_fields, model_config, model_fields
  - Dataset
  - Document
  - EnumDetails
  - EnumMemberDetails
  - FunctionDetails: member_type, function_parameters, function_returns, function_summary, function_raises, function_warns, function_notes, function_references, function_examples, model_computed_fields, model_config, model_fields
  - JSONDataset
  - JSONDocument
  - MemberDetails: member_name, member_qualified_name, member_hierarchy, member_module, member_docstring, member_type_details, model_computed_fields, model_config, model_fields
  - MemberType
  - MethodDetails
  - ModuleDetails: module_name, module_qualified_name, module_hierarchy, package_name, module_members, module_summary, module_all_exports, model_computed_fields, model_config, model_fields
  - ModuleMemberDetails
  - PackageDetails: package_name, package_qualified_name, package_hierarchy, parent_package_name, children_sub_packages_names, children_modules_names, package_summary, package_all_exports, model_computed_fields, model_config, model_fields
  - ParameterDetails: parameter_name, parameter_default, parameter_annotation, parameter_kind, parameter_summary, parameter_details, model_computed_fields, model_config, model_fields
  - RaiseDetails
  - ReturnDetails
  - SplitName
  - SplitProportions: train_proportion, validation_proportion, test_proportion, validate_proportions, model_computed_fields, model_config, model_fields
  - WarnDetails
Module contents
Define functionalities for dataset generation.
- class JSONDataset(*, retrieval_documents: list[str], tuning_documents: list[JSONDocument])
Bases: BaseModel
Store all details for querying a package's documentation in JSON format.
- retrieval_documents
chunks of text to be used for retrieval
- Type:
list[str]
- tuning_documents
question-and-answer pairs to be used for tuning
- Type:
list[JSONDocument]
- tuning_documents: list[JSONDocument]
- model_computed_fields: ClassVar[dict[str, ComputedFieldInfo]] = {}
A dictionary of computed field names and their corresponding ComputedFieldInfo objects.
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- model_fields: ClassVar[dict[str, FieldInfo]] = {'retrieval_documents': FieldInfo(annotation=list[str], required=True), 'tuning_documents': FieldInfo(annotation=list[JSONDocument], required=True)}
Metadata about the fields defined on the model: a mapping of field names to pydantic.fields.FieldInfo objects.
This replaces Model.__fields__ from Pydantic V1.
- class JSONDocument(*, context: str, question: str, answer: str, split: SplitName)
Bases: BaseModel
Store details of a document in JSON format.
- split
split allocation of the document
- Type:
SplitName
- model_computed_fields: ClassVar[dict[str, ComputedFieldInfo]] = {}
A dictionary of computed field names and their corresponding ComputedFieldInfo objects.
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- model_fields: ClassVar[dict[str, FieldInfo]] = {'answer': FieldInfo(annotation=str, required=True), 'context': FieldInfo(annotation=str, required=True), 'question': FieldInfo(annotation=str, required=True), 'split': FieldInfo(annotation=SplitName, required=True)}
Metadata about the fields defined on the model: a mapping of field names to pydantic.fields.FieldInfo objects.
This replaces Model.__fields__ from Pydantic V1.
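To make the shape of these two models concrete, the sketch below mirrors their fields with standard-library dataclasses instead of the package's Pydantic models, so it runs without the package installed. The stand-in names `DocumentSketch` and `DatasetSketch`, the sample text, and the split value `"train"` are assumptions made for illustration; the members of `SplitName` are not listed in this section.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class DocumentSketch:
    """Stand-in for JSONDocument (hypothetical, not the package's class)."""

    context: str
    question: str
    answer: str
    split: str  # assumed to be a string-like split name such as "train"


@dataclass
class DatasetSketch:
    """Stand-in for JSONDataset (hypothetical, not the package's class)."""

    retrieval_documents: list[str]
    tuning_documents: list[DocumentSketch]


dataset = DatasetSketch(
    retrieval_documents=["'my_package' is a Python package."],
    tuning_documents=[
        DocumentSketch(
            context="'my_package' is a Python package.",
            question="What is 'my_package'?",
            answer="'my_package' is a Python package.",
            split="train",
        )
    ],
)

# asdict() recurses into nested dataclasses, giving a JSON-serialisable dict.
payload = json.dumps(asdict(dataset), indent=2)
print(payload)
```

The printed JSON has two top-level keys, one per field of `JSONDataset`, with each tuning document carrying its own `split` label.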
- generate_json_dataset(raw_datasets: list[Dataset]) -> JSONDataset
Convert raw documents into JSON format.
- Parameters:
raw_datasets (list[Dataset]) -- all retrieval and tuning documents for the root package and its contents
- Returns:
all details for querying a package's documentation in JSON format
- Return type:
JSONDataset
- generate_member_dataset(member_details: MemberDetails) -> tuple[Dataset, ...]
Create a dataset for a member.
- Parameters:
member_details (MemberDetails) -- all details of the member
- Returns:
all documents for retrieval and tuning for querying member documentation
- Return type:
tuple[Dataset, ...]
- Raises:
ValueError -- if the member type is not supported
Notes
The returned tuple holds a single dataset if the member type is not enum, class or function.
Otherwise it holds two datasets: one for the member and one for the member type.
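Because the returned tuple can hold one or two datasets, a caller that collects results for many members will typically want to flatten them into a single list. The following is a minimal sketch of that step only; plain strings stand in for `Dataset` objects, and `flatten_member_results` is a hypothetical helper, not part of the package.

```python
from itertools import chain


def flatten_member_results(results: list[tuple[str, ...]]) -> list[str]:
    """Flatten per-member tuples of any length into one flat list."""
    return list(chain.from_iterable(results))


# Strings stand in for Dataset objects (assumption for illustration).
per_member = [
    ("constant: member dataset",),  # member type without extra details
    ("class: member dataset", "class: member type dataset"),  # class member
]

flat = flatten_member_results(per_member)
print(len(flat))
```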
- generate_module_dataset(module_contents: ModuleDetails) -> Dataset
Create relevant questions and answers based on module details.
- Parameters:
module_contents (ModuleDetails) -- details of a Python module
- Returns:
all documents for retrieval and tuning for querying module documentation
- Return type:
Dataset
- generate_package_dataset(package_contents: PackageDetails) -> Dataset
Create relevant questions and answers based on package details.
- Parameters:
package_contents (PackageDetails) -- details of a Python package
- Returns:
all documents for retrieval and tuning for querying package documentation
- Return type:
Dataset
- generate_raw_datasets(package_name: str) -> list[Dataset]
Generate all retrieval and tuning documents for exploring the documentation of a package.
- Parameters:
package_name (str) -- name under which the root package is imported
- Returns:
all retrieval and tuning documents for the root package and its contents
- Return type:
list[Dataset]
- get_all_member_details(module_name: str, member_name: str, member_object: Any) -> MemberDetails
Extract all details of a module object.
- get_all_module_contents(module_name: str) -> ModuleDetails
Extract all details of a module.
- Parameters:
module_name (str) -- name under which the module is imported
- Returns:
details of the module
- Return type:
ModuleDetails
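As a rough illustration of what extracting module details can involve, the sketch below imports a module by name and classifies its public members with the standard `importlib` and `inspect` modules. It is not the package's implementation: `list_public_members` and its four-way classification are assumptions, and the real `ModuleDetails` model carries more fields than this.

```python
import importlib
import inspect


def list_public_members(module_name: str) -> dict[str, str]:
    """Map each public member name of a module to a coarse kind label."""
    module = importlib.import_module(module_name)
    exported = getattr(module, "__all__", None)

    members: dict[str, str] = {}
    for name, obj in inspect.getmembers(module):
        # Prefer __all__ when the module defines it; otherwise skip
        # names that start with an underscore.
        if exported is not None:
            if name not in exported:
                continue
        elif name.startswith("_"):
            continue

        if inspect.isclass(obj):
            kind = "class"
        elif inspect.isroutine(obj):
            kind = "function"
        elif inspect.ismodule(obj):
            kind = "module"
        else:
            kind = "attribute"
        members[name] = kind
    return members


members = list_public_members("json")
print(members["dumps"], members["JSONDecoder"])
```

The standard-library `json` module is used here only because it is always importable; any importable module name works the same way.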
- get_all_package_contents(package_name: str) -> list[PackageDetails]
Extract all details of a root package.
- Parameters:
package_name (str) -- name under which the root package is imported
- Returns:
all details of the root package and its sub-packages
- Return type:
list[PackageDetails]
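One common way to enumerate a root package together with its sub-packages is `pkgutil.walk_packages`. The sketch below shows that technique on the standard-library `email` package; whether the package documented here uses the same approach is not stated in this section, and `list_sub_packages` is a hypothetical helper that returns names only rather than `PackageDetails` objects.

```python
import importlib
import pkgutil


def list_sub_packages(root_name: str) -> list[str]:
    """Return the root package name followed by all nested sub-package names."""
    root = importlib.import_module(root_name)
    names = [root_name]

    # Plain modules have no __path__, so there is nothing to walk.
    search_path = getattr(root, "__path__", None)
    if search_path is None:
        return names

    # walk_packages imports each sub-package it finds in order to recurse.
    for info in pkgutil.walk_packages(search_path, prefix=f"{root_name}."):
        if info.ispkg:
            names.append(info.name)
    return names


packages = list_sub_packages("email")
print(packages)
```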
- load_json_dataset(file_path: Path) -> JSONDataset
Load JSON dataset from a JSON file.
- Parameters:
file_path (pathlib.Path) -- path to load JSON dataset from
- Returns:
all details for querying a package's documentation in JSON format
- Return type:
JSONDataset
- store_json_dataset(json_dataset: JSONDataset, file_path: Path) -> None
Dump JSON dataset into a JSON file.
- Parameters:
json_dataset (JSONDataset) -- all details for querying a package's documentation in JSON format
file_path (pathlib.Path) -- path to store JSON dataset
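The store and load functions form a round trip through a JSON file. The sketch below imitates that round trip with the standard library only, using a plain dictionary in place of a `JSONDataset`; `store_sketch` and `load_sketch` are hypothetical stand-ins, and the real functions may serialise differently (for example through Pydantic).

```python
import json
import tempfile
from pathlib import Path


def store_sketch(dataset: dict, file_path: Path) -> None:
    """Write the dataset dictionary to a JSON file."""
    file_path.write_text(json.dumps(dataset, indent=2), encoding="utf-8")


def load_sketch(file_path: Path) -> dict:
    """Read a dataset dictionary back from a JSON file."""
    return json.loads(file_path.read_text(encoding="utf-8"))


# Sample content and the "train" split value are assumptions for illustration.
original = {
    "retrieval_documents": ["'my_package' is a Python package."],
    "tuning_documents": [
        {
            "context": "'my_package' is a Python package.",
            "question": "What is 'my_package'?",
            "answer": "'my_package' is a Python package.",
            "split": "train",
        }
    ],
}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "dataset.json"
    store_sketch(original, path)
    restored = load_sketch(path)

print(restored == original)
```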