cli module

Define the command line interface using Typer.

generate_dataset(package_name: str, dataset_file: Path = PosixPath('json_documents.json'), force: bool = False) None

Create a JSON dataset for querying a package's documentation.

Parameters:
  • package_name (str) -- name of the root package to import

  • dataset_file (pathlib.Path, optional) -- path at which to store the JSON dataset, by default pathlib.Path("json_documents.json")

  • force (bool, optional) -- overwrite dataset_file if it already exists, by default False
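This page does not say what generate_dataset does when dataset_file already exists and force is False. As a minimal sketch of one plausible guard, assuming the dataset is a JSON list of records, the hypothetical write_dataset helper below refuses to replace an existing file unless forced. The helper name, the record fields, and the choice to raise FileExistsError are all assumptions, not this module's code; the real command may instead warn or exit.

```python
import json
import tempfile
from pathlib import Path


def write_dataset(documents: list[dict], dataset_file: Path, force: bool = False) -> None:
    """Hypothetical helper: write documents as JSON, keeping an existing file unless forced."""
    if dataset_file.exists() and not force:
        # Assumed behaviour for force=False: leave the existing dataset untouched.
        raise FileExistsError(f"{dataset_file} already exists; pass force=True to overwrite")
    dataset_file.write_text(json.dumps(documents, indent=2), encoding="utf-8")


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "json_documents.json"
    # Hypothetical record layout, for illustration only.
    write_dataset([{"name": "demo.func", "docstring": "Do something."}], path)
    try:
        write_dataset([], path)  # a second write without force is rejected
    except FileExistsError as error:
        print(error)
    write_dataset([], path, force=True)  # a forced write replaces the file
    print(json.loads(path.read_text(encoding="utf-8")))
```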

generate_database(dataset_file: Path = PosixPath('json_documents.json'), embedding_model: str = 'sentence-transformers/all-MiniLM-L6-v2', database_directory: Path = PosixPath('embeddings_database'), force: bool = False) None

Generate an embedding database for querying a package's documentation.

Parameters:
  • dataset_file (pathlib.Path, optional) -- path to the stored JSON dataset, by default pathlib.Path("json_documents.json")

  • embedding_model (str, optional) -- name of the Sentence Transformers model, by default "sentence-transformers/all-MiniLM-L6-v2"

  • database_directory (pathlib.Path, optional) -- path to the directory for storing the vector store, by default pathlib.Path("embeddings_database")

  • force (bool, optional) -- overwrite database_directory if it already exists, by default False
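Conceptually, generate_database reads the JSON dataset, turns each document into a vector with the embedding model, and persists those vectors under database_directory. The stdlib-only sketch below illustrates that flow under loud assumptions: the dataset is taken to be a JSON list of strings, toy_embed is a hypothetical hashed bag-of-words stand-in for the Sentence Transformers model, and vectors.json is a hypothetical storage format. The real command uses the named model and a vector store whose on-disk format is not documented here.

```python
import hashlib
import json
import math
import shutil
import tempfile
from pathlib import Path

# Toy vector size; all-MiniLM-L6-v2 itself produces 384-dimensional embeddings.
DIMENSION = 16


def toy_embed(text: str) -> list[float]:
    """Hypothetical stand-in for a Sentence Transformers model: hashed bag of words, L2-normalised."""
    vector = [0.0] * DIMENSION
    for word in text.lower().split():
        bucket = int(hashlib.md5(word.encode("utf-8")).hexdigest(), 16) % DIMENSION
        vector[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vector)) or 1.0  # avoid dividing by zero for empty text
    return [v / norm for v in vector]


def build_database(dataset_file: Path, database_directory: Path, force: bool = False) -> int:
    """Hypothetical helper: embed every document in the dataset and persist the vectors."""
    if database_directory.exists():
        if not force:
            raise FileExistsError(f"{database_directory} already exists; pass force=True to overwrite")
        shutil.rmtree(database_directory)  # a forced rebuild starts from an empty directory
    documents = json.loads(dataset_file.read_text(encoding="utf-8"))
    records = [{"text": doc, "embedding": toy_embed(doc)} for doc in documents]
    database_directory.mkdir(parents=True)
    (database_directory / "vectors.json").write_text(json.dumps(records), encoding="utf-8")
    return len(records)


with tempfile.TemporaryDirectory() as tmp:
    dataset = Path(tmp) / "json_documents.json"
    dataset.write_text(json.dumps(["Create a JSON dataset.", "Answer a query."]), encoding="utf-8")
    print(build_database(dataset, Path(tmp) / "embeddings_database"))
```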

answer_query(query: str, embedding_model: str = 'sentence-transformers/all-MiniLM-L6-v2', database_directory: Path = PosixPath('embeddings_database'), search_type: RetrievalType = RetrievalType.SIMILARITY, number_of_documents: int = 5, initial_number_of_documents: int = 10, diversity_level: float = 0.5, language_model_type: TransformerType = TransformerType.STANDARD_TRANSFORMERS, standard_pipeline_type: PipelineType = PipelineType.TEXT2TEXT_GENERATION, standard_model_name: str = 'google/flan-t5-large', quantised_model_name: str = 'TheBloke/zephyr-7B-beta-GGUF', quantised_model_file: str = 'zephyr-7b-beta.Q4_K_M.gguf', quantised_model_type: str = 'mistral') None

Get a response from a large language model.

Parameters:
  • query (str) -- question from the user

  • embedding_model (str, optional) -- name of the Sentence Transformers model, by default "sentence-transformers/all-MiniLM-L6-v2"

  • database_directory (pathlib.Path, optional) -- path to the directory storing the vector store, by default pathlib.Path("embeddings_database")

  • search_type (RetrievalType, optional) -- kind of retrieval algorithm for searching the vector store, by default RetrievalType.SIMILARITY

  • number_of_documents (int, optional) -- number of documents to retrieve, by default 5

  • initial_number_of_documents (int, optional) -- initial number of documents to consider, by default 10

  • diversity_level (float, optional) -- similarity between retrieved documents, by default 0.5

  • language_model_type (TransformerType, optional) -- kind of language model, by default TransformerType.STANDARD_TRANSFORMERS

  • standard_pipeline_type (PipelineType, optional) -- kind of Hugging Face pipeline, by default PipelineType.TEXT2TEXT_GENERATION

  • standard_model_name (str, optional) -- name of a transformers-compatible model, by default "google/flan-t5-large"

  • quantised_model_name (str, optional) -- name of a ctransformers-compatible model, by default "TheBloke/zephyr-7B-beta-GGUF"

  • quantised_model_file (str, optional) -- name of the quantised model file, by default "zephyr-7b-beta.Q4_K_M.gguf"

  • quantised_model_type (str, optional) -- type of quantised model, by default "mistral"
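The parameters initial_number_of_documents and diversity_level suggest that the non-default search_type is maximal marginal relevance (MMR), although this page does not name it. Assuming that, and assuming number_of_documents, initial_number_of_documents and diversity_level play the roles usually called k, fetch_k and lambda_mult, the stdlib-only sketch below contrasts plain similarity search with MMR over toy vectors. The function names are hypothetical; this illustrates the technique, not this module's implementation.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def similarity_search(query: list[float], vectors: list[list[float]], k: int) -> list[int]:
    """Indices of the k vectors most similar to the query, best first."""
    ranked = sorted(range(len(vectors)), key=lambda i: cosine(query, vectors[i]), reverse=True)
    return ranked[:k]


def mmr_search(query: list[float], vectors: list[list[float]], k: int,
               fetch_k: int, lambda_mult: float) -> list[int]:
    """Maximal marginal relevance over the fetch_k most similar candidates.

    lambda_mult = 1.0 ranks purely by relevance; lower values penalise
    candidates that resemble documents already selected.
    """
    candidates = similarity_search(query, vectors, fetch_k)
    selected: list[int] = []

    def score(i: int) -> float:
        relevance = cosine(query, vectors[i])
        redundancy = max((cosine(vectors[i], vectors[j]) for j in selected), default=0.0)
        return lambda_mult * relevance - (1 - lambda_mult) * redundancy

    while candidates and len(selected) < k:
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Documents 0 and 1 are near-duplicates; document 2 covers something else.
vectors = [[1.0, 0.0], [0.99, 0.1], [0.0, 1.0]]
query = [1.0, 0.2]
print(similarity_search(query, vectors, k=2))                  # [1, 0]
print(mmr_search(query, vectors, k=2, fetch_k=3, lambda_mult=0.5))  # [1, 2]
```

With lambda_mult at 0.5, MMR swaps the near-duplicate document 0 for the less relevant but distinct document 2, which is the trade-off a diversity setting controls.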