generative_ai.information_retrieval.step_1_retrieval module

Define functionality for storing document embeddings.

create_document_embedder(embedding_model: str) → HuggingFaceEmbeddings

Prepare a Sentence Transformers model for document embedding.

Parameters:

embedding_model (str) -- name of a Sentence Transformers model on Hugging Face

Returns:

document embedder

Return type:

HuggingFaceEmbeddings
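A minimal sketch of what this function might look like, assuming the LangChain `langchain-huggingface` package (which provides `HuggingFaceEmbeddings`) and `sentence-transformers` are installed. The documentation does not say which package the class is imported from, so the import path is an assumption; the import is deferred into the function body so the module can be loaded without the dependency.

```python
from __future__ import annotations

from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Type-only import; assumes the langchain-huggingface package is installed
    from langchain_huggingface import HuggingFaceEmbeddings


def create_document_embedder(embedding_model: str) -> HuggingFaceEmbeddings:
    """Wrap a Sentence Transformers model as a LangChain embeddings object."""
    # Deferred runtime import (assumption): the module imports cleanly without the package
    from langchain_huggingface import HuggingFaceEmbeddings

    # model_name takes a Hugging Face model id; weights are downloaded on first construction
    return HuggingFaceEmbeddings(model_name=embedding_model)
```

Usage, with an example model id: `embedder = create_document_embedder("sentence-transformers/all-MiniLM-L6-v2")`, after which `embedder.embed_documents(["some text"])` returns one vector per input text and `embedder.embed_query("a question")` returns a single vector.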

create_vector_store(embedder: HuggingFaceEmbeddings, directory_path: pathlib.Path) → Chroma

Initialise a Chroma vector store.

Parameters:

embedder (HuggingFaceEmbeddings) -- document embedder
directory_path (pathlib.Path) -- path to directory for storing vector store

Returns:

vector store

Return type:

Chroma
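A minimal sketch, assuming the `langchain-chroma` integration package is the source of `Chroma` (the documentation does not name the import path). The embedder is passed as the store's embedding function, and the directory is passed as the persistence location; as above, runtime imports are deferred into the function.

```python
from __future__ import annotations

from pathlib import Path
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Type-only imports; assumes langchain-chroma and langchain-huggingface are installed
    from langchain_chroma import Chroma
    from langchain_huggingface import HuggingFaceEmbeddings


def create_vector_store(embedder: HuggingFaceEmbeddings, directory_path: Path) -> Chroma:
    """Open (or create) a Chroma vector store persisted under directory_path."""
    # Deferred runtime import (assumption), mirroring the embedder sketch
    from langchain_chroma import Chroma

    return Chroma(
        embedding_function=embedder,  # embeds both stored documents and incoming queries
        persist_directory=str(directory_path),  # Chroma writes its files under this directory
    )
```

With no `collection_name` argument, LangChain's `Chroma` uses its default collection name. Typical use is `store.add_documents(chunks)` to embed and store chunks, then `store.similarity_search("query", k=4)` to retrieve the closest ones.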

load_json_documents(file_path: pathlib.Path) → list[Document]

Load retrieval documents from a JSON file.
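The JSON layout is not documented here, so the following is only a sketch under a hypothetical schema: a top-level array of objects, each with a `"content"` field holding the text and any other fields treated as metadata. To stay dependency-free it defines a stand-in `Document` dataclass with the same two fields as `langchain_core.documents.Document` (`page_content` and `metadata`); real code would import that class instead.

```python
import json
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any

# Hypothetical key holding each record's text; the real schema is not documented
CONTENT_KEY = "content"


@dataclass
class Document:
    """Stand-in for langchain_core.documents.Document (same two fields)."""

    page_content: str
    metadata: dict[str, Any] = field(default_factory=dict)


def load_json_documents(file_path: Path) -> list[Document]:
    """Load a JSON array of records into Document objects."""
    with file_path.open(encoding="utf-8") as f:
        records = json.load(f)
    documents = []
    for record in records:
        # Every field except the text field is kept as metadata (e.g. a source id)
        metadata = {k: v for k, v in record.items() if k != CONTENT_KEY}
        documents.append(Document(page_content=record[CONTENT_KEY], metadata=metadata))
    return documents
```

Keeping non-text fields as metadata lets a retrieved chunk be traced back to its source record.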

partition_documents(raw_documents: list[Document]) → list[Document]

Partition retrieval documents into chunks.
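The chunking strategy is not stated in this excerpt, so the sketch below substitutes a simple technique: fixed-size character windows with overlap. The `chunk_size` and `chunk_overlap` parameters and their defaults are hypothetical, and `Document` is again a stand-in for `langchain_core.documents.Document`. In a LangChain pipeline, `RecursiveCharacterTextSplitter.split_documents` is a common alternative that prefers splitting on paragraph and sentence boundaries.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Document:
    """Stand-in for langchain_core.documents.Document (same two fields)."""

    page_content: str
    metadata: dict[str, Any] = field(default_factory=dict)


def partition_documents(
    raw_documents: list[Document], chunk_size: int = 500, chunk_overlap: int = 50
) -> list[Document]:
    """Split each document into overlapping fixed-size character windows."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must satisfy 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap  # distance between consecutive window starts
    chunks: list[Document] = []
    for doc in raw_documents:
        text = doc.page_content
        if not text:
            continue  # skip empty documents rather than emitting empty chunks
        # Stop once the previous window already reaches the end of the text
        for start in range(0, max(len(text) - chunk_overlap, 1), step):
            piece = text[start : start + chunk_size]
            # Copy the parent's metadata and record where the chunk starts
            metadata = {**doc.metadata, "start_index": start}
            chunks.append(Document(page_content=piece, metadata=metadata))
    return chunks
```

For example, with `chunk_size=4` and `chunk_overlap=1`, the text `"abcdefghij"` becomes `"abcd"`, `"defg"`, `"ghij"`. The overlap keeps a sentence that straddles a boundary partly visible in both neighbouring chunks.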

Notes