groq_qa_generator package

Submodules

groq_qa_generator.cli module

groq_qa_generator.config module

groq_qa_generator.groq_api module

groq_qa_generator.groq_api.get_api_key()[source]

Retrieve the Groq API key from environment variables.

This function loads environment variables from a .env file (if present) and returns the value of the GROQ_API_KEY variable.

Returns:

The Groq API key.

Return type:

str

Raises:

Exception – If the GROQ_API_KEY variable is not set in the environment.
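The documented behavior can be sketched as follows (the real function additionally loads a .env file via python-dotenv before reading the environment):

```python
import os

def get_api_key():
    # Sketch: read GROQ_API_KEY from the environment; the real function
    # first loads variables from a .env file (if present) before this lookup.
    api_key = os.environ.get("GROQ_API_KEY")
    if api_key is None:
        raise Exception("GROQ_API_KEY is not set in the environment.")
    return api_key

os.environ["GROQ_API_KEY"] = "gsk_example_key"  # placeholder value for illustration
print(get_api_key())
```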

groq_qa_generator.groq_api.get_groq_client(api_key)[source]

Initialize and return a Groq API client.

Parameters:

api_key (str) – The API key used to authenticate with the Groq service.

Returns:

An instance of the Groq client for interacting with the Groq API.

Return type:

Groq

groq_qa_generator.groq_api.get_groq_completion(client, system_prompt, chunk_text, model, temperature, max_tokens)[source]

Generate a completion from the Groq API using a system prompt and input text.

This function sends a request to the Groq API to generate a completion based on the provided system prompt and chunked input text.

Parameters:
  • client (Groq) – The Groq API client.

  • system_prompt (str) – The prompt that defines the system’s behavior for the model.

  • chunk_text (str) – The input text chunk that is being processed by the model.

  • model (str) – The model identifier (e.g., “llama3-70b-8192”) for the completion.

  • temperature (float) – The temperature setting to control randomness in the output.

  • max_tokens (int) – The maximum number of tokens the model can generate in the response.

Returns:

The Groq API response object containing the completion results, or None if an error occurs during the API call (the error is logged).

Return type:

object or None
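A hedged sketch of the call, assuming the Groq client exposes the OpenAI-compatible chat interface (client.chat.completions.create); whether the real function requests streaming (stream=True) is an assumption suggested by the companion stream_completion() function. A stub client stands in for a real, authenticated one:

```python
import logging
from types import SimpleNamespace

def get_groq_completion(client, system_prompt, chunk_text, model, temperature, max_tokens):
    # Sketch: send the system prompt and text chunk as chat messages;
    # on any error, log it and return None as documented.
    try:
        return client.chat.completions.create(
            messages=[
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": chunk_text},
            ],
            model=model,
            temperature=temperature,
            max_tokens=max_tokens,
            stream=True,  # assumption: streamed, since stream_completion() exists
        )
    except Exception as e:
        logging.error("Groq API call failed: %s", e)
        return None

# Stub client that simply echoes its keyword arguments back.
stub_client = SimpleNamespace(
    chat=SimpleNamespace(
        completions=SimpleNamespace(create=lambda **kwargs: kwargs)
    )
)
result = get_groq_completion(
    stub_client, "Generate QA pairs.", "Some chunk of text.",
    "llama3-70b-8192", 0.7, 1024,
)
print(result["model"])  # llama3-70b-8192
```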

groq_qa_generator.groq_api.stream_completion(completion)[source]

Stream the Groq API completion and return the accumulated response.

This function iterates over the chunks of the streamed completion from the Groq API and concatenates their text into a single string.

Parameters:

completion (object) – The streamed response from the Groq API.

Returns:

The accumulated response from the streamed completion.

Return type:

str
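The accumulation step can be sketched as follows, assuming each streamed chunk carries its text delta at chunk.choices[0].delta.content (the Groq/OpenAI streaming shape); a simulated stream stands in for a real completion object:

```python
from types import SimpleNamespace

def stream_completion(completion):
    # Sketch: concatenate each chunk's text delta, skipping None deltas
    # (e.g., the final chunk of a stream typically carries no content).
    accumulated = ""
    for chunk in completion:
        delta = chunk.choices[0].delta.content
        if delta:
            accumulated += delta
    return accumulated

# Simulated stream standing in for a real Groq streamed completion.
def fake_chunk(text):
    return SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=text))])

fake_stream = [fake_chunk("Hello"), fake_chunk(", "), fake_chunk("world"), fake_chunk(None)]
print(stream_completion(fake_stream))  # Hello, world
```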

groq_qa_generator.groq_qa module

groq_qa_generator.logging_setup module

groq_qa_generator.logging_setup.initialize_logging()[source]

Configure the logging settings for the application.

This function sets up the global logging configuration to display log messages at the INFO level or higher. The log format includes the timestamp, logger name, log level, and message. It also adjusts the log level for the ‘httpx’ library to display only warnings or higher to reduce unnecessary verbosity.

Logging format:
  • Timestamp (in the format ‘YYYY-MM-DD HH:MM:SS’)

  • Logger name

  • Log level (e.g., INFO, WARNING)

  • Log message
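A minimal sketch of the described configuration using the standard library (the exact format string is an assumption matching the documented fields):

```python
import logging

def initialize_logging():
    # Sketch: INFO-level root logger with timestamp, logger name, level,
    # and message in each record; the 'httpx' logger is quieted to
    # WARNING to reduce verbosity.
    logging.basicConfig(
        level=logging.INFO,
        format="%(asctime)s - %(name)s - %(levelname)s - %(message)s",
        datefmt="%Y-%m-%d %H:%M:%S",
    )
    logging.getLogger("httpx").setLevel(logging.WARNING)

initialize_logging()
logging.getLogger(__name__).info("logging configured")
```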

groq_qa_generator.qa_generation module

groq_qa_generator.text_processing module

groq_qa_generator.text_processing.clean_text(text)[source]

Cleans the input text by removing excessive whitespace.

This function replaces all sequences of whitespace characters (including tabs, newlines, and multiple spaces) with a single space. It also trims any leading or trailing whitespace from the text.

Parameters:

text (str) – The input text to be cleaned.

Returns:

The cleaned text with excessive whitespace removed and leading/trailing whitespace trimmed.

Return type:

str
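The documented behavior amounts to a single regular-expression substitution, sketched here:

```python
import re

def clean_text(text):
    # Collapse any run of whitespace (spaces, tabs, newlines) into a
    # single space, then strip leading/trailing whitespace.
    return re.sub(r"\s+", " ", text).strip()

print(clean_text("  hello\t\n  world  "))  # hello world
```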

groq_qa_generator.text_processing.write_response_to_file(response, output_file, json_format=False)[source]

Write the generated response to the specified output file.

Depending on the json_format flag, the response is either written as JSON or plain text.

Parameters:
  • response (str) – The response string to be written to the file.

  • output_file (str) – The base name for the output file (without extension).

  • json_format (bool) – Flag to indicate whether to write as JSON. Defaults to False.

Side Effects:

Writes the response to the specified output file.
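A hedged sketch of the documented behavior: the file extension is derived from the json_format flag, and the JSON layout used here ({"response": ...}) is an illustrative assumption, not the package's actual schema:

```python
import json

def write_response_to_file(response, output_file, json_format=False):
    # Sketch: output_file is a base name without extension; .json or .txt
    # is appended depending on json_format. The JSON wrapper object below
    # is a placeholder, not the package's real output schema.
    if json_format:
        with open(output_file + ".json", "w") as f:
            json.dump({"response": response}, f, indent=2)
    else:
        with open(output_file + ".txt", "w") as f:
            f.write(response)
```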

groq_qa_generator.tokenizer module

groq_qa_generator.tokenizer.count_tokens(text)[source]

Counts the number of tokens in a given text using the cl100k_base tokenization model.

This function uses the “cl100k_base” encoding model from tiktoken to encode the input text and counts the total number of tokens generated.

Parameters:

text (str) – The input text for which tokens need to be counted.

Returns:

The total number of tokens in the input text.

Return type:

int

groq_qa_generator.tokenizer.generate_text_chunks(file_path, chunk_size)[source]

Reads text from a file and splits it into chunks based on a token limit.

This function reads the input text file line by line, and accumulates text into chunks such that the total number of tokens in each chunk does not exceed chunk_size. When the token limit is reached, the chunk is added to the list of chunks.

Parameters:
  • file_path (str) – The path to the text file that needs to be chunked.

  • chunk_size (int) – The maximum number of tokens allowed in each chunk.

Returns:

A list containing text chunks where each chunk respects the token limit.

Return type:

list of str
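The chunking logic can be sketched as below; a whitespace-based count_tokens stands in for the package's tiktoken-based counter so the example is self-contained, and the exact boundary handling (whether a line that would overflow starts the next chunk) is an assumption:

```python
import tempfile

def count_tokens(text):
    # Whitespace stand-in for the tiktoken-based counter, for illustration only.
    return len(text.split())

def generate_text_chunks(file_path, chunk_size):
    # Sketch: read the file line by line, accumulating lines into the
    # current chunk until adding the next line would exceed chunk_size
    # tokens, then start a new chunk with that line.
    chunks, current = [], ""
    with open(file_path, "r") as f:
        for line in f:
            if current and count_tokens(current + line) > chunk_size:
                chunks.append(current)
                current = line
            else:
                current += line
    if current:
        chunks.append(current)
    return chunks

with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
    f.write("one two three\nfour five six\nseven eight nine\n")
    path = f.name

print(generate_text_chunks(path, chunk_size=4))
```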

Module contents