def _inject_data_into_chromadb(self):
    chroma_client = self.APPCFG.chroma_client
    existing_collections = chroma_client.list_collections()
    collection_name = self.APPCFG.collection_name
    existing_collection_names = [collection.name for collection in existing_collections]
    if collection_name in existing_collection_names:
        collection = chroma_client.get_collection(name=collection_name)
        print(f"Retrieved existing collection: {collection_name}")
    else:
        collection = chroma_client.create_collection(name=collection_name)
        print(f"Created new collection: {collection_name}")
    collection.add(
        documents=self.docs,
        metadatas=self.metadatas,
        embeddings=self.embeddings,
        ids=self.ids
    )
    print("Data is stored in ChromaDB.")
class="hljs-button signin active" data-title="登录复制" data-report-click="{"spm":"1001.2101.3001.4334"}">
class="hide-preCode-box">
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
What it does:
Checks whether the collection already exists. If it does, it is retrieved; otherwise a new collection is created.
Adds the documents, metadata, embedding vectors, and IDs to the collection.
Error handling:
Avoids creating the same collection twice.
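As a side note, ChromaDB also exposes a single call that covers both branches of the check above; a minimal sketch (the client and collection name mirror the attributes used in the method, and the explicit existence check and log messages are simply dropped):

collection = chroma_client.get_or_create_collection(name=collection_name)
# add() works the same way on the returned collection
collection.add(
    documents=self.docs,
    metadatas=self.metadatas,
    embeddings=self.embeddings,
    ids=self.ids
)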
7. Validating the database contents (_validate_db)
def _validate_db(self):
    vectordb = self.APPCFG.chroma_client.get_collection(name=self.APPCFG.collection_name)
    print("Number of vectors in vectordb:", vectordb.count())
class="hljs-button signin active" data-title="登录复制" data-report-click="{"spm":"1001.2101.3001.4334"}">
1
2
3
What it does:
Retrieves the collection and prints the number of vectors it contains, confirming whether the data injection succeeded.
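If a bare count is not enough, the same method could also spot-check a few stored records; a minimal sketch using ChromaDB's peek() (the limit value is arbitrary):

vectordb = self.APPCFG.chroma_client.get_collection(name=self.APPCFG.collection_name)
print("Number of vectors in vectordb:", vectordb.count())
# peek() returns the first few ids, documents, embeddings, and metadatas for a quick sanity check
print(vectordb.peek(limit=3))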
Run output: (screenshot omitted)
Summary
The overall flow of this code is as follows:
Load the CSV or Excel file and convert it into a Pandas DataFrame.
Iterate over each row of the DataFrame, generating a document, its metadata, and an embedding vector.
Inject the generated data into a ChromaDB collection.
Validate the number of vectors in the collection to confirm the injection succeeded.
Keep in mind the supported file formats and the compatibility between the embedding generator and the ChromaDB client. A short usage sketch follows below.
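As a rough usage sketch, everything is driven from run_pipeline(); the module path and file path below are hypothetical placeholders, not taken from the original project:

from prepare_vectordb_from_tabular_data import PrepareVectorDBFromTabularData  # hypothetical module path

data_prep = PrepareVectorDBFromTabularData(file_directory="data/titanic_small.csv")  # hypothetical CSV path
data_prep.run_pipeline()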
Complete code:
import os
import pandas as pd
from utils.load_config import LoadConfig


class PrepareVectorDBFromTabularData:
    """
    This class is designed to prepare a vector database from a CSV or XLSX file.
    It then loads the data into a ChromaDB collection. The process involves
    reading the file, generating embeddings for the content, and storing
    the data in the specified collection.

    Attributes:
        APPCFG: Configuration object containing settings and client instances for database and embedding generation.
        file_directory: Path to the CSV file that contains the data to be uploaded.
    """

    def __init__(self, file_directory: str) -> None:
        """
        Initialize the instance with the file directory and load the app config.

        Args:
            file_directory (str): The directory path of the file to be processed.
        """
        self.APPCFG = LoadConfig()
        self.file_directory = file_directory

    def run_pipeline(self):
        """
        Execute the entire pipeline for preparing the database from the CSV.
        This includes loading the data, preparing the data for injection, injecting
        the data into ChromaDB, and validating the existence of the injected data.
        """
        self.df, self.file_name = self._load_dataframe(file_directory=self.file_directory)
        self.docs, self.metadatas, self.ids, self.embeddings = self._prepare_data_for_injection(
            df=self.df, file_name=self.file_name)
        self._inject_data_into_chromadb()
        self._validate_db()

    def _load_dataframe(self, file_directory: str):
        """
        Load a DataFrame from the specified CSV or Excel file.

        Args:
            file_directory (str): The directory path of the file to be loaded.

        Returns:
            DataFrame, str: The loaded DataFrame and the file's base name without the extension.

        Raises:
            ValueError: If the file extension is neither CSV nor Excel.
        """
        file_names_with_extensions = os.path.basename(file_directory)
        print(file_names_with_extensions)
        file_name, file_extension = os.path.splitext(file_names_with_extensions)
        if file_extension == ".csv":
            df = pd.read_csv(file_directory)
            return df, file_name
        elif file_extension == ".xlsx":
            df = pd.read_excel(file_directory)
            return df, file_name
        else:
            raise ValueError("The selected file type is not supported")

    def _prepare_data_for_injection(self, df: pd.DataFrame, file_name: str):
        """
        Generate embeddings and prepare documents for data injection.

        Args:
            df (pd.DataFrame): The DataFrame containing the data to be processed.
            file_name (str): The base name of the file for use in metadata.

        Returns:
            list, list, list, list: Lists containing documents, metadatas, ids, and embeddings respectively.
        """
        docs = []
        metadatas = []
        ids = []
        embeddings = []
        for index, row in df.iterrows():
            output_str = ""
            # Treat each row as a separate chunk
            for col in df.columns:
                output_str += f"{col}: {row[col]},\n"
            response = self.APPCFG.OpenAIEmbeddings.embed_documents(output_str)[0]
            embeddings.append(response)
            docs.append(output_str)
            metadatas.append({"source": file_name})
            ids.append(f"id{index}")
        return docs, metadatas, ids, embeddings

    def _inject_data_into_chromadb(self):
        """
        Inject the prepared data into ChromaDB.
        Retrieves the collection if it already exists; otherwise creates it.
        The method prints a confirmation message upon successful data injection.
        """
        chroma_client = self.APPCFG.chroma_client
        collection_name = self.APPCFG.collection_name  # e.g. "titanic_small"
        # List all existing collections and extract their names
        existing_collections = chroma_client.list_collections()
        existing_collection_names = [collection.name for collection in existing_collections]
        if collection_name in existing_collection_names:
            # If the collection exists, retrieve it
            collection = chroma_client.get_collection(name=collection_name)
            print(f"Retrieved existing collection: {collection_name}")
        else:
            # If it does not exist, create it
            collection = chroma_client.create_collection(name=collection_name)
            print(f"Created new collection: {collection_name}")
        collection.add(
            documents=self.docs,
            metadatas=self.metadatas,
            embeddings=self.embeddings,
            ids=self.ids
        )
        print("==============================")
        print("Data is stored in ChromaDB.")

    def _validate_db(self):
        """
        Validate the contents of the database to ensure that the data injection has been successful.
        Prints the number of vectors in the ChromaDB collection for confirmation.
        """
        vectordb = self.APPCFG.chroma_client.get_collection(name=self.APPCFG.collection_name)
        print("==============================")
        print("Number of vectors in vectordb:", vectordb.count())
        print("==============================")
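The utils.load_config.LoadConfig module is not shown in this post. Purely for context, a hypothetical minimal version exposing the three attributes the class relies on (chroma_client, collection_name, OpenAIEmbeddings) might look like the sketch below, assuming a local persistent ChromaDB store and LangChain's OpenAI embeddings; this is an illustrative assumption, not the author's actual configuration.

import chromadb
from langchain_openai import OpenAIEmbeddings

class LoadConfig:
    def __init__(self) -> None:
        # Hypothetical values; the real project presumably reads them from its own config files
        self.collection_name = "titanic_small"
        self.chroma_client = chromadb.PersistentClient(path="data/chroma")
        self.OpenAIEmbeddings = OpenAIEmbeddings(model="text-embedding-3-small")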
class="hljs-button signin active" data-title="登录复制" data-report-click="{"spm":"1001.2101.3001.4334"}"> class="hide-preCode-box">
评论记录:
回复评论: