LangChain does not ship a large language model of its own; it provides the `llms` tooling for interacting with external LLMs.
Setting an in-memory cache

```python
import time

import langchain
from langchain_community.cache import InMemoryCache
from langchain_deepseek import ChatDeepSeek
from dotenv import load_dotenv

load_dotenv()

llm = ChatDeepSeek(model="deepseek-chat")
# Cache identical prompts in process memory; a repeated prompt is served locally
langchain.llm_cache = InMemoryCache()

start_time = time.time()
response_1 = llm.invoke("Tell me a joke")
end_time = time.time()
during_time1 = end_time - start_time
print(f"First call took {during_time1}s")

start_time = time.time()
response_2 = llm.invoke("Tell me a joke")  # cache hit: same prompt, no API round trip
end_time = time.time()
during_time2 = end_time - start_time
print(f"Second call took {during_time2}s")
```
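Assigning to `langchain.llm_cache` is the older global-attribute pattern; recent LangChain releases expose the same switch through `set_llm_cache` in `langchain.globals`. A minimal sketch of the newer form:

```python
from langchain.globals import set_llm_cache
from langchain_community.cache import InMemoryCache

# Equivalent to langchain.llm_cache = InMemoryCache() on recent versions
set_llm_cache(InMemoryCache())
```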
Setting a database cache

Because this cache is file-based, the cached data survives restarts, is not bounded by memory, can be backed up and migrated, and requires no extra service.
```python
import time

import langchain
from langchain_community.cache import SQLiteCache
from langchain_deepseek import ChatDeepSeek
from dotenv import load_dotenv

load_dotenv()

llm = ChatDeepSeek(model="deepseek-chat")
# Persist the cache to a local SQLite file so it survives restarts
langchain.llm_cache = SQLiteCache(database_path="./langchain_cache.db")

start_time = time.time()
response_1 = llm.invoke("Tell me a joke")
print(response_1.content)
end_time = time.time()
during_time1 = end_time - start_time
print(f"First call took {during_time1}s")

start_time = time.time()
response_2 = llm.invoke("Tell me a joke")  # served from the SQLite cache
end_time = time.time()
during_time2 = end_time - start_time
print(f"Second call took {during_time2}s")
```
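One thing to watch with a file-backed cache: stale answers persist across runs. The cache classes provide a `clear()` method for dropping all stored entries; a brief sketch, assuming the same setup as above:

```python
import langchain
from langchain_community.cache import SQLiteCache

langchain.llm_cache = SQLiteCache(database_path="./langchain_cache.db")

# Drop every cached response; the next invoke hits the API again
langchain.llm_cache.clear()
```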
Mocking an LLM: FakeListLLM

FakeListLLM simulates the behavior of an LLM, mainly for saving API costs, offline local testing, and prototyping.
```python
from langchain_community.llms import FakeListLLM
from langchain.agents import initialize_agent, AgentType
from langchain_experimental.tools import PythonREPLTool

tools = [PythonREPLTool()]

# Scripted responses the fake model will return, in order
responses = [
    "Action: Python REPL\nAction Input: print(2 + 2)",
    "Final Answer: 4",
]
llm = FakeListLLM(responses=responses)

agent = initialize_agent(
    tools,
    llm,
    agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
    verbose=True,
)
agent.run("what's 2+2")
```
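FakeListLLM also works outside an agent: it simply hands back its canned responses in order, ignoring the prompt, which makes unit tests deterministic. A minimal sketch:

```python
from langchain_community.llms import FakeListLLM

fake = FakeListLLM(responses=["first canned reply", "second canned reply"])
print(fake.invoke("any prompt"))      # -> "first canned reply"
print(fake.invoke("another prompt"))  # -> "second canned reply"
```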
Calling the LLM asynchronously (asyncio)

```python
import time
import asyncio

from langchain_deepseek import ChatDeepSeek
from dotenv import load_dotenv

load_dotenv()


def generate_serially():
    llm = ChatDeepSeek(model="deepseek-chat")
    for _ in range(5):
        resp = llm.invoke("This is a message from the serial function")


async def generate_async(llm):
    resp = await llm.ainvoke("This is a message from the async function")


async def generate_concurrently():
    llm = ChatDeepSeek(model="deepseek-chat")
    # Fire all five requests at once and wait for them together
    tasks = [generate_async(llm) for _ in range(5)]
    await asyncio.gather(*tasks)


s = time.perf_counter()
generate_serially()
elapsed = time.perf_counter() - s
print(f"Serial: 5 calls took {elapsed}s")

s = time.perf_counter()
asyncio.run(generate_concurrently())
elapsed = time.perf_counter() - s
print(f"Async: 5 calls took {elapsed}s")
```
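Since LangChain chat models are Runnables, the same fan-out can also be expressed without hand-written tasks via `abatch` (or its synchronous counterpart `batch`). A short sketch under the same setup:

```python
import asyncio

from langchain_deepseek import ChatDeepSeek
from dotenv import load_dotenv

load_dotenv()


async def generate_with_abatch():
    llm = ChatDeepSeek(model="deepseek-chat")
    # abatch fans the prompts out concurrently, much like asyncio.gather above
    replies = await llm.abatch(["This is an async batch message"] * 5)
    for reply in replies:
        print(reply.content[:40])


asyncio.run(generate_with_abatch())
```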
Saving model configuration (unfinished)

The idea is to persist the model's configuration locally as JSON. The relevant constructor parameters are listed below; a sketch of the save/load round trip follows the table.
| Parameter | Default | Description | Example settings |
| --- | --- | --- | --- |
| `temperature` | 1.0 | Controls randomness: higher is more creative, lower is more deterministic | Code generation: 0.2; Q&A: 0.7; creative writing: 0.9 |
| `max_tokens` | None | Upper bound on generation length (in tokens) | Short replies: 500; long-form text: 2000 |
| `frequency_penalty` | 0.0 | Controls repetition: positive values reduce it, negative values allow more | Reduce repetition: 0.3 to 0.5; allow repetition: -0.1 to -0.3 |
| `presence_penalty` | 0.0 | Controls topic diversity: positive values encourage new topics, negative values keep the focus on one | Encourage diversity: 0.1 to 0.3; stay focused: -0.1 to -0.3 |
| `n` | 1 | Number of candidate completions to generate | Single candidate: 1; multiple candidates: 3-5 |
| `best_of` | 1 | Generate several completions server-side and return the best (scored by log probability) | Standard mode: 1; quality first: 3-5 |
| `_type` | "chat_deepseek" | Internal type identifier (set automatically) | Used during serialization/deserialization |
| `request_timeout` | 600.0 | Timeout for a single API request, in seconds | Quick chat: 30; long-form generation: 60-120 |
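The source marks this section as unfinished, so the following is only a sketch of the idea: collect the knobs from the table into a plain dict, dump it with `json`, and feed it back to the constructor later. The file name and the exact set of keys are assumptions, under the premise that `ChatDeepSeek` accepts these keyword arguments as the table suggests:

```python
import json

from langchain_deepseek import ChatDeepSeek
from dotenv import load_dotenv

load_dotenv()

CONFIG_PATH = "./deepseek_config.json"  # hypothetical file name

# Collect the parameters from the table into a serializable dict
config = {
    "model": "deepseek-chat",
    "temperature": 0.7,
    "max_tokens": 2000,
    "request_timeout": 60.0,
}

with open(CONFIG_PATH, "w", encoding="utf-8") as f:
    json.dump(config, f, ensure_ascii=False, indent=2)

# Later, or in another process: reload the JSON and rebuild the model
with open(CONFIG_PATH, "r", encoding="utf-8") as f:
    loaded = json.load(f)

llm = ChatDeepSeek(**loaded)
```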
Streaming output

The model emits the reply token by token, instead of blocking for tens of seconds until the whole answer is ready.
```python
from langchain_deepseek import ChatDeepSeek
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain.callbacks import get_openai_callback
from dotenv import load_dotenv

load_dotenv()

llm = ChatDeepSeek(
    model="deepseek-chat",
    streaming=True,
    # Print each token to stdout as soon as it arrives
    callbacks=[StreamingStdOutCallbackHandler()],
)
llm.invoke("Please write a short essay about AI")

# Track token usage and cost for a call; note the cost is computed from
# OpenAI's price list, so it may not be accurate for a DeepSeek model
with get_openai_callback() as cb:
    result = llm.invoke("Explain deep learning")
    print(f"Total tokens: {cb.total_tokens}")
    print(f"Total cost: ${cb.total_cost}")
```
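Besides the callback handler, Runnable chat models also expose a `stream()` iterator that yields message chunks as they arrive, which gives you direct control over where the tokens go. A minimal sketch:

```python
from langchain_deepseek import ChatDeepSeek
from dotenv import load_dotenv

load_dotenv()

llm = ChatDeepSeek(model="deepseek-chat")

# stream() yields chunks as tokens arrive; print them without newlines
for chunk in llm.stream("Write a short essay about AI"):
    print(chunk.content, end="", flush=True)
print()
```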