Hugging Face Transformers · pipeline

Installation

pip install transformers torch

pipeline: the simplest way to get started

from transformers import pipeline

# Sentiment analysis
clf = pipeline("sentiment-analysis")
print(clf("I love this!"))
# e.g. [{'label': 'POSITIVE', 'score': 0.999...}]

The first run downloads the model automatically (a few hundred MB); after that it is cached under ~/.cache/huggingface.
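If the cache location matters (a small home partition, a shared machine), the sketch below shows how the path is usually resolved. It assumes the environment variables documented by huggingface_hub (HF_HUB_CACHE, HF_HOME, XDG_CACHE_HOME) and their documented precedence; resolve_hub_cache is a hypothetical helper, not a library function.

```python
import os
from pathlib import Path


def resolve_hub_cache(env=None):
    """Return the directory where downloaded models are expected to be cached.

    Assumed precedence (a sketch, not the library's own code):
    HF_HUB_CACHE > $HF_HOME/hub > $XDG_CACHE_HOME/huggingface/hub
    > ~/.cache/huggingface/hub.
    """
    env = os.environ if env is None else env
    if env.get("HF_HUB_CACHE"):
        return Path(env["HF_HUB_CACHE"]).expanduser()
    if env.get("HF_HOME"):
        return Path(env["HF_HOME"]).expanduser() / "hub"
    xdg = env.get("XDG_CACHE_HOME") or "~/.cache"
    return Path(xdg).expanduser() / "huggingface" / "hub"


# Show where models would land with the current environment.
print(resolve_hub_cache())
```

Setting HF_HOME before starting Python moves the whole cache, for example onto a larger disk.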

Common tasks at a glance

from transformers import pipeline

# Text classification
clf = pipeline("sentiment-analysis")

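# Translation: no default model covers en -> zh, so name one explicitly
# (this model's tokenizer also needs `pip install sentencepiece`)
trans = pipeline("translation_en_to_zh", model="Helsinki-NLP/opus-mt-en-zh")
trans("Hello world")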

# Summarization (long_text is a str holding the passage to summarize; define it first)
summ = pipeline("summarization")
summ(long_text, max_length=130, min_length=30)

# Question answering (give it a context and a question)
qa = pipeline("question-answering")
qa(question="Who invented the telephone?", context="Bell invented the telephone in 1876.")

# Text generation (small model, toy-grade)
gen = pipeline("text-generation", model="gpt2")
gen("Once upon a time", max_length=50)

# Named entity recognition
ner = pipeline("ner", aggregation_strategy="simple")
ner("Apple was founded by Steve Jobs in California.")

# Zero-shot classification
zsc = pipeline("zero-shot-classification")
zsc("I want to learn Python", candidate_labels=["programming", "food", "travel"])

# Image classification
img_clf = pipeline("image-classification")
img_clf("path/to/cat.jpg")

# Speech recognition
asr = pipeline("automatic-speech-recognition")
asr("audio.wav")
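Pipelines return plain Python lists and dicts, so results from the tasks above are easy to post-process. As one illustration, the sketch below picks the top label from a zero-shot result. It assumes the result is a dict with parallel 'labels' and 'scores' lists sorted from most to least likely; best_label is a hypothetical helper and the sample values are made up.

```python
def best_label(result, min_score=0.5):
    """Return the top-ranked label, or None if its score is below min_score.

    Assumes the zero-shot result shape: parallel 'labels' and 'scores'
    lists, sorted from most to least likely.
    """
    label, score = result["labels"][0], result["scores"][0]
    return label if score >= min_score else None


# Made-up result in the assumed shape; real scores will differ.
sample = {
    "sequence": "I want to learn Python",
    "labels": ["programming", "travel", "food"],
    "scores": [0.91, 0.05, 0.04],
}
print(best_label(sample))  # programming
```

A threshold like min_score is a design choice: it lets the caller treat low-confidence predictions as "no answer" instead of trusting the top label blindly.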

Specifying a model

If you don't specify one, the task's default model is used. You can swap it:

clf = pipeline("sentiment-analysis", model="uer/roberta-base-finetuned-jd-binary-chinese")
print(clf("这个东西真好用"))  # input means "This thing is really nice to use"

The Hugging Face Hub hosts hundreds of thousands of models; filter them by task, language, or size.

Good models for Chinese-language tasks

Task	Recommended model
Chinese sentiment analysis	`uer/roberta-base-finetuned-jd-binary-chinese`
Chinese NER	`uer/roberta-base-finetuned-cluener2020-chinese`
Chinese-to-English translation	`Helsinki-NLP/opus-mt-zh-en`
Chinese summarization	`csebuetnlp/mT5_multilingual_XLSum`

Using a GPU

clf = pipeline("sentiment-analysis", device=0)    # GPU 0
clf = pipeline("sentiment-analysis", device="mps") # Apple Silicon
clf = pipeline("sentiment-analysis", device="cpu") # default
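To avoid hard-coding the device, one option is to pick it at runtime. The sketch below is a minimal example, assuming PyTorch's torch.cuda.is_available() and torch.backends.mps.is_available() checks; pick_device and detect_device are hypothetical helper names.

```python
def pick_device(cuda_available, mps_available):
    """Map availability flags to a value accepted by pipeline(device=...)."""
    if cuda_available:
        return 0        # first CUDA GPU
    if mps_available:
        return "mps"    # Apple Silicon
    return "cpu"


def detect_device():
    """Query PyTorch for the flags.

    torch is imported lazily so pick_device stays dependency-free.
    """
    import torch

    return pick_device(
        torch.cuda.is_available(),
        torch.backends.mps.is_available(),
    )
```

The result can then be passed straight through, as in pipeline("sentiment-analysis", device=detect_device()).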

Batch inference (important: calling one item at a time is much slower)

texts = ["text1", "text2", "text3", ...]      # 1000 items
results = clf(texts, batch_size=32)

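Batching can be 10 to 50 times faster than calling one item at a time, mostly on a GPU. The actual gain depends on the model, the hardware, and the input lengths, and on a CPU it can be small.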
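Conceptually, batch_size groups the inputs so that each forward pass handles several texts at once. The sketch below makes that grouping explicit. It illustrates the idea rather than how the library implements it; batched and classify_all are hypothetical helpers, and clf stands for any callable that maps a list of texts to a list of results, which is how a pipeline behaves when given a list.

```python
from itertools import islice


def batched(items, batch_size):
    """Yield consecutive lists of at most batch_size items."""
    it = iter(items)
    while chunk := list(islice(it, batch_size)):
        yield chunk


def classify_all(clf, texts, batch_size=32):
    """Call clf once per batch and pair every text with its result."""
    paired = []
    for chunk in batched(texts, batch_size):
        paired.extend(zip(chunk, clf(chunk)))
    return paired
```

With 70 texts and batch_size=32, clf is called three times (32, 32, and 6 items) instead of 70 times. In practice, passing batch_size to the pipeline itself is enough.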

The three things pipeline does behind the scenes

# 1. Text → token IDs
inputs = tokenizer(text, return_tensors="pt")
# 2. Model prediction
outputs = model(**inputs)
# 3. Post-processing (decoding / softmax)
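For a classification task, step 3 amounts to a softmax over the model's raw scores followed by a label lookup. The sketch below shows that step in plain Python, assuming a list of logits and an id-to-label mapping; postprocess is a hypothetical helper, the logits are made up, and the real pipeline's post-processing may differ in detail.

```python
import math


def postprocess(logits, id2label):
    """Turn raw classifier logits into a {'label', 'score'} dict via softmax."""
    peak = max(logits)                            # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]             # softmax probabilities, summing to 1
    best = max(range(len(probs)), key=probs.__getitem__)
    return {"label": id2label[best], "score": probs[best]}


# Made-up logits for a two-class sentiment model.
print(postprocess([-4.2, 4.5], {0: "NEGATIVE", 1: "POSITIVE"}))
```

With a real sequence-classification model, outputs.logits[0].tolist() and model.config.id2label would supply the two arguments.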

The next article takes pipeline apart to look inside: the tokenizer and the model.

Limitations

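pipeline is a good fit for quick experiments and getting started
Large batches, complex logic, custom post-processing → use the low-level API
LLM text generation (chat / long context) → don't use pipeline; call an LLM API directly (OpenAI / Claude / a local Ollama)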

Next article: the low-level API, Tokenizer and Model.