Deep Dive: AI-102 Azure AI Engineer — Complete Knowledge Structure (Low → High)

Topic: AI-102: Designing and Implementing a Microsoft Azure AI Solution
Category: Artificial Intelligence / Cloud AI Services
Level: Beginner → Intermediate → Advanced (Progressive)
Last Updated: 2026-03-15


䞭文版 (Chinese Version)


1. 抂述 (Overview)

AI-102 是 Microsoft Azure AI Engineer Associate 讀证的官方诟皋面向垌望圚 Azure 䞊构建 AI 解决方案的蜯件匀发者。诟皋涵盖 5 倧孊习路埄、40 䞪暡块包括生成匏 AI 应甚匀发、AI Agent 构建、自然语蚀倄理、计算机视觉、以及信息提取。

本文将 AI-102 的党郚知识䜓系从 Level 0基础到 Level 7高级 逐层展匀垮助䜠建立完敎的知识地囟。每䞀层郜建立圚前䞀层的基础䞊确保䜠胜埪序析进地掌握所有关键技胜。

诟皋总览 — 5 倧孊习路埄

# 孊习路埄 暡块数 栞心䞻题
1 Develop Generative AI Apps in Azure 8 生成匏 AI、RAG、Fine-tuning、Prompt Flow
2 Develop AI Agents on Azure 9 Agent Service、MCP、Multi-agent、A2A
3 Develop Natural Language Solutions 10 文本分析、CLU、语音、翻译
4 Develop Computer Vision Solutions 8 囟像分析、OCR、人脞、自定义视觉
5 Develop AI Information Extraction Solutions 5 Document Intelligence、AI Search、Content Understanding

2. Level 0 — 基础平台䞎准倇 (Foundation & Setup)

🎯 目标理解 Azure AI 生态搭建匀发环境

2.1 Azure AI 服务党景囟

Azure AI 服务䜓系的栞心组成

┌─────────────────────────────────────────────────────┐
│                  Microsoft Foundry                   │
│  (统䞀的 AI 匀发平台原 Azure AI Studio)              │
│                                                      │
│  ┌──────────┐ ┌──────────┐ ┌──────────┐             │
│  │ Model    │ │ Prompt   │ │ Agent    │             │
│  │ Catalog  │ │ Flow     │ │ Service  │             │
│  └──────────┘ └──────────┘ └──────────┘             │
│                                                      │
│  ┌──────────────────────────────────────────────┐   │
│  │            Azure AI Services                  │   │
│  │  ┌────────┐ ┌────────┐ ┌────────┐ ┌────────┐│   │
│  │  │Language │ │Vision  │ │Speech  │ │Document││   │
│  │  │Service │ │Service │ │Service │ │Intelli.││   │
│  │  └────────┘ └────────┘ └────────┘ └────────┘│   │
│  └──────────────────────────────────────────────┘   │
│                                                      │
│  ┌──────────────────────────────────────────────┐   │
│  │          Azure OpenAI Service                 │   │
│  │   GPT-4o | GPT-4 | DALL-E | Whisper          │   │
│  └──────────────────────────────────────────────┘   │
│                                                      │
│  ┌──────────────────────────────────────────────┐   │
│  │          Azure AI Search                      │   │
│  │   玢匕 | 技胜集 | 向量搜玢 | 语义排序          │   │
│  └──────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────┘

2.2 关键基础抂念

抂念 诎明
Microsoft Foundry 统䞀 AI 匀发平台原 Azure AI Studio管理暡型、数据、郚眲
Azure AI Services 预构建讀知服务的集合语蚀、视觉、语音、决策
Azure OpenAI Service 通过 Azure 访问 OpenAI 暡型GPT、DALL-E、Whisper 等
Resource & Endpoint 每䞪 AI 服务郜需芁创建资源通过 Endpoint + Key 或 Entra ID 访问
REST API & SDK 䞀种调甚方匏盎接 HTTP 调甚 REST API或䜿甚 Python/C# SDK

2.3 匀发环境准倇

前眮芁求
├── Azure 订阅可甚免莹试甚
├── Python 3.9+ 或 C#/.NET
├── Visual Studio Code
│   ├── Azure AI Foundry 扩展
│   └── Python / C# 扩展
├── Azure CLI
└── Azure AI Foundry SDK (pip install azure-ai-projects)

2.4 讀证䞎授权

  • API Key 方匏简单䜆䞍掚荐甚于生产
  • Microsoft Entra ID原 AAD掚荐的生产方匏䜿甚 DefaultAzureCredential
  • RBAC 角色Cognitive Services User、Cognitive Services Contributor

3. Level 1 — 预构建 AI 服务匀箱即甚

🎯 目标孊䌚盎接调甚 Azure 预构建 AI 胜力无需训练暡型

3.1 文本分析 (Text Analytics)

服务Azure Language ServiceFoundry Tools 侭的 Language 功胜

功胜 诎明 兞型甚途
情感分析 (Sentiment Analysis) 刀断文本正面/莟面/䞭性情绪 客户反銈分析、瀟亀媒䜓监控
关键短语提取 (Key Phrase Extraction) 提取文本䞭的关键词和短语 文档摘芁、标筟生成
呜名实䜓识别 (NER) 识别人名、地点、组织、日期等 信息提取、数据结构化
语蚀检测 (Language Detection) 刀断文本䜿甚的语蚀 倚语蚀系统路由
实䜓铟接 (Entity Linking) 将实䜓铟接到绎基癟科条目 知识囟谱、消歧
PII 检测 识别䞪人敏感信息 数据脱敏、合规

调甚暡匏

客户端 → REST API / SDK → Language Endpoint → 返回 JSON 结果

3.2 翻译服务 (Translator Service)

  • 文本翻译支持 100+ 语蚀的实时翻译
  • 文档翻译批量翻译敎䞪文档保留栌匏
  • 自定义翻译噚䜿甚自己的术语衚训练定制翻译暡型
  • 音译 (Transliteration)圚䞍同文字系统闎蜬换劂日文假名→眗马字

3.3 语音服务 (Speech Services)

功胜 方向 诎明
Speech-to-Text (STT) 语音 → 文本 实时/批量语音识别
Text-to-Speech (TTS) 文本 → 语音 自然语音合成支持自定义语音
Speech Translation 语音 → 翻译文本 实时语音翻译
Speaker Recognition 语音 → 身仜 诎话人验证和识别

关键抂念

  • Speech Config配眮订阅密钥和区域
  • Audio Config配眮音频蟓入/蟓出麊克风、文件、流
  • SSMLSpeech Synthesis Markup Language粟细控制语音合成

3.4 囟像分析 (Image Analysis)

服务Azure Vision Service

功胜 诎明
囟像描述 自劚生成囟像的自然语蚀描述
标筟提取 识别囟像䞭的对象、场景、劚䜜
对象检测 定䜍囟像䞭对象的䜍眮蟹界框
智胜裁剪 基于关泚区域自劚裁剪
人脞检测 检测人脞䜍眮和属性
OCR光孊字笊识别 从囟像䞭提取文本见 3.5

3.5 OCR — 读取囟像䞭的文本

䞀种 API

  • Image Analysis Read API简单场景同步
  • Document Intelligence倍杂文档匂步见 Level 3

流皋

囟像蟓入 → 预倄理 → 文本检测 → 文字识别 → 返回结构化文本
                                              (行、单词、蟹界框)

支持打印文本、手写文本、倚语蚀混合

3.6 人脞检测䞎识别 (Face Detection & Recognition)

䞉层胜力

  1. Face Detection检测人脞䜍眮和属性需审批
  2. Face Verification1:1 比对这䞀匠照片是同䞀䞪人吗
  3. Face Identification1:N 识别这䞪人是谁

⚠ 莟莣任 AI 泚意人脞识别功胜受限访问政策纊束需芁申请审批

3.7 视频分析 (Video Indexer)

Azure Video Indexer 从视频䞭提取倚绎掞察

  • 人脞识别䞎跟螪
  • OCR视频䞭的文字
  • 语音蜬文本
  • 䞻题/关键词提取
  • 情感分析
  • 场景分割
  • 内容审栞

3.8 预构建 Document Intelligence 暡型

暡型 提取内容
发祚暡型 发祚号、日期、金额、䟛应商信息
收据暡型 商家名、亀易日期、明细、总额
身仜证暡型 姓名、出生日期、证件号
名片暡型 姓名、职䜍、公叞、联系方匏
皎务文档 W-2、1098 等矎囜皎务衚栌

4. Level 2 — 自定义 AI 暡型训练䜠自己的暡型

🎯 目标圚预构建胜力䞍足时训练适合䜠䞚务场景的自定义暡型

4.1 对话语蚀理解 (Conversational Language Understanding — CLU)

栞心抂念

抂念 诎明 瀺䟋
Utterance话语 甚户诎的䞀句话 “垮我预订明倩去北京的机祚”
Intent意囟 甚户想做什么 BookFlight
Entity实䜓 话语䞭的关键参数 日期=明倩, 目的地=北京

训练流皋

1. 定义 Intent 和 Entity
2. 标泚训练数据Utterance → Intent + Entity
3. 训练暡型
4. 评䌰Precision / Recall / F1
5. 郚眲暡型
6. 应甚集成

4.2 问答服务 (Question Answering)

基于知识库的问答系统

  • 数据源FAQ 眑页、Word/PDF 文档、手劚添加 QA 对
  • 倚蜮对话支持 follow-up prompts 实现匕富匏对话
  • 粟确回答从段萜䞭粟确提取答案片段
  • 同义词配眮 alterations 提升匹配率

4.3 自定义文本分类 (Custom Text Classification)

类型 诎明
单标筟分类 每䞪文档只属于䞀䞪类别
倚标筟分类 每䞪文档可属于倚䞪类别

训练芁求

  • 至少 10 䞪文档/类别
  • 掚荐 200+ 䞪文档/类别
  • 需芁标泚数据

4.4 自定义呜名实䜓识别 (Custom NER)

从非结构化文本䞭提取自定义实䜓类型

  • 定义䜠的实䜓类型劂 “产品名”、”合同猖号”
  • 标泚训练数据
  • 训练 → 评䌰 → 郚眲
  • 评䌰指标Precision、Recall、F1-Score

4.5 自定义视觉 — 囟像分类 (Custom Vision - Classification)

训练集囟像 → 䞊䌠到 Custom Vision → 选择分类类型 → 训练 → 评䌰 → 发垃
                                     ├── 倚类别每囟䞀䞪标筟
                                     └── 倚标筟每囟倚䞪标筟

最䜳实践

  • 每类至少 15 匠囟
  • 包含䞍同角床、光照、背景
  • 䜿甚 Smart Labeler 半自劚标泚

4.6 自定义视觉 — 对象检测 (Custom Vision - Object Detection)

䞎分类的区别䞍仅识别是什么还定䜍圚哪里蟹界框

  • 需芁标泚蟹界框Bounding Box
  • 适甚场景工䞚莚检、莧架商品识别、安防监控

4.7 自定义 Document Intelligence 暡型

圓预构建暡型无法满足需求时

  • 自定义暡板暡型适甚于固定栌匏衚单
  • 自定义神经暡型适甚于半结构化和非结构化文档
  • 组合暡型组合倚䞪暡型倄理䞍同衚单类型

5. Level 3 — 生成匏 AI 基础 (Generative AI Fundamentals)

🎯 目标掌握 LLM 暡型选择、郚眲和基本应甚匀发

5.1 暡型目圕䞎郚眲 (Model Catalog & Deployment)

Microsoft Foundry 的 Model Catalog 提䟛倚种暡型

暡型来源 瀺䟋 郚眲方匏
Azure OpenAI GPT-4o, GPT-4, GPT-3.5-turbo 标准/党局郚眲
Meta Llama 3 Serverless API / Managed Compute
Mistral Mistral Large, Mixtral Serverless API
Microsoft Phi-3, Phi-4 Serverless API / Managed Compute
Cohere Command R+ Serverless API

郚眲类型

  • Serverless API (MaaS)按 Token 付莹无需管理基础讟斜
  • Managed Compute (MaaP)独占计算资源适合高吞吐
  • 标准郚眲Azure OpenAI 暡型的默讀方匏

5.2 Microsoft Foundry SDK

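from azure.ai.projects import AIProjectClient
from azure.identity import DefaultAzureCredential

# Create the project client
client = AIProjectClient(
    credential=DefaultAzureCredential(),
    endpoint="https://<your-foundry>.services.ai.azure.com"
)

# Obtain an OpenAI-compatible chat client from the project.
# NOTE: this accessor has been renamed across azure-ai-projects releases
# (some betas expose it under client.inference); check your installed version.
openai_client = client.get_openai_client()

# Request a chat completion
response = openai_client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is Azure AI?"}
    ]
)

print(response.choices[0].message.content)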

栞心组件

  • AIProjectClient项目级客户端管理资源和连接
  • ChatCompletionsClient聊倩完成调甚
  • EmbeddingsClient文本嵌入生成

5.3 Prompt Flow

Prompt Flow 是䞀䞪可视化的 LLM 应甚猖排工具

蟓入 → [LLM 节点] → [Python 节点] → [LLM 节点] → 蟓出
         │               │               │
     提瀺暡板        数据倄理/API       总结/栌匏化

栞心抂念

  • Flow流DAG有向无环囟圢匏的工䜜流
  • Node节点LLM 调甚、Python 代码、工具调甚
  • Connection䞎倖郚服务的连接配眮
  • Variant同䞀节点的䞍同提瀺版本甚于 A/B 测试

侉种 Flow 类型

  1. Standard Flow通甚 LLM 应甚流
  2. Chat Flow对话匏应甚内眮聊倩历史管理
  3. Evaluation Flow评䌰其他流的莚量

5.4 视觉增区的生成匏 AI 应甚

䜿甚倚暡态暡型劂 GPT-4o倄理囟像

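# `client` here is an OpenAI-compatible chat client (e.g. openai.AzureOpenAI),
# not the AIProjectClient itself. The image URL is a placeholder.
image_url = "https://example.com/chart.png"

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image"},
            {"type": "image_url", "image_url": {"url": image_url}}
        ]
    }]
)

print(response.choices[0].message.content)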

应甚场景囟像描述、视觉问答、囟衚解读、文档理解

5.5 语音增区的生成匏 AI 应甚

结合 Speech Service 和 LLM

甚户语音 → STT → 文本 → LLM → 响应文本 → TTS → 语音蟓出
  • 实时语音对话 AI
  • 倚语蚀语音助手
  • 䌚议摘芁和分析

5.6 囟像生成 (Image Generation)

䜿甚 DALL-E 暡型生成囟像

  • 文本到囟像基于自然语蚀描述生成囟像
  • 囟像猖蟑修改现有囟像的特定区域
  • 囟像变䜓基于参考囟像生成变䜓

6. Level 4 — 高级生成匏 AI (Advanced Generative AI)

🎯 目标掌握 RAG、Fine-tuning 和䌁䞚级知识检玢

6.1 RAG — 检玢增区生成 (Retrieval Augmented Generation)

䞺什么需芁 RAG

  • LLM 的训练数据有截止日期
  • LLM 䞍了解䜠的私有数据
  • 盎接把所有数据塞进 Prompt 䌚超出 Token 限制

RAG 架构

                    ┌──────────────┐
                    │  䜠的数据源    │
                    │ (文档/数据库)  │
                    └──────┬───────┘
                           │ 玢匕阶段
                           ▌
                    ┌──────────────┐
                    │ Azure AI     │
                    │ Search Index │
                    │ (向量+关键词)  │
                    └──────┬───────┘
                           │ 查询阶段
甚户提问 ────────┐         │
                 ▌         ▌
          ┌──────────────────────┐
          │   检玢盞关文档片段     │
          └──────────┬───────────┘
                     │
                     ▌
          ┌──────────────────────┐
          │  组合 Prompt:         │
          │  System + Context    │
          │  + User Question     │
          └──────────┬───────────┘
                     │
                     ▌
          ┌──────────────────────┐
          │      LLM 生成回答     │
          │   (基于检玢到的䞊䞋文)  │
          └──────────────────────┘

关键步骀

  1. 数据准倇文档切片Chunking、嵌入生成
  2. 玢匕创建Azure AI Search 创建向量玢匕
  3. 检玢配眮混合搜玢关键词 + 向量 + 语义排序
  4. Prompt 工皋讟计 System Prompt 匕富暡型䜿甚检玢䞊䞋文
  5. 响应生成LLM 基于检玢结果生成有根据的回答

Azure 实现

  • 䜿甚 Microsoft Foundry 的 “Add your data” 功胜
  • 或通过 SDK 猖皋实现完敎 RAG 管道

6.2 Fine-tuning — 暡型埮调

䜕时选择 Fine-tuning 而非 RAG

场景 RAG Fine-tuning
需芁最新知识 ✅ 驖选 ❌ 䞍适合
改变暡型行䞺/风栌 ❌ 有限 ✅ 驖选
孊习特定栌匏/暡匏 ❌ ✅
减少 Token 消耗 ❌ ✅
快速实斜 ✅ ❌ 需芁时闎

Fine-tuning 流皋

1. 准倇训练数据JSONL 栌匏的对话样本
2. 䞊䌠数据到 Foundry
3. 选择基础暡型
4. 配眮训练参数epochs, learning rate, batch size
5. 启劚训练
6. 评䌰埮调暡型
7. 郚眲埮调暡型

6.3 Azure AI Search — 知识挖掘

䞉倧胜力

胜力 诎明
数据摄取 从 Blob Storage、SQL、Cosmos DB 等拉取数据
AI 增区 䜿甚 SkillsetOCR、实䜓识别、翻译等䞰富数据
智胜搜玢 党文搜玢 + 向量搜玢 + 语义排序

Skillset技胜集— 内眮技胜

  • OCR、囟像分析、关键短语提取
  • 实䜓识别、语蚀检测、翻译
  • 文档拆分、文本合并
  • 自定义技胜调甚䜠自己的 API

玢匕增区管道

数据源 → Indexer → Skillset → 增区文档 → 玢匕
                    │
            ┌───────┮───────┐
            │  Built-in     │
            │  Skills       │
            │  ┌─────────┐  │
            │  │OCR      │  │
            │  │NER      │  │
            │  │KeyPhrase│  │
            │  │Translate │  │
            │  └─────────┘  │
            └───────────────┘

6.4 Azure Content Understanding倚暡态内容分析

新䞀代信息提取服务统䞀倄理倚种内容类型

  • 文档提取结构化信息
  • 囟像分析视觉内容
  • 音频蜬圕和分析
  • 视频倚暡态绌合分析

侎 Document Intelligence 的关系

  • Content Understanding 是曎新的、曎统䞀的服务
  • Document Intelligence 仍圚支持䞓泚于衚单和文档

7. Level 5 — AI Agent 匀发 (AI Agent Development)

🎯 目标构建胜自䞻执行任务的智胜代理

7.1 AI Agent 栞心抂念

什么是 AI Agent

䌠统 LLM App:  甚户提问 → LLM 回答 → 结束
AI Agent:     甚户给目标 → Agent 规划 → 调甚工具 → 观察结果 
              → 再次规划 → 再次调甚 → ... → 完成目标

Agent = LLM + Tools + Memory + Planning

组件 诎明
LLM倧脑 理解指什、掚理、决策
Tools工具 搜玢、代码执行、API 调甚、文件操䜜
Memory记忆 短期对话䞊䞋文和长期持久化存傚
Planning规划 将倍杂任务分解䞺步骀

7.2 Microsoft Foundry Agent Service

通过 Azure Portal 或 VS Code 创建和管理 Agent

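from azure.ai.projects import AIProjectClient
from azure.identity import DefaultAzureCredential

# Project client (same endpoint placeholder as in 5.2)
client = AIProjectClient(
    credential=DefaultAzureCredential(),
    endpoint="https://<your-foundry>.services.ai.azure.com"
)

# Create the agent. create_agent is the azure-ai-projects 1.0.x surface;
# other SDK versions may expose agent creation differently.
agent = client.agents.create_agent(
    model="gpt-4o",
    name="my-assistant",
    instructions="You are a helpful data analyst.",
    tools=[
        {"type": "code_interpreter"},
        {"type": "file_search"}
    ]
)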

内眮工具

  • Code Interpreter运行 Python 代码、数据分析、生成囟衚
  • File Search搜玢䞊䌠的文件内容
  • Bing Grounding䜿甚 Bing 搜玢获取实时信息
  • Azure AI Search搜玢䌁䞚知识库

7.3 自定义工具集成 (Custom Tools)

圓内眮工具䞍借时定义䜠自己的 Function

# 定义工具 schema
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string"}
            },
            "required": ["city"]
        }
    }
}]

# Agent 䌚圚需芁时调甚歀工具
# 䜠莟莣实现实际的凜数逻蟑并返回结果

7.4 MCP 工具集成 (Model Context Protocol)

MCP 是䞀䞪匀攟标准协议让 Agent 胜劚态发现和调甚倖郚工具

Agent ←→ MCP Client ←→ MCP Server ←→ 倖郚工具/服务
  • 䌘势标准化的工具泚册和调甚协议
  • 场景连接第䞉方服务、䌁䞚内郚系统

7.5 Foundry IQ — 知识增区 Agent

Foundry IQ 提䟛共享知识平台倚䞪 Agent 可以访问

┌─────────────────────────────────┐
│         Foundry IQ              │
│  (䌁䞚知识共享平台)               │
│                                  │
│  ┌──────────┐  ┌──────────┐     │
│  │ Data     │  │ RAG      │     │
│  │ Assets   │  │ Pipeline │     │
│  └──────────┘  └──────────┘     │
└──────────┬──────────────────────┘
           │
    ┌──────┌──────┐
    ▌      ▌      ▌
 Agent A Agent B Agent C
  • 数据䌘化提升检玢莚量
  • 配眮 Agent 指什确保匕甚源的䞀臎性
  • 支持倚 Agent 共享同䞀知识库

7.6 侎 Microsoft 365 集成

将 Agent 发垃到

  • Microsoft Teams团队协䜜䞭盎接䜿甚
  • Microsoft 365 Copilot集成到 Copilot 生态
  • Work IQ访问 M365 䞭的工䜜数据邮件、日历、文档

8. Level 6 — 倚 Agent 䞎高级猖排 (Multi-Agent & Advanced Orchestration)

🎯 目标构建倚 Agent 协䜜系统和䌁䞚级 AI 工䜜流

8.1 Microsoft Agent Framework (Semantic Kernel)

Semantic Kernel 是 Microsoft 的 AI 猖排 SDK

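from semantic_kernel import Kernel
from semantic_kernel.agents import ChatCompletionAgent
from semantic_kernel.connectors.ai.open_ai import AzureChatCompletion
from semantic_kernel.core_plugins import MathPlugin, TimePlugin

kernel = Kernel()

# Register the AI service (endpoint and key are placeholders)
kernel.add_service(AzureChatCompletion(
    deployment_name="gpt-4o",
    endpoint="https://...",
    api_key="..."
))

# Register plugins (tools)
kernel.add_plugin(TimePlugin(), plugin_name="time")
kernel.add_plugin(MathPlugin(), plugin_name="math")

# Create the agent
agent = ChatCompletionAgent(
    kernel=kernel,
    name="analyst",
    instructions="You are a data analyst..."
)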

栞心抂念

  • KernelAI 猖排的栞心匕擎
  • Plugin可重甚的功胜暡块
  • Planner自劚将目标分解䞺步骀
  • Memory向量存傚和语义记忆

8.2 倚 Agent 猖排 (Multi-Agent Orchestration)

Agent Chat 暡匏

┌─────────────────────────────────────────────┐
│            Agent Group Chat                  │
│                                              │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐  │
│  │ Research │  │ Analyst  │  │ Writer   │  │
│  │ Agent    │→ │ Agent    │→ │ Agent    │  │
│  └──────────┘  └──────────┘  └──────────┘  │
│       │              │              │        │
│  搜玢信息        分析数据        撰写报告      │
│                                              │
│  协调策略                                    │
│  - Sequential顺序                          │
│  - Round-Robin蜮询                         │
│  - Selection智胜选择                        │
│  - Termination完成条件                      │
└─────────────────────────────────────────────┘

猖排策略 | 策略 | 诎明 | 适甚场景 | |——|——|———| | Sequential | 按固定顺序蜮流 | 简单流氎线 | | Round-Robin | 埪环蜮流 | 平等参䞎 | | Selection Strategy | 由 LLM 决定䞋䞀䞪发蚀者 | 灵掻协䜜 | | Termination Strategy | 定义完成条件 | 自劚停止 |

8.3 A2A 协议 (Agent-to-Agent)

A2A 是䞀䞪跚平台的 Agent 闎通信协议

Agent A ←→ A2A Protocol ←→ Agent B
(本地)                      (远皋/䞍同平台)

栞心功胜

  • Agent Discovery发现远皋 Agent 的胜力
  • Direct CommunicationAgent 闎盎接通信
  • Task Coordination协调跚 Agent 的任务执行

8.4 Agent 驱劚的工䜜流 (Agent-Driven Workflows)

䜿甚 Microsoft Foundry 构建智胜工䜜流

  • 猖排倚䞪 Agent 和组件
  • 条件分支和埪环
  • 错误倄理和重试
  • 人工审栞节点Human-in-the-loop

9. Level 7 — 莟莣任 AI 䞎生产化 (Responsible AI & Production)

🎯 目标确保 AI 解决方案安党、可靠、莟莣任

9.1 莟莣任 AI 原则 (Responsible AI Principles)

Microsoft 的 6 倧 AI 原则

原则 诎明
公平性 (Fairness) AI 系统应公平对埅所有人
可靠性䞎安党性 (Reliability & Safety) AI 应可靠运行䞍造成䌀害
隐私䞎安党 (Privacy & Security) 保技甚户数据和隐私
包容性 (Inclusiveness) AI 应赋胜每䞪人
透明性 (Transparency) AI 决策应可理解
问莣制 (Accountability) 人类应对 AI 系统莟莣

9.2 生成匏 AI 的内容安党

Azure AI Content Safety

甚户蟓入 → 蟓入过滀噚 → LLM → 蟓出过滀噚 → 返回给甚户
              │                     │
        ┌─────┮─────┐        ┌─────┮─────┐
        │ 检测:      │        │ 检测:      │
        │ - 仇恚     │        │ - 仇恚     │
        │ - 暎力     │        │ - 暎力     │
        │ - 性内容   │        │ - 性内容   │
        │ - 自残     │        │ - 自残     │
        │ - Jailbreak│        │ - 错误信息 │
        └───────────┘        └───────────┘

四䞪䞥重级别Safe → Low → Medium → High

9.3 评䌰生成匏 AI 应甚

内眮评䌰指标

指标 评䌰内容
Groundedness有据性 回答是吊基于提䟛的䞊䞋文
Relevance盞关性 回答是吊䞎问题盞关
Coherence连莯性 回答是吊逻蟑连莯
Fluency流畅性 语蚀是吊自然流畅
Similarity盞䌌性 䞎参考答案的盞䌌床
F1 Score 䞎参考答案的词汇重叠

评䌰方匏

  • 手劚评䌰人工打分
  • AI 蟅助评䌰䜿甚 LLM 䜜䞺评刀者
  • 自劚化指标䜿甚 Prompt Flow 的评䌰流

10. 知识地囟总览 (Complete Knowledge Map)

Level 7: 莟莣任 AI 䞎生产化
  │  莟莣任 AI 原则 → 内容安党 → 评䌰指标
  │
Level 6: 倚 Agent 䞎高级猖排
  │  Semantic Kernel → 倚 Agent Chat → A2A 协议 → Agent 工䜜流
  │
Level 5: AI Agent 匀发
  │  Foundry Agent Service → 自定义工具 → MCP → Foundry IQ → M365 集成
  │
Level 4: 高级生成匏 AI
  │  RAG → Fine-tuning → AI Search → Content Understanding
  │
Level 3: 生成匏 AI 基础
  │  Model Catalog → Foundry SDK → Prompt Flow → 倚暡态应甚
  │
Level 2: 自定义 AI 暡型
  │  CLU → QnA → 文本分类 → 自定义 NER → Custom Vision → Doc Intelligence
  │
Level 1: 预构建 AI 服务
  │  文本分析 → 翻译 → 语音 → 囟像分析 → OCR → 人脞 → 视频 → Doc Intelligence
  │
Level 0: 基础平台
    Azure AI 服务党景 → Microsoft Foundry → 讀证授权 → 匀发环境

English Version


1. Overview

AI-102 is the official course for the Microsoft Azure AI Engineer Associate certification. It covers 5 learning paths with 40 modules, spanning: generative AI app development, AI agent building, natural language processing, computer vision, and information extraction.

This article organizes the AI-102 material into eight levels, from Level 0 (Foundation) to Level 7 (Responsible AI & Production). Each level builds on the previous one, so together they form a complete knowledge map.

Course Overview — 5 Learning Paths

| # | Learning Path | Modules | Core Topics |
| --- | --- | --- | --- |
| 1 | Develop Generative AI Apps in Azure | 8 | GenAI, RAG, Fine-tuning, Prompt Flow |
| 2 | Develop AI Agents on Azure | 9 | Agent Service, MCP, Multi-agent, A2A |
| 3 | Develop Natural Language Solutions | 10 | Text Analytics, CLU, Speech, Translation |
| 4 | Develop Computer Vision Solutions | 8 | Image Analysis, OCR, Face, Custom Vision |
| 5 | Develop AI Information Extraction Solutions | 5 | Document Intelligence, AI Search, Content Understanding |

2. Level 0 — Foundation & Setup

🎯 Goal: Understand the Azure AI ecosystem and set up your development environment

2.1 Azure AI Service Landscape

┌─────────────────────────────────────────────────────┐
│                  Microsoft Foundry                   │
│  (Unified AI dev platform, formerly Azure AI Studio)│
│                                                      │
│  ┌──────────┐ ┌──────────┐ ┌──────────┐             │
│  │ Model    │ │ Prompt   │ │ Agent    │             │
│  │ Catalog  │ │ Flow     │ │ Service  │             │
│  └──────────┘ └──────────┘ └──────────┘             │
│                                                      │
│  ┌──────────────────────────────────────────────┐   │
│  │            Azure AI Services                  │   │
│  │  ┌────────┐ ┌────────┐ ┌────────┐ ┌────────┐│   │
│  │  │Language │ │Vision  │ │Speech  │ │Document││   │
│  │  │Service │ │Service │ │Service │ │Intelli.││   │
│  │  └────────┘ └────────┘ └────────┘ └────────┘│   │
│  └──────────────────────────────────────────────┘   │
│                                                      │
│  ┌──────────────────────────────────────────────┐   │
│  │          Azure OpenAI Service                 │   │
│  │   GPT-4o │ GPT-4 │ DALL-E │ Whisper          │   │
│  └──────────────────────────────────────────────┘   │
│                                                      │
│  ┌──────────────────────────────────────────────┐   │
│  │          Azure AI Search                      │   │
│  │   Index │ Skillset │ Vector Search │ Semantic │   │
│  └──────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────┘

2.2 Key Foundation Concepts

| Concept | Description |
| --- | --- |
| Microsoft Foundry | Unified AI dev platform (formerly Azure AI Studio) for managing models, data, deployments |
| Azure AI Services | Collection of pre-built cognitive services (Language, Vision, Speech, Decision) |
| Azure OpenAI Service | Access OpenAI models (GPT, DALL-E, Whisper) through Azure |
| Resource & Endpoint | Each AI service requires a resource, accessed via Endpoint + Key or Entra ID |
| REST API & SDK | Two invocation methods: direct HTTP REST calls, or Python/C# SDKs |

2.3 Development Environment Setup

Prerequisites:
├── Azure subscription (free trial available)
├── Python 3.9+ or C#/.NET
├── Visual Studio Code
│   ├── Azure AI Foundry extension
│   └── Python / C# extension
├── Azure CLI
└── Azure AI Foundry SDK (pip install azure-ai-projects)

2.4 Authentication & Authorization

  • API Key: Simple but not recommended for production
  • Microsoft Entra ID (formerly AAD): Recommended for production using DefaultAzureCredential
  • RBAC Roles: Cognitive Services User, Cognitive Services Contributor

3. Level 1 — Pre-built AI Services (Out-of-the-Box)

🎯 Goal: Learn to consume Azure’s pre-built AI capabilities without training models

3.1 Text Analytics

Service: Azure Language Service (Language in Foundry Tools)

| Capability | Description | Use Case |
| --- | --- | --- |
| Sentiment Analysis | Determine positive/negative/neutral sentiment | Customer feedback, social monitoring |
| Key Phrase Extraction | Extract keywords and phrases | Document summarization, tagging |
| Named Entity Recognition | Identify people, places, orgs, dates | Information extraction |
| Language Detection | Identify the language of text | Multi-language routing |
| Entity Linking | Link entities to Wikipedia entries | Knowledge graphs, disambiguation |
| PII Detection | Identify personally identifiable information | Data masking, compliance |
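
Each capability in the table is reached the same way: the client sends JSON to the Language endpoint and gets JSON back. The following is a minimal sketch of the REST side for sentiment analysis, assuming the `analyze-text` route and the `2023-04-01` API version; the endpoint and key are placeholders, the helper name `build_sentiment_request` is hypothetical, and the request is only constructed, not sent.

```python
import json
import urllib.request


def build_sentiment_request(endpoint: str, api_key: str, texts: list[str]) -> urllib.request.Request:
    """Build (but do not send) a sentiment-analysis request for the Language service."""
    body = {
        "kind": "SentimentAnalysis",
        "parameters": {"modelVersion": "latest"},
        "analysisInput": {
            "documents": [
                {"id": str(i), "language": "en", "text": text}
                for i, text in enumerate(texts, start=1)
            ]
        },
    }
    # Assumed route and API version; check the version your resource supports.
    url = f"{endpoint.rstrip('/')}/language/:analyze-text?api-version=2023-04-01"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Ocp-Apim-Subscription-Key": api_key,  # key-based auth header
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_sentiment_request(
    "https://example.cognitiveservices.azure.com",  # placeholder endpoint
    "<your-key>",                                   # placeholder key
    ["Great service!", "Too slow."],
)
print(req.get_method(), req.full_url)
```

Passing the request to `urllib.request.urlopen` would perform the call; in production the key header would typically be replaced by an Entra ID bearer token.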

3.2 Translation Service

  • Text Translation: Real-time translation across 100+ languages
  • Document Translation: Batch translate entire documents with format preservation
  • Custom Translator: Train custom translation models with your terminology
  • Transliteration: Convert between writing systems

3.3 Speech Services

| Capability | Direction | Description |
| --- | --- | --- |
| Speech-to-Text | Audio → Text | Real-time/batch speech recognition |
| Text-to-Speech | Text → Audio | Natural voice synthesis with custom voices |
| Speech Translation | Audio → Translated Text | Real-time spoken language translation |
| Speaker Recognition | Audio → Identity | Speaker verification and identification |

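Key Concepts: Speech Config (subscription key and region), Audio Config (audio input/output: microphone, file, or stream), SSML (Speech Synthesis Markup Language, for fine-grained control of synthesis)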
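
To make the SSML idea concrete, here is a small sketch that assembles an SSML document as a string. The helper name `build_ssml` and the default voice `en-US-JennyNeural` are illustrative assumptions; the result would be handed to a Text-to-Speech call, which is not shown.

```python
from xml.sax.saxutils import escape, quoteattr


def build_ssml(text: str, voice: str = "en-US-JennyNeural",
               rate: str = "medium", lang: str = "en-US") -> str:
    """Return an SSML document that speaks `text` with the given voice and rate."""
    return (
        '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" '
        f"xml:lang={quoteattr(lang)}>"
        f"<voice name={quoteattr(voice)}>"
        # escape() protects characters such as & and < inside the spoken text
        f"<prosody rate={quoteattr(rate)}>{escape(text)}</prosody>"
        "</voice></speak>"
    )


ssml = build_ssml("Terms & conditions apply.", rate="-10%")
print(ssml)
```

Building the markup through a helper keeps user-supplied text escaped, which matters because unescaped `&` or `<` would make the SSML invalid.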

3.4 Image Analysis

Service: Azure Vision Service

| Capability | Description |
| --- | --- |
| Image Description | Auto-generate natural language descriptions |
| Tag Extraction | Identify objects, scenes, actions |
| Object Detection | Locate objects with bounding boxes |
| Smart Cropping | Auto-crop based on regions of interest |
| Face Detection | Detect face locations and attributes |
| OCR | Extract text from images |

3.5 OCR — Reading Text in Images

  • Image Analysis Read API: Simple scenarios, synchronous
  • Document Intelligence: Complex documents, asynchronous (see 3.8 and 4.7)
  • Supports: printed text, handwriting, multi-language

3.6 Face Detection & Recognition

Three capability tiers:

  1. Face Detection: Detect face location and basic attributes
  2. Face Verification: 1:1 comparison (are these the same person?); approval required
  3. Face Identification: 1:N recognition (who is this person?); approval required

⚠ Face recognition features are subject to Limited Access policy

3.7 Video Analysis (Video Indexer)

Extracts multi-dimensional insights: face tracking, OCR, speech-to-text, topics, sentiment, scene segmentation, content moderation.

3.8 Prebuilt Document Intelligence Models

| Model | Extracts |
| --- | --- |
| Invoice | Invoice number, dates, amounts, vendor info |
| Receipt | Merchant, transaction date, line items, totals |
| ID Document | Name, DOB, document number |
| Business Card | Name, title, company, contact info |
| Tax Documents | W-2, 1098 US tax form fields |

4. Level 2 — Custom AI Models (Train Your Own)

🎯 Goal: Train custom models when pre-built capabilities aren’t sufficient

4.1 Conversational Language Understanding (CLU)

| Concept | Description | Example |
| --- | --- | --- |
| Utterance | What the user says | “Book me a flight to Tokyo tomorrow” |
| Intent | What the user wants to do | BookFlight |
| Entity | Key parameters in the utterance | destination=Tokyo, date=tomorrow |

Training workflow: Define Intents/Entities → Label training data → Train → Evaluate (Precision/Recall/F1) → Deploy → Integrate
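
The evaluation step reports precision, recall, and F1 per intent. As a rough illustration of what those numbers mean, the sketch below computes them from a list of true and predicted intents; the intent names other than BookFlight and the sample data are made up for the example.

```python
from collections import Counter


def per_intent_metrics(true_intents, predicted_intents):
    """Compute precision, recall, and F1 for every intent seen in either list."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(true_intents, predicted_intents):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted this intent, but it was wrong
            fn[truth] += 1  # missed the true intent
    metrics = {}
    for intent in sorted(set(true_intents) | set(predicted_intents)):
        p_den = tp[intent] + fp[intent]
        r_den = tp[intent] + fn[intent]
        precision = tp[intent] / p_den if p_den else 0.0
        recall = tp[intent] / r_den if r_den else 0.0
        total = precision + recall
        f1 = 2 * precision * recall / total if total else 0.0
        metrics[intent] = {"precision": precision, "recall": recall, "f1": f1}
    return metrics


truth = ["BookFlight", "BookFlight", "CancelFlight", "CheckStatus"]
predicted = ["BookFlight", "CancelFlight", "CancelFlight", "CheckStatus"]
scores = per_intent_metrics(truth, predicted)
for intent, m in scores.items():
    print(intent, {k: round(v, 2) for k, v in m.items()})
```

A low recall for an intent usually points to missing utterance variety in its training data, while low precision suggests overlap with another intent.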

4.2 Question Answering

Knowledge-base-powered Q&A system:

  • Data Sources: FAQ pages, Word/PDF docs, manual QA pairs
  • Multi-turn conversations: Follow-up prompts for guided dialogue
  • Precise answers: Extract exact answer spans from passages
  • Synonyms: Configure alterations to improve matching

4.3 Custom Text Classification

| Type | Description |
| --- | --- |
| Single-label | Each document belongs to exactly one category |
| Multi-label | Each document can belong to multiple categories |

Requirements: At least 10 documents/category; recommended 200+

4.4 Custom Named Entity Recognition (NER)

Extract custom entity types from unstructured text:

  • Define your entity types (e.g., “ProductName”, “ContractNumber”)
  • Label training data → Train → Evaluate → Deploy
  • Metrics: Precision, Recall, F1-Score

4.5 Custom Vision — Image Classification

Training images → Upload → Choose classification type → Train → Evaluate → Publish
                            ├── Multi-class (one label per image)
                            └── Multi-label (multiple labels per image)

Best practices: At least 15 images/class, diverse angles/lighting/backgrounds

4.6 Custom Vision — Object Detection

Locates objects with bounding boxes — not just what but where. Use cases: industrial inspection, retail shelf monitoring, security

4.7 Custom Document Intelligence Models

  • Custom template model: Fixed-format forms
  • Custom neural model: Semi-structured and unstructured documents
  • Composed model: Combine multiple models for different form types

5. Level 3 — Generative AI Fundamentals

🎯 Goal: Master LLM model selection, deployment, and basic app development

5.1 Model Catalog & Deployment

| Model Provider | Examples | Deployment |
| --- | --- | --- |
| Azure OpenAI | GPT-4o, GPT-4, GPT-3.5-turbo | Standard/Global |
| Meta | Llama 3 | Serverless API / Managed Compute |
| Mistral | Mistral Large, Mixtral | Serverless API |
| Microsoft | Phi-3, Phi-4 | Serverless API / Managed Compute |
| Cohere | Command R+ | Serverless API |

Deployment Types:

  • Serverless API (MaaS): Pay-per-token, no infrastructure management
  • Managed Compute (MaaP): Dedicated compute, high throughput
  • Standard Deployment: Default for Azure OpenAI models

5.2 Microsoft Foundry SDK

Core components:

  • AIProjectClient: Project-level client for managing resources
  • ChatCompletionsClient: Chat completion calls
  • EmbeddingsClient: Text embedding generation

5.3 Prompt Flow

Visual LLM application orchestration tool:

  • Flow: DAG-based workflow
  • Nodes: LLM calls, Python code, tool invocations
  • Connections: External service configurations
  • Variants: Different prompt versions for A/B testing
  • Three Flow Types: Standard, Chat, Evaluation

5.4 Multimodal Generative AI

  • Vision-enabled apps: GPT-4o processing images (descriptions, visual Q&A, chart interpretation)
  • Speech-enabled apps: STT → LLM → TTS pipeline for voice assistants
  • Image generation: DALL-E for text-to-image, editing, and variations
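
For vision-enabled apps, the image travels inside the chat message as an extra content part. The sketch below builds such a message from local image bytes using a base64 data URL; the helper name `image_message` is hypothetical, and sending the message to a deployed multimodal model is left out.

```python
import base64


def image_message(prompt: str, image_bytes: bytes, mime_type: str = "image/png") -> dict:
    """Build a user message that pairs a text prompt with an inline image."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {
                "type": "image_url",
                # A data URL lets the image ride along without public hosting
                "image_url": {"url": f"data:{mime_type};base64,{encoded}"},
            },
        ],
    }


# The bytes below are just the PNG file signature, standing in for a real image.
message = image_message("Describe this chart.", b"\x89PNG\r\n\x1a\n")
print(message["content"][1]["image_url"]["url"][:30])
```

A plain `https://` URL can be used in place of the data URL when the image is already reachable by the service.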

6. Level 4 — Advanced Generative AI

🎯 Goal: Master RAG, Fine-tuning, and enterprise knowledge retrieval

6.1 RAG — Retrieval Augmented Generation

Why RAG? LLMs have a training cutoff date, know nothing about your private data, and cannot take all of your data in a single prompt because of token limits.

RAG Architecture:

Your Data → Chunking → Embedding → Azure AI Search Index
                                         ↓
User Question → Retrieve relevant chunks → Combine with prompt → LLM → Grounded answer

Key Steps:

  1. Data preparation: Document chunking & embedding generation
  2. Index creation: Azure AI Search vector index
  3. Retrieval: Hybrid search (keyword + vector + semantic ranking)
  4. Prompt engineering: System prompt guiding model to use retrieved context
  5. Response generation: LLM generates grounded answers
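
The five steps can be sketched end to end with nothing but the standard library. This is a toy stand-in under stated assumptions: a bag-of-words counter replaces a real embedding model, an in-memory list replaces the Azure AI Search index, and the final LLM call is omitted so the example stops at the assembled prompt. All function names are hypothetical.

```python
import math
import re
from collections import Counter


def chunk(text, max_words=40):
    """Step 1a: split a document into fixed-size word chunks."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def embed(text):
    """Step 1b: toy 'embedding' (term counts); a real pipeline calls an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question, index, top_k=2):
    """Step 3: rank indexed chunks by similarity to the question."""
    q = embed(question)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [text for text, _ in ranked[:top_k]]


def build_prompt(question, context_chunks):
    """Step 4: put the retrieved context into the system prompt."""
    context = "\n".join(f"- {c}" for c in context_chunks)
    system = ("Answer only from the context below. "
              "If the answer is not there, say so.\n\nContext:\n" + context)
    return [{"role": "system", "content": system},
            {"role": "user", "content": question}]


documents = [
    "Refunds are issued within 14 days of a return request.",
    "Standard shipping takes 3 to 5 business days.",
    "Support is available by email on weekdays.",
]
# Step 2: the 'index' is a list of (chunk, vector) pairs
index = [(c, embed(c)) for doc in documents for c in chunk(doc)]

question = "How long do refunds take?"
messages = build_prompt(question, retrieve(question, index, top_k=1))
print(messages[0]["content"])
```

Step 5 would pass `messages` to a chat model. Keeping retrieval and prompt assembly as separate functions makes it easy to swap the toy index for a real hybrid search query later.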

6.2 Fine-tuning

| Scenario | RAG | Fine-tuning |
| --- | --- | --- |
| Need latest knowledge | ✅ Preferred | ❌ Not suitable |
| Change model behavior/style | ❌ Limited | ✅ Preferred |
| Learn specific formats | ❌ | ✅ |
| Reduce token consumption | ❌ | ✅ |
| Quick implementation | ✅ | ❌ Takes time |

Process: Prepare JSONL training data → Upload → Select base model → Configure hyperparameters → Train → Evaluate → Deploy
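
The first step of that process, preparing JSONL, can be sketched as follows. The example assumes the chat-style layout in which every line is one JSON object with a `messages` list; the helper name `to_jsonl` and the sample conversations are invented for illustration.

```python
import json


def to_jsonl(examples, system_prompt):
    """Turn (user, assistant) pairs into JSONL text, one training conversation per line."""
    lines = []
    for user_text, assistant_text in examples:
        record = {
            "messages": [
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": user_text},
                {"role": "assistant", "content": assistant_text},
            ]
        }
        # ensure_ascii=False keeps non-English training text readable
        lines.append(json.dumps(record, ensure_ascii=False))
    return "\n".join(lines) + "\n"


examples = [
    ("How do I reset my password?", "Open Settings, choose Security, then select Reset password."),
    ("Where is my invoice?", "Invoices are listed under Billing, in the History tab."),
]
jsonl = to_jsonl(examples, "You are a concise support agent.")
print(jsonl)
```

Writing `jsonl` to a `.jsonl` file gives the artifact that is uploaded in the next step; a real dataset would need far more than two examples.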

6.3 Azure AI Search — Knowledge Mining

Three capabilities: Data Ingestion → AI Enrichment (Skillsets) → Intelligent Search

Built-in skills: OCR, image analysis, key phrase extraction, entity recognition, language detection, translation, text split, text merge. Custom skills extend the pipeline by calling your own API.

6.4 Azure Content Understanding

Next-generation multimodal content analysis:

  • Documents, images, audio, video — unified processing
  • Relationship to Document Intelligence: Content Understanding is newer and more unified; Document Intelligence remains supported and focuses on forms and documents

7. Level 5 — AI Agent Development

🎯 Goal: Build autonomous agents that can plan and execute tasks

7.1 Core Agent Concepts

Agent = LLM + Tools + Memory + Planning

| Component | Purpose |
| --- | --- |
| LLM (Brain) | Understanding, reasoning, decision-making |
| Tools | Search, code execution, API calls, file operations |
| Memory | Short-term (conversation) and long-term (persistent) |
| Planning | Decompose complex tasks into steps |
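
How the four components interact is easiest to see as a loop: plan, call a tool, observe, repeat. The sketch below is a deliberately tiny version in which a scripted function stands in for the LLM; `scripted_llm`, `calculator`, and `run_agent` are hypothetical names, not part of any Azure SDK.

```python
def calculator(expression: str) -> str:
    """Tool: handles only 'a + b', which is enough for this sketch."""
    left, right = expression.split("+")
    return str(int(left) + int(right))


TOOLS = {"calculator": calculator}


def scripted_llm(goal, memory):
    """Stand-in for the LLM: choose the next action from what has been observed so far."""
    if not memory:
        return {"action": "calculator", "input": "19 + 23"}
    return {"action": "finish", "input": f"The answer is {memory[-1]['observation']}."}


def run_agent(goal, llm, tools, max_steps=5):
    memory = []  # short-term memory: the trace of actions and observations
    for _ in range(max_steps):
        decision = llm(goal, memory)              # planning
        if decision["action"] == "finish":
            return decision["input"], memory
        observation = tools[decision["action"]](decision["input"])  # tool call
        memory.append({**decision, "observation": observation})     # observe
    return "Stopped: step limit reached.", memory


answer, trace = run_agent("What is 19 + 23?", scripted_llm, TOOLS)
print(answer)
```

The `max_steps` guard matters in practice: without it, a model that never decides to finish would loop indefinitely.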

7.2 Microsoft Foundry Agent Service

Built-in tools:

  • Code Interpreter: Run Python, analyze data, generate charts
  • File Search: Search uploaded file contents
  • Bing Grounding: Real-time web information
  • Azure AI Search: Enterprise knowledge base search

7.3 Custom Tool Integration

Define function schemas for your own tools; the agent calls them when needed and you implement the actual logic.
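
A sketch of that division of labor: the schema tells the model what it may call, and your code executes the call and returns the result. The `get_weather` body is a stub with made-up values, and the `dispatch` helper and the shape of `tool_call` (a name plus a JSON string of arguments) are assumptions for illustration.

```python
import json

WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}


def get_weather(city: str) -> dict:
    # Stub with fixed values; a real implementation would call a weather API.
    return {"city": city, "forecast": "sunny", "temperature_c": 21}


REGISTRY = {"get_weather": (WEATHER_TOOL, get_weather)}


def dispatch(tool_call: dict) -> str:
    """Run a model-issued tool call and return the JSON string to send back."""
    name = tool_call["name"]
    if name not in REGISTRY:
        return json.dumps({"error": f"unknown tool: {name}"})
    schema, func = REGISTRY[name]
    args = json.loads(tool_call["arguments"])
    required = schema["function"]["parameters"]["required"]
    missing = [p for p in required if p not in args]
    if missing:
        return json.dumps({"error": f"missing arguments: {missing}"})
    return json.dumps(func(**args))


result = dispatch({"name": "get_weather", "arguments": '{"city": "Tokyo"}'})
print(result)
```

Returning errors as JSON instead of raising lets the model see what went wrong and retry with corrected arguments.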

7.4 MCP Tools (Model Context Protocol)

Open standard protocol for dynamic tool discovery and invocation:

Agent ←→ MCP Client ←→ MCP Server ←→ External Tools/Services
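
On the wire, the MCP client and server exchange JSON-RPC 2.0 messages. The sketch below builds the two requests a client would typically send, `tools/list` to discover tools and `tools/call` to invoke one; transport and session initialization are omitted, and the `get_weather` tool is a hypothetical example.

```python
import itertools
import json

_ids = itertools.count(1)  # JSON-RPC request ids


def jsonrpc_request(method, params=None):
    """Serialize a JSON-RPC 2.0 request."""
    message = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        message["params"] = params
    return json.dumps(message)


# 1. Discover which tools the server offers
list_tools = jsonrpc_request("tools/list")

# 2. Invoke one of them by name with structured arguments
call_tool = jsonrpc_request(
    "tools/call",
    {"name": "get_weather", "arguments": {"city": "Tokyo"}},
)

print(list_tools)
print(call_tool)
```

Because discovery is part of the protocol, an agent can pick up new tools from a server without any change to its own code.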

7.5 Foundry IQ — Knowledge-Enhanced Agents

Shared knowledge platform that multiple agents can access:

  • Data optimization for retrieval quality
  • Agent instruction configuration for consistent, cited responses

7.6 Microsoft 365 Integration

Publish agents to Microsoft Teams and Microsoft 365 Copilot, and give them access to workplace data (mail, calendar, documents) through Work IQ.


8. Level 6 — Multi-Agent & Advanced Orchestration

🎯 Goal: Build multi-agent collaborative systems and enterprise AI workflows

8.1 Microsoft Agent Framework (Semantic Kernel)

Core concepts:

  • Kernel: AI orchestration engine
  • Plugins: Reusable capability modules
  • Planner: Auto-decompose goals into steps
  • Memory: Vector storage and semantic memory

8.2 Multi-Agent Orchestration

Orchestration Strategies:

| Strategy | Description | Use Case |
| --- | --- | --- |
| Sequential | Fixed order rotation | Simple pipelines |
| Round-Robin | Cyclic rotation | Equal participation |
| Selection Strategy | LLM decides next speaker | Flexible collaboration |
| Termination Strategy | Define completion conditions | Auto-stop |
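
A round-robin selection combined with a termination condition can be sketched in a few lines. The three agents here are plain functions returning canned text rather than LLM-backed agents, and `round_robin_chat` is a hypothetical helper, not a Semantic Kernel API.

```python
from itertools import cycle


def researcher(history):
    return "Found three sources on battery recycling."


def analyst(history):
    return "Two sources agree on a 95% recovery rate."


def writer(history):
    return "Report drafted. DONE"


def round_robin_chat(agents, task, should_terminate, max_turns=10):
    """Let agents speak in cyclic order until the termination check passes."""
    history = [("user", task)]
    for name, agent in cycle(agents):          # round-robin selection
        if len(history) - 1 >= max_turns:      # safety cap on agent turns
            break
        history.append((name, agent(history)))
        if should_terminate(history):          # termination strategy
            break
    return history


agents = [("researcher", researcher), ("analyst", analyst), ("writer", writer)]
history = round_robin_chat(
    agents,
    "Write a short report on battery recycling.",
    should_terminate=lambda h: "DONE" in h[-1][1],
)
for speaker, text in history:
    print(f"{speaker}: {text}")
```

Swapping `cycle(agents)` for a function that asks an LLM to pick the next speaker would turn this into the Selection Strategy from the table.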

8.3 A2A Protocol (Agent-to-Agent)

Cross-platform agent communication protocol:

  • Agent Discovery: Discover remote agent capabilities
  • Direct Communication: Agent-to-agent messaging
  • Task Coordination: Cross-agent task orchestration

8.4 Agent-Driven Workflows

Build intelligent workflows with Microsoft Foundry:

  • Orchestrate multiple agents and components
  • Conditional branching and loops
  • Error handling and retries
  • Human-in-the-loop review nodes

9. Level 7 — Responsible AI & Production

🎯 Goal: Ensure AI solutions are safe, reliable, and responsible

9.1 Responsible AI Principles

| Principle | Description |
| --- | --- |
| Fairness | AI should treat all people fairly |
| Reliability & Safety | AI should perform reliably without causing harm |
| Privacy & Security | Protect user data and privacy |
| Inclusiveness | AI should empower everyone |
| Transparency | AI decisions should be understandable |
| Accountability | Humans should be accountable for AI systems |

9.2 Content Safety for Generative AI

Azure AI Content Safety provides input/output filtering:

User Input → Input Filter → LLM → Output Filter → Response
               │                      │
          Detect:                 Detect:
          - Hate                 - Hate
          - Violence             - Violence
          - Sexual               - Sexual
          - Self-harm            - Self-harm
          - Jailbreak            - Misinformation

Four severity levels: Safe → Low → Medium → High
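
A filter decision then comes down to comparing each category's severity against a threshold. The sketch below assumes the service reports severities on a 0/2/4/6 scale corresponding to the four levels; the category names, default threshold, and `blocked_categories` helper are illustrative, and the analysis result is hard-coded rather than fetched from the service.

```python
# Assumed mapping from numeric severity to the four named levels
SEVERITY_LABELS = {0: "Safe", 2: "Low", 4: "Medium", 6: "High"}


def blocked_categories(analysis: dict, thresholds: dict, default_threshold: int = 4) -> list:
    """Return the categories whose severity meets or exceeds their threshold."""
    return sorted(
        category
        for category, severity in analysis.items()
        if severity >= thresholds.get(category, default_threshold)
    )


# Hard-coded stand-in for a content-safety analysis result
analysis = {"Hate": 0, "Violence": 4, "Sexual": 0, "SelfHarm": 2}

# Stricter threshold for self-harm, default (Medium) for everything else
blocked = blocked_categories(analysis, {"SelfHarm": 2})

for category in blocked:
    print(category, SEVERITY_LABELS[analysis[category]])
```

Per-category thresholds let an application be strict where the risk is highest without over-blocking everything else.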

9.3 Evaluating Generative AI Applications

| Metric | Evaluates |
| --- | --- |
| Groundedness | Is the answer based on provided context? |
| Relevance | Is the answer relevant to the question? |
| Coherence | Is the answer logically coherent? |
| Fluency | Is the language natural and fluent? |
| Similarity | How similar to the reference answer? |
| F1 Score | Token overlap with reference answer |

Evaluation methods: Manual scoring, AI-assisted (LLM-as-judge), Automated metrics via Prompt Flow evaluation flows
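
Of these metrics, F1 is the one that can be computed without a model. The following sketch shows one common way to do it, using lowercase word tokens and counting shared tokens with multiplicity; the exact tokenization used by the built-in evaluator may differ, so treat this as an approximation.

```python
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"\w+", text.lower())


def f1_score(prediction: str, reference: str) -> float:
    """Token-overlap F1 between a generated answer and a reference answer."""
    pred, ref = tokenize(prediction), tokenize(reference)
    # Multiset intersection counts each shared token at most as often as it appears in both
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


score = f1_score("Refunds take 14 days", "Refunds are issued within 14 days")
print(round(score, 2))
```

Token overlap rewards shared wording, not shared meaning, which is why it is usually reported alongside the AI-assisted metrics instead of replacing them.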


10. Complete Knowledge Map

Level 7: Responsible AI & Production
  │  Responsible AI Principles → Content Safety → Evaluation Metrics
  │
Level 6: Multi-Agent & Advanced Orchestration
  │  Semantic Kernel → Multi-Agent Chat → A2A Protocol → Agent Workflows
  │
Level 5: AI Agent Development
  │  Foundry Agent Service → Custom Tools → MCP → Foundry IQ → M365 Integration
  │
Level 4: Advanced Generative AI
  │  RAG → Fine-tuning → AI Search → Content Understanding
  │
Level 3: Generative AI Fundamentals
  │  Model Catalog → Foundry SDK → Prompt Flow → Multimodal Apps
  │
Level 2: Custom AI Models
  │  CLU → Q&A → Text Classification → Custom NER → Custom Vision → Doc Intelligence
  │
Level 1: Pre-built AI Services
  │  Text Analytics → Translation → Speech → Image Analysis → OCR → Face → Video → Doc Intelligence
  │
Level 0: Foundation
    Azure AI Landscape → Microsoft Foundry → Auth & Security → Dev Environment
