[Hands-on Guide] Engineering Large AI Models for Connected-Vehicle Scenarios: A Practical Summary

Alibaba Tech Editor's Note

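This article covers how image generation in AIGC has evolved, its key techniques and current trends, and how these techniques are applied to connected-vehicle services in the new-energy vehicle industry.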

1. Introduction

1.1 Background of AIGC

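Images are one of the modalities of AI-generated content and have long played an important role in AIGC. Because image generation is so broadly applicable and practical, it has drawn considerable attention from both academia and industry. In recent years the field has made several key breakthroughs: from classic GANs to today's mainstream diffusion models, and the successively stronger, better-performing algorithms and models built on them. These advances have greatly widened the range of applications and the prospects of image generation. As image generation spreads into more industries, however, commercial deployment still has open problems to solve: faster and more stable generation, better controllability and diversity, and data privacy and intellectual property. In practice, a model's performance shows mainly in the quality and the diversity of the images it generates. Image generation is already widely used in graphic design, game production and animation, and it also has great potential in medical image synthesis and analysis, compound synthesis and drug discovery.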

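By how many colors and gray levels they use, images fall into four types: binary, grayscale, indexed and RGB. Image generation models can convert between these types.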
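To make the four types concrete, here is a minimal pure-Python sketch (an illustration of the representations, not an image-generation model) that converts a tiny RGB image into the other three. The function names are hypothetical, and the grayscale step assumes the ITU-R BT.601 luma weights.

```python
RGB = tuple[int, int, int]


def rgb_to_gray(pixels: list[list[RGB]]) -> list[list[int]]:
    """RGB image -> grayscale image: one 0-255 value per pixel (BT.601 luma weights)."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row] for row in pixels]


def gray_to_binary(gray: list[list[int]], threshold: int = 128) -> list[list[int]]:
    """Grayscale image -> binary image: each pixel becomes 0 or 1."""
    return [[1 if v >= threshold else 0 for v in row] for row in gray]


def rgb_to_indexed(pixels: list[list[RGB]]) -> tuple[list[RGB], list[list[int]]]:
    """RGB image -> indexed image: a palette of distinct colors plus one palette index per pixel."""
    palette: list[RGB] = []
    lookup: dict[RGB, int] = {}
    indices = []
    for row in pixels:
        out_row = []
        for color in row:
            if color not in lookup:
                lookup[color] = len(palette)
                palette.append(color)
            out_row.append(lookup[color])
        indices.append(out_row)
    return palette, indices


def indexed_to_rgb(palette: list[RGB], indices: list[list[int]]) -> list[list[RGB]]:
    """Indexed image -> RGB image by looking every index up in the palette."""
    return [[palette[i] for i in row] for row in indices]
```

The indexed form is lossless here because the image has only three distinct colors; real indexed formats cap the palette size (typically 256 entries), so images with more colors must be quantized first.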

1.2 Key stages of technical development

Image generation is an important part of computer vision, and its technology has gone through roughly three key stages:

GAN stage:

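Generative adversarial networks (GANs) were the previous generation of mainstream image-generation models. A GAN trains a generator and a discriminator against each other, so both the ability to generate and the ability to discriminate keep improving; the generated data moves ever closer to the real data, and the model eventually produces realistic images. As the approach developed, however, GANs showed poor training stability, limited diversity in generated images, and mode collapse.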
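For reference, the adversarial game described above is usually written as the minimax objective from the original GAN formulation, where $D$ is the discriminator, $G$ the generator, $p_{\text{data}}$ the real-data distribution and $p_z$ the noise prior:

```latex
\min_G \max_D \; V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_z}\big[\log\big(1 - D(G(z))\big)\big]
```

$D$ is trained to increase $V$ (tell real samples from generated ones) while $G$ is trained to decrease it (fool $D$). Mode collapse is the failure case in which $G$ maps many different $z$ to a few outputs that already fool $D$.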

Autoregressive stage:

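Autoregressive image generation was inspired by the success of NLP pre-training. Built on the self-attention mechanism of the Transformer, it replaced the adversarial training of GANs and improved both training stability and the plausibility of generated images. However, slow inference (the image is generated one token at a time) and high training cost have limited its practical use.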
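The slow inference follows directly from how an autoregressive model factorizes an image of $N$ tokens (pixels or discrete visual tokens) with the chain rule: each token is conditioned on all earlier ones, so sampling takes $N$ sequential steps.

```latex
p(x) = \prod_{i=1}^{N} p\big(x_i \mid x_1, \ldots, x_{i-1}\big)
```

For example, a $32 \times 32$ grid of visual tokens requires $1024$ sequential prediction steps per image.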

Diffusion-model stage:

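Diffusion models largely overcome the performance limitations of the earlier models. They markedly improve training stability and output fidelity, and have therefore quickly replaced GANs in practice. The many cross-modal (for example, text-to-image) generation needs in industry are met by combining diffusion models with CLIP: trained on text-image pairs, CLIP links the two modalities and significantly improves the speed and quality of the generated images.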

Today, the industry's mainstream image-generation products with the best results are built mainly on diffusion models and CLIP.

1.3 How the mainstream models work, and their pros and cons

Diffusion models

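1. How it works: a diffusion model defines a Markov chain of diffusion steps that repeatedly adds random noise to the data until only pure Gaussian noise remains. It then learns to reverse this diffusion and generates an image by denoising step by step. Because the model systematically perturbs the data distribution and then recovers it, the whole process is one of gradual refinement, which makes the model stable and controllable.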
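The forward (noising) half of the chain has a closed form, so training can jump to any step t without simulating the steps before it: x_t = sqrt(ᾱ_t)·x_0 + sqrt(1 − ᾱ_t)·ε with ε ~ N(0, I) and ᾱ_t the running product of (1 − β_s). Below is a minimal pure-Python sketch, assuming the linear β schedule from the DDPM paper (β from 1e-4 to 0.02 over T = 1000 steps). The helper names are illustrative, and the learned reverse (denoising) network is not shown.

```python
import math
import random


def linear_beta_schedule(T: int = 1000, beta_start: float = 1e-4, beta_end: float = 0.02) -> list[float]:
    """Variance of the noise added at each step; list index 0 is diffusion step 1."""
    return [beta_start + (beta_end - beta_start) * t / (T - 1) for t in range(T)]


def alpha_bars(betas: list[float]) -> list[float]:
    """Cumulative products alpha_bar_t = prod_{s <= t} (1 - beta_s)."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out


def q_sample(x0: list[float], t: int, abar: list[float], rng: random.Random) -> list[float]:
    """Sample x_t directly from x_0: sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps, eps ~ N(0, I)."""
    a = abar[t]
    return [math.sqrt(a) * x + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0) for x in x0]
```

Because ᾱ_T is close to zero for this schedule, x_T keeps almost nothing of x_0 and is practically pure Gaussian noise, which is where the reverse process starts.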

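2. Pros and cons: the Markov-chain-based forward and reverse diffusion processes reconstruct real data more accurately and preserve image detail better, so the generated images are more realistic. Diffusion models do especially well in applications such as image inpainting and restoration and molecular-graph generation. The cost is that generation takes many iterative steps, so sampling is slow, and the models generalize less well across data types.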

CLIP (Contrastive Language-Image Pre-training)

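1. How it works: CLIP is a text-image cross-modal pre-training model based on contrastive learning. Separate encoders extract features from the text and from the image and map both into the same representation space; the model is trained to give matching text-image pairs high similarity and non-matching pairs low similarity. CLIP does not generate images itself: this shared space is what lets it steer a generator (such as a diffusion model) toward images that match a given text.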
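The training objective described above can be sketched as a symmetric contrastive (InfoNCE) loss over a batch of N matching pairs: pair i is the positive for image i and for text i, and the other N − 1 entries in that row or column act as negatives. A minimal pure-Python sketch, assuming the embeddings are already computed (the encoders are not shown) and a fixed temperature of 0.07 (CLIP learns the temperature; 0.07 is its initial value).

```python
import math


def l2_normalize(v: list[float]) -> list[float]:
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def clip_loss(image_emb: list[list[float]], text_emb: list[list[float]], temperature: float = 0.07) -> float:
    """Symmetric contrastive loss; row i of image_emb and text_emb form a matching pair."""
    imgs = [l2_normalize(v) for v in image_emb]
    txts = [l2_normalize(v) for v in text_emb]
    n = len(imgs)
    # logits[i][j] = cosine similarity of image i and text j, divided by the temperature
    logits = [[sum(a * b for a, b in zip(imgs[i], txts[j])) / temperature for j in range(n)]
              for i in range(n)]

    def cross_entropy(row: list[float], target: int) -> float:
        m = max(row)  # subtract the max before exp() for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        return log_sum_exp - row[target]

    loss_i2t = sum(cross_entropy(logits[i], i) for i in range(n)) / n  # image -> text, per row
    loss_t2i = sum(cross_entropy([logits[i][j] for i in range(n)], j) for j in range(n)) / n  # per column
    return (loss_i2t + loss_t2i) / 2
```

Minimizing this loss pulls each image toward its own caption and pushes it away from the other captions in the batch, which is why large batches (many negatives) matter for CLIP-style training.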

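2. Pros and cons: through multimodal contrastive pre-training, CLIP aligns text features with image features, so it needs no manually labeled data and performs well on zero-shot image-text classification. It also captures text descriptions and image styles more accurately, and can vary non-essential details of an image without losing accuracy, which gives it an edge in the diversity of generated images.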
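Zero-shot classification needs no labeled training data: embed one text prompt per candidate label (for example "a photo of a {label}"), embed the image, and pick the label whose text embedding is closest. A minimal sketch in which small made-up vectors stand in for real CLIP encoder outputs; zero_shot_classify is a hypothetical helper.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def zero_shot_classify(image_emb: list[float], label_embs: dict[str, list[float]]) -> str:
    """Return the label whose text embedding has the highest cosine similarity to the image."""
    return max(label_embs, key=lambda label: cosine(image_emb, label_embs[label]))
```

Adding a new class only means embedding one more text prompt; no retraining is involved.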

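Because CLIP is essentially an image-text matching (classification-style) model, it has limitations with complex and abstract scenes; for example, images generated with it may be poor for tasks involving time-series data or requiring reasoning. Its training also depends on large-scale datasets of text-image pairs and consumes substantial compute.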

1.4 Current trends in the AIGC industry

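On top of the diffusion and CLIP architectures, a series of developer-facing tools and platforms has emerged, speeding up both the productivity and the commercialization of AIGC image generation. Today the mainstream AIGC workflow tools are essentially two: ComfyUI and WebUI. The official community documentation for Midjourney and Stable Diffusion (SD) compares the two tools.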

2. Application Scenarios

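In the new-energy vehicle industry, how interactive and how fun connected-vehicle services are is becoming a competitive moat. New-energy automakers also serve their owners in a way much closer to internet companies, and attracting traffic through interactive content has become a key focus for every automaker. A typical scenario: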

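1) After an owner's short or medium-distance holiday trip, the vehicle records the following trip information through the connected-vehicle platform and the in-vehicle chip: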
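The recorded fields are the ones that appear in the default query_prompt of the qwen-max node in Section 3.3 (emoji tag, user text, travel time, driving time, distance, start and end with their weather, cities passed through, team, vehicle). A minimal sketch of carrying one trip in a typed record and rendering it into that prompt layout; TripRecord and its field names are hypothetical, not a real vehicle API.

```python
from dataclasses import dataclass


@dataclass
class TripRecord:
    """Hypothetical container for one trip; field names are illustrative."""
    emoji_tag: str
    user_text: str
    start_time: str
    end_time: str
    driving_duration: str
    distance_km: float
    origin: str
    origin_weather: str
    destination: str
    destination_weather: str
    cities_passed: list[str]
    team: str
    vehicle: str

    def to_query_prompt(self) -> str:
        """Render the record in the '- label: value' layout of the qwen-max node's query_prompt."""
        lines = [
            f"- User emoji tag: {self.emoji_tag}",
            f"- User text: {self.user_text}",
            f"- Travel time: {self.start_time} - {self.end_time}",
            f"- Total driving time: {self.driving_duration}",
            f"- Distance: {self.distance_km} km",
            f"- Start: {self.origin}",
            f"- Start weather: {self.origin_weather}",
            f"- End: {self.destination}",
            f"- End weather: {self.destination_weather}",
            f"- Cities passed through: {', '.join(self.cities_passed)}",
            f"- Team: {self.team}",
            f"- Vehicle: {self.vehicle}",
        ]
        return "\n".join(lines)
```

Keeping the record typed and rendering the prompt in one place makes it easy to change the prompt layout without touching the data pipeline.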

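2) Based on the trip information, a large model should automatically generate stylized assets like the following for the automaker's content community and push them to the owner.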

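To attract as much consumer-side (to-C) traffic as possible, the automaker set very high requirements for the AIGC capability, paying particular attention to these details of the generated images: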

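Whether the stylization fully follows the instructions
Color deviation around the car logo and along the body edges
Whether the car model is composited into the background without looking out of place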
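The article does not say how color deviation was measured. As one possible automated check, here is a minimal sketch that compares the logo region cropped from a reference photo with the same region of a generated image. It assumes both regions are already located and cropped to the same size; the helper names and the tolerance of 8 are hypothetical placeholders, and a perceptual metric such as CIEDE2000 would be a stronger choice.

```python
def mean_color_deviation(reference: list[tuple[int, int, int]], generated: list[tuple[int, int, int]]) -> float:
    """Mean absolute per-channel difference (0-255 scale) between two equally sized RGB pixel lists."""
    if not reference or len(reference) != len(generated):
        raise ValueError("regions must be non-empty and the same size")
    total = sum(abs(a - b) for p, q in zip(reference, generated) for a, b in zip(p, q))
    return total / (3 * len(reference))


def logo_color_ok(reference, generated, tolerance: float = 8.0) -> bool:
    """Accept the generated logo region if its mean deviation stays within the tolerance."""
    return mean_color_deviation(reference, generated) <= tolerance
```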

3. Putting It into Practice

3.1 Choosing an AIGC image-generation tool

Overall, for production use in to-C scenarios, ComfyUI has a steeper learning curve but offers the following advantages over other Stable Diffusion runtimes:

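SDXL inference is much better optimized than in other UIs: images are generated 10% to 25% faster than with WebUI.
Highly customizable: users can control the whole generation process more precisely and at a finer granularity, and power users can produce better images more easily with ComfyUI.
Workflows are easy to share as JSON files or as images, which improves efficiency.
Developer-friendly: a workflow can be invoked through the API from any language, simply by loading the same API-format JSON file.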
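Because an API-format workflow is plain JSON keyed by node ID, calling it from another language comes down to loading the file, changing a few node inputs, and submitting it. A minimal Python sketch of the editing step; set_node_input is a hypothetical helper, and the node ID "7" and field "text" in the usage below mirror the CLIPTextEncode node in the sample JSON in Section 3.6.

```python
import json


def set_node_input(workflow: dict, node_id: str, field: str, value) -> dict:
    """Return a copy of an API-format workflow with one node input replaced.
    API-format JSON maps node IDs to {"inputs": {...}, "class_type": ...}."""
    if node_id not in workflow:
        raise KeyError(f"node {node_id!r} not in workflow")
    patched = json.loads(json.dumps(workflow))  # deep copy of plain JSON data
    patched[node_id]["inputs"][field] = value
    return patched
```

Usage: after `workflow = json.load(f)` on the exported file, `set_node_input(workflow, "7", "text", "blurry, watermark")` returns a new workflow with a different negative prompt and leaves the original untouched, so one template can serve many requests.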

Another key point is that PAI EAS offers scenario-based deployment, which makes it easier to choose among more ComfyUI versions.

Figure: ComfyUI workflow configuration page

3.2 Confirming the business process

The first step, and one that is easy to overlook, is to design the complete workflow from the business requirements.

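Because the bar for the final AIGC output is high, reaching it usually takes a combination of models: large language models, large vision models, NLP models, VAE, CLIP and so on. ComfyUI image generation is also generally slow, so the choice and orchestration of nodes (which run serially and which in parallel, and at which node to add layers) all need careful thought.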
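One way to reason about serial versus parallel orchestration is to group the nodes of the exported workflow graph into dependency levels: each node depends only on nodes in earlier levels, so nodes in the same level are independent of each other. A minimal sketch, assuming the API-format JSON described in Section 3.6, where an input that links to another node is a [node_id, output_index] pair. dependency_levels is a hypothetical helper, and whether nodes in one level actually run concurrently depends on the runtime.

```python
def dependency_levels(workflow: dict) -> list[list[str]]:
    """Group the node IDs of an API-format workflow into dependency levels (Kahn-style layering)."""
    deps = {
        node_id: {v[0] for v in node.get("inputs", {}).values()
                  if isinstance(v, list) and len(v) == 2 and isinstance(v[0], str)}
        for node_id, node in workflow.items()
    }
    levels: list[list[str]] = []
    done: set[str] = set()
    while len(done) < len(deps):
        # A node is ready once every node it links to has been placed in an earlier level.
        ready = sorted(n for n, d in deps.items() if n not in done and d <= done)
        if not ready:
            raise ValueError("cycle or link to a missing node in the workflow")
        levels.append(ready)
        done.update(ready)
    return levels
```

In a toy workflow where a checkpoint loader feeds two CLIP text encoders that both feed a sampler, the result is three levels: the loader, then the two encoders side by side, then the sampler. The encoders are the natural candidates for parallel work.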

3.3 Custom node development

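Once the workflow is designed, the next step is to work out which nodes can be taken from the open-source community and which must be developed in-house.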

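The standard ComfyUI nodes available on GitHub all wrap open-source models. The text-to-text and text-to-image steps in the pipeline above use Tongyi Qianwen (Qwen) and Tongyi Wanxiang (Wanx), so custom nodes for them have to be written before they can be mounted in ComfyUI. Below is the "ComfyUI Custom Node Development Specification", compiled from the ComfyUI community documentation: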

ComfyUI Custom Node Development Specification

class Example:
    """
    An example node

    Class methods
    -------------
    INPUT_TYPES (dict):
        Tell the main program input parameters of nodes.
    IS_CHANGED:
        optional method to control when the node is re-executed.

    Attributes
    ----------
    RETURN_TYPES (`tuple`):
        The type of each element in the output tuple.
    RETURN_NAMES (`tuple`):
        Optional: The name of each output in the output tuple.
    FUNCTION (`str`):
        The name of the entry-point method. For example, if `FUNCTION = "execute"` then it will run Example().execute()
    OUTPUT_NODE ([`bool`]):
        If this node is an output node that outputs a result/image from the graph. The SaveImage node is an example.
        The backend iterates on these output nodes and tries to execute all their parents if their parent graph is properly connected.
        Assumed to be False if not present.
    CATEGORY (`str`):
        The category the node should appear in the UI.
    DEPRECATED (`bool`):
        Indicates whether the node is deprecated. Deprecated nodes are hidden by default in the UI, but remain
        functional in existing workflows that use them.
    EXPERIMENTAL (`bool`):
        Indicates whether the node is experimental. Experimental nodes are marked as such in the UI and may be subject to
        significant changes or removal in future versions. Use with caution in production workflows.
    execute(s) -> tuple || None:
        The entry point method. The name of this method must be the same as the value of property `FUNCTION`.
        For example, if `FUNCTION = "execute"` then this method's name must be `execute`, if `FUNCTION = "foo"` then it must be `foo`.
    """
    def __init__(self):
        pass

    @classmethod
    def INPUT_TYPES(s):
        """
            Return a dictionary which contains config for all input fields.
            Some types (string): "MODEL", "VAE", "CLIP", "CONDITIONING", "LATENT", "IMAGE", "INT", "STRING", "FLOAT".
            Input types "INT", "STRING" or "FLOAT" are special values for fields on the node.
            The type can be a list for selection.

            Returns: `dict`:
                - Key input_fields_group (`string`): Can be either required, hidden or optional. A node class must have property `required`
                - Value input_fields (`dict`): Contains input fields config:
                    * Key field_name (`string`): Name of a entry-point method's argument
                    * Value field_config (`tuple`):
                        + First value is a string indicate the type of field or a list for selection.
                        + Second value is a config for type "INT", "STRING" or "FLOAT".
        """
        return {
            "required": {
                "image": ("IMAGE",),
                "int_field": ("INT", {
                    "default": 0,
                    "min": 0,  # Minimum value
                    "max": 4096,  # Maximum value
                    "step": 64,  # Slider's step
                    "display": "number",  # Cosmetic only: display as "number" or "slider"
                    "lazy": True  # Will only be evaluated if check_lazy_status requires it
                }),
                "float_field": ("FLOAT", {
                    "default": 1.0,
                    "min": 0.0,
                    "max": 10.0,
                    "step": 0.01,
                    "round": 0.001,  # The value representing the precision to round to, will be set to the step value by default. Can be set to False to disable rounding.
                    "display": "number",
                    "lazy": True
                }),
                "print_to_screen": (["enable", "disable"],),
                "string_field": ("STRING", {
                    "multiline": False,  # True if you want the field to look like the one on the ClipTextEncode node
                    "default": "Hello World!",
                    "lazy": True
                }),
            },
        }

    RETURN_TYPES = ("IMAGE",)
    # RETURN_NAMES = ("image_output_name",)

    FUNCTION = "test"

    # OUTPUT_NODE = False

    CATEGORY = "Example"

    def check_lazy_status(self, image, string_field, int_field, float_field, print_to_screen):
        """
            Return a list of input names that need to be evaluated.

            This function will be called if there are any lazy inputs which have not yet been
            evaluated. As long as you return at least one field which has not yet been evaluated
            (and more exist), this function will be called again once the value of the requested
            field is available.

            Any evaluated inputs will be passed as arguments to this function. Any unevaluated
            inputs will have the value None.
        """
        if print_to_screen == "enable":
            return ["int_field", "float_field", "string_field"]
        else:
            return []

    def test(self, image, string_field, int_field, float_field, print_to_screen):
        if print_to_screen == "enable":
            print(f"""Your input contains:
                string_field aka input text: {string_field}
                int_field: {int_field}
                float_field: {float_field}
            """)
        # do some processing on the image, in this example I just invert it
        image = 1.0 - image
        return (image,)

    """
        The node will always be re-executed if any of the inputs change but
        this method can be used to force the node to execute again even when the inputs don't change.
        You can make this node return a number or a string. This value will be compared to the one returned the last time the node was
        executed, if it is different the node will be executed again.
        This method is used in the core repo for the LoadImage node where they return the image hash as a string, if the image hash
        changes between executions the LoadImage node is executed again.
    """
    # @classmethod
    # def IS_CHANGED(s, image, string_field, int_field, float_field, print_to_screen):
    #     return ""

# Set the web directory, any .js file in that directory will be loaded by the frontend as a frontend extension
# WEB_DIRECTORY = "./somejs"


# Add custom API routes, using router
from aiohttp import web
from server import PromptServer

@PromptServer.instance.routes.get("/hello")
async def get_hello(request):
    return web.json_response("hello")


# A dictionary that contains all nodes you want to export with their names
# NOTE: names should be globally unique
NODE_CLASS_MAPPINGS = {
    "Example": Example
}

# A dictionary that contains the friendly/humanly readable titles for the nodes
NODE_DISPLAY_NAME_MAPPINGS = {
    "Example": "Example Node"
}
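Before uploading a node, a quick local check against the conventions above can catch mistakes that would otherwise surface only when ComfyUI loads the plugin. A minimal sketch; check_node_class is a hypothetical helper that covers only INPUT_TYPES, RETURN_TYPES, RETURN_NAMES and FUNCTION, not ComfyUI's own loader logic.

```python
def check_node_class(cls) -> list[str]:
    """Return problems found in a ComfyUI-style node class; an empty list means the basics look right."""
    problems = []
    input_types = getattr(cls, "INPUT_TYPES", None)
    if not callable(input_types):
        problems.append("missing INPUT_TYPES classmethod")
    elif "required" not in input_types():
        problems.append("INPUT_TYPES() has no 'required' group")
    if not isinstance(getattr(cls, "RETURN_TYPES", None), tuple):
        problems.append("RETURN_TYPES must be a tuple")
    func = getattr(cls, "FUNCTION", None)
    # FUNCTION must be a string naming a method that exists on the class
    if not isinstance(func, str) or not callable(getattr(cls, func, None)):
        problems.append("FUNCTION must name an existing method")
    names = getattr(cls, "RETURN_NAMES", None)
    if names is not None and len(names) != len(getattr(cls, "RETURN_TYPES", ())):
        problems.append("RETURN_NAMES and RETURN_TYPES differ in length")
    return problems
```

Running it over every class in a plugin's NODE_CLASS_MAPPINGS (for example in a unit test) gives fast feedback without starting ComfyUI.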

Following this specification, we wrapped qwen-max and wanx-v2 as nodes that call the Alibaba Cloud Bailian (Model Studio) API:

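qwen-max plugin node

import json
import os
from http import HTTPStatus

import dashscope


class 旅行文字生成:
    def __init__(self):
        # Read the DashScope API key from the environment instead of hard-coding it
        dashscope.api_key = os.environ.get("DASHSCOPE_API_KEY", "")

    @classmethod
    def INPUT_TYPES(s):
        return {
            "required": {
                "system_prompt": ("STRING", {
                    "default": """Based on the description I provide, generate a complete prompt that fits the theme. The output will be used by a painting AI that understands only concrete prompts, not abstract concepts. Strictly follow these rules:

# Content
Generate a beautiful landscape picture from the text.
# Style
Realistic, high-definition, photorealistic.
# Action
1. Extract one of the cities passed through and find its most famous scenic spot or building; for example, for Shanghai, extract the Oriental Pearl Tower.
2. Extract weather-related words; they determine the color tone of the whole picture.
3. Extract descriptions of mood and driving experience; together with the weather, they determine the color tone.
4. Extract the dates, infer the season, and use it as the main reference for the color tone.
""",
                    "multiline": True
                }),
                "query_prompt": ("STRING", {
                    "default": """- User emoji tag: Outing
- User text: A new driver's May Day trip!
- Travel time: 2024/5/2 10:38 PM - 2024/5/5 6:57 PM
- Total driving time: 14 h 28 min
- Distance: 645.4 km
- Start: No. 1891-1893 Zhongshan South Road, Huangpu District, Shanghai
- Start weather: Sunny
- End: No. 688 Shenchang Road, Minhang District, Shanghai
- End weather: Cloudy
- Cities passed through: Huzhou, Wuxi, Changzhou
- Team: OuYang Kaixin's team
- Vehicle: Black, first generation
""",
                    "multiline": True
                }),
            },
        }

    RETURN_TYPES = ("STRING",)
    FUNCTION = "生成繪畫提示詞"
    CATEGORY = "旅行文字生成"

    def 生成繪畫提示詞(self, system_prompt, query_prompt):
        messages = [
            {'role': 'system', 'content': system_prompt},
            {'role': 'user', 'content': query_prompt}
        ]
        response = dashscope.Generation.call(
            model="qwen-max",
            messages=messages,
            result_format='message'
        )
        if response.status_code == HTTPStatus.OK:
            # With result_format='message', the generated prompt is in output.choices[0].message.content
            painting_prompt = response.output.choices[0].message.content
        else:
            raise Exception('Request failed: Request id: %s, Status code: %s, error code: %s, error message: %s' % (
                response.request_id, response.status_code, response.code, response.message
            ))
        return (painting_prompt,)


# A dictionary that contains all nodes you want to export with their names
NODE_CLASS_MAPPINGS = {"旅行文字生成": 旅行文字生成}
# A dictionary that contains the friendly/humanly readable titles for the nodes
NODE_DISPLAY_NAME_MAPPINGS = {"旅行文字生成": "Generate travel text prompt"}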
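Both plugin nodes raise an exception whenever the DashScope call does not return HTTP 200, which aborts the whole ComfyUI run. For transient failures such as rate limiting or timeouts, one option is to wrap the call in a retry with exponential backoff. A minimal sketch; call_with_retry is a hypothetical helper, and in production you would likely retry only on errors known to be transient.

```python
import time


def call_with_retry(fn, *args, attempts: int = 3, base_delay: float = 1.0, sleep=time.sleep, **kwargs):
    """Call fn(*args, **kwargs); after a failure, wait base_delay * 2**i seconds and try again.
    The last exception is re-raised once all attempts are used up."""
    for i in range(attempts):
        try:
            return fn(*args, **kwargs)
        except Exception:
            if i == attempts - 1:
                raise
            sleep(base_delay * 2 ** i)
```

For example, inside a node method the API call could become `call_with_retry(dashscope.Generation.call, model="qwen-max", messages=messages, result_format='message')`, with a check on `status_code` that raises so that non-200 responses are retried too. The `sleep` parameter exists only so tests can skip the real waiting.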

Wanx 2.0 plugin node

import os
import random
from http import HTTPStatus
from pathlib import PurePosixPath
from urllib.parse import urlparse, unquote

import dashscope
import requests
from dashscope import ImageSynthesis


class ImageSynthesisNode:
    """
    A node for generating images based on a provided prompt.

    Class methods
    -------------
    INPUT_TYPES (dict):
        Define the input parameters of the node.
    IS_CHANGED:
        Optional method to control when the node is re-executed.

    Attributes
    ----------
    RETURN_TYPES (`tuple`):
        The type of each element in the output tuple.
    FUNCTION (`str`):
        The name of the entry-point method.
    CATEGORY (`str`):
        The category the node should appear in the UI.
    """
    @classmethod
    def INPUT_TYPES(s):
        return {
            "required": {
                "prompt": ("STRING", {
                    "default": "",
                    "multiline": True
                }),
            },
        }

    RETURN_TYPES = ("STRING",)
    FUNCTION = "generate_image_url"
    CATEGORY = "Image Synthesis"

    def __init__(self):
        # Set the API key (read from the environment instead of hard-coding it)
        dashscope.api_key = os.environ.get("DASHSCOPE_API_KEY", "")

    def generate_image_url(self, prompt):
        negative_prompt_str = (
            '(car:1.4), NSFW, nude, naked, porn, (worst quality, low quality:1.4), deformed iris, '
            'deformed pupils, (deformed, distorted, disfigured:1.3), cropped, out of frame, poorly drawn, '
            'bad anatomy, wrong anatomy, extra limb, missing limb, floating limbs, cloned face, '
            '(mutated hands and fingers:1.4), disconnected limbs, extra legs, fused fingers, too many fingers, '
            'long neck, mutation, mutated, ugly, disgusting, amputation, blurry, jpeg artifacts, watermark, '
            'watermarked, text, Signature, sketch'
        )

        random_int = random.randint(1, 4294967290)
        rsp = ImageSynthesis.call(
            model='wanx2-t2i-lite',
            prompt=prompt,
            negative_prompt=negative_prompt_str,
            n=1,
            size='768*960',
            extra_input={'seed': random_int}
        )
        if rsp.status_code == HTTPStatus.OK:
            # Get the URL of the generated image
            image_url = rsp.output.results[0].url
        else:
            raise Exception('Request failed: Status code: %s, code: %s, message: %s' % (
                rsp.status_code, rsp.code, rsp.message
            ))
        return (image_url,)


# A dictionary that contains all nodes you want to export with their names
NODE_CLASS_MAPPINGS = {"ImageSynthesisNode": ImageSynthesisNode}
# A dictionary that contains the friendly/humanly readable titles for the nodes
NODE_DISPLAY_NAME_MAPPINGS = {"ImageSynthesisNode": "Image Synthesis Node"}

# Example call
if __name__ == '__main__':
    # prompt = "A beautiful and realistic high-definition landscape scene, featuring the famous landmark of Wuxi, the Turtle Head Isle Park, as it is one of the cities passed through during the journey. The weather transitions from a clear, sunny day in the starting point, Shanghai, to a cloudy sky at the destination, also in Shanghai. This scenic drive, experienced by a new driver on a May Day trip, spans from the evening of May 2, 2024, to the late afternoon of May 5, 2024, covering a total distance of 645.4 kilometers. The mood is joyful and adventurous, with the team named \"OuYang's Happy Team\" enjoying the ride in a black ES. The overall tone of the image reflects the transition from a bright, cheerful start to a more serene, calm atmosphere, with lush greenery and blooming flowers indicating the early summer season. The harmonious blend of natural beauty and man-made structures, along with the changing weather, creates a picturesque and tranquil setting."

    prompt = "A beautiful and realistic high-definition landscape scene, featuring the famous landmark of Wuxi, the Turtle Head Isle Park, as it is one of the cities passed through during the journey. The weather transitions from a clear, sunny day in the starting point, Shanghai, to a cloudy sky at the destination, also in Shanghai. The overall tone of the image reflects the transition from a bright, cheerful start to a more serene, calm atmosphere, with lush greenery and blooming flowers indicating the early summer season. The harmonious blend of natural beauty and man-made structures, along with the changing weather, creates a picturesque and tranquil setting"

    node = ImageSynthesisNode()
    # The node returns a one-element tuple, as ComfyUI expects; unpack it before printing
    (image_url,) = node.generate_image_url(prompt)
    print(f"Generated Image URL: {image_url}")
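The Wanx node returns only a URL. It imports urlparse, unquote, PurePosixPath and requests without using them, presumably left over from the DashScope SDK sample code that downloads the result. A minimal sketch of that download step using only the standard library; filename_from_url and download_image are hypothetical helpers.

```python
import shutil
import urllib.request
from pathlib import Path, PurePosixPath
from urllib.parse import unquote, urlparse


def filename_from_url(url: str) -> str:
    """Last path segment of the URL, percent-decoded; the query string (e.g. a signature) is ignored."""
    return PurePosixPath(unquote(urlparse(url).path)).name


def download_image(url: str, out_dir: str = ".") -> Path:
    """Save the image at `url` into out_dir under its URL file name and return the local path."""
    target = Path(out_dir) / filename_from_url(url)
    with urllib.request.urlopen(url, timeout=30) as resp, open(target, "wb") as f:
        shutil.copyfileobj(resp, f)
    return target
```

Saving under the URL's own file name keeps results traceable to the request that produced them; the query string is dropped because signed URLs carry credentials there.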

3.4 Deploying the PAI service and choosing compute

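Of the mainstream cloud vendors we compared, Alibaba Cloud PAI and AWS Bedrock offered the best support for deploying multiple ComfyUI versions and for mounting resources. Note that deploying ComfyUI through PAI involves two editions: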

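Standard edition: for a single user working in the WebUI, or for API calls served by a single instance. It supports generating videos through the WebUI and can also be called through the API. Requests bypass the EAS interface: the frontend passes them directly to the backend server, and all requests are handled by the same backend instance.
API edition: the system automatically switches the service to asynchronous mode, which suits high-concurrency scenarios. It can be called only through the API. If you need multiple instances, the API edition is recommended.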

For cost-effectiveness, the official documentation recommends GU30, A10 or T4 GPU types. By default the console selects GPU > ml.gu7i.c16m60.1-gu30.

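In our tests, L20 cards generated images somewhat faster than GU30, so we recommend deploying on L20. Balancing cost and performance, we chose a single L20 card with 16 cores and 128 GB of memory.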

Service configuration

{
    "cloud": {
        "computing": {
            "instance_type": "ecs.gn8is-2x.8xlarge"
        },
        "networking": {
            "security_group_id": "sg-uf626dg02ts498gqoa2n",
            "vpc_id": "vpc-uf6usys7jvf2p7ugcyq1j",
            "vswitch_id": "vsw-uf6lv36zo7kkzyq9blyc6"
        }
    },
    "containers": [
        {
            "image": "eas-registry-vpc.cn-shanghai.cr.aliyuncs.com/pai-eas/comfyui:1.7-beta",
            "port": 8000,
            "script": "python main.py --listen --port 8000 --data-dir /deta-code-oss"
        }
    ],
    "metadata": {
        "cpu": 32,
        "enable_webservice": true,
        "gpu": 2,
        "instance": 1,
        "memory": 256000,
        "name": "jiashu16"
    },
    "name": "jiashu16",
    "options": {
        "enable_cache": true
    },
    "storage": [
        {
            "mount_path": "/deta-code-oss",
            "oss": {
                "path": "oss://ai4d-k4kulrqkyt37jhz1mv/482832/data-205381316445420758/",
                "readOnly": false
            },
            "properties": {
                "resource_type": "model"
            }
        }
    ]
}
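The resource fields of a config like this are easy to let drift apart. As printed, the metadata requests 2 GPUs, 32 CPUs and memory 256000 on one ecs.gn8is-2x.8xlarge instance, which is more than the single-L20, 16-core, 128 GB spec described above. A minimal sketch that pulls the resource fields out for a quick review before deployment; summarize_eas_config is a hypothetical helper, and treating memory as megabytes is an assumption.

```python
import json


def summarize_eas_config(config_json: str) -> dict:
    """Collect the resource-related fields of an EAS service config (field names as in the config above)."""
    cfg = json.loads(config_json)
    meta = cfg["metadata"]
    return {
        "instance_type": cfg["cloud"]["computing"]["instance_type"],
        "instances": meta["instance"],
        "gpus": meta["gpu"],
        "cpus": meta["cpu"],
        "memory_mb": meta["memory"],  # unit assumed to be MB
        "image": cfg["containers"][0]["image"],
        "mounts": [s["mount_path"] for s in cfg.get("storage", [])],
    }
```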

3.5 Mounting nodes and models

After the service is deployed, the system automatically creates the following directories in the mounted OSS or NAS storage:

/custom_nodes: stores ComfyUI plugins. The qwen-max and Wanx 2.0 plugin nodes written above must be uploaded to this folder.
/models: stores model files.
/output: where the workflow's final output is saved.
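If the OSS bucket is mounted locally, or a local copy is synced to it, the layout can be prepared and a plugin staged with a few lines. A minimal sketch; stage_custom_node is a hypothetical helper, and getting the directory onto OSS or NAS (a mounted bucket, a sync tool, or a console upload) is outside its scope. ComfyUI loads custom nodes at startup, so the service generally needs a restart before a newly uploaded plugin appears.

```python
import shutil
from pathlib import Path

# Directory names created by the deployed ComfyUI service, as listed above
MOUNT_DIRS = ("custom_nodes", "models", "output")


def stage_custom_node(mount_root: str, node_file: str) -> Path:
    """Ensure the three directories exist under mount_root, then copy one plugin file into custom_nodes/."""
    root = Path(mount_root)
    for name in MOUNT_DIRS:
        (root / name).mkdir(parents=True, exist_ok=True)
    target = root / "custom_nodes" / Path(node_file).name
    shutil.copy2(node_file, target)  # copy2 also preserves the file's timestamps
    return target
```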

3.6 Building the service API on the workflow JSON

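Once the ComfyUI workflow is built (if you are interested in the details of building it, let us know in the comments), the key step is to enable developer mode and export the workflow API JSON. After that, the workflow can be called through the API.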
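A standalone ComfyUI server queues an exported workflow through a POST to its /prompt endpoint, with the workflow under the "prompt" key; the JSON reply includes a prompt_id, which can then be polled at /history/{prompt_id}. A minimal standard-library sketch; the helper names are hypothetical and the server address is an assumption. 8188 is ComfyUI's default port, while the EAS container above listens on 8000. An EAS endpoint normally also expects its service token in an Authorization header, which is not shown, and the API edition's asynchronous mode may use a different request path, so check the EAS documentation for that case.

```python
import json
import urllib.request
import uuid
from typing import Optional


def build_prompt_request(workflow: dict, server: str = "http://127.0.0.1:8188",
                         client_id: Optional[str] = None) -> urllib.request.Request:
    """Build a POST to ComfyUI's /prompt endpoint with an API-format workflow as the body."""
    body = {"prompt": workflow, "client_id": client_id or str(uuid.uuid4())}
    return urllib.request.Request(
        server.rstrip("/") + "/prompt",
        data=json.dumps(body, ensure_ascii=False).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def submit_workflow(workflow: dict, server: str = "http://127.0.0.1:8188") -> str:
    """Queue the workflow and return the prompt_id from ComfyUI's JSON reply."""
    with urllib.request.urlopen(build_prompt_request(workflow, server), timeout=30) as resp:
        return json.loads(resp.read())["prompt_id"]
```

Building the request separately from sending it keeps the payload easy to inspect and test without a running server.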

Sample workflow API JSON

{
    "4": {
        "inputs": {
            "ckpt_name": "基礎模型XL _xl_1.0.safetensors"
        },
        "class_type": "CheckpointLoaderSimple",
        "_meta": {
            "title": "Checkpoint Loader (Simple)"
        }
    },
    "6": {
        "inputs": {
            "text": ["149", 0],
            "speak_and_recognation": true,
            "clip": ["145", 1]
        },
        "class_type": "CLIPTextEncode",
        "_meta": {
            "title": "CLIP Text Encoder"
        }
    },
    "7": {
        "inputs": {
            "text": "*I* *Do* *Not* *Use* *Negative* *Prompts*",
            "speak_and_recognation": true,
            "clip": ["145", 1]
        },
        "class_type": "CLIPTextEncode",
        "_meta": {
            "title": "CLIP Text Encoder"
        }
    }
}
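In this format, an input whose value is a two-element list such as ["145", 1] is a link to output 1 of node 145, while any other value is a literal. The excerpt above references nodes 149 and 145, which it does not include, so it cannot be submitted on its own. A minimal sketch of a pre-submit check for such dangling links; dangling_links is a hypothetical helper.

```python
def dangling_links(workflow: dict) -> list[tuple[str, str, str]]:
    """List (node_id, input_name, missing_source_id) for every link whose source node is absent."""
    missing = []
    for node_id, node in workflow.items():
        for name, value in node.get("inputs", {}).items():
            is_link = isinstance(value, list) and len(value) == 2 and isinstance(value[0], str)
            if is_link and value[0] not in workflow:
                missing.append((node_id, name, value[0]))
    return missing
```

Running this on every workflow before it is queued turns a server-side validation error into a clear local message.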

3.7 Engineering architecture and stability assurance