TL;DR

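  • Lightweight, remote invocation (uses the locally installed Chrome) => puppeteer-core
  • Bundle Chromium into the installer (Chromium must be excluded when packing with asar); failed verification on macOS => puppeteer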

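I ran into this problem when I wanted to use puppeteer as a crawler inside an Electron app. The specific question is how to reference the correct Chromium/Chrome location, especially once the app has been packaged.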

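Before anything else, puppeteer.launch() needs its executablePath option set. In most cases, puppeteer.executablePath() is enough: it returns the location of the Chrome executable that puppeteer uses by default: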

import puppeteer from 'puppeteer'

const PresetPage = async () => {
    // Spread into a new object so BrowserConfig.LaunchOption itself is not mutated
    const browser = await puppeteer.launch({
        ...BrowserConfig.LaunchOption,
        executablePath: puppeteer.executablePath()
    })
    // ...
}

Here:

console.log(puppeteer.executablePath())
// /Users/yrq/Desktop/project/sprite-garden/node_modules/puppeteer/.local-chromium/mac-599821/chrome-mac/Chromium.app/Contents/MacOS/Chromium

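As you can see, puppeteer uses the Chromium installed in the local dependencies. That works fine in development, but once the Electron app is packaged, the correct Chromium path can no longer be resolved and puppeteer fails to run.

To solve this, I tried the two approaches below.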

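Using puppeteer-core to drive Chrome remotely

Verified

This approach comes from loukaspd/puppeteer-electron-quickstart.

Since version 1.7.0, puppeteer has also been published as puppeteer-core, which can be seen as a lightweight puppeteer. It can be used to drive a browser that is already installed.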

yarn add puppeteer-core

The differences between the two:

  1. puppeteer-core does not download Chromium automatically when installed
  2. puppeteer-core ignores all PUPPETEER_* environment variables

import puppeteer from 'puppeteer-core'

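const PresetPage = async () => {
    // Spread into a new object so BrowserConfig.LaunchOption itself is not mutated
    const browser = await puppeteer.launch({
        ...BrowserConfig.LaunchOption,
        // chromePath is resolved from the settings file or the default install path (see below)
        executablePath: this.chromePath
    })
    // ...
}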

The value of chromePath comes from two sources: a settings file, then the default install path.

  1. Look up the Chrome path in the settings file

    ...
    getSavedPath() {
        const settingsPath = this._filePaths.settingsPath();
        return new Promise((resolve, reject) => {
            if (!fs.existsSync(settingsPath)) {
                resolve(undefined);
                return;
            }
            fs.readFile(settingsPath, "utf8", (err, fileContent) => {
                if (err) {
                    console.log(err)
                    reject(err); // pass the error on so the caller can see why reading failed
                    return;
                }
                resolve(fileContent);
            });
        })
    }
    
    settingsPath() {
        return path.join(this.appFolderPath(), 'settings.json');
    }
    
    appFolderPath() {
        const documentsPath = path.join(os.homedir(), "Documents");
    
        return path.join(documentsPath, this.appFolderName);
    }
    ...
    
  2. Look for Chrome in its default install path

    Return a result based on the operating system:

    const getDefaultOsPath = () => {
        if (process.platform === 'win32') {
            return 'C:\\Program Files (x86)\\Google\\Chrome\\Application\\chrome.exe'
        } else {
            return '/Applications/Google Chrome.app/Contents/MacOS/Google Chrome'
        }
    }
    this.chromePath = getDefaultOsPath()
    

If Chrome is not found through either source, initialization fails.
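
The lookup order described above (settings file first, default install path second, fail otherwise) could be sketched roughly as follows. This is only a minimal sketch: readSavedPath and pickChromePath are hypothetical helper names that do not appear in the original project, and it assumes, as getSavedPath does, that the whole content of the settings file is the Chrome path.

```javascript
const fs = require('fs');

// Hypothetical helper: read a saved Chrome path from a settings file whose
// entire content is the path. Returns undefined when the file is missing or empty.
function readSavedPath(settingsPath) {
    try {
        const content = fs.readFileSync(settingsPath, 'utf8').trim();
        return content === '' ? undefined : content;
    } catch (err) {
        if (err.code === 'ENOENT') return undefined; // no settings file yet
        throw err;
    }
}

// Hypothetical helper: return the first candidate path that exists on disk.
// The exists check is injectable so the function can be exercised without a real Chrome.
function pickChromePath(candidates, exists = fs.existsSync) {
    const found = candidates.find((p) => typeof p === 'string' && exists(p));
    if (!found) {
        const tried = candidates.filter(Boolean).join(', ');
        throw new Error('Chrome executable not found; tried: ' + tried);
    }
    return found;
}
```

With these in place, something like this.chromePath = pickChromePath([readSavedPath(settingsPath), getDefaultOsPath()]) would fail early with a readable message instead of failing later inside puppeteer.launch().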

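Using the bundled Chromium

Failed verification on macOS; feel free to skip

  1. Convert the path returned by puppeteer.executablePath()

    Rewrite the Chromium path with a string replacement: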

    function getChromiumExecPath() {
        return puppeteer.executablePath().replace('app.asar', 'app.asar.unpacked');
    }
    
    export function createBrowser(options = {}) {
        return puppeteer.launch({
            ...options,
            executablePath: getChromiumExecPath()
        });
    }
    
  2. Change the build configuration in package.json

    Keep the Chromium downloaded by puppeteer out of the asar archive:

    ...
    "build": {
        "asar": true,
        "asarUnpack": "node_modules/puppeteer/.local-chromium/**/*",
        ...
    }
    ...
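
One caveat with the plain replace('app.asar', 'app.asar.unpacked') in step 1 is that it would rewrite a path that already points at app.asar.unpacked a second time. A slightly more defensive variant, given only as a sketch, matches app.asar as a whole path segment. toUnpackedPath is a hypothetical name, and this does nothing about the macOS launch failure described in this section.

```javascript
// Hypothetical helper: rewrite the "app.asar" path segment to "app.asar.unpacked".
// The lookahead requires a path separator right after "app.asar", so a path that
// already contains "app.asar.unpacked" is returned unchanged.
function toUnpackedPath(execPath) {
    return execPath.replace(/([\\/])app\.asar(?=[\\/])/, '$1app.asar.unpacked');
}
```

It would be used in place of the inline replace, for example executablePath: toUnpackedPath(puppeteer.executablePath()).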
    

This approach did not work in my project (macOS); puppeteer.launch() failed with the following error:

Failed to launch chrome!
dlopen /Users/yrq/Desktop/project/sprite-garden/build/mac/sprite-garden.app/Contents/Resources/app.asar.unpacked/node_modules/puppeteer/.local-chromium/mac-609904/chrome-mac/Chromium.app/Contents/MacOS/../Versions/72.0.3617.0/Chromium Framework.framework/Chromium Framework: dlopen(/Users/yrq/Desktop/project/sprite-garden/build/mac/sprite-garden.app/Contents/Resources/app.asar.unpacked/node_modules/puppeteer/.local-chromium/mac-609904/chrome-mac/Chromium.app/Contents/MacOS/../Versions/72.0.3617.0/Chromium Framework.framework/Chromium Framework, 261): image not found

TROUBLESHOOTING: https://github.com/GoogleChrome/puppeteer/blob/master/docs/troubleshooting.md

Errors like this seem to stem from platform differences, and I have not found a fix so far.

References