Paper Title

Picture-to-Amount (PITA): Predicting Relative Ingredient Amounts from Food Images

Paper Authors

Li, Jiatong, Han, Fangda, Guerrero, Ricardo, Pavlovic, Vladimir

Paper Abstract

Increased awareness of the impact of food consumption on health and lifestyle today has given rise to novel data-driven food analysis systems. Although these systems may recognize the ingredients, a detailed analysis of their amounts in the meal, which is paramount for estimating the correct nutrition, is usually ignored. In this paper, we study the novel and challenging problem of predicting the relative amount of each ingredient from a food image. We propose PITA, the Picture-to-Amount deep learning architecture, to solve the problem. More specifically, we predict the ingredient amounts using a domain-driven Wasserstein loss from image-to-recipe cross-modal embeddings learned to align the two views of food data. Experiments on a dataset of recipes collected from the Internet show the model generates promising results and improves the baselines on this challenging task. A demo of our system and our data is available at: foodai.cs.rutgers.edu.
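The abstract frames predicted relative ingredient amounts as a distribution and compares it against the ground truth with a Wasserstein loss. As a rough illustration of that idea (not the paper's actual implementation), the sketch below computes an entropy-regularized Wasserstein distance between two relative-amount vectors via Sinkhorn iterations; the ground-cost matrix `C`, the regularization strength `eps`, and the toy amounts are all made-up placeholders, since the paper's "domain-driven" ground metric is not specified here.

```python
import numpy as np

def sinkhorn_wasserstein(p, q, C, eps=0.1, n_iters=200):
    """Entropy-regularized Wasserstein distance between two relative-amount
    distributions p and q (both sum to 1), given a ground-cost matrix C
    whose entry C[i, j] is the cost of moving mass from ingredient i to j.
    Uses plain Sinkhorn iterations; illustrative only."""
    K = np.exp(-C / eps)          # Gibbs kernel
    u = np.ones_like(p)
    for _ in range(n_iters):      # alternate scaling updates
        v = q / (K.T @ u)
        u = p / (K @ v)
    T = np.diag(u) @ K @ np.diag(v)   # approximate transport plan
    return float(np.sum(T * C))       # transport cost under the plan

# Toy example: 3 "ingredients" with a hand-picked ground cost.
C = np.array([[0.0, 1.0, 2.0],
              [1.0, 0.0, 1.0],
              [2.0, 1.0, 0.0]])
p = np.array([0.5, 0.3, 0.2])   # predicted relative amounts
q = np.array([0.4, 0.4, 0.2])   # ground-truth relative amounts
loss = sinkhorn_wasserstein(p, q, C)
```

Unlike an element-wise loss such as KL divergence, this penalizes errors according to how "far apart" the ingredients are under the ground metric, which is the usual motivation for a Wasserstein-style loss over amount distributions.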
