Today, we’re announcing Ego4D, a long-term project by Facebook AI that aims to solve research challenges around egocentric perception: the ability for AI to understand and interact with the world like we do, from a first-person perspective. AI typically learns from photos and videos captured in third-person, but next-generation AI will need to learn from videos that show the world from the center of action. AI that understands the world from this point of view could unlock a new era of immersive experiences, as devices like augmented reality (AR) glasses and virtual reality (VR) headsets become as useful in everyday life as smartphones.
For this project, we brought together a consortium of 13 universities and labs across nine countries, which collected more than 2,200 hours of first-person video in the wild, featuring over 700 participants going about their daily lives. This dramatically increases the amount of egocentric data publicly available to the research community: the data set is 20x larger than any previous one in terms of hours of footage.
We also developed five benchmark challenges for developing smarter, more useful AI assistants:
- Episodic memory: What happened when? (e.g., “Where did I leave my keys?”)
- Forecasting: What am I likely to do next? (e.g., “Wait, you’ve already added salt to this recipe”)
- Hand and object manipulation: What am I doing? (e.g., “Teach me how to play the drums”)
- Audio-visual diarization: Who said what when? (e.g., “What was the main topic during class?”)
- Social interaction: Who is interacting with whom? (e.g., “Help me better hear the person talking to me at this noisy restaurant”)
Learn more about our AI research and developments.