Audit procedures involve complex datasets in tabular, textual, and visual formats, yet the Robotic Process Automation (RPA) frameworks proposed in the auditing literature lack the flexibility to handle diverse procedures efficiently. This paper explores the feasibility of a collaborative, AI-based multimodal auditing system that integrates foundation models into RPA to automate audit processes. Experiments across several preset scenarios demonstrate that the latest publicly available foundation models have the potential to support such a system. The study also shows the importance of including non-routine audit procedures in RPA. Finally, the paper introduces key terminology related to generative AI to help accounting researchers better understand these emerging technologies.