Photographs as well as video frames are two-dimensional projections of three-dimensional real-world scenes. Photorealistic Computer Graphic Imagery (CGI) is usually generated by "rendering" a sophisticated three-dimensional model of a realistic scene on to a virtual camera sensor. Such process is not invertible due to the high amount of data loss. Still both physically captured photographs and synthesized photos (CGI) contain considerable amount of information about the characteristics of the scene itself. Even naïve human viewers are able to infer important information about the scene e.g. shape material illumination distance and actions solely from a single image. This is indeed much more difficult for computer systems. Inverse rendering is one of the hot-topics in computer vision in which the goal is to estimate and model the three-dimensional scene and its illumination automatically given only one or a sparse set of images from the scene. This is by nature a highly under-constraint problem which makes it very complex to solve. Yet the advancements in computation and imaging technologies (e.g. depth sensors and light-field cameras) open new horizons in this field. Inverse rendering makes many interesting applications possible including: creating novel views of the scene/objects re-lighting detecting and altering the object materials in the photographs and augmented reality. This talk provides a brief overview of some of the state-of-the-art inverse rendering techniques and datasets with a strong focus on addressing the complexities of real-world scenarios.