In the following article, Hassan Ragab discusses his findings from the past few months of using Midjourney and Stable Diffusion to explore the space between art and architecture.
As we enter what many expect to be the year in which we finally feel the implications of how far artificial intelligence has developed, Hassan believes it is worthwhile to look back and reflect on the past few frenetic months, during which an enormous number of images, concepts, practices, and apps laid the roots of what may become a new creative paradigm in the history of art, architecture, and design.
While OpenAI was arguably the first to create a proper and coherent AI text-to-image generator (DALL-E 2), both Midjourney and Stable Diffusion leaped ahead toward the end of 2022. Since their beta releases last summer, both platforms have shipped several updates that not only changed how these tools work but, more importantly, influenced what users create with them. This may have huge repercussions for how we perceive both our physical and digital futures.
These generators are not meant for architecture and design
Although Midjourney and Stable Diffusion are attracting thousands of architects almost every day, architects remain only a small subset of the overall Midjourney user base. This is probably because MJ and SD are primarily artistic mediums: these AI models are trained first and foremost on artistic datasets and aesthetics. Nevertheless, many architects and designers have been using these tools extensively to create new concepts and visualizations, each drawing on their own preferences and backgrounds.
It is a double-edged sword, really. These tools currently impose limitations on architects: not only do they keep them from visualizing ideas through their favorite medium (3D models), but the architectural vocabulary itself is sometimes difficult for the models to conceive. On the other hand, they are a tremendous opportunity for architects to step out of their shells and merge their ideas with an ocean of concepts that hardly existed in contemporary architectural practice.
Challenges with working in historical contexts
AI models have biases. It is a fact Hassan Ragab can't get around, yet he is learning to live with it. He first noticed it during his early experiments with Islamic and Pharaonic architecture using Midjourney, DALL-E 2, and Stable Diffusion last July. At the time, these challenges could be softened by the mystic ambiguity of the resulting images, which lacked photorealism yet were extremely beautiful. Now, however, results in historical contexts are hardly that great. Although Midjourney V4 is excellent at following prompts closely within only a few iterations, integrating different concepts with historical architecture has become extremely hard, often producing blurry images with fuzzy details. V4 does recognize some historical places that were completely absent from older models, yet the output still bears no comparison to its results with more contemporary Western architecture.
Amazing quality and prompt adherence; fewer surprises and variations
In pursuit of giving users more control over their results, the developers of Midjourney have been refining their models to produce the most photorealistic results possible while staying as true to the prompt as they can. This has attracted even more users, as MJ became easier to use (not to mention that the results are astonishing compared to earlier models' outputs). However, photorealism seems to come as a tradeoff with ambiguity, and thus with "creativity". Even the variations in V4 hardly stray from the first seed; there are tools to counteract this, such as the --chaos parameter and the Remix mode, but in Hassan's opinion they don't compensate for the older models' interesting variations.
Stable Diffusion and the promise of a truly democratic medium
According to Hassan Ragab, while Midjourney is arguably the favorite text-to-image generator for many users, Stable Diffusion's open-source model has allowed many enthusiasts to incorporate SD into a variety of tools, and it is free to run on your own machine. While SD lacks the mobility, speed, user-friendliness, and beautiful default aesthetics of Midjourney, it offers far more control over the outputs along with many tools that MJ lacks: inpainting and outpainting, animations, design-software plugins, and more.
While we are still exploring AI's potential to impact architecture and design, it is becoming clearer every day that these tools will have a big impact on our practice in the very near future. We are already witnessing early trials of text-to-3D, text-to-simple-BIM models, text-to-video, and more. One could even argue that these tools are already driving designers toward new, unprecedented aesthetics.
This is especially true given that these tools are updated almost every hour, which makes them harder to study and analyze. Users from different backgrounds have shown great adaptability, harnessing these tools' capabilities to empower the design professions. It is of the utmost importance to keep an open mind and cultivate a healthy dialogue about the use of new technologies in what we do, while understanding that these tools are too powerful to be adopted without careful examination and realistic expectations. We should embrace these unstoppable technologies in our processes while understanding the warning signs and the consequences of either neglect or blind trust. And finally, we should understand that the communities that embrace these new technologies will be the ones to establish the core visual principles of our future digital and physical realities.