OpenAI’s Shap-E model creates 3D objects from text or images
Recently we’ve seen AI models creating detailed text-to-video clips or running a chatbot entirely on your phone. Now OpenAI, the company behind ChatGPT, has introduced Shap-E, a model that generates 3D objects that you can open in Microsoft Paint 3D or even convert to an STL file that you can output on one of the best 3D printers.
The Shap-E model is available for free on GitHub and it runs locally on your PC. Once all of the files and models are downloaded, it doesn’t need to ping the internet. And best of all, it doesn’t require an OpenAI API key, so you won’t be charged for using it.
Actually getting Shap-E to work is quite a challenge, though. OpenAI offers almost no instructions; it just tells you to use the Python pip command to install it. The company doesn’t mention the dependencies you need, and many of the latest versions of those dependencies simply don’t work with it. I spent more than eight hours getting this running, and I’ll share what worked for me below.
Once I finally got Shap-E installed, I found that by default it’s accessed through Jupyter Notebook, which lets you view and run the example code in small chunks to see what it does. There are three sample notebooks: text-to-3d (which uses a text prompt), image-to-3d (which converts a 2D image into a 3D object), and encode_model, which takes an existing 3D model and uses Blender (which must be installed) to encode and re-render it. I tested the first two of these, as the third (using Blender with existing 3D objects) was beyond my abilities.
This is what Shap-E Text-to-3D looks like
Like so many of the AI models we cover these days, Shap-E is packed with potential, but its current output is mediocre at best. I tried text-to-3D with a few different prompts. In most cases I got the objects I asked for, but they were low resolution and missing important details.
When I used the sample_text_to_3d notebook, I got two types of output: color animated GIFs that displayed in my browser, and monochrome PLY files that I could later open in a program like Paint 3D. The animated GIFs always looked way better than the PLY files.
The default “a shark” prompt looked decent as an animated GIF, but when I opened the PLY in Paint 3D it seemed to be missing detail. By default, the notebook gives you four animated GIFs that are 64×64 in size, but I modified the code to increase the resolution to 256×256 and output a single GIF (since all four GIFs looked nearly the same).
When I asked for something OpenAI had as one of their examples, “a plane that looks like a banana”, I got a pretty good GIF, especially when I upped the resolution to 256. However, the PLY file showed a lot of holes in the wings.
When I asked for a Minecraft Creeper, I got something: a GIF colored in the right green and black, and a PLY with the basic shape of a Creeper. However, true Minecraft fans wouldn’t be happy with it, and the shape was too messy to 3D print (if I had converted it to an STL).
Shap-E image to 3D object
I also tried the image-to-3D script, which converts an existing 2D image file into a 3D PLY object. A sample illustration of a corgi dog became a decent low-res object, output as a rotating, animated GIF with less detail than the source. Below is the original image on the left and the GIF on the right. You can see that the eyes seem to be missing.
By changing the code I was also able to output a PLY 3D file that I could open in Paint 3D. This is what it looked like.
I also tried feeding the image-to-3D script some of my own images, including a photo of an SSD, which came out looking broken, and a transparent PNG of the Tom’s Hardware logo, which didn’t look much better.
However, it’s likely I’d get better results if I had a 2D PNG that looked a bit more 3D-ish (like the corgi).
Performance of Shap-E
Whether I was converting text or an image to 3D, Shap-E required a lot of system resources. On my home desktop with an RTX 3080 GPU and a Ryzen 9 5900X CPU, it took about five minutes to complete a render. It took two to three minutes on an Asus ROG Strix Scar 18 with an RTX 4090 laptop GPU and an Intel Core i9-13980HX.
However, when I tried text-to-3D on my old laptop with an 8th-gen Intel U-series CPU and integrated graphics, only 3 percent of a render was complete after an hour. In short, if you want to use Shap-E, make sure you have an Nvidia GPU (Shap-E doesn’t support other GPU brands; the only options are CUDA and CPU). Otherwise it just takes too long.
I should note that the first time you run any of the scripts, they need to download the models, which are 2-3GB in size and can take several minutes to transfer.
How to install and run Shap-E on a PC
OpenAI posted a Shap-E repository on GitHub, along with brief instructions on how to run it. I tried to install and run the software on Windows by creating a dedicated Python environment using Miniconda, but I kept running into problems, particularly not being able to get PyTorch3D, a required library, to install.
However, when I switched to WSL2 (Windows Subsystem for Linux), I was able to get it working with few problems. Therefore, the instructions below will work either on native Linux or in WSL2 on Windows. I tested them in WSL2.
1. Install Miniconda or Anaconda on Linux if you don’t already have it. A download and instructions can be found on the Conda website.
2. Create a conda environment called shap-e with Python 3.9 installed (other versions of Python may work).
conda create -n shap-e python=3.9
3. Activate the shap-e environment.
conda activate shap-e
4. Install PyTorch. If you have an Nvidia graphics card, use this command.
conda install pytorch=1.13.0 torchvision pytorch-cuda=11.6 -c pytorch -c nvidia
If you don’t have an Nvidia card, you’ll need to do a CPU-based install. Installation is quick, but processing the actual 3D generation with the CPU was extremely slow in my experience.
conda install pytorch torchvision torchaudio cpuonly -c pytorch
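Either way, you can verify which device PyTorch will actually use before going any further. This quick check isn’t part of OpenAI’s instructions; it’s just a sanity test I’d suggest running from the shap-e environment:
python -c "import torch; print(torch.cuda.is_available())"
If it prints True, Shap-E can use your Nvidia GPU; if it prints False, it will fall back to the much slower CPU path.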
5. Install PyTorch3D. This is the step where it took me hours and hours to find a combination that worked.
pip install "git+https://github.com/facebookresearch/pytorch3d.git"
If you get a CUDA error, try running sudo apt install nvidia-cuda-dev and then repeat the pip install.
6. Install Jupyter Notebook with Conda.
conda install -c anaconda jupyter
7. Clone the Shap-E repo.
git clone https://github.com/openai/shap-e
Git creates a shap-e folder below the directory you ran the command from.
8. Enter the shap-e folder and run the install with pip.
cd shap-e
pip install -e .
9. Start a Jupyter notebook.
jupyter notebook
10. Navigate to the localhost URL that the software shows you. It will look like http://localhost:8888?token= followed by a token. You will see a directory listing with folders and files.
11. Navigate to shap-e/examples and double-click sample_text_to_3d.ipynb.
A notebook will open with different sections of code.
12. Select each section and click the Run button, waiting for it to complete before proceeding to the next section.
This process will take a while the first time through, as several large models are downloaded to your local drive. When everything is done, you should see four 3D models of a shark in your browser. There are also four .ply files in the samples folder, which you can open in 3D image editors like Paint 3D. You can also convert them to STL files using an online converter.
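If you’d rather convert the PLY files to STL locally instead of uploading them, a couple of lines of Python will do it. This is just a sketch using the third-party trimesh library (installed with pip install trimesh), which is not part of the Shap-E repo:
import trimesh

# Load one of the PLY meshes Shap-E wrote and re-export it as an STL for 3D printing.
mesh = trimesh.load('example_mesh_0.ply')
mesh.export('example_mesh_0.stl')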
If you want to change the prompt and try again, refresh your browser and change “a shark” to something else in the prompt section of the notebook. Also, changing the size from 64 to a higher number will give you a higher-resolution image.
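For reference, these are the values I changed in sample_text_to_3d.ipynb. The variable names match the sample notebook; the values shown are just my suggestions:
prompt = "a shark"   # replace with your own text prompt
batch_size = 4       # drop this to 1 to generate a single object
size = 64            # raise to 128 or 256 for a sharper (but much slower) render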
13. Double-click sample_image_to_3d.ipynb in the examples folder to try the image-to-3D script.
14. Select each section and click Run.
By default, you end up with four small pictures of corgis.
However, I recommend adding the following code to the last notebook section so that both PLY files and animated GIFs are output.
from shap_e.util.notebooks import decode_latent_mesh

# Decode each latent into a mesh and write it out as a PLY file alongside the GIFs.
for i, latent in enumerate(latents):
    with open(f'example_mesh_{i}.ply', 'wb') as f:
        decode_latent_mesh(xm, latent).tri_mesh().write_ply(f)
15. Change the image file path in section 3 to use a different source picture. Also, I recommend changing batch_size to 1 so that only one model is created. Raising the size to 128 or 256 will result in a higher-resolution image.
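In sample_image_to_3d.ipynb, the equivalent tweaks look roughly like this; the corgi path is the notebook’s default, and you’d point it at your own PNG instead:
batch_size = 1                                  # generate one model instead of four
size = 128                                      # higher values are sharper but slower
image = load_image("example_data/corgi.png")    # swap in the path to your own image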
16. Create the following Python script and save it as text-to-3d.py (or another name). It lets you generate PLY files from text prompts on the command line.
import torch
from shap_e.diffusion.sample import sample_latents
from shap_e.diffusion.gaussian_diffusion import diffusion_from_config
from shap_e.models.download import load_model, load_config
from shap_e.util.notebooks import create_pan_cameras, decode_latent_images, gif_widget
from shap_e.util.notebooks import decode_latent_mesh

# Use the GPU if one is available; otherwise fall back to the (much slower) CPU.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Load the decoder ("transmitter"), the text-conditioned model and the diffusion config.
xm = load_model('transmitter', device=device)
model = load_model('text300M', device=device)
diffusion = diffusion_from_config(load_config('diffusion'))

batch_size = 1
guidance_scale = 15.0

prompt = input("Enter prompt: ")
filename = prompt.replace(" ", "_")

# Sample latent representations of the 3D object from the text prompt.
latents = sample_latents(
    batch_size=batch_size,
    model=model,
    diffusion=diffusion,
    guidance_scale=guidance_scale,
    model_kwargs=dict(texts=[prompt] * batch_size),
    progress=True,
    clip_denoised=True,
    use_fp16=True,
    use_karras=True,
    karras_steps=64,
    sigma_min=1e-3,
    sigma_max=160,
    s_churn=0,
)

render_mode = "nerf"  # you can change this to 'stf'; only used if you add GIF rendering
size = 64  # size of the renders; higher values take longer; only used if you add GIF rendering

# Decode each latent into a mesh and write it out as a PLY file.
for i, latent in enumerate(latents):
    with open(f'{filename}_{i}.ply', 'wb') as f:
        decode_latent_mesh(xm, latent).tri_mesh().write_ply(f)
17. Run python text-to-3d.py and enter your prompt when the program asks for it.
This will give you a PLY output but not a GIF. If you are familiar with Python, you can modify the script to do more with it.
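One easy modification: because the script already imports the notebook helpers and defines render_mode and size, you can decode each latent into a set of views and save them as an animated GIF with PIL. This is a sketch based on the sample notebook’s rendering helpers, not something from the original article:
# Append to the end of text-to-3d.py to also write an animated GIF per object.
cameras = create_pan_cameras(size, device)
for i, latent in enumerate(latents):
    images = decode_latent_images(xm, latent, cameras, rendering_mode=render_mode)
    images[0].save(f'{filename}_{i}.gif', save_all=True, append_images=images[1:], duration=100, loop=0)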