Preface
This post documents running the FoundationPose demo with a CAD model as input, producing pose estimates and visualized results. It then covers training the NeRF object-reconstruction stage, and the demo that takes RGBD images as input.
1. Environment Setup
Option 1: Docker image (recommended)
First, download the open-source code: https://github.com/NVlabs/FoundationPose
Then run the following commands to pull the image and build the environment:
- cd docker/
- docker pull wenbowen123/foundationpose && docker tag wenbowen123/foundationpose foundationpose
- bash docker/run_container.sh
- bash build_all.sh
Once the build finishes, you can enter the container with docker exec.
Option 2: Conda (more involved)
First, install Eigen 3:
- cd $HOME && wget -q https://gitlab.com/libeigen/eigen/-/archive/3.4.0/eigen-3.4.0.tar.gz && \
- tar -xzf eigen-3.4.0.tar.gz && \
- cd eigen-3.4.0 && mkdir build && cd build
- cmake .. -Wno-dev -DCMAKE_BUILD_TYPE=Release -DCMAKE_CXX_FLAGS=-std=c++14
- sudo make install
- cd $HOME && rm -rf eigen-3.4.0 eigen-3.4.0.tar.gz
Then create the conda environment following the commands below:
- # create conda environment
- conda create -n foundationpose python=3.9
-
- # activate conda environment
- conda activate foundationpose
-
- # install dependencies
- python -m pip install -r requirements.txt
-
- # Install NVDiffRast
- python -m pip install --quiet --no-cache-dir git+https://github.com/NVlabs/nvdiffrast.git
-
- # Kaolin (Optional, needed if running model-free setup)
- python -m pip install --quiet --no-cache-dir kaolin==0.15.0 -f https://nvidia-kaolin.s3.us-east-2.amazonaws.com/torch-2.0.0_cu118.html
-
- # PyTorch3D
- python -m pip install --quiet --no-index --no-cache-dir pytorch3d -f https://dl.fbaipublicfiles.com/pytorch3d/packaging/wheels/py39_cu118_pyt200/download.html
-
- # Build extensions
- CMAKE_PREFIX_PATH=$CONDA_PREFIX/lib/python3.9/site-packages/pybind11/share/cmake/pybind11 bash build_all_conda.sh
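After the build, a quick import check helps confirm the environment is usable. The script below is a minimal sketch of such a check, not part of the official repo; it only imports the dependencies installed above and reports CUDA availability:

# env_check.py -- minimal sanity check for the conda setup (illustrative, not from the repo)
import torch

print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())

# These should all import cleanly if the steps above succeeded.
import nvdiffrast.torch   # differentiable rasterizer used by FoundationPose
import pytorch3d          # 3D utilities
import kaolin             # optional; only needed for the model-free setup

print("nvdiffrast / pytorch3d / kaolin imported OK")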
2. Demo with a CAD Model as Input
First, download the model weights, which contain two folders: download here (model weights).
Create weights/ in the project directory and place the two folders inside it.
Then download the test data, which contains two archives: download here (demo data).
Create demo_data/ in the project directory, extract the archives, and place the two extracted folders inside it.
Run run_demo.py to execute the CAD-model demo:
python run_demo.py --debug 2
If you are running on a headless server with no display, comment out two lines of code:
if debug>=1:
  center_pose = pose@np.linalg.inv(to_origin)
  vis = draw_posed_3d_box(reader.K, img=color, ob_in_cam=center_pose, bbox=bbox)
  vis = draw_xyz_axis(color, ob_in_cam=center_pose, scale=0.1, K=reader.K, thickness=3, transparency=0, is_input_rgb=True)
  # cv2.imshow('1', vis[...,::-1])
  # cv2.waitKey(1)
Afterwards, demo_data/mustard0/ contains two newly generated folders: ob_in_cam and track_vis.
ob_in_cam holds the pose-estimation results, one txt file per frame, each a 4x4 homogeneous object-to-camera transform. Example file:
- 6.073544621467590332e-01 -2.560715079307556152e-01 7.520291209220886230e-01 -4.481770694255828857e-01
- -7.755840420722961426e-01 -3.960975110530853271e-01 4.915038347244262695e-01 1.187708452343940735e-01
- 1.720167100429534912e-01 -8.817789554595947266e-01 -4.391765296459197998e-01 8.016449213027954102e-01
- 0.000000000000000000e+00 0.000000000000000000e+00 0.000000000000000000e+00 1.000000000000000000e+00
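To consume these poses in your own code, the sketch below loads one file with numpy and splits it into rotation and translation; the exact file name is illustrative:

import numpy as np

# Any file under demo_data/mustard0/ob_in_cam/ works here (name is illustrative)
pose = np.loadtxt("demo_data/mustard0/ob_in_cam/0000000000.txt").reshape(4, 4)

R = pose[:3, :3]   # 3x3 rotation, object frame -> camera frame
t = pose[:3, 3]    # translation vector

print("R:\n", R)
print("t:", t)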
track_vis holds the visualization results, one image per frame.
3. NeRF Object Reconstruction Training
Download the training data; the examples below use two public datasets, Linemod and YCB-V.
Example 1: training on the Linemod dataset
Edit bundlesdf/run_nerf.py and set use_refined_mask=False, i.e. line 98:
mesh = run_one_ob(base_dir=base_dir, cfg=cfg, use_refined_mask=False)
Then run:
python bundlesdf/run_nerf.py --ref_view_dir /DATASET/lm_ref_views --dataset linemod
If you are running on a headless server with no display, install xvfb first:
- sudo apt-get update
- sudo apt-get install -y xvfb
Then run:
xvfb-run -s "-screen 0 1024x768x24" python bundlesdf/run_nerf.py --ref_view_dir model_free_ref_views/lm_ref_views --dataset linemod
NeRF training requires rendering; xvfb provides a virtual framebuffer so it can run headless.
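If you launch these jobs often, a small wrapper can fall back to xvfb-run automatically when no display is present. This is a hedged sketch (the script name is made up; it assumes xvfb-run is installed as above):

# launch_nerf_headless.py -- run run_nerf.py under xvfb-run when headless (sketch)
import os, shutil, subprocess, sys

cmd = ["python", "bundlesdf/run_nerf.py",
       "--ref_view_dir", "model_free_ref_views/lm_ref_views",
       "--dataset", "linemod"]

if not os.environ.get("DISPLAY"):   # no X display -> use a virtual framebuffer
    assert shutil.which("xvfb-run"), "xvfb-run not found; install xvfb first"
    cmd = ["xvfb-run", "-s", "-screen 0 1024x768x24"] + cmd

sys.exit(subprocess.call(cmd))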
You should see output like:
- bundlesdf/run_nerf.py:61: DeprecationWarning: Starting with ImageIO v3 the behavior of this function will switch to that of iio.v3.imread. To keep the current behavior (and make this warning disappear) use `import imageio.v2 as imageio` or call `imageio.v2.imread` directly.
- rgb = imageio.imread(color_file)
- [compute_scene_bounds()] compute_scene_bounds_worker start
- [compute_scene_bounds()] compute_scene_bounds_worker done
- [compute_scene_bounds()] merge pcd
- [compute_scene_bounds()] compute_translation_scales done
- translation_cvcam=[0.00024226 0.00356217 0.00056694], sc_factor=19.274929219577043
- [build_octree()] Octree voxel dilate_radius:1
- [__init__()] level:0, vox_pts:torch.Size([1, 3]), corner_pts:torch.Size([8, 3])
- [__init__()] level:1, vox_pts:torch.Size([8, 3]), corner_pts:torch.Size([27, 3])
- [__init__()] level:2, vox_pts:torch.Size([64, 3]), corner_pts:torch.Size([125, 3])
- [draw()] level:2
- [draw()] level:2
- level 0, resolution: 32
- level 1, resolution: 37
- level 2, resolution: 43
- level 3, resolution: 49
- level 4, resolution: 56
- level 5, resolution: 64
- level 6, resolution: 74
- level 7, resolution: 85
- level 8, resolution: 98
- level 9, resolution: 112
- level 10, resolution: 128
- level 11, resolution: 148
- level 12, resolution: 169
- level 13, resolution: 195
- level 14, resolution: 223
- level 15, resolution: 256
- GridEncoder: input_dim=3 n_levels=16 level_dim=2 resolution=32 -> 256 per_level_scale=1.1487 params=(26463840, 2) gridtype=hash align_corners=False
- sc_factor 19.274929219577043
- translation [0.00024226 0.00356217 0.00056694]
- [__init__()] denoise cloud
- [__init__()] Denoising rays based on octree cloud
- [__init__()] bad_mask#=3
- rays torch.Size([128387, 12])
- [train()] train progress 0/1001
- [train_loop()] Iter: 0, valid_samples: 524161/524288, valid_rays: 2048/2048, loss: 309.0942383, rgb_loss: 0.0216732, rgb0_loss: 0.0000000, fs_rgb_loss: 0.0000000, depth_loss: 0.0000000, depth_loss0: 0.0000000, fs_loss: 301.6735840, point_cloud_loss: 0.0000000, point_cloud_normal_loss: 0.0000000, sdf_loss: 7.2143111, eikonal_loss: 0.0000000, variation_loss: 0.0000000, truncation(meter): 0.0100000, pose_reg: 0.0000000, reg_features: 0.1152707,
-
- [train()] train progress 100/1001
- [train()] train progress 200/1001
- [train()] train progress 300/1001
- [train()] train progress 400/1001
- [train()] train progress 500/1001
- Saved checkpoints at model_free_ref_views/lm_ref_views/ob_0000001/nerf/model_latest.pth
- [train_loop()] Iter: 500, valid_samples: 518554/524288, valid_rays: 2026/2048, loss: 1.0530750, rgb_loss: 0.0009063, rgb0_loss: 0.0000000, fs_rgb_loss: 0.0000000, depth_loss: 0.0000000, depth_loss0: 0.0000000, fs_loss: 0.2142579, point_cloud_loss: 0.0000000, point_cloud_normal_loss: 0.0000000, sdf_loss: 0.8360301, eikonal_loss: 0.0000000, variation_loss: 0.0000000, truncation(meter): 0.0100000, pose_reg: 0.0000000, reg_features: 0.0008409,
-
- [extract_mesh()] query_pts:torch.Size([42875, 3]), valid:42875
- [extract_mesh()] Running Marching Cubes
- [extract_mesh()] done V:(4536, 3), F:(8986, 3)
- [train()] train progress 600/1001
- [train()] train progress 700/1001
- [train()] train progress 800/1001
- [train()] train progress 900/1001
- [train()] train progress 1000/1001
- Saved checkpoints at model_free_ref_views/lm_ref_views/ob_0000001/nerf/model_latest.pth
- [train_loop()] Iter: 1000, valid_samples: 519351/524288, valid_rays: 2029/2048, loss: 0.4827633, rgb_loss: 0.0006563, rgb0_loss: 0.0000000, fs_rgb_loss: 0.0000000, depth_loss: 0.0000000, depth_loss0: 0.0000000, fs_loss: 0.0935674, point_cloud_loss: 0.0000000, point_cloud_normal_loss: 0.0000000, sdf_loss: 0.3876466, eikonal_loss: 0.0000000, variation_loss: 0.0000000, truncation(meter): 0.0100000, pose_reg: 0.0000000, reg_features: 0.0001022,
-
- [extract_mesh()] query_pts:torch.Size([42875, 3]), valid:42875
- [extract_mesh()] Running Marching Cubes
- [extract_mesh()] done V:(5265, 3), F:(10328, 3)
- [extract_mesh()] query_pts:torch.Size([42875, 3]), valid:42875
- [extract_mesh()] Running Marching Cubes
- [extract_mesh()] done V:(5265, 3), F:(10328, 3)
- [<module>()] OpenGL_accelerate module loaded
- [<module>()] Using accelerated ArrayDatatype
- [mesh_texture_from_train_images()] Texture: Texture map computation
- project train_images 0/16
- project train_images 1/16
- project train_images 2/16
- project train_images 3/16
- project train_images 4/16
- project train_images 5/16
- project train_images 6/16
- project train_images 7/16
- project train_images 8/16
- project train_images 9/16
- project train_images 10/16
- project train_images 11/16
- project train_images 12/16
- project train_images 13/16
- project train_images 14/16
- project train_images 15/16
Pay particular attention to how the loss evolves:
[train()] train progress 0/1001
[train_loop()] Iter: 0, valid_samples: 524161/524288, valid_rays: 2048/2048, loss: 309.0942383, rgb_loss: 0.0216732, rgb0_loss: 0.0000000, fs_rgb_loss: 0.0000000, depth_loss: 0.0000000, depth_loss0: 0.0000000, fs_loss: 301.6735840, point_cloud_loss: 0.0000000, point_cloud_normal_loss: 0.0000000, sdf_loss: 7.2143111, eikonal_loss: 0.0000000, variation_loss: 0.0000000, truncation(meter): 0.0100000, pose_reg: 0.0000000, reg_features: 0.1152707,
[train()] train progress 100/1001
[train()] train progress 200/1001
[train()] train progress 300/1001
[train()] train progress 400/1001
[train()] train progress 500/1001
Saved checkpoints at model_free_ref_views/lm_ref_views/ob_0000001/nerf/model_latest.pth
[train_loop()] Iter: 500, valid_samples: 518554/524288, valid_rays: 2026/2048, loss: 1.0530750, rgb_loss: 0.0009063, rgb0_loss: 0.0000000, fs_rgb_loss: 0.0000000, depth_loss: 0.0000000, depth_loss0: 0.0000000, fs_loss: 0.2142579, point_cloud_loss: 0.0000000, point_cloud_normal_loss: 0.0000000, sdf_loss: 0.8360301, eikonal_loss: 0.0000000, variation_loss: 0.0000000, truncation(meter): 0.0100000, pose_reg: 0.0000000, reg_features: 0.0008409,
Training runs for 1000 iterations by default and finishes fairly quickly.
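To check convergence over the whole run, you can scrape the total loss out of the captured console output. A minimal sketch, assuming you redirected stdout to a file called train.log (the regex matches the [train_loop()] lines shown above):

import re

losses = []
with open("train.log") as f:   # train.log: redirected training output (assumption)
    for line in f:
        m = re.search(r"Iter: (\d+).*?loss: ([\d.]+)", line)
        if m:
            losses.append((int(m.group(1)), float(m.group(2))))

for it, total in losses:
    print(f"iter {it}: total loss {total}")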
A nerf folder is generated under lm_ref_views/ob_0000001/, containing the following files:
Under lm_ref_views/ob_0000001/model, a model.obj is generated; later inference and demos use it directly.
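Before feeding the reconstructed mesh to inference, a quick sanity check is worthwhile. A sketch using trimesh, which the repo already depends on:

import trimesh

# Path matches the output location described above
mesh = trimesh.load("model_free_ref_views/lm_ref_views/ob_0000001/model/model.obj")

print("vertices:", mesh.vertices.shape)      # (N, 3)
print("faces:", mesh.faces.shape)            # (M, 3)
print("watertight:", mesh.is_watertight)
print("bounding-box extents:", mesh.extents)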
Example 2: training on the YCB-V dataset
python bundlesdf/run_nerf.py --ref_view_dir /DATASET/ycbv/ref_views_16 --dataset ycbv
If running on a headless server, install xvfb as in Example 1 and wrap the command with xvfb-run (NeRF training needs rendering, and xvfb provides the virtual display):
xvfb-run -s "-screen 0 1024x768x24" python bundlesdf/run_nerf.py --ref_view_dir /DATASET/ycbv/ref_views_16 --dataset ycbv
4. Demo with RGBD Images as Input
This example uses the Linemod dataset. First, download the test dataset.
Then extract it to: FoundationPose-main/model_free_ref_views/lm_test_all
The official code has problems with this workflow, so two files need to be replaced: datareader.py and run_linemod.py (full listings below). The main changes: ob_id is parsed from the end of --linemod_dir, the symmetry_tfs handling is skipped, and per-frame visualizations are written to track_vis.
run_linemod.py
- # Copyright (c) 2023, NVIDIA CORPORATION. All rights reserved.
- #
- # NVIDIA CORPORATION and its licensors retain all intellectual property
- # and proprietary rights in and to this software, related documentation
- # and any modifications thereto. Any use, reproduction, disclosure or
- # distribution of this software and related documentation without an express
- # license agreement from NVIDIA CORPORATION is strictly prohibited.
-
-
- from Utils import *
- import json,uuid,joblib,os,sys
- import scipy.spatial as spatial
- from multiprocessing import Pool
- import multiprocessing
- from functools import partial
- from itertools import repeat
- import itertools
- from datareader import *
- from estimater import *
- code_dir = os.path.dirname(os.path.realpath(__file__))
- sys.path.append(f'{code_dir}/mycpp/build')
- import yaml
- import re
-
-
- def get_mask(reader, i_frame, ob_id, detect_type):
- if detect_type=='box':
- mask = reader.get_mask(i_frame, ob_id)
- H,W = mask.shape[:2]
- vs,us = np.where(mask>0)
- umin = us.min()
- umax = us.max()
- vmin = vs.min()
- vmax = vs.max()
- valid = np.zeros((H,W), dtype=bool)
- valid[vmin:vmax,umin:umax] = 1
- elif detect_type=='mask':
- mask = reader.get_mask(i_frame, ob_id)
- if mask is None:
- return None
- valid = mask>0
- elif detect_type=='detected':
- mask = cv2.imread(reader.color_files[i_frame].replace('rgb','mask_cosypose'), -1)
- valid = mask==ob_id
- else:
- raise RuntimeError
- return valid
-
-
-
- def run_pose_estimation_worker(reader, i_frames, est:FoundationPose=None, debug=0, ob_id=None, device='cuda:0'):
- torch.cuda.set_device(device)
- est.to_device(device)
- est.glctx = dr.RasterizeCudaContext(device=device)
-
- result = NestDict()
-
- for i, i_frame in enumerate(i_frames):
- logging.info(f"{i}/{len(i_frames)}, i_frame:{i_frame}, ob_id:{ob_id}")
- print("\n### ", f"{i}/{len(i_frames)}, i_frame:{i_frame}, ob_id:{ob_id}")
- video_id = reader.get_video_id()
- color = reader.get_color(i_frame)
- depth = reader.get_depth(i_frame)
- id_str = reader.id_strs[i_frame]
- H,W = color.shape[:2]
-
- debug_dir =est.debug_dir
-
- ob_mask = get_mask(reader, i_frame, ob_id, detect_type=detect_type)
- if ob_mask is None:
- logging.info("ob_mask not found, skip")
- result[video_id][id_str][ob_id] = np.eye(4)
- return result
-
- est.gt_pose = reader.get_gt_pose(i_frame, ob_id)
-
- pose = est.register(K=reader.K, rgb=color, depth=depth, ob_mask=ob_mask, ob_id=ob_id)
- logging.info(f"pose:\n{pose}")
-
- if debug>=3:
- m = est.mesh_ori.copy()
- tmp = m.copy()
- tmp.apply_transform(pose)
- tmp.export(f'{debug_dir}/model_tf.obj')
-
- result[video_id][id_str][ob_id] = pose
-
- return result, pose
-
-
- def run_pose_estimation():
- wp.force_load(device='cuda')
- reader_tmp = LinemodReader(opt.linemod_dir, split=None)
- print("## opt.linemod_dir:", opt.linemod_dir)
-
- debug = opt.debug
- use_reconstructed_mesh = opt.use_reconstructed_mesh
- debug_dir = opt.debug_dir
-
- res = NestDict()
- glctx = dr.RasterizeCudaContext()
- mesh_tmp = trimesh.primitives.Box(extents=np.ones((3)), transform=np.eye(4)).to_mesh()
- est = FoundationPose(model_pts=mesh_tmp.vertices.copy(), model_normals=mesh_tmp.vertex_normals.copy(), symmetry_tfs=None, mesh=mesh_tmp, scorer=None, refiner=None, glctx=glctx, debug_dir=debug_dir, debug=debug)
-
- # ob_id
- match = re.search(r'\d+$', opt.linemod_dir)
- if match:
- last_number = match.group()
- ob_id = int(last_number)
- else:
- print("No digits found at the end of the string")
-
- # for ob_id in reader_tmp.ob_ids:
- if ob_id:
- if use_reconstructed_mesh:
- print("## ob_id:", ob_id)
- print("## opt.linemod_dir:", opt.linemod_dir)
- print("## opt.ref_view_dir:", opt.ref_view_dir)
- mesh = reader_tmp.get_reconstructed_mesh(ref_view_dir=opt.ref_view_dir)
- else:
- mesh = reader_tmp.get_gt_mesh(ob_id)
- # symmetry_tfs = reader_tmp.symmetry_tfs[ob_id] # !!!!!!!!!!!!!!!!
-
- args = []
-
- reader = LinemodReader(opt.linemod_dir, split=None)
- video_id = reader.get_video_id()
- # est.reset_object(model_pts=mesh.vertices.copy(), model_normals=mesh.vertex_normals.copy(), symmetry_tfs=symmetry_tfs, mesh=mesh) # raw
- est.reset_object(model_pts=mesh.vertices.copy(), model_normals=mesh.vertex_normals.copy(), mesh=mesh) # !!!!!!!!!!!!!!!!
-
- print("### len(reader.color_files):", len(reader.color_files))
- for i in range(len(reader.color_files)):
- args.append((reader, [i], est, debug, ob_id, "cuda:0"))
-
- # vis Data
- to_origin, extents = trimesh.bounds.oriented_bounds(mesh)
- bbox = np.stack([-extents/2, extents/2], axis=0).reshape(2,3)
- os.makedirs(f'{opt.linemod_dir}/track_vis', exist_ok=True)
-
- outs = []
- i = 0
- for arg in args[:200]:
- print("### num:", i)
- out, pose = run_pose_estimation_worker(*arg)
- outs.append(out)
- center_pose = pose@np.linalg.inv(to_origin)
- img_color = reader.get_color(i)
- vis = draw_posed_3d_box(reader.K, img=img_color, ob_in_cam=center_pose, bbox=bbox)
- vis = draw_xyz_axis(img_color, ob_in_cam=center_pose, scale=0.1, K=reader.K, thickness=3, transparency=0, is_input_rgb=True)
- imageio.imwrite(f'{opt.linemod_dir}/track_vis/{reader.id_strs[i]}.png', vis)
- i = i + 1
-
- for out in outs:
- for video_id in out:
- for id_str in out[video_id]:
- for ob_id in out[video_id][id_str]:
- res[video_id][id_str][ob_id] = out[video_id][id_str][ob_id]
-
- with open(f'{opt.debug_dir}/linemod_res.yml','w') as ff:
- yaml.safe_dump(make_yaml_dumpable(res), ff)
- print("Save linemod_res.yml OK !!!")
-
-
- if __name__=='__main__':
- parser = argparse.ArgumentParser()
- code_dir = os.path.dirname(os.path.realpath(__file__))
- parser.add_argument('--linemod_dir', type=str, default="/guopu/FoundationPose-main/model_free_ref_views/lm_test_all/000015", help="linemod root dir") # lm_test_all lm_test
- parser.add_argument('--use_reconstructed_mesh', type=int, default=1)
- parser.add_argument('--ref_view_dir', type=str, default="/guopu/FoundationPose-main/model_free_ref_views/lm_ref_views/ob_0000015")
- parser.add_argument('--debug', type=int, default=0)
- parser.add_argument('--debug_dir', type=str, default=f'/guopu/FoundationPose-main/model_free_ref_views/lm_test_all/debug') # lm_test_all lm_test
- opt = parser.parse_args()
- set_seed(0)
-
- detect_type = 'mask' # mask / box / detected
- run_pose_estimation()
-
datareader.py
- # Copyright (c) 2023, NVIDIA CORPORATION. All rights reserved.
- #
- # NVIDIA CORPORATION and its licensors retain all intellectual property
- # and proprietary rights in and to this software, related documentation
- # and any modifications thereto. Any use, reproduction, disclosure or
- # distribution of this software and related documentation without an express
- # license agreement from NVIDIA CORPORATION is strictly prohibited.
-
-
- from Utils import *
- import json,os,sys
-
-
- BOP_LIST = ['lmo','tless','ycbv','hb','tudl','icbin','itodd']
- BOP_DIR = os.getenv('BOP_DIR')
-
- def get_bop_reader(video_dir, zfar=np.inf):
- if 'ycbv' in video_dir or 'YCB' in video_dir:
- return YcbVideoReader(video_dir, zfar=zfar)
- if 'lmo' in video_dir or 'LINEMOD-O' in video_dir:
- return LinemodOcclusionReader(video_dir, zfar=zfar)
- if 'tless' in video_dir or 'TLESS' in video_dir:
- return TlessReader(video_dir, zfar=zfar)
- if 'hb' in video_dir:
- return HomebrewedReader(video_dir, zfar=zfar)
- if 'tudl' in video_dir:
- return TudlReader(video_dir, zfar=zfar)
- if 'icbin' in video_dir:
- return IcbinReader(video_dir, zfar=zfar)
- if 'itodd' in video_dir:
- return ItoddReader(video_dir, zfar=zfar)
- else:
- raise RuntimeError
-
-
- def get_bop_video_dirs(dataset):
- if dataset=='ycbv':
- video_dirs = sorted(glob.glob(f'{BOP_DIR}/ycbv/test/*'))
- elif dataset=='lmo':
- video_dirs = sorted(glob.glob(f'{BOP_DIR}/lmo/lmo_test_bop19/test/*'))
- elif dataset=='tless':
- video_dirs = sorted(glob.glob(f'{BOP_DIR}/tless/tless_test_primesense_bop19/test_primesense/*'))
- elif dataset=='hb':
- video_dirs = sorted(glob.glob(f'{BOP_DIR}/hb/hb_test_primesense_bop19/test_primesense/*'))
- elif dataset=='tudl':
- video_dirs = sorted(glob.glob(f'{BOP_DIR}/tudl/tudl_test_bop19/test/*'))
- elif dataset=='icbin':
- video_dirs = sorted(glob.glob(f'{BOP_DIR}/icbin/icbin_test_bop19/test/*'))
- elif dataset=='itodd':
- video_dirs = sorted(glob.glob(f'{BOP_DIR}/itodd/itodd_test_bop19/test/*'))
- else:
- raise RuntimeError
- return video_dirs
-
-
-
- class YcbineoatReader:
- def __init__(self,video_dir, downscale=1, shorter_side=None, zfar=np.inf):
- self.video_dir = video_dir
- self.downscale = downscale
- self.zfar = zfar
- self.color_files = sorted(glob.glob(f"{self.video_dir}/rgb/*.png"))
- self.K = np.loadtxt(f'{video_dir}/cam_K.txt').reshape(3,3)
- self.id_strs = []
- for color_file in self.color_files:
- id_str = os.path.basename(color_file).replace('.png','')
- self.id_strs.append(id_str)
- self.H,self.W = cv2.imread(self.color_files[0]).shape[:2]
-
- if shorter_side is not None:
- self.downscale = shorter_side/min(self.H, self.W)
-
- self.H = int(self.H*self.downscale)
- self.W = int(self.W*self.downscale)
- self.K[:2] *= self.downscale
-
- self.gt_pose_files = sorted(glob.glob(f'{self.video_dir}/annotated_poses/*'))
-
- self.videoname_to_object = {
- 'bleach0': "021_bleach_cleanser",
- 'bleach_hard_00_03_chaitanya': "021_bleach_cleanser",
- 'cracker_box_reorient': '003_cracker_box',
- 'cracker_box_yalehand0': '003_cracker_box',
- 'mustard0': '006_mustard_bottle',
- 'mustard_easy_00_02': '006_mustard_bottle',
- 'sugar_box1': '004_sugar_box',
- 'sugar_box_yalehand0': '004_sugar_box',
- 'tomato_soup_can_yalehand0': '005_tomato_soup_can',
- }
-
-
- def get_video_name(self):
- return self.video_dir.split('/')[-1]
-
- def __len__(self):
- return len(self.color_files)
-
- def get_gt_pose(self,i):
- try:
- pose = np.loadtxt(self.gt_pose_files[i]).reshape(4,4)
- return pose
- except:
- logging.info("GT pose not found, return None")
- return None
-
-
- def get_color(self,i):
- color = imageio.imread(self.color_files[i])[...,:3]
- color = cv2.resize(color, (self.W,self.H), interpolation=cv2.INTER_NEAREST)
- return color
-
- def get_mask(self,i):
- mask = cv2.imread(self.color_files[i].replace('rgb','masks'),-1)
- if len(mask.shape)==3:
- for c in range(3):
- if mask[...,c].sum()>0:
- mask = mask[...,c]
- break
- mask = cv2.resize(mask, (self.W,self.H), interpolation=cv2.INTER_NEAREST).astype(bool).astype(np.uint8)
- return mask
-
- def get_depth(self,i):
- depth = cv2.imread(self.color_files[i].replace('rgb','depth'),-1)/1e3
- depth = cv2.resize(depth, (self.W,self.H), interpolation=cv2.INTER_NEAREST)
- depth[(depth<0.1) | (depth>=self.zfar)] = 0
- return depth
-
-
- def get_xyz_map(self,i):
- depth = self.get_depth(i)
- xyz_map = depth2xyzmap(depth, self.K)
- return xyz_map
-
- def get_occ_mask(self,i):
- hand_mask_file = self.color_files[i].replace('rgb','masks_hand')
- occ_mask = np.zeros((self.H,self.W), dtype=bool)
- if os.path.exists(hand_mask_file):
- occ_mask = occ_mask | (cv2.imread(hand_mask_file,-1)>0)
-
- right_hand_mask_file = self.color_files[i].replace('rgb','masks_hand_right')
- if os.path.exists(right_hand_mask_file):
- occ_mask = occ_mask | (cv2.imread(right_hand_mask_file,-1)>0)
-
- occ_mask = cv2.resize(occ_mask, (self.W,self.H), interpolation=cv2.INTER_NEAREST)
-
- return occ_mask.astype(np.uint8)
-
- def get_gt_mesh(self):
- ob_name = self.videoname_to_object[self.get_video_name()]
- YCB_VIDEO_DIR = os.getenv('YCB_VIDEO_DIR')
- mesh = trimesh.load(f'{YCB_VIDEO_DIR}/models/{ob_name}/textured_simple.obj')
- return mesh
-
-
- class BopBaseReader:
- def __init__(self, base_dir, zfar=np.inf, resize=1):
- self.base_dir = base_dir
- self.resize = resize
- self.dataset_name = None
- self.color_files = sorted(glob.glob(f"{self.base_dir}/rgb/*"))
- if len(self.color_files)==0:
- self.color_files = sorted(glob.glob(f"{self.base_dir}/gray/*"))
- self.zfar = zfar
-
- self.K_table = {}
- with open(f'{self.base_dir}/scene_camera.json','r') as ff:
- info = json.load(ff)
- for k in info:
- self.K_table[f'{int(k):06d}'] = np.array(info[k]['cam_K']).reshape(3,3)
- self.bop_depth_scale = info[k]['depth_scale']
-
- if os.path.exists(f'{self.base_dir}/scene_gt.json'):
- with open(f'{self.base_dir}/scene_gt.json','r') as ff:
- self.scene_gt = json.load(ff)
- self.scene_gt = copy.deepcopy(self.scene_gt) # Release file handle to be pickle-able by joblib
- assert len(self.scene_gt)==len(self.color_files)
- else:
- self.scene_gt = None
-
- self.make_id_strs()
-
-
- def make_scene_ob_ids_dict(self):
- with open(f'{BOP_DIR}/{self.dataset_name}/test_targets_bop19.json','r') as ff:
- self.scene_ob_ids_dict = {}
- data = json.load(ff)
- for d in data:
- if d['scene_id']==self.get_video_id():
- id_str = f"{d['im_id']:06d}"
- if id_str not in self.scene_ob_ids_dict:
- self.scene_ob_ids_dict[id_str] = []
- self.scene_ob_ids_dict[id_str] += [d['obj_id']]*d['inst_count']
-
-
- def get_K(self, i_frame):
- K = self.K_table[self.id_strs[i_frame]]
- if self.resize!=1:
- K[:2,:2] *= self.resize
- return K
-
-
- def get_video_dir(self):
- video_id = int(self.base_dir.rstrip('/').split('/')[-1])
- return video_id
-
- def make_id_strs(self):
- self.id_strs = []
- for i in range(len(self.color_files)):
- name = os.path.basename(self.color_files[i]).split('.')[0]
- self.id_strs.append(name)
-
-
- def get_instance_ids_in_image(self, i_frame:int):
- ob_ids = []
- if self.scene_gt is not None:
- name = int(os.path.basename(self.color_files[i_frame]).split('.')[0])
- for k in self.scene_gt[str(name)]:
- ob_ids.append(k['obj_id'])
- elif self.scene_ob_ids_dict is not None:
- return np.array(self.scene_ob_ids_dict[self.id_strs[i_frame]])
- else:
- mask_dir = os.path.dirname(self.color_files[0]).replace('rgb','mask_visib')
- id_str = self.id_strs[i_frame]
- mask_files = sorted(glob.glob(f'{mask_dir}/{id_str}_*.png'))
- ob_ids = []
- for mask_file in mask_files:
- ob_id = int(os.path.basename(mask_file).split('.')[0].split('_')[1])
- ob_ids.append(ob_id)
- ob_ids = np.asarray(ob_ids)
- return ob_ids
-
-
- def get_gt_mesh_file(self, ob_id):
- raise RuntimeError("You should override this")
-
-
- def get_color(self,i):
- color = imageio.imread(self.color_files[i])
- if len(color.shape)==2:
- color = np.tile(color[...,None], (1,1,3)) # Gray to RGB
- if self.resize!=1:
- color = cv2.resize(color, fx=self.resize, fy=self.resize, dsize=None)
- return color
-
-
- def get_depth(self,i, filled=False):
- if filled:
- depth_file = self.color_files[i].replace('rgb','depth_filled')
- depth_file = f'{os.path.dirname(depth_file)}/0{os.path.basename(depth_file)}'
- depth = cv2.imread(depth_file,-1)/1e3
- else:
- depth_file = self.color_files[i].replace('rgb','depth').replace('gray','depth')
- depth = cv2.imread(depth_file,-1)*1e-3*self.bop_depth_scale
- if self.resize!=1:
- depth = cv2.resize(depth, fx=self.resize, fy=self.resize, dsize=None, interpolation=cv2.INTER_NEAREST)
- depth[depth<0.1] = 0
- depth[depth>self.zfar] = 0
- return depth
-
- def get_xyz_map(self,i):
- depth = self.get_depth(i)
- xyz_map = depth2xyzmap(depth, self.get_K(i))
- return xyz_map
-
-
- def get_mask(self, i_frame:int, ob_id:int, type='mask_visib'):
- '''
- @type: mask_visib (only visible part) / mask (projected mask from whole model)
- '''
- pos = 0
- name = int(os.path.basename(self.color_files[i_frame]).split('.')[0])
- if self.scene_gt is not None:
- for k in self.scene_gt[str(name)]:
- if k['obj_id']==ob_id:
- break
- pos += 1
- mask_file = f'{self.base_dir}/{type}/{name:06d}_{pos:06d}.png'
- if not os.path.exists(mask_file):
- logging.info(f'{mask_file} not found')
- return None
- else:
- # mask_dir = os.path.dirname(self.color_files[0]).replace('rgb',type)
- # mask_file = f'{mask_dir}/{self.id_strs[i_frame]}_{ob_id:06d}.png'
- raise RuntimeError
- mask = cv2.imread(mask_file, -1)
- if self.resize!=1:
- mask = cv2.resize(mask, fx=self.resize, fy=self.resize, dsize=None, interpolation=cv2.INTER_NEAREST)
- return mask>0
-
-
- def get_gt_mesh(self, ob_id:int):
- mesh_file = self.get_gt_mesh_file(ob_id)
- mesh = trimesh.load(mesh_file)
- mesh.vertices *= 1e-3
- return mesh
-
-
- def get_model_diameter(self, ob_id):
- dir = os.path.dirname(self.get_gt_mesh_file(self.ob_ids[0]))
- info_file = f'{dir}/models_info.json'
- with open(info_file,'r') as ff:
- info = json.load(ff)
- return info[str(ob_id)]['diameter']/1e3
-
-
-
- def get_gt_poses(self, i_frame, ob_id):
- gt_poses = []
- name = int(self.id_strs[i_frame])
- for i_k, k in enumerate(self.scene_gt[str(name)]):
- if k['obj_id']==ob_id:
- cur = np.eye(4)
- cur[:3,:3] = np.array(k['cam_R_m2c']).reshape(3,3)
- cur[:3,3] = np.array(k['cam_t_m2c'])/1e3
- gt_poses.append(cur)
- return np.asarray(gt_poses).reshape(-1,4,4)
-
-
- def get_gt_pose(self, i_frame:int, ob_id, mask=None, use_my_correction=False):
- ob_in_cam = np.eye(4)
- best_iou = -np.inf
- best_gt_mask = None
- name = int(self.id_strs[i_frame])
- for i_k, k in enumerate(self.scene_gt[str(name)]):
- if k['obj_id']==ob_id:
- cur = np.eye(4)
- cur[:3,:3] = np.array(k['cam_R_m2c']).reshape(3,3)
- cur[:3,3] = np.array(k['cam_t_m2c'])/1e3
- if mask is not None: # When multi-instance exists, use mask to determine which one
- gt_mask = cv2.imread(f'{self.base_dir}/mask_visib/{self.id_strs[i_frame]}_{i_k:06d}.png', -1).astype(bool)
- intersect = (gt_mask*mask).astype(bool)
- union = (gt_mask+mask).astype(bool)
- iou = float(intersect.sum())/union.sum()
- if iou>best_iou:
- best_iou = iou
- best_gt_mask = gt_mask
- ob_in_cam = cur
- else:
- ob_in_cam = cur
- break
-
-
- if use_my_correction:
- if 'ycb' in self.base_dir.lower() and 'train_real' in self.color_files[i_frame]:
- video_id = self.get_video_id()
- if ob_id==1:
- if video_id in [12,13,14,17,24]:
- ob_in_cam = ob_in_cam@self.symmetry_tfs[ob_id][1]
- return ob_in_cam
-
-
- def load_symmetry_tfs(self):
- dir = os.path.dirname(self.get_gt_mesh_file(self.ob_ids[0]))
- info_file = f'{dir}/models_info.json'
- with open(info_file,'r') as ff:
- info = json.load(ff)
- self.symmetry_tfs = {}
- self.symmetry_info_table = {}
- for ob_id in self.ob_ids:
- self.symmetry_info_table[ob_id] = info[str(ob_id)]
- self.symmetry_tfs[ob_id] = symmetry_tfs_from_info(info[str(ob_id)], rot_angle_discrete=5)
- self.geometry_symmetry_info_table = copy.deepcopy(self.symmetry_info_table)
-
-
- def get_video_id(self):
- return int(self.base_dir.split('/')[-1])
-
-
- class LinemodOcclusionReader(BopBaseReader):
- def __init__(self,base_dir='/mnt/9a72c439-d0a7-45e8-8d20-d7a235d02763/DATASET/LINEMOD-O/lmo_test_all/test/000002', zfar=np.inf):
- super().__init__(base_dir, zfar=zfar)
- self.dataset_name = 'lmo'
- self.K = list(self.K_table.values())[0]
- self.ob_ids = [1,5,6,8,9,10,11,12]
- self.ob_id_to_names = {
- 1: 'ape',
- 2: 'benchvise',
- 3: 'bowl',
- 4: 'camera',
- 5: 'water_pour',
- 6: 'cat',
- 7: 'cup',
- 8: 'driller',
- 9: 'duck',
- 10: 'eggbox',
- 11: 'glue',
- 12: 'holepuncher',
- 13: 'iron',
- 14: 'lamp',
- 15: 'phone',
- }
- # self.load_symmetry_tfs()
-
- def get_gt_mesh_file(self, ob_id):
- mesh_dir = f'{BOP_DIR}/{self.dataset_name}/models/obj_{ob_id:06d}.ply'
- return mesh_dir
-
-
-
- class LinemodReader(LinemodOcclusionReader):
- def __init__(self, base_dir='/mnt/9a72c439-d0a7-45e8-8d20-d7a235d02763/DATASET/LINEMOD/lm_test_all/test/000001', zfar=np.inf, split=None):
- super().__init__(base_dir, zfar=zfar)
- self.dataset_name = 'lm'
- if split is not None: # train/test
- print("## split is not None")
- with open(f'/mnt/9a72c439-d0a7-45e8-8d20-d7a235d02763/DATASET/LINEMOD/Linemod_preprocessed/data/{self.get_video_id():02d}/{split}.txt','r') as ff:
- lines = ff.read().splitlines()
- self.color_files = []
- for line in lines:
- id = int(line)
- self.color_files.append(f'{self.base_dir}/rgb/{id:06d}.png')
- self.make_id_strs()
-
- self.ob_ids = np.setdiff1d(np.arange(1,16), np.array([7,3])).tolist() # Exclude bowl and mug
- # self.load_symmetry_tfs()
-
-
- def get_gt_mesh_file(self, ob_id):
- root = self.base_dir
- print(f'{root}/../')
- print(f'{root}/lm_models')
- print(f'{root}/lm_models/models/obj_{ob_id:06d}.ply')
- while 1:
- if os.path.exists(f'{root}/lm_models'):
- mesh_dir = f'{root}/lm_models/models/obj_{ob_id:06d}.ply'
- break
- else:
- root = os.path.abspath(f'{root}/../')
- mesh_dir = f'{root}/lm_models/models/obj_{ob_id:06d}.ply'
- break
- return mesh_dir
-
-
- def get_reconstructed_mesh(self, ref_view_dir):
- mesh = trimesh.load(os.path.abspath(f'{ref_view_dir}/model/model.obj'))
- return mesh
-
-
- class YcbVideoReader(BopBaseReader):
- def __init__(self, base_dir, zfar=np.inf):
- super().__init__(base_dir, zfar=zfar)
- self.dataset_name = 'ycbv'
- self.K = list(self.K_table.values())[0]
-
- self.make_id_strs()
-
- self.ob_ids = np.arange(1,22).astype(int).tolist()
- YCB_VIDEO_DIR = os.getenv('YCB_VIDEO_DIR')
- self.ob_id_to_names = {}
- self.name_to_ob_id = {}
- # names = sorted(os.listdir(f'{YCB_VIDEO_DIR}/models/'))
- if os.path.exists(f'{YCB_VIDEO_DIR}/models/'):
- names = sorted(os.listdir(f'{YCB_VIDEO_DIR}/models/'))
- for i,ob_id in enumerate(self.ob_ids):
- self.ob_id_to_names[ob_id] = names[i]
- self.name_to_ob_id[names[i]] = ob_id
- else:
- names = []
-
- if 0:
- # if 'BOP' not in self.base_dir:
- with open(f'{self.base_dir}/../../keyframe.txt','r') as ff:
- self.keyframe_lines = ff.read().splitlines()
-
- # self.load_symmetry_tfs()
- '''for ob_id in self.ob_ids:
- if ob_id in [1,4,6,18]: # Cylinder
- self.geometry_symmetry_info_table[ob_id] = {
- 'symmetries_continuous': [
- {'axis':[0,0,1], 'offset':[0,0,0]},
- ],
- 'symmetries_discrete': euler_matrix(0, np.pi, 0).reshape(1,4,4).tolist(),
- }
- elif ob_id in [13]:
- self.geometry_symmetry_info_table[ob_id] = {
- 'symmetries_continuous': [
- {'axis':[0,0,1], 'offset':[0,0,0]},
- ],
- }
- elif ob_id in [2,3,9,21]: # Rectangle box
- tfs = []
- for rz in [0, np.pi]:
- for rx in [0,np.pi]:
- for ry in [0,np.pi]:
- tfs.append(euler_matrix(rx, ry, rz))
- self.geometry_symmetry_info_table[ob_id] = {
- 'symmetries_discrete': np.asarray(tfs).reshape(-1,4,4).tolist(),
- }
- else:
- pass'''
-
- def get_gt_mesh_file(self, ob_id):
- if 'BOP' in self.base_dir:
- mesh_file = os.path.abspath(f'{self.base_dir}/../../ycbv_models/models/obj_{ob_id:06d}.ply')
- else:
- mesh_file = f'{self.base_dir}/../../ycbv_models/models/obj_{ob_id:06d}.ply'
- return mesh_file
-
-
- def get_gt_mesh(self, ob_id:int, get_posecnn_version=False):
- if get_posecnn_version:
- YCB_VIDEO_DIR = os.getenv('YCB_VIDEO_DIR')
- mesh = trimesh.load(f'{YCB_VIDEO_DIR}/models/{self.ob_id_to_names[ob_id]}/textured_simple.obj')
- return mesh
- mesh_file = self.get_gt_mesh_file(ob_id)
- mesh = trimesh.load(mesh_file, process=False)
- mesh.vertices *= 1e-3
- tex_file = mesh_file.replace('.ply','.png')
- if os.path.exists(tex_file):
- from PIL import Image
- im = Image.open(tex_file)
- uv = mesh.visual.uv
- material = trimesh.visual.texture.SimpleMaterial(image=im)
- color_visuals = trimesh.visual.TextureVisuals(uv=uv, image=im, material=material)
- mesh.visual = color_visuals
- return mesh
-
-
- def get_reconstructed_mesh(self, ob_id, ref_view_dir):
- mesh = trimesh.load(os.path.abspath(f'{ref_view_dir}/ob_{ob_id:07d}/model/model.obj'))
- return mesh
-
-
- def get_transform_reconstructed_to_gt_model(self, ob_id):
- out = np.eye(4)
- return out
-
-
- def get_visible_cloud(self, ob_id):
- file = os.path.abspath(f'{self.base_dir}/../../models/{self.ob_id_to_names[ob_id]}/visible_cloud.ply')
- pcd = o3d.io.read_point_cloud(file)
- return pcd
-
-
- def is_keyframe(self, i):
- color_file = self.color_files[i]
- video_id = self.get_video_id()
- frame_id = int(os.path.basename(color_file).split('.')[0])
- key = f'{video_id:04d}/{frame_id:06d}'
- return (key in self.keyframe_lines)
-
-
-
- class TlessReader(BopBaseReader):
- def __init__(self, base_dir, zfar=np.inf):
- super().__init__(base_dir, zfar=zfar)
- self.dataset_name = 'tless'
-
- self.ob_ids = np.arange(1,31).astype(int).tolist()
- self.load_symmetry_tfs()
-
-
- def get_gt_mesh_file(self, ob_id):
- mesh_file = f'{self.base_dir}/../../../models_cad/obj_{ob_id:06d}.ply'
- return mesh_file
-
-
- def get_gt_mesh(self, ob_id):
- mesh = trimesh.load(self.get_gt_mesh_file(ob_id))
- mesh.vertices *= 1e-3
- mesh = trimesh_add_pure_colored_texture(mesh, color=np.ones((3))*200)
- return mesh
-
-
- class HomebrewedReader(BopBaseReader):
- def __init__(self, base_dir, zfar=np.inf):
- super().__init__(base_dir, zfar=zfar)
- self.dataset_name = 'hb'
- self.ob_ids = np.arange(1,34).astype(int).tolist()
- self.load_symmetry_tfs()
- self.make_scene_ob_ids_dict()
-
-
- def get_gt_mesh_file(self, ob_id):
- mesh_file = f'{self.base_dir}/../../../hb_models/models/obj_{ob_id:06d}.ply'
- return mesh_file
-
-
- def get_gt_pose(self, i_frame:int, ob_id, use_my_correction=False):
- logging.info("WARN HomeBrewed doesn't have GT pose")
- return np.eye(4)
-
-
-
- class ItoddReader(BopBaseReader):
- def __init__(self, base_dir, zfar=np.inf):
- super().__init__(base_dir, zfar=zfar)
- self.dataset_name = 'itodd'
- self.make_id_strs()
-
- self.ob_ids = np.arange(1,29).astype(int).tolist()
- self.load_symmetry_tfs()
- self.make_scene_ob_ids_dict()
-
-
- def get_gt_mesh_file(self, ob_id):
- mesh_file = f'{self.base_dir}/../../../itodd_models/models/obj_{ob_id:06d}.ply'
- return mesh_file
-
-
- class IcbinReader(BopBaseReader):
- def __init__(self, base_dir, zfar=np.inf):
- super().__init__(base_dir, zfar=zfar)
- self.dataset_name = 'icbin'
- self.ob_ids = np.arange(1,3).astype(int).tolist()
- self.load_symmetry_tfs()
-
- def get_gt_mesh_file(self, ob_id):
- mesh_file = f'{self.base_dir}/../../../icbin_models/models/obj_{ob_id:06d}.ply'
- return mesh_file
-
-
- class TudlReader(BopBaseReader):
- def __init__(self, base_dir, zfar=np.inf):
- super().__init__(base_dir, zfar=zfar)
- self.dataset_name = 'tudl'
- self.ob_ids = np.arange(1,4).astype(int).tolist()
- self.load_symmetry_tfs()
-
- def get_gt_mesh_file(self, ob_id):
- mesh_file = f'{self.base_dir}/../../../tudl_models/models/obj_{ob_id:06d}.ply'
- return mesh_file
-
-
Run run_linemod.py (adjust the argparse defaults --linemod_dir, --ref_view_dir, and --debug_dir to your own paths):
python run_linemod.py
You should then see the folder model_free_ref_views/lm_test_all/000015/track_vis/, which contains the visualization results.
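To flip through the results quickly, the per-frame PNGs can be stitched into a GIF. A minimal sketch with imageio (already used by the repo); note that depending on the imageio version, duration is interpreted as seconds or milliseconds per frame:

import glob
import imageio.v2 as imageio

frames = sorted(glob.glob("model_free_ref_views/lm_test_all/000015/track_vis/*.png"))
images = [imageio.imread(f) for f in frames]
imageio.mimsave("track_vis.gif", images, duration=0.1)   # ~10 fps if duration is in seconds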
That wraps up the walkthrough~
This post stops here; future posts will cover other datasets, algorithms, code, and concrete application examples for 6D pose estimation.