Low-level Debugger (LLDB) Extension for Structure Visualization

Low-level Debugger (LLDB) Extension for Structure Visualization

Maxence Menager
Maxence Menager
5 Min Read

Debuggers are essential tools to dig into code issues. They can be a huge time-saver for developers.

However, debuggers need help to correctly handle nontrivial data structures. They need the insight to understand the internals of the data model. Recursive structures found in trees are particularly challenging.

In this blog, we will see how to implement an LLDB (low-level debugger) extension to improve variable formatting. We will explore how to customize LLDB to greatly improve our data structure visualization thus our debugging efficiency. We will play with toy structures that form the base of an Abstract Syntax Tree (AST).

Data Model

The first structure we need is a chained list. Chained lists are found everywhere in C, yet they are usually painful to debug. They contain a lot of indirection forcing the user to unravel them.

Here is the simple linked list we will work with. It can contain two types of data int or Node*.

    
     enum ListKind{
    IntListKind,
    NodeListKind
};

struct ListCell{
    union {
       int val;
       struct Node* node;
    } data;
    ListCell* next;
};

struct List{
    int size;
    enum ListKind kind;
    ListCell* head;
    ListCell* tail;
};

    
   

Because of the chained nature of the list, if we display the list we don’t see its actual content.

    
     (lldb) p *intList
(List) $3 = {
  size = 3
  kind = IntListKind
  head = 0x0000600000008000
  tail = 0x0000600000008020
}

    
   

Even in a GUI linked lists need a lot of clicks to be displayed.

lldb extension

The other structure we need is a composite struct. It’s a very common idiom in C.

    
     enum NodeKind{
    LeafNodeLabel,
    PassThroughNodeLabel,
    BranchNodeLabel,
    ListNodeLabel,
};

typedef struct Node Node;
// Our 'base' class
struct Node{ 
    enum NodeKind label;
};

// 'children' of Node
struct LeafNode{
    Node node;
    int valA;
    int valB;
};

struct PassThroughNode {
    Node node;
    Node* child;
};

struct BranchNode {
    Node node;
    Node* leftChild;
    Node* rightChild;
};

struct ListNode {
    Node node;
    struct List* list;
}

    
   

Because each node version contains a Node element as the first variable, we can use polymorphism. The true form of a node is contained in its label field. The issue is only the programmer knows that Node* pointers can be downcasted to their subtypes.

With only those two building blocks we can start building an interesting data structure:

    
     List *intList = makeEmptyList(IntListKind);
appendToIntList(intList, 1);
appendToIntList(intList, 2);
appendToIntList(intList, 3);

Node* leaf1 = makeLeafNode(1, 2);
Node* leaf2 = makeLeafNode(2, 4);
Node* leaf3 = makeLeafNode(3, 6);
Node* leaf4 = makeLeafNode(4, 8);
Node* leaf5 = makeLeafNode(5, 0);

Node* intListNode = makeListNode(intList);
Node* passThrough = makePassThroughNode(intListNode);
Node* branch1 = makeBranchNode(leaf1, leaf2);

List *nodeList = makeEmptyList(NodeListKind);
appendToNodeList(nodeList, passThrough);
appendToNodeList(nodeList, branch1);
appendToNodeList(nodeList, leaf3);
appendToNodeList(nodeList, leaf4);
appendToNodeList(nodeList, leaf5);

Node* root = makeListNode(nodeList);

    
   

To see the content of the tree we must cast it in LLDB.

    
     (lldb) p *root
(Node) $1 = (kind = ListNodeKind)
(lldb) p *((ListNode*) root)
(ListNode) $4 = {
  node = (kind = ListNodeKind)
  list = 0x0000600000208040
}

    
   
Improving Visualization

LLDB provides two solutions to improve variable formatting.

  • Summary strings
  • Synthetics

Summaries are a good quick way to improve the readability of objects. A simple format string is used to build the object representation.

    
     (lldb) type summary add --summary-string "Sign: ${var[31]%B} Exponent: ${var[30-23]%x} Mantissa: ${var[0-22]%u}" float
(lldb) frame variable float_point
(float) float_point = -3.14159 Sign: true Exponent: 0x00000080 Mantissa: 4788184

    
   

However, summaries have some limitations. They must use fields already present in the variable.

Synthetics are a more powerful tool. They allow adding “fake” children to composed types through python scripting.

To implement a Synthetic provider for a type we must implement this interface:

    
     class SyntheticChildrenProvider:
   def __init__(self, valobj: SBValue, internal_dict):
      'this call should initialize the Python object using valobj as the variable to provide synthetic children for'
   def num_children(self):
      'this call should return the number of children that you want your object to have'
   def get_child_index(self,name: str):
      'this call should return the index of the synthetic child whose name is given as argument'
   def get_child_at_index(self,index: int):
      'this call should return a new LLDB SBValue object representing the child at the index given as argument'

    
   
LinkedLists

Let’s start with LinkedLists. We don’t have to start from scratch, we can use the children already provided in LLDB.

    
     class ListSynthetic:
    """
    Synthetic for list
    """
    def __init__(self, valobj: SBValue, internal_dict):

        self.valobj = valobj
        self.childs: Dict[str, SBValue] = {"size": self.valobj.GetChildAtIndex(0),
                                           "kind": self.valobj.GetChildAtIndex(1),
                                           "head": self.valobj.GetChildAtIndex(2),
                                           "tail": self.valobj.GetChildAtIndex(3)}

        self.list_childs = list(self.childs.values())

    def num_children(self) -> int:
        return len(self.childs)

    def get_child_index(self, name: str):
        return self.childs.get(name, -1)

    def get_child_at_index(self, index: str):
        return self.list_childs[index]

    
   

Note that the GetChildAtIndex() match the C structure.

    
     struct List{
    int size;            // Child0
    enum ListKind kind;   // Child1
    ListCell* head;       // Child2
    ListCell* tail;       // Child3
}; 
Let's add a new child to see it in LLDB.
        # Get the size SBValue
        size: SBValue = self.valobj.GetChildAtIndex(0)
        # Create a new SBValue using the size, this is a trick to have a new name
        new_child = valobj.CreateValueFromData("new_val", size.data, size.type)

        # Register it in our class
        self.children_index["new_val"] = 4
        self.list_children.append(new_child)

    
   

To register the synthetic we can use those commands in LLDB:

(lldb) command script import ~/Repos/blog/lldb_plugin/main.py
(lldb) type synthetic add -l main.ListSynthetic List

To automate this, we can create a .lldbinit in our home directory with the same lines:

command script import ~/Repos/blog/lldb_plugin/main.py
type synthetic add -l main.ListSynthetic List

Now if we look at our intList in LLDB we can see the newly added child.

    
     (lldb) p *intList
(List) $0 = {
  size = 3
  kind = IntListKind
  head = 0x0000600000008000
  tail = 0x0000600000008020
  new_val = 3
}
(lldb)

    
   

Copying the size is not very useful, let’s unroll our list values instead.

We transform our renaming trick into a small function.

    
     def copy_sbvalue(value: SBValue, name: str) -> SBValue:
    """ Copy given sbvalue into a new one"""
    return value.CreateValueFromData(name, value.data, value.type)

    
   

Now we want to access every value in the linked list. Conveniently, we have the list size. We can access it with the GetChildMemberWithName method. The .valuemethod of an SBValue always returns a string as displayed in the debugger. Thus we need to cast it to an int.

    
     def _list_values(self) -> Iterable[SBValue]:
      """Iterate over list values"""
        hd: SBValue = self.valobj.GetChildMemberWithName("head")
        size: int = int(self.valobj.GetChildMemberWithName("size").value) # .value returns an string
        for i in range(size):
            yield hd.GetChildMemberWithName("data")
            hd = hd.GetChildMemberWithName("next")

    
   

Let’s use this new function in our init:

    
     def __init__(self, valobj: SBValue, internal_dict):
        self.valobj = valobj
        self.children_index: Dict[str, int] = {"size": 0,
                                               "kind": 1,
                                               "head": 2,
                                               "tail": 3}

        self.list_children = [self.valobj.GetChildAtIndex(i) for i in self.children_index.values()]
    
        # For every item in the list, add them as a new child. 
        for i, item in enumerate(self._list_values()):
            new_item = copy_sbvalue(item, f"item_{i}")
            self.children_index[new_item.name] = len(self.list_children)
            self.list_children.append(new_item)

    
   

As of now, there is no need to manually unroll the list anymore. We can see the result in LLDB.

    
     (lldb) p *intList
(List) $0 = {
  size = 3
  kind = IntListKind
  head = 0x0000600000008000
  tail = 0x0000600000008020
  item_0 = {
    val = 1
    node = 0x0000000000000001
  }
  item_1 = {
    val = 2
    node = 0x0000000000000002
  }
  item_2 = {
    val = 3
    node = 0x0000000000000003
  }

    
   

The display of the unions is still not perfect. The list stores the true type of its data in the label field. We can use this label to downcast the union to its real value. (Vocabulary)

First, we need a function to retrieve an SBType object from a typename. Types of information are stored in a frame’s symbol context. Only basetype are stored, there is no match for Node*.

    
     module = value.GetFrame().GetSymbolContext(lldb.eSymbolContextEverything).GetModule()
    sbtype: SBType = module.FindFirstType(typename)
We then implement a get_type function. We add the option to return a pointer instead of a base type. We can use GetPointerType for this.
def get_type(value: SBValue, typename: str, as_pointer=False) -> SBType:
    module = value.GetFrame().GetSymbolContext(lldb.eSymbolContextEverything).GetModule()
    sbtype: SBType = module.FindFirstType(typename)
    if as_pointer:
        return sbtype.GetPointerType()
    return sbtype

    
   

Then, we can create a method to retrieve the true SBType corresponding to the list kind. To do so we have to match the kind field with its different possibilities.

    
     def _kind_type(self) -> SBType:
    kind: str = self.valobj.GetChildMemberWithName("kind").value
    if kind == "IntListKind":
        return get_type(self.valobj, "int")
    if kind == "NodeListKind":
        return get_type(self.valobj, "Node", as_pointer=True)

    
   

We can now use this kind to cast item values:

    
     def _list_values(self) -> Iterable[SBValue]:
        """Iterate over list values"""
        hd: SBValue = self.valobj.GetChildMemberWithName("head")
        size: int = int(self.valobj.GetChildMemberWithName("size").value)
        kind_type = self._kind_type() 
        for i in range(size):
            yield hd.GetChildMemberWithName("data").Cast(kind_type) # Cast the union value to its true type 
            hd = hd.GetChildMemberWithName("next")

    def _kind_type(self) -> SBType:
        """ Get kind type contained in the list"""
        kind: str = self.valobj.GetChildMemberWithName("kind").value
        if kind == "IntListKind":
            return get_type(self.valobj, "int")
        if kind == "NodeListKind":
            return get_type(self.valobj, "Node", as_pointer=True)

    
   

Now if we display our list in LLDB we should see item true value at the top level.

    
     (lldb) p *intList
(List) $0 = {
  size = 3
  kind = IntListKind
  head = 0x0000600000008000
  tail = 0x0000600000008020
  item_0 = 1
  item_1 = 2
  item_2 = 3
}
(lldb) p *nodeList
(List) $1 = {
  size = 5
  kind = NodeListKind
  head = 0x00006000000080a0
  tail = 0x00006000000080e0
  item_0 = 0x0000600000008090
  item_1 = 0x0000600000204060
  item_2 = 0x0000600000008050
  item_3 = 0x0000600000008060
  item_4 = 0x0000600000008070
}

    
   
Composite Structs

The other building block we need is composite structs.

Because LLDB only sees *Node pointers, the formatting is very limited.

    
     (lldb) p *leaf1
(Node) $0 = (kind = LeafNodeKind)
(lldb) p *intListNode
(Node) $1 = (kind = ListNodeKind)
(lldb) p *branch1
(Node) $2 = (kind = BranchNodeKind)

    
   

We can solve this issue in a similar way we did for unions in linked lists.

We start by creating a new synthetic:

    
     class NodeSynthetic:
    """
    Synthetic for Node
    """
    def __init__(self, valobj: SBValue, internal_dict):
        self.valobj = valobj
        self.list_children = [self.valobj.GetChildAtIndex(i) for i in range(self.valobj.GetNumChildren()]
        self.children_index = {child.name: i for i, child in enumerate(self.list_children)}

    def num_children(self) -> int:
        return len(self.children_index)

    def get_child_index(self, name: str) -> int:
        return self.children_index.get(name, -1)

    def get_child_at_index(self, index):
        return self.list_children[index]

    
   

Now we can implement a _kind_typemethod for the Node synthetic similar to the List one.

    
     def _kind_type(self) -> SBType:
        """ Get base type of the node """
        kind: str = self.valobj.GetChildMemberWithName("kind").value
        if kind == "LeafNodeKind":
            return get_type(self.valobj, "LeafNode", as_pointer=True)
        elif kind == "PassThroughNodeKind":
            return get_type(self.valobj, "PassThroughNode", as_pointer=True)
        elif kind == "BranchNodeKind":
            return get_type(self.valobj, "BranchNode", as_pointer=True)
        elif kind == "ListNodeKind":
            return get_type(self.valobj, "ListNode", as_pointer=True)

    
   

Then we can use this base type to downcast our initial valobj and extract its true children. Base and Pointer types must be handled differently. A base type must be converted to a pointer before being downcasted.

    
     self.kindtype = self._kind_type()
# Two cases:
if valobj.TypeIsPointerType():
   # If we have a pointer we can directly cast it: kind_type *base_object_ptr = (kind_type*) valobj;
   self.base_object_ptr: SBValue = self.valobj.Cast(self._kind_type())
else:
    # TODO
    self.base_object_ptr: SBValue = self.valobj.address_of.Cast(self._kind_type())

    
   

We can now extract children of the base_object_ptr. node child is excluded to avoid infinite recursion. We also add the kind value of the base object.

    
     self.list_children = [child for child in self.base_object.children if child.name != "node"]
self.list_children.append(valobj.GetChildMemberWithName("kind"))

    
   

All those transformations would fail if the pointer is not initialized. We must enclose them in a try block. We end up with:

    
     def __init__(self, valobj: SBValue, internal_dict):
        try:
            self.valobj = valobj
            self.kindtype = self._kind_type()
            if valobj.TypeIsPointerType():
                self.base_object_ptr: SBValue = self.valobj.Cast(self._kind_type())
            else:
                self.base_object_ptr: SBValue = self.valobj.address_of.Cast(self._kind_type())
            self.list_children = [child for child in self.base_object_ptr.children if child.name != "node"]
            self.list_children.append(valobj.GetChildMemberWithName("kind"))
            self.children_index = {child.name: i for i, child in enumerate(self.list_children)}
        except Exception:
            self.list_children = []
            self.children_index = {}

    
   

We can now visualize our nodes in LLDB:

    
     (lldb) p *leaf1
(Node) $2 = (valA = 1, valB = 2, kind = LeafNodeKind)
(lldb) p *intListNode
(Node) $3 = {
  list = 0x000060000020c000
  kind = ListNodeKind
}
(lldb) p *passThrough
(Node) $4 = {
  child = 0x000060000000c080
  kind = PassThroughNodeKind
}
(lldb) p *branch1
(Node) $5 = {
  leftChild = 0x000060000000c030
  rightChild = 0x000060000000c040
  kind = BranchNodeKind
}

    
   

This is even better in a GUI debugger (here Clion).

lldb extension image
 
Conclusion

LLDB scripting API is powerful. We have seen here how to use it to improve variable formatting, but it can also be used to add new commands or to implement smart breakpoints.

Improving variable formatting is an investment that pays off very quickly. Moreover, because in C most idioms are very common, a synthetic can be easily reused in a different context.

Sign up for our newsletter and stay up to date