"The Charm of Modern C++ Design": Virtual-Function Inheritance and a First Look at Thunks

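I  Background

1  A hands-on check

While debugging the multiple-inheritance C++ program below with the LLDB debugger at work, I found that the pointer addresses returned by lldb's print command (an alias of the expression command) did not match the addresses I expected from my understanding of the C++ memory model. What is going on? The program is as follows: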
    #include <cstdio>

    class Base {
    public:
        Base() {}
    protected:
        float x;
    };

    class VBase {
    public:
        VBase() {}
        virtual void test() {};
        virtual void foo() {};
    protected:
        float x;
    };

    class VBaseA : public VBase {
    public:
        VBaseA() {}
        virtual void test() {}
        virtual void foo() {};
    protected:
        float x;
    };

    class VBaseB : public VBase {
    public:
        VBaseB() {}
        virtual void test() { printf("test \n"); }
        virtual void foo() {};
    protected:
        float x;
    };

    class VDerived : public VBaseA, public Base, public VBaseB {
    public:
        VDerived() {}
        virtual void test() {}
        virtual void foo() {};
    protected:
        float x;
    };

    int main(int argc, char *argv[]) {
        VDerived *pDerived = new VDerived();               // 0x0000000103407f30
        Base *pBase = (Base*)pDerived;                     // 0x0000000103407f40
        VBaseA *pvBaseA = static_cast<VBaseA*>(pDerived);  // 0x0000000103407f30
        VBaseB *pvBaseB = static_cast<VBaseB*>(pDerived);  // expected 0x0000000103407f48, but lldb shows 0x0000000103407f30
        unsigned long pBaseAddressbase = (unsigned long)pBase;
        unsigned long pvBaseAAddressbase = (unsigned long)pvBaseA;
        unsigned long pvBaseBAddressbase = (unsigned long)pvBaseB;
        pvBaseB->test();
    }
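[Figure: addresses reported by the lldb print command; the values are the ones recorded in the comments of the listing above]
The C++ memory model as normally understood
I am on an x86_64 Mac, so pointers are 8-byte aligned (align = 8).
Under the usual understanding of the C++ memory model, converting pDerived to VBaseA keeps the address 0x0000000103407f30, because VBaseA shares the start of the object. Converting pDerived to Base (pBase) shifts the address by 16 bytes (sizeof(VBaseA)) to 0x0000000103407f40, which also matches expectations.
Converting pDerived to VBaseB (pvBaseB), however, should shift the address by 24 bytes to 0x0000000103407f48. Instead lldb shows 0x0000000103407f30, the start of the object. What causes this?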
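The expected offsets of 16 and 24 follow from plain size and alignment arithmetic. The sketch below reproduces that arithmetic; the names `align_up`, `ExpectedOffsets` and `expected_offsets` are hypothetical, and the sizes it plugs in (8-byte vtable pointer, 4-byte float) are assumptions that hold on x86_64 with the Itanium C++ ABI, not values queried from the compiler.

```cpp
#include <cstddef>

// Round offset up to the next multiple of align (align must be a power of two).
constexpr std::size_t align_up(std::size_t offset, std::size_t align) {
    return (offset + align - 1) & ~(align - 1);
}

struct ExpectedOffsets {
    std::size_t vbase_a;  // first polymorphic base, shares the object's start
    std::size_t base;     // placed right after the VBaseA subobject
    std::size_t vbase_b;  // placed after Base, at the next 8-byte boundary
};

// Assumed sizes (x86_64, Itanium ABI): vtable pointer = 8 bytes, float = 4 bytes.
inline ExpectedOffsets expected_offsets() {
    const std::size_t vptr = 8, flt = 4;
    const std::size_t size_vbase_a = vptr + flt + flt;  // vptr + VBase::x + VBaseA::x = 16
    const std::size_t size_base = flt;                  // Base::x = 4
    const std::size_t off_base = align_up(size_vbase_a, flt);            // 16
    const std::size_t off_vbase_b = align_up(off_base + size_base, vptr);  // 20 rounds up to 24
    return {0, off_base, off_vbase_b};
}
```

Adding the results to the object's start address gives 0x103407f30 + 16 = 0x103407f40 and 0x103407f30 + 24 = 0x103407f48, the two addresses quoted above. The gap between 20 and 24 is padding inserted so that the vtable pointer of VBaseB is 8-byte aligned.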

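2  Hypotheses raised by the check

For the code above:
Base has no virtual functions, while VBaseB has the virtual functions test and foo. My guesses were:
1. For a pointer to a base class without virtual functions (no vtable), the compiler applies the actual offset during the type conversion.
2. For a pointer to a base class with virtual functions (with a vtable), the compiler does not actually offset the address during the conversion; the pointer still points at the derived object rather than at the real VBaseB subobject.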

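II  Questions raised by the observation

1. When a derived-class pointer is converted to a pointer to a base class that has virtual functions (and therefore a vtable), does the compiler really offset the address behind the scenes?
2. If it does:
  • When a virtual function overridden by the derived class is called once through the base pointer and once through the derived pointer, how does the compiler guarantee that this has the same value in both calls, so that the call is correct?
  • Why does LLDB's expression command report the start address of the derived object?
3. If it does not, how are the base class's member variables and functions reached through the derived pointer?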

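III  Root causes

  1. Behind the scenes the compiler offsets the pointer, exactly as it does for a base class without virtual functions.
  2. Because the pointer is offset, when a virtual function overridden by the derived class is called through a base pointer, the compiler uses a thunk to adjust the this pointer passed into the call (and, where needed, the pointer returned from it).
  3. LLDB's expression command shows the start address of the derived object (0x0000000103407f30) rather than the start address of the offset base subobject (0x0000000103407f48) because of how LLDB presents values to the user. For a pointer to a base class with virtual functions, LLDB formats the result internally (summary format), and while formatting it uses the current memory address to look up the C++ run-time dynamic type and address, which is what it then displays.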

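IV  Verifying the conclusions

1  Does the compiler offset the pointer during a type conversion?

Disassembly analysis

The first question was: when a derived-class pointer is converted to a pointer to a base class with virtual functions (and a vtable), does the compiler really offset the address?
To test the hypotheses above, I disassembled the program at run time.
Before reading the disassembly, here is a short primer on the assembly knowledge used below. Skip it if you are already familiar with it.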
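Note: I work on macOS, where the debugger and disassembler print assembly in AT&T syntax, which differs from Intel syntax. (The syntax is a property of the tools, not of the processor.)
In AT&T syntax the operands read left to right: the first is the source operand and the second is the destination operand. For example:
    movl %esp, %ebp    // movl is the mnemonic; the % prefix marks esp and ebp as registers; source first, destination second
In Intel syntax the order is reversed: the first operand is the destination and the second is the source.
    mov ebp, esp       // Intel syntax, as in the Intel manuals: no % prefix, and the operand order is exactly the opposite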
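Under the x86_64 register calling convention (System V AMD64, which macOS uses):
1. The first integer or pointer argument goes in RDI (EDI), the second in RSI (ESI), the third in RDX, the fourth in RCX, the fifth in R8, and the sixth in R9.
2. Arguments beyond the sixth are passed on the stack.
3. The return value is normally placed in EAX or RAX.
Everything below was produced on macOS, so all assembly in this article is in AT&T syntax, and the first argument of every function call is in RDI.
[Figure: complete disassembly of the main function above, from entry to exit]
The disassembly shows that the compiler performs a real pointer adjustment during the type conversion whether or not the base class has virtual functions, moving the pointer to the actual address of the base subobject. This proves the second hypothesis wrong: the compiler does not distinguish between bases with and without virtual functions, and applies the offset in both cases.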
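The same conclusion can be checked from inside the program, without reading disassembly. The following is a minimal sketch, assuming a hierarchy shaped like the one in this article; `Offsets` and `measure_offsets` are hypothetical names introduced only for illustration, and the exact numbers depend on the ABI.

```cpp
#include <cstddef>
#include <cstdint>

// Same shape as the article's hierarchy, trimmed to the essentials.
struct VBase  { virtual void test() {} float x = 0; };
struct VBaseA : VBase { void test() override {} float x = 0; };
struct Base   { float x = 0; };  // no virtual functions, no vtable pointer
struct VBaseB : VBase { void test() override {} float x = 0; };
struct VDerived : VBaseA, Base, VBaseB { void test() override {} float x = 0; };

struct Offsets {
    std::ptrdiff_t to_vbase_a;
    std::ptrdiff_t to_base;
    std::ptrdiff_t to_vbase_b;
};

// Distance in bytes from the start of a VDerived object to each base
// subobject, measured on the pointers that static_cast actually produces.
inline Offsets measure_offsets() {
    VDerived d;
    const auto start = reinterpret_cast<std::uintptr_t>(&d);
    auto delta = [start](const void* p) {
        return static_cast<std::ptrdiff_t>(reinterpret_cast<std::uintptr_t>(p) - start);
    };
    return { delta(static_cast<VBaseA*>(&d)),
             delta(static_cast<Base*>(&d)),
             delta(static_cast<VBaseB*>(&d)) };
}
```

On x86_64 with the Itanium ABI the three values should come out as 0, 16 and 24, which matches the addresses in the listing. The point is that the polymorphic base VBaseB gets a non-zero offset just like the non-polymorphic Base.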

Memory analysis

I later confirmed this with LLDB's memory read command (memory read ptr, abbreviated as x):
    (lldb) memory read pDerived
    0x103407f30: 40 40 00 00 01 00 00 00 00 00 00 00 00 00 00 00  @@..............
    0x103407f40: 10 00 00 00 00 00 00 00 60 40 00 00 01 00 00 00  ........`@......
    (lldb) memory read pvBaseB
    0x103407f48: 60 40 00 00 01 00 00 00 00 00 00 00 00 00 00 00  `@..............
    0x103407f58: de 2d 05 10 00 00 00 00 00 00 00 00 00 00 00 00  .-..............
The two pointers really do read from different memory: pDerived reads from 0x103407f30 and pvBaseB from 0x103407f48. Both hold the actual, offset addresses.
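The dump above shows two different 8-byte values at the two addresses (0x100004040 at the object's start and 0x100004060 at the VBaseB subobject); these are the two vtable pointers. The sketch below reads the same first word programmatically. It assumes the vtable pointer sits at offset 0 of each polymorphic subobject, which is true for the Itanium ABI but not guaranteed by the C++ standard; `first_word`, `Words` and `read_vptrs` are hypothetical helper names.

```cpp
#include <cstdint>
#include <cstring>

struct VBase  { virtual void test() {} float x = 0; };
struct VBaseA : VBase { void test() override {} float x = 0; };
struct Base   { float x = 0; };
struct VBaseB : VBase { void test() override {} float x = 0; };
struct VDerived : VBaseA, Base, VBaseB { void test() override {} float x = 0; };

// Read the first pointer-sized word at p, the way `memory read` displays it.
inline std::uintptr_t first_word(const void* p) {
    std::uintptr_t word;
    std::memcpy(&word, p, sizeof word);
    return word;
}

struct Words {
    std::uintptr_t at_derived;  // word at the start of the whole object
    std::uintptr_t at_vbase_b;  // word at the start of the VBaseB subobject
};

inline Words read_vptrs() {
    VDerived d;
    return { first_word(&d), first_word(static_cast<VBaseB*>(&d)) };
}
```

Two distinct non-zero words indicate that the object carries one vtable pointer for the VBaseA side and another for the VBaseB side, which is why a VBaseB pointer has to point at its own subobject.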

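2  How do virtual calls keep the value of this consistent?

If the real address in memory is the offset one, and the derived class overrides the base class's virtual function, then how does the compiler guarantee that this has the same value when the overridden function is called through the base pointer and when it is called through the derived pointer, so that the call is correct?
From material found online I learned that when C++ calls such a function, the compiler uses a thunk to adjust the this pointer so that it points at the correct address. So what is a thunk, and how does the compiler implement it?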
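Before looking at the mechanism, the guarantee itself is easy to observe. In this sketch the overridden function returns the this pointer it received; the member function `self` and the helper `compare_this` are hypothetical names added for illustration and are not part of the article's program.

```cpp
struct VBase  { virtual const void* self() { return this; } float x = 0; };
struct VBaseA : VBase { const void* self() override { return this; } float x = 0; };
struct Base   { float x = 0; };
struct VBaseB : VBase { const void* self() override { return this; } float x = 0; };
struct VDerived : VBaseA, Base, VBaseB {
    const void* self() override { return this; }  // reports the this it was called with
    float x = 0;
};

struct ThisReport {
    const void* via_derived;   // this seen when called through VDerived*
    const void* via_base;      // this seen when called through VBaseB*
    const void* base_pointer;  // the VBaseB* value itself
};

inline ThisReport compare_this(VDerived& d) {
    VBaseB* pvBaseB = static_cast<VBaseB*>(&d);
    return { d.self(), pvBaseB->self(), static_cast<const void*>(pvBaseB) };
}
```

Both calls report the address of the VDerived object, even though the VBaseB pointer used for the second call holds a different address. Something between the call site and the function body must therefore move this back.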

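Disassembly of the virtual call

The disassembly of pvBaseB->test() in the main function above is:
    pvBaseB->test();
        0x100003c84 <+244>: movq   -0x40(%rbp), %rax   // -0x40(%rbp) holds pvBaseB; load the address it points to into rax
        0x100003c88 <+248>: movq   (%rax), %rcx        // load the first 8 bytes of the object (the vtable pointer) into rcx
        0x100003c8b <+251>: movq   %rax, %rdi          // copy rax into rdi, the first-argument register; this is still the base subobject's address
    ->  0x100003c8e <+254>: callq  *(%rcx)             // call indirectly through the first vtable slot, whose address is in rcx
Now jump to the implementation of VDerived::test. Reading the first argument with the lldb command register read rdi shows that this is already the address of the derived object, not the base address passed at the call site:
    testCPPVirtualMemeory`VDerived::test:
        0x100003e00 <+0>:  pushq  %rbp                 // save the caller's frame pointer
        0x100003e01 <+1>:  movq   %rsp, %rbp           // the caller's stack top becomes this function's frame base
        0x100003e04 <+4>:  subq   $0x10, %rsp          // reserve stack space for the frame
        0x100003e08 <+8>:  movq   %rdi, -0x8(%rbp)     // store the first argument, the this pointer, on the stack
    ->  0x100003e0c <+12>: leaq   0x15c(%rip), %rdi    ; "test\n"
        0x100003e13 <+19>: movb   $0x0, %al
        0x100003e15 <+21>: callq  0x100003efc          ; symbol stub for: printf
        0x100003e1a <+26>: addq   $0x10, %rsp          // release the stack space
        0x100003e1e <+30>: popq   %rbp                 // restore the caller's rbp
        0x100003e1f <+31>: retq                        // return to the caller
So the call goes through the vtable by the indirect call *(%rcx), something happens in between, and control ends up in the implementation of test. What does the thunk do during that step?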

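LLVM thunk source analysis

Every IDE I use is built on the LLVM toolchain, so I went through the LLVM source and found the answer in the AddMethods function in VTableBuilder.cpp. (The excerpt is from VFTableBuilder, which is the vftable builder for the Microsoft ABI; the Itanium ABI used on macOS has its own builder in the same file.) The comment there reads: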
    // Now go through all virtual member functions and add them to the current
    // vftable. This is done by
    //  - replacing overridden methods in their existing slots, as long as they
    //    don't require return adjustment; calculating This adjustment if needed.
    //  - adding new slots for methods of the current base not present in any
    //    sub-bases;
    //  - adding new slots for methods that require Return adjustment.
    // We keep track of the methods visited in the sub-bases in MethodInfoMap.
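At compile time the compiler checks whether a derived class overrides each virtual function of the base. If it does, the existing vtable slot is replaced with the derived class's function, and in addition:
1. the this adjustment for the call is calculated, if one is needed;
2. a new slot is added for any method that requires a return adjustment.
The code is as follows: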
    void VFTableBuilder::AddMethods(BaseSubobject Base, unsigned BaseDepth,
                                    const CXXRecordDecl *LastVBase,
                                    BasesSetVectorTy &VisitedBases) {
      const CXXRecordDecl *RD = Base.getBase();
      if (!RD->isPolymorphic())
        return;

      const ASTRecordLayout &Layout = Context.getASTRecordLayout(RD);

      // See if this class expands a vftable of the base we look at, which is either
      // the one defined by the vfptr base path or the primary base of the current
      // class.
      const CXXRecordDecl *NextBase = nullptr, *NextLastVBase = LastVBase;
      CharUnits NextBaseOffset;
      if (BaseDepth < WhichVFPtr.PathToIntroducingObject.size()) {
        NextBase = WhichVFPtr.PathToIntroducingObject[BaseDepth];
        if (isDirectVBase(NextBase, RD)) {
          NextLastVBase = NextBase;
          NextBaseOffset = MostDerivedClassLayout.getVBaseClassOffset(NextBase);
        } else {
          NextBaseOffset =
              Base.getBaseOffset() + Layout.getBaseClassOffset(NextBase);
        }
      } else if (const CXXRecordDecl *PrimaryBase = Layout.getPrimaryBase()) {
        assert(!Layout.isPrimaryBaseVirtual() &&
               "No primary virtual bases in this ABI");
        NextBase = PrimaryBase;
        NextBaseOffset = Base.getBaseOffset();
      }

      if (NextBase) {
        AddMethods(BaseSubobject(NextBase, NextBaseOffset), BaseDepth + 1,
                   NextLastVBase, VisitedBases);
        if (!VisitedBases.insert(NextBase))
          llvm_unreachable("Found a duplicate primary base!");
      }

      SmallVector<const CXXMethodDecl*, 10> VirtualMethods;
      // Put virtual methods in the proper order.
      GroupNewVirtualOverloads(RD, VirtualMethods);

      // Now go through all virtual member functions and add them to the current
      // vftable. This is done by
      //  - replacing overridden methods in their existing slots, as long as they
      //    don't require return adjustment; calculating This adjustment if needed.
      //  - adding new slots for methods of the current base not present in any
      //    sub-bases;
      //  - adding new slots for methods that require Return adjustment.
      // We keep track of the methods visited in the sub-bases in MethodInfoMap.
      for (const CXXMethodDecl *MD : VirtualMethods) {
        FinalOverriders::OverriderInfo FinalOverrider =
            Overriders.getOverrider(MD, Base.getBaseOffset());
        const CXXMethodDecl *FinalOverriderMD = FinalOverrider.Method;
        const CXXMethodDecl *OverriddenMD =
            FindNearestOverriddenMethod(MD, VisitedBases);

        ThisAdjustment ThisAdjustmentOffset;
        bool ReturnAdjustingThunk = false, ForceReturnAdjustmentMangling = false;
        CharUnits ThisOffset = ComputeThisOffset(FinalOverrider);
        ThisAdjustmentOffset.NonVirtual =
            (ThisOffset - WhichVFPtr.FullOffsetInMDC).getQuantity();
        if ((OverriddenMD || FinalOverriderMD != MD) &&
            WhichVFPtr.getVBaseWithVPtr())
          CalculateVtordispAdjustment(FinalOverrider, ThisOffset,
                                      ThisAdjustmentOffset);

        unsigned VBIndex =
            LastVBase ? VTables.getVBTableIndex(MostDerivedClass, LastVBase) : 0;

        if (OverriddenMD) {
          // If MD overrides anything in this vftable, we need to update the
          // entries.
          MethodInfoMapTy::iterator OverriddenMDIterator =
              MethodInfoMap.find(OverriddenMD);

          // If the overridden method went to a different vftable, skip it.
          if (OverriddenMDIterator == MethodInfoMap.end())
            continue;

          MethodInfo &OverriddenMethodInfo = OverriddenMDIterator->second;

          VBIndex = OverriddenMethodInfo.VBTableIndex;

          // Let's check if the overrider requires any return adjustments.
          // We must create a new slot if the MD's return type is not trivially
          // convertible to the OverriddenMD's one.
          // Once a chain of method overrides adds a return adjusting vftable slot,
          // all subsequent overrides will also use an extra method slot.
          ReturnAdjustingThunk = !ComputeReturnAdjustmentBaseOffset(
                                      Context, MD, OverriddenMD).isEmpty() ||
                                 OverriddenMethodInfo.UsesExtraSlot;

          if (!ReturnAdjustingThunk) {
            // No return adjustment needed - just replace the overridden method info
            // with the current info.
            MethodInfo MI(VBIndex, OverriddenMethodInfo.VFTableIndex);
            MethodInfoMap.erase(OverriddenMDIterator);

            assert(!MethodInfoMap.count(MD) &&
                   "Should not have method info for this method yet!");
            MethodInfoMap.insert(std::make_pair(MD, MI));
            continue;
          }

          // In case we need a return adjustment, we'll add a new slot for
          // the overrider. Mark the overridden method as shadowed by the new slot.
          OverriddenMethodInfo.Shadowed = true;

          // Force a special name mangling for a return-adjusting thunk
          // unless the method is the final overrider without this adjustment.
          ForceReturnAdjustmentMangling =
              !(MD == FinalOverriderMD && ThisAdjustmentOffset.isEmpty());
        } else if (Base.getBaseOffset() != WhichVFPtr.FullOffsetInMDC ||
                   MD->size_overridden_methods()) {
          // Skip methods that don't belong to the vftable of the current class,
          // e.g. each method that wasn't seen in any of the visited sub-bases
          // but overrides multiple methods of other sub-bases.
          continue;
        }

        // If we got here, MD is a method not seen in any of the sub-bases or
        // it requires return adjustment. Insert the method info for this method.
        MethodInfo MI(VBIndex,
                      HasRTTIComponent ? Components.size() - 1 : Components.size(),
                      ReturnAdjustingThunk);

        assert(!MethodInfoMap.count(MD) &&
               "Should not have method info for this method yet!");
        MethodInfoMap.insert(std::make_pair(MD, MI));

        // Check if this overrider needs a return adjustment.
        // We don't want to do this for pure virtual member functions.
        BaseOffset ReturnAdjustmentOffset;
        ReturnAdjustment ReturnAdjustment;
        if (!FinalOverriderMD->isPure()) {
          ReturnAdjustmentOffset =
              ComputeReturnAdjustmentBaseOffset(Context, FinalOverriderMD, MD);
        }
        if (!ReturnAdjustmentOffset.isEmpty()) {
          ForceReturnAdjustmentMangling = true;
          ReturnAdjustment.NonVirtual =
              ReturnAdjustmentOffset.NonVirtualOffset.getQuantity();
          if (ReturnAdjustmentOffset.VirtualBase) {
            const ASTRecordLayout &DerivedLayout =
                Context.getASTRecordLayout(ReturnAdjustmentOffset.DerivedClass);
            ReturnAdjustment.Virtual.Microsoft.VBPtrOffset =
                DerivedLayout.getVBPtrOffset().getQuantity();
            ReturnAdjustment.Virtual.Microsoft.VBIndex =
                VTables.getVBTableIndex(ReturnAdjustmentOffset.DerivedClass,
                                        ReturnAdjustmentOffset.VirtualBase);
          }
        }

        AddMethod(FinalOverriderMD,
                  ThunkInfo(ThisAdjustmentOffset, ReturnAdjustment,
                            ForceReturnAdjustmentMangling ? MD : nullptr));
      }
    }
As the code shows, whenever this needs adjusting, the adjustment is recorded in a ThunkInfo passed to AddMethod(FinalOverriderMD, ThunkInfo(ThisAdjustmentOffset, ReturnAdjustment, ForceReturnAdjustmentMangling ? MD : nullptr)). The ThunkInfo struct (defined in ABI.h) looks like this:
    struct ThunkInfo {
      /// The \c this pointer adjustment.
      ThisAdjustment This;

      /// The return adjustment.
      ReturnAdjustment Return;

      /// Holds a pointer to the overridden method this thunk is for,
      /// if needed by the ABI to distinguish different thunks with equal
      /// adjustments. Otherwise, null.
      /// CAUTION: In the unlikely event you need to sort ThunkInfos, consider using
      /// an ABI-specific comparator.
      const CXXMethodDecl *Method;

      ThunkInfo() : Method(nullptr) { }

      ThunkInfo(const ThisAdjustment &This, const ReturnAdjustment &Return,
                const CXXMethodDecl *Method = nullptr)
          : This(This), Return(Return), Method(Method) {}

      friend bool operator==(const ThunkInfo &LHS, const ThunkInfo &RHS) {
        return LHS.This == RHS.This && LHS.Return == RHS.Return &&
               LHS.Method == RHS.Method;
      }

      bool isEmpty() const {
        return This.isEmpty() && Return.isEmpty() && Method == nullptr;
      }
    };
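ThunkInfo has a Method field, which (per the comment in the struct) points to the overridden method the thunk is for when the ABI needs it to tell thunks apart, while This and Return record how the this pointer and the return value must be adjusted. When the method is emitted, the compiler uses this information to insert the thunk automatically. The function ItaniumMangleContextImpl::mangleThunk(const CXXMethodDecl *MD, const ThunkInfo &Thunk, raw_ostream &Out) confirms this; it is shown after the following note on terminology.
Mangle and demangle: converting an original C++ source identifier into a C++ ABI identifier is called mangling; the reverse process is called demangling.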
    void ItaniumMangleContextImpl::mangleThunk(const CXXMethodDecl *MD,
                                               const ThunkInfo &Thunk,
                                               raw_ostream &Out) {
      //  <special-name> ::= T <call-offset> <base encoding>
      //                      # base is the nominal target function of thunk
      //  <special-name> ::= Tc <call-offset> <call-offset> <base encoding>
      //                      # base is the nominal target function of thunk
      //                      # first call-offset is 'this' adjustment
      //                      # second call-offset is result adjustment

      assert(!isa<CXXDestructorDecl>(MD) &&
             "Use mangleCXXDtor for destructor decls!");
      CXXNameMangler Mangler(*this, Out);
      Mangler.getStream() << "_ZT";
      if (!Thunk.Return.isEmpty())
        Mangler.getStream() << 'c';

      // Mangle the 'this' pointer adjustment.
      Mangler.mangleCallOffset(Thunk.This.NonVirtual,
                               Thunk.This.Virtual.Itanium.VCallOffsetOffset);

      // Mangle the return pointer adjustment if there is one.
      if (!Thunk.Return.isEmpty())
        Mangler.mangleCallOffset(Thunk.Return.NonVirtual,
                                 Thunk.Return.Virtual.Itanium.VBaseOffsetOffset);

      Mangler.mangleFunctionEncoding(MD);
    }
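To make the grammar in that comment concrete, here is a small sketch that assembles the mangled name of a thunk with only a non-virtual this adjustment. It is a toy re-implementation for illustration, not LLVM code: `mangle_number` and `mangle_nonvirtual_thunk` are hypothetical names, and the function encoding is passed in as a ready-made string rather than computed. It relies on the Itanium ABI rules that a non-virtual call offset is written as `h <number> _` and that a negative number is written with an `n` prefix.

```cpp
#include <string>

// Itanium <number>: decimal digits, with 'n' instead of '-' for negatives.
inline std::string mangle_number(long long value) {
    return value < 0 ? "n" + std::to_string(-value) : std::to_string(value);
}

// <special-name> ::= T <call-offset> <base encoding>
// <call-offset>  ::= h <nv-offset> _      (non-virtual adjustment only)
// base_encoding is the target function's mangled name without the "_Z" prefix.
inline std::string mangle_nonvirtual_thunk(long long this_adjustment,
                                           const std::string& base_encoding) {
    return std::string("_ZT") + "h" + mangle_number(this_adjustment) + "_" +
           base_encoding;
}
```

For the program in this article, VDerived::test() mangles to `_ZN8VDerived4testEv`, so a thunk that moves this back by 24 bytes would be `mangle_nonvirtual_thunk(-24, "N8VDerived4testEv")`, giving `_ZThn24_N8VDerived4testEv`. Demanglers render names of this form as "non-virtual thunk to VDerived::test()".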

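Thunk disassembly analysis

With the LLVM source read, the thunk mechanism is no longer a mystery. Next I confirmed it by disassembling the binary. Either objdump or the reverse-engineering tool Hopper works; I used Hopper.
1. First, the derived class's test function and the compiler-generated thunk version of test:
[Figure: the test function implemented by the derived class]
[Figure: the thunk version of test generated by the compiler]
2. The two screenshots show that:
the compiler-generated thunk for test is at 0x100003e30;
the test function implemented by the derived class is at 0x100003e00.
Next, which of the two addresses is actually stored in the derived class's vtable?
[Figure: the derived class's vtable] The screenshot shows that the address stored in the derived class's vtable is 0x100003e30, the address of the thunk added by the compiler.
The indirect call *(%rcx) analyzed earlier therefore lands in the thunk, and the thunk then calls the function that the derived class actually overrides.
At this point the thunk technique can be pinned down:
wherever the this pointer passed to a call, or a returned pointer, needs adjusting, the compiler generates a thunk version of the function at compile time. The thunk applies the this offset and then calls the virtual function implemented by the derived class. The vtable stores the address of this compiler-generated thunk rather than the address of the derived class's function.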
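What the generated thunk does can be imitated by hand. The sketch below is a stand-in written in C++, not the compiler's output: `thunk_to_VDerived_test` is a hypothetical name, and test is changed to return its this pointer (an assumption made only so the result can be observed).

```cpp
struct VBase  { virtual const void* test() { return this; } float x = 0; };
struct VBaseA : VBase { const void* test() override { return this; } float x = 0; };
struct Base   { float x = 0; };
struct VBaseB : VBase { const void* test() override { return this; } float x = 0; };
struct VDerived : VBaseA, Base, VBaseB {
    const void* test() override { return this; }
    float x = 0;
};

// Hand-written stand-in for the thunk stored in the VBaseB part of the vtable.
inline const void* thunk_to_VDerived_test(VBaseB* base_this) {
    // Step 1: move this from the VBaseB subobject back to the start of the
    // VDerived object. The downcast subtracts the fixed offset of VBaseB
    // inside VDerived, which is the adjustment the real thunk applies.
    VDerived* derived_this = static_cast<VDerived*>(base_this);
    // Step 2: hand over to the real implementation. The qualified name makes
    // this a direct call that bypasses the vtable.
    return derived_this->VDerived::test();
}
```

Calling `thunk_to_VDerived_test(pvBaseB)` yields the same this as the ordinary virtual call `pvBaseB->test()`. A real non-virtual thunk is typically just a pointer adjustment on the first-argument register followed by a jump to the target, though the exact instructions depend on the compiler and optimization level.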

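Memory layout with the thunk

The corresponding memory layout can be determined as well:
[Figure: memory layout of the VDerived object and its vtables]
So a call through a pointer to a polymorphic base that is not the first base in the inheritance list proceeds as follows: the vtable pointer is loaded from the base subobject, the thunk stored in the vtable slot is called, the thunk moves this back to the start of the derived object, and finally the derived class's implementation runs.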

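Virtual thunks and non-virtual thunks

Note: the layout above contains two copies of VBase. In multiple inheritance a base can be inherited in three ways: ordinary inheritance, inheritance from a class with virtual functions, and virtual inheritance. Virtual inheritance exists to solve exactly the problem seen above, two VBase subobjects in memory at once. A small change to the code above ensures that only one instance exists:
class VBaseA: public VBase  becomes  class VBaseA: public virtual VBase
class VBaseB: public VBase  becomes  class VBaseB: public virtual VBase
Now there is only one copy of VBase in memory.
One question is still open. The thunk in the screenshots above is labeled as follows:
[Figure: the thunk's label in the disassembler]
The thunk is a non-virtual thunk. So what is a virtual thunk?
Before answering, consider the following example: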
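The difference between the two forms of inheritance is observable from ordinary code. In this sketch the namespaces `plain` and `shared` and the two helper functions are hypothetical names; the hierarchies are reduced versions of the article's, without the extra Base class and without overrides in the intermediate classes.

```cpp
namespace plain {   // ordinary inheritance: two VBase subobjects
struct VBase  { virtual void test() {} float x = 0; };
struct VBaseA : VBase { float a = 0; };
struct VBaseB : VBase { float b = 0; };
struct VDerived : VBaseA, VBaseB { void test() override {} };
}

namespace shared {  // virtual inheritance: one shared VBase subobject
struct VBase  { virtual void test() {} float x = 0; };
struct VBaseA : virtual VBase { float a = 0; };
struct VBaseB : virtual VBase { float b = 0; };
struct VDerived : VBaseA, VBaseB { void test() override {} };
}

// Reach VBase through each intermediate base and compare the addresses.
inline bool plain_has_two_vbases() {
    plain::VDerived d;
    plain::VBase* via_a = static_cast<plain::VBaseA*>(&d);
    plain::VBase* via_b = static_cast<plain::VBaseB*>(&d);
    return via_a != via_b;
}

inline bool shared_has_one_vbase() {
    shared::VDerived d;
    shared::VBase* via_a = static_cast<shared::VBaseA*>(&d);
    shared::VBase* via_b = static_cast<shared::VBaseB*>(&d);
    return via_a == via_b;
}
```

Both functions return true: with ordinary inheritance the two paths lead to different VBase subobjects, and with virtual inheritance they lead to the same one. A side effect of the shared layout is that the position of VBase inside the object is no longer a compile-time constant for every class in the hierarchy, which is why a different kind of thunk is needed.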
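    class A {
    public:
        virtual void test() {}
    };

    class B {
    public:
        virtual void test1() {}
    };

    class C {
    public:
        virtual void test2() {}
    };

    class D : public virtual A, public virtual B, public C {
    public:
        virtual void test1() {
            // The entry for this test1 in B's vtable is a virtual thunk.
        }
        virtual void test2() {
            // The entry for this test2 in C's vtable is a non-virtual thunk
            // (assuming C is not laid out at offset 0 of D, where no adjustment is needed).
        }
    };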
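When inheritance from a class with virtual functions is combined with virtual inheritance, and that base is not the first base in the derived class's inheritance list, the vtable entry the compiler emits for a virtual function overridden by the derived class is a virtual thunk.
When only inheritance from a class with virtual functions is involved (no virtual inheritance), and that base is not the first base in the derived class's inheritance list, the vtable entry the compiler emits for a virtual function overridden by the derived class is a non-virtual thunk.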

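3  Why does the LLDB debugger show the same address?

If the pointer is offset, why does LLDB's expression command show the start address of the derived object?
With thunks understood, one question remains. When debugging with LLDB, the address shown for the base pointer is the derived object's address, not the offset base address. The disassembly showed that the compiler really offsets the pointer during the conversion, and reading memory confirmed the offset address. So why does lldb expression still return the derived object's address? My guess was that LLDB performs a type conversion when the expression command runs.
From the LLDB source and documentation I learned that whenever LLDB has an address to present to the user in a friendly way, it first formats it (summary format), and the formatting is based on the dynamic type it resolves (lldb-getdynamictypeandaddress). The answer is in the function bool ItaniumABILanguageRuntime::GetDynamicTypeAndAddress in the LLDB source (lldb-summary-format). The code is as follows: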
    // For Itanium, if the type has a vtable pointer in the object, it will be at
    // offset 0
    // in the object.  That will point to the "address point" within the vtable
    // (not the beginning of the
    // vtable.)  We can then look up the symbol containing this "address point"
    // and that symbol's name
    // demangled will contain the full class name.
    // The second pointer above the "address point" is the "offset_to_top".  We'll
    // use that to get the
    // start of the value object which holds the dynamic type.

    bool ItaniumABILanguageRuntime::GetDynamicTypeAndAddress(
        ValueObject &in_value, lldb::DynamicValueType use_dynamic,
        TypeAndOrName &class_type_or_name, Address &dynamic_address,
        Value::ValueType &value_type) {
      // For Itanium, if the type has a vtable pointer in the object, it will be at
      // offset 0
      // in the object.  That will point to the "address point" within the vtable
      // (not the beginning of the
      // vtable.)  We can then look up the symbol containing this "address point"
      // and that symbol's name
      // demangled will contain the full class name.
      // The second pointer above the "address point" is the "offset_to_top".  We'll
      // use that to get the
      // start of the value object which holds the dynamic type.
      //

      class_type_or_name.Clear();
      value_type = Value::ValueType::eValueTypeScalar;

      // Only a pointer or reference type can have a different dynamic and static
      // type:
      if (CouldHaveDynamicValue(in_value)) {
        // First job, pull out the address at 0 offset from the object.
        AddressType address_type;
        lldb::addr_t original_ptr = in_value.GetPointerValue(&address_type);
        if (original_ptr == LLDB_INVALID_ADDRESS)
          return false;

        ExecutionContext exe_ctx(in_value.GetExecutionContextRef());

        Process *process = exe_ctx.GetProcessPtr();

        if (process == nullptr)
          return false;

        Status error;
        const lldb::addr_t vtable_address_point =
            process->ReadPointerFromMemory(original_ptr, error);

        if (!error.Success() || vtable_address_point == LLDB_INVALID_ADDRESS) {
          return false;
        }

        class_type_or_name = GetTypeInfoFromVTableAddress(in_value, original_ptr,
                                                          vtable_address_point);

        if (class_type_or_name) {
          TypeSP type_sp = class_type_or_name.GetTypeSP();
          // There can only be one type with a given name,
          // so we've just found duplicate definitions, and this
          // one will do as well as any other.
          // We don't consider something to have a dynamic type if
          // it is the same as the static type.  So compare against
          // the value we were handed.
          if (type_sp) {
            if (ClangASTContext::AreTypesSame(in_value.GetCompilerType(),
                                              type_sp->GetForwardCompilerType())) {
              // The dynamic type we found was the same type,
              // so we don't have a dynamic type here...
              return false;
            }

            // The offset_to_top is two pointers above the vtable pointer.
            const uint32_t addr_byte_size = process->GetAddressByteSize();
            const lldb::addr_t offset_to_top_location =
                vtable_address_point - 2 * addr_byte_size;
            // Watch for underflow, offset_to_top_location should be less than
            // vtable_address_point
            if (offset_to_top_location >= vtable_address_point)
              return false;
            const int64_t offset_to_top = process->ReadSignedIntegerFromMemory(
                offset_to_top_location, addr_byte_size, INT64_MIN, error);

            if (offset_to_top == INT64_MIN)
              return false;
            // So the dynamic type is a value that starts at offset_to_top
            // above the original address.
            lldb::addr_t dynamic_addr = original_ptr + offset_to_top;
            if (!process->GetTarget().GetSectionLoadList().ResolveLoadAddress(
                    dynamic_addr, dynamic_address)) {
              dynamic_address.SetRawAddress(dynamic_addr);
            }
            return true;
          }
        }
      }

      return class_type_or_name.IsEmpty() == false;
    }
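The code shows that whenever the LLDB expression command evaluates a pointer, LLDB formats the result in the debugger's default way, and that formatting first resolves the dynamic type and the adjusted address. For a C++ pointer to an object with a vtable, where the pointer's static type is not the first base of the object, LLDB reads the offset_to_top entry stored just before the vtable's address point, determines the dynamic type, and returns the start address of the object of that dynamic type.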
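The address computation in that function can be replayed inside the debugged program itself. The sketch below follows the same three steps (read the vtable pointer, read offset_to_top two pointers before the address point, add it to the original pointer). It assumes the Itanium C++ ABI vtable layout and inspects memory the language does not define, so treat it as an illustration that works with GCC and Clang on typical 64-bit platforms, not as portable code; `start_of_dynamic_object` is a hypothetical name.

```cpp
#include <cstddef>
#include <cstring>

struct VBase  { virtual void test() {} float x = 0; };
struct VBaseA : VBase { void test() override {} float x = 0; };
struct Base   { float x = 0; };
struct VBaseB : VBase { void test() override {} float x = 0; };
struct VDerived : VBaseA, Base, VBaseB { void test() override {} float x = 0; };

// Itanium-ABI-only sketch (assumption): mirrors the arithmetic in LLDB's
// GetDynamicTypeAndAddress for a pointer to a polymorphic subobject.
inline const void* start_of_dynamic_object(const void* original_ptr) {
    // 1. The vtable pointer is stored at offset 0 of the subobject and
    //    points at the "address point" inside the vtable.
    const char* vtable_address_point;
    std::memcpy(&vtable_address_point, original_ptr, sizeof vtable_address_point);

    // 2. offset_to_top is stored two pointers before the address point.
    std::ptrdiff_t offset_to_top;
    std::memcpy(&offset_to_top, vtable_address_point - 2 * sizeof(void*),
                sizeof offset_to_top);

    // 3. Adding it to the original pointer gives the start of the full object.
    return static_cast<const char*>(original_ptr) + offset_to_top;
}
```

For a VBaseB pointer into a VDerived object, offset_to_top should be -24 on x86_64, so the function returns the address of the VDerived object, the same value LLDB displays. The standard-conforming way to get this address in C++ is `dynamic_cast<const void*>(p)`, which returns a pointer to the most derived object.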

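V  Summary

  1. The analysis above verified that the compiler performs a real address offset when a pointer is converted to a base type;
  2. It also showed that the compiler uses thunks to adjust the this pointer passed into a call and the pointer returned from it, which keeps this correct across C++ virtual calls;
  3. When the LLDB expression command evaluates a pointer to a base class with virtual functions that is not the first base, LLDB formats the value internally (summary format), and the formatting resolves the dynamic type and address.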

VI  Tools

1  Getting the assembly

Preprocess -> assembly

 clang++ -E main.cpp -o main.i
 clang++ -S main.i

objdump

objdump -S -C <executable>

Disassembler: Hopper

Download Hopper and drag the executable into it.

Xcode

  Xcode->Debug->Debug WorkFlow->Show disassembly

Dumping the C++ memory layout

Clang++ compiler

clang++ -cc1 -emit-llvm -fdump-record-layouts -fdump-vtable-layouts  main.cpp

VII  References

https://matklad.github.io/2017/10/21/lldb-dynamic-type.html
https://lldb.llvm.org/use/variable.html
https://github.com/llvm-mirror/lldb/blob/bc19e289f759c26e4840aab450443d4a85071139/source/Plugins/LanguageRuntime/CPlusPlus/ItaniumABI/ItaniumABILanguageRuntime.cpp#L185
https://clang.llvm.org/doxygen/VTableBuilder_8cpp_source.html#l03109
https://clang.llvm.org/doxygen/ABI_8h_source.html
Related topics:
llvm-virtual-thunk 
llvm-no-virtual-thunk 
lldb-summary-format 
lldb-getdynamictypeandaddress
