OPTEE OS

Disclaimer: This note is written for a specific version of the software and may not apply to the latest version. The author does not guarantee its accuracy. Refer to this note at your own risk.

The note for OPTEE OS consists of three parts:

  1. Memory management (MM) and allocation in OPTEE
  2. Memory layout for secure/normal world and how to increase secure memory
  3. Miscellaneous

Environment:

  • Hardware: Hikey Board
  • OPTEE OS version: 2.3.0

Memory Management and Allocation

Some terms:

  • pgt_tables is an array of 4 KB pages (length defined by PGT_CACHE_SIZE) used to store page tables.
  • pgt is a node of a linked list. pgt.tbl points to one entry of pgt_tables.
  • pgt_cache is a linked list of pgts.

In mm/core_mmu.c:

void core_init_mmu_map(void):

Entry point to initialize mmu, called once

In mm/core_mmu_lpae.c:

void core_init_mmu_tables(struct tee_mmap_region *mm):

Zeroes the MMU tables for all cores, adds entries for mm, replicates them to the other cores' tables, and sets user_va_idx to the first zero L1 entry. Called once.

 

void core_mmu_set_info_table(struct core_mmu_table_info *tbl_info, unsigned level, vaddr_t va_base, void *table):

tbl_info: Table info that is going to be filled

level: table level

va_base: base va that this table covers

table: pointer to actual table

 

void core_mmu_set_entry_primitive(void *table, size_t level, size_t idx, paddr_t pa, uint32_t attr):

Set corresponding entry with attribute attr

 

In mm/core_mmu.c:

void core_mmu_set_entry(struct core_mmu_table_info *tbl_info, unsigned idx, paddr_t pa, uint32_t attr):

Calls core_mmu_set_entry_primitive() to set the entry.

tbl_info: info of table that entry belongs to

idx: index

pa: pa that this entry points to

attr: mem attribute
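The relationship between a table's level, its base VA, and an entry index can be sketched with a small host-side model. This is not the OP-TEE implementation: the struct, the function names, and the descriptor layout (physical address in the upper bits, attributes in the low 12 bits) are simplified assumptions. The only facts taken from the LPAE format are the 4 KB granule and 512 entries per table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins for the OP-TEE types; not the real definitions. */
typedef uint64_t paddr_t;
typedef uint64_t vaddr_t;

struct table_info_model {
    uint64_t *table;    /* pointer to the actual table */
    vaddr_t va_base;    /* base VA covered by this table */
    unsigned level;     /* translation level (1, 2 or 3) */
    unsigned shift;     /* log2 of the VA range covered by one entry */
    size_t num_entries; /* 512 entries in a 4 KB LPAE table */
};

/* With a 4 KB granule: level 1 -> 1 GB, level 2 -> 2 MB, level 3 -> 4 KB. */
static unsigned level_to_shift(unsigned level)
{
    return 12 + 9 * (3 - level);
}

/* Fill in the table info, mirroring the parameters listed above. */
static void set_info_table(struct table_info_model *ti, unsigned level,
                           vaddr_t va_base, uint64_t *table)
{
    ti->table = table;
    ti->va_base = va_base;
    ti->level = level;
    ti->shift = level_to_shift(level);
    ti->num_entries = 512;
}

/* Index of the entry that translates va within this table. */
static size_t va_to_idx(const struct table_info_model *ti, vaddr_t va)
{
    return (size_t)((va - ti->va_base) >> ti->shift);
}

/* Model descriptor only: PA in the upper bits, attributes in the low 12. */
static void set_entry(struct table_info_model *ti, size_t idx,
                      paddr_t pa, uint32_t attr)
{
    assert(idx < ti->num_entries);
    ti->table[idx] = (pa & ~0xfffULL) | (attr & 0xfffu);
}

/* Map one 2 MB block at VA 0x40600000 and return the written entry. */
static uint64_t example_map_one_block(void)
{
    static uint64_t l2_table[512];
    struct table_info_model ti;

    set_info_table(&ti, 2, 0x40000000, l2_table);
    set_entry(&ti, va_to_idx(&ti, 0x40600000), 0x3e000000, 0x3);
    return l2_table[3];
}
```

In the model, a level 2 table based at 0x40000000 places VA 0x40600000 at index 3, because each level 2 entry covers 2 MB.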

 

In mm/pgt_alloc.c:

# CFG_PAGED_USER_TA is not set

CFG_WITH_LPAE=y

# CFG_WITH_PAGER is not set // Pager not supported for ARM64

 

void pgt_alloc(struct pgt_cache *pgt_cache, void *ctx, vaddr_t begin, vaddr_t last):

Frees any tables already held in pgt_cache via pgt_free_unlocked().

Then allocates a list of pgts from **** to cover the virtual addresses between begin and last. It calls pgt_alloc_unlocked() to grab the list and retries until enough tables are available.

 

static bool pgt_alloc_unlocked(struct pgt_cache *pgt_cache, void *ctx, vaddr_t begin, vaddr_t last):

Allocation is in units of PGDIR, each covering 2 MB. Each iteration grabs one PGDIR from the free list and pushes it onto the pgt_cache list.

“PGDIR is the translation table above the translation table that holds the pages.” Why is this needed?

 

static void pgt_free_unlocked(struct pgt_cache *pgt_cache, bool save_ctx __unused):

Pop every pgt from the pgt_cache list and push it onto the free list.
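The free-list behaviour described above can be sketched as a small host-side model. It is a simplification under stated assumptions, not the OP-TEE code: all names ending in _model are hypothetical, the cache size of 4 is arbitrary, locking and the owning context are left out, and nodes are pushed at the list head for brevity.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PGT_CACHE_SIZE_MODEL 4   /* arbitrary; stands in for PGT_CACHE_SIZE */
#define PGT_SIZE_MODEL       4096 /* one 4 KB page per translation table */
#define PGDIR_SHIFT_MODEL    21   /* one table covers 2 MB of VA space */

struct pgt_model {
    void *tbl;              /* points at one entry of pgt_tables_model */
    uint64_t vabase;        /* 2 MB-aligned VA this table is assigned to */
    struct pgt_model *next;
};

struct pgt_list_model {
    struct pgt_model *head;
};

static uint8_t pgt_tables_model[PGT_CACHE_SIZE_MODEL][PGT_SIZE_MODEL];
static struct pgt_model pgt_entries_model[PGT_CACHE_SIZE_MODEL];
static struct pgt_list_model free_list_model;

static void push_model(struct pgt_list_model *l, struct pgt_model *p)
{
    p->next = l->head;
    l->head = p;
}

static struct pgt_model *pop_model(struct pgt_list_model *l)
{
    struct pgt_model *p = l->head;

    if (p)
        l->head = p->next;
    return p;
}

static size_t list_len_model(const struct pgt_list_model *l)
{
    size_t n = 0;

    for (const struct pgt_model *p = l->head; p; p = p->next)
        n++;
    return n;
}

/* Bind every node to its backing page and put it on the free list. */
static void pgt_init_model(void)
{
    free_list_model.head = NULL;
    for (size_t i = 0; i < PGT_CACHE_SIZE_MODEL; i++) {
        pgt_entries_model[i].tbl = pgt_tables_model[i];
        push_model(&free_list_model, &pgt_entries_model[i]);
    }
}

/* Return every table held by the cache to the free list. */
static void pgt_free_model(struct pgt_list_model *cache)
{
    struct pgt_model *p;

    while ((p = pop_model(cache)) != NULL)
        push_model(&free_list_model, p);
}

/* Take one table per 2 MB slice of [begin, last]; give all back on failure.
 * The caller is expected to pass last >= begin. */
static bool pgt_alloc_model(struct pgt_list_model *cache,
                            uint64_t begin, uint64_t last)
{
    uint64_t base = begin & ~((UINT64_C(1) << PGDIR_SHIFT_MODEL) - 1);
    size_t num_tbls = (size_t)((last - base) >> PGDIR_SHIFT_MODEL) + 1;

    for (size_t n = 0; n < num_tbls; n++) {
        struct pgt_model *p = pop_model(&free_list_model);

        if (!p) {
            pgt_free_model(cache);
            return false;
        }
        p->vabase = base + ((uint64_t)n << PGDIR_SHIFT_MODEL);
        push_model(cache, p);
    }
    return true;
}
```

A failed allocation leaves the free list as it found it, which is what lets the caller simply retry once another context has released its tables.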

 

OPTEE OS uses bget as its dynamic memory allocator.

In lib/libutils/isoc/bget_malloc.c:

malloc(size) will call raw_malloc(0, 0, size)

bget(s) will return ptr

In core/arch/arm/kernel/generic_boot.c:

init_runtime():

malloc_add_pool(__heap1_start, __heap1_end - __heap1_start); // defined in kern.ld.S, which defines memory sections of TEE

By default, the memory pool has only tens of KB. You may reserve a larger memory region and call malloc_add_pool to increase memory pool capacity.

Increase Secure Memory

In optee_os/core/arch/arm/plat-hikey/platform_config.h:

The entire memory space is split into DRAM and TZDRAM (if the pager is not used). The boundary is defined by (DRAM0_BASE, DRAM0_SIZE) and (TZDRAM_BASE, TZDRAM_SIZE).

TEE memory is determined by (CFG_TEE_LOAD_ADDR, CFG_TEE_RAM_VA_SIZE), at the beginning of TZDRAM. If TEE memory is not large enough, tweak these values.

Shared memory is determined by (CFG_SHMEM_START, CFG_SHMEM_SIZE), located below TEE memory, at the top of the DRAM region.

Note: the nvme partition is placed at 0x30000000, so we cannot grow lower unless we move it somewhere else. Instead, we shrink memory above 0x40000000. Leave it unmapped during OPTEE initialization (don't change TZDRAM_SIZE, otherwise there is an error message) and map it as needed later.

In edk2/HisiPkg/HiKeyPkg/Library/HiKeyLib/HiKeyMem.c:

STATIC struct HiKeyReservedMemory {
  EFI_PHYSICAL_ADDRESS         Offset;
  EFI_PHYSICAL_ADDRESS         Size;
} HiKeyReservedMemoryBuffer [] = {
  { 0x05E00000, 0x00100000 },    // MCU
  { 0x05F01000, 0x00001000 },    // ADB REBOOT "REASON"
  { 0x06DFF000, 0x00001000 },    // MAILBOX
  { 0x0740F000, 0x00001000 },    // MAILBOX
  { 0x21F00000, 0x00100000 },    // PSTORE/RAMOOPS
  { 0x3E000000, 0x02000000 }     // TEE OS, change to {0x31000000, 0x0f000000}
};

#define HIKEY_EXTRA_SYSTEM_MEMORY_BASE  0x40000000

#define HIKEY_EXTRA_SYSTEM_MEMORY_SIZE 0x40000000

Note: Changing the parameters above will shrink the memory (2nd 1 GB of memory) available to the normal world.

In edk2/HisiPkg/HiKeyPkg/HiKey.dsc:

 # System Memory (1GB)

gArmTokenSpaceGuid.PcdSystemMemoryBase|0x00000000

#gArmTokenSpaceGuid.PcdSystemMemorySize|0x3E000000 // Size of normal world memory; change to 0x31000000 to match the starting address of the TEE OS entry in HiKeyReservedMemoryBuffer.

In edk2/HisiPkg/HiKeyPkg/HiKey.fdf:

[FD.BL33_AP_UEFI]

BaseAddress   = 0x35000000|gArmTokenSpaceGuid.PcdFdBaseAddress  # The base address of the Firmware in NOR Flash. Change to 0x25000000 since 0x35000000 is occupied by secure world.

Size          = 0x000F0000|gArmTokenSpaceGuid.PcdFdSize         # The size in bytes of the FLASH Device

ErasePolarity = 1
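The three edits have to agree with each other, and the arithmetic is easy to check mechanically. The sketch below uses only the addresses quoted in this note; the struct and function names are made up for illustration, and the rules it encodes (system memory ends where the TEE region starts, the firmware image must not overlap the TEE region, the TEE region ends at the 0x40000000 boundary) are my reading of the notes above rather than anything edk2 enforces.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper type: a physical region given as base and size. */
struct region {
    uint64_t base;
    uint64_t size;
};

static uint64_t region_end(struct region r)
{
    return r.base + r.size;
}

static bool regions_overlap(struct region a, struct region b)
{
    return a.base < region_end(b) && b.base < region_end(a);
}

/* Values quoted in the text above. */
static const struct region tee_old = { 0x3E000000ULL, 0x02000000ULL };
static const struct region tee_new = { 0x31000000ULL, 0x0F000000ULL };
static const struct region fd_old  = { 0x35000000ULL, 0x000F0000ULL };
static const struct region fd_new  = { 0x25000000ULL, 0x000F0000ULL };

/* HIKEY_EXTRA_SYSTEM_MEMORY_BASE: start of the second 1 GB. */
#define EXTRA_MEM_BASE_MODEL 0x40000000ULL

/*
 * sys_mem_size is PcdSystemMemorySize, with PcdSystemMemoryBase == 0.
 * Returns true when the three settings are mutually consistent.
 */
static bool layout_ok(uint64_t sys_mem_size, struct region tee,
                      struct region fd)
{
    return sys_mem_size == tee.base &&
           !regions_overlap(fd, tee) &&
           region_end(tee) <= EXTRA_MEM_BASE_MODEL;
}
```

With these numbers, enlarging the TEE region while leaving the firmware base at 0x35000000 fails the check, which is why the .fdf base address moves to 0x25000000.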

Miscellaneous

Time

In lib/libutee/include/tee_api_types.h:

typedef struct {
    uint32_t seconds;
    uint32_t millis;
    uint32_t micros;
} TEE_Time;

There are two ways to get a timestamp in OPTEE OS: ask the Rich Execution Environment (REE) for the time, or read a processor register.

From REE:

In core/arch/arm/kernel/tee_time.c:

TEE_Result tee_time_get_ree_time(TEE_Time *time) //Get the time of REE

Add time->micros = params.u.value.b / 1000;

This function takes 3-4 microseconds. It requires world switches, which introduce high overhead.

From processor register:

tee_time_get_sys_time(TEE_Time *time) // Will call following function on ARM arch

static TEE_Result arm_cntpct_get_sys_time(TEE_Time *time)

Add time->micros = (cntpct % cntfrq) * TEE_TIME_MILLIS_BASE * 1000 / cntfrq;

Lower overhead: only a few instructions (a counter register read and some arithmetic).

Note: TEE_Time.micros is not part of the original code. I added it because I need higher-resolution timestamps.
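The counter-based conversion can be tried on a host machine with plain integers. The sketch below mirrors the micros formula above; the struct name and function name are hypothetical, the millis line is written in the same style and may differ from the stock expression, and the 1.2 MHz frequency in the example is just a sample value. Note that micros here holds the whole sub-second part in microseconds (0 to 999999), not the remainder after millis.

```c
#include <stdint.h>

#define TEE_TIME_MILLIS_BASE 1000

/* Model of TEE_Time with the extra micros field described in this note. */
struct tee_time_model {
    uint32_t seconds;
    uint32_t millis;
    uint32_t micros; /* sub-second part in microseconds, 0..999999 */
};

/*
 * Convert a free-running counter value to a timestamp.
 * cntpct: counter value, cntfrq: counter frequency in Hz (must be non-zero).
 * The remainder is kept in 64 bits because rem * 1000000 overflows 32 bits
 * for frequencies in the MHz range.
 */
static void counter_to_time_model(uint64_t cntpct, uint32_t cntfrq,
                                  struct tee_time_model *t)
{
    uint64_t rem = cntpct % cntfrq;

    t->seconds = (uint32_t)(cntpct / cntfrq);
    t->millis = (uint32_t)(rem * TEE_TIME_MILLIS_BASE / cntfrq);
    t->micros = (uint32_t)(rem * TEE_TIME_MILLIS_BASE * 1000 / cntfrq);
}

/* Example: 3 s plus 600001 ticks at a sample frequency of 1.2 MHz. */
static struct tee_time_model example_timestamp(void)
{
    struct tee_time_model t;

    counter_to_time_model(3ULL * 1200000 + 600001, 1200000, &t);
    return t;
}
```

For the example input the model yields 3 seconds, 500 millis and 500000 micros, since 600001 ticks at 1.2 MHz is just over half a second.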

Disable IRQ

Once a thread executes the smc instruction and switches a core to the secure world, the thread is not guaranteed to stay pinned to that core until it finishes. The core may switch back to the normal world to handle interrupts (from the timer, I/O, etc.), and the trusted application may continue execution on another core. Therefore, if you want a non-preemptive environment, disable IRQ.

thread_mask_exceptions()

thread_unmask_exceptions()
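The usual way to use such a mask/unmask pair is save, mask, work, restore. The sketch below shows that pattern with host-side stand-ins so it compiles outside OP-TEE. Everything ending in _model is hypothetical, including the bit value chosen for IRQ; it assumes the real mask call takes a set of exception bits and returns the previous state, and the real unmask call takes that saved state back.

```c
#include <stdint.h>

/* Made-up bit for IRQ; the real constant comes from the OP-TEE headers. */
#define EXCP_IRQ_MODEL (1u << 1)

/* Models the CPU's current exception mask bits. */
static uint32_t excp_mask_model;

/* Stand-in for the mask call: set the bits, return the previous state. */
static uint32_t mask_exceptions_model(uint32_t exceptions)
{
    uint32_t old = excp_mask_model;

    excp_mask_model |= exceptions;
    return old;
}

/* Stand-in for the unmask call: restore a previously saved state. */
static void unmask_exceptions_model(uint32_t state)
{
    excp_mask_model = state;
}

static int irq_masked_model(void)
{
    return (excp_mask_model & EXCP_IRQ_MODEL) != 0;
}

/*
 * Run work() with IRQ masked, then restore whatever state was active
 * before. Restoring the saved state (rather than clearing the bit)
 * keeps nested critical sections correct.
 */
static int run_without_irq_model(int (*work)(void))
{
    uint32_t saved = mask_exceptions_model(EXCP_IRQ_MODEL);
    int ret = work();

    unmask_exceptions_model(saved);
    return ret;
}

/* Sample work item: report whether IRQ is masked while it runs. */
static int report_irq_masked_model(void)
{
    return irq_masked_model();
}
```

Keep the masked region short: while IRQ is masked on a core, the normal world cannot reclaim that core to service its own interrupts.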

 


Hardware Engineer?

Thanks to my undergraduate major (computer engineering) and the lifelong influence of my dad, I had the chance to learn hardware and circuit design. This is a unique experience for people who deal with software day and night. I treat it as one of my hobbies rather than a skill. Fortunately, I have successfully applied this knowledge in prior course projects and research.

Our senior design project, “Intelligent Eye”, uses binocular vision to achieve human detection and autonomous navigation. As one of the requirements, we have to build our own PCB from scratch. Following are schematic and actual PCB. Most of the components are soldered by ourselves. If you are interested, please check our video and project website.

Figure: Intelligible Eye Schematic

Figure: Intelligible Eye PCB

Another project that I helped with is Power Sandbox. I built a prototype for WIFI power measurement based on BeagleBone Black with WIFI cape.

Figure: BBB with WIFI Cape

JTAG debugging experience for ARMTZ

Recently we ran into this issue: when the Hikey board boots and switches from trusted firmware to UEFI, the entire system hangs. Because execution has not reached the kernel, there is almost no way to debug. Therefore, we bought a Bus Blaster and decided to use JTAG to examine the internal state of the processor. In this article I will share my experience with JTAG as a beginner.

What can we do if we have no clue where the bug is? We halted the CPU some time after the board booted and dumped the register values. First we noticed that the processor exception level was EL2, which means that execution stayed in the normal world, in either UEFI or a hypervisor. The PC was also messed up (pointing to 0x40XXXXXXXXXXXXXX), which implies that an exception might have occurred. The link register stores the return address, which is the next address after the faulty instruction. Sometimes CPSR also provides useful information, such as the ISA selection (ARM vs Thumb).

After narrowing down the possible faulty region, we set a breakpoint a few bytes before that instruction and single-stepped until we hit the faulty instruction. It tries to push a value onto the stack; however, the SP is not aligned, which triggers the exception. To understand what the code was trying to do, we wanted to go back to the source. But another problem arose: the module is dynamically loaded, so the address alone does not tell us which module this piece of code belongs to. We therefore dumped a few instructions around the faulty one and searched the source tree for the same sequence of consecutive instructions. After matching the binary code back to the source file, we finally understood the problem. UEFI tries to reserve some space for the stack, and if the available space is not enough, it puts the stack somewhere else. The firmware designer may have tested only one scenario, but the problem happened on the other path.