Unable to start profiling. Profiler attach failed (HRESULT: 0x80131379)

Qiohu

Created May 17, 2021 17:56

I am trying to start profiling a dotnet process on Linux and got the error below: "[Undefined resource string ID:0x7379] (0x80131379)"

root@online-trainer-c39e944184354dbdb1834b36e66c0f38-fd7b95866-v9vk9:/app/dotMemoryTool# ./dotmemory get-snapshot 7
Performs memory profiling of .NET applications

Found 1 process(es):
[7] dotnet

Attaching to [7] dotnet runtime...
[Undefined resource string ID:0x7379] (0x80131379)
Can't set event mask: unknown error (hresult_error:80131379)
---
Unable to start profiling. Profiler attach failed (HRESULT: 0x80131379)

13 comments

Anna Guseva

Created May 17, 2021 19:24

Hello,

What Linux version do you use? Also, please run the 'lscpu' command to get information about the CPU.

Qiohu

Created May 17, 2021 19:36

This is a Kubernetes pod running on an Azure VM. The issue seems to happen randomly on some pods but not on others.

Output of cat /etc/os-release:
PRETTY_NAME="Debian GNU/Linux 10 (buster)"
NAME="Debian GNU/Linux"
VERSION_ID="10"
VERSION="10 (buster)"
VERSION_CODENAME=buster
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"

Output of lscpu:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 46 bits physical, 48 bits virtual
CPU(s): 16
On-line CPU(s) list: 0-15
Thread(s) per core: 2
Core(s) per socket: 8
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 85
Model name: Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
Stepping: 7
CPU MHz: 2593.906
BogoMIPS: 5187.81
Virtualization: VT-x
Hypervisor vendor: Microsoft
Virtualization type: full
L1d cache: 32K
L1i cache: 32K
L2 cache: 1024K
L3 cache: 36608K
NUMA node0 CPU(s): 0-15
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology cpuid pni pclmulqdq vmx ssse3 fma cx16 pcid sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single tpr_shadow vnmi ept vpid ept_ad fsgsbase bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves avx512_vnni md_clear arch_capabilities

Anna Guseva

Created May 18, 2021 10:34

Thank you for the information.

The profiler core calls the attach method with arguments that include a 2-minute timeout. This method runs inside the dotnet runtime code and waits for the concurrent garbage collector to turn off. The attach can't complete while concurrent GC mode is enabled.

0x80131379 is the 'CORPROF_E_TIMEOUT_WAITING_FOR_CONCURRENT_GC' error. It means that something prevented the GC mode from switching. It could be a long GC, the process being suspended or in a "not responding" state, or something else.
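
For readers who want to check the error code themselves, here is a minimal Python sketch that splits an HRESULT into its standard fields (failure bit, facility, code). The function name `decode_hresult` is purely illustrative and is not part of dotMemory or the runtime.

```python
def decode_hresult(hr: int) -> dict:
    """Split a 32-bit HRESULT into its standard fields."""
    hr &= 0xFFFFFFFF  # treat the value as unsigned 32-bit
    return {
        "failure": bool(hr >> 31),       # bit 31: severity (1 = failure)
        "facility": (hr >> 16) & 0x7FF,  # bits 16-26: facility
        "code": hr & 0xFFFF,             # bits 0-15: error code
    }


fields = decode_hresult(0x80131379)
print({name: (hex(v) if name != "failure" else v) for name, v in fields.items()})
```

For 0x80131379 this gives a failure with facility 0x13 and code 0x1379. The "0x7379" in the "Undefined resource string" message differs from that code by exactly 0x6000, which suggests (as an assumption, not something confirmed in this thread) that it is a message resource ID derived from the code and that no text is bundled for it.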

Could you please provide more information about your environment? Do you have any idea what could cause such a delay? Could the process be suspended, for example, due to migration to another pod at that moment?

Qiohu

Created May 18, 2021 16:25

Thanks, Anna. When this happens, the process is running on the pod without any apparent problem. I tried several times and always got this error. CPU usage was low (around 2 cores, with 6 cores allocated to the pod), and memory usage was around 30G (with 60G allocated to the pod). I did notice the pod running slowly when this happens, meaning it was processing data slowly, which is why I tried to get a dump to help troubleshoot. If the GC were busy and causing the slowdown, we should see high CPU usage, right?

Qiohu

Created May 18, 2021 16:28

Is there any way to increase this timeout value? When this error happened, it appeared within several seconds of running the command; it didn't wait for 2 minutes.

Anna Guseva

Created May 18, 2021 16:43

What dotnet version is targeted by your application? How much of those 30 GB is used by your application?

Mikhail Khalizev

Created July 24, 2022 18:46

Is there any activity on this issue? I just got this error too. `dotmemory` immediately fails:

root@62104183f5c8:/app# dotmemory get-snapshot 1 --save-to-dir=~
Performs memory profiling of .NET applications

Found 1 process(es):
  [1] Mk.App

Attaching to [1] Mk.App runtime...
[Undefined resource string ID:0x7379] (0x80131379)
Can't set event mask: unknown error (hresult_error:80131379)
  ---
Unable to start profiling. Profiler attach failed (HRESULT: 0x80131379)

I run it in docker on Ubuntu 20.04.4 LTS.

dotmemory version: JetBrains.dotMemory.Console.linux-x64.2022.1.2.tar.gz

`top` output:

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
      1 root      20   0   50.0g  38.5g  31096 S 866.7  30.6  14388:26 Mk.App

`lscpu` output:

Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
Address sizes:                   48 bits physical, 48 bits virtual
CPU(s):                          32
On-line CPU(s) list:             0-31
Thread(s) per core:              2
Core(s) per socket:              16
Socket(s):                       1
NUMA node(s):                    1
Vendor ID:                       AuthenticAMD
CPU family:                      25
Model:                           33
Model name:                      AMD Ryzen 9 5950X 16-Core Processor
Stepping:                        0
Frequency boost:                 enabled
CPU MHz:                         2562.084
CPU max MHz:                     3400.0000
CPU min MHz:                     2200.0000
BogoMIPS:                        6786.90
Virtualization:                  AMD-V
L1d cache:                       512 KiB
L1i cache:                       512 KiB
L2 cache:                        8 MiB
L3 cache:                        64 MiB
NUMA node0 CPU(s):               0-31
Vulnerability Itlb multihit:     Not affected
Vulnerability L1tf:              Not affected
Vulnerability Mds:               Not affected
Vulnerability Meltdown:          Not affected
Vulnerability Mmio stale data:   Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1:        Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:        Mitigation; Retpolines, IBPB conditional, IBRS_FW, STIBP always-on, RSB filling
Vulnerability Srbds:             Not affected
Vulnerability Tsx async abort:   Not affected
Flags:                           fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr wbnoinvd arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif umip pku ospke vaes vpclmulqdq rdpid overflow_recov succor smca

Anna Guseva

Created July 25, 2022 13:21

Hello Mikhail,

What dotnet version is targeted by your application?

Also, could you please run the "dotnet --version" command and send us its output?

Mikhail Khalizev

Created July 25, 2022 14:52

Yes, of course:

root@f5e0a6d0d337:/app# dotnet --version
6.0.302

Mikhail Khalizev

Created July 25, 2022 14:53

csproj:

<Project Sdk="Microsoft.NET.Sdk">
    <PropertyGroup>
        <OutputType>Exe</OutputType>
        <TargetFramework>net6.0</TargetFramework>
        <Nullable>enable</Nullable>
    </PropertyGroup>
...

Anna Guseva

Created August 03, 2022 06:45

Hi Mikhail,

Unfortunately, we can't reproduce it on our side. Could you please provide us with a sample project that reproduces this problem?

Also could you please perform the following:

- Stop your application and start it again.

- Attach dotMemory to the new process. Does attach fail immediately or after some timeout?

- Attach dotMemory to the same process. Does attach fail immediately now?
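
To answer the "immediately or after some timeout" question with numbers rather than impressions, the attach attempts can be timed. Below is a rough Python sketch under stated assumptions: `time_command` and `classify` are hypothetical helper names, the 120-second threshold simply mirrors the 2-minute timeout mentioned earlier in this thread, and the command line in the trailing comment is the one used above.

```python
import subprocess
import time


def time_command(argv, timeout_s=300.0):
    """Run argv and return (exit_code, elapsed_seconds, combined_output)."""
    start = time.monotonic()
    proc = subprocess.run(
        argv,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into stdout
        text=True,
        timeout=timeout_s,
    )
    return proc.returncode, time.monotonic() - start, proc.stdout


def classify(elapsed_s, output, attach_timeout_s=120.0):
    """Label an attach attempt based on its duration and output."""
    if "0x80131379" not in output:
        return "no-gc-timeout-error"
    if elapsed_s >= attach_timeout_s:
        return "failed-after-timeout"
    return "failed-before-timeout"


# Example (assumes dotmemory is in the current directory and the PID is 7):
# code, elapsed, out = time_command(["./dotmemory", "get-snapshot", "7"])
# print(code, round(elapsed, 1), classify(elapsed, out))
```

Running it once against the freshly started process and once more against the same process gives two comparable durations to report.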

Mikhail Khalizev

Created August 03, 2022 10:49

Interesting.
Now I can't reproduce the problem either.

I don't know exactly why; maybe it is because I added this line to the csproj:

<ServerGarbageCollection>true</ServerGarbageCollection>

But the main thing is that now dotmemory works.
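
When comparing a build that fails with one that works, it can help to record which GC properties each project file sets. A small Python sketch, assuming an SDK-style csproj without an XML namespace, as in the fragment posted above; `gc_settings` is a hypothetical helper, and `ConcurrentGarbageCollection` is included alongside `ServerGarbageCollection` only because concurrent GC is the mode discussed in this thread. Whether either property actually affects this error is not established here.

```python
import xml.etree.ElementTree as ET

# MSBuild properties that control the runtime GC mode
GC_PROPERTIES = ("ServerGarbageCollection", "ConcurrentGarbageCollection")


def gc_settings(csproj_text: str) -> dict:
    """Return the GC-related properties explicitly set in a csproj."""
    root = ET.fromstring(csproj_text)
    found = {}
    for group in root.iter("PropertyGroup"):
        for name in GC_PROPERTIES:
            node = group.find(name)
            if node is not None and node.text:
                found[name] = node.text.strip().lower() == "true"
    return found


sample = """
<Project Sdk="Microsoft.NET.Sdk">
    <PropertyGroup>
        <OutputType>Exe</OutputType>
        <TargetFramework>net6.0</TargetFramework>
        <ServerGarbageCollection>true</ServerGarbageCollection>
    </PropertyGroup>
</Project>
"""
print(gc_settings(sample))
```

Properties that are absent from the file are left out of the result, so an empty dictionary means the project relies on the runtime defaults.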

Anna Guseva

Created August 03, 2022 13:03

Mikhail,

The profiler uses a 2-minute timeout to wait until concurrent GC is disabled (memory profiling is impossible in concurrent GC mode). In your case, this timeout was ignored by the CLR itself for an unknown reason. This happens outside our code, and we can't tell what the cause could be without debugging.
