XCP-ng

    Is there a way (CLI) to get the current health (lifetime) of an internal host SSD?

    Hardware
    8 Posts 3 Posters 421 Views 3 Watching
    • manilx

      All in the subject 😉

      I wonder if I can get this info from XCP-ng.

      • olivierlambert Vates 🍪 Co-Founder CEO

        IIRC, we exposed SMART info, but @AtaxyaNetwork might remember the details better than I do 😄

        • AtaxyaNetwork Ambassador @manilx

          @manilx Hi!
          I wrote an XAPI plugin for smartctl a while ago. You can use it like this:

          # Get all the info
          [18:06 Chouffe plugins]# xe host-call-plugin host-uuid=8b80bcc2-d31c-4f7d-85a7-e921f67c4ec5  plugin=smartctl.py fn=check_smartctl
          {"/dev/sdf": {"power_on_time": {"hours": 9147}, "ata_version": {"minor_value": 94, "string": "ACS-4 T13/BSR INCITS 529 revision 5", "major_value": 2556}, "form_factor": {"ata_value": 3, "name": "2.5 inches"}, "firmware_version": "SVQ02B6Q", "wwn": {"oui": 9528, "naa": 5, "id": 65536604056}, "smart_status": {"passed": true}, "smartctl": {"build_info": "(local build)", "exit_status": 0, "argv": ["smartctl", "-j", "-a", "/dev/sdf"], "version": [7, 0], "svn_revision": "4883", "platform_info": "x86_64-linux-4.19.0+1"}, "temperature": {"current": 32}, "rotation_rate": 0, [...]}
          
          # get health status
          [18:06 Chouffe plugins]# xe host-call-plugin host-uuid=8b80bcc2-d31c-4f7d-85a7-e921f67c4ec5  plugin=smartctl.py fn=check_health
          {"/dev/sdf": "PASSED", "/dev/sdg": "PASSED", "/dev/sdd": "PASSED", "/dev/sde": "PASSED", "/dev/sdb": "PASSED", "/dev/sdc": "PASSED", "/dev/sda": "PASSED"}

          • manilx @AtaxyaNetwork

            @AtaxyaNetwork Hi,

            Tried that, but I get this:

            Error code: UNKNOWN_XENAPI_PLUGIN_FUNCTION
            Error parameters: check_smartctl
            

            Running XCP-ng 8.3 beta.

            • AtaxyaNetwork Ambassador @manilx

              @manilx Try:

              xe host-call-plugin host-uuid=<uuid> plugin=smartctl.py fn=information
              xe host-call-plugin host-uuid=<uuid> plugin=smartctl.py fn=health
              

              We changed the function names between my test version and the real plugin shipped with XCP-ng 😄
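
For scripting, the two calls above could be wrapped in a few lines of Python. This is only a sketch: `plugin_command` and `call_plugin` are hypothetical helper names, and it assumes the script runs in dom0 where `xe` is on the PATH and that the plugin replies with JSON, as in the outputs shown in this thread.

```python
import json
import subprocess


def plugin_command(host_uuid, fn):
    """Build the argv for `xe host-call-plugin` (hypothetical helper)."""
    return [
        "xe", "host-call-plugin",
        "host-uuid=" + host_uuid,
        "plugin=smartctl.py",
        "fn=" + fn,  # "information" or "health", per the post above
    ]


def call_plugin(host_uuid, fn):
    """Run the plugin and decode its JSON reply into a dict keyed by device."""
    raw = subprocess.check_output(plugin_command(host_uuid, fn))
    return json.loads(raw.decode("utf-8"))


# Example (only works on an XCP-ng host, with a real host UUID):
#   health = call_plugin("<uuid>", "health")
#   print(health)
```

Keeping the argv as a list avoids shell quoting issues and makes the command easy to inspect before it is run.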

              • manilx @AtaxyaNetwork

                @AtaxyaNetwork Better:

                [10:19 vp6670 ~]# xe host-call-plugin host-uuid=bf2761ed-a851-4034-9cee-2b698509d46a plugin=smartctl.py fn=information
                {"/dev/nvme0": {"smart_status": {"nvme": {"value": 0}, "passed": true}, "nvme_controller_id": 1, "smartctl": {"build_info": "(local build)", "exit_status": 0, "argv": ["smartctl", "-j", "-a", "/dev/nvme0"], "version": [7, 0], "svn_revision": "4883", "platform_info": "x86_64-linux-4.19.0+1"}, "temperature": {"current": 52}, "power_on_time": {"hours": 2796}, "power_cycle_count": 45, "nvme_smart_health_information_log": {"controller_busy_time": 47870, "host_writes": 624907017, "temperature": 52, "critical_comp_time": 9175, "available_spare": 100, "host_reads": 752263796, "data_units_written": 47041497, "power_on_hours": 2796, "num_err_log_entries": 0, "critical_warning": 0, "power_cycles": 45, "warning_temp_time": 76230, "percentage_used": 3, "available_spare_threshold": 10, "media_errors": 0, "data_units_read": 74276500, "unsafe_shutdowns": 5}, "logical_block_size": 512, "nvme_number_of_namespaces": 1, "user_capacity": {"bytes": 1000204886016, "blocks": 1953525168}, "nvme_namespaces": [{"capacity": {"bytes": 1000204886016, "blocks": 1953525168}, "utilization": {"bytes": 1000204886016, "blocks": 1953525168}, "formatted_lba_size": 512, "eui64": {"oui": 9911, "ext_id": 516855111029}, "id": 1, "size": {"bytes": 1000204886016, "blocks": 1953525168}}], "nvme_ieee_oui_identifier": 9911, "json_format_version": [1, 0], "nvme_pci_vendor": {"id": 9798, "subsystem_id": 9798}, "device": {"protocol": "NVMe", "type": "nvme", "name": "/dev/nvme0", "info_name": "/dev/nvme0"}, "serial_number": "50026B77856F71D7", "firmware_version": "SBM02103", "model_name": "KINGSTON SNV2S1000G", "local_time": {"time_t": 1720084783, "asctime": "Thu Jul  4 10:19:43 2024 WEST"}}, "/dev/sda": {"power_on_time": {"hours": 12570}, "ata_version": {"minor_value": 94, "string": "ACS-4 T13/BSR INCITS 529 revision 5", "major_value": 2556}, "form_factor": {"ata_value": 3, "name": "2.5 inches"}, "firmware_version": "SVQ02B6Q", "wwn": {"oui": 9528, "naa": 5, "id": 65526876152}, "smart_status": 
{"passed": true}, "smartctl": {"build_info": "(local build)", "exit_status": 0, "argv": ["smartctl", "-j", "-a", "/dev/sda"], "version": [7, 0], "svn_revision": "4883", "platform_info": "x86_64-linux-4.19.0+1"}, "temperature": {"current": 51}, "rotation_rate": 0, "interface_speed": {"current": {"sata_value": 3, "units_per_second": 60, "string": "6.0 Gb/s", "bits_per_unit": 100000000}, "max": {"sata_value": 14, "units_per_second": 60, "string": "6.0 Gb/s", "bits_per_unit": 100000000}}, "user_capacity": {"bytes": 1000204886016, "blocks": 1953525168}, "ata_smart_attributes": {"table": [{"name": "Reallocated_Sector_Ct", "flags": {"error_rate": false, "string": "PO--CK ", "event_count": true, "value": 51, "updated_online": true, "performance": false, "auto_keep": true, "prefailure": true}, "value": 100, "raw": {"string": "0", "value": 0}, "thresh": 10, "when_failed": "", "worst": 100, "id": 5}, {"name": "Power_On_Hours", "flags": {"error_rate": false, "string": "-O--CK ", "event_count": true, "value": 50, "updated_online": true, "performance": false, "auto_keep": true, "prefailure": false}, "value": 97, "raw": {"string": "12570", "value": 12570}, "thresh": 0, "when_failed": "", "worst": 97, "id": 9}, {"name": "Power_Cycle_Count", "flags": {"error_rate": false, "string": "-O--CK ", "event_count": true, "value": 50, "updated_online": true, "performance": false, "auto_keep": true, "prefailure": false}, "value": 99, "raw": {"string": "87", "value": 87}, "thresh": 0, "when_failed": "", "worst": 99, "id": 12}, {"name": "Wear_Leveling_Count", "flags": {"error_rate": false, "string": "PO--C- ", "event_count": true, "value": 19, "updated_online": true, "performance": false, "auto_keep": false, "prefailure": true}, "value": 98, "raw": {"string": "11", "value": 11}, "thresh": 0, "when_failed": "", "worst": 98, "id": 177}, {"name": "Used_Rsvd_Blk_Cnt_Tot", "flags": {"error_rate": false, "string": "PO--C- ", "event_count": true, "value": 19, "updated_online": true, "performance": 
false, "auto_keep": false, "prefailure": true}, "value": 100, "raw": {"string": "0", "value": 0}, "thresh": 10, "when_failed": "", "worst": 100, "id": 179}, {"name": "Program_Fail_Cnt_Total", "flags": {"error_rate": false, "string": "-O--CK ", "event_count": true, "value": 50, "updated_online": true, "performance": false, "auto_keep": true, "prefailure": false}, "value": 100, "raw": {"string": "0", "value": 0}, "thresh": 10, "when_failed": "", "worst": 100, "id": 181}, {"name": "Erase_Fail_Count_Total", "flags": {"error_rate": false, "string": "-O--CK ", "event_count": true, "value": 50, "updated_online": true, "performance": false, "auto_keep": true, "prefailure": false}, "value": 100, "raw": {"string": "0", "value": 0}, "thresh": 10, "when_failed": "", "worst": 100, "id": 182}, {"name": "Runtime_Bad_Block", "flags": {"error_rate": false, "string": "PO--C- ", "event_count": true, "value": 19, "updated_online": true, "performance": false, "auto_keep": false, "prefailure": true}, "value": 100, "raw": {"string": "0", "value": 0}, "thresh": 10, "when_failed": "", "worst": 100, "id": 183}, {"name": "Reported_Uncorrect", "flags": {"error_rate": false, "string": "-O--CK ", "event_count": true, "value": 50, "updated_online": true, "performance": false, "auto_keep": true, "prefailure": false}, "value": 100, "raw": {"string": "0", "value": 0}, "thresh": 0, "when_failed": "", "worst": 100, "id": 187}, {"name": "Airflow_Temperature_Cel", "flags": {"error_rate": false, "string": "-O--CK ", "event_count": true, "value": 50, "updated_online": true, "performance": false, "auto_keep": true, "prefailure": false}, "value": 49, "raw": {"string": "51", "value": 51}, "thresh": 0, "when_failed": "", "worst": 32, "id": 190}, {"name": "Hardware_ECC_Recovered", "flags": {"error_rate": true, "string": "-O-RC- ", "event_count": true, "value": 26, "updated_online": true, "performance": false, "auto_keep": false, "prefailure": false}, "value": 200, "raw": {"string": "0", "value": 0}, "thresh": 
0, "when_failed": "", "worst": 200, "id": 195}, {"name": "UDMA_CRC_Error_Count", "flags": {"error_rate": true, "string": "-OSRCK ", "event_count": true, "value": 62, "updated_online": true, "performance": true, "auto_keep": true, "prefailure": false}, "value": 100, "raw": {"string": "0", "value": 0}, "thresh": 0, "when_failed": "", "worst": 100, "id": 199}, {"name": "Unknown_Attribute", "flags": {"error_rate": false, "string": "-O--C- ", "event_count": true, "value": 18, "updated_online": true, "performance": false, "auto_keep": false, "prefailure": false}, "value": 99, "raw": {"string": "60", "value": 60}, "thresh": 0, "when_failed": "", "worst": 99, "id": 235}, {"name": "Total_LBAs_Written", "flags": {"error_rate": false, "string": "-O--CK ", "event_count": true, "value": 50, "updated_online": true, "performance": false, "auto_keep": true, "prefailure": false}, "value": 99, "raw": {"string": "11997733492", "value": 11997733492}, "thresh": 0, "when_failed": "", "worst": 99, "id": 241}], "revision": 1}, "ata_sct_capabilities": {"data_table_supported": true, "error_recovery_control_supported": true, "value": 61, "feature_control_supported": true}, "json_format_version": [1, 0], "logical_block_size": 512, "sata_version": {"string": "SATA 3.3", "value": 511}, "serial_number": "S5RRNF0RB82155Z", "power_cycle_count": 87, "ata_smart_self_test_log": {"standard": {"count": 0, "revision": 1}}, "ata_smart_selective_self_test_log": {"table": [{"status": {"string": "Not_testing", "value": 0}, "lba_max": 0, "lba_min": 0}, {"status": {"string": "Not_testing", "value": 0}, "lba_max": 0, "lba_min": 0}, {"status": {"string": "Not_testing", "value": 0}, "lba_max": 0, "lba_min": 0}, {"status": {"string": "Not_testing", "value": 0}, "lba_max": 0, "lba_min": 0}, {"status": {"string": "Not_testing", "value": 0}, "lba_max": 0, "lba_min": 0}], "current_read_scan": {"status": {"string": "was never started", "value": 0}, "lba_max": 65535, "lba_min": 0}, "flags": {"remainder_scan_enabled": 
false, "value": 0}, "power_up_scan_resume_minutes": 0, "revision": 1}, "physical_block_size": 512, "device": {"protocol": "ATA", "type": "sat", "name": "/dev/sda", "info_name": "/dev/sda [SAT]"}, "in_smartctl_database": false, "ata_smart_data": {"self_test": {"status": {"passed": true, "string": "completed without error", "value": 0}, "polling_minutes": {"short": 2, "extended": 85}}, "offline_data_collection": {"status": {"string": "was never started", "value": 0}, "completion_seconds": 0}, "capabilities": {"offline_surface_scan_supported": false, "attribute_autosave_enabled": true, "gp_logging_supported": true, "offline_is_aborted_upon_new_cmd": false, "error_logging_supported": true, "exec_offline_immediate_supported": true, "conveyance_self_test_supported": false, "self_tests_supported": true, "selective_self_test_supported": true, "values": [83, 3]}}, "local_time": {"time_t": 1720084783, "asctime": "Thu Jul  4 10:19:43 2024 WEST"}, "ata_smart_error_log": {"summary": {"count": 0, "revision": 1}}, "model_name": "Samsung SSD 870 QVO 1TB"}}
                [10:19 vp6670 ~]# xe host-call-plugin host-uuid=bf2761ed-a851-4034-9cee-2b698509d46a  plugin=smartctl.py fn=health
                {"/dev/nvme0": "PASSED", "/dev/sda": "PASSED"}
                

                That's a lot of data. Uff.
                Can we parse that for the values I'm interested in?
                I'm trying to get the same info I got in Proxmox:
                [Screenshot: ScreenShot 2024-07-04 at 10.22.17.png]
                The PASSED status I already get with your second command.

                Appreciate your help!
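
One way to cut the `fn=information` reply down to a few values is a small Python filter. This is a minimal sketch, not part of the plugin: `find_attribute` and `summarize` are hypothetical names, the field names are taken from the JSON output above, and the choice of fields is an assumption about what is "interesting".

```python
import json


def find_attribute(device_info, name):
    """Return the ATA SMART attribute entry called `name`, or None."""
    table = device_info.get("ata_smart_attributes", {}).get("table", [])
    for attr in table:
        if attr.get("name") == name:
            return attr
    return None


def summarize(info):
    """Reduce the `fn=information` reply to a few fields per device."""
    summary = {}
    for dev, data in info.items():
        row = {
            "model": data.get("model_name"),
            "passed": data.get("smart_status", {}).get("passed"),
            "power_on_hours": data.get("power_on_time", {}).get("hours"),
            "temperature_c": data.get("temperature", {}).get("current"),
        }
        nvme_log = data.get("nvme_smart_health_information_log")
        if nvme_log is not None:
            # NVMe drives report wear directly as a percentage used.
            row["percentage_used"] = nvme_log.get("percentage_used")
        wear = find_attribute(data, "Wear_Leveling_Count")
        if wear is not None:
            # Vendor-specific ATA attribute; only the normalized value is kept.
            row["wear_leveling_normalized"] = wear.get("value")
        summary[dev] = row
    return summary


# Trimmed-down sample using values from the output above.
sample = {
    "/dev/nvme0": {
        "model_name": "KINGSTON SNV2S1000G",
        "smart_status": {"nvme": {"value": 0}, "passed": True},
        "temperature": {"current": 52},
        "power_on_time": {"hours": 2796},
        "nvme_smart_health_information_log": {"percentage_used": 3},
    },
    "/dev/sda": {
        "model_name": "Samsung SSD 870 QVO 1TB",
        "smart_status": {"passed": True},
        "temperature": {"current": 51},
        "power_on_time": {"hours": 12570},
        "ata_smart_attributes": {"table": [
            {"name": "Wear_Leveling_Count", "value": 98, "thresh": 0},
        ]},
    },
}
print(json.dumps(summarize(sample), indent=2, sort_keys=True))
```

On a real host, the `sample` dict would be replaced by the decoded JSON from the plugin call.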

                • manilx @manilx

                  @manilx I used https://jsonformatter.org/json-parser to parse the output.
                  Given these values, what is the wear? 2% ("value": 98)?

                            "name": "Wear_Leveling_Count",
                            "flags": {
                              "error_rate": false,
                              "string": "PO--C- ",
                              "event_count": true,
                              "value": 19,
                              "updated_online": true,
                              "performance": false,
                              "auto_keep": false,
                              "prefailure": true
                            },
                            "value": 98,
                            "raw": {
                              "string": "11",
                              "value": 11
                            },
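
As a rough calculation: if the drive follows the common convention where the normalized Wear_Leveling_Count starts at 100 and counts down, the wear consumed is 100 minus the normalized value. That convention is vendor-specific and is an assumption here, not something the plugin guarantees. `wear_percent` is a hypothetical helper name.

```python
def wear_percent(normalized_value):
    """Estimate wear consumed from a normalized attribute that counts down from 100."""
    # Clamp so unusual vendor values never give a negative or >100 result.
    return max(0, min(100, 100 - normalized_value))


print(wear_percent(98))  # 2, i.e. roughly 2% of the rated wear used
```

The clamp matters because some attributes in the same table use other scales, for example a normalized value of 200.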
                  
                  
                  • manilx @manilx

                    @manilx Got this from ChatGPT, so I think I've got it.
                    It would be great to just get these values from a CLI command, but I'm not up to the task 😉

                    The output from smartctl you provided contains information about the Wear Leveling Count attribute of an SSD. This attribute is used to indicate the wear and tear of the NAND flash memory cells in the SSD.

                    Let's break down the relevant parts of the output:

                    "Wear_Leveling_Count": This is the attribute being measured.
                    "value": 98: This is the normalized value of the Wear Leveling Count. Normalized values typically start at 100 for new drives and decrease as the drive ages and wears out.
                    "raw": { "string": "11", "value": 11 }: The raw value is the actual measurement from the SSD's controller. In this case, it is 11.

                    Interpretation

                    Normalized Value (98): This indicates the relative health of the SSD. A value of 98 means the SSD is still in good health, as it is close to the initial value of 100.
                    Raw Value (11): This typically represents the number of cycles or the wear level. In this case, it likely indicates the number of times the cells have been erased and written.

                    Wear Level

                    SSDs generally have a threshold for when they are considered worn out, often around a normalized value of 10 or lower. With a normalized value of 98, your SSD is in very good health and shows minimal wear. The raw value of 11 suggests the SSD has gone through 11 cycles of wear leveling, which is quite low.
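
The threshold can be read from the data rather than guessed: each entry in `ata_smart_attributes.table` carries its own `thresh` and `when_failed` fields (in the output above, Wear_Leveling_Count has a `thresh` of 0). A minimal sketch, assuming the usual SMART rule that an attribute counts as failed once its normalized value is at or below a non-zero threshold; `failing_attributes` is a hypothetical helper name.

```python
def failing_attributes(table):
    """Return names of attributes at/below a non-zero threshold or already flagged."""
    failing = []
    for attr in table:
        thresh = attr.get("thresh", 0)
        value = attr.get("value")
        # A threshold of 0 means the attribute can never trip.
        at_threshold = thresh > 0 and value is not None and value <= thresh
        # smartctl fills `when_failed` once an attribute has failed.
        flagged = bool(attr.get("when_failed"))
        if at_threshold or flagged:
            failing.append(attr["name"])
    return failing


# Values copied from the /dev/sda output in this thread.
table = [
    {"name": "Reallocated_Sector_Ct", "value": 100, "thresh": 10, "when_failed": ""},
    {"name": "Wear_Leveling_Count", "value": 98, "thresh": 0, "when_failed": ""},
    {"name": "Used_Rsvd_Blk_Cnt_Tot", "value": 100, "thresh": 10, "when_failed": ""},
]
print(failing_attributes(table))  # [] (nothing is at or below its threshold)
```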
