5  Prepare Scanner Data

5.1 Load Scanner Logs

We load the raw scanner logs using the load_scanner_logs function. This function parses the text logs to extract timestamps, actions (START_COPY, END_COPY), and case IDs.

For atypical cases, we use the following approach. Consider this log excerpt:

2025-06-03 23:05:59 - INFO - Dosya kopyalanıyor: 28926-25;[41]1KZM2,_2_18_SS7834_1065013.svs
2025-06-03 23:06:23 - INFO - Dosya başarıyla taşındı: I:/28926-25;[41]1KZM2,_2_18_SS7834_1065013.svs
2025-06-03 23:06:24 - INFO - Kaynak dosya silindi: K:\28926-25;[41]1KZM2,_2_18_SS7834_1065013.svs

From these lines we derive three columns:

  • filename: the whole file name from the line that contains “Dosya kopyalanıyor” (“File is being copied”), here 28926-25;[41]1KZM2,_2_18_SS7834_1065013.svs.
  • scan_finished_time: the timestamp of the line that contains “Dosya kopyalanıyor”, here 2025-06-03 23:05:59.
  • file_transferred_time: the timestamp of the line that contains “Kaynak dosya silindi” (“Source file deleted”), here 2025-06-03 23:06:24.

When a line is missing, we fall back as follows:

  • If “Kaynak dosya silindi” is missing, use the timestamp from the line that contains “Dosya kopyalanıyor”.
  • If “Dosya kopyalanıyor” is missing, use the timestamp from the line that contains “Dosya başarıyla taşındı” (“File moved successfully”).
  • If “Dosya başarıyla taşındı” is missing, use the timestamp from the line that contains “Dosya kopyalanıyor”.
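The extraction and fallback rules above can be sketched in Python as follows. This is a minimal illustration, not the report's own load_scanner_logs implementation, which is not shown here. The names LINE_RE and parse_transfers are hypothetical. The sketch also makes three assumptions: log lines are grouped by the file name at the end of the line, the first fallback applies to file_transferred_time, and the second applies to scan_finished_time.

```python
import re
from datetime import datetime

# Line shape taken from the excerpt: "<timestamp> - INFO - <message>: <path>"
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) - INFO - "
    r"(?P<msg>[^:]+): (?P<path>.+)$"
)

COPYING = "Dosya kopyalanıyor"       # "File is being copied"
MOVED = "Dosya başarıyla taşındı"    # "File moved successfully"
DELETED = "Kaynak dosya silindi"     # "Source file deleted"


def parse_transfers(lines):
    """Return one row per file with the two timestamps described above."""
    seen = {}  # filename -> {message: first timestamp seen}
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m or m["msg"] not in (COPYING, MOVED, DELETED):
            continue
        # Assumption: strip any drive or folder prefix (I:/ or K:\) to get the name.
        filename = re.split(r"[\\/]", m["path"])[-1]
        ts = datetime.strptime(m["ts"], "%Y-%m-%d %H:%M:%S")
        seen.setdefault(filename, {}).setdefault(m["msg"], ts)

    rows = []
    for filename, t in seen.items():
        # Assumption: each fallback is tried in the order the rules list it.
        finished = t.get(COPYING) or t.get(MOVED)
        transferred = t.get(DELETED) or t.get(COPYING) or t.get(MOVED)
        rows.append({
            "filename": filename,
            "scan_finished_time": finished,  # None if neither line exists
            "file_transferred_time": transferred,
        })
    return rows


# The three lines from the excerpt (raw strings keep the backslash literal).
SAMPLE = [
    r"2025-06-03 23:05:59 - INFO - Dosya kopyalanıyor: 28926-25;[41]1KZM2,_2_18_SS7834_1065013.svs",
    r"2025-06-03 23:06:23 - INFO - Dosya başarıyla taşındı: I:/28926-25;[41]1KZM2,_2_18_SS7834_1065013.svs",
    r"2025-06-03 23:06:24 - INFO - Kaynak dosya silindi: K:\28926-25;[41]1KZM2,_2_18_SS7834_1065013.svs",
]

for row in parse_transfers(SAMPLE):
    print(row)
```

Keeping only the first timestamp per message type is a design choice of the sketch. If a file can be copied more than once, a real parser would need a rule for which attempt to keep.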

We then extract erisim_numarasi (the case ID) from the filename column.
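The report does not state the extraction rule, so the following Python sketch is only one plausible version. It assumes the case ID is the leading "digits-digits" prefix of the file name, as in 28926-25 from the excerpt above. The names CASE_ID_RE and extract_erisim_numarasi are hypothetical.

```python
import re

# Assumption: a case ID is digits, a hyphen, then digits, at the very start
# of the file name (e.g. "28926-25" in "28926-25;[41]1KZM2,...svs").
CASE_ID_RE = re.compile(r"^\d+-\d+")


def extract_erisim_numarasi(filename):
    """Return the leading case ID, or None when the name does not start with one."""
    m = CASE_ID_RE.match(filename)
    return m.group(0) if m else None


print(extract_erisim_numarasi("28926-25;[41]1KZM2,_2_18_SS7834_1065013.svs"))
```

Returning None for names that do not match makes malformed file names easy to filter out afterwards, rather than letting them pass through as case IDs.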

5.2 Data Cleaning and Summary

We summarize the logs to find the start and end times for each case. We treat “END_COPY” events as the completion of scanning/transfer, and we approximate the start time by the minimum timestamp for the case.

# A tibble: 6 × 6
  erisim_numarasi    scan_complete_time  scan_start_time     slide_count_scanner
  <chr>              <dttm>              <dttm>                            <int>
1 "\u001dR"          2025-07-08 06:53:08 2025-07-08 06:52:57                   1
2 "&86%62"           2025-10-04 05:32:00 2025-10-04 05:31:32                   2
3 "&I,_14_9_SS45134… 2025-10-22 05:53:43 2025-10-22 05:53:43                   1
4 "&I,_14_9_SS45134… 2025-10-22 05:53:43 2025-10-22 05:53:35                   1
5 "&n_12_13_SS45328… 2025-08-13 00:15:43 2025-08-13 00:15:43                   1
6 "&n_12_13_SS45328… 2025-08-13 00:15:43 2025-08-13 00:15:19                   1
# ℹ 2 more variables: scanner_name_log <chr>, copied_files <chr>

5.3 Column Definitions

The processed scanner summary contains the following columns:

  • erisim_numarasi: Unique Case ID (e.g., 12345-24).
  • scan_complete_time: The timestamp when the last file in the case finished transferring (“Kaynak dosya silindi”). This represents the time the case was fully available on the network.
  • scan_start_time: The timestamp when the first file in the case started copying (“Dosya kopyalanıyor”). This represents the approximate start of the scanning/transfer process for the case.
  • slide_count_scanner: The total number of files (slides) associated with this case in the logs.
  • scanner_name_log: The name of the scanner that processed the case (e.g., SS7833), extracted from the log file path.
  • copied_files: A semicolon-separated list of all filenames associated with the case.
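As an illustration of how these columns relate to slide-level rows, here is a minimal Python sketch. It is not the report's own summary code. The function name summarize_cases and the input row layout are hypothetical. The sketch assumes that rows are grouped by erisim_numarasi alone and that a case's scanner name can be taken from its first row.

```python
from collections import defaultdict
from datetime import datetime


def summarize_cases(slides):
    """Collapse slide-level rows (dicts) into one summary row per case."""
    groups = defaultdict(list)
    for slide in slides:
        groups[slide["erisim_numarasi"]].append(slide)

    summary = []
    for case_id, rows in groups.items():
        summary.append({
            "erisim_numarasi": case_id,
            # Last file to finish transferring.
            "scan_complete_time": max(r["file_transferred_time"] for r in rows),
            # First file to start copying.
            "scan_start_time": min(r["scan_finished_time"] for r in rows),
            "slide_count_scanner": len(rows),
            # Assumption: all files of a case come from one scanner.
            "scanner_name_log": rows[0]["scanner_name_log"],
            "copied_files": ";".join(r["filename"] for r in rows),
        })
    return summary


# Made-up slide rows, for illustration only.
example = [
    {"erisim_numarasi": "12345-24", "filename": "12345-24_a.svs",
     "scan_finished_time": datetime(2025, 1, 1, 10, 0, 0),
     "file_transferred_time": datetime(2025, 1, 1, 10, 0, 30),
     "scanner_name_log": "SS7833"},
    {"erisim_numarasi": "12345-24", "filename": "12345-24_b.svs",
     "scan_finished_time": datetime(2025, 1, 1, 10, 5, 0),
     "file_transferred_time": datetime(2025, 1, 1, 10, 5, 40),
     "scanner_name_log": "SS7833"},
]

for case in summarize_cases(example):
    print(case)
```

File names in the logs can themselves contain a semicolon, as in the excerpt in 5.1, so splitting copied_files back into individual names may be ambiguous.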

5.4 Save Processed Data

We save the processed scanner data in both case-level and slide-level formats.