Proceedings Volume Fifth International Conference on Control, Robotics, and Intelligent System (CCRIS 2024), 1340401 (2024) https://doi.org/10.1117/12.3055045
This PDF file contains the front matter associated with SPIE Proceedings Volume 13404, including the Title Page, Copyright information, Table of Contents, and Conference Committee information.
Access to the requested content is limited to institutions that have purchased or subscribe to SPIE eBooks.
You are receiving this notice because your organization may not have SPIE eBooks access.*
*Shibboleth/Open Athens users: please sign in to access your institution's subscriptions.
To obtain this item, you may purchase the complete book in print or electronic format on SPIE.org.
Proceedings Volume Fifth International Conference on Control, Robotics, and Intelligent System (CCRIS 2024), 1340402 (2024) https://doi.org/10.1117/12.3050079
The development and application of cable-driven parallel robots (CDPRs) are restricted by model uncertainties and external disturbances, which decrease their positioning accuracy. To address this issue, an imitated active disturbance rejection controller (IADRC) is proposed in this paper. A fixed-time sliding mode extended state observer (FTSMESO) is designed to estimate the lumped uncertainties via the switching sliding mode variable. Fixed-time convergence of the estimation errors is guaranteed, and the convergence time is independent of the system’s initial state. Based on the FTSMESO, an IADRC is proposed and its robustness is analyzed. Simulations and experiments are conducted; the results indicate that the designed FTSMESO effectively observes the lumped uncertainties and verify the practicality and effectiveness of the IADRC.
Proceedings Volume Fifth International Conference on Control, Robotics, and Intelligent System (CCRIS 2024), 1340403 (2024) https://doi.org/10.1117/12.3049924
Accurate object detection in driving scenarios is critical for the safety and reliability of autonomous driving systems. However, detecting small objects such as pedestrians and cyclists poses significant challenges due to inadequate point cloud information. To address these challenges, we propose FENet, a Feature Enhancement Network designed to improve multi-class recognition, particularly for small objects. FENet introduces two key modules: the Skeleton Point Sampling (SPS) module and the Geometry Knowledge Base (GKB) module. The SPS module optimizes the sampling process by selectively retaining skeleton points that are uniformly distributed across the object, preserving detailed shape information. This approach enhances the network’s ability to retain critical information about small objects. The GKB module automatically collects and stores high-quality geometric features, creating a comprehensive library of representative object shapes. When encountering weak features, the GKB module supplements them with geometrically similar features from the library, thus enriching the feature representation and improving the network’s robustness to occlusion and truncation. Experiments on the KITTI dataset demonstrate that FENet significantly outperforms previous LiDAR-based methods in detecting small objects, achieving 61.26% AP for pedestrians and 75.95% AP for cyclists at the moderate difficulty level.
Fujie Wang, Tu Wang, Junxuan Luo, Xing Li, Fang Guo, Yi Qin
Proceedings Volume Fifth International Conference on Control, Robotics, and Intelligent System (CCRIS 2024), 1340404 (2024) https://doi.org/10.1117/12.3049958
This paper investigates the state synchronization control problem of a teleoperation system for actuators based on deep reinforcement learning, taking into account input saturation, uncertainties, and stochastic delays in the actuators. A centralized curiosity model multi-agent reinforcement learning algorithm framework is proposed for the teleoperation system composed of a local robotic manipulator and a remote robotic manipulator. Specifically, a centralized world model and a centralized curiosity network are utilized to train the actor network. Subsequently, the robotic manipulator is controlled using traditional controller tuning methods. To learn the nonlinear communication delay perturbations in the system, a Long Short-Term Memory network is applied to learn new features. Based on these features, the state structure of the partially observable Markov model is reconstructed. In simulation experiments compared with traditional controllers and baseline reinforcement learning algorithms, the proposed method demonstrates competitive control performance.
Proceedings Volume Fifth International Conference on Control, Robotics, and Intelligent System (CCRIS 2024), 1340405 (2024) https://doi.org/10.1117/12.3049975
Dynamic modeling of sensors is an important means of studying their dynamic characteristics, laws of motion, and performance indicators. Because sensor dynamics are highly nonlinear, it is difficult to establish an accurate dynamic model; to address this problem, a sensor dynamic modeling method is proposed. The method uses least squares support vector machine (LSSVM) regression for prediction, with the model parameters selected by the Northern Goshawk Optimization (NGO) algorithm. Simulation shows that the NGO-LSSVM model is an economical, high-precision, and reliable sensor dynamic modeling tool.
Proceedings Volume Fifth International Conference on Control, Robotics, and Intelligent System (CCRIS 2024), 1340406 (2024) https://doi.org/10.1117/12.3050473
As electrochemical energy storage stations are progressively deployed worldwide, their safety concerns have increasingly come to light. To ensure their safe operation, this paper proposes a comprehensive safety assessment method for electrochemical energy storage stations based on the Fuzzy Analytic Hierarchy Process (FAHP) and the cloud model. Firstly, a comprehensive safety evaluation index framework for electrochemical energy storage stations is established from four dimensions: technical safety, environmental safety, fire safety, and operational and maintenance safety. Secondly, FAHP is used to quantitatively assign weights to every assessment index. Then, the cloud model is adopted to achieve a comprehensive safety assessment of the energy storage stations. Finally, the effectiveness of the constructed assessment model and method is verified through case studies, aiming to offer insights into the safety assessment for electrochemical energy storage stations.
Proceedings Volume Fifth International Conference on Control, Robotics, and Intelligent System (CCRIS 2024), 1340407 (2024) https://doi.org/10.1117/12.3050474
The Combined Heat and Power Micro Grid (CHP-MG), an innovative energy infrastructure, demonstrates remarkable potential in optimizing resource utilization and mitigating environmental impact, thus holding substantial practical significance. In this paper, an optimization model of CHP-MG which combines both economic benefits and environmental factors is constructed, covering photovoltaic power supply, wind turbine, micro-gas turbine, diesel generator, electric boiler and energy storage systems. The application of cooperative game theory is proposed in our study to address the multi-stakeholder interests within CHP-MG. By defining different stakeholders and establishing a cooperation mechanism, we have achieved a harmonious balance between power generation efficiency and environmental protection. The model we constructed has been validated through case studies. Results show that this model can achieve a win-win situation in both environmental protection and economic benefits within CHP-MG. This research provides valuable reference in designing and optimizing CHP-MG.
Proceedings Volume Fifth International Conference on Control, Robotics, and Intelligent System (CCRIS 2024), 1340408 (2024) https://doi.org/10.1117/12.3050248
This article introduces a multi-functional crutch monitoring system built around the STM32F103C8T6 as the main control chip. The smart crutch incorporates a global positioning function: family members can install an app on their mobile phones to view the user's location in real time, reducing the probability of an elderly user getting lost. The crutch also has a fall detection feature. When the user falls, it broadcasts a voice message seeking help from passersby, sends an SMS alert to the bound mobile phone, and displays a reminder window in the app. Furthermore, the system incorporates a road detection capability for special locations such as zebra crossings and puddles, informing the user of the road conditions ahead through voice broadcasts. Additionally, the smart crutch can monitor heart rate and blood oxygen levels, announce the current time at the press of a button, and provide night-light illumination. Experimental test results demonstrate that this multi-functional crutch monitoring system has broad application prospects and value.
Proceedings Volume Fifth International Conference on Control, Robotics, and Intelligent System (CCRIS 2024), 1340409 (2024) https://doi.org/10.1117/12.3049925
In the field of 3D printing, several factors affect the accuracy and robustness of printing results. Time-varying delay has a significant impact on the 3D printing control system (3DPCS). To further optimize the control results, this paper focuses on the state-space dynamic model of the 3DPCS and uses a matrix-injection-based transformation method to handle the derivative of the Lyapunov-Krasovskii functional (LKF). Combined with the reciprocally convex inequality and a truncated B-L inequality, delay-dependent stability analysis and stabilization conditions with reduced conservatism are derived for the 3DPCS. Finally, simulation results show that the method improves on previous research.
Proceedings Volume Fifth International Conference on Control, Robotics, and Intelligent System (CCRIS 2024), 134040A (2024) https://doi.org/10.1117/12.3049986
This article delves into a data-driven approach for solving the adaptive optimal output regulation problem in continuous-time linear systems with uncertain dynamics. Employing a class of parametric Lyapunov equations in conjunction with a nominal system model, the initial stabilizing feedback gain is determined to initiate the iterative method. Then, by utilizing a policy-iteration-based technique, we design an optimal control policy to achieve precise output tracking and disturbance rejection. The prescribed convergence characteristics of the closed-loop system directly result from the properties embedded within the solution of the parametric Lyapunov equation. The applicability and performance of the proposed method are demonstrated through a numerical example involving an input-constrained double integrator system.
Proceedings Volume Fifth International Conference on Control, Robotics, and Intelligent System (CCRIS 2024), 134040B (2024) https://doi.org/10.1117/12.3050656
With the ongoing exploration of intelligent construction of ICS, the cyberspace security situation is becoming increasingly serious, and network security protection must shift to a systematically planned and constructed model. On this basis, this paper reviews the concept of ICS network security operation. Starting from the core functions and technical requirements of security operation, it introduces the architecture of ICS network security operation and analyzes its process and key supporting technologies. After summarizing the current state of research on ICS network security operation, the paper discusses the future development of ICS network security operation systems.
TianXiang Ren, XingShen Song, JiangYong Shi, JinSheng Deng, JiHao Cao
Proceedings Volume Fifth International Conference on Control, Robotics, and Intelligent System (CCRIS 2024), 134040C (2024) https://doi.org/10.1117/12.3050469
Recent years have witnessed a surge in large language models owing to their strong generalization performance, and many continual instruction tuning methods based on parameter-efficient tuning have been proposed to push large language models further towards artificial general intelligence. However, comprehensive assessments of recent advancements in these methods are notably lacking. To fill the gap, a three-dimensional evaluation protocol (average performance, continual learning proficiency, and general ability delta) is introduced based on two task streams, which include wide-span tasks and long sequence tasks. Three continual instruction tuning methods following different strategies are then thoroughly evaluated on three distinct language models. The empirical analysis reveals that regularization-based methods are well suited to wide-range task streams, while replay-based methods excel in long sequence task streams, particularly in maintaining general ability on tasks rich in logical reasoning. The study also underscores the need for new continual instruction tuning methods based on parameter-efficient tuning that balance performance on new tasks with the preservation of general capabilities, especially for more sophisticated architecture-based methods.
Proceedings Volume Fifth International Conference on Control, Robotics, and Intelligent System (CCRIS 2024), 134040D (2024) https://doi.org/10.1117/12.3049952
Gold remains one of the most coveted commodities, with its futures prices reaching record highs. Predicting these prices accurately is crucial for investors and market managers. This study utilizes gold futures price data from 2002 to 2024 sourced from CSMAR. Applying an LSTM neural network model for forecasting, we optimized various time series structures, parameters, and features. Our findings indicate that the LSTM model effectively predicts daily trends in gold futures prices, achieving higher accuracy compared to annual predictions. However, the model's performance diminishes when forecasting monthly and quarterly prices, revealing instability and reduced precision in discontinuous time series. These results underscore the LSTM model's strength in accurately forecasting continuous daily data over intermittent periods.
Tianle Li, Sheng Lin, Pengfei Zhao, Ziqing Li, Jianshe Wang
Proceedings Volume Fifth International Conference on Control, Robotics, and Intelligent System (CCRIS 2024), 134040E (2024) https://doi.org/10.1117/12.3050605
As is well known, the objective of traditional Federated Learning (FL) is to train a global model collaboratively across multiple clients without directly accessing client data. However, traditional federated learning is frequently impeded by the heterogeneity of data, targets, and models. This work proposes a novel paradigm for federated learning, namely Federated Group Distillation (FedGKD). First, clients are grouped according to their needs and conditions. Subsequently, a knowledge distillation strategy, designated as DML, is employed to perform local distillation and global distillation on the grouped clients sequentially until the model converges. The experiments demonstrate that FedGKD is capable of effectively addressing the aforementioned three types of heterogeneous interference. Furthermore, clients can benefit from FedGKD with diverse tasks and models, ultimately achieving enhanced performance. Additionally, FedGKD is capable of reducing the load on the server to a certain extent.
Proceedings Volume Fifth International Conference on Control, Robotics, and Intelligent System (CCRIS 2024), 134040F (2024) https://doi.org/10.1117/12.3050450
This paper proposes a question-answering (QA) system for Thai learning articles. The system is a closed-domain QA system. Thai text processing is used to retrieve answers relevant to natural-language questions posed by users. A methodology for generating the keywords and topics used to retrieve answers is introduced, and cosine similarity supports the matching and selection of the best answer. The construction of the corpus database is explained; it contains Thai synsets, sets of words with similar or related meanings. With the proposed corpus database, the QA system can extract and analyze the meaning of unbound sentences in Thai. The proposed QA system performed well, with a precision of 87.67%, a recall of 73.35%, and an F1 score of 81.91%.
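As a rough illustration of cosine-similarity answer matching, the sketch below scores candidate answers against a question using term-count vectors. It is not the authors' implementation: the function names, the keyword layout of `candidates`, and the English placeholder terms are assumptions, and real Thai input would first need word segmentation, which is not shown.

```python
import math
from collections import Counter


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty vector matches nothing
    return dot / (norm_a * norm_b)


def best_answer(question_terms, candidates):
    """Return the (answer, score) pair whose keywords are closest to the question.

    `candidates` maps an answer string to its keyword list (hypothetical layout).
    """
    q = Counter(question_terms)
    scored = [(ans, cosine_similarity(q, Counter(kw))) for ans, kw in candidates.items()]
    return max(scored, key=lambda pair: pair[1])


# Placeholder data; keywords are assumed to be already segmented.
candidates = {
    "answer_about_verbs": ["verb", "tense", "grammar"],
    "answer_about_tones": ["tone", "mark", "pronunciation"],
}
answer, score = best_answer(["tone", "mark"], candidates)
print(answer, round(score, 3))
```

A synset lookup, as described in the abstract, could be applied to `question_terms` before scoring so that related words map to the same term.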
Xuan Lai, Lianggui Tang, Xiuling Zhu, Liyong Xiao, Zhuo Chen, Jiajun Yang
Proceedings Volume Fifth International Conference on Control, Robotics, and Intelligent System (CCRIS 2024), 134040G (2024) https://doi.org/10.1117/12.3050113
Large Language Models (LLMs) have achieved significant advancements in the field of natural language processing. However, they still face challenges related to integrity, timeliness, fidelity, and adaptability, and often encounter hallucination issues. To address these challenges, knowledge graphs, which store vast knowledge in a clearly structured form, can be employed to augment LLMs’ knowledge. However, simple retrieval methods are insufficient for complex multi-hop reasoning tasks. Therefore, this paper proposes a novel approach, KG-CQAM, which uses chain of thought with mind maps as key prompts to reason dynamically over knowledge graphs by invoking LLMs while preserving graph structure information. Experimental results demonstrate that KG-CQAM not only significantly enhances LLMs’ reasoning capabilities for complex multi-hop questions but also obviates the need for any model training.
Proceedings Volume Fifth International Conference on Control, Robotics, and Intelligent System (CCRIS 2024), 134040H (2024) https://doi.org/10.1117/12.3050108
In recent years, natural language processing has developed rapidly, especially in using natural language statements to quickly query target data in databases. Transforming a user’s natural language into computer-executable SQL statements has become an important research direction for improving user-database interaction. With the rise of large language models, enhancing their performance on Text-to-SQL tasks through prompt engineering has become a feasible research direction. This article proposes the PCEM-SQL method, which enables a large language model to generate high-quality SQL statements through a clear prompt template and carefully constructed prompt words. By introducing the self-consistency and self-evaluation mechanisms of the large language model, the accuracy of Text-to-SQL is significantly improved on the open-source large language model Qwen. To verify the effectiveness of the method, we constructed a small dataset (Spider-SM) and conducted experiments on GPT3.5 and Qwen, again showing that our method significantly improves Qwen’s Text-to-SQL performance (by 13%) and achieves results similar to those of GPT3.5.
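The self-consistency idea mentioned above can be sketched as a majority vote over several sampled SQL candidates. This is a minimal illustration under assumptions, not the PCEM-SQL procedure itself: the function names are hypothetical, the candidates are assumed to come from repeated model sampling (not shown), and voting here is on normalized query text, whereas a real system might instead vote on execution results.

```python
from collections import Counter


def normalize_sql(sql: str) -> str:
    """Crude normalization so trivially different strings vote together.

    Lowercasing also alters string literals, so this is only suitable
    for illustration.
    """
    return " ".join(sql.strip().rstrip(";").lower().split())


def self_consistent_sql(candidates):
    """Return the most frequent normalized candidate and its vote share."""
    if not candidates:
        raise ValueError("need at least one candidate")
    votes = Counter(normalize_sql(c) for c in candidates)
    winner, count = votes.most_common(1)[0]
    return winner, count / len(candidates)


# Hypothetical samples for one question.
samples = [
    "SELECT name FROM users;",
    "select name  from users",
    "SELECT id FROM users",
]
query, share = self_consistent_sql(samples)
print(query, share)
```

A low vote share could be used as a signal to trigger a further self-evaluation step, though how the paper combines the two mechanisms is not stated in the abstract.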
Proceedings Volume Fifth International Conference on Control, Robotics, and Intelligent System (CCRIS 2024), 134040I (2024) https://doi.org/10.1117/12.3050015
Word Sense Disambiguation (WSD) presents significant challenges in natural language processing, particularly for under-resourced languages such as Setswana. This study evaluates six advanced language models on their ability to disambiguate multiple senses of common Setswana conjunctions, employing accuracy, F1-score, and Quadratic Weighted Kappa (QWK) as evaluation metrics. The findings reveal that LaBSE achieved the highest overall scores in simpler contexts with fewer senses, peaking at an accuracy of 83.00% and a QWK of 66.00% for the conjunction "mme". In contrast, PuoBERTa, while optimized for Setswana, excelled in more complex scenarios involving conjunctions with multiple senses, underscoring the importance of model choice based on the linguistic complexity of the task.
These results emphasize the critical role of tailored language models in enhancing WSD tasks for under-resourced languages. They demonstrate that specific adjustments to model training and architecture can significantly improve performance, thereby increasing the precision and applicability of NLP technologies in diverse linguistic settings. This research not only augments computational resources for Setswana but also provides a blueprint for applying similar methodologies to other less-represented languages, advancing global communication technologies.
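For readers unfamiliar with Quadratic Weighted Kappa, the following sketch computes it from scratch using the standard definition (one minus the ratio of weighted observed disagreement to weighted chance disagreement). It assumes labels are integer-coded from 0 to `n_classes - 1`; the function name and example labels are illustrative and do not come from the study.

```python
def quadratic_weighted_kappa(y_true, y_pred, n_classes):
    """QWK for integer labels in range(n_classes), using quadratic weights."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("label lists must be non-empty and of equal length")
    if n_classes < 2:
        raise ValueError("need at least two classes")

    # observed[i][j] counts items with true label i and predicted label j
    observed = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        observed[t][p] += 1

    row = [sum(r) for r in observed]
    col = [sum(observed[i][j] for i in range(n_classes)) for j in range(n_classes)]

    num = 0.0  # weighted observed disagreement
    den = 0.0  # weighted disagreement expected by chance
    for i in range(n_classes):
        for j in range(n_classes):
            w = (i - j) ** 2 / (n_classes - 1) ** 2
            num += w * observed[i][j]
            den += w * row[i] * col[j] / n

    if den == 0.0:
        # Every label, true and predicted, falls in one class: agreement is total.
        return 1.0
    return 1.0 - num / den


print(quadratic_weighted_kappa([0, 1, 2, 2], [0, 1, 2, 1], 3))
```

The quadratic weights penalize a prediction more the further it lies from the true label index, so the metric presumes that the label coding carries some ordering.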
Proceedings Volume Fifth International Conference on Control, Robotics, and Intelligent System (CCRIS 2024), 134040J (2024) https://doi.org/10.1117/12.3050457
Transforming a sentence into a 2D representation, similar to table structures, reveals a semantic plane where elements can represent potential relations between paired entities. The 2D representation effectively addresses the common issue of overlapping relations. However, previous works directly transformed the representation from raw input, neglecting the utilization of prior knowledge, which is essential to support the entity relation extraction task. In this paper, we propose a 2D feature engineering method. The method innovatively integrates prior knowledge within 2D sentence representations, marking an advancement over previous strategies, and employs an attention mechanism to build the hitherto overlooked association between entities and prior knowledge. The fusion of prior knowledge and targeted attention enhances the model’s relational discernment capabilities. Evaluations on three benchmark datasets—ACE05 Chinese, ACE05 English, and CLTC—demonstrate the efficacy of the proposed method. It surpasses existing techniques, achieving state-of-the-art performance with F1 score improvements of 0.88%, 3.27%, and 2.83% respectively. The results indicate that 2D feature engineering can take advantage of a 2D sentence representation and make full use of prior knowledge.
Proceedings Volume Fifth International Conference on Control, Robotics, and Intelligent System (CCRIS 2024), 134040K (2024) https://doi.org/10.1117/12.3050074
Cross-lingual transfer is an effective technique for improving word sense disambiguation (WSD) in low-resource languages by leveraging knowledge from higher-resource languages. However, the impact of source language selection on the transfer has not been deeply explored. Current cross-lingual WSD methods determine which language is most suitable for transfer experimentally or intuitively, based on the practitioner’s field experience and theoretical knowledge. This may lead to poor performance on languages that are dissimilar or unrelated. In this work, we present a method that combines linguistic similarity and relative linguistic entropy to measure the transferability between two languages. The experimental results demonstrate that our method quantifies the transferability of languages more accurately. Furthermore, we show that there is a significant correlation between language transferability and WSD performance. These findings help cross-lingual WSD models generalize over both related and unrelated languages, thus achieving generalized zero-shot learning.
Proceedings Volume Fifth International Conference on Control, Robotics, and Intelligent System (CCRIS 2024), 134040L (2024) https://doi.org/10.1117/12.3049950
Effective quality control of ultrasound examination reports helps guarantee the accuracy of clinical ultrasound diagnosis. This paper presents two quality control methods for ultrasound diagnosis report text: one detects abnormal words, and the other evaluates the completeness of the report text. The first method detects abnormal words based on a deep learning model for Chinese word segmentation (CWS), covering various abnormal words in the report text such as misspelled words, unwanted repeated words, and non-professional words. A BiLSTM-CRF model combining character similarity features is proposed. The character similarity matrix is calculated by cosine similarity, and the matrix is then concatenated with the character encoding as the input to the BiLSTM input layer. The second method is Chinese sentence segmentation (CSS) of the ultrasound report, mainly to cluster the clauses according to the tissue and function of the heart. The experimental results show the following. (1) The improved deep learning model can accurately segment the text of echocardiography reports. Among the models trained with multiple sets of parameters, the one with the best overall performance on the validation set was selected as the final model. Its accuracy rate was 0.945, its recall rate was 0.939, and its F1 value was 0.947, all at a relatively high level, so it can effectively detect abnormal words in the report. (2) The segmentation algorithm was tested by comparing its automatic segmentation of echocardiographic reports with manual segmentation by doctors. The accuracy of the dynamic segmentation method is above 90%; it can effectively segment the original echocardiogram report text according to the structure and function of the heart, and the result can then be used to evaluate the completeness of the report description.
Proceedings Volume Fifth International Conference on Control, Robotics, and Intelligent System (CCRIS 2024), 134040M (2024) https://doi.org/10.1117/12.3050137
Improving the accuracy of semantic segmentation for remote sensing images (RSIs) is crucial for geoscientific research and applications. However, existing models tend to focus heavily on the subject information of semantic objects and pay less attention to the global-local boundary information of images. To tackle this issue, we introduce a novel approach named the Boundary Enhanced Network (BENet), which applies synchronized attention to both global and local aspects of images. Specifically, the proposed method uses the Global-Local Boundary Fusion module (GLBF) to aggregate the boundary information of objects from a high-dimensional feature perspective. To adapt to the multi-scale characteristics of RSIs, GLBF is embedded hierarchically to maintain boundary details across different stages. In addition, a Feature Enhancement Module (FEM) is devised with sequential DEConv and Squeeze-and-Excitation (SE) blocks to refine the features from the decoder. Experiments on the ISPRS Vaihingen and Potsdam datasets show that BENet outperforms mainstream methods, with significant gains in the OA, mIoU, and mF1 metrics.
Proceedings Volume Fifth International Conference on Control, Robotics, and Intelligent System (CCRIS 2024), 134040N (2024) https://doi.org/10.1117/12.3049974
As critical infrastructure in both urban and rural transportation systems, roads and their condition significantly affect regional economies and population mobility. Traditional road disease prediction methods are often costly and time-intensive because they rely on manual inspections. This study introduces a prediction method that uses the morphological features of rutted cross sections. It combines point cloud technology for cross-section data acquisition with statistical analysis for morphological feature extraction. A machine learning-based prediction model is developed, and its prediction accuracy reaches 83.05% at a pile of 3. The findings indicate that this method improves the accuracy of road disease predictions on the dataset used. This research not only offers a more precise prediction model for road disease but also provides a scientific foundation and decision-making support for road disease prevention, maintenance, and management, thereby offering considerable practical benefits.
Proceedings Volume Fifth International Conference on Control, Robotics, and Intelligent System (CCRIS 2024), 134040O (2024) https://doi.org/10.1117/12.3049985
The intelligent visual system is conspicuously influenced by external light sources during the imaging process, resulting in an increased probability of feature extraction failure. In this paper, by designing a small closed-loop illumination simulation control circuit, a novel telescopic adaptive variable universe fuzzy control algorithm (VUFP) is innovatively incorporated. Starting from the variability and uncertainty of the fuzzy rule base, the compensation characteristics of the machine vision light source are found for rapid response and identification. Experimental results show that this system can respond quickly to unrecognized results in various scenes, whether in bright or dark environments, and adjust and supplement the light source parameters to enhance the recognition efficiency of the system.
Proceedings Volume Fifth International Conference on Control, Robotics, and Intelligent System (CCRIS 2024), 134040P (2024) https://doi.org/10.1117/12.3050138
Accurate waterway flow monitoring plays an important role in improving ship-shore coordination and promoting the construction of smart waterways. However, traditional methods, including video surveillance and infrared detectors, are susceptible to light and temperature, making it difficult to obtain accurate waterway flow monitoring results. In contrast, the LiDAR sensor is not sensitive to changes in external light and adapts well to complex environments, so it can monitor waterway traffic accurately and in real time. To detect ship targets and thereby obtain accurate waterway flow, we first design a real-time 3D point cloud background filtering algorithm based on the intensity values of the background point cloud, which efficiently acquires ship point cloud data and reduces the data volume. We then design a laser beam-based ship target detection algorithm built on the ship's point cloud features: the time for which the ship point cloud is present on the detection cross-section, the edge characteristics of the bow and stern, and the changes in the point cloud on the detection cross-section. We validate the proposed method using ship data collected in real scenarios, and the results show that the accuracy reaches 96.3%.
Proceedings Volume Fifth International Conference on Control, Robotics, and Intelligent System (CCRIS 2024), 134040Q (2024) https://doi.org/10.1117/12.3050196
This paper focuses on semantic segmentation algorithms for point clouds in industrial defect detection. It studies the low accuracy and insufficient feature extraction of current point-based models in industrial application scenarios and proposes an improved model based on PointNet++ called BDRBNet. First, we design a plug-and-play downsampling feature extraction module named BDRB (Boundary Density Residual Block). Besides extracting features, this module focuses on learning two types of information: (1) Boundary information: a novel branch captures the boundary features ignored by the original set abstraction levels, which enhances the ability to recognize complex-shaped workpieces; (2) Density information: a density-based sampling method for downsampling strengthens the feature representation in high-density areas, enhancing the features’ representativeness and discriminative power. Moreover, to reduce the impact of class imbalance, we employ the Poly1FocalLoss function to train the model. In extensive experiments on a self-collected industrial lithium battery cover welding point cloud dataset, BDRBNet outperforms PointNet++ and other advanced methods. Specifically, the Overall Accuracy (OA) and mean Intersection over Union (mIoU) improve from 89.2% and 54.8% to 93.8% and 66.8%, increases of 4.6 and 12.0 percentage points, respectively.
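For readers unfamiliar with the two metrics quoted in this abstract, the following minimal sketch computes Overall Accuracy and mean Intersection over Union from a confusion matrix. The two-class matrix is invented for illustration and is unrelated to the paper's dataset; it only shows why OA can stay high while mIoU is much lower when classes are imbalanced.

```python
def overall_accuracy(confusion):
    """Fraction of all points whose predicted class equals the true class."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return correct / total


def mean_iou(confusion):
    """Average per-class IoU; rows are true classes, columns are predictions."""
    n = len(confusion)
    ious = []
    for c in range(n):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp
        fp = sum(confusion[r][c] for r in range(n)) - tp
        denom = tp + fp + fn
        if denom > 0:  # skip classes that never appear
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


# Invented example: a dominant class (row 0) and a rare class (row 1).
confusion = [
    [90, 10],
    [5, 5],
]
oa = overall_accuracy(confusion)
miou = mean_iou(confusion)
```

Here the rare class is segmented poorly, which barely moves `oa` but pulls `miou` down sharply, the same kind of gap visible between the OA and mIoU figures reported above.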
Proceedings Volume Fifth International Conference on Control, Robotics, and Intelligent System (CCRIS 2024), 134040R (2024) https://doi.org/10.1117/12.3049935
For coastline detection, this study proposes a novel dual-encoder edge detection framework. The method enhances edge detection by incorporating attention mechanisms that merge local features derived from traditional models with global contextual information from the Vision-Transformer network. The proposed approach and the Unet, RCF, and Swin-UNet models are evaluated on an annotated coastline image dataset in edge detection experiments. The experimental results show that the proposed model has clear advantages in accuracy, efficiency, and interference immunity. In particular, when trained on an NVIDIA RTX-3060 GPU, the proposed model trains 2.15% and 5.3% faster than HED and RCF, respectively, and its pixel accuracy is 3.97% higher than that of the Unet model. The proposed model also performs well on images containing more interfering lines.
Proceedings Volume Fifth International Conference on Control, Robotics, and Intelligent System (CCRIS 2024), 134040S (2024) https://doi.org/10.1117/12.3050009
To address issues such as low classification accuracy caused by small differences in infrared absorption spectra and severe spectral data overlap during land cover classification, this study proposes a land cover classification method based on the ConvNeXt framework, using near-infrared spectral data released by Eurostat as the research object. This method realizes the rapid differentiation of arable land, forest land, and grassland. The method uses short-time Fourier transform preprocessing to convert one-dimensional infrared absorption spectral data into two-dimensional images, optimizes and improves the ConvNeXt Block module, and integrates the CBAM attention module with it to enhance the model’s feature extraction capability and improve model accuracy. Finally, the original activation function GELU is replaced with the PReLU activation function to increase the neural network model’s nonlinear variability, improving model accuracy and efficiency. The results show that this method achieves a land cover classification accuracy of 86.58%, which is 26.60%, 20.25%, 15.38%, 4.72%, and 2.56% higher than common classification models CNN 1, CNN 2, VGG16, ResNet50, and ConvNeXt-t, respectively, verifying its accuracy and reliability in land cover classification, and providing new ideas and methods for land cover classification.
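The preprocessing step described in this abstract, turning a one-dimensional spectrum into a two-dimensional image with a short-time Fourier transform, can be sketched as follows. This is a plain-Python illustration under assumed settings: the Hann window, the window size, the hop length, and the synthetic input are all choices made here, not parameters taken from the paper.

```python
import cmath
import math


def stft_magnitude(signal, window_size, hop):
    """Return a 2-D list of magnitudes: one row per frame, one column per bin."""
    # Periodic Hann window to reduce leakage at the frame edges.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / window_size)
              for n in range(window_size)]
    n_bins = window_size // 2 + 1  # non-negative frequencies of a real signal
    image = []
    for start in range(0, len(signal) - window_size + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(window_size)]
        row = []
        for k in range(n_bins):
            # Direct DFT of the windowed frame at frequency bin k.
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / window_size)
                      for n in range(window_size))
            row.append(abs(acc))
        image.append(row)
    return image


# Synthetic stand-in for a 1-D spectrum: a sinusoid with 2 cycles per 16 samples.
spectrum = [math.sin(2 * math.pi * 2 * i / 16) for i in range(64)]
image = stft_magnitude(spectrum, window_size=16, hop=8)
```

The resulting `image` has one row per windowed segment and one column per frequency bin, so it can be treated like a single-channel picture by an image classifier. In practice a library routine such as an FFT-based STFT would replace the direct DFT loop used here for clarity.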
Xue Lin, Linwei Fan, Chaoran Cui, Lei Li, Xiuyang Zhao, Qi Zou
Proceedings Volume Fifth International Conference on Control, Robotics, and Intelligent System (CCRIS 2024), 134040T (2024) https://doi.org/10.1117/12.3050174
The detection of human-object interaction (HOI) is essential to many visual tasks. It is challenging because of the subtle differences between fine-grained actions and the data imbalance among HOI classes. Most approaches tackle these problems by designing complex structures or resorting to extra knowledge. However, they still suffer from inferior detection of fine-grained HOIs. In this paper, we propose a fine-grained action learning method to advance the detection of hard-to-distinguish HOIs. Specifically, we introduce action correlation to identify the fine-grained actions that are easily misclassified. Furthermore, we design an action discriminating loss based on the action correlation to increase the penalty on hard-to-distinguish actions, enhancing the detection of fine-grained HOIs. We conduct comprehensive experiments on the HICO-DET dataset to validate the effectiveness of the proposed fine-grained action learning method compared with existing approaches.
Proceedings Volume Fifth International Conference on Control, Robotics, and Intelligent System (CCRIS 2024), 134040U (2024) https://doi.org/10.1117/12.3050271
Point cloud semantic segmentation significantly facilitates the understanding of the driving environment and of significant information for autonomous vehicles. However, the huge amount of annotated data required to obtain well-performing segmentation models via supervised training has impeded further development of point-wise semantic segmentation. To address this challenge, our study leverages few-shot learning to improve model performance while drastically reducing the need for labeled point clouds. We also identify substantial untapped potential in the learning capabilities of segmentation networks within a few-shot learning framework, particularly when encountering novel testing categories unseen during training. Integrating the concept of prototypical networks, our method generates multiple prototypes to effectively model point-wise data structures via a cross-propagation strategy designed to enhance the learning of vital information in support and query samples. Drawing inspiration from the success of attention mechanisms across various domains, we introduce an attention relocation module to refine attention-aware feature learning in each episode. The implications of employing multiple attention heads are also explored to further improve our model's performance. To validate the effectiveness of our method, we conduct a series of experiments and ablation studies on the well-known benchmark datasets S3DIS and ScanNet. Compared to baselines and state-of-the-art methods, our approach achieves markedly improved one-shot segmentation performance. Subsequent experiments with more shots also demonstrate the effective generalization ability of our method.
Proceedings Volume Fifth International Conference on Control, Robotics, and Intelligent System (CCRIS 2024), 134040V (2024) https://doi.org/10.1117/12.3050075
At present, traditional image processing methods for identifying the density of knitted fabrics with different organizational structures suffer from poor adaptability and low accuracy. A method based on an improved deep residual neural network model is proposed to detect the density of knitted fabrics. First, a collection system for fabric images is established and a fabric image dataset is constructed. Second, all convolutional layers with a stride of 2 are replaced with SPD convolution to reduce the loss of fine-grained information, and a Dilation-wise Residual (DWR) module is added to the residual layer to enhance the network's ability to capture local and global features. Finally, the SmoothLoss function is introduced to improve detection accuracy while accelerating convergence. The experimental results show that the method has an error of less than 2% in detecting the density of various fabrics and adapts better to different fabric types.
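SPD convolution, as it is commonly described, replaces a stride-2 convolution with a space-to-depth rearrangement followed by a non-strided convolution, so that downsampling discards no pixels. The sketch below shows only the space-to-depth step on a single-channel map; the function name and the 4 x 4 example are illustrative assumptions, and the abstract does not give the paper's exact layer configuration.

```python
def space_to_depth(feature_map, scale=2):
    """Split an H x W map into scale*scale sub-maps of size (H/scale) x (W/scale).

    Every input value is kept: spatial resolution is traded for extra channels.
    """
    height = len(feature_map)
    width = len(feature_map[0])
    if height % scale or width % scale:
        raise ValueError("height and width must be divisible by scale")
    channels = []
    for dy in range(scale):
        for dx in range(scale):
            # Take every `scale`-th pixel, starting at offset (dy, dx).
            sub_map = [
                [feature_map[y][x] for x in range(dx, width, scale)]
                for y in range(dy, height, scale)
            ]
            channels.append(sub_map)
    return channels


# Illustrative 4 x 4 single-channel feature map.
feature_map = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
channels = space_to_depth(feature_map)
```

A stride-2 convolution would sample only some positions of this map, whereas the four 2 x 2 sub-maps in `channels` together retain all sixteen values for the following non-strided convolution to use.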
Proceedings Volume Fifth International Conference on Control, Robotics, and Intelligent System (CCRIS 2024), 134040W (2024) https://doi.org/10.1117/12.3049953
The proliferation of robot-assisted minimally invasive surgery highlights the need for advanced training tools such as cost-effective robotic endotrainers. Current surgical robots often lack haptic feedback, which is crucial for providing surgeons with a real-time sense of touch. This absence can impact the surgeon’s ability to perform delicate operations effectively. To enhance surgical training and address this deficiency, we have integrated a cost-effective haptic feedback system into a robotic endotrainer. This system incorporates both kinesthetic (force) and tactile feedback, improving the fidelity of surgical simulations and enabling more precise control during operations. Our system incorporates an innovative, cost-effective Force/Torque sensor utilizing optoelectronic technology, specifically designed to accurately detect forces and moments exerted on surgical tools with a 95% accuracy, providing essential kinesthetic feedback. Additionally, we implemented a tactile feedback mechanism that informs the surgeon of the gripping forces between the tool’s tip and the tissue. This dual feedback system enhances the fidelity of training simulations and the execution of robotic surgeries, promoting broader adoption and safer practices.
Proceedings Volume Fifth International Conference on Control, Robotics, and Intelligent System (CCRIS 2024), 134040X (2024) https://doi.org/10.1117/12.3049937
Bug algorithms for path planning play a crucial role in navigating through unknown environments. This paper presents a practical implementation of the TangentBug algorithm within the Robot Operating System, along with a discussion of issues encountered in practical use. The TangentBug algorithm uses data from laser and odometry sensors to construct locally optimal paths without complete knowledge of the entire environment. Virtual experiments were conducted to evaluate the performance of the algorithm within the Gazebo 3D world simulation environment and the RViz visualisation software using the TurtleBot 3 Burger robot. The virtual environments included simple convex and concave obstacles and maze configurations. The developed code is provided as open source for non-commercial academic use.
Zerui Li, Teng Jiang, Dongrui Li, Jiazheng Qin, UKei Cheang
Proceedings Volume Fifth International Conference on Control, Robotics, and Intelligent System (CCRIS 2024), 134040Y (2024) https://doi.org/10.1117/12.3049939
With the development of science and technology, cross-scale robots are becoming increasingly popular. Different application scenarios require robots with diverse scales and functions. Micro- and nanoscale robots are typically used for targeted drug delivery or micro signal detection in the body. For micro and nano robots to enter tissue through the finer blood vessels in the body for efficient drug delivery and other functions, the robot must be small and accurately controllable. Here we propose an achiral planar nanorobot design capable of stable motion in the presence of a consistent rotating magnetic field. Using nanoimprint technology, magnetic nanorobots smaller than one micron are manufactured in batches. The motion of the nanorobots is controlled by Helmholtz coils, which create a rotating magnetic field. The fabrication accuracy, moving speed, moving accuracy, and biocompatibility of the nanorobots were tested; anti-cancer drugs were loaded on the surface of the nanorobots by means of surface modification, and the rate of drug release was subsequently evaluated. The fabrication process and magnetic control strategy of the nanorobots in this work help future investigations of drug-carrying robots in the body.
Xiaogang Zhang, Dongbao Yu, Hui Tang, Shaoxing Wang
Proceedings Volume Fifth International Conference on Control, Robotics, and Intelligent System (CCRIS 2024), 134040Z (2024) https://doi.org/10.1117/12.3050120
An online automated assessment system was created and put into use to meet the high efficiency and high precision requirements for the quality inspection of nuclear fuel rod welds. This system combines cutting-edge AI technology with a B/S architecture to provide intelligent fuel rod weld analysis, management, and real-time image capture. To ensure operational convenience and system stability, the backend is built on Docker and Spring Cloud, while the frontend uses the Electron.js, React, and Ant Design frameworks. By integrating AI models, the system can effectively identify weld defects, enhancing inspection accuracy and production efficiency. The successful implementation of this system provides strong support for the safe operation of nuclear power plants and quality control in the nuclear energy industry, laying a solid foundation for the future development of automated inspection technology.
Jie Gao, Xing Li, Weimin Xu, Mengyu Li, Dong Liang
Proceedings Volume Fifth International Conference on Control, Robotics, and Intelligent System (CCRIS 2024), 1340410 (2024) https://doi.org/10.1117/12.3050084
With the advancement of technology, employee training has become more flexible and personalized, with virtual interactive learning gaining significant attention. In modern corporate environments, web-based platforms offer wide accessibility, cross-platform support, and real-time updates, making them ideal for developing virtual interactive courseware. However, the performance limitations of web browsers necessitate effective optimization schemes. This paper focuses on optimizing frontend performance and memory usage for virtual interactive applications. Methods such as optimizing document node loading, Unity file loading, network transmission, Level of Detail (LOD) optimization, and large scene optimization were employed. As a result, page rendering time, input delay time, and memory usage have been optimized to meet expected standards. The implemented performance optimization plan successfully achieved the desired outcomes.
Proceedings Volume Fifth International Conference on Control, Robotics, and Intelligent System (CCRIS 2024), 1340411 (2024) https://doi.org/10.1117/12.3050065
A homogeneous team of unknown size consists of indiscernible mobile robots that carry no communication facilities and cannot play distinct roles, and so should be driven by a common control rule. The state of any robot is partly described by the value of its output variable, which is scalar and cyclic, e.g., an angular measure; the length of the cycle is unknown in general. Robots with complex second-order dynamics and uncertainties are treated; their speeds and control signals are confined within a priori bounds. Every robot has access only to its own speed and to the relative outputs of any peer that is within a finite “visibility range” if nothing “obscures” the view to the peer. A distributed, continuous-time, computationally cheap protocol is offered that achieves a situation where the outputs of the robots are evenly distributed over the cycle and evolve according to a given common speed profile. This protocol does not use the neighbors’ relative speeds, ensures theoretically exact convergence to the targeted deployment, succeeds in the absence of initial informational connectivity of the team, is robust against losses and insertions of team members, and excludes collisions among the robots. These findings are illustrated via application to a platoon of wheeled robots performing a sentry mission on rough terrain with unknown and unpredictably changing road conditions. The proposed algorithm is justified by a mathematically rigorous convergence result; its performance is confirmed by computer simulation tests.
Proceedings Volume Fifth International Conference on Control, Robotics, and Intelligent System (CCRIS 2024), 1340412 (2024) https://doi.org/10.1117/12.3050116
The COVID-19 pandemic has accelerated the development of remote assessment. This study developed a "Remote Examination Platform" that uses webcam eye-tracking technology to investigate respondents' hot zones and gaze trajectories during examinations. Moreover, convolutional neural networks (CNNs) are employed to identify image features related to respondents' behaviors, enabling the analysis of whether high and low achievers exhibit distinct hot zones and trajectories. Subsequently, a Student-Problem Chart (SP chart) is used to analyze the Caution Index for Problems (CP) to diagnose students' exam performance and detect item reactions. The results indicate that, with CNN models identifying respondents' responses and eye-tracking trajectories, high-achieving students show longer average dwell times and more frequent fixations than low achievers on items with abnormal attention indices. Additionally, the CNN model achieves an accuracy rate of 96%. This study effectively predicts respondents' comprehension behaviors and serves as Taiwan's first artificial intelligence-based remote proctoring system (AIPS) for higher education institutions. It can help educational institutions conduct remote examinations more efficiently, ensuring fairness and equity in assessments.
Liyong Xiao, Xiaoli Cao, Can Tang, Xuan Lai, Xiuling Zhu, Zhiqiang Han
Proceedings Volume Fifth International Conference on Control, Robotics, and Intelligent System (CCRIS 2024), 1340413 (2024) https://doi.org/10.1117/12.3049923
Large Language Models (LLMs) face challenges such as outdated knowledge, hallucinations, and opaque reasoning processes. To address these challenges, Retrieval-Augmented Generation (RAG) has emerged as a solution that integrates external knowledge bases. However, traditional RAG encounters bottlenecks when dealing with long documents, such as information loss, incoherence, and low query precision at the granularity level. To tackle these issues, a novel method based on prompt engineering for efficient extraction of information from long documents is proposed. First, a prompt engineering-based document segmentation algorithm is introduced to enhance the model’s capability to handle long document inputs. Then, using prompt engineering, long documents are transformed into structured long document datasets with hierarchical summaries, including titles, headings, abstracts, and keywords, highlighting key information points to improve document understanding and retrieval matching efficiency. Experimental results demonstrate that this method significantly improves the quality and precision of information extraction from long documents and outperforms traditional retrieval-augmented generation paradigms, paving the way for interactive question-answering systems tailored to multi-document, multi-knowledge point scenarios. The method efficiently locates relevant content in long document sets based on query semantics, providing precise responses.
Proceedings Volume Fifth International Conference on Control, Robotics, and Intelligent System (CCRIS 2024), 1340414 (2024) https://doi.org/10.1117/12.3050029
Singing voice synthesis (SVS) has advanced to the point where it can produce high-quality synthetic voices based on input text and musical scores. However, current SVS research predominantly focuses on synthesizing pop songs, with little attention given to other musical genres. This paper presents an end-to-end folk song singing voice synthesis model to address this gap. Firstly, we constructed a Mandarin folk song singing voice dataset named Folk107, which comprises 107 Mandarin folk songs and nursery rhymes. These songs were recorded in professional settings, resulting in a total duration of approximately three hours. Then, we developed a fully end-to-end model for Mandarin folk singing voice synthesis, named ManFolkSyn. Finally, we conducted both SVS and singing voice conversion (SVC) experiments. In the SVS experiments, MOS scores for two singers exceeded 3.60, while in the SVC experiments, similarity scores surpassed 4.00. These results demonstrate the utility of the dataset and the effectiveness of the model we proposed.