Systems and methods for natural language processing (NLP) are described. The systems may be trained by identifying training data including clean data and noisy data; predicting annotation information using an artificial neural network (ANN); computing a loss value for the annotation information using a weighted loss function that applies a first weight to the clean data and at least one second weight to the noisy data; and updating the ANN based on the loss value. The noisy data may be obtained by identifying a set of unannotated sentences in a target domain, delexicalizing the set of unannotated sentences, finding similar sentences in a source domain, filling at least one arbitrary value in the similar delexicalized sentences, generating annotation information for the similar delexicalized sentences using an annotation model for the source domain, and applying a heuristic mapping to produce annotation information for the sentences in the target domain.
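The weighted loss function described above can be illustrated with a minimal sketch. It assumes a negative log-likelihood loss averaged over the batch, a single weight for all noisy examples, and per-example class probabilities already produced by the ANN. The function name `weighted_loss`, its parameters, and the default weight values are hypothetical and are not taken from the described systems.

```python
import math


def weighted_loss(probs, labels, is_noisy, clean_weight=1.0, noisy_weight=0.5):
    """Weighted negative log-likelihood over a batch mixing clean and noisy data.

    probs    : list of per-example probability lists (one entry per label class)
    labels   : list of integer gold or heuristically mapped label indices
    is_noisy : list of booleans, True where the annotation came from noisy data
    The weights and the averaging over batch size are illustrative assumptions.
    """
    if not (len(probs) == len(labels) == len(is_noisy)):
        raise ValueError("probs, labels and is_noisy must have the same length")
    if not probs:
        raise ValueError("batch must not be empty")

    total = 0.0
    for p, y, noisy in zip(probs, labels, is_noisy):
        # First weight for clean examples, second weight for noisy examples.
        w = noisy_weight if noisy else clean_weight
        # Clamp the probability to avoid log(0).
        total += w * -math.log(max(p[y], 1e-12))
    return total / len(probs)


# One clean example and one noisy example.
batch_probs = [[0.5, 0.5], [0.25, 0.75]]
batch_labels = [0, 1]
batch_is_noisy = [False, True]
loss = weighted_loss(batch_probs, batch_labels, batch_is_noisy)
```

In this sketch the noisy example contributes half as much to the loss as it would if it were clean, so an incorrect heuristic annotation pulls the ANN update less strongly than a clean one. In a real training loop the same weighting would be applied inside a differentiable framework so that the loss value can be used to update the ANN.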